
Prepared by Arun Dushing

TABLE OF CONTENTS
FUNDAMENTALS OF VIDEO
Video components
Video Signal
Types of Analog Video Signal
• Component Video
• Composite Video
• S-Video

DIGITAL VIDEO
Analog Video Scanning Process
• Progressive scanning
• Interlaced Scanning
Color Video
Digitizing Video
Digital Video Color Sampling

VIDEO COMPRESSION
Video Compression Requirements
Coding Techniques
• Entropy coding
• Source coding
• Hybrid encoding
Methods for Compression
STEPS IN VIDEO COMPRESSION

AUDIO, VIDEO CODECS


SPEECH CODECS
• G.723.1
• G.711
• G.729
• GSM-AMR
VIDEO CODECS
• H.261
• H.263
• H.264
• MPEG-1
• MPEG-2
• MPEG-4
COMPARISON OF THE CODECS
Most Commonly used Video/Audio File Formats

VIDEO-CONFERENCING
Benefits of Videoconferencing
Videoconferencing Protocols
• H.320
• H.323
Videoconferencing Terms
Types of Videoconferencing
• Point-to-point
• Multipoint
• Multicast

VIDEO OVER IP
Data/ Video/ Voice in ONE Net
Video over IP Solution Structure
IP VIDEO TECHNOLOGIES
The ISDN to IP Migration for Videoconferencing
• ISDN-Only Environments
• Converged IP Environments
• IP Overlay Environments
• Hybrid Video Environments

FREQUENTLY ASKED QUESTIONS


Video over IP
Video components, digital video, pictures and audio, video codecs, issues and
solutions, video conferencing, multipoint video conferencing, video protocol stack,
multicasting.

Persistence of Vision
The rapid presentation of frames of video information to give you the illusion of
smooth motion.

Fundamentals of Video
Video is nothing but a sequence of still pictures
• To create an illusion of motion the pictures have to be played at a rate > 24
frames / sec
• A picture is divided into small areas called pixels
• Picture qualities
• Brightness: The overall / average intensity of illumination of the picture; it
determines the background level in the reproduced picture
• Contrast: The difference in intensity between the dark parts and the bright
parts of the picture
• Detail or Resolution: The detail / resolution depends on the number of picture
elements. Also known as the definition

• Video components
a. Voltage circuit
b. Luminance
c. Color
d. Timing
• Scan rates
a. Video: 525 lines, interlaced
b. Computer: pixels; lines vs. pixels
c. How many pixels are in a frame
• Refresh rates
a. Traditional video: 15.75 kHz = 525 lines x 30 frames per second
b. Computer graphics: 640 x 480 up to 1390 x 1024, up to 110 kHz
Video Basics
http://www.doom9.org/index.html?/video-basics.htm
http://www.maxim-ic.com/appnotes.cfm/an_pk/734

Basic Principles of Video


http://www.poynton.com/PDFs/TIDV/Basic_principles.pdf

How Television works


http://www.explainthatstuff.com/television.html

Video Technology General Information Page


http://www.epanorama.net/links/videogeneral.html#basics

Video Signal
• A picture involves four variables: two spatial axes, the intensity variation, and
one temporal axis
• An electrical signal can only represent a single variable with time
• Picture is scanned horizontally in lines to produce an electrical signal corresponding
to the brightness level of the pixels along the line
• The vertical resolution of the picture is determined by the number of scanning lines

Types of Analog Video Signal


1. Component Video
2. Composite Video
3. S-Video

Component video: Higher-end video systems make use of three separate video
signals for the red, green, and blue image planes. Each color channel is sent as a
separate video signal.
(a) Most computer systems use Component Video, with separate signals for R, G,
and B signals.
(b) For any color separation scheme, Component Video gives the best color
reproduction since there is no “crosstalk” between the three channels.
(c) This is not the case for S-Video or Composite Video, discussed next. Component
video, however, requires more bandwidth and good synchronization of the
three components.

Composite video:
A composite video signal is a combination of the luminance level and the line
synchronization information.

Color (“chrominance”) and intensity (“luminance”) signals are mixed into a single
carrier wave.
a) Chrominance is a composition of two color components (I and Q, or U and V).
b) In NTSC TV, e.g., I and Q are combined into a chroma signal, and a color
subcarrier is then employed to put the chroma signal at the high-frequency end of
the signal shared with the luminance signal.
c) The chrominance and luminance components can be separated at the receiver end
and then the two color components can be further recovered.
d) When connecting to TVs or VCRs, Composite Video uses only one wire and video
color signals are mixed, not sent separately. The audio and sync signals are additions
to this one signal. Since color and intensity are wrapped into the same signal, some
interference between the luminance and chrominance signals is inevitable.

http://en.wikipedia.org/wiki/Composite_video
http://electronics.howstuffworks.com/tv9.htm

S-Video: As a compromise, S-Video (Separated Video, or Super-Video, e.g., in S-VHS) uses
two wires, one for luminance and another for a composite chrominance signal. As a
result, there is less crosstalk between the color information and the crucial gray-
scale information. The reason for placing luminance into its own part of the signal is
that black-and-white information is most crucial for visual perception.
In fact, humans are able to differentiate spatial resolution in grayscale images with
much higher acuity than for the color part of color images.
As a result, we can send less accurate color information than must be sent for
intensity information: we can only see fairly large blobs of color, so it makes sense
to send less color detail.

Digital Video
Digital video is obtained by:
• Sampling an analog video signal V(t)
• Sampling the 3-D space-time intensity distribution I(x, y, t)

Analog Video Scanning Process

An analog signal f(t) samples a time-varying image. So-called “progressive” scanning
traces through a complete picture (a frame) row-wise for each time interval.
In TV, and in some monitors and multimedia standards as well, another system, called
“interlaced” scanning, is used:
a) The odd-numbered lines are traced first, and then the even-numbered lines are traced.
This results in “odd” and “even” fields: two fields make up one frame.
b) In fact, the odd lines (starting from 1) end up at the middle of a line at the end of the
odd field, and the even scan starts at a half-way point.

Video Sampling
Progressive scanning: one full frame every 1/30th of a second.
Interlaced scanning: one field every 1/60th of a second, two fields per frame
(2:1 interlacing).

Progressive scanning

Interlaced Scanning
Because of interlacing, the odd and even lines are displaced in time from each other.
This is generally not noticeable except when very fast action is taking place on screen,
when blurring may occur.
Scanning and Interlacing
• Even at rates > 24 frames/sec, the viewer can still perceive flicker at high
intensity levels
• To avoid flicker, a single frame is displayed in two interlaced fields
• Interlaced video standards
o NTSC – 525 / 60
o PAL – 625 / 50
NTSC (National Television System Committee)
NTSC is the video system or standard used in North America and most of South America.
In NTSC, 30 frames are transmitted each second. Each frame is made up of 525 individual
scan lines.
http://en.wikipedia.org/wiki/NTSC

NTSC (National Television System Committee) TV standard is mostly used in North


America and Japan. It uses the familiar 4:3 aspect ratio (i.e., the ratio of picture width to
its height) and uses 525 scan lines per frame at 30 frames per second (fps).
a) NTSC follows the interlaced scanning system, and each frame is divided into two
fields, with 262.5 lines/field.
b) Thus the horizontal sweep frequency is 525 x 29.97 ≈ 15,734 lines/sec, so that
each line is swept out in 63.6 µs.
c) Since the horizontal retrace takes 10.9 µs, this leaves 52.7 µs for the active line
signal during which image data is displayed (the sketch below checks these numbers).
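
As a quick sanity check on these figures, a few lines of Python (a sketch; the 10.9 µs
retrace value is simply taken from the text above) reproduce the line rate and
active-line time:

    # NTSC line-timing sanity check (values from the text above)
    LINES_PER_FRAME = 525
    FRAME_RATE = 29.97            # frames per second (NTSC)
    H_RETRACE_US = 10.9           # horizontal retrace time in microseconds

    line_rate = LINES_PER_FRAME * FRAME_RATE          # lines per second
    line_time_us = 1e6 / line_rate                    # time per line in microseconds
    active_line_us = line_time_us - H_RETRACE_US      # time left for picture data

    print(f"Horizontal sweep frequency: {line_rate:,.0f} lines/s")   # ~15,734
    print(f"Total line time:            {line_time_us:.1f} us")      # ~63.6
    print(f"Active line time:           {active_line_us:.1f} us")    # ~52.7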
NTSC video is an analog signal with no fixed horizontal resolution. Therefore one must
decide how many times to sample the signal for display: each sample corresponds to
one pixel output.
A “pixel clock” is used to divide each horizontal line of video into samples. The higher
the frequency of the pixel clock, the more samples per line there are.
Different video formats provide different numbers of samples per line, as listed in
following Table

Format           Resolution (lines)
VHS              240
S-VHS            400-425
Betamax          500
Standard 8 mm    300
Hi-8 mm          425
Mini DV          480 (720 x 480)
DVD              720 x 480
HD-DVD           up to 1920 x 1080

PAL (Phase Alternating Line)


PAL is the predominant video system or standard in much of the world outside North
America. In PAL, 25 frames are transmitted each second. Each frame is made up of 625
individual scan lines.
http://en.wikipedia.org/wiki/PAL

PAL (Phase Alternating Line) is a TV standard widely used in Western Europe, China,
India, and many other parts of the world.
PAL uses 625 scan lines per frame, at 25 frames/second, with a 4:3 aspect ratio and
interlaced fields.
(a) PAL uses the YUV color model. It uses an 8 MHz channel and allocates a bandwidth of
5.5 MHz to Y, and 1.8 MHz each to U and V. The color subcarrier frequency is
fsc ≈ 4.43 MHz.
(b) In order to improve picture quality, chroma signals have alternate signs (e.g., +U and
-U) in successive scan lines, hence the name “Phase Alternating Line".
(c) This facilitates the use of a (line-rate) comb filter at the receiver: the signals in
consecutive lines are averaged so as to cancel the chroma signals (which always carry
opposite signs), separating Y and C and yielding high-quality Y signals.

Digital Levels

Video Level   NTSC   PAL
White         200    200
Black         70     63
Blank         60     63
Sync          4      4

Color Video

• Color video camera produces RGB output signals


• To maintain the compatibility with the monochrome receiver, the color signals
are converted into Luminance (Y) and Chrominance or Color Difference (R-Y, B-
Y) signals
• Widely used color formats
o YUV
This color space is the rescaled version of color difference signals to be
compatible with analog channel bandwidth
http://en.wikipedia.org/wiki/YUV

o YCbCr
Recommended for digital TV broadcasting by ITU-R BT.601
http://en.wikipedia.org/wiki/YCbCr
http://www.graphicsacademy.com/what_ycbcr.php

NTSC Color Bars

Digitizing Video
• A composite video signal is sampled at a rate four times the fundamental
sampling frequency recommended by the ITU (4 x 3.375 = 13.5 MHz)
• With the recommended sampling rate, the number of samples during the
active line period is the same for both NTSC and PAL
• The signal is converted into 8-bit samples using an A/D converter
• Color difference signals are sampled at a reduced rate, which is also an
integral multiple of 3.375 MHz (see the sketch below)

http://www.pctechguide.com/45DigitalVideo.htm
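
As a rough illustration of what these sampling choices imply, the sketch below assumes
8-bit samples and color-difference signals sampled at half the luminance rate (the 4:2:2
pattern described in the next section); the exact rate depends on the chroma format used:

    # Raw bit rate implied by BT.601-style sampling (a sketch, 4:2:2 assumed)
    F_BASE_MHZ = 3.375
    LUMA_RATE = 4 * F_BASE_MHZ * 1e6      # 13.5 MHz luminance sampling
    CHROMA_RATE = 2 * F_BASE_MHZ * 1e6    # 6.75 MHz for each color-difference signal
    BITS_PER_SAMPLE = 8

    total_rate_bps = (LUMA_RATE + 2 * CHROMA_RATE) * BITS_PER_SAMPLE
    print(f"Luminance sampling rate: {LUMA_RATE/1e6:.1f} MHz")
    print(f"Uncompressed 4:2:2 rate: {total_rate_bps/1e6:.0f} Mbit/s")   # 216 Mbit/s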

Digital Video Color Sampling

The advantages of digital representation for video are many. For example:
(a) Video can be stored on digital devices or in memory, ready to be processed (noise
removal, cut and paste, etc.), and integrated to various multimedia applications;
(b) Direct access is possible, which makes nonlinear video editing achievable as a simple,
rather than a complex task;
(c) Repeated recording does not degrade image quality;
(d) Ease of encryption and better tolerance to channel noise.

Since humans see color with much less spatial resolution than they see black and white, it
makes sense to “decimate" the chrominance signal.
Interesting (but not necessarily informative!) names have arisen to label the different
schemes used.
To begin with, numbers are given stating how many pixel values, per four original pixels,
are actually sent:
(a) The chroma subsampling scheme “4:4:4” indicates that no chroma subsampling is
used: each pixel's Y, Cb and Cr values are transmitted, 4 for each of Y, Cb, Cr.
(b) The scheme “4:2:2” indicates horizontal subsampling of the Cb, Cr signals by a factor
of 2. That is, of four pixels horizontally labeled 0 to 3, all four Ys are sent, and every
two pixels share one Cb and one Cr, sent as (Cb0, Y0)(Cr0, Y1)(Cb2, Y2)(Cr2, Y3)
(Cb4, Y4), and so on (or averaging is used).
(c) The scheme “4:1:1” subsamples horizontally by a factor of 4.
(d) The scheme “4:2:0” subsamples in both the horizontal and vertical dimensions by a
factor of 2. Conceptually, an average chroma pixel is positioned between the rows and
columns of the luma samples.
Scheme 4:2:0, along with the other schemes, is commonly used in JPEG and MPEG.

Color Sampling
http://www.larryjordan.biz/articles/lj_sampling.html

4:2:2
At the first sample point on a line, Y (luminance), Cr (R-Y), and Cb (B-Y) samples are all
taken; at the second sample point only a Y sample is taken; at the third sample point a Y,
a Cb and a Cr are taken, and this process is repeated throughout the line
4:2:0
At the first sample site in the first line, a Y sample and a Cb sample are taken. At the
second site a Y sample only is taken, while at the third site a Y and a Cb are taken and
this is repeated across the line. Similarly Cr samples are taken in the second line
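
A minimal sketch of how 4:2:0 subsampling could be carried out on a frame already split
into Y, Cb, Cr planes, averaging each 2 x 2 block of chroma samples (real codecs may use
different filters and sample positions):

    import numpy as np

    def subsample_420(y, cb, cr):
        """Keep full-resolution Y; average each 2x2 block of Cb and Cr (4:2:0 sketch)."""
        def down2x2(c):
            c = c[:c.shape[0] // 2 * 2, :c.shape[1] // 2 * 2]   # crop to even dimensions
            h, w = c.shape
            return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        return y, down2x2(cb), down2x2(cr)

    y  = np.random.rand(4, 4)
    cb = np.random.rand(4, 4)
    cr = np.random.rand(4, 4)
    y2, cb2, cr2 = subsample_420(y, cb, cr)
    print(y2.shape, cb2.shape)   # (4, 4) (2, 2): one chroma sample per 2x2 luma block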

---------------------------------------------

Video compression
Goal of video compression is to minimize the bit rate in the digital representation of
the video signal while:
– Maintaining required levels of signal quality
– Minimizing the complexity of the codec
– Containing the delay

Video compression is all about reducing the number of bytes needed to transmit or
store a video without significantly sacrificing quality. Thanks to the reduced size, it
also shortens the time needed to transmit a video over a channel, and compressed
video can be carried more economically over a lower-capacity link.
Most networks handle approximately 120 Mbit/s of data. Uncompressed video
normally exceeds a network's bandwidth capacity, does not get displayed properly,
and requires a large amount of disk space for storage. Therefore, it is not
practical to transmit video sequences without using compression.

Most compression methods concentrate on the differences within a frame or between


different frames for minimizing the amount of data required for storing the video
sequence. For differences within a single frame, compression techniques take
advantage of the fact that the human eye is unable to distinguish small differences in
color. In video compression, only the changes between the frames are encoded. By
ignoring redundant pixels, only the changed portion of a video sequence is
compressed, thereby reducing overall file size.

There are well-defined standards and protocols describing how the
information should be encoded, decoded, and otherwise represented.

Video Compression Requirements


• General requirements
o format independent of frame size and frame/audio data rate
o synchronization of audio and video (and other) data
o compatibility between hardware platforms

• Further requirements for “retrieval mode” systems


o fast-forward and fast-backward searching
o random access to single images and audio frames
o independence of compressed data units for random access and editing
Coding Techniques
Entropy coding
• Lossless coding is a reversible process: the data are recovered perfectly, so the values
before and after coding are identical. It is used regardless of the medium's specific
characteristics and gives low compression ratios.
– Example: Entropy Coding
• Data are taken as a simple digital sequence
• The decompression process regenerates the data completely
• E.g. run-length coding (RLC), Huffman coding, arithmetic coding
• Lossless encoding techniques are used in the final stage of video compression to
represent the “remaining samples” with an optimal number of bits
• Run-length coding represents each row of samples by a sequence of lengths
that describe the successive runs of the same sample value (a minimal sketch
appears below)
• Variable Length Coding (VLC) assigns the shortest possible bit sequences based on
the probability distribution of the sample values
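
A minimal run-length coding sketch along these lines (illustrative only; real codecs
run-length code zig-zag scanned, quantized coefficients rather than raw samples):

    def rle_encode(samples):
        """Represent a row of samples as (value, run_length) pairs."""
        runs = []
        for s in samples:
            if runs and runs[-1][0] == s:
                runs[-1][1] += 1
            else:
                runs.append([s, 1])
        return [tuple(r) for r in runs]

    def rle_decode(runs):
        return [value for value, length in runs for _ in range(length)]

    row = [0, 0, 0, 5, 5, 1, 0, 0]
    encoded = rle_encode(row)
    print(encoded)                      # [(0, 3), (5, 2), (1, 1), (0, 2)]
    assert rle_decode(encoded) == row   # lossless round trip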

Source coding
Takes advantage of the nature of the data to generate a one-way relationship
between the original and compressed information: “lossy” techniques.
• Lossy coding is an irreversible process: the recovered data are degraded, so the
reconstructed video is numerically not identical to the original. It takes into account
the semantics of the data. Quality depends on the compression method and the
compression ratio.
– Example: Source Coding
• Degree of compression depends on the data content.
• E.g. content prediction techniques - DPCM, delta modulation

Hybrid encoding
Uses elements of both entropy and source coding.
Most techniques used in multimedia systems are hybrid.
– E.g. JPEG, H.263, MPEG-1, MPEG-2, MPEG-4

Methods for Compression


• Intra-Coded Compression
o Pictures encoded in this method are called I-pictures
o Compression is achieved by removing the redundancy along the spatial
axes
• Inter-Coded Compression and Prediction
o This method takes advantage of the similarities between successive
pictures
o The next picture is predicted from a limited number of previous or future
pictures
o Pictures that are predicted in one direction are called P-pictures, and
pictures that are predicted in both directions are known as B-pictures
http://www.wave-report.com/tutorials/VC.htm
http://www.cs.cf.ac.uk/Dave/Multimedia/node245.html#SECTION04270000000000000000
http://desktopvideo.about.com/od/glossary/g/compressformats.htm

---------------------------------------------

BASIC STEPS IN VIDEO COMPRESSION:


There is a general sequence of steps by which a video is compressed. On top of
these basic steps, the various standards mentioned above define their own
compression procedures.
A video is nothing but a series of image frames. When a motion picture is displayed,
each frame is displayed for a short period of time, usually 1/24th, 1/25th or 1/30th
of a second, followed by the next frame. This creates the illusion of a moving image.

The difference between subsequent frames is usually small. Video compression uses
this property to reduce the size of a video. A video encoder is the device that does
the compression. An encoder compares consecutive frames, picks out only the
differences, and encodes those instead of encoding the entire frame. The compression is
done on a frame-by-frame basis.

The steps below give the basic procedure of video compression, irrespective of
the standard.
RGB to YUV:
This is the first step in compressing a video sequence. RGB (Red, Green, and Blue)
and YUV (luminance plus blue and red chrominance) are color formats in which
a video can be represented. Each frame has a particular value for its Red, Green
and Blue components. When a camera captures a video, it will be in RGB (Red, Green,
and Blue) format, but RGB video requires more storage space than the YUV format.
Therefore, to make transmission and storage easier, the video sequence is converted
from RGB to YUV. This conversion is done for each frame of the video, using the
formulas given below.

Y = 0.299R + 0.587G + 0.114B


U = − 0.147R − 0.289G + 0.436B
V = 0.615R − 0.515G − 0.100B
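
The same conversion applied per pixel can be sketched as follows (assuming the R, G, B
values are already normalized floating-point numbers; broadcast-oriented implementations
add offsets and scaling on top of this):

    import numpy as np

    # BT.601 RGB -> YUV matrix, matching the formulas above
    RGB_TO_YUV = np.array([
        [ 0.299,  0.587,  0.114],   # Y
        [-0.147, -0.289,  0.436],   # U
        [ 0.615, -0.515, -0.100],   # V
    ])

    def rgb_to_yuv(frame_rgb):
        """Convert an (H, W, 3) RGB frame to YUV using the matrix above."""
        return frame_rgb @ RGB_TO_YUV.T

    frame = np.random.rand(480, 640, 3)     # one synthetic frame, values in [0, 1]
    yuv = rgb_to_yuv(frame)
    print(yuv.shape)                        # (480, 640, 3)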

Motion Estimation – Motion Compensation:


Motion Estimation is one of the key elements in video compression. To achieve
compression, the redundancy between adjacent frames can be exploited. That is, a
reference frame is selected, and subsequent frames are predicted from the reference
using Motion estimation.
In motion compensation, the motion-compensated prediction is subtracted from the
current frame to create a residual frame, which is then encoded.
In a series of frames, the current frame is predicted from a previous frame known
as reference frame. The current frame is divided into macroblocks, typically 16 x 16
pixels in size. However, motion estimation techniques may choose different block
sizes, and may vary the size of the blocks within a given frame.
Each macroblock of the current frame is compared with the reference frame and the
best matching block is selected. A vector referring to the displacement of the
macroblock in the reference frame with respect to the macroblock in the current
frame is determined. This vector is known as motion vector (MV).
If the comparison of the current frame is done with the previous frame, it is called
backward estimation. If it is done with the next frame, it is called forward estimation.
If it is done based on both previous and next frame, it is called bi-directional
estimation.
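
A minimal full-search block-matching sketch for a single macroblock is shown below. It
assumes a 16 x 16 block, a sum-of-absolute-differences (SAD) cost and a +/-8 pixel search
range, which are illustrative choices; real encoders use faster search strategies:

    import numpy as np

    def best_motion_vector(ref, cur, top, left, block=16, search=8):
        """Full search for the motion vector of one block of the current frame.

        Compares the block at (top, left) in `cur` against candidate positions in
        `ref` within +/- `search` pixels and returns the (dy, dx) with minimum SAD.
        """
        target = cur[top:top + block, left:left + block]
        best_mv, best_sad = (0, 0), np.inf
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = top + dy, left + dx
                if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                    continue
                candidate = ref[y:y + block, x:x + block]
                sad = np.abs(target.astype(int) - candidate.astype(int)).sum()
                if sad < best_sad:
                    best_sad, best_mv = sad, (dy, dx)
        return best_mv, best_sad

    ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
    cur = np.roll(ref, (2, 3), axis=(0, 1))        # current frame = reference shifted by (2, 3)
    mv, sad = best_motion_vector(ref, cur, top=16, left=16)
    print(mv)                                      # (-2, -3): where the block came from in the reference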

(Figure: the reference frame and current frame feed motion estimation, which produces
motion vectors; motion compensation then yields the residual frame.)
DCT (Discrete Cosine Transform):
The Discrete Cosine Transform converts each frame from the spatial domain to the
frequency domain.
A DCT is performed on small blocks (8 pixels by 8 lines) of each component of the
motion compensated frame to produce blocks of DCT coefficients. The magnitude of
each DCT coefficient indicates the contribution of a particular combination of
horizontal and vertical spatial frequencies to the original picture block. The coefficient
corresponding to zero horizontal and vertical frequency is called the DC coefficient.
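
A self-contained sketch of the 8 x 8 2-D DCT and its inverse, using the standard
orthonormal DCT-II basis (production codecs use fast, often integer, approximations):

    import numpy as np

    def dct_matrix(n=8):
        """Orthonormal DCT-II basis matrix of size n x n."""
        k = np.arange(n)
        basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
        basis[0, :] *= 1 / np.sqrt(2)
        return basis * np.sqrt(2 / n)

    D = dct_matrix(8)

    def dct2(block):
        """Forward 2-D DCT of an 8x8 block."""
        return D @ block @ D.T

    def idct2(coeffs):
        """Inverse 2-D DCT."""
        return D.T @ coeffs @ D

    block = np.random.rand(8, 8) * 255 - 128       # level-shifted pixel block
    coeffs = dct2(block)
    print(coeffs[0, 0])                            # DC coefficient (zero spatial frequency)
    print(np.allclose(idct2(coeffs), block))       # True: the transform is invertible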
Quantization:
Quantization is the approximation of continuous (or high-precision) values by discrete
integer values, and it plays an important role in data compression. The coefficients that
come out of the discrete cosine transform have high precision. By quantizing the values to
approximate integers, the size of the frame is reduced: instead of carrying large numbers,
we reduce them to small integer values by dividing them by constant values. There are,
however, losses associated with quantization.

(Figure: an example frame after DCT, the quantizing constant matrix, and the resulting
frame after quantization, illustrated pixel by pixel.)
Inverse Quantization:
Inverse quantization helps in reconstructing the frame so it can be used as a
reference frame for motion estimation. The quantized frame is multiplied by the
same quantizing constant by which it was divided during quantization.
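
A sketch of quantization and inverse quantization on a block of DCT coefficients, using a
single flat quantizing constant for simplicity (real codecs use an 8 x 8 quantization matrix
and a quality-dependent scale):

    import numpy as np

    Q = 16                                           # hypothetical flat quantizing constant

    def quantize(coeffs, q=Q):
        return np.round(coeffs / q).astype(int)      # small, cheap integers

    def dequantize(levels, q=Q):
        return levels * q                            # approximate reconstruction

    coeffs = np.random.randn(8, 8) * 100             # stand-in DCT coefficients
    levels = quantize(coeffs)
    recon = dequantize(levels)
    print(np.abs(coeffs - recon).max())              # quantization error, at most Q/2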

Huffman Coding:
The quantized frame has discrete values associated with each pixel. Huffman
coding maps each value to a variable-length binary symbol, assigning shorter codes to
more frequent values, so the data can be transmitted efficiently over a channel. During
decompression, the symbols are mapped back to their
corresponding values and the frame can be reconstructed.

Once the frame is out of the Huffman Coding phase, the video stream is ready to be
transmitted.
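
A compact Huffman-coding sketch over the quantized values, built with Python's heapq
(real encoders typically use predefined code tables from the standard rather than building
a tree per frame):

    import heapq
    from collections import Counter

    def huffman_codes(symbols):
        """Build a prefix code: more frequent symbols get shorter bit strings."""
        freq = Counter(symbols)
        if len(freq) == 1:                       # degenerate case: single symbol
            return {next(iter(freq)): "0"}
        heap = [[count, i, {sym: ""}] for i, (sym, count) in enumerate(freq.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            lo = heapq.heappop(heap)
            hi = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in lo[2].items()}
            merged.update({s: "1" + c for s, c in hi[2].items()})
            counter += 1
            heapq.heappush(heap, [lo[0] + hi[0], counter, merged])
        return heap[0][2]

    levels = [0, 0, 0, 0, 1, 0, -1, 0, 2, 0, 0, 0]   # typical run of quantized values
    codes = huffman_codes(levels)
    bitstream = "".join(codes[v] for v in levels)
    print(codes)                                     # e.g. {0: '0', ...}; exact codes may vary
    print(len(bitstream), "bits instead of", len(levels) * 8)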
---------------------------------------------

Audio, video codecs


• Broadcast (high bit rate):
– MPEG-1
– MPEG-2
• Video Conferencing (low bit rate):
– H.261
– H.263
• Interactive (full range of bit rates):
– MPEG-4

SPEECH CODECS
G.723.1
G.723.1 is an optional legacy codec included in the 3rd Generation Partnership
Project (3GPP) recommendation for compatibility with standards such as H.323. It
operates at two bit rates, 6.3 kbit/s and 5.3 kbit/s, and uses a look-ahead of 7.5 ms.
Music or tones such as DTMF or fax tones cannot be transported reliably with this
codec, so some other method such as G.711 or out-of-band signaling should be used
to transport these signals.

G.711
G.711 is an ITU-T standard for audio companding. It represents voice-frequency
signals as 8-bit compressed pulse code modulation (PCM) samples, taken at a rate of
8000 samples/second. A G.711 encoder therefore creates a 64 kbit/s bit stream. This
codec is used to transmit DTMF and fax tones on E1/T1 lines.
There are two main algorithms defined in the standard: the μ-law algorithm (used in
North America and Japan) and the A-law algorithm (used in Europe and the rest of the
world).
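
The continuous μ-law companding curve behind the North American variant can be
sketched as follows (the standard itself specifies a segmented 8-bit approximation; this
is the underlying formula, not the bit-exact encoder):

    import math

    MU = 255

    def mu_law_compress(x):
        """Map a sample in [-1, 1] through the mu-law companding curve."""
        return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

    def mu_law_expand(y):
        return math.copysign((math.exp(abs(y) * math.log1p(MU)) - 1) / MU, y)

    for x in (0.001, 0.01, 0.1, 1.0):
        y = mu_law_compress(x)
        print(f"x={x:6.3f}  compressed={y:.3f}  expanded back={mu_law_expand(y):.4f}")
    # Small amplitudes use a disproportionately large share of the output range,
    # which is what lets 8 bits cover the dynamic range of speech.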

G.729
G.729 is mostly used in Voice over IP (VoIP) applications because of its low bandwidth
requirement. Music or tones such as DTMF or fax tones cannot be transported
reliably with this codec, so G.711 or out-of-band methods should be used to transport
these signals. Also very common is G.729a, which is compatible with G.729 but
requires less computation. This lower complexity is not free, since speech quality is
marginally worse. Annex B of G.729 is a silence compression scheme comprising a
Voice Activity Detection (VAD) module (used to classify frames as speech or
non-speech), a Comfort Noise Generator (CNG), and a DTX module which decides when
to update the background noise parameters for non-speech (noisy) frames, which are
also called SID frames.
G.729 operates at 8 kbit/s, but there are extensions which also provide 6.4 kbit/s
and 11.8 kbit/s rates, for marginally worse and better speech quality respectively.

GSM-AMR
Under 3G-324M, the adaptive multi-rate (AMR) codec is the mandatory speech
codec. AMR can operate at different rates between 12.2 and 4.75 kbps. It also
supports comfort noise generation (CNG) and a discontinuous transmission (DTX)
mode. It can dynamically adjust its rate and error control, providing the best speech
quality for the current channel conditions. The AMR codec also supports unequal
error detection and protection (UED/UEP). This scheme partitions the bit stream into
classes on the basis of their perceptual relevance. An AMR frame is discarded if
errors are detected in the most perceptually relevant data, otherwise it is decoded
and error concealment is applied.

Since the ability to suppress silence is one of the primary motivations for using
packets to transmit voice, the Real-time Transport Protocol (RTP) header carries both a
sequence number and a timestamp to allow a receiver to distinguish between lost
packets and periods of time when no data was transmitted. Some payload formats
define a "silence insertion descriptor" or "comfort noise" (CN) frame (like G.711
codec which is sample based; i.e. the encodings produce one or more octets per
sample) to specify parameters for artificial noise that may be generated during a
period of silence to approximate the background noise at the source. Some codecs
like G729 (It is a frame based codec because it encodes a fixed-length block of audio
into another block of compressed data, typically also of fixed length) have the silent
frames as a part of the codec frame structure and hence don’t need separate payload
format for the silent frame. When the CN payload format is used with another
payload format, different values in the RTP payload type field distinguish comfort-
noise packets from those of the selected payload format.

The RTP header for the comfort noise packet SHOULD be constructed as if the
comfort noise were an independent codec. Each RTP packet containing comfort noise
MUST contain exactly one CN payload per channel. This is required since the CN
payload has a variable length. The CN packet update rate is left implementation
specific. The CN payload format provides a minimum interoperability specification for
communication of comfort noise parameters. The comfort noise analysis and
synthesis as well as the VAD and DTX algorithms are unspecified and left
implementation-specific.
VIDEO CODECS

H.261
Designed for video phone and video conference over ISDN
• Bit rate: n x 64 kbps, n ∈ [1, 30]
• QCIF (176x144), CIF (352x288)
• Coding Scheme
– DCT based compression to reduce spatial redundancy (similar to JPEG)
– Block based motion compensation to reduce temporal redundancy

H.263
Designed for low bit rate video applications
• Bit rate: 10 ~ 384kbps
• SQCIF (128x96) ~ 16CIF (1408x1152)
• Coding similar to H.261 but more efficient

H.263 is a video codec designed by the ITU-T as a low-bit rate encoding solution for
videoconferencing. It is a legacy codec that is used by existing H.323 systems and
has been kept for compatibility. It was further enhanced into codecs such as H.263v2
(a.k.a. H.263+ or H.263 1998) and H.263v3 (a.k.a. H.263++ or H.263 2000).
http://www.h263l.com/

H.264:
This is one of the most advanced standards for video compression. This is based on
the basic compression principles like most standards but has some unique features.
The average bit-rate reduction in H.264 is about 50%, which is higher than for any of the
other standards mentioned above. Videoconferencing, telemedicine, and satellite
broadcasting are some of the applications that use H.264.

MPEG-1
Designed for storage/retrieval of VHS quality video on CD-ROM
• Bit rate: ~1.5Mbps
• Similar Coding scheme to H.261 with:
– Random access support
– Fast forward/backward support

Standard used for the compression of moving pictures and audio. This was based on
CD-ROM video applications, and is a popular standard for transmitting video
sequences over the Internet. In addition, Audio Layer 3 of MPEG-1 is the most popular
standard for digital compression of audio, known as MP3. MPEG-1 is designed for
bit rates up to 1.5 Mbit/sec.

MPEG-2
Designed for broadcast quality video storage and transport
• HDTV support
• Bit rate: 2Mbps or higher (CBR/VBR)
• Two system bit streams: Program Stream and Transport Stream
• Used for:
– DVD
– DirecTV
– Digital CATV

This standard is mainly used in Digital Television set top boxes and DVD video. It is
based on MPEG-1, but has some special features for digital broadcast television. The
most significant enhancement from MPEG-1 is its ability to efficiently compress
interlaced video. MPEG-2 scales well to HDTV resolution and bit rates, reducing the
need for an MPEG-3. Designed for Videos with bitrate between 1.5 and 15 Mbit/sec.

Video Compression: Deficiencies of existing standards


• Designed for specific usage
– H.263 cannot be stored (no random access)
– MPEG-1 & MPEG-2: not optimized for IP transport
• No universal file format for both local storage and network streaming
• Output cannot be reused efficiently after composition - encoded once, no versatility

Video Compression: Requirements for New Standard


• Efficient coding scheme
– Code once, use and reuse everywhere
– optimized for both local access and network streaming
• Works well in both error-prone and error-free environments
– Scalable for different bandwidth usage
– Video format can be changed on the fly
– Transparent to underlying transport network
• Support efficient interactivity over network
The solution is : MPEG-4

MPEG-4
• Internet in the future
– Not only text and graphics, but also audio and video
• Fast and versatile interactivity
– Zoom in; zoom out (remote monitoring)
– Fast forward and fast backward (video on demand)
– Change viewing point (online shopping, sports)
– Trigger a series of events (distance learning)
– On the fly composition
– Virtual environments
• Support both low bandwidth connections
(wireless/mobile) and high bit rates (fixed/wire line)

MPEG-4 is a standard used primarily to compress audio and video (AV) digital data.
It is more flexible than H.263 baseline and offers advanced error detection and
correction schemes.

MPEG-4 absorbs many of the features of MPEG-1 and MPEG-2 and other related
standards, adding new features such as (extended) VRML support for 3D rendering,
object-oriented composite files (including audio, video and VRML objects), support
for externally-specified Digital Rights Management and various types of interactivity.
AAC (Advanced Audio Codec) was standardized as an adjunct to MPEG-2 (as Part 7)
before MPEG-4 was issued.

MPEG-4 Standard is predominantly used for multimedia and Web compression.


MPEG-4 involves object-based compression, similar in nature to the Virtual Reality
Modeling Language. Individual objects within a scene are tracked separately and
compressed together to create an MPEG-4 file. This leads to efficient compression
that is very scalable, from low bit rates to very high. It also allows developers to access
objects in a scene independently, and therefore introduce interactivity.

Most of the features included in MPEG-4 are left to individual developers to decide
whether to implement, which is why the standard is divided into many parts, ranging
from Part 1 to Part 22.

http://www.webopedia.com/TERM/M/MPEG.html

http://en.wikipedia.org/wiki/Moving_Picture_Experts_Group

-----------------------------------------

COMPARISON OF THE CODECS


There are frame-based and stream-based codecs; Table 3 compares the video codecs.
The comparison of latency, quality and applications shows that the codec type needs
to be matched to the intended use of the system, taking into consideration the
bandwidth of the system.

In Table 4, the speech codecs are compared based on the differences in their
frame duration, frame size, bit rate and RTP payload type. The RTP payload type is
the number specified in the RFCs for the respective codec.

Codec    Compression   Transform        Bit Rate   Resolution           Frame Rate  Latency  Quality                  Application
MJPEG    Frame-based   DCT              10~3000    Any size             0~30        Low      Broadcast                IP networks
Wavelet  Frame-based   Wavelet          30~7500    160x120 ~ 320x240    8~30        High     Visually lossless        Various
MPEG-4   Stream-based  DCT and Wavelet  10~10000   64x48 ~ 4096x4096    1~60        Medium   Internet to Digital TV   Wireless
H.263    Stream-based  DCT              30~200     128x96 ~ 1408x1152   10~15       Low      Video phone              Teleconference

Table 3: Comparison of video codecs


Sl. No.  Codec      Frame duration (ms)  Frame size (bytes)  Bit rate (kbps)  RTP payload type
1        G.711      5-10                 48                  64               0
2        G.723      30                   24                  6.3              4
3        G.723-LO   30                   20                  5.3              4
4        G.723-HI   30                   24                  6.3              4
5        G.729      10                   10                  8                18
6        G.729A     10                   10                  8                18
7        GSM-EFR    20                   31                  12.2             99
Table 4: Comparison of the speech codecs

Many leading commercial DSP processors from Analog Devices, Motorola/Freescale,
Texas Instruments, and ARM (not exactly a DSP, but the core for many DSPs) are used
in these gateways. The user has to analyze the candidate processors based on:
• cycle count
• speed
• cost/performance
• energy efficiency
• memory usage
• the different call scenarios the gateway will be handling

The codecs may not be ported to every one of these processors for determining the above
factors, but some DSP modules such as FIR filters, FFTs, etc. can be used to evaluate
them. There are benchmarking suites available in the market which
might help the reader in deciding on a suitable processor for the intended gateway.

Most Commonly used Video/Audio File Formats

Extension File Description Extension File Description


.3g2 3GPP2 Multimedia File .mpv2 MPEG-2 Video Stream
.3gp 3GPP Multimedia File .mqv Sony Movie Format File
.3gp2 3GPP Multimedia File .msh Visual Communicator Project File
.3gpp 3GPP Media File .mswmm Windows Movie Maker Project
.3mm 3D Movie Maker Movie Project .mts AVCHD Video File
.60d CCTV Video Clip .mvb Multimedia Viewer Book Source File
.aep After Effects Project .mvc Movie Collector Catalog
.ajp CCTV Video File .nsv Nullsoft Streaming Video File
.amv Anime Music Video File .nvc NeroVision Express Project File
.asf Advanced Systems Format File .ogm Ogg Media File
.asx Microsoft ASF Redirector File .par Dedicated Micros DVR Recording
.avb Avid Bin File .pds PowerDirector Project File
.avi Audio Video Interleave File .piv Pivot Stickfigure Animation
.avs AviSynth Script File .playlist CyberLink PowerDVD Playlist
.avs Application Visualization System File .pmf PSP Movie File
.bik BINK Video File .pro ProPresenter Export File
.bix Kodicom Video File .prproj Premiere Pro Project
.box Kodicom Video .prx Windows Media Profile
.byu Brigham Young University Movie .qt Apple QuickTime Movie
.camrec Camtasia Studio Screen Recording .qtch QuickTime Cache File
.cvc cVideo .qtz Quartz Composer File
.d2v DVD2AVI File .rm Real Media File
.d3v Datel Video File .rmvb RealVideo Variable Bit Rate File
.dat VCD Video File .rp RealPix Clip
.dce DriveCam Video .rts RealPlayer Streaming Media
.dif Digital Interface Format .rts QuickTime Real-Time Streaming Format
.dir Adobe Director Movie .rum Bink Video Subtitle File
.divx DivX-Encoded Movie File .rv Real Video File
.dmb Digital Multimedia Broadcasting File .sbk SWiSH Project Backup File
.dpg Nintendo DS Movie File .scm ScreenCam Screen Recording
.dv Digital Video File .scm Super Chain Media File
.dvr-ms Microsoft Digital Video Recording .scn Pinnacle Studio Scene File
.dvx DivX Video File .sfvidcap Sonic Foundry Video Capture File
.dxr Protected Macromedia Director Movie .smil SMIL Presentation File
.eye Eyemail Video Recording File .smk Smacker Compressed Movie File
.fcp Final Cut Project .smv VideoLink Mail Video
.flc FLIC Animation .spl FutureSplash Animation
.fli FLIC Animation .srt Subtitle File
.flv Flash Video File .ssm Standard Streaming Metafile
.flx FLIC Animation .str PlayStation Video Stream
.gl GRASP Animation .svi Samsung Video File
.grasp GRASP Animation .swf Macromedia Flash Movie
.gvi Google Video File .swi SWiSH Project File
.gvp Google Video Pointer .tda3mt DivX Author Template File
.ifo DVD-Video Disc Information File .tivo TiVo Video File
.imovieproj iMovie Project File .tod JVC Everio Video Capture File
.imovieproject iMovie Project .ts Video Transport Stream File
.ivf Indeo Video Format File .vdo VDOLive Media File
.ivr Internet Video Recording .veg Vegas Video Project
.ivs Internet Streaming Video .vf Vegas Movie Studio Project File
.izz Isadora Media Control Project .vfw Video for Windows
.izzy Isadora Project .vid Generic Video File
.lsf Streaming Media Format .viewlet Qarbon Viewlet
.lsx Streaming Media Shortcut .viv VivoActive Video File
.m1pg iFinish Video Clip .vivo VivoActive Video File
.m1v MPEG-1 Video File .vlab VisionLab Studio Project File
.m21 MPEG-21 File .vob DVD Video Object File
.m2t HDV Video File .vp6 TrueMotion VP6 Video File
.m2ts Blu-ray BDAV Video File .vp7 TrueMotion VP7 Video File
.m2v MPEG-2 Video .vro DVD Video Recording Format
.m4e MPEG-4 Video File .w32 WinCAPs Subtitle File
.m4u MPEG-4 Playlist .wcp WinDVD Creator Project File
.m4v iTunes Video File .wm Windows Media File
.mjp MJPEG Video File .wmd Windows Media Download Package
.mkv Matroska Video File .wmmp Windows Movie Maker Project File
.mod JVC Recorded Video File .wmv Windows Media Video File
.moov Apple QuickTime Movie .wmx Windows Media Redirector
.mov Apple QuickTime Movie .wvx Windows Media Video Redirector
.movie QuickTime Movie File .xvid Xvid-Encoded Video File
.mp21 MPEG-21 Multimedia File .yuv YUV Video File
.mp4 MPEG-4 Video File .zm1 ZSNES Movie #1 File
.mpe MPEG Movie File .zm2 ZSNES Movie #2 File
.mpeg MPEG Video File .zm3 ZSNES Movie #3 File
.mpg MPEG Video File .zmv ZSNES Movie File
Videoconferencing
What Is Videoconferencing?
Conducting a conference between two or more participants at different sites by using
ISDN or computer networks to transmit audio and video data.

Video-conferencing

A video communications session among three or more people who are geographically
separated. This form of conferencing started with room systems where groups of people
met in a room with a wide-angle camera and large monitors to hold a conference with
other groups at remote locations. Federal, state and local governments are making major
investments in group videoconferencing for distance learning and telemedicine.*

Benefits of Videoconferencing
• Interaction with people and classrooms anywhere in the world
• Sharing of, and collaboration on, data
• Exposing students to the latest technology available
• Saving the time and money involved in travel for meetings
• Distance learning - providing opportunities for learning that would otherwise
be unavailable

Videoconferencing Protocols
Videoconferencing protocols are based on the standards set by the ITU-T*.

• H.323 - Videoconferencing over LAN


• H.320 - Videoconferencing over ISDN

*International Telecommunication Union - Telecommunication Standardization Sector


http://www.itu.int

H.320 – A dedicated pipe (mapped circuit) connecting locations. ISDN

Advantage: Always on connections

Disadvantages: Pricey for equipment and the dedicated line. Errors can cause calls to
drop.
H.323 - Video over IP. Has the ability to dial by IP address or alias. Includes T.120
capabilities for data sharing and collaboration. Can be used on both private WANs and
the public Internet. It is packet based.

Advantages: More cost effective (higher speeds at lower cost than H.320)
• Ability to integrate into an existing network
• You can connect to an existing H.320 infrastructure
• Has the ability to go over the public Internet.
Disadvantages: Firewalls block video traffic
• Not enough bandwidth on IP network resulting in choppy IP video
• Non-secure transmission of data

Videoconferencing Terms
• MCU
• Gatekeeper
• Gateway
• CODEC

Multipoint Control Unit (MCU): Negotiates among multiple clients in a conference. The
client does scheduling from a GUI that allows the client to pick a "virtual"
conference room and decide if the meeting is private or public. The host client can then
invite other participants to join scheduled or impromptu virtual meetings right at the client
desktop. It translates the various protocols (e.g. H.320, H.323, ISDN) into one
videoconference so all can participate, regardless of which protocol they are running.

Gatekeeper: This component of H.323 manages the inbound and outbound bandwidth
from the LAN. The gatekeeper registers clients and coordinates communications with other
gatekeepers. It verifies users’ identities through static IP addressing and allows them to
pass through to the MCU.

There are four features within a Gatekeeper:

• Admission control authorizes clients’ access to the LAN.


• Bandwidth control manages bandwidth for each network segment.
• Client network addresses are translated so participants can dial
network locations with aliases (such as e-mail addresses) instead of IP
addresses.
• Call management monitors H.323 calls, tracks rejected calls,
accounts for use of WAN links and monitors other H.323 components.

Gateway: In H.323 videoconferencing, a gateway translates between different network
types, for example between IP and ISDN endpoints. (In network configuration terms, the
gateway is the IP address of YOUR router, not MOREnet's router.)

CODEC: CODEC stands for coder-decoder. It translates signals from analog to digital and
back again.
Types of Videoconferencing
• Point-to-point
• Multipoint
• Multicast

(A) Point-to-point:

Videoconference between two end points; directly connected to each other by IP or ISDN

Advantages:
• Clearer reception between the two sites
• Less scheduling
• Only the two parties involved in the conference need to schedule
Disadvantages:
• Both sites must be using the same protocol
• Only two sites are allowed

(B) Multipoint:
Three or more end points participating in a conference; accomplished by connecting to an
Multipoint Control Unit (MCU).

Advantages:
• Many sites using differing protocols can be connected in the same
conference.
• Better monitoring of the connections
Disadvantages:
• Slight increase in latency.
• Must be scheduled in advance with a Multipoint Control Unit. (MCU)

(C) Multicast:

One-way communications to multiple locations. Like a TV broadcast.
Disadvantages

• No interaction from students
• "Talking head" presentation
• MOREnet currently does not support multicast

Video over IP
Data/ Video/ Voice in ONE Net

Traditional CCTV System

• Coaxial cable
• Analog signal

Problems:
• Hard to manage and maintain remotely
• Video is stored on tape, making it difficult to manage the video data and maintain its quality
• Analog signal system, hard to integrate with other systems

DVR Solution
• Video stored as digital data
• PC-based infrastructure
Problems:
• Stand-alone system, poor integration with other systems
• In Windows-based DVRs, system stability is a problem
• In Linux or single-chip DVRs, servicing is a key maintenance issue
• Hard to manage in a large or distributed system
IP Network in Video Surveillance

• Transmitted over an IP network
• Client/server-based infrastructure

Benefits:
• Expandable and integrated network system
• Suitable for large or distributed systems
• Lower total cost of ownership
• Capable of remote management and maintenance
• Good flexibility for system upgrades or re-layout
(Figure: field cameras connect through the IP network to the control room and client stations.)

Video over IP Solution Structure


Video Capture

Resolution by NTSC & PAL

Different TV resolution standards (lines/fields): NTSC is used mostly in the US, and PAL in Europe

NTSC
720 x 480
704 x 480
640 x 480
352 x 240
176 x 112

PAL
720 x 576
704 x 576
640 x 576
352 x 288
176 x 144

Similar standards
• CCIR 601, RS-170: much like NTSC
• SECAM: much like PAL

Compression- MJPEG Algorithm

• MJPEG (Motion JPEG): each frame is compressed independently as a still image


• Similar formats: JPEG2000, Wavelet

Compression- MPEG Algorithm

• MPEG: compresses across frames, encoding motion and changes between pictures


• Similar formats: H.261/H.263, MPEG-1, MPEG-2, MPEG-4
Video Capture IP products

IP Camera:
A camera that converts video directly into an IP stream
(= analog camera + 1-channel video server)
Other alias: Network Camera

Video Server:
A device that digitizes an analog video signal for transmission over an IP network
Other aliases: Encoder, IP Codec, Camera Server
Transmission

Transmission Media

Video transmission needs more bandwidth than data or voice. Higher bandwidth results
in better video performance (frame rate and quality).
Bandwidth Requirement

• Simple calculation
Bandwidth requirement =
image size per frame (bytes) x FPS (frames per second) x (1 + 3% IP overhead) x
(1 + 30% margin) x 8 bits

For example:
5 Kbytes x 30 FPS x 1.03 x 1.3 x 8 bits ≈ 1.6 Mbps

Note: Video recording storage space can also be calculated from a similar formula:
image size per frame x FPS (frames per second) x recording time = total storage
space requirement
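
The same calculations as a small sketch (the 3% overhead and 30% margin are the rules
of thumb used above; actual overhead depends on the codec, packetization and network):

    def video_bandwidth_bps(frame_bytes, fps, ip_overhead=0.03, margin=0.30):
        """Estimated network bandwidth for a video stream, in bits per second."""
        return frame_bytes * fps * (1 + ip_overhead) * (1 + margin) * 8

    def storage_bytes(frame_bytes, fps, record_seconds):
        """Raw storage needed to record the stream for record_seconds."""
        return frame_bytes * fps * record_seconds

    bw = video_bandwidth_bps(frame_bytes=5_000, fps=30)
    print(f"{bw/1e6:.1f} Mbps")                                      # ~1.6 Mbps, as above
    print(f"{storage_bytes(5_000, 30, 3600)/1e9:.2f} GB per hour")   # 0.54 GB per hour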

Network Protocols
Advanced network protocols can help make video transmission more efficient.
Integrated with Alarm Systems

Video Motion Detection (VMD)


Detects changes of objects in the images to decide whether or not to trigger an alarm.
DI/ DO Control

Integrating DI (digital input) sensors and DO (relay output) alarms can build an
intelligent video surveillance system.

Integrated with other Systems


On an IP network, all systems can be integrated into one system for centralized control
and interoperation.
View & Record

Multiple Camera Viewing Formats

Capability to view multiple cameras' images in one window.


Video over IP Solution (Application)

IP Video Technologies

Video Conference Architecture: A Typical H.323 Terminal


Types of H.323 Endpoints

Video Conference Architecture: H.323 Components


Video Conference Architecture: Call Signaling and Flow

H.323 Multipoint Videoconference
The ISDN to IP Migration for Videoconferencing
Introduction

Since the release of IP-capable videoconferencing solutions in the mid-1990s, the


percentage of video calls hosted over IP networks has continued to grow. WR estimates
that in 2004 IP became the most common network used for hosting videoconference
calls.

Virtually all video systems today include IP network capability, while only a limited
percentage support ISDN.

For some, the justification for migrating from ISDN to IP for videoconferencing was purely
financial as it allowed companies to enjoy a pay-one-price cost structure for unlimited
videoconferencing usage. For many others, however, it was the soft benefits of running
videoconferencing over IP, such as enhanced reliability and manageability, tighter
security, and an improved user experience that prompted the shift.
This session provides insight into the pros and cons of the four most common network
architectures in use today for videoconferencing:

• ISDN networks – using digital phone lines from telephone companies

• Converged IP - using the enterprise’s data network to host video traffic

• IP Overlay - involves the deployment of a dedicated network to host video


traffic

• Hybrid – utilizing a combination of the above options to meet specific


business challenges

For both new and existing VC users, there are many benefits and reasons for running
videoconferencing traffic over IP. Even if customers won’t save significant costs by
migrating from ISDN to IP, the IP strategy allows enterprise managers to turn
videoconferencing into a manageable enterprise business tool, instead of a technology
gadget or curiosity.

Architecting the Videoconferencing Environment

Modern day videoconferencing environments follow one of four basic network


architectures: ISDN-only, Converged IP, IP Overlay, or some combination of the three
which we call a hybrid environment.

ISDN-Only Environments

The diagram below highlights a traditional videoconferencing environment using only ISDN
service from a local telephone provider. Note that this organization may not be able to
connect to IP-only external endpoints (listed as Client Location below).
Figure 2: Traditional ISDN-Only Videoconferencing Environment

ISDN-Only Advantages

Data Isolation - In an ISDN videoconferencing environment the video traffic does not
touch the organization’s data network, which is a source of comfort for IT and network
managers.
Universal Availability - ISDN service is available almost anywhere in the world (or at
least in most places where phones services are available).
Low Fixed Costs - The fixed monthly cost for ISDN services is relatively low (typically
$150 per month for 384 kbps ISDN connectivity), which makes ISDN cost-effective for
organizations with limited monthly video usage.

ISDN-Only Disadvantages

Endpoint Cost – With today’s videoconferencing systems, ISDN network support is


typically an option costing several thousands of dollars per endpoint.
Endpoint Monitoring – ISDN-only environments typically include a number of legacy,
ISDN-only video systems which do not support the advanced endpoint monitoring features
available on current video endpoints. It is not possible to monitor the health and
“readiness” of these video endpoints.
Network Monitoring – Like the plain old telephone network (POTS), ISDN is a switched
technology in which the network is only connected when calls are in progress. This means
that an ISDN problem, such as a down ISDN line, will not be apparent until a call is
attempted – at which point the likelihood is that the users will be impacted. Even
commercially available video network management systems are not able to detect ISDN
issues unless a call is connected.
Network Efficiency and Scalability – The typical ISDN environment requires that each
endpoint have its own dedicated bandwidth, which means that even though the ISDN
lines connected to a specific system may only be in use for a few hours each month, that
system’s ISDN bandwidth cannot be shared with other endpoints. Deploying additional
endpoints will require additional ISDN lines.
Usage Costs – In most ISDN environments, every single video call – whether across
town, across the world, or simply between two rooms in the same building – will involve
per-minute ISDN transport and usage fees. Depending upon the frequency of usage, these
fees can be quite high on a monthly basis and can negatively impact the adoption of
videoconferencing within the enterprise.
Global Reach – In order to communicate with IP-only endpoints, such as those deployed
at the partner location shown above, either an ISDN to IP gateway device or an external
gateway service must be used.
Lack of Redundancy – In the event that one or more of an endpoint’s ISDN lines
experiences problems, the endpoints’ ability to communicate will either be blocked or
impacted. There is no alternate network to host the video traffic.
Limited TELCO Support – The decreased demand for ISDN lines for videoconferencing
has prompted telephone companies to reduce their ISDN support staff; a phenomenon
that can significantly impact ISDN troubleshooting and problem resolution efforts.

Converged IP Environments

In a converged IP environment, videoconferencing traffic rides over the organization’s


primary IP data network as shown in the diagram below. Note that unless an ISDN
gateway (or gateway service) is used, this enterprise may not be able to connect to ISDN-
only endpoints (labeled Partner Location below).

Figure 3: Converged IP Videoconferencing Environment


Converged IP Advantages

Ability to Leverage Infrastructure – Since the endpoints are connected to the


corporate IP network, the enterprise can leverage its existing network lines, support
staff, and monitoring / management systems.
Improved Reliability – IP endpoints and networks can be monitored continuously,
meaning that should a problem arise, the support team will be proactively notified, unlike
an ISDN environment in which problems are only discovered once a call is attempted. In
addition, ISDN video calls use multiple lines bonded together to form a single data pipe; a
process that often causes problems during ISDN video calls.
Enhanced Manageability – IP-capable video systems can be remotely managed either
individually or using a centralized management system like TMS, a software solution
available from TANDBERG, the sponsor of this white paper. Management features include
remote call launching and termination, endpoint configuration, software upgrades, and
more. Note that some legacy ISDN-only video systems include IP connections for remote
management, but the management capabilities do not include monitoring of the ISDN
network lines.
Installation Simplicity – By using IP instead of ISDN, organizations can avoid the
headaches often associated with the deployment of ISDN lines including assignment of
SPIDs and the activation of long distance service.
Expanded Scalability – In an IP environment, the deployment of an additional video
system does not require the activation of dedicated lines. Instead, the enterprise simply
needs to connect the video system to the enterprise network. This is especially important
for organizations planning to make desktop videoconferencing capabilities available to
their user base as these deployments typically involve thousands of endpoints.
Decreased Cost of Ownership – IP-only endpoints are less expensive to purchase
(ISDN is now an optional add-on for most endpoints), cheaper to keep under a service
plan (fees are based on purchase price), and do not require dedicated ISDN lines,
resulting in a lower total cost of ownership.
Predictable Usage Fees – While ISDN is a “metered” service with transport fees charged
on a per-minute basis, IP networks typically include unlimited usage for a fixed monthly
fee. This allows enterprise organizations to predict and budget for the monthly costs
associated with videoconferencing.
Call Speed Flexibility – In ISDN environments, the maximum possible connection speed
stems from the number of installed ISDN lines (e.g., 3 ISDN lines permit a single call up to
384 kbps). In an IP environment, endpoints are usually connected to high bandwidth
connections either on the LAN or WAN, and therefore higher bandwidth calls are often
possible. This is especially important for multisite meetings during which the host endpoint
may require additional bandwidth to host the meeting.
Tighter Security – Although most IP video endpoints include support for AES data
encryption, including secure password authentication, most legacy ISDN systems do not
support encryption. Because securing ISDN calls on legacy endpoints requires the use of
expensive and complex external encryption systems, these are used primarily in military
and government environments.

Converged IP Disadvantages

Network Capability - Many enterprise networks are not equipped to host video traffic,
and cannot be cost-effectively upgraded to do so in some locations. For example, in one
organization the connections to the Los Angeles and London offices may be “video-ready,”
but those to the Milan and Singapore offices are not up to the task. In an IP-only
environment, the Milan and Singapore offices would be unreachable from the enterprise’s
IP video systems (unless an ISDN gateway product / service or an IP-overlay solution was
used).
Endpoint Capability – Many legacy video systems are not IP-capable and would need to
be replaced or upgraded to function in an IP-only environment.
Global Reach – In order to communicate with ISDN-only endpoints, such as those
deployed at the client location shown above, either an IP to ISDN gateway device or an
external gateway service must be used. In addition, corporate security systems, including
the enterprise firewalls and NAT systems, often block IP traffic between enterprises,
making it impossible to host IP video calls between organizations.
Lack of Redundancy – In the event that the enterprise LAN or WAN experiences
problems, one or more endpoints may be unable to place or receive video calls. Once
again, there is no alternate network to host the video traffic.
Potential Impact on Network – If not properly planned and managed, it is possible that
the videoconferencing traffic could negatively impact the other traffic on the data network.
This risk, however, is easily avoided through the use of a videoconferencing gatekeeper.
IP Overlay Environments

Many organizations are unable to host videoconferencing traffic on all or specific segments
of their primary data network due to limited bandwidth or lack of QoS (quality of service).
To bypass these issues, some organizations choose to replace their ISDN network with a
totally separate IP network dedicated to hosting IP video traffic.
The graphic below highlights an IP overlay environment. Note the use of the IP overlay
network provider’s ISDN and Internet gateways to allow the host organization to connect
to external ISDN and IP endpoints.

Figure 4: Pure IP-Overlay Videoconferencing Environment


IP Overlay Advantages

IP video overlay solutions share many of the advantages of the converged IP solution, and
add several benefits of their own:
Network Isolation – The IP overlay architecture allows organizations to enjoy the
benefits of IP videoconferencing without impacting the existing data network.
Upgrade Avoidance – The IP overlay method allows an organization to avoid the need for
network capacity and/or performance upgrades in some or all locations.

IP-Overlay Disadvantages

IP video overlay solution disadvantages include the need to purchase additional network
services dedicated to hosting IP video traffic, and the fact that gateways (which never
improve, and often detract from, the user experience) must be used to conduct calls with
any locations not on the IP overlay network.

Hybrid Video Environments

The fourth videoconferencing architecture involves a combination of two or more of the
ISDN, converged IP, and IP overlay methodologies, as shown below.
Figure 5: Hybrid IP / ISDN Videoconferencing Environment

As shown above, in a well-designed hybrid environment, the majority of the enterprise
endpoints have access to both IP and ISDN connections, either directly or using the
enterprise gateway. In addition, the use of a session border controller (labeled SBC in the
diagram above) allows internal IP endpoints to connect to external IP (Internet) endpoints
without compromising enterprise network security.

Hybrid Environment Advantages

This architecture affords many of the advantages of the three prior methods, plus additional
benefits:
Endpoint Flexibility – The enterprise can utilize a mixture of new (and relatively
inexpensive) IP-capable video endpoints and legacy, ISDN-only endpoints.
Network Redundancy – Since most endpoints have access to IP and ISDN connections,
video connections can be made even if one of the networks (IP or ISDN) is experiencing
problems.
Global Reach – The support for IP and ISDN video traffic throughout the enterprise
makes it easier to host video calls between different organizations.

Hybrid Environment Disadvantages

The most significant disadvantage of this method is the frequent use of gateways
(products or services) to connect to internal and external video endpoints.
Frequently Asked Questions

1. What are the benefits of IP vs. ISDN for business-quality videoconferencing?


The business case for IP vs. ISDN-based videoconferencing spans quality, cost,
management, efficiency, reliability, and scalability areas.
a) ISDN is usually inexpensive to own, but it is expensive to use. Besides an initial
capital outlay to provision select conference rooms with ISDN connectivity, there are
few additional costs required to begin videoconferencing using ISDN. A standard
ISDN business-level videoconferencing call at 384 Kbps requires the bonding
together of 6 ISDN channels; higher call speeds require the bonding of additional
channels. Enterprises pay for ISDN on a per-minute-per-B-channel basis (often
based on distance as well), making heavy use of the equipment costly. TV-quality video
at 768 Kbps on an ISDN system quickly becomes cost-prohibitive. These expensive usage
fees often inhibit widespread adoption of ISDN videoconferencing within an organization
or enterprise (the channel and cost arithmetic is sketched in code at the end of this answer).
b) The availability of flat-rate pricing for IP videoconferencing, on the other hand,
allows calls at bandwidths too expensive for ISDN, including some IP calls up to 2
Mbps and beyond. These high bit rate calls enable higher quality audio and video
communications. Because IP is so affordable and due to the pervasiveness of IP
network connections, IP videoconferencing endpoints can be deployed across the
enterprise economically.
c) Furthermore, since IP systems do not bond channels together the way ISDN systems
do, very high call reliability rivaling the POTS network can be achieved. ISDN has
proven itself unreliable over the years due to the channel-bonding problem: if one of
the bonded channels is dropped during a call, often the entire call goes down. Most
companies using ISDN are pleased to achieve a 92-94% call success rate, while those
using IP videoconferencing often achieve greater than 99% reliability.
d) IP videoconferencing also offers significant management benefits. IP-based
video systems are always connected to the packet-switched network. This constant
connectivity allows these systems to be remotely controlled and managed from a
central, remote location. Large-scale conferencing environments often use an IP-
based software product, called a gatekeeper, to control and track the usage of their
videoconferencing systems, enabling improved measurement of ROI and convenient
billing mechanisms.
e) One of the primary advantages of deploying IP based videoconferencing is the
ability to use an organization’s existing data network as the means of transport. This
is called “converged networking”. Converged networking can result in both cost
savings and efficiency enhancements because only one network is deployed,
maintained, and managed.
f) Furthermore, since IP connections are already nearly everywhere – to every
enterprise conference room and to every enterprise desktop – scaling voice and video
over IP applications is easy because the network is already deployed, debugged, up
and running. ISDN requires a separate network infrastructure and a separate
management team, and will usually be limited to niche deployments within the
enterprise.
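As a rough illustration of the channel-bonding and cost arithmetic in items (a) and (b) above, here is a minimal sketch. The per-channel-minute rate and the flat IP fee are purely illustrative assumptions, not actual carrier tariffs.

# Minimal sketch comparing metered ISDN usage cost with flat-rate IP transport.
# The per-minute rate and the flat monthly IP fee are illustrative assumptions,
# not actual carrier tariffs.

B_CHANNEL_KBPS = 64  # each ISDN B channel carries 64 kbps


def isdn_channels_needed(call_kbps: int) -> int:
    """Number of B channels that must be bonded for a given call speed."""
    return -(-call_kbps // B_CHANNEL_KBPS)  # ceiling division


def isdn_monthly_cost(call_kbps: int, minutes: int, rate_per_channel_minute: float) -> float:
    """ISDN is billed per minute, per bonded B channel."""
    return isdn_channels_needed(call_kbps) * minutes * rate_per_channel_minute


if __name__ == "__main__":
    print(isdn_channels_needed(384))            # 6 B channels (3 BRI lines) for a 384 kbps call
    print(isdn_monthly_cost(384, 2000, 0.10))   # 1200.0 at an assumed $0.10 per channel-minute
    IP_FLAT_MONTHLY_FEE = 500.0                 # assumed flat-rate IP transport fee (illustrative)
    print(IP_FLAT_MONTHLY_FEE)                  # fixed regardless of minutes used or call speed

With these assumed figures, ISDN cost scales linearly with both call speed and minutes used, while the flat-rate IP cost does not, which is what makes higher-bit-rate calls economical over IP.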

2. What network protocols are used for IP videoconferencing services?


The two most important protocols are H.323 and Session Initiation Protocol (SIP).
H.323. H.323 is an ITU umbrella standard describing a family of protocols used to
perform call control for multimedia communication on packet networks. The most
important protocols used to set up, manage, and tear down calls are H.225 and
H.245: H.225 is used for call signaling and setup, while H.245 is used for call
control and management of the media channels.
In the most basic use of H.323v1 to set up a call, an endpoint initiates an H.225
exchange on a well-known TCP port with another endpoint. This exchange uses the
Q.931 signaling protocol. Once a call has been established using Q.931 procedures,
the H.245 call control phase of the call begins. H.245 negotiations take place
on a separate channel from the one used for H.225 call setup (although with the use
of H.245 tunneling, H.245 messages can be encapsulated in Q.931 messages on
existing H.225 channels), and the H.245 channel is dynamically allocated during the
H.225 phase. The port number to be used for H.245 negotiation is not known in
advance. The media channels (those used to transport voice and video) are similarly
dynamically allocated, this time using the H.245 Open Logical Channel procedure.
Note that H.245 logical channels are unidirectional. In a minimal situation with direct call
signaling between endpoints and the use of one bi-directional voice channel, each
call will involve a minimum of five channels, including one H.225 channel, one H.245
channel, and the voice media channels. Three of these will be on dynamically
allocated ports. Business-quality IP video communication between two H.323 end-
points typically requires in excess of 380 kbps data rates for each unidirectional
media channel or aggregate data rates of over 750 kbps.
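The call phases described above can be summarized with a minimal state-machine sketch. This is not a wire-level H.225/H.245 implementation; it only captures the ordering of the signaling steps for a basic, direct (gatekeeper-less) H.323v1 call as described in the text.

# Minimal sketch of the H.323v1 call phases described above, modelled as a simple
# state machine. It does not speak H.225/H.245 on the wire; it only captures the
# order of the signaling steps for a basic, direct (gatekeeper-less) call.

from enum import Enum, auto


class CallPhase(Enum):
    IDLE = auto()
    H225_SETUP = auto()        # Q.931 setup/connect on the H.225 channel (well-known TCP port)
    H245_NEGOTIATION = auto()  # capability exchange on a dynamically allocated H.245 channel
    MEDIA_OPEN = auto()        # H.245 Open Logical Channel procedures allocate the media ports
    IN_CALL = auto()
    TEARDOWN = auto()


# "Happy path" transitions for a successful call.
NEXT_PHASE = {
    CallPhase.IDLE: CallPhase.H225_SETUP,
    CallPhase.H225_SETUP: CallPhase.H245_NEGOTIATION,
    CallPhase.H245_NEGOTIATION: CallPhase.MEDIA_OPEN,
    CallPhase.MEDIA_OPEN: CallPhase.IN_CALL,
    CallPhase.IN_CALL: CallPhase.TEARDOWN,
}

if __name__ == "__main__":
    phase = CallPhase.IDLE
    while phase in NEXT_PHASE:
        phase = NEXT_PHASE[phase]
        print(phase.name)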

SIP. Session Initiation Protocol (SIP) is part of the Internet Multimedia Architecture
established by the Internet Engineering Task Force (IETF – www.ietf.org). SIP may
be used for Voice over IP (VoIP), videoconferencing, and instant messaging, and is
planned for use in 3G wireless applications as well as new converged data and voice
applications. It is an application-layer signaling protocol used to establish, modify,
and terminate multimedia sessions. SIP applications include voice, video, gaming,
instant messaging, presence, call control, and more. In the spirit of other
Internet-based applications, SIP relies on a number of other computer
communications standards, including the Session Description Protocol (SDP), the
Real-time Transport Protocol (RTP), TCP, and UDP.
SIP messages are modeled on the HTTP protocol and have a similar text-based structure.
SIP uses Uniform Resource Identifiers (URIs), which are a more general form of the
World Wide Web's Uniform Resource Locators (URLs). There are a number of URI
forms, including user@domain, domain, user@ipaddr, and telephone-number@domain.
SIP messages can also use other URIs, such as the telephone URL defined in
IETF RFC 2806. Generally, the SIP components are defined as user agents, proxies,
redirect servers, and registrars: user agents are much like an endpoint in H.323 and
may be telephones, video units, PDAs, etc. SIP communicates between these four
components using a request-response model. Messages between components
are initiated when one component sends a request message (called a method) to a
second component. Responses consist of a numerical code and a textual “reason”. To
initiate a session, one SIP device sends an INVITE message to another SIP device.
SDP is carried in the SIP message to describe the media streams, and RTP is used to
exchange the real-time media.
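To make the text-based, HTTP-like structure concrete, here is a minimal sketch of a SIP INVITE carrying an SDP body. The addresses, tags, branch value, and media lines are invented placeholders, not working endpoints.

# Minimal sketch of a SIP INVITE with an SDP body, illustrating the HTTP-like,
# text-based structure described above. All addresses, tags, and the branch
# value are made-up placeholders.

sdp_body = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 client.example.org",
    "s=Ad hoc video call",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",       # PCMU audio carried over RTP
    "a=rtpmap:0 PCMU/8000",
    "m=video 51372 RTP/AVP 96",      # dynamically mapped video payload type
    "a=rtpmap:96 H264/90000",
]) + "\r\n"

invite = "\r\n".join([
    "INVITE sip:bob@example.com SIP/2.0",   # request line: method + request URI
    "Via: SIP/2.0/UDP client.example.org:5060;branch=z9hG4bK776asdhds",
    "Max-Forwards: 70",
    "From: Alice <sip:alice@example.org>;tag=1928301774",
    "To: Bob <sip:bob@example.com>",
    "Call-ID: a84b4c76e66710@client.example.org",
    "CSeq: 314159 INVITE",
    "Contact: <sip:alice@client.example.org>",
    "Content-Type: application/sdp",
    f"Content-Length: {len(sdp_body)}",     # byte length of the SDP body
    "",                                     # blank line separates headers from body
    sdp_body,
])

if __name__ == "__main__":
    print(invite)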

3. What are the main IP videoconferencing deployment issues?


The main technical issues surrounding IP videoconferencing deployment include the
following:
• Quality of Service. Quality of service (QoS) is a network term that specifies a
guaranteed throughput level. In layman’s terms, it means that the network must be
designed so that the voice and video data are transmitted through the network with
a minimum of delay and loss. The network must be carefully evaluated to ensure that
it will be able to transmit voice and video data properly. Often components in the
network must be upgraded or additional routers, switches, or “packet shaping”
devices may be required.
• Overlay Network vs. Converged Network Architecture. Enterprises may not
want to put voice and video data in competition with mission critical data applications
such as market or manufacturing data running across the same network.
Consequently, a separate QoS enabled “overlay” network may be deployed for voice
and video applications.
• Security. Most enterprise networks employ firewalls and network address
translation (NAT) to prevent hackers or other unauthorized persons from gaining
access to data on the network. Voice and video over IP are not NAT- and firewall-
friendly. Organizations will need to consider how IP video traffic will securely
traverse the corporate firewall, and whether the firewall or NAT system must be
modified, re-configured, or upgraded to allow IP-based videoconferencing traffic.
• Bandwidth Over the WAN. IP data connections must be available at the locations
where the enterprise needs to use video. These will typically be available from
service providers; organizations should also consider what other alternatives are
available and what bandwidth will be required. In general, satellite communications
will not offer sufficient quality of service for IP videoconferencing due to excessive latency.
• Multipoint Bridging Capability. Organizations will need to consider whether
more than two parties will need to participate in a video call. If so, some type of
video multipoint bridging capability will be necessary. The MCU (multipoint control unit) may be purchased
and managed internally or all bridging functions may be outsourced to a service
provider. If an internal MCU is to be purchased, then the magnitude of the initial
investment increases and internal staff will need to be allocated to manage the video
bridge.
• IP-ISDN Gateway Needs. As the transition to IP will not be complete for some
years to come, organizations using IP video systems will likely need to communicate
with others using ISDN. Organizations will need to consider how many gateways are
needed and how many ISDN lines should be provisioned for each. Rather than
owning the gateway with its associated capital outlay and management costs, an
enterprise may utilize a gateway owned and managed by a service provider.
• Additional IT Resource Requirements. Network maintenance and support staff
must be willing, capable, and available to support a converged network carrying
data, video, and voice traffic. Organizations should also consider whether additional
resources will be required to manage new video-centric devices on the network.

4. What are quality-of-service (QoS) requirements for IP-based voice and video?
Real-time IP applications, such as videoconferencing and voice over IP, are much
more sensitive to network quality of service than store-and-forward data
applications, such as e-mail and file transfer. Quality of Service (QoS) refers to
intelligence in the network that grants the network performance needed to satisfy an
application's requirements. For multimedia over IP networks, the goal is both to
preserve the mission-critical data in the presence of multimedia voice and video and
to preserve the voice and video quality in the presence of bursty data traffic. Four
parameters are generally used to describe quality of service: latency or delay, the
amount of time it takes a packet to traverse the network; jitter, the variation in
delay from packet to packet; bandwidth, the data rate that can be supported on the
network; and packet loss, the percentage of packets that do not reach their
destination. A short sketch after the list below checks sample measurements
against the thresholds discussed here.
• End-to-end latency. End-to-end latency refers to the total transit time for
packets in a data stream to arrive at the remote endpoint. Latency for H.323 voice
and video packets should not exceed roughly 125-150 milliseconds. The average
packet size for video packets is usually large (800-1500
bytes) while audio packet sizes are generally small (480 bytes or less). This means
that the average latency for an audio packet may be less than that for a video packet
as intervening routers/switches typically prioritize smaller over larger packets when
encountering network congestion. In addition, an H.323 video call actually represents
four streams – each station sends and receives audio and video. The difference in
latency of the streams will manifest itself as additional delay (both H.323 and SIP
convey sufficient information to lip-synch the various streams).
• Jitter or variability of delay. This refers to the variability of latencies for packets
within a given data stream and should not exceed 20 - 50 milliseconds. An example
would be a data stream in a 30 FPS H.323 session that has an average transit time
of 115 milliseconds. If a single packet encountered a jitter of 145 milliseconds or
more (relative to a prior packet), an underrun condition may occur at the receiving
endpoint, potentially causing blocky or jerky video or degraded audio. Too much
jitter can also cause inter-stream latency, which is discussed next.
• Inter-stream latency. This refers to the difference in latency between the audio
and video data streams, i.e., how the average transit times of the streams vary from
each other at any given point. The tolerance is not symmetrical, because the human
brain already compensates for audio lagging behind video. As a result, an audio
stream that starts arriving at an endpoint 30 milliseconds ahead of its video stream
counterpart(s) will produce detectable lip-synchronization problems for most
participants, whereas an audio stream that arrives later than its associated video
stream has a slightly higher tolerance of about 40 milliseconds before the loss of
audio and video synchronization becomes generally detectable.
• Packet loss. This term refers to the loss or out-of-order delivery of data packets in a
real-time audio/video stream. A packet loss rate of 1% results in roughly one fast
video update per second for a video stream, producing jerky video. Lost audio
packets produce choppy, broken audio. Since audio operates with smaller packets at
a lower bandwidth, it is generally less likely to encounter packet loss, but an audio
stream is not immune from its effects. A 2% packet loss rate starts to render the
video stream generally unusable, though audio may be minimally acceptable.
Consistent packet loss above 2% is definitely unacceptable for H.323
videoconferencing unless some type of packet loss correction algorithm is used
between the endpoints. Packet loss in the 1-2% range should still be considered a
poor network environment, and the cause of this type of consistent, significant
packet loss should be identified and resolved.
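The thresholds quoted above can be collected into a small checker. This is a simplified sketch; the sample measurements in the example are invented, and real monitoring tools apply these limits per stream and over time.

# Small sketch that checks measured network statistics against the rule-of-thumb
# thresholds quoted above (latency <= ~150 ms, jitter <= 20-50 ms, loss < 1-2%).
# The sample measurements below are invented.

def assess_qos(latency_ms: float, jitter_ms: float, loss_pct: float) -> list[str]:
    problems = []
    if latency_ms > 150:
        problems.append(f"end-to-end latency {latency_ms} ms exceeds the ~150 ms bound")
    if jitter_ms > 50:
        problems.append(f"jitter {jitter_ms} ms exceeds the 20-50 ms guideline")
    if loss_pct >= 2:
        problems.append(f"packet loss {loss_pct}% renders video generally unusable")
    elif loss_pct >= 1:
        problems.append(f"packet loss {loss_pct}% causes visible artifacts (poor network)")
    return problems

if __name__ == "__main__":
    for issue in assess_qos(latency_ms=115, jitter_ms=60, loss_pct=1.5):
        print("WARNING:", issue)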
Three tools for network quality of service
Three types of tools or solutions are available to the network engineer to build
quality of service into the network system:
1. Provisioning means providing adequate bandwidth for all voice, video, and data
applications that traverse a common network. By using a 100 Mbps Ethernet network
instead of a 10 Mbps network, for example, the network is more likely to support
multimedia traffic together with data. Note that IP networks typically have significant
packet overhead: a 384 Kbps video call actually requires about 10% additional
bandwidth for IP overhead, and when going from IP to ATM or frame relay, an
additional 10% of the call bandwidth should be allocated for encapsulation. Hence, a
384 Kbps IP call traversing an ATM backbone may require as much as 460 Kbps of
bandwidth (see the sketch following this list).
2. Classifying means giving packets a classification based on their priority. Voice
packets would be given the highest priority since they are very delay- and jitter-
sensitive, even though they are not particularly bandwidth-hungry. Video packets
might be given a slightly lower priority, and e-mail packets, for example, the lowest
priority. Many different classification schemes are possible, including some that are
in the process of being standardized. One common scheme is to give VoIP packets an
IP precedence of 5 and videoconferencing applications an IP precedence of 4.
3. Queuing refers to a process that takes place in routers and switches whereby
different queues or buffers are established for the different packet classifications.
One of the buffers, for example, might be a delay- and drop-sensitive buffer designed
to handle voice and/or video packets. Many queuing schemes are available for
implementation.
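The provisioning arithmetic from item 1 above can be expressed as a short sketch; the 10% figures are the rules of thumb quoted in the text, not exact protocol overheads.

# Short sketch of the provisioning arithmetic above: roughly 10% extra for IP
# packet overhead, and a further ~10% for ATM or frame relay encapsulation when
# the call crosses such a backbone. These percentages are rules of thumb.

def provisioned_kbps(call_kbps: float, atm_or_fr_backbone: bool = False) -> float:
    bandwidth = call_kbps * 1.10          # ~10% IP overhead
    if atm_or_fr_backbone:
        bandwidth *= 1.10                 # ~10% more for encapsulation
    return round(bandwidth)

if __name__ == "__main__":
    print(provisioned_kbps(384))                           # ~422 kbps on a plain IP path
    print(provisioned_kbps(384, atm_or_fr_backbone=True))  # ~465 kbps across ATM, close to the ~460 kbps cited above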
Solving QoS over IP networks for multimedia conferencing is a two-phase problem:
1. Guarantee QoS within a specific, controlled enterprise intranet or service provider
network.
2. Guarantee QoS across the hand-off (peering) points between the networks. The
public Internet presents this second challenge to the extreme.
Four major QoS initiatives are RSVP (the Resource ReSerVation Protocol), IP Precedence,
and Differentiated Services (DiffServ) from the IETF, and 802.1p from the IEEE.
Improved quality of service through use of standard mechanisms, such as DiffServ
and MPLS, is the key factor behind the promise of broad-based use of interactive
business-quality IP video. The underlying requirement is for the IP video
infrastructure to enable end-to-end prioritized processing and delivery of video traffic
between subscriber networks and carrier core networks. This requires prioritized
treatment of video traffic over the “last mile” access network, through the metro
network, and across the carrier networks. While DiffServ is gaining broad support to
enable “soft” QoS through prioritization of processing of traffic by service provider
routers, MPLS is being deployed in service provider networks as an adjunct to enable
the fine-grained delivery of a number of value-added services.
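As one concrete, hedged illustration of classification, the sketch below marks a UDP socket's IP TOS byte with the IP precedence values mentioned earlier (5 for voice, 4 for video). The socket.IP_TOS option is available on most Unix-like platforms; whether the markings are honored depends entirely on how QoS is configured in the routers along the path.

# Sketch of per-socket traffic marking using the IP TOS byte, whose top three
# bits carry the IP precedence value discussed above (5 for voice, 4 for video).
# Honoring the marking is up to the network's QoS configuration.

import socket

IP_PRECEDENCE_VOICE = 5
IP_PRECEDENCE_VIDEO = 4

def marked_udp_socket(precedence: int) -> socket.socket:
    tos = precedence << 5                  # precedence occupies bits 7-5 of the TOS byte
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock

if __name__ == "__main__":
    voice_sock = marked_udp_socket(IP_PRECEDENCE_VOICE)    # TOS 0xA0
    video_sock = marked_udp_socket(IP_PRECEDENCE_VIDEO)    # TOS 0x80
    print(hex(voice_sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)))
    print(hex(video_sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)))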

5. What are enterprise network policy-related challenges for H.323- and SIP-based video usage?
Use of high data rate applications, such as H.323/SIP-based business-quality video,
has the potential of significantly impacting available LAN capacity for data traffic.
Even with the on-going migration of corporate LANs to gigabit backbones and 100BT
switched subnets, uncontrolled usage of interactive video services has the potential
of severely reducing response times for business applications. Thus, the H.323 video
delivery infrastructure is required to permit corporations to implement fine-grained
controls regarding who can use interactive video and under what conditions.
Specifically, corporations require the ability to control:

• Who (by user and IP address) can use IP videoconferencing services?
• Whether specific users and endpoints may only receive calls, only initiate calls, or both.
• Which codecs specific endpoints/users may use for calls.
• The maximum aggregate video traffic throughput entering or exiting the enterprise
network.
• All of the above, by time of day.
A hypothetical sketch of such an admission policy follows this list.
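The sketch below is a hypothetical illustration of the kind of admission policy a gatekeeper or session border controller could enforce for the controls listed above. The field names, rules, and example values are invented for illustration and are not taken from any specific product.

# Hypothetical sketch of an admission policy covering the controls listed above.
# The rule fields and example values are invented for illustration only.

from dataclasses import dataclass
from datetime import time


@dataclass
class VideoPolicy:
    allowed_users: set[str]
    allowed_codecs: set[str]
    max_aggregate_kbps: int
    business_hours: tuple[time, time] = (time(7, 0), time(19, 0))


def admit_call(policy: VideoPolicy, user: str, codec: str,
               call_kbps: int, current_kbps: int, now: time) -> bool:
    """Return True if the call may be admitted under the policy."""
    if user not in policy.allowed_users:
        return False
    if codec not in policy.allowed_codecs:
        return False
    if current_kbps + call_kbps > policy.max_aggregate_kbps:
        return False                      # would exceed the aggregate bandwidth cap
    start, end = policy.business_hours
    return start <= now <= end            # time-of-day restriction


if __name__ == "__main__":
    policy = VideoPolicy({"alice", "bob"}, {"H.264", "H.263"}, max_aggregate_kbps=4000)
    print(admit_call(policy, "alice", "H.264", 768, current_kbps=3000, now=time(10, 30)))  # True
    print(admit_call(policy, "alice", "H.264", 768, current_kbps=3600, now=time(10, 30)))  # False (over cap)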

6. What are the formats of videoconferencing?


There are two main formats for videoconferencing:
1. Point-to-Point Videoconferencing: This is conferencing with video and audio over
the network, much like a video telephone call. It is a conference between two sites,
where each site can have capabilities such as document sharing and chatting.
2. Multipoint Videoconferencing: Multipoint videoconferencing allows three or
more participants to sit in a virtual conference room and communicate as if they
were sitting right next to each other.
Related to multipoint videoconferencing is bridging, where sites connect through
meeting-point software that supports capabilities such as document sharing and
chatting.

See also: Desktop Videoconferencing and Room-Based Videoconferencing

7. What are the most commonly used protocols in videoconferencing?


H.323: An IP-based (Internet) connection and generally the cheapest option; the Internet or
enterprise IP network is used as the medium to transmit audio and video.
H.320: Also known as ISDN videoconferencing, transmitted over digital telephone lines;
there is a per-use cost associated with this protocol.

Video Essentials glossary:
http://www.videoessentials.com/glossary.php#M
