TABLE OF CONTENTS
FUNDAMENTALS OF VIDEO
Video components
Video Signal
Types of Analog Video Signal
• Component Video
• Composite Video
• S-Video
DIGITAL VIDEO
Analog Video Scanning Process
• Progressive scanning
• Interlaced Scanning
Color Video
Digitizing Video
Digital Video Color Sampling
VIDEO COMPRESSION
Video Compression Requirements
Coding Techniques
• Entropy coding
• Source coding
• Hybrid encoding
Methods for Compression
STEPS IN VIDEO COMPRESSION
VIDEO-CONFERENCING
Benefits of Videoconferencing
Videoconferencing Protocols
• H.320
• H.323
Videoconferencing Terms
Types of Videoconferencing
• Point-to-point
• Multipoint
• Multicast
VIDEO OVER IP
Data/ Video/ Voice in ONE Net
Video over IP Solution Structure
IP VIDEO TECHNOLOGIES
The ISDN to IP Migration for Videoconferencing
• ISDN-Only Environments
• Converged IP Environments
• IP Overlay Environments
• Hybrid Video Environments
Persistence of Vision
The rapid presentation of frames of video information to give you the illusion of
smooth motion.
Fundamentals of Video
Video is essentially a sequence of still pictures
• To create an illusion of motion, the pictures have to be played back at a rate > 24
frames / sec
• A picture is divided into small areas called pixels
• Picture qualities
• Brightness: The overall / average intensity of illumination of the picture; it
determines the background level in the reproduced picture
• Contrast: The difference in intensity between the dark parts and the bright
parts of the picture
• Detail or Resolution: The detail / resolution depends on the number of picture
elements. Also known as the definition
• Video components
a. voltage circuit
b. Luminance
c. Color
d. Timing
• Scan rates
a. Video 525 lines interlaced
b. Computer pixels, Lines vs. Pixels
c. How many pixels are in the frame
• Refresh Rates
a. Traditional video: 15.75 kHz = 525 lines x 30 frames per second
b. Computer graphics: 640 x 480 up to 1390 x 1024, up to 110 kHz
Video Basics
http://www.doom9.org/index.html?/video-basics.htm
http://www.maxim-ic.com/appnotes.cfm/an_pk/734
Video Signal
• A picture has four variables: two spatial axes, intensity variation, and one along
the temporal axis
• An electrical signal can only represent a single variable with time
• Picture is scanned horizontally in lines to produce an electrical signal corresponding
to the brightness level of the pixels along the line
• The vertical resolution of the picture is determined by the number of scanning lines
Component video: Higher-end video systems make use of three separate video
signals for the red, green, and blue image planes. Each color channel is sent as a
separate video signal.
(a) Most computer systems use Component Video, with separate signals for R, G,
and B signals.
(b) For any color separation scheme, Component Video gives the best color
reproduction since there is no "crosstalk" between the three channels.
(c) This is not the case for S-Video or Composite Video, discussed next. Component
video, however, requires more bandwidth and good synchronization of the
three components.
Composite video:
A composite video signal is a combination of the luminance level and the line
synchronization information.
Color (“chrominance") and intensity (“luminance") signals are mixed into a single
carrier wave.
a) Chrominance is a composition of two color components (I and Q, or U and V).
b) In NTSC TV, e.g., I and Q are combined into a chroma signal, and a color subcarrier
is then employed to place the chroma signal at the high-frequency end of
the channel shared with the luminance signal.
c) The chrominance and luminance components can be separated at the receiver end
and then the two color components can be further recovered.
d) When connecting to TVs or VCRs, Composite Video uses only one wire and video
color signals are mixed, not sent separately. The audio and sync signals are additions
to this one signal. Since color and intensity are wrapped into the same signal, some
interference between the luminance and chrominance signals is inevitable.
http://en.wikipedia.org/wiki/Composite_video
http://electronics.howstuffworks.com/tv9.htm
Digital Video
Digital video is obtained by:
Sampling an analog video signal V(t)
Sampling the 3-D space-time intensity distribution I(x,y,t)
Video Sampling
Progressive scanning: one full frame every 1/30th of a second.
Interlaced scanning: two separate fields every 1/60th of a second.
(2:1 interlacing)
Progressive scanning
Interlaced Scanning
Because of interlacing, the odd and even lines are displaced in time from each other;
generally not noticeable except when very fast action is taking place on screen, when
blurring may occur.
Scanning and Interlacing
• Even at rates > 24 frames /sec, the user will be able to see a flicker at high
intensity levels
• To avoid flicker, a single frame is displayed in two interlaced fields
• Interlaced video standards
o NTSC – 525 / 60
o PAL – 625 / 50
NTSC (National Television System Committee)
NTSC is the video system or standard used in North America and most of South America.
In NTSC, 30 frames are transmitted each second. Each frame is made up of 525 individual
scan lines.
http://en.wikipedia.org/wiki/NTSC
Format Resolution/Lines
VHS 240
S-VHS 400-425
Betamax 500
Standard 8 mm 300
Hi-8 mm 425
Mini DV 480 (720X480)
DVD 720X480
HD-DVD up to 1920X1080
PAL (Phase Alternating Line) is a TV standard widely used in Western Europe, China,
India, and many other parts of the world.
PAL uses 625 scan lines per frame, at 25 frames/second, with a 4:3 aspect ratio and
interlaced fields.
(a) PAL uses the YUV color model. It uses an 8 MHz channel and allocates a bandwidth of
5.5 MHz to Y, and 1.8 MHz each to U and V. The color subcarrier frequency is fsc = 4.43
MHz.
(b) In order to improve picture quality, chroma signals have alternate signs (e.g., +U and
-U) in successive scan lines, hence the name “Phase Alternating Line".
(c) This facilitates the use of a (line rate) comb filter at the receiver: the signals in
consecutive lines are averaged so as to cancel the chroma signals (that always carry
opposite signs) for separating Y and C and obtaining high quality Y signals.
Digital Levels
Color Video
o YCbCr
Recommended for digital TV broadcasting by ITU-R BT.601
http://en.wikipedia.org/wiki/YCbCr
http://www.graphicsacademy.com/what_ycbcr.php
Digitizing Video
A composite video signal is sampled at a rate four times the fundamental
sampling frequency recommended by the ITU (4 x 3.375 = 13.5 MHz)
With the recommended sampling rate, the number of samples during the
active line period is the same for both NTSC and PAL
The signal is converted into 8-bit samples using an A/D converter
Color difference signals are sampled at a reduced rate, which is also an
integral multiple of 3.375 MHz
http://www.pctechguide.com/45DigitalVideo.htm
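As a rough arithmetic illustration (not from the original text), the raw bit rate implied by these sampling rates can be worked out directly; the sketch below assumes 8-bit samples and the 13.5 MHz / 6.75 MHz rates described above.

# Sketch: raw bit rate of ITU-R BT.601-style 4:2:2 sampling.
luma_rate = 13.5e6             # luminance samples per second (4 x 3.375 MHz)
chroma_rate = 6.75e6           # each color-difference signal (2 x 3.375 MHz)
bits_per_sample = 8
total_samples_per_sec = luma_rate + 2 * chroma_rate     # 27 million samples/s
raw_bitrate = total_samples_per_sec * bits_per_sample   # 216 Mbit/s
print(raw_bitrate / 1e6, "Mbit/s")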
The advantages of digital representation for video are many. For example:
(a) Video can be stored on digital devices or in memory, ready to be processed (noise
removal, cut and paste, etc.), and integrated to various multimedia applications;
(b) Direct access is possible, which makes nonlinear video editing achievable as a simple,
rather than a complex task;
(c) Repeated recording does not degrade image quality;
(d) Ease of encryption and better tolerance to channel noise.
Since humans see color with much less spatial resolution than they see black and white, it
makes sense to “decimate" the chrominance signal.
Interesting (but not necessarily informative!) names have arisen to label the different
schemes used.
To begin with, numbers are given stating how many pixel values, per four original pixels,
are actually sent:
(a) The chroma subsampling scheme “4:4:4" indicates that no chroma subsampling is
used: each pixel's Y, Cb and Cr values are transmitted, 4 for each of Y, Cb, Cr.
(b) The scheme "4:2:2" indicates horizontal subsampling of the Cb, Cr signals by a factor
of 2. That is, of four pixels horizontally labeled as 0 to 3, all four Ys are sent, and every
two Cb's and two Cr's are sent, as (Cb0, Y0)(Cr0, Y1)(Cb2, Y2)(Cr2, Y3)(Cb4, Y4), and so
on (or averaging is used).
(c) The scheme “4:1:1" subsamples horizontally by a factor of 4.
(d) The scheme “4:2:0" subsamples in both the horizontal and vertical dimensions by a
factor of 2. Theoretically, an average chroma pixel is positioned between the rows and
columns as shown in Fig. 5.6.
Scheme 4:2:0 along with other schemes is commonly used in JPEG and MPEG (see later
chapters in Part 2).
Color Sampling
http://www.larryjordan.biz/articles/lj_sampling.html
4:2:2
At the first sample point on a line, Y (luminance), Cr (R-Y), and Cb (B-Y) samples are all
taken; at the second sample point only a Y sample is taken; at the third sample point a Y,
a Cb and a Cr are taken, and this process is repeated throughout the line
4:2:0
At the first sample site in the first line, a Y sample and a Cb sample are taken. At the
second site a Y sample only is taken, while at the third site a Y and a Cb are taken and
this is repeated across the line. Similarly Cr samples are taken in the second line
---------------------------------------------
Video compression
Goal of video compression is to minimize the bit rate in the digital representation of
the video signal while:
– Maintaining required levels of signal quality
– Minimizing the complexity of the codec
– Containing the delay
Video compression is all about reducing the number of bytes needed to transmit or
store a video, without sacrificing much quality. It also reduces the
time needed to transmit a video over a channel, thanks to the reduced size. Compressed
video can be transmitted more economically over a smaller carrier.
Most networks handle approximately 120 Mbits/s of data. Uncompressed video
normally exceeds a network's bandwidth capacity, does not get displayed properly,
and requires a large amount of disk space for storage. Therefore, it is not
practical to transmit video sequences without using compression.
There are well-defined standards and protocols describing how the
information should be encoded, decoded, and otherwise represented.
Entropy coding
Lossless, reversible compression that exploits the statistical redundancy of the data
without regard to its meaning. Examples: run-length encoding, Huffman coding.
Source coding
Takes advantage of the nature of the data to generate a one-way relationship
between the original and compressed information. "Lossy" techniques.
• Lossy coding is an irreversible process: the recovered data is degraded, so the
reconstructed video is numerically not identical to the original. It takes into account
the semantics of the data. Quality depends on the compression method and the
compression ratio.
– Examples of source coding:
• Degree of compression depends on data content.
• E.g. content prediction techniques - DPCM, delta modulation
Hybrid encoding
Uses elements from both Entropy and Source
Most techniques used in multimedia systems are hybrid
– E.g. JPEG, H.263, MPEG-1, MPEG-2, MPEG-4
---------------------------------------------
The difference between subsequent frames is usually minimal. Video compression uses
this property to reduce the size of a video. A video encoder is the device that performs
the compression. The encoder compares consecutive frames, picks out only the
differences, and encodes those instead of encoding the entire frame. The compression is
done on a frame-by-frame basis.
The diagram below gives the basic procedure of video compression, irrespective of
the standard.
RGB to YUV:
This is the first step in compressing a video sequence. RGB (Red, Green, and Blue)
and YUV (luminance plus two chrominance components) are color formats by which
a video can be represented. Each frame has a particular value for its red, green,
and blue components. When a camera captures a video, it is in RGB
format, but RGB video requires more storage space than the YUV format.
Therefore, to make transmission and storage easier, the video sequence is converted
from RGB to YUV. This conversion is done for each frame of the video. The formula
by which the conversion is done is given below.
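A common form of this conversion is the ITU-R BT.601 relationship; the sketch below assumes RGB values normalized to the range 0..1:

def rgb_to_yuv(r, g, b):
    # ITU-R BT.601-style luma plus two color-difference components.
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    u = 0.492 * (b - y)                     # blue color difference (B - Y)
    v = 0.877 * (r - y)                     # red color difference (R - Y)
    return y, u, v

print(rgb_to_yuv(1.0, 0.0, 0.0))   # pure red: Y = 0.299, U negative, V positive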
Motion Estimation / Motion Compensation:
The encoder searches a previously coded reference frame for the block that best matches
each block of the current frame; the offset to that best match is the motion vector.
The difference between the motion-compensated prediction and the actual block is the
residual frame, which is what is transformed and coded in the following steps.
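As a sketch of the idea, the block-matching search below finds, for one block, the motion vector and the residual against a reference frame; the exhaustive search, 8x8 block size, and +/-4 pixel window are illustrative choices, not any particular standard's algorithm.

import numpy as np

def block_match(ref, cur, bx, by, block=8, search=4):
    # Exhaustive search: try every offset (dx, dy) within the window and keep
    # the one with the smallest sum of absolute differences (SAD).
    target = cur[by:by+block, bx:bx+block].astype(int)
    best_dx, best_dy, best_sad = 0, 0, None
    h, w = ref.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + block > w or y + block > h:
                continue
            cand = ref[y:y+block, x:x+block].astype(int)
            sad = np.abs(target - cand).sum()
            if best_sad is None or sad < best_sad:
                best_dx, best_dy, best_sad = dx, dy, sad
    # The residual is what remains after motion-compensated prediction.
    pred = ref[by+best_dy:by+best_dy+block, bx+best_dx:bx+best_dx+block].astype(int)
    residual = target - pred
    return (best_dx, best_dy), residual

# Example: the current frame is the reference shifted right by two pixels.
ref = np.arange(256).reshape(16, 16)
cur = np.roll(ref, 2, axis=1)
mv, res = block_match(ref, cur, bx=8, by=4)
print(mv, np.abs(res).sum())   # (-2, 0) and a zero residual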
DCT (Discrete Cosine Transform):
The Discrete Cosine Transform converts the frames from the spatial domain to the
frequency domain.
A DCT is performed on small blocks (8 pixels by 8 lines) of each component of the
motion compensated frame to produce blocks of DCT coefficients. The magnitude of
each DCT coefficient indicates the contribution of a particular combination of
horizontal and vertical spatial frequencies to the original picture block. The coefficient
corresponding to zero horizontal and vertical frequency is called the DC coefficient.
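A minimal sketch of the 8x8 forward DCT, using SciPy's 1-D DCT applied along both axes (the flat test block is illustrative; the element at [0, 0] is the DC coefficient described above):

import numpy as np
from scipy.fftpack import dct

def dct2(block):
    # Type-II 2-D DCT: transform rows, then columns, with orthonormal scaling.
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

block = np.full((8, 8), 128.0)               # a flat gray 8x8 block
coeffs = dct2(block)
print(coeffs[0, 0])                          # DC coefficient: 8 x block mean = 1024.0
print(np.allclose(coeffs.flatten()[1:], 0))  # all AC coefficients are ~0 for a flat block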
Quantization:
In general, quantization is the approximation of continuous values by discrete integer
values; in video compression it is the step where information is actually discarded.
Quantization plays an important role in data compression. The frame that comes from the
discrete cosine transform is very high in precision. By quantizing the values to
approximate integers, the size of the frame is reduced: instead of using large numbers,
we reduce them to smaller integer values by dividing them by constant values. But there
are losses associated with the quantization.
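A small sketch of uniform quantization and the matching inverse step used later as a reference (the single step size of 16 is illustrative; real coders use a matrix of step sizes, one per coefficient):

import numpy as np

def quantize(coeffs, q_step):
    # Divide by the step size and round; the rounding is where information is lost.
    return np.round(coeffs / q_step).astype(int)

def dequantize(levels, q_step):
    # Inverse quantization: multiply back by the same step size.
    return levels * q_step

coeffs = np.array([1024.0, 37.4, -12.9, 3.2])
levels = quantize(coeffs, q_step=16)
print(levels)                     # [64  2 -1  0]
print(dequantize(levels, 16))     # [1024  32 -16   0] -- only approximate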
Inverse Quantization:
Inverse quantization helps in reconstructing the frame, which can then be used as a
reference frame for motion estimation. The quantized frame is multiplied by the
same quantizing constant by which it was divided during quantization.
Huffman Coding:
The quantized frame will have discrete values associated with each pixel. Huffman
coding associates each value with a variable-length symbol that can be transmitted easily
through a channel. During decompression, the symbols are remapped to their
corresponding values and the frame can be reconstructed.
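A compact sketch of Huffman code construction (a generic textbook recipe, not the exact tables used by any particular video standard); values that occur more often receive shorter codes:

import heapq
from collections import Counter

def huffman_codes(values):
    # Build the Huffman tree bottom-up with a min-heap of weights,
    # prepending '0'/'1' to the codes of the two merged subtrees.
    freq = Counter(values)
    heap = [[weight, i, [sym, ""]] for i, (sym, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2][0]: "0"}
    tiebreak = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]
        tiebreak += 1
        heapq.heappush(heap, [lo[0] + hi[0], tiebreak] + lo[2:] + hi[2:])
    return {sym: code for sym, code in heap[0][2:]}

# Quantized coefficients are dominated by zeros, so zero gets the shortest code.
print(huffman_codes([0, 0, 0, 0, 0, 1, 1, -1, 2]))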
Once the frame is out of the Huffman Coding phase, the video stream is ready to be
transmitted.
---------------------------------------------
SPEECH CODECS
G.723.1
G.723.1 is an optional legacy codec included in the 3rd Generation Partnership
Project (3GPP) recommendation for compatibility with standards such as H.323. A
look-ahead of 7.5 ms duration is also used. Music or tones such as DTMF or fax tones
cannot be transported reliably with this CODEC, and thus some other method such
as G.711 or out-of-band methods should be used to transport these signals. G.723.1
operates at two bit rates of 6.3 kbit/s and 5.3 kbit/s.
G.711
G.711 is an ITU-T standard for audio companding. It represents voice-frequency
signals as 8-bit compressed pulse code modulation (PCM) samples,
taken at a rate of 8000 samples/second. A
G.711 encoder thus creates a 64 kbit/s bitstream. This codec is used to transmit
DTMF and fax tones on E1/T1 lines.
There are two main algorithms defined in the standard: the mu-law algorithm (used in
North America and Japan) and the A-law algorithm (used in Europe and the rest of the world).
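As a hedged sketch of the mu-law idea (the constant 255 and normalization to [-1, 1] follow the usual formulation; this shows the companding curve only, not the full G.711 bit packing):

import math

MU = 255  # mu-law parameter (North America / Japan variant)

def mu_law_compress(x):
    # Map a linear sample in [-1, 1] onto a logarithmic scale so that quiet
    # samples keep relatively more resolution after 8-bit quantization.
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

print(8000 * 8)                  # 8000 samples/s x 8 bits = 64000 bit/s, the G.711 rate
print(mu_law_compress(0.01))     # ~0.23: low-level signals are boosted before quantizing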
G.729
G.729 is mostly used in Voice over IP (VoIP) applications for its low bandwidth
requirement. Music or tones such as DTMF or fax tones cannot be transported
reliably with this codec, and thus use G.711 or out-of-band methods to transport
these signals. Also very common is G.729a, which is compatible with G.729 but
requires less computation. This lower complexity is not free: speech quality is
marginally worsened. Annex B of G.729 is a silence compression scheme, which
has a Voice Activity Detection (VAD) module (used to detect voice activity, i.e. speech or
non-speech), a Comfort Noise Generator (CNG), and a DTX module which decides on
updating the background noise parameters for non-speech (noisy) frames, which are
also called SID frames.
G.729 operates at 8 kbit/s, but there are extensions, which provide also 6.4 kbit/s
and 11.8 kbit/s rates for marginally worse and better speech quality respectively.
GSM-AMR
Under 3G-324M, the adaptive multi-rate (AMR) codec is the mandatory speech
codec. AMR can operate at different rates between 12.2 and 4.75 kbps. It also
supports comfort noise generation (CNG) and a discontinuous transmission (DTX)
mode. It can dynamically adjust its rate and error control, providing the best speech
quality for the current channel conditions. The AMR codec also supports unequal
error detection and protection (UED/UEP). This scheme partitions the bit stream into
classes on the basis of their perceptual relevance. An AMR frame is discarded if
errors are detected in the most perceptually relevant data, otherwise it is decoded
and error concealment is applied.
Since the ability to suppress silence is one of the primary motivations for using
packets to transmit voice, the real time protocol (RTP) header carries both a
sequence number and a timestamp to allow a receiver to distinguish between lost
packets and periods of time when no data was transmitted. Some payload formats
define a "silence insertion descriptor" or "comfort noise" (CN) frame (like G.711
codec which is sample based; i.e. the encodings produce one or more octets per
sample) to specify parameters for artificial noise that may be generated during a
period of silence to approximate the background noise at the source. Some codecs,
like G.729 (a frame-based codec, because it encodes a fixed-length block of audio
into another block of compressed data, typically also of fixed length), have silent
frames as part of the codec frame structure and hence do not need a separate payload
format for the silent frame. When the CN payload format is used with another
payload format, different values in the RTP payload type field distinguish comfort-
noise packets from those of the selected payload format.
The RTP header for the comfort noise packet SHOULD be constructed as if the
comfort noise were an independent codec. Each RTP packet containing comfort noise
MUST contain exactly one CN payload per channel. This is required since the CN
payload has a variable length. The CN packet update rate is left implementation
specific. The CN payload format provides a minimum interoperability specification for
communication of comfort noise parameters. The comfort noise analysis and
synthesis as well as the VAD and DTX algorithms are unspecified and left
implementation-specific.
VIDEO CODECS
H.261
Designed for video phone and video conference over ISDN
• Bit rate: n x 64 kbps, n ∈ [1, 30]
• QCIF (176x144), CIF (352x288)
• Coding Scheme
– DCT based compression to reduce spatial redundancy (similar to JPEG)
– Block based motion compensation to reduce temporal redundancy
H.263
Designed for low bit rate video applications
• Bit rate: 10 ~ 384kbps
• SQCIF (128x96) ~ 16CIF (1408x1152)
• Coding similar to H.261 but more efficient
H.263 is a video codec designed by the ITU-T as a low-bit rate encoding solution for
videoconferencing. It is a legacy codec that is used by existing H.323 systems and
has been kept for compatibility. It was further enhanced into codecs such as H.263v2
(a.k.a. H.263+ or H.263 1998) and H.263v3 (a.k.a. H.263++ or H.263 2000).
http://www.h263l.com/
H.264:
This is one of the most advanced standards for video compression. It is based on
the same basic compression principles as most standards but has some unique features.
The average bitrate reduction in H.264 is about 50%, which is higher than for the other
standards mentioned above. Videoconferencing, telemedicine, and satellite broadcast are
some of the applications that use H.264.
MPEG-1
Designed for storage/retrieval of VHS quality video on CD-ROM
• Bit rate: ~1.5Mbps
• Similar Coding scheme to H.261 with:
– Random access support
– Fast forward/backward support
Standard used for the compression of moving pictures and audio. This was based on
CD-ROM video applications, and is a popular standard for transmitting video
sequences over the internet. In addition, Audio Layer 3 of MPEG-1 is the most popular
standard for digital compression of audio, known as MP3. MPEG-1 is designed for
bitrates up to 1.5 Mbit/sec.
MPEG-2
Designed for broadcast quality video storage and transport
• HDTV support
• Bit rate: 2Mbps or higher (CBR/VBR)
• Two system bit streams: Program Stream & Transport
Stream
• Used for:
– DVD
– DirecTV
– Digital CATV
This standard is mainly used in Digital Television set top boxes and DVD video. It is
based on MPEG-1, but has some special features for digital broadcast television. The
most significant enhancement from MPEG-1 is its ability to efficiently compress
interlaced video. MPEG-2 scales well to HDTV resolution and bit rates, reducing the
need for an MPEG-3. Designed for Videos with bitrate between 1.5 and 15 Mbit/sec.
MPEG-4
• Internet in the future
– Not only text and graphics, but also audio and video
• Fast and versatile interactivity
– Zoom in; zoom out (remote monitoring)
– Fast forward and fast backward (video on demand)
– Change viewing point (online shopping, sports)
– Trigger a series of events (distance learning)
– On the fly composition
– Virtual environments
• Support both low bandwidth connections
(wireless/mobile) and high bit rates (fixed/wire line)
MPEG-4 is a standard used primarily to compress audio and video (AV) digital data.
It is more flexible than H.263 baseline and offers advanced error detection and
correction schemes.
MPEG-4 absorbs many of the features of MPEG-1 and MPEG-2 and other related
standards, adding new features such as (extended) VRML support for 3D rendering,
object-oriented composite files (including audio, video and VRML objects), support
for externally-specified Digital Rights Management and various types of interactivity.
AAC (Advanced Audio Codec) was standardized as an adjunct to MPEG-2 (as Part 7)
before MPEG-4 was issued.
Most of the features included in MPEG-4 are left to individual developers to decide
whether to implement them, which is why it is divided into many parts ranging from
part1 to part 22.
http://www.webopedia.com/TERM/M/MPEG.html
http://en.wikipedia.org/wiki/Moving_Picture_Experts_Group
-----------------------------------------
In the following table, the speech codecs are compared based on the differences in their
frame duration, frame size, bit rate, and RTP payload type. The RTP payload type is
the number specified in the RFCs for the respective codec.
The table below compares several video codecs:

Codec     Compression     Transform         Bit Rate    Resolution            Frame Rate   Latency   Quality             Application
MJPEG     Frame-based     DCT               10~3000     Any size              0~30         Low       Broadcast           IP networks
Wavelet   Frame-based     Wavelet           30~7500     160x120 ~ 320x240     8~30         High      Visually lossless   Various
MPEG-4    Stream-based    DCT and Wavelet   10~10000    64x48 ~ 4096x4096     1~60         Medium    Internet            Wireless to Digital TV
H.263     Stream-based    DCT               30~200      128x96 ~ 1408x1152    10~15        Low       Video phone         Teleconference
Many leading commercial DSP processors from Analog Devices, Motorola, Texas
Instruments, Freescale, and ARM (not exactly a DSP, but the core for many
DSPs) are used in these gateways. The user has to analyze which processor to choose
based on:
• cycle count
• speed
• cost/performance
• energy efficiency
• memory usage
• different call scenarios the gateway will be handling
The codecs may not be ported to each of these processors just to determine the above
factors; instead, some DSP modules like FIR filters, FFTs, etc. can be used to evaluate
them. There are certain benchmarking suites available in the market which
might help the reader in deciding on a suitable processor for the intended gateway.
Video-conferencing
A video communications session among three or more people who are geographically
separated. This form of conferencing started with room systems where groups of people
met in a room with a wide-angle camera and large monitors to hold a conference with
other groups at remote locations. Federal, state and local governments are making major
investments in group videoconferencing for distance learning and telemedicine.
Benefits of Videoconferencing
• Interaction with people and classrooms anywhere in the world
• Sharing of and collaboration on data
• Expose students to the latest technology available
• Save time and money involved in travel for meetings
• Distance learning - providing opportunities for learning that would otherwise
be unavailable in all settings
Videoconferencing Protocols
Videoconferencing protocols are based on standards set by the ITU-T.
H.320 - Video over ISDN. It is circuit switched.
Disadvantages: Pricey for equipment and the dedicated line. Errors can cause the call to
drop.
H.323 - Video over IP. Has the ability to dial by the IP address or alias. Includes
T.120 capabilities for sharing and collaboration. Can be used on both private WANs and
public Internet. It is packet based.
Advantages: More cost effective (higher speeds at lower cost than H.320)
• Ability to integrate into an existing network
• You can connect to an existing H.320 infrastructure
• Has the ability to go over the public Internet.
Disadvantages: Firewalls block video traffic
• Not enough bandwidth on IP network resulting in choppy IP video
• Non-secure transmission of data
Videoconferencing Terms
• MCU
• Gatekeeper
• Gateway
• CODEC
Multipoint Control Unit (MCU): Negotiates multiple clients in a conference format. The
client does scheduling from a GUI interface that allows the client to pick a "virtual"
conference room and decide if the meeting is private or public. The host client can then
invite other participants to join scheduled or impromptu virtual meetings right at the client
desktop. It translates the various protocols (e.g. H.320, H.323, ISDN) into one
videoconference so all can understand, regardless of what protocol they are running.
Gatekeeper: This component of H.323 manages the inbound and outbound bandwidth
from the LAN. The gatekeeper registers clients and coordinates communications with other
gatekeepers. It verifies users’ identities through static IP addressing and allows them to
pass through to the MCU.
Gateway: Translates between different network types and protocols (for example, between
H.323 endpoints on IP and H.320 endpoints on ISDN) so that endpoints on different
networks can join the same conference.
CODEC: CODEC stands for coder-decoder. It translates signals from analog to digital and
back again.
Types of Videoconferencing
• Point-to-point
• Multipoint
• Multicast
(A) Point-to-point:
Point-to-point
Videoconference between two end points; directly connected to each other by IP or ISDN
Advantages:
• Clearer reception between the two sites
• Less scheduling - only the two parties involved in the conference need to schedule
Disadvantages:
• Both sites must be using the same protocol
• Only two sites are allowed
(B) Multipoint:
Multipoint
Three or more end points participating in a conference; accomplished by connecting to a
Multipoint Control Unit (MCU).
Advantages:
• Many sites using differing protocols can be connected in the same
conference.
• Better monitoring of the connections
Disadvantages:
• Slight increase in latency.
• Must be scheduled in advance with a Multipoint Control Unit. (MCU)
(C) Multicast:
Multicast
One-way communications to multiple locations. Like a TV broadcast.
Disadvantages:
• One-way communication only; receiving sites cannot interact with the source
Video over IP
Data/ Video/ Voice in ONE Net
Traditional CCTV System
Coaxial Cable
Analog Signal
Problems:
• Hard to manage and maintain remotely
• Video stored on tape, making it difficult to manage the video data and maintain its quality
• Analog signal system, hard to integrate with other systems
DVR Solution
Video stored in digital data
PC-based infrastructure
Problems:
• Stand-alone system, poor integration
• In Windows-based DVRs, system stability is a problem
• In Linux or single-chip DVRs, servicing is a key maintenance issue
• Hard to manage in large or distributed systems
IP Network in Video Surveillance
Transmitted in IP Network
Client/ Server-based infrastructure
Benefits:
• Expandable & Integrated Network system
• Suitable for large or distributed system
• Lower total ownership cost
• Capable of remote management and maintenance
• Good flexibility for the system upgrade or re-layout
Figure: field devices, IP network, control room, and clients.
Different TV standards use different LINE/FIELD resolutions; NTSC is used mostly in the US, and PAL in Europe.
NTSC
720 x 480
704 x 480
640 x 480
352 x 240
176 x 112
PAL
720 x 576
704 x 576
640 x 576
352 x 288
176 x 144
Other similar standards
• CCIR 601, RS-170: much like NTSC
• SECAM: much like PAL
IP Camera:
A camera whose video is converted directly into an IP signal
(= Analog camera + 1-ch Video Server)
Other alias: Network Camera
Video Server:
A device that digitizes the analog video signal for transmission over an IP network
Other alias: Encoder, IP Codec, Camera Server
Transmission
Transmission Media
Video Transmission needs more bandwidth than Data and Voice. Higher bandwidth can
result in better video performance (FPS and Quality)
Bandwidth Requirement
• Simple calculation
Bandwidth requirement =
Image size per frame x FPS (frames per second) x (1 + 3% IP overhead) x (1 + 30%
margin) x 8 bits
For example:
5 Kbytes x 30 FPS x1.03 x 1.3 x 8 bits = 1.6 Mbps
Note: Video record storage space can be also calculated by this formula:
Image Size per frame x FPS (Frame per second) x record time = total storage
space requirement
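The calculation above can be captured in a small helper; the function names and default percentages simply restate the formula and worked example from the text:

def video_bandwidth_bps(frame_size_bytes, fps, ip_overhead=0.03, margin=0.30):
    # Bandwidth = image size per frame x FPS x (1 + IP overhead) x (1 + margin) x 8 bits
    return frame_size_bytes * fps * (1 + ip_overhead) * (1 + margin) * 8

def storage_bytes(frame_size_bytes, fps, record_seconds):
    # Storage = image size per frame x FPS x recording time
    return frame_size_bytes * fps * record_seconds

# The worked example from the text: 5 Kbyte frames at 30 FPS is about 1.6 Mbps.
print(video_bandwidth_bps(5_000, 30) / 1e6)        # ~1.61 Mbps
# One day of continuous recording at the same settings:
print(storage_bytes(5_000, 30, 24 * 3600) / 1e9)   # ~13 GB per day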
Network Protocols
Advanced network protocols can make video transmission more efficient.
Integrated with Alarm Systems
Integrating DI (digital input) sensors and DO (relay output) alarms can build an
intelligent video surveillance system.
IP Video Technologies - Videoconference
The ISDN to IP Migration for Videoconferencing
Introduction
Virtually all video systems today include IP network capability, while only a limited
percentage support ISDN.
For some, the justification for migrating from ISDN to IP for videoconferencing was purely
financial as it allowed companies to enjoy a pay-one-price cost structure for unlimited
videoconferencing usage. For many others, however, it was the soft benefits of running
videoconferencing over IP, such as enhanced reliability and manageability, tighter
security, and an improved user experience that prompted the shift.
This session provides insight into the pros and cons of the four most common network
architectures in use today for videoconferencing: ISDN-only, converged IP, IP overlay,
and hybrid video environments.
For both new and existing VC users, there are many benefits and reasons for running
videoconferencing traffic over IP. Even if customers won’t save significant costs by
migrating from ISDN to IP, the IP strategy allows enterprise managers to turn
videoconferencing into a manageable enterprise business tool, instead of a technology
gadget or curiosity.
ISDN-Only Environments
The diagram below highlights a traditional videoconferencing environment using only ISDN
service from a local telephone provider. Note that this organization may not be able to
connect to IP-only external endpoints (listed as Client Location below).
Figure 2: Traditional ISDN-Only Videoconferencing Environment
ISDN-Only Advantages
Data Isolation - In an ISDN videoconferencing environment the video traffic does not
touch the organization’s data network, which is a source of comfort for IT and network
managers.
Universal Availability - ISDN service is available almost anywhere in the world (or at
least in most places where phone services are available).
Low Fixed Costs - The fixed monthly cost for ISDN services is relatively low (typically
$150 per month for 384 kbps ISDN connectivity), which makes ISDN cost-effective for
organizations with limited monthly video usage.
ISDN-Only Disadvantages
Converged IP Environments
Converged IP Disadvantages
Network Capability - Many enterprise networks are not equipped to host video traffic,
and cannot be cost-effectively upgraded to do so in some locations. For example, in one
organization the connections to the Los Angeles and London offices may be “video-ready,”
but those to the Milan and Singapore offices are not up to the task. In an IP-only
environment, the Milan and Singapore offices would be unreachable from the enterprise’s
IP video systems (unless an ISDN gateway product / service or an IP-overlay solution was
used).
Endpoint Capability – Many legacy video systems are not IP-capable and would need to
be replaced or upgraded to function in an IP-only environment.
Global Reach – In order to communicate with ISDN-only endpoints, such as those
deployed at the client location shown above, either an IP to ISDN gateway device or an
external gateway service must be used. In addition, corporate security systems, including
the enterprise firewalls and NAT systems, often block IP traffic between enterprises,
making it impossible to host IP video calls between organizations.
Lack of Redundancy – In the event that the enterprise LAN or WAN experiences
problems, one or more endpoints may be unable to place or receive video calls. Once
again, there is no alternate network to host the video traffic.
Potential Impact on Network – If not properly planned and managed, it is possible that
the videoconferencing traffic could negatively impact the other traffic on the data network.
This risk, however, is easily avoided through the use of a videoconferencing gatekeeper.
IP Overlay Environments
Many organizations are unable to host videoconferencing traffic on all or specific segments
of their primary data network due to limited bandwidth or lack of QoS (quality of service).
To bypass these issues, some organizations choose to replace their ISDN network with a
totally separate IP network dedicated to hosting IP video traffic.
The graphic below highlights an IP overlay environment. Note the use of the IP overlay
network provider’s ISDN and Internet gateways to allow the host organization to connect
to external ISDN and IP endpoints.
IP video overlay solutions share many of the advantages of the converged IP solution, plus
several additional key advantages:
Network Isolation - the IP overlay architecture allows organizations to enjoy the
benefits of IP videoconferencing without impacting the existing data network.
Upgrade Avoidance – the IP overlay method allows an organization to avoid the need for
network capacity and/or performance upgrades in some or all locations.
IP-Overlay Disadvantages
IP video overlay solution disadvantages include the need to purchase additional network
services dedicated to hosting IP video traffic, and the fact that gateways (which never
improve but often detract from the user experience) must be used to conduct calls with
any locations not on the IP overlay network.
Hybrid Video Environments
This architecture affords many of the advantages of the three prior methods, plus
additional benefits:
Endpoint Flexibility –The enterprise can utilize a mixture of new (and relatively
inexpensive) IP-capable video endpoints and legacy, ISDN-only endpoints.
Network Redundancy – Since most endpoints have access to IP and ISDN connections,
video connections can be made even if one of the networks (IP or ISDN) is experiencing
problems.
Global Reach – The support for IP and ISDN video traffic throughout the enterprise
makes it easier to host video calls between different organizations.
The most significant disadvantage of this method is the frequent use of gateways
(products or services) to connect to internal and external video endpoints.
Frequently Asked Questions
4. What are quality-of-service (QoS) requirements for IP-based voice and video?
Real-time IP applications, such as videoconferencing and voice over IP, are much
more sensitive to network quality of service than store-and-forward data
applications, such as e-mail and file transfer. Quality of Service (QoS) refers to
intelligence in the network to grant appropriate network performance to satisfy an
application’s requirements. For multimedia over IP networks, the goal is to preserve
both the mission-critical data in the presence of multimedia voice and video and to
preserve the voice and video quality in the presence of bursty data traffic. Four
parameters are generally used to describe quality of service: latency or delay, the
amount of time it takes a packet to traverse the network; jitter, the variation in
delay from packet to packet; bandwidth, the data rate that can be supported on the
network; and packet loss, the percentage of packets that do not make it to their
destination for various reasons.
• End-to-end latency. End-to-end latency refers to the total transit time for
packets in a data stream to arrive at the remote endpoint. The upper bound for
latency for H.323 voice and video packets should not be more than 125-150
milliseconds. The average packet size for video packets is usually large (800-1500
bytes) while audio packet sizes are generally small (480 bytes or less). This means
that the average latency for an audio packet may be less than that for a video packet
as intervening routers/switches typically prioritize smaller over larger packets when
encountering network congestion. In addition, an H.323 video call actually represents
four streams – each station sends and receives audio and video. The difference in
latency of the streams will manifest itself as additional delay (both H.323 and SIP
convey sufficient information to lip-synch the various streams).
• Jitter or variability of delay. This refers to the variability of latencies for packets
within a given data stream and should not exceed 20 - 50 milliseconds. An example
would be a data stream in a 30 FPS H.323 session that has an average transit time
of 115 milliseconds. If a single packet encountered a jitter of 145 milliseconds or
more (relative to a prior packet), an underrun condition may occur at the receiving
endpoint, potentially causing either blocky, jerky video or undesirable audio. Too
much jitter can also cause the inter-stream latencies discussed next.
• Inter-stream latency. This refers to the relative latencies that can be
encountered between the audio and video data streams and is based on how the
relative average transit times for the given streams, at any given point, vary from
each other. In this case the relative latency variations are not symmetrical. This is
due to the fact that the human brain already compensates for audio latency relative
to video. Due to this fact, an audio stream that starts arriving at an endpoint 30
milliseconds ahead of its video stream counterpart(s) will produce detectable lip-
synchronization problems for most participants. An audio stream that arrives later
than its associated video stream data has a slightly higher tolerance of 40
milliseconds before the loss of audio and video synchronization becomes generally
detectable.
• Packet loss. This term refers to the loss or desequencing of data packets in a
real-time audio/video data stream. A packet loss rate of 1% produces roughly a loss
of one fast video update per second for a video stream producing jerky video. Lost
audio packets produce choppy, broken audio. Since audio operates with smaller
packets at a lower bandwidth, in general, it is usually less likely to encounter packet
loss, but an audio stream is not immune from the effects of packet loss. A 2% packet
loss rate starts to render the video stream generally unusable, though audio may be
minimally acceptable. Consistent packet loss above 2% is definitely unacceptable for
H.323 videoconferencing unless some type of packet loss correction algorithm is
used between the endpoints. Packet loss in the 1-2% range should still be considered a
poor network environment and the cause of this type of consistent, significant packet
loss should be resolved.
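The thresholds quoted above can be summarized in a small check; the numeric limits are the ones stated in this section, and the function itself is only illustrative:

def qos_check(latency_ms, jitter_ms, loss_pct):
    # Rule-of-thumb limits from the text: latency <= ~150 ms, jitter <= ~50 ms,
    # packet loss <= ~1% (2% or more is generally unusable for H.323 video).
    issues = []
    if latency_ms > 150:
        issues.append("end-to-end latency too high")
    if jitter_ms > 50:
        issues.append("jitter too high")
    if loss_pct >= 2:
        issues.append("packet loss makes video generally unusable")
    elif loss_pct > 1:
        issues.append("packet loss causes jerky video / choppy audio")
    return issues or ["within thresholds"]

print(qos_check(latency_ms=120, jitter_ms=30, loss_pct=0.5))   # ['within thresholds']
print(qos_check(latency_ms=200, jitter_ms=60, loss_pct=1.5))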
Three tools for network quality of service
Three types of tools or solutions are available to the network engineer to build
quality of service into the network system.
1) Provisioning means providing adequate bandwidth for all voice, video, and data
applications that traverse a common network. By using a 100 Mbps Ethernet network, for
example, instead of a 10 Mbps network, the network is more likely to support multimedia
traffic together with data. Note that IP networks typically have significant packet
overhead. For example, a 384 Kbps video call actually requires about 10% additional
bandwidth for IP overhead; furthermore, when going from IP to ATM or frame relay, an
additional 10% of the call bandwidth should be allocated for encapsulation. Hence, a
384 Kbps IP call traversing an ATM backbone may require as much as 460 Kbps of
bandwidth.
2) Classifying means giving packets a classification based on their priority. Voice
packets would be given the highest priority since they are very delay and jitter
sensitive, even though they are not particularly bandwidth intensive. Video packets
might be given a slightly lower priority; and email packets, for example, given the
lowest priority. There are many different classification schemes possible, including
some that are in the process of being standardized. One common scheme is to give VoIP
packets an IP precedence of 5 and videoconferencing applications an IP precedence of 4.
3) Queuing refers to a process that takes place in the routers and switches whereby
different queues or buffers are established for the different packet classifications.
One of the buffers, for example, might be a delay- and drop-sensitive buffer designed to
handle voice and/or video packets. Many queuing schemes are available for
implementation.
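The provisioning arithmetic in point 1 above works out as follows (a simple sketch; the 10% figures are the rules of thumb quoted in the text):

def provisioned_bandwidth_kbps(call_kbps, ip_overhead=0.10, encapsulation=0.10):
    # Add ~10% for IP packet overhead, then ~10% more for ATM / frame relay
    # encapsulation on the backbone.
    return call_kbps * (1 + ip_overhead) * (1 + encapsulation)

print(round(provisioned_bandwidth_kbps(384)))   # ~465 kbps, in line with the ~460 kbps figure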
Solving QoS over IP networks for multimedia conferencing is a two-phase problem:
1. Guarantee QoS within a specific, controlled enterprise intranet or service provider
network.
2. Guarantee QoS across the hand-off (peering) points between the networks. The
public Internet presents this second challenge to the extreme.
Four major QoS initiatives are RSVP (resource ReSerVation Protocol), IP Precedence,
and Differentiated Services (DiffServ) from the IETF, and 802.1p from the IEEE.
Improved quality of service through use of standard mechanisms, such as DiffServ
and MPLS, is the key factor behind the promise of broad-based use of interactive
business-quality IP video. The underlying requirement is for the IP video
infrastructure to enable end-to-end prioritized processing and delivery of video traffic
between subscriber networks and carrier core networks. This requires prioritized
treatment of video traffic over the “last mile” access network through the metro
network through the carrier networks. While DiffServ is gaining broad support to
enable “soft” QoS through prioritization of processing of traffic by service provider
routers, MPLS is being deployed in service provider networks as an adjunct to enable
the fine-grained delivery of a number of value-added services.
Video Essentials:
http://www.videoessentials.com/glossary.php#M