1. INTRODUCTION
Video compression plays an important role in today's wireless communications. It
allows raw video data to be compressed before it is sent through a wireless channel.
However, video compression is computation-intensive and dissipates a significant amount of
power. This is a major limitation in today's portable devices: existing multimedia devices can
only play video applications for a few hours before the battery is depleted. [1]
The latest video compression standard, MPEG-4 AVC/H.264, gives a 50% improvement
in compression efficiency compared to previous standards. However, the coding gain comes at
the expense of increased computational complexity at the encoder. Motion estimation (ME)
has been identified as the main source of power consumption in video encoders; it consumes
50% to 90% of the total power used in video compression. The introduction of variable block size
partitions and multiple reference frames in the standard results in increased computational
load and memory bandwidth during motion prediction.
Block-based ME has been widely adopted by the industry due to its simplicity and
ease of implementation. Each frame is partitioned into blocks of 16 x 16 pixels, known as macro blocks
(MBs). Full-search ME predicts the current MB by finding the candidate that gives the
minimum sum of absolute differences (SAD), as follows: [1]
SAD(i, j) = sum_{k=0}^{M-1} sum_{l=0}^{N-1} | C(k, l) - R(i + k, j + l) |        (1)
where C(k, l) is the current macro block and R(i + k, j + l) is the candidate MB located
in the search window within the previously encoded frame. From (1), the power consumption
in ME is affected by the number of candidates and the total computation needed to calculate the
matching costs. Thus, the power can be reduced by minimizing these parameters. [1]
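The SAD matching cost in (1) can be sketched in a few lines of Python; this is a minimal sketch assuming NumPy arrays for the blocks, with illustrative block contents:

```python
import numpy as np

def sad(current_mb, candidate_mb):
    """Sum of absolute differences between two equally sized pixel blocks."""
    c = current_mb.astype(np.int32)   # widen first to avoid uint8 wrap-around
    r = candidate_mb.astype(np.int32)
    return int(np.abs(c - r).sum())

rng = np.random.default_rng(0)
cur = rng.integers(0, 256, (16, 16), dtype=np.uint8)
print(sad(cur, cur))                  # a block matched against itself gives 0
```

A full-search ME would evaluate this cost for every candidate in the search window and keep the minimum.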
Furthermore, to maximize the available battery energy, the computational power
should be adapted to the supply power, picture characteristics, and available bandwidth.
Because these parameters change over time, the ME computation should be adaptable to
different scenarios without degrading the picture quality. Pixel truncation can be used to
reduce the computational load by allowing us to disable the hardware that processes the
truncated bits. While previous studies focused on fixed block size ME (16 x 16 pixels), very
little work has been done to study the effect of pixel truncation for smaller block sizes. The
latest MPEG-4 standard, MPEG-4 AVC/H.264, allows variable block size motion
estimation (VBSME): it defines 16 x 16, 16 x 8, 8 x 16, 8 x 8, 8 x 4, 4 x 8, and 4 x 4 block
sizes. At smaller block partitions, a better prediction is achieved for objects with complex
motion. [2]
Truncating pixels at a 16 x 16 block size results in acceptable performance, as shown in the
literature. However, at smaller block sizes, the number of pixels involved during motion
prediction is reduced. Due to the truncation error, smaller blocks tend to
yield multiple equally matched candidates, which can lead to the wrong motion vector. Thus, truncating
pixels using smaller blocks results in poor prediction.
We have implemented a low-power algorithm and architecture for ME using pixel truncation
for smaller block sizes. The search is performed in two steps: 1) truncation mode and 2)
refinement mode. This method reduces the computational cost and memory access without
significantly degrading the prediction accuracy. In this project, we perform an in-depth
analysis of this technique and extend it to a complete H.264 system. [1]
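The two-step method can be sketched in software as follows. This is a simplified behavioral sketch, not the exact architecture: the NTB plays the role of the truncation mask, and the refinement window of one pixel around the coarse winner is an assumption.

```python
import numpy as np

def sad(a, b):
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def two_step_search(cur, ref, x, y, p=8, ntb=4, refine=1):
    """Step 1 (truncation mode): search on the retained MSBs only.
    Step 2 (refinement mode): full-precision search around the coarse winner."""
    n = cur.shape[0]
    mask = np.uint8((0xFF << ntb) & 0xFF)   # keep the (8 - ntb) MSBs
    cur_t, ref_t = cur & mask, ref & mask
    best = bx = by = None
    for dy in range(-p, p + 1):             # coarse search over [-p, p]
        for dx in range(-p, p + 1):
            r, c = y + dy, x + dx
            if r < 0 or c < 0 or r + n > ref.shape[0] or c + n > ref.shape[1]:
                continue                    # candidate falls outside the frame
            cost = sad(cur_t, ref_t[r:r + n, c:c + n])
            if best is None or cost < best:
                best, bx, by = cost, dx, dy
    fbest = fx = fy = None
    for dy in range(by - refine, by + refine + 1):   # refine at full precision
        for dx in range(bx - refine, bx + refine + 1):
            r, c = y + dy, x + dx
            if r < 0 or c < 0 or r + n > ref.shape[0] or c + n > ref.shape[1]:
                continue
            cost = sad(cur, ref[r:r + n, c:c + n])
            if fbest is None or cost < fbest:
                fbest, fx, fy = cost, dx, dy
    return (fx, fy), fbest
```

The coarse pass touches only the retained MSBs, which is where the computational and memory savings come from; the refinement pass restores full-precision matching around the coarse winner.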
Temporal correlation is the correlation between pixels in different frames but in the same pixel position. Spectral correlation is the correlation between
samples of the same source from multiple sensors.
Without any compression, typical image or video data would occupy several
gigabytes (for about 1 hour 30 minutes of video). Storing this much data is impractical, and
transmitting it is likewise infeasible, as it requires very high bandwidth. So, in
order to store or transmit the data, we need to compress it to meet these
requirements. [1]
2. LITERATURE SURVEY
2.1 Low-Power H.264 Architecture:
The latest video compression standard, MPEG-4 AVC/H.264, gives a 50% improvement
in compression efficiency compared to previous standards. However, the coding gain comes at
the expense of increased computational complexity at the encoder. Motion estimation (ME)
has been identified as the main source of power consumption in video encoders; it consumes 50% to 90%
of the total power used in video compression. The introduction of variable block size
partitions and multiple reference frames in the standard results in increased computational
load and memory bandwidth during motion prediction. [2]
Block-based motion estimation (ME) has been widely adopted by the industry due to
its simplicity and ease of implementation. Each frame is partitioned into blocks of 16 x 16 pixels,
known as macro blocks (MBs).
Full-search ME predicts the current MB by finding the candidate that gives the
minimum sum of absolute difference (SAD), as follows:
SAD(i, j) = sum_{k=0}^{M-1} sum_{l=0}^{N-1} | C(k, l) - R(i + k, j + l) |
Pixel truncation (usually a pixel is represented with 8 bits; here, instead of using
all 8 bits for pixel representation, we use fewer bits) can be used to reduce the
computational load by allowing us to disable the hardware that processes the truncated bits.
While previous studies focused on fixed block size ME (16 x 16 pixels), very little work has
been done to study the effect of pixel truncation for smaller block sizes. The latest MPEG-4
standard, MPEG-4 AVC/H.264, allows variable block size motion estimation (VBSME).
At smaller block partitions, a better prediction is achieved for objects with complex motion.
For video applications, data is highly correlated, and the switching activity is
distributed non-uniformly. Since the LSBs of a data word experience a higher switching
activity, significant power reduction can be achieved by truncating these bits. In general,
about a 50% switching activity reduction is obtained if we truncate up to three LSBs. Further
reduction can be achieved if the number of truncated bits (NTB) is increased. For example,
if the NTB is set to 6, the switching activity can be reduced by 80% to 90%. This makes pixel
truncation attractive for minimizing power in ME. [2]
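Truncation itself is just a bit mask; the sketch below (the helper name is ours) zeroes the NTB least-significant bits of an 8-bit pixel, which is exactly the part of the datapath that can then be held static:

```python
def truncate(pixel, ntb):
    """Zero the ntb least-significant bits of an 8-bit pixel, so the logic
    that would process those bits can stay idle."""
    return pixel & (0xFF << ntb) & 0xFF

print(bin(truncate(0b10110111, 3)))   # 0b10110000: only the 5 MSBs survive
```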
Table I shows the cumulative distribution function (CDF) of the SAD values obtained
during ME using the Foreman sequence. The SAD is grouped into five categories: 0%
represents the percentage of candidates with SAD = 0, <5% represents the percentage with SAD < 5% of SADmax,
and so on.
TABLE I
CDF OF CALCULATED SAD DURING MOTION ESTIMATION USING THE
Foreman SEQUENCE (SEARCH RANGE P = +/-8)

Block Size      16 x 16         8 x 8           4 x 4
NTB             0      4        0      4        0      4
0%              0      0.2      0      5        0      12
< 5%            25     25       35     35       58     58
< 10%           60     60       58     58       58     58
< 20%           94     94       87     87       81     81
< 40%           100    100      98     98       99     99
For a 16 x 16 block size with NTB = 4, the percentage of candidates with SAD = 0 is close to that of the
untruncated case (NTB = 0). This shows that for a 16 x 16 block size, the truncated pixels are
likely to give the same matched candidate as the untruncated pixels. However, for a 4 x 4
block with NTB = 4, the percentage of SAD = 0 is 12%, compared to 0% for NTB = 0. This
shows that there are more matched candidates using truncated pixels for the 4 x 4 block size, which
could lead to incorrect motion vectors. [2]
To illustrate the effect of pixel truncation on VBSME, computed values of the average
PSNR for 50 predicted frames of the Foreman sequence (QCIF@30 frames/s) are given in Table II.
TABLE II
AVERAGE FULL-SEARCH PSNR (dB) FOR VARIOUS NTB USING SAD AS THE
MATCHING CRITERION (SEARCH RANGE P = +/-8)

        16 x 16            8 x 8              4 x 4
NTB     PSNR    Diff.      PSNR    Diff.      PSNR    Diff.
0       33.11   --         34.89   --         36.82   --
2       33.12   +0.01      34.85   -0.03      36.75   -0.07
4       33.03   -0.08      34.35   -0.54      34.66   -2.16
6       31.79   -1.33      30.29   -4.60      27.46   -9.36
The frames are predicted using the full-search algorithm at different block sizes and NTB values.
From Table II, for full pixel resolution (NTB = 0), the prediction accuracy improves as the
block size decreases. This is reflected by a higher PSNR for predictions using a 4 x 4 block
compared to a 16 x 16 block. For NTB = 4, a small PSNR drop is observed for a block size of
16 x 16 (0.08 dB) compared to untruncated pixels. The PSNR drop for predictions using
smaller block sizes is higher, with 0.54 dB and 2.16 dB drops for block sizes of 8 x 8
and 4 x 4, respectively. [2]
As we increase the NTB to 6, the PSNR drop for the smaller blocks increases rapidly.
The PSNR drop for the 16 x 16 block size is 1.33 dB; however, for the 8 x 8 and 4 x 4 block
sizes, the PSNR drop increases to 4.60 and 9.36 dB, respectively. This shows that plain pixel
truncation is not suitable for smaller block sizes. In the H.264 standard, substantial
improvement in motion prediction is gained by using smaller blocks. Therefore, it is
important to improve the PSNR, especially for smaller block partitions.
For video coding systems, motion estimation can remove most of the temporal
redundancy, so a high compression ratio can be achieved. Among the various ME algorithms, the
full-search block matching algorithm (FSBMA) is usually adopted because of its good
quality and regular computation. In FSBMA, the current frame is partitioned into many small
macro blocks (MBs) of size N x N; for each MB in the current frame, the reference block that
is most similar to the current MB is sought within a search range of [-P, P]. [2]
Although FSBMA provides the best quality among the various ME algorithms, it
consumes the most computation power. In general, the computational complexity of ME
accounts for 50% to 90% of a typical video coding system. Hence, a hardware accelerator for
ME is required.
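To make this cost concrete, here is a quick operation count for a single macro block, under the assumed values N = 16 and P = 8 used elsewhere in this report:

```python
N, P = 16, 8                        # macro block size and search range [-P, P]
candidates = (2 * P + 1) ** 2       # every offset in the search window
abs_diffs = candidates * N * N      # one absolute difference per pixel per candidate
print(candidates, abs_diffs)        # 289 candidates, 73984 absolute differences
```

And this is per macro block, per reference frame: a QCIF frame alone contains 99 macro blocks, which is why a software full search at 30 frames/s is impractical without acceleration.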
Variable block size motion estimation (VBSME) is a newer coding technique that provides
more accurate predictions than traditional fixed block size motion estimation.
4. MOTION ESTIMATION
4.1 Introduction:
A standard movie, also known as a motion picture, can be defined as a
sequence of several scenes. A scene is in turn a sequence of several seconds of
motion recorded without interruption, and usually lasts at least three seconds. A movie in
the cinema is shown as a sequence of still pictures, at a rate of 24 frames per second.
Similarly, a TV broadcast consists of a transmission of 30 frames per second (NTSC, and
some flavors of PAL, such as PAL-M), 25 frames per second (PAL, SECAM) or anything
from 5 to 30 frames per second for typical videos on the Internet. [4]
The name motion picture comes from the fact that a video, once encoded, is nothing
but a sequence of still pictures shown at a reasonably high frequency, which gives the
viewer the illusion of continuous animation. Each frame is shown for one
small fraction of a second, more precisely 1/k seconds, where k is the number of frames per
second. Coming back to the definition of a scene, where the frames are captured without
interruption, one can expect consecutive frames to be quite similar to one another, as very
little time passes before the next frame is captured. With all this in mind, we can
conclude that each scene is composed of at least 3k frames (since a scene is at least
3 seconds long). In the NTSC case, for example, this means that a movie is composed of a
sequence of segments (scenes), each of which has at least 90 frames similar to one
another. [4]
Before going further into the details of motion estimation, we need to describe briefly
how a video sequence is organized. As mentioned earlier, a video is composed of a number of
pictures. Each picture is composed of a number of pixels, or pels (picture elements). A video
frame has its pixels grouped into 8 x 8 blocks. The blocks are then grouped into macro blocks
(MBs), which are composed of 4 luminance blocks each (plus the equivalent chrominance
blocks). Macro blocks are then organized into groups of blocks (GOBs), which are grouped into
pictures (or into layers and then pictures). Pictures are further grouped into scenes, as described
above, and scenes can be considered grouped into movies. Motion estimation is often performed
in the macro block domain. For simplicity's sake we'll refer to the macro blocks as blocks, but
we should remember that it is most often the macro block domain that is used for motion
estimation. [4]
The idea of motion estimation is that a block b of the current frame C is sought in
a previous (or future) frame R. If a block of pixels that is similar enough to block b is found
in R, then instead of transmitting the whole block, just a motion vector is transmitted.
Ideally, a given macro block would be sought in the whole reference frame;
however, due to the computational complexity of the motion estimation stage, the search is
usually limited to a pre-defined region around the macro block. Most often such a region
extends 15 or 7 pixels in all four directions in a given reference frame. The search region is
often denoted by the interval [-p, p], meaning that it includes p pixels in all directions. [4]
The video compression model is a two-stage procedure. The first stage takes
advantage of the temporal redundancy, and is followed by a procedure similar to that used
for lossy image compression, which aims at exploiting the spatial redundancy. In the temporal
redundancy exploitation stage, we have motion estimation of the current frame (C) using the
reference frame (R). This first stage produces both a set of motion vectors (i, j) and
difference macro blocks (C - R). The difference macro blocks then go through the second
stage, which exploits spatial redundancy. One may notice that the difference frame usually has
very high spatial redundancy, because it only stores the differences of
motion-estimated macro blocks, as well as macro blocks for which a good match was not found in
the reference frame(s). [3]
In the simplest case, the current frame is predicted from a single reference frame R. More generally, multiple frames in the past and future can be used by a
motion estimator, but most often we use either a single frame in the past, or one in the past
and one in the future, as described herein. [4]
As we have seen, the temporal prediction technique used in MPEG video is based on
motion estimation. The basic premise of motion estimation is that in most cases, consecutive
video frames will be similar except for changes induced by objects moving within the frames.
In the trivial case of zero motion between frames (and no other differences caused by noise,
etc.), it is easy for the encoder to efficiently predict the current frame as a duplicate of the
prediction frame. When this is done, the only information necessary to transmit to the
decoder becomes the syntactic overhead necessary to reconstruct the picture from the original
reference frame. When there is motion in the images, the situation is not as simple.
In practice, not every encoder can afford an exhaustive search, and some algorithms have been developed aiming at finding a sub-optimal match in much less time than the exhaustive search.
Consider a frame with two stick figures and a tree, and a possible next frame in which
panning has resulted in the tree moving down and to the right, while the figures have moved
farther to the right because of their own movement beyond the panning. The problem for
motion estimation to solve is how to adequately represent the changes, or differences,
between these two video frames.
It should be noted at this point that MPEG does not define how this search should
be performed. This is a detail the system designer can choose to implement in one of
many possible ways. This is similar to the bit-rate control algorithms discussed previously, in
the respect that complexity vs. quality trade-offs need to be addressed relative to the individual
application. It is well known that a full, exhaustive search over a wide 2-dimensional area
yields the best matching results in most cases, but this performance comes at an extreme
computational cost to the encoder. As motion estimation is usually the most computationally
expensive portion of the video encoder, some lower cost encoders might choose to limit the
pixel search range, or use other techniques such as telescopic searches, usually at some cost
to the video quality. [3]
Consider a particular macro block from Frame 2, compared against various
macro blocks of Frame 1. One candidate may be a bad match for the macro block to be coded;
another may be a fair match, with some commonality between the two macro blocks; a third
may be the best match, with only a slight error between the two macro blocks. Because a
relatively good match has been found, the encoder assigns motion vectors to the macro block,
which indicate how far horizontally and vertically the macro block must be moved so that a
match is made. Each forward and backward predicted macro block may contain 2 motion
vectors, so true bi-directionally predicted macro blocks will utilize 4 motion vectors.
Of course, not every macro block search will result in an acceptable match. If the
encoder decides that no acceptable match exists (again, the "acceptable" criterion is not
defined by MPEG, and is up to the system designer), then it has the option of coding that
particular macro block as an intra macro block, even though it may be in a P or B frame. In
this manner, high quality video is maintained at a slight cost to coding efficiency.
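The fallback described above amounts to a simple threshold test. A sketch in Python, where the helper name and the threshold value are purely illustrative, since MPEG leaves the "acceptable" criterion to the designer:

```python
def macroblock_mode(min_sad, threshold=2048):
    """Pick inter coding when the best match is acceptable, else intra."""
    return "inter" if min_sad <= threshold else "intra"

print(macroblock_mode(120))       # inter: a good match was found
print(macroblock_mode(50000))     # intra: no acceptable match in the search window
```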
5.1. Overview:
Compression methodology:
The MPEG compression methodology is considered asymmetric in that the encoder is
more complex than the decoder. The encoder needs to be algorithmic or adaptive, whereas the
decoder is 'dumb' and carries out fixed actions. This is considered advantageous in
applications such as broadcasting, where the number of expensive complex encoders is small
but the number of simple inexpensive decoders is large. The ISO's approach to
standardization in MPEG is considered novel because it is not the encoder which is
standardized; instead, the way in which a decoder shall interpret the bit stream is defined. A
decoder which can successfully interpret the bit stream is said to be compliant. The advantage
of standardizing the decoder is that, over time, encoding algorithms can improve yet
compliant decoders will continue to function with them. The MPEG standards give very little
information regarding the structure and operation of the encoder, and implementers can supply
encoders using proprietary algorithms. This gives scope for competition between different
encoder designs, which means that better designs can evolve and users will have greater
choice, because different levels of cost and complexity can exist in a range of coders, yet a
compliant decoder will operate with them all.
MPEG also standardizes the protocol and syntax under which it is possible to combine
or multiplex audio data with video data to produce a digital equivalent of a television
program. Many such programs can be multiplexed together, and MPEG defines the way in which such
multiplexes can be created and transported.
The MPEG standard is organized into the following parts: Requirements, Systems, Video, Audio, 3D Graphics compression, and Test.
6. HARDWARE IMPLEMENTATION
In the hardware implementation, we propose architectures to implement the two-step algorithm. First, the conventional ME architecture used in our analysis is
reviewed. Next, we discuss the architectures needed to support the two-step method. The area
and power overhead for the computation and memory units are also investigated. Based on
these analyses, we propose three low-power ME architectures with different area and power
efficiencies. In this project, we implement the ME architecture based on the 2-D ME
discussed earlier. We choose 2-D ME because it can cope with the high computational needs of the
real-time requirements of H.264 using a lower clock frequency than a 1-D architecture. [5]
6.1. Computation unit:
The conventional 2-D ME (me_sad) consists of a search area (SA) memory, a processing array containing 256 processing elements (PEs), an
adder tree, a comparator, and a decision unit. The search area memory consists of 16 memory
banks, where each bank stores 8-bit pixels in H*W/N total words, where H and W are the
search area window's height and width, respectively, and N is the MB's width. During motion
prediction, 16 pixels are read from the 16 memory banks simultaneously. The data in the
memory are stored in a ladder-like manner to avoid delay during the scanning. At each initial
search, the current and the first candidate MB are loaded into the processing array's registers.
The array then calculates the matching costs for one candidate per clock cycle. The 256 absolute
differences from the PEs are summed by the adder tree, which outputs the SADs for the 41 block
partitions. The adder tree reuses the SADs of the 4x4 blocks to calculate the larger block partitions.
In total, the adder tree calculates all 41 partitions per clock cycle. [5]
Throughout the scanning process, the comparator updates the minimum SAD and the
respective candidate location for each of the 41 block partitions. Once the scanning is complete, the
decision unit outputs the best MB partition and its motion vectors. The ME requires 256
clock cycles to scan all candidates.
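The adder-tree reuse can be modeled in software: compute the sixteen 4x4 SADs once, then sum them pairwise into the larger partitions until all 41 SADs are available. The following is a behavioral sketch of the data flow (not the RTL), with helper names of our own choosing:

```python
import numpy as np

def partition_sads(cur, cand):
    """All 41 H.264 partition SADs for one candidate, built by reusing the
    sixteen 4x4 SADs the way the adder tree does."""
    d = np.abs(cur.astype(np.int32) - cand.astype(np.int32))  # 16x16 differences
    s4 = d.reshape(4, 4, 4, 4).sum(axis=(1, 3))  # 4x4 grid of 4x4-block SADs
    h8 = s4[:, 0::2] + s4[:, 1::2]               # 8 blocks of 4 rows x 8 cols
    v8 = s4[0::2, :] + s4[1::2, :]               # 8 blocks of 8 rows x 4 cols
    s8 = v8[:, 0::2] + v8[:, 1::2]               # 4 blocks of 8 rows x 8 cols
    h16 = s8[:, 0] + s8[:, 1]                    # 2 blocks of 8 rows x 16 cols
    v16 = s8[0, :] + s8[1, :]                    # 2 blocks of 16 rows x 8 cols
    full = int(s8.sum())                         # the whole 16x16 macro block
    return ([full] + h16.tolist() + v16.tolist() + s8.ravel().tolist() +
            h8.ravel().tolist() + v8.ravel().tolist() + s4.ravel().tolist())
```

The count checks out: 1 + 2 + 2 + 4 + 8 + 8 + 16 = 41, and each larger SAD is formed from two already computed smaller ones, which is why the hardware can produce all of them in a single pass.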
For me_sad, the input and output of each PE are 8 bits wide, as shown in Fig. 1(a).
The input to the adder tree is 8 bits wide, and the SAD output is 12 to 16 bits wide,
depending on the partition size. These data are then input to the comparator, together with
the current search location information. [5]
Using a similar architecture to me_sad, the DPC-based ME (me_dpc) requires two bits
for the current and reference pixel inputs, as shown in Fig. 1(b). Furthermore, the matching cost is
calculated using Boolean logic (XOR and OR) rather than the arithmetic operations of the SAD-based
PE. This makes the overall area of the 256 PEs in me_dpc much smaller than in
me_sad. The reduced output bitwidth of the DPC-based PE also reduces the bitwidth required
for the adder tree and comparator unit. The input and output of the adder tree are 1 bit and 5 to 9
bits wide, respectively. A similar bitwidth applies to the comparator's input.
Table (1) compares the area (mm2), the total equivalent gates (based on a 2-input NAND
gate), and the power consumption (mW) of the me_sad and me_dpc computational units. The
comparisons are based on synthesis results using 0.13-um CMOS UMC technology. [5]
TABLE (1)
ME_SAD AND ME_DPC AREA (mm2), TOTAL EQUIVALENT GATES
(BASED ON A 2-INPUT NAND GATE) AND POWER (mW)

                      me_sad                      me_dpc
Modules           Area    Gates     Power     Area    Gates     Power
256 PE            0.90    173611    28.67     0.32    61728     2.31
Adder tree        0.13    25077     5.53      0.04    7716      0.99
Comparator Unit   0.11    21219     1.25      0.09    17361     0.86
Decision Unit     0.10    19290     0.54      0.07    13503     0.50
Total             1.24    239198    36.00     0.52    100309    4.66
The table shows that me_sad's area is dominated by the 256 PEs (73%). Thus, with a
significantly smaller area for the 256 PEs, me_dpc requires less area than me_sad: the
overall me_dpc requires only 42% of the me_sad area.
Based on the above analysis, we propose two types of architectures for the ME
computation unit that can perform low-resolution and full-resolution searches: me_split and
me_combine, shown in Fig. 3(a) and (b).
Me_split implements me_sad and me_dpc as two separate modules, as shown in Fig.
3(a). During the low-resolution search, me_sad is switched off, while me_dpc performs the
search. The second step uses me_sad, while me_dpc is switched off. This
architecture allows only the necessary bit size to be used during the different search modes.
While potential power savings are possible, this architecture requires additional area for the
adder tree, comparator and decision unit to support the low-resolution search.
Me_combine, which shares one datapath between the low-resolution search and the full pixel resolution search, is shown in Fig. 3(b). This architecture results in a
much smaller area compared to me_split. However, higher power consumption is expected
during the low-resolution search because the adder tree, comparator and decision unit operate
at a higher bit size than needed. [5]
Fig (6.1). SA memory arrangement: (a) mem8, (b) mem28, and (c) mem8pre.
Mem8 stores the data in the same way as the conventional ME. We access 8-bit
data during both the low-resolution and refinement stages. However, during the low-resolution
search, the lower six bits are not used by the PEs. Because the full memory is accessed during
both the low-resolution and refinement stages, this results in a higher memory bandwidth than the
conventional ME architecture.
To overcome the problem in mem8, mem28 uses two types of memory: 2-bit and 8-bit. The 2-bit memory stores the first two MSBs of each datum, and the 8-bit memory stores
the complete full pixel bitwidth. During the low-resolution search, the data from the 2-bit
memory are accessed. This allows only the required bits to be accessed without wasting any
power during the low-resolution search. In the refinement stage, the 8-bit memory is read into the PEs.
Although this architecture potentially reduces the memory bandwidth and power consumption,
it needs additional area for the 2-bit memory. [3]
In mem8pre, the data are prearranged before being stored in the 8-bit memory. Four pixels
are grouped together and then transposed according to their bit position, as shown in Fig. (6.2).
Fig (6.2). Storing 8-bit pixels in 8-bit memory: (a) conventional arrangement and (b) mem8pre.
During the low-resolution search, we read only the memory locations that store the
first two MSBs of the original pixels. Thus, the total memory accessed during the low-resolution search is one-fourth of the conventional full pixel access. [3]
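The prearrangement and its inverse can be sketched in Python. The exact bit-pair ordering is an assumption here: we assume word k holds bit-pair k (MSBs first) of four pixels, which matches the description that one word out of four serves the 2-bit search:

```python
def prearrange(pixels):
    """Pack four 8-bit pixels into four 8-bit words: word k carries bit-pair k
    (MSBs first) of all four pixels, so word 0 alone serves the 2-bit search."""
    assert len(pixels) == 4
    words = []
    for k in range(4):                         # k = 0 -> the two MSBs
        w = 0
        for i, p in enumerate(pixels):
            pair = (p >> (6 - 2 * k)) & 0b11   # bits (7-2k, 6-2k) of pixel i
            w |= pair << (2 * i)
        words.append(w)
    return words

def realign(words):
    """Rebuild the four original pixels from the four transposed words
    (the job of the delay buffers in full-resolution mode)."""
    pixels = [0, 0, 0, 0]
    for k, w in enumerate(words):
        for i in range(4):
            pixels[i] |= ((w >> (2 * i)) & 0b11) << (6 - 2 * k)
    return pixels
```

Reading only word 0 retrieves the two MSBs of all four pixels (one-fourth of the accesses), while reading all four words and realigning them recovers the full 8-bit pixels.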
In the full-resolution search, we read the four memory locations that contain the first through
eighth bits over four clock cycles. Delay buffers, as shown in Fig. (4), realign these words to
match the original 8-bit pixels. By prearranging the pixels this way, we can use the same
memory size as in the conventional full search while retaining the ability to access the first
two MSBs as well as the full bit resolution. The drawback of this approach is that it needs
additional circuitry to transpose and realign the pixels during motion prediction. The
estimated bandwidths for the above three memory architectures are shown in Table (2).
TABLE (2)
MEMORY BANDWIDTH FOR DIFFERENT ARCHITECTURES

                     Mem8                 Mem28                Mem8pre
Low-resolution       N*W*H x 8-bit        N*W*H x 2-bit        N*W*H x 2-bit
High-resolution      (W*H/4) x 8-bit      (N*W*H/4) x 8-bit    (N*W*H/4) x 8-bit
The memories for the search area data consume non-trivial hardware cost and power
dissipation. For instance, the memory modules of the 2-D design occupy almost 50% of the die
area. In fact, the memory organization is a critical issue for the system design, especially in
H.264, which applies multiple reference frames. In this section, a memory mapping
algorithm is proposed to reduce the number of memory partitions in our architecture.
Consequently, the hardware cost and the power consumption are both optimized.
For example, suppose the search width is 48 and the search height is 32, so the search area is
63 x 47 pixels. One pixel is extended in both the vertical and horizontal directions; thus, the
memory capacity for the search area is 64 x 48 pixels. When 16 PEGs are configured, the
memory is divided into 4 logic partitions, labeled L0, L1, L2 and L3, as
illustrated in Fig. 6(a). Each solid-line rectangle represents one 16-pixel-wide and 48-pixel-high
logic partition. The ME processing includes three stages. In the first stage, the area
covered by the slash pattern is searched, so L0 and L1 are active. In the second stage, the sub
search area is moved 16 pixels horizontally. The rectangle with the backslash pattern includes these
search candidates; L1 and L2 are active in this stage. In the last stage, L2 and L3
are used, and the rectangle filled with dots represents this sub search area. The intuitive
approach is to implement these logic partitions with four memory modules, each of
48 words x 128 bits. But this method causes low memory I/O utilization, which is just 48%. When the
search width is increased, the memory utilization becomes even worse. [3]
Structure of the PE
The block diagram of the PE is shown in Fig.2. Each PE consists of an ALU, local
memory and three registers. The ALU takes charge of the calculation and the memory data
recording and I/O. Two registers, called A register and B register, read data from the memory
and the ALU performs an operation on the data. Then the result is once fetched by the Z
register and is written into the memory again. This process is defined as a single cycle and by
performing several cycles you can process various kinds of algorithms.[21]
The block diagram of the ALU is shown in Fig. 3. The ALU can process one of 10
logical and 8 arithmetic operations at a time. They are all binary operations, and multi-bit
operations are processed by repeating single operations serially.
The local memory has a 5-bit address space and consists of a 24-bit RAM and an 8-bit
memory-mapped I/O connected to a sensor, an output circuit, the four neighboring PEs and
ground. Each bit can be randomly accessed. The address map is shown in Table 1. [21]
The function of the ALU and the size of the memory have proved to be sufficient for most
early visual processing algorithms, which are often used in vision applications.
The sum of absolute differences may be used for a variety of purposes, such as object
recognition, the generation of disparity maps for stereo images, and motion estimation for
video compression.
It is an algorithm used for video compression that adds up the absolute differences
between corresponding elements in the macro blocks of video frames. When coding or
compressing video, the similarities between video frames should be exploited to achieve
better compression ratios. Using the usual coding techniques on moving objects
within a video scene diminishes the compression efficiency, because they only consider the
pixels located at the same position in the video frames. Motion estimation and the SAD
algorithm are used to capture such movements more accurately for better compression
efficiency. [23]
4. A comparator is designed to produce well-limited output voltages that easily interface
with digital logic. Compatibility with digital logic must be verified when using an op-amp as a comparator.
Dedicated voltage comparator chips:
A signal distributed across 8 ports matches the voltage and current gain after each amplifier, and
the resistors then behave as level-shifters.
The LM339 accomplishes this with an open-collector output. When the inverting
input is at a higher voltage than the non-inverting input, the output of the comparator
connects to the negative power supply. When the non-inverting input is higher than the
inverting input, the output is 'floating' (has a very high impedance to ground).
Output type:
A push-pull output does not need a pull-up resistor and can also source current, unlike an open-drain
output.
Input voltage range:
The input voltages must stay within the limits specified by the manufacturer. Early integrated
comparators, like the LM111 family, and certain high-speed comparators like the LM119
family, require input voltage ranges substantially lower than the power supply voltages (+/-15
V vs. 36 V).[1] Rail-to-rail comparators allow any input voltage within the power supply
range. When powered from a bipolar (dual rail) supply,

    VS- <= V+, V- <= VS+

or, when powered from a unipolar TTL/CMOS power supply:

    0 <= V+, V- <= Vcc
Specific rail-to-rail comparators with p-n-p input transistors, like the LM139 family,
allow the input potential to drop 0.3 V below the negative supply rail, but do not allow it to
rise above the positive rail.[2] Specific ultra-fast comparators, like the LMH7322, allow the input
signal to swing below the negative rail and above the positive rail, although by a narrow
margin of only 0.2 V.[3] The differential input voltage (the voltage between the two inputs) of a
modern rail-to-rail comparator is usually limited only by the full swing of power supply.
7. H.264/AVC
7.1. Overview of H.264:
inverse transformed (T⁻¹) to produce a difference macro block D′n. This is not identical to the
original difference macro block Dn; the quantization process introduces losses, and so D′n is a
distorted version of Dn.
The prediction macro block P is added to D′n to create a reconstructed macro block
uF′n (a distorted version of the original macro block). A filter is applied to reduce the effects
of blocking distortion, and a reconstructed reference frame is created from a series of macro
blocks F′n.[13]
7.5. Decoder:
The decoder receives a compressed bit stream from the NAL. The data elements are
entropy decoded and reordered to produce a set of quantized coefficients X. These are rescaled
and inverse transformed to give D′n (identical to the D′n shown in the encoder). Using the
header information decoded from the bit stream, the decoder creates a prediction macro
block P, identical to the original prediction P formed in the encoder. P is added to D′n to
produce uF′n, which is filtered to create the decoded macro block F′n.
It should be clear from the figures and from the discussion above that the purpose of the
reconstruction path in the encoder is to ensure that both encoder and decoder use identical
reference frames to create the prediction P. If this is not the case, then the predictions P in the
encoder and decoder will not be identical, leading to an increasing error, or drift, between
the encoder and decoder.
The latest video compression standard, H.264 (also known as MPEG-4 part 10/AVC
for advanced video coding), is expected to become the video standard of choice in the coming
years.
H.264 is an open, licensed standard that supports the most efficient video compression
techniques available today. Without compromising image quality, an H.264 encoder can
reduce the size of a digital video file by more than 80% compared with the Motion JPEG
format and by as much as 50% compared with the MPEG-4 Part-2 standard. This means that
much less network bandwidth and storage space are required for a video file. Seen another
way, much higher video quality can be achieved for a given bit rate.[13]
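As a back-of-envelope illustration of what such savings mean for storage (the bit rates below are hypothetical, chosen only to reflect the quoted ~80% reduction):

```python
def storage_gb(bitrate_mbps, hours):
    """Storage needed for a continuous stream at the given bit rate, in gigabytes."""
    bits = bitrate_mbps * 1e6 * hours * 3600  # total bits recorded
    return bits / 8 / 1e9                     # bits -> bytes -> gigabytes

# Hypothetical figures: if Motion JPEG needs 5 Mbit/s for a given quality,
# an 80% reduction puts H.264 at roughly 1 Mbit/s for the same quality.
mjpeg = storage_gb(5.0, 24)  # one day of Motion JPEG
h264 = storage_gb(1.0, 24)   # one day of H.264
print(round(mjpeg, 1), round(h264, 1))  # 54.0 10.8
```

The same arithmetic applies to network bandwidth: the 80% bit-rate reduction carries through directly.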
Jointly defined by standardization organizations in the telecommunications and IT
industries, H.264 is expected to be more widely adopted than previous standards.
H.264 has already been introduced in new electronic gadgets such as mobile phones and
digital video players, and has gained fast acceptance among end users. Service providers such as
online video storage and telecommunication companies are also beginning to adopt H.264.
In the video surveillance industry, H.264 will most likely find the quickest traction in
applications where there are demands for high frame rates and high resolution, such as in the
surveillance of highways, airports and casinos, where the use of 30/25 (NTSC/PAL) frames
per second is the norm. This is where the economies of reduced bandwidth and storage needs
will deliver the biggest savings.[13]
H.264 is also expected to accelerate the adoption of megapixel cameras since the
highly efficient compression technology can reduce the large file sizes and bit rates generated
without compromising the image quality. There are tradeoffs, however: while H.264 provides
savings in network bandwidth and storage costs, it will require higher-performance network
cameras and monitoring stations.
The graph below provides a bit rate comparison, given the same level of image
quality, among the following video standards: Motion JPEG, MPEG-4 Part-2 (no motion
compensation), MPEG-4 Part-2 (with motion compensation) and H.264 (baseline profile).
The figure below shows that an H.264 encoder generated up to 50% fewer bits per second for
a sample video sequence than an MPEG-4 encoder with no motion compensation, and was at
least six times more efficient than Motion JPEG.
Figure-2. A typical sequence with I, B, and P-frames. A P-frame may only reference
preceding I- or P-frames, while a B-frame may reference both preceding and succeeding I- or
P-frames.
When a video decoder restores a video by decoding the bit stream frame by frame,
decoding must always start with an I-frame. P-frames and B-frames, if used, must be decoded
together with the reference frame(s).
In the H.264 baseline profile, only I-and P-frames are used. This profile is ideal for
network cameras and video encoders since low latency is achieved because B-frames are not
used.
Figure-3. With motion JPEG format, the three images in the above sequence are coded and
sent as separate unique images (I-frames) with no dependencies on each other
Figure-4. With difference coding (used in most video compression standards including
H.264), only the first image (I-frame) is coded in its entirety. In the two following images (P-frames),
references are made to the first picture for the static elements, i.e., the house, and only the
moving parts, i.e., the running man, are coded using motion vectors, thus reducing the amount
of information that is sent and stored.
The amount of encoding can be further reduced if detection and encoding of
differences is based on blocks of pixels (macro blocks) rather than individual pixels;
therefore, bigger areas are compared and only blocks that are significantly different are
coded. The overhead associated with indicating the location of areas to be changed is also
reduced.
Difference coding, however, would not significantly reduce data if there were a lot of
motion in a video. Here, techniques such as block-based motion compensation can be used.
Block-based motion compensation takes into account that much of what makes up a new
frame in a video sequence can be found in an earlier frame, but perhaps in a different
location. This technique divides a frame into a series of macro blocks. Block by block, a new
frame, for instance a P-frame, can be composed or predicted by looking for a matching block
in a reference frame. If a match is found, the encoder simply codes the position where the
matching block is to be found in the reference frame. Coding the motion vector, as it is
called, takes up fewer bits than if the actual content of a block were to be coded.
The Digital Video Broadcast project (DVB) approved the use of H.264/AVC for
broadcast television in late 2004.
The Advanced Television Systems Committee (ATSC) standards body in the United
States approved the use of H.264/AVC for broadcast television in July 2008, although the
standard is not yet used for fixed ATSC broadcasts within the United States. It has also been
approved for use with the more recent ATSC-M/H (Mobile/Handheld) standard, using the
AVC and SVC portions of H.264.
AVCHD is a high-definition recording format designed by Sony and Panasonic that uses
H.264 (conforming to H.264 while adding additional application-specific features and
constraints).
AVC-Intra is an intraframe-only compression format, developed by Panasonic.
The CCTV (Closed Circuit TV) or video surveillance market has included the technology
in many products. Prior to this technology, the compression formats used within the industry's
DVRs (Digital Video Recorders) were generally of low quality in compression capability. With
the application of H.264 compression technology to the video surveillance industry, the
quality of video recordings improved substantially. Starting in 2008, some in the
surveillance industry promoted the H.264 technology as synonymous with "high quality"
video.
Multi-picture inter-picture prediction allows modest improvements in bit rate and quality in most scenes. But in certain types of
scenes, such as those with repetitive motion, back-and-forth scene cuts, or uncovered
background areas, it allows a significant reduction in bit rate while maintaining clarity.[13]
Variable block-size motion compensation (VBSMC) with block sizes as large as 16×16
and as small as 4×4, enabling precise segmentation of moving regions. The supported luma
prediction block sizes include 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4, many of which
can be used together in a single macro block. Chroma prediction block sizes are
correspondingly smaller according to the chroma subsampling in use.
The ability to use multiple motion vectors per macro block (one or two per partition) with
a maximum of 32 in the case of a B macro block constructed of 16 4×4 partitions. The
motion vectors for each 8×8 or larger partition region can point to different reference
pictures.
The ability to use any macro block type in B-frames, including I-macro blocks, resulting
in much more efficient encoding when using B-frames. [13]
Six-tap filtering for derivation of half-pel luma sample predictions, for sharper sub-pixel
motion compensation. Quarter-pixel motion is derived by linear interpolation of the half-pel
values, to save processing power.
Quarter-pixel precision for motion compensation, enabling precise description of the
displacements of moving areas. For chroma the resolution is typically halved both vertically
and horizontally. Therefore the motion compensation of chroma uses one-eighth chroma pixel
grid units.
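The two-stage interpolation above can be sketched as follows; the (1, −5, 20, 20, −5, 1)/32 filter is the standard's published half-pel luma filter, while the variable names here are illustrative:

```python
def half_pel(e, f, g, h, i, j):
    """H.264 six-tap luma interpolation: the half-pel sample between g and h."""
    val = (e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5  # rounded /32
    return max(0, min(255, val))  # clip to the 8-bit sample range

def quarter_pel(a, b):
    """Quarter-pel samples are the rounded average of two neighbouring samples."""
    return (a + b + 1) >> 1

# In a flat area the interpolation reproduces the constant value exactly.
print(half_pel(100, 100, 100, 100, 100, 100))  # 100
```

The second stage (quarter-pel by averaging) is deliberately cheap, which is what the text means by "to save processing power".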
Weighted prediction, allowing an encoder to specify the use of a scaling and offset when
performing motion compensation, and providing a significant benefit in performance in
special cases, such as fade-to-black, fade-in, and cross-fade transitions. This includes
implicit weighted prediction for B-frames and explicit weighted prediction for P-frames.
Spatial prediction from the edges of neighboring blocks for "intra" coding, rather than the
"DC"-only prediction found in MPEG-2 Part 2 and the transform coefficient prediction found
in H.263v2 and MPEG-4 Part 2. This includes luma prediction block sizes of 16×16, 8×8,
and 4×4 (of which only one type can be used within each macro block).[13]
Lossless macro block coding features including:
A lossless "PCM macro block" representation mode in which video data samples are
represented directly,[16] allowing perfect representation of specific regions and allowing a
strict limit to be placed on the quantity of coded data for each macro block.
An enhanced lossless macro block representation mode allowing perfect
representation of specific regions while ordinarily using substantially fewer bits than the
PCM mode.
Flexible interlaced-scan video coding features, including:
Macro block-adaptive frame-field (MBAFF) coding, using a macro block pair
structure for pictures coded as frames, allowing 16×16 macro blocks in field mode (compared
with MPEG-2, where field-mode processing in a picture that is coded as a frame results in the
processing of 16×8 half-macro blocks).
Picture-adaptive frame-field coding (PAFF or PicAFF) allowing a freely-selected
mixture of pictures coded either as progressive frames where both fields are combined or as
individual single fields.
New transform design features, including:
An exact-match integer 4×4 spatial block transform, allowing precise placement of
residual signals with little of the "ringing" often found with prior codec designs. This is
conceptually similar to the well-known DCT design, but simplified and made to provide
exactly-specified decoding.
An exact-match integer 8×8 spatial block transform, allowing highly correlated
regions to be compressed more efficiently than with the 4×4 transform. This is conceptually
similar to the well-known DCT design, but simplified and made to provide exactly-specified
decoding.
Adaptive encoder selection between the 4×4 and 8×8 transform block sizes for the
integer transform operation.
A secondary Hadamard transform performed on "DC" coefficients of the primary
spatial transform applied to chroma DC coefficients (and also luma in one special case) to
obtain even more compression in smooth regions.
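The 4×4 integer transform can be illustrated numerically. This sketch uses the widely published core matrix and omits the normalization that the standard folds into the quantization stage:

```python
# Core matrix of the H.264 4x4 forward integer transform.
C = [[1,  1,  1,  1],
     [2,  1, -1, -2],
     [1, -1, -1,  1],
     [1, -2,  2, -1]]

def matmul(a, b):
    """4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def forward_4x4(x):
    """Core of the H.264 4x4 integer transform: Y = C . X . C^T.
    (The remaining scaling is folded into quantization by the standard.)"""
    return matmul(matmul(C, x), transpose(C))

# A flat residual block transforms to a single "DC" term; all else is zero.
flat = [[1] * 4 for _ in range(4)]
print(forward_4x4(flat)[0][0])  # 16
```

Because the matrix contains only small integers, the transform is exact in integer arithmetic, which is what makes "exactly-specified decoding" possible.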
A quantization design including:
Logarithmic step size control for easier bit rate management by encoders and
simplified inverse-quantization scaling
Frequency-customized quantization scaling matrices selected by the encoder for
perceptual-based quantization optimization
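The logarithmic step-size control can be sketched as follows; the base value of 0.625 and the doubling for every increase of 6 in QP follow the commonly tabulated H.264 step sizes, while intermediate QP values in the real standard come from a table rather than this exact formula:

```python
def q_step(qp):
    """Approximate H.264 quantizer step size: doubles for every QP increase of 6.
    Exact at QP values that are multiples of 6; illustrative elsewhere."""
    return 0.625 * 2 ** (qp / 6)

print(q_step(0), q_step(6), q_step(12))  # 0.625 1.25 2.5
```

This exponential spacing is what makes rate control easy for encoders: a fixed QP offset corresponds to a fixed bit-rate ratio rather than a fixed step difference.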
An in-loop deblocking filter that helps prevent the blocking artifacts common to other
DCT-based image compression techniques, resulting in better visual appearance and
compression efficiency
An entropy coding design including:
Context-adaptive binary arithmetic coding (CABAC), an algorithm to losslessly
compress syntax elements in the video stream knowing the probabilities of syntax elements in
a given context. CABAC compresses data more efficiently than CAVLC but requires
considerably more processing to decode.
Context-adaptive variable-length coding (CAVLC), which is a lower-complexity
alternative to CABAC for the coding of quantized transform coefficient values. Although
lower complexity than CABAC, CAVLC is more elaborate and more efficient than the
methods typically used to code coefficients in other prior designs.
A common simple and highly structured variable-length coding (VLC) technique for
many of the syntax elements not coded by CABAC or CAVLC, referred to as
Exponential-Golomb coding (or Exp-Golomb).
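Exp-Golomb coding is simple enough to sketch: the codeword for a non-negative integer k is the binary form of k+1, prefixed by one fewer zero bits than its length:

```python
def exp_golomb(code_num):
    """Unsigned Exponential-Golomb codeword for a non-negative integer."""
    bits = bin(code_num + 1)[2:]           # binary representation of code_num + 1
    return "0" * (len(bits) - 1) + bits    # prefix with len-1 leading zeros

for k in range(5):
    print(k, exp_golomb(k))
# 0 1
# 1 010
# 2 011
# 3 00100
# 4 00101
```

The leading zeros tell the decoder how many bits to read next, so the code is self-delimiting without any tables.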
Loss resilience features including:
A Network Abstraction Layer (NAL) definition allowing the same video syntax to be
used in many network environments. One very fundamental design concept of H.264 is to
generate self-contained packets, to remove the header duplication as in MPEG-4's Header
Extension Code (HEC). This was achieved by decoupling information relevant to more than
one slice from the media stream. The combination of the higher-level parameters is called a
parameter set. The H.264 specification includes two types of parameter sets: Sequence
Parameter Set (SPS) and Picture Parameter Set (PPS). An active sequence parameter set
remains unchanged throughout a coded video sequence, and an active picture parameter set
remains unchanged within a coded picture. The sequence and picture parameter set structures
contain information such as picture size, optional coding modes employed, and macroblock
to slice group map.
Flexible macro block ordering (FMO), also known as slice groups, and arbitrary slice
ordering (ASO), which are techniques for restructuring the ordering of the representation of
the fundamental regions (macro blocks) in pictures. Typically considered an error/loss
robustness feature, FMO and ASO can also be used for other purposes.
Data partitioning (DP), a feature providing the ability to separate more important and
less important syntax elements into different packets of data, enabling the application of
unequal error protection (UEP) and other types of improvement of error/loss robustness.
Redundant slices (RS), an error/loss robustness feature allowing an encoder to send an
extra representation of a picture region (typically at lower fidelity) that can be used if the
primary representation is corrupted or lost.
Frame numbering, a feature that allows the creation of "sub-sequences", enabling
temporal scalability by optional inclusion of extra pictures between other pictures, and the
detection and concealment of losses of entire pictures, which can occur due to network packet
losses or channel errors.
Switching slices, called SP and SI slices, allowing an encoder to direct a decoder to jump
into an ongoing video stream for such purposes as video streaming bit rate switching and
"trick mode" operation. When a decoder jumps into the middle of a video stream using the
SP/SI feature, it can get an exact match to the decoded pictures at that location in the video
stream despite using different pictures, or no pictures at all, as references prior to the switch.
A simple automatic process for preventing the accidental emulation of start codes, which
are special sequences of bits in the coded data that allow random access into the bit stream
and recovery of byte alignment in systems that can lose byte synchronization.
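That prevention process can be sketched as follows; this illustrates the well-known emulation_prevention_three_byte insertion, not a complete NAL encoder:

```python
def insert_emulation_prevention(payload):
    """Insert 0x03 after any two zero bytes that precede a byte <= 0x03,
    so the start-code prefix 0x000001 can never appear inside a NAL payload."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros == 2 and b <= 0x03:
            out.append(0x03)  # emulation_prevention_three_byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)

print(insert_emulation_prevention(b"\x00\x00\x01").hex())  # 00000301
```

The decoder reverses the process by discarding any 0x03 byte that follows two zero bytes, restoring the original payload exactly.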
Supplemental enhancement information (SEI) and video usability information (VUI),
which are extra information that can be inserted into the bit stream to enhance the use of the
video for a wide variety of purposes.
Auxiliary pictures, which can be used for such purposes as alpha compositing.
Support of monochrome, 4:2:0, 4:2:2, and 4:4:4 chroma sub sampling (depending on the
selected profile).
Support of sample bit depth precision ranging from 8 to 14 bits per sample (depending on
the selected profile).
The ability to encode individual color planes as distinct pictures with their own slice
structures, macro block modes, motion vectors, etc., allowing encoders to be designed with a
simple parallelization structure (supported only in the three 4:4:4-capable profiles).
Picture order count, a feature that serves to keep the ordering of the pictures and the
values of samples in the decoded pictures isolated from timing information, allowing timing
information to be carried and controlled/changed separately by a system without affecting
decoded picture content.
These techniques, along with several others, help H.264 to perform significantly better
than any prior standard under a wide variety of circumstances in a wide variety of application
environments. H.264 can often perform radically better than MPEG-2 video, typically
obtaining the same quality at half the bit rate or less, especially in high bit rate and high
resolution situations.
Like other ISO/IEC MPEG video standards, H.264/AVC has a reference software
implementation that can be freely downloaded. Its main purpose is to give examples of
H.264/AVC features, rather than being a useful application per se. Some reference hardware
design work is also under way in the Moving Picture Experts Group. The features mentioned
above cover all profiles of H.264. A profile for a codec is a
set of features of that codec identified to meet a certain set of specifications of intended
applications. This means that many of the features listed are not supported in some profiles.
Various profiles of H.264/AVC are discussed in the next section.[13]
8. FPGA
Definitions of FPGA on the Web:
1. A field-programmable gate array (FPGA) is an integrated circuit designed to be
configured by the customer or designer after manufacturing--hence "field-programmable".
...
2. This device is similar to the gate array, defined above, with the device shipped to the
user with general-purpose metallization pre-fabricated, often with variable length
segments.
3. Field-programmable gate array. A chip composed of an array of configurable logic
cells (also called logic blocks). Each cell can be configured, or programmed, to
perform one of a variety of simple functions, such as computing the logical AND of
two inputs. ...
An ASIC has its particular function defined at the time of manufacture. In contrast, the FPGA's function is
defined by a program written by someone other than the device manufacturer. Depending on
the particular device, the program is either 'burned' in permanently or semi-permanently as
part of a board assembly process, or is loaded from an external memory each time the device
is powered up. This user programmability gives the user access to complex integrated
designs without the high engineering costs associated with application-specific integrated
circuits.
IC: Xilinx Spartan-3E FPGA, 1200K gates. Connectors: USB2 port, Hirose FX2, four
12-pin Pmod connectors, VGA, PS/2, and serial. The Digilent USB2 port provides
board power, programming, and data transfers.
The Nexys-2 is a powerful digital system design platform built around a Xilinx
Spartan-3E FPGA. With 16 Mbytes of fast SDRAM and 16 Mbytes of Flash ROM, the
Nexys-2 is ideally suited to embedded processors like Xilinx's 32-bit RISC MicroBlaze. The
on-board high-speed USB2 port, together with a collection of I/O devices, data ports, and
expansion connectors, allows a wide range of designs to be completed without the need for
any additional components.
Features:
- USB2 port providing board power, device configuration, and high-speed data transfers
- 75 FPGA I/Os routed to expansion connectors (one high-speed Hirose FX2 connector
  with 43 signals and four 2x6 Pmod connectors)
- All I/O signals are ESD and short-circuit protected, ensuring a long operating life in
  any environment.
Specifications:

Supply levels required to retain RAM data:
  Description                                 Min    Units
  VCCINT level required to retain RAM data    1.0    V
  VCCAUX level required to retain RAM data    2.0    V

Absolute maximum ratings:
  Description                          Conditions                        Min     Max              Units
  Internal supply voltage              --                                -0.5    1.32             V
  Auxiliary supply voltage             --                                -0.5    3.00             V
  Output driver supply voltage         --                                -0.5    3.75             V
  Input reference voltage              --                                -0.5    VCCO + 0.5(1)    V
  Voltage applied to all user I/O      Driver in a high-impedance state:
    pins and dual-purpose pins           Commercial                      -0.95   4.4              V
                                         Industrial                      -0.85   4.3              V
  Voltage applied to all dedicated     All temp. ranges                  -0.5    VCCAUX + 0.5(3)  V
    pins
  Input clamp current per I/O pin      -0.5 V < VIN < (VCCO + 0.5 V)     --      100              mA
  Electrostatic discharge voltage      Human body model                  --      2000             V
                                       Charged device model              --      500              V
                                       Machine model                     --      200              V
  Junction temperature                 --                                --      125              °C
  Storage temperature                  --                                -65     150              °C

Power-on threshold voltages:
  Description                                  Min    Max    Units
  Threshold for the VCCINT supply              0.4    1.0    V
  VCCAUXT                                      0.8    2.0    V
  VCCO2T                                       0.4    1.0    V
Steps to simulate and dump the program into an FPGA Spartan-3E kit:
[1] Click on the FILE option on the tool bar and select a new project.
[5] After checking the syntax successfully, write the test bench program.
[6] For the test bench waveform, open the test bench program, check the syntax, and
simulate.
[9] Then the program will be dumped into the FPGA Spartan-3E kit, and the output will be
displayed.
FLOW CHARTS
Flow chart for SAD flow: buffer the current frame → accumulate the absolute differences →
compare for the minimum SAD → decide MV_ROW, MV_COL.
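The flow above (buffer the frame, accumulate absolute differences, compare for the minimum SAD, decide MV_ROW and MV_COL) can be sketched in software. This is an illustrative full-search model, not the hardware design:

```python
def sad(cur, ref, ci, cj, ri, rj, n):
    """SAD between the n x n block of cur at (ci, cj) and of ref at (ri, rj)."""
    return sum(abs(cur[ci + k][cj + l] - ref[ri + k][rj + l])
               for k in range(n) for l in range(n))

def full_search(cur, ref, ci, cj, n, rng):
    """Return (mv_row, mv_col) minimising the SAD over a +/- rng search window."""
    best = None
    for dr in range(-rng, rng + 1):
        for dc in range(-rng, rng + 1):
            ri, rj = ci + dr, cj + dc
            if 0 <= ri <= len(ref) - n and 0 <= rj <= len(ref[0]) - n:
                cost = sad(cur, ref, ci, cj, ri, rj, n)
                if best is None or cost < best[0]:
                    best = (cost, dr, dc)  # keep the minimum-SAD candidate
    return best[1], best[2]

# Toy frames: the bright 2x2 object moved one pixel to the right, so the
# best match for the current block lies one column to the left in the reference.
ref = [[0, 0, 0, 0],
       [0, 9, 9, 0],
       [0, 9, 9, 0],
       [0, 0, 0, 0]]
cur = [[0, 0, 0, 0],
       [0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 0, 0]]
print(full_search(cur, ref, 1, 1, 2, 1))  # (0, -1)
```

Full search examines every candidate in the window; the paper's point is that truncating pixel bits and pruning candidates reduces exactly this computation and its memory traffic.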
[1] Click on Xilinx, click on the FILE option, create a new project (write the program), save
it, click on Synthesize-XST, and check the syntax.
[2] In behavioral simulation, open the test bench program, save it, launch the Xilinx ISE
simulator, check the syntax, and simulate the behavioral model (then we can get the timing
diagram).
[3] After simulating the behavioral model, we can get the timing diagram as shown below.
[6] After connecting the hardware to the CPU, we first have to initialize the chain.
[8] Now run the programming chain; then the program will be dumped into the FPGA
Spartan-3E kit.
The output will be displayed as an indication on the LEDs.
10. CONCLUSION
This paper has presented a method to reduce the computational cost and memory
access for VBSME using pixel truncation. VBSME is a new coding technique and provides
more accurate predictions compared to traditional fixed block-size motion estimation
(FBSME). With FBSME, if an MB consists of two objects with different motion directions,
the coding performance of this MB is worse. For the same condition, with VBSME the MB
can instead be divided into smaller blocks in order to fit the different motion directions.
Hence the coding performance is improved.
However, for motion prediction using smaller block sizes, pixel truncation reduces the
motion prediction accuracy. We have therefore proposed a two-step search to improve the
frame prediction using pixel truncation. Our method reduces the total computation and
memory accesses compared to the conventional method without significantly degrading the
picture quality. The results theoretically show that the proposed architectures are able to save
up to 53% energy compared to the conventional full-search ME architecture, which is
equivalent to a 40% energy saving over the conventional H.264 system. This makes such
architectures attractive for H.264 applications in future mobile devices.
11. REFERENCES
[1] & [2] C-Y. Chen, S-Y. Chien, Y-W. Huang, T-C. Chen, T-C. Wang, and L-G. Chen, "Analysis
and architecture design of variable block size motion estimation for H.264/AVC," IEEE Trans.
[3] Z. Bahari, T. Arslan, and A. T. Erdogan, "Low computation and memory access for
variable block size motion estimation using pixel truncation," in Proc. IEEE Workshop Signal
Process. Syst., 2007, pp. 681-685.
[4] Z-L. He, C-Y. Tsui, K-K. Chan, and M. L. Liou, "Low-power VLSI design for motion
estimation using adaptive pixel truncation," IEEE Trans., vol. 10, no. 5, pp. 669-678, Aug.
2000.
[5] Z. Bahari, T. Arslan, and A. T. Erdogan, "Low power hardware architecture for VBSME
using pixel truncation," in Proc. IEEE Workshop Signal Process. Syst., 2007, pp. 681-685.
[6] B. Natarajan, V. Bhaskaran, and K. Konstantinides, "Low-complexity block-based motion
estimation via one-bit transform," IEEE Trans., vol. 7, no. 4, pp. 702-706.
Advanced video coding for generic audiovisual services, ITU-T Recommendation H.264 &
ISO/IEC 14496-10 (MPEG-4) AVC, 2005.
[7] A. Ertürk and S. Ertürk, "Two-bit transform for binary block motion estimation," IEEE
Trans. Circuits Syst. Video Technol., vol. 15, no. 7, pp. 938-946, Jul. 2005.
[8] S. Lee, J. M. Kim, and S-I. Chae, "New motion estimation algorithm using adaptively
quantized low bit-resolution image and its VLSI architecture for MPEG2 video encoding,"
IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 6, pp. 734-744, Oct. 1998.
[9] Y. Chan and S. Kung, "Multi-level pixel difference classification methods," in Proc.
[10] M-J. Chen, L-G. Chen, T-D. Chiueh, and Y-P. Lee, "A new block matching criterion for
motion estimation and its implementation," IEEE Trans. Circuits Syst. Video Technol., vol. 5,
no. 3, pp. 231-236.
[12] S. Sharma, P. Mishra, S. Sawant, C. P. Mammen, and V. M. Gadre, "Pre-decision strategy
for coded/non-coded MBs in MPEG4," in Proc. Int. Conf. Signal Process. Commun.
(SPCOM), Bangalore, India, 2004, pp. 501-505.
[13] T-C. Chen, Y-H. Chen, S-F. Tsai, S-Y. Chien, and L-G. Chen, "Fast algorithm and
architecture design of low-power integer motion estimation for H.264/AVC," IEEE Trans.
Circuits Syst. Video Technol., vol. 17, no. 5, pp. 568-577, May 2007.
[14] M. Miyama, J. Miyakoshi, Y. Kuroda, K. Imamura, H. Hashimoto, and M. Yoshimoto, "A
sub-mW MPEG4 motion estimation processor core for mobile video application," IEEE
Trans. Circuits Syst., vol. 39, no. 9, pp. 1562-1570, Sep. 2004.
[15]
[16] ISO/IEC 14496-2, 1999.
[17] C. Lee and J. Ribas-Corbera, "Windows Media Video 9: overview and applications,"
Signal Process.: Image Commun., vol. 19, pp. 851-875, Sep. 2004.
[18] Draft ITU-T Recommendation and final draft international standard of joint video
specification, May 2003.
[19] K. M. Yang, M. T. Sun, and L. Wu, "A family of VLSI designs for the motion
compensation block-matching algorithm," IEEE Trans. Circuits Syst., vol. 36, no. 10, pp.
1317-1325, Oct. 1989.
[20] T. Komarek and P. Pirsch, "Array architectures for block matching algorithms," IEEE
Trans. Circuits Syst., vol. 36, no. 10, pp. 1301-1308, Oct. 1989.
[21] Y. Nakabo and M. Ishikawa, "High speed target tracking using 1 ms visual feedback
system"; I. Ishii, Y. Nakabo, and M. Ishikawa, "Target tracking algorithm for 1 ms visual
feedback system using massively parallel processing," in Proc. Int. Conf. Robotics and
Automation, pp. 2309-2314.
[23] "... massively parallel processing," in Proc. Int. Conf. on Intelligent Robotics and
Systems, pp. 373-377 (1992).