1. Title of paper
Efficient VLSI Architecture for Lossless Data Compression
2. Corresponding Author
Yongjoo Kim
Phone: +82-2-880-5457
Fax: +82-2-887-6575
E-mail: yjkim@poppy.snu.ac.kr
Abstract: An architecture for LZ1-type lossless data compression is described. The architecture is area-efficient and fast because it exploits the locality of substring match lengths, a property demonstrated experimentally for various data and buffer lengths.
Introduction: Data compression is the reduction of redundancy in data representation in order to reduce the size of data to be transmitted or stored. LZ-type (LZ1 and LZ2) text substitutional methods are popular for lossless data compression because of their speed, simplicity, and adaptability. They compress data by replacing large blocks of text with shorter references to earlier occurrences of identical text.
Recently, several hardware architectures to accelerate LZ-type lossless data compression have been proposed [1, 2]. In this Letter, we present an area-efficient and fast non-systolic VLSI architecture for LZ1-type lossless data compression. Although the architecture is non-systolic, its performance is comparable to or even better than that of existing systolic architectures. This performance gain comes primarily from the exploitation of a property which we call the locality of substring match lengths, which we have verified experimentally for various data and buffer lengths.
LZ1-type data compression: LZ1-type data compression [3] requires a buffer of length 2n in which the first half contains the n symbols (x0, x1, ..., xn-1) that have been most recently coded and the second half contains the following n symbols (y0, y1, ..., yn-1) that are yet to be coded. We can regard yi as equal to xn+i for 0 ≤ i ≤ n-2. Parsing, the core step of LZ1-type data compression, is performed as follows: for each string comparison between y0y1...yn-2 and xpxp+1...xp+n-2, the substring match length lp is obtained. From the n resulting lp's (0 ≤ p ≤ n-1), the maximum match length lmax and its corresponding index (pointer) are selected.
Locality of substring match lengths: This property, the basis of our architecture, derives from our observation that most lp's are much shorter than the maximum possible match length (n-1) during a session of data compression. As a quantitative measure of the degree of locality of the lp's, we define the degree of locality, DL(x,n), as

    DL(x,n) = [ Σ_{lp=0}^{x} f(lp) ] / [ Σ_{lp=0}^{n-1} f(lp) ]        (1)
where f(lp) is the count of each lp and 0 ≤ x ≤ n-1. DL(x,n) is therefore the ratio of the count of local matches whose length is shorter than x+1 to the count of all matches. Fig. 1 shows a simulation result demonstrating the property; each point represents an average over various real data files (12 text files, 11 binary files). From this experimental result we draw the following conclusions about the locality: (1) the larger n and x are, the higher DL(x,n) is; (2) DL(x,n) saturates for large n.
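Eqn. 1 amounts to a cumulative fraction of a match-length histogram, which a short script makes concrete. The histogram values below are hypothetical, chosen only to mimic the skew toward short matches:

```python
def degree_of_locality(f, x):
    """DL(x, n) from eqn. 1: f[l] is the count of matches of length l
    (len(f) == n), so DL is the fraction of matches with lp <= x."""
    return sum(f[: x + 1]) / sum(f)

# hypothetical histogram for n = 8: most matches are very short
f = [40, 25, 15, 10, 5, 3, 1, 1]
print(degree_of_locality(f, 3))  # 0.9 -> 90% of matches have lp <= 3
```

With a distribution this skewed, even a small x captures almost all matches, which is what motivates the split encoder design that follows.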
Architecture for LZ1-type data compression: Fig. 2 shows the overall architecture for the parsing step. The match length encoder converts the output bit pattern (z0z1...zn-2) of the string comparator into the substring match length lp. A pulse generated after each match length encoding triggers the encoding buffer to shift data by one position, updates the length register and index register to hold the latest maximum match length and its corresponding index (or pointer), and increments the index counter. The signal 'done' is raised when all n string comparisons have been completed.
For the design of an area-efficient and fast match length encoder, the heart of our architecture, we exploit the locality of substring match lengths described above. The main idea is as follows: (1) for lp ≤ x, which holds in most cases, lp is computed by a small combinational circuit (small length encoder) in a single clock cycle; (2) for lp > x, which occurs only rarely owing to the locality of lp, lp is computed by a small sequential circuit over several clock cycles.
Fig. 3 depicts the internal structure of the match length encoder. For lp > x, a sequential circuit, which consists of a controller FSM, a shift register and a counter, is enabled, and the additional ones are counted from the x-th bit of the bit pattern (z0z1...zn-2) until the first zero is encountered. This process takes (lp - x + 1) cycles, but the fraction of the total parsing clock cycles spent in it is very small owing to the locality of substring match lengths, so it causes
only negligible overhead in the overall performance of data compression. If we did not use this scheme and instead implemented the whole block as combinational logic, that logic would be much larger. Our architecture thus achieves higher performance than the existing systolic architecture [1] in the same area, because we do not use any area-consuming systolic scheme but balance combinational logic for small match lengths with sequential logic for large match
lengths. As for compression speed, the systolic architecture [1] and our architecture take 3n cycles and n + ∆n cycles on average (∆n << n), respectively, to obtain each maximum match length and its corresponding index. Thus the speedup of ours over the systolic architecture is 3n/(n + ∆n) ≈ 3. In experiments using the same data files and parameters (x and n) as for the degree of locality, the minimum and maximum speedups were 2.86 and 2.96, respectively. Table 1 summarises the detailed comparison between our architecture and the existing architectures (systolic [1] and CAM-based [2]) for LZ1-type data compression. In the area comparison between our architecture and the systolic one, we omitted the area for the data and encoding buffers because they are common to both architectures.
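The speedup formula can be checked numerically. The ∆n value below is hypothetical, chosen to be consistent with the measured 2.86-2.96 range rather than taken from the experiments:

```python
def speedup(n, delta_n):
    """Speedup of the proposed architecture (n + delta_n cycles per parse)
    over the systolic architecture (3n cycles per parse)."""
    return 3 * n / (n + delta_n)

# a measured speedup of 2.86 corresponds to delta_n of roughly 0.05 * n
print(round(speedup(1000, 49), 2))  # 2.86
```

As delta_n shrinks toward zero (stronger locality), the speedup approaches the ideal factor of 3.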
Conclusions: In this Letter, we have designed an architecture for LZ1-type data compression which takes much less area and many fewer clock cycles than existing architectures. This gain is the result of exploiting the locality of substring match lengths.
References
[1] Ranganathan, N. and Henriques, S.: ‘High-speed VLSI designs for Lempel-Ziv-based data
compression’, IEEE Transactions on Circuits and Systems-II, 1993, 40, (2), pp. 96-106
[2] Nusinov, E. and Pasco-Anderson, J.: ‘High performance multi-channel data compression chip’,
Proceedings of IEEE 1994 Custom Integrated Circuits Conference, 1994, pp. 203-206
[3] Ziv, J. and Lempel, A.: 'A universal algorithm for sequential data compression', IEEE Transactions on Information Theory, 1977, 23, (3), pp. 337-343
Yong-Joo Kim
Kyu-Seok Kim
Ki-Young Choi
Department of Electronics Engineering
Seoul National University, Seoul, South Korea 151-742
List of captions to figures and tables
Fig. 1 Average DL(x,n) and speedup for real input data files
Fig. 2 Overall architecture for the parsing step
Fig. 3 Internal structure of the match length encoder
Table 1 Comparison between the proposed architecture and existing architectures (systolic [1] and CAM-based [2])

[Figure artwork not reproduced. Fig. 1 plots average DL(x,n) (99.5-99.8%) against buffer length n (0-5000) for x = 3. Figs. 2 and 3 are block diagrams comprising the data buffer, string comparator, match length encoder, encoding buffer, index counter, length and index registers, and the controller FSM, counter and shift register operating on bits zx ... zn-2.]