10 1 1 50

PAPER SUBMISSION TO IEE ELECTRONICS LETTERS
1. Title of paper
Efficient VLSI Architecture for Lossless Data Compression
2. Corresponding Author
Yongjoo Kim
Inter-University Semiconductor Research Center (ISRC)

Seoul National University
Seoul, South Korea 151-742
Phone: +82-2-880-5457
Fax: +82-2-887-6575
E-mail: yjkim@poppy.snu.ac.kr
Efficient VLSI Architecture for Lossless Data Compression
Indexing terms: lossless data compression, VLSI architecture

____________________________________________________________________________________
Abstract: An architecture for LZ1-type lossless data compression is described. The architecture is
area-efficient and fast because it exploits the locality of substring match lengths. This property has been
confirmed experimentally for various data and buffer lengths, and an architecture based on it has been designed.
Introduction: Data compression reduces redundancy in data representation in order to reduce the size of
data to be transmitted or stored. LZ-type (LZ1 and LZ2) textual substitution methods are popular for
lossless data compression because of their speed, simplicity and adaptability. They compress data by
replacing long blocks of text with shorter references to earlier occurrences of identical text.
Recently, several hardware architectures have been proposed to accelerate LZ-type lossless data
compression. They are either systolic [1] or CAM (content addressable memory)-based [2].
In this Letter, we present an area-efficient and fast non-systolic VLSI architecture for LZ1-type lossless
data compression. Although the architecture is non-systolic, its performance is comparable to, or even
better than, that of existing systolic architectures. This performance gain is primarily due to the
exploitation of a property that we call the locality of substring match lengths, which we have confirmed
experimentally for various types of data.
LZ1-type data compression: LZ1-type data compression [3] requires a buffer of length 2n, whose first half
contains the n most recently coded symbols x_0, x_1, ..., x_{n-1} and whose second half contains the
following n symbols y_0, y_1, ..., y_{n-1}, which are yet to be coded. We can regard y_i as x_{n+i} for
0 ≤ i ≤ n-2. Parsing, the core step of LZ1-type data compression, is performed as follows. For each p
(0 ≤ p ≤ n-1), the string y_0 y_1 ... y_{n-2} is compared with x_p x_{p+1} ... x_{p+n-2}, and the substring
match length l_p, i.e. the number of leading symbols that agree, is obtained. From the n resulting values of
l_p, the maximum match length l_max and its corresponding index p_max are found and encoded.
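The parsing step can be sketched in software as a behavioural reference model (not the hardware). The sketch assumes that the 2n-symbol buffer is stored as one sequence buf with x_p = buf[p] and y_i = buf[n + i], and that ties between equal match lengths keep the smallest index p; the text does not specify tie-breaking. The function names are hypothetical.

```python
def match_length(buf, n, p):
    """Substring match length l_p: number of leading symbols of
    y_0 ... y_{n-2} that equal x_p ... x_{p+n-2}."""
    l = 0
    # y_l is buf[n + l]; x_{p+l} is buf[p + l] and may run into the y half
    while l < n - 1 and buf[p + l] == buf[n + l]:
        l += 1
    return l


def parse_step(buf, n):
    """Return (l_max, p_max) over all n candidate positions p."""
    if len(buf) != 2 * n:
        raise ValueError("buffer must hold exactly 2n symbols")
    l_max, p_max = 0, 0
    for p in range(n):
        l_p = match_length(buf, n, p)
        if l_p > l_max:          # strict '>' keeps the earliest p on ties
            l_max, p_max = l_p, p
    return l_max, p_max


print(parse_step("xabcabcd", 4))  # (3, 1)
```

Here x = "xabc" and y = "abcd"; the string "abc" starting at p = 1 gives the longest match.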
Locality of substring match lengths: This property, the basis of our architecture, derives from our
observation that, during a data compression session, most l_p values are much shorter than the maximum
possible match length n-1. As a quantitative measure of this locality, we define the degree of locality
DL(x,n) as
$$DL(x,n) = \frac{\sum_{l_p=0}^{x} f(l_p)}{\sum_{l_p=0}^{n-1} f(l_p)} \qquad (1)$$
where f(l_p) is the number of occurrences of match length l_p, and 0 ≤ x ≤ n-1. DL(x,n) is therefore the
ratio of the number of local matches, whose length is at most x, to the total number of matches. Fig. 1
shows simulation results that illustrate this property; each point is an average over a set of real data files
(12 text files and 11 binary files). From these results we draw two conclusions about the locality: (1) the
larger n and x are, the higher DL(x,n) is; and (2) DL(x,n) saturates for large n.
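Equation (1) is a normalised cumulative histogram of match lengths. A minimal sketch follows. It assumes the l_p values from all parsing steps of a session have been collected into one list, and the function name degree_of_locality is hypothetical.

```python
from collections import Counter


def degree_of_locality(match_lengths, x, n):
    """DL(x, n) from equation (1): fraction of l_p values with l_p <= x."""
    if not 0 <= x <= n - 1:
        raise ValueError("x must satisfy 0 <= x <= n-1")
    f = Counter(match_lengths)                 # f(l_p): count of each l_p
    total = sum(f[l] for l in range(n))        # denominator: l_p = 0 .. n-1
    local = sum(f[l] for l in range(x + 1))    # numerator:   l_p = 0 .. x
    return local / total


lengths = [0, 0, 1, 3, 0, 2, 0, 0, 0, 5]
print(degree_of_locality(lengths, x=1, n=8))   # 0.7
```

Multiplying the result by 100 gives the percentage plotted in Fig. 1.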
Architecture for LZ1-type data compression: Fig. 2 shows the overall architecture for the parsing step. The
match length encoder converts the output bit pattern z_0 z_1 ... z_{n-2} of the string comparator into the
substring match length l_p. A pulse generated after each match length encoding shifts the encoding buffer
by one position, updates the length and index registers so that they hold the running maximum match
length and its index (pointer), and increments the index counter. The signal 'done' is raised when the
current execution of the parsing step is finished.
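The behaviour of the Fig. 2 datapath can be sketched pulse by pulse as follows. This is a hypothetical software model, not the authors' circuit. It makes several assumptions, based on Fig. 2 and Table 1:

* The data buffer supplies y_0 ... y_{n-2} unchanged, while the encoding buffer shifts by one symbol per pulse.
* Each of the n-1 equality comparators sets z_i = 1 when its two symbols are equal.
* The length register is updated only when the magnitude comparator (>) reports a strictly longer match.

```python
def leading_ones(z):
    """Match length encoder (behavioural): count 1s until the first 0."""
    count = 0
    for bit in z:
        if bit == 0:
            break
        count += 1
    return count


def parsing_step_model(x, y):
    """Model one parsing step of Fig. 2; returns (maxlength, pointer)."""
    n = len(x)
    target = list(y[:n - 1])            # y_0 ... y_{n-2}, held fixed
    enc = list(x) + target              # encoding buffer x_0 ... y_{n-2}
    length_reg, index_reg = 0, 0
    for index_counter in range(n):      # one pulse per candidate index p
        # string comparator: n-1 equality comparators -> bit pattern z
        z = [int(a == b) for a, b in zip(enc[:n - 1], target)]
        l_p = leading_ones(z)
        if l_p > length_reg:            # magnitude comparator (>)
            length_reg, index_reg = l_p, index_counter
        enc = enc[1:]                   # shift encoding buffer by one position
    return length_reg, index_reg        # 'done' would be raised here


print(parsing_step_model("xabc", "abcd"))  # (3, 1)
```

After p shifts, the first n-1 symbols of the modelled encoding buffer are x_p ... x_{p+n-2}. Each pulse therefore performs exactly one of the n comparisons of the parsing step.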
The match length encoder is the heart of our architecture, and we exploit the locality of substring match
lengths to make it area-efficient and fast. The main idea is as follows: (1) for l_p ≤ x, which holds in most
cases, l_p is computed in a single clock cycle by a small combinational circuit (the small length encoder);
(2) for l_p > x, which occurs only rarely owing to the locality of l_p, l_p is obtained by a simple sequential
circuit in (l_p - x + 1) clock cycles.
Fig. 3 shows the internal structure of the match length encoder. For l_p > x, a sequential circuit consisting
of a controller FSM, a shift register and a counter is enabled. It counts the remaining ones in the bit pattern
z_0 z_1 ... z_{n-2}, starting from bit z_x, until the first zero is encountered. This process takes (l_p - x + 1)
cycles. However, because of the locality of substring match lengths, these cycles make up only a very small
fraction of the total clock cycles of the parsing step, so the overhead on overall compression performance
is negligible. If the whole block were instead implemented in combinational logic, that logic would be
much larger and would require a much longer clock cycle time.
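The split between the small combinational encoder and the sequential counter can be sketched behaviourally as follows. This is an approximation under stated assumptions, not the circuit of Fig. 3:

* The small length encoder is modelled as a one-cycle count of leading ones over z_0 ... z_{x-1}.
* When all x of those bits are 1 and z_x is also 1, the sequential path counts the ones from z_x onward.
* The cycle count follows the figures given in the text: 1 cycle for l_p ≤ x and (l_p - x + 1) cycles for l_p > x. It is not derived from the FSM itself.

The function name encode_match_length is hypothetical.

```python
def encode_match_length(z, x):
    """Return (l_p, clock_cycles) for bit pattern z = [z_0, ..., z_{n-2}].

    Assumed cost model from the text: 1 cycle when l_p <= x (small
    combinational encoder), l_p - x + 1 cycles when l_p > x (sequential path).
    """
    # Small length encoder: leading ones within the first x bits (one cycle).
    small = 0
    while small < min(x, len(z)) and z[small] == 1:
        small += 1
    if small < x or x >= len(z) or z[x] == 0:
        return small, 1
    # Sequential path: the shift register presents z_x, z_{x+1}, ...
    # and the counter adds one per 1 until the first 0 (or the last bit).
    count = 0
    for bit in z[x:]:
        if bit == 0:
            break
        count += 1
    l_p = x + count
    return l_p, l_p - x + 1


print(encode_match_length([1, 1, 0, 1, 1, 1, 1], 3))  # (2, 1)
print(encode_match_length([1, 1, 1, 1, 1, 0, 1], 3))  # (5, 3)
```

Only the first x bits go through the combinational encoder, so its size depends on x rather than on n. This is consistent with the O(1) amount of combinational logic listed for our architecture in Table 1.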

Comparison to existing architectures: For the same area, our architecture can support a larger buffer than
the existing systolic architecture [1]. It avoids the area-consuming systolic structure and instead balances
combinational logic for short match lengths against sequential logic for long ones. As for compression
speed, the systolic architecture [1] and our architecture take 3n cycles and n + Δn cycles on average
(Δn << n), respectively, to obtain each maximum length and its corresponding index. Thus the speedup of
our architecture over the systolic one is 3n/(n + Δn) ≈ 3. In experiments using the same data files and
parameters (x and n) as for the degree of locality, the minimum and maximum speedups were 2.86 and
2.96, respectively. Table 1 summarises the detailed comparison between our architecture and the existing
architectures (systolic [1] and CAM-based [2]) for LZ1-type data compression. In the area comparison
between our architecture and the systolic one, we omit the area of the data and encoding buffers because
they occupy the same area in both architectures.
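The cycle counts above can be turned into a rough speedup estimate for one parsing step. A minimal sketch follows. It assumes each of the n candidate positions costs one cycle, plus (l_p - x) extra cycles when l_p > x, so that Δn is the sum of those extra cycles. Any other control overhead is ignored. The function name is hypothetical.

```python
def speedup_over_systolic(match_lengths, x):
    """Estimate 3n / (n + Δn) for one parsing step with n = len(match_lengths)."""
    n = len(match_lengths)
    # Δn: extra cycles spent on the sequential path for long matches
    delta_n = sum(l - x for l in match_lengths if l > x)
    ours = n + delta_n          # cycles per maximum length in this sketch
    systolic = 3 * n            # cycles per maximum length for the systolic design [1]
    return systolic / ours


print(speedup_over_systolic([0, 1, 5, 2, 0, 0, 0, 0], x=3))  # 2.4
```

With Δn = 0 the estimate reaches its upper bound of 3. Solving 3n/(n + Δn) = s for Δn shows that the measured speedups of 2.96 and 2.86 correspond to Δn of roughly 0.014n and 0.049n, respectively.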
Conclusions: In this Letter, we have presented an architecture for LZ1-type data compression that requires
much less area and fewer clock cycles than existing architectures. This gain results from exploiting the
locality in the match lengths of string comparisons.
References
[1] Ranganathan, N. and Henriques, S.: ‘High-speed VLSI designs for Lempel-Ziv-based data
compression’, IEEE Transactions on Circuits and Systems-II, 1993, 40, (2), pp. 96-106
[2] Nusinov, E. and Pasco-Anderson, J.: ‘High performance multi-channel data compression chip’,
Proceedings of IEEE 1994 Custom Integrated Circuits Conference, 1994, pp. 203-206
[3] Ziv, J. and Lempel, A.: ‘A universal algorithm for sequential data compression’, IEEE Transactions on
Information Theory, 1977, IT-23, (3), pp. 337-343
Yong-Joo Kim
Kyu-Seok Kim
Ki-Young Choi
Department of Electronics Engineering
Seoul National University, Seoul, South Korea 151-742
List of captions to figures and tables
Fig. 1 Average DL(x,n) and speedup for real input data files
Fig. 2 Overall architecture (n = 4)
Fig. 3 Match length encoder
Table 1 Area and speed comparison with existing architectures

[Figure: average degree of locality DL(x,n), % (vertical axis, 99.5 to 100) versus buffer length n, bytes (horizontal axis, 0 to 5000), with curves for x = 3, 7 and 15]
Fig. 1 Average DL(x,n) for real input data files
[Figure: data buffer and encoding buffer, each holding x0 x1 x2 x3 y0 y1 y2; string comparator of '=' cells producing z0, z1, z2; match length encoder; index counter; length & index registers with mux; signals initialize, clock, load, start pulse, maxlength, pointer and done]
Fig. 2 Overall architecture (n = 4)

[Figure: bits z_0 ... z_{x-1} feed the small length encoder (outputs w_0 ... w_{m-1}); bits z_x, z_{x+1}, ..., z_{n-2} are loaded into a shift register; a controller FSM (start pulse, large_enable) drives the shift register and counter; a mux (select) chooses the length output]
Fig. 3 Match length encoder
Table 1 Area and speed comparison with existing architectures
                                      ours               systolic [1]   CAM-based [2]
no. of comparators:
  character (=)                       n-1                n
  magnitude (>)                       1                  n
no. of counters                       2                  n
no. of multiplexers                   3                  2n
no. of length/index/shift registers   1/1/1              n/n/0
amount of comb. logic                 O(1)               O(n)
no. of clock cycles for               n + Δn (Δn << n)   3n             2n + 1
  each maximum length
