
PAPER SUBMISSION TO IEE ELECTRONICS LETTERS

1. Title of paper

Efficient VLSI Architecture for Lossless Data Compression

2. Corresponding Author

Yongjoo Kim

Inter-University Semiconductor Research Center (ISRC)


Seoul National University
Seoul, South Korea 151-742

Phone: +82-2-880-5457
Fax: +82-2-887-6575
E-mail: yjkim@poppy.snu.ac.kr
Efficient VLSI Architecture for Lossless Data Compression

Indexing terms: lossless data compression, VLSI architecture


____________________________________________________________________________________

Abstract: An architecture for LZ1-type lossless data compression is described. The architecture is area-

efficient and fast since it exploits the locality of substring match lengths. The property has been shown

experimentally for various data and buffer lengths, and an architecture based on it has been designed.

Introduction: Data compression is the reduction of redundancy in data representation in order to reduce

the size of data to be transmitted or stored. LZ-type (LZ1 and LZ2) text substitution methods are popular

for lossless data compression because of their speed, simplicity, and adaptability. They compress data by

replacing large blocks of text with shorter references to earlier occurrences of identical text.

Recently, several hardware architectures to accelerate LZ-type lossless data compression have been

proposed. They are systolic[1] or CAM(content addressable memory)-based[2].

In this Letter, we have designed an area-efficient and fast non-systolic VLSI architecture for LZ1-

type lossless data compression. Although the architecture is non-systolic, its performance is comparable to

or even better than existing systolic architectures. Such performance gain is primarily due to the

exploitation of a property which we call locality of substring match lengths. It has been experimentally

confirmed for various types of data.

LZ1-type data compression: The LZ1-type data compression[3] requires a buffer of length 2n where the

first half contains n symbols(x0, x1, ..., xn-1) that have been recently coded and the second half contains the

following n symbols(y0, y1, ..., yn-1) that are yet to be coded. We can consider yi to be equal to xn+i for 0 ≤ i

≤ n-2. Parsing, which is the core step of LZ1-type data compression, is performed as follows:

For each string comparison between y0y1 ... yn-2 and xpxp+1 ... xp+n-2, the substring match length, lp, is

obtained. From the n resulting lp’s (0 ≤ p ≤ n-1), the maximum match length, lmax, and its corresponding index,

pmax, are found and encoded.
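
The parsing step described above can be sketched in software as follows. This is a minimal illustration, not the hardware design: the function names `match_length` and `parse_step` are hypothetical, the buffer is passed as a single sequence of length 2n, and keeping the earliest index on ties is an assumption that the text does not specify.

```python
def match_length(buf, p, n):
    """Length l_p of the common prefix of y0..y(n-2) and x(p)..x(p+n-2).

    buf holds x0..x(n-1) followed by y0..y(n-1), so y(i) is buf[n + i].
    """
    length = 0
    while length < n - 1 and buf[p + length] == buf[n + length]:
        length += 1
    return length


def parse_step(buf):
    """Return (l_max, p_max) for one parsing step over a buffer of length 2n."""
    n = len(buf) // 2
    l_max, p_max = 0, 0
    for p in range(n):
        lp = match_length(buf, p, n)
        if lp > l_max:  # strict comparison: earliest p wins ties (assumption)
            l_max, p_max = lp, p
    return l_max, p_max
```

For example, with n = 4 and the buffer "abcaabca", the comparison at p = 0 matches all n-1 = 3 symbols, so `parse_step("abcaabca")` yields a maximum match length of 3 at index 0.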

Locality of substring match lengths: This property, the basis of our architecture, is derived from our

observation that most lp’s are much shorter than the maximum possible match length (n-1) during a data
compression session. As a quantitative measure of this locality, we define the degree of locality, DL(x,n),

by the following equation:

\[ DL(x, n) = \frac{\sum_{l_p = 0}^{x} f(l_p)}{\sum_{l_p = 0}^{n-1} f(l_p)} \qquad (1) \]

where f(lp) is the number of occurrences of each match length lp and 0 ≤ x ≤ n-1. DL(x,n) is, therefore,

the fraction of all matches whose length is at most x. Fig. 1 is a simulation result

showing the property. Each point represents an average over various real data files(12 text files, 11 binary

files). From this experimental result, we draw the following conclusions about the locality: (1) DL(x,n)

increases with both n and x. (2) DL(x,n) saturates for large n.
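
As an illustration of the definition in eqn. 1, the degree of locality can be computed from a list of observed match lengths. This is a sketch under stated assumptions: the function name `degree_of_locality` and the sample data are hypothetical, and the list of match lengths is assumed to be non-empty.

```python
from collections import Counter


def degree_of_locality(match_lengths, x, n):
    """DL(x, n): fraction of matches with length at most x (0 <= x <= n-1)."""
    f = Counter(match_lengths)                 # f[l] = count of match length l
    local = sum(f[l] for l in range(x + 1))    # numerator: l_p = 0 .. x
    total = sum(f[l] for l in range(n))        # denominator: l_p = 0 .. n-1
    return local / total


# Hypothetical match lengths from one session with n = 8
lengths = [0, 0, 1, 2, 0, 1, 7, 0]
print(degree_of_locality(lengths, 3, 8))  # 7 of the 8 matches have length <= 3
```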

Architecture for LZ1-type data compression: Fig. 2 represents the overall architecture for the parsing

step. The match length encoder converts the output bit pattern (z0z1...zn-2) of the string comparator into the substring

match length (lp). A pulse generated after each match length encoding triggers the encoding buffer to shift its data

by one position, updates the length register and index register to save the latest maximum match length and

the corresponding index (or pointer), and increments the index counter. The signal ‘done’ is raised when

the current execution of the parsing step is finished.
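
The data flow of this parsing step can be modelled behaviourally as below. This is a rough sketch, not the circuit itself: the names `leading_ones` and `parse_datapath` are hypothetical, the encoding buffer is assumed to hold 2n-1 symbols as drawn in Fig. 2, and timing is ignored.

```python
def leading_ones(bits):
    """Count the 1s before the first 0 (the job of the match length encoder)."""
    count = 0
    for bit in bits:
        if bit == 0:
            break
        count += 1
    return count


def parse_datapath(data, n):
    """Model one parsing step; returns (maximum match length, index)."""
    y = data[n:2 * n - 1]                # y0 .. y(n-2), held fixed
    encoding = list(data[:2 * n - 1])    # encoding buffer, shifted on each pulse
    max_len, max_idx = 0, 0              # length register and index register
    for index in range(n):               # index counter
        # String comparator: bit pattern z0 .. z(n-2)
        z = [int(encoding[i] == y[i]) for i in range(n - 1)]
        lp = leading_ones(z)
        if lp > max_len:                 # magnitude comparator (>)
            max_len, max_idx = lp, index
        encoding.pop(0)                  # shift the encoding buffer by one
    return max_len, max_idx              # 'done' would be raised at this point
```

Each loop iteration corresponds to one pulse: the comparator output is encoded, the registers are updated if a longer match was found, and the encoding buffer is shifted.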

For the design of an area-efficient and fast match length encoder, the heart of our architecture, we

exploited the locality of substring match lengths described above. The main idea is as follows: (1) For lp

≤ x, which is true in most cases, lp is computed using a small-sized combinational circuit(small length

encoder) in only one clock cycle. (2) For lp > x, which occurs in very rare cases due to the locality of lp, lp

is obtained using a simple sequential circuit in (lp - x + 1) clock cycles.

Fig. 3 depicts the internal structure of the match length encoder. For lp > x, a sequential circuit, which

consists of a controller FSM, a shift register and a counter, is enabled and additional ones are counted

from the x-th bit of the bit pattern(z0z1...zn-2) until the first zero is encountered. This process takes (lp - x +

1) cycles. However, because of the locality of substring match lengths, these cycles account for only a very

small fraction of the total number of clock cycles taken by the whole parsing step, so the overhead in the

overall performance of data compression is negligible. If the whole block were instead implemented

as combinational logic, that logic would be much

bigger and would require a much longer clock cycle time.
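
The cycle counts stated above can be captured in a small model of the match length encoder. This sketch only reproduces the stated counts (one cycle for lp ≤ x, lp - x + 1 cycles otherwise) and does not model the FSM, shift register or counter; the function names are hypothetical.

```python
def leading_ones(z):
    """Match length l_p: number of 1s before the first 0 in the bit pattern."""
    count = 0
    for bit in z:
        if bit == 0:
            break
        count += 1
    return count


def encoder_cycles(z, x):
    """Return (l_p, clock cycles) for bit pattern z and small-encoder limit x."""
    lp = leading_ones(z)
    if lp <= x:
        cycles = 1            # small combinational length encoder
    else:
        cycles = lp - x + 1   # sequential path, cycle count as stated in the text
    return lp, cycles


# Hypothetical patterns with x = 3
print(encoder_cycles([1, 1, 0, 1, 1, 1, 1], 3))  # short match, single cycle
print(encoder_cycles([1, 1, 1, 1, 1, 0, 1], 3))  # long match, sequential path
```

Because most bit patterns fall into the first branch, the average number of cycles per encoding stays close to one.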


Comparison to existing architectures: We can implement the algorithm with a larger buffer size than the

existing systolic architecture[1] in the same area because we do not use any area-consuming systolic

scheme but instead balance combinational logic for small match lengths with sequential logic for large match

lengths. As for the compression speed, the systolic architecture[1] and our architecture take 3n cycles and

n + ∆n cycles on average (∆n << n), respectively, to obtain each maximum match length and its corresponding

index. Thus the speedup of ours over the systolic architecture is 3n/(n + ∆n) ≈ 3. In an experiment using the same data

files and parameters (x and n) as for the degree of locality, the minimum and maximum speedups were 2.86

and 2.96, respectively. Table 1 summarises the detailed comparison between our architecture and the

existing architectures (systolic[1] and CAM-based[2]) for LZ1-type data compression. In the area comparison

between our architecture and the systolic architecture, we omitted the area of the data and encoding buffers because they

occupy the same area in both architectures.
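
The speedup expression 3n/(n + ∆n) can be inverted to see what relative overhead ∆n/n the reported speedups imply. The following is a small worked calculation under that cycle-count model only; the function names are hypothetical and the derived percentages are not figures reported in the Letter.

```python
def speedup(n, delta_n):
    """Speedup over the systolic architecture: 3n / (n + delta_n)."""
    return 3 * n / (n + delta_n)


def relative_overhead(s):
    """Solve s = 3n / (n + delta_n) for delta_n / n."""
    return 3 / s - 1


for s in (2.86, 2.96):
    print(f"speedup {s}: delta_n / n = {relative_overhead(s):.4f}")
```

Under this model, the minimum speedup of 2.86 corresponds to an overhead of about 4.9% of n and the maximum of 2.96 to about 1.4% of n, consistent with ∆n << n.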

Conclusions: In this Letter, we have designed an architecture for LZ1-type data compression

which requires much less area and far fewer clock cycles than existing architectures. This gain results from

exploiting the locality of the match lengths in string comparison.

References

[1] Ranganathan, N. and Henriques, S.: ‘High-speed VLSI designs for Lempel-Ziv-based data

compression’, IEEE Transactions on Circuits and Systems-II, 1993, 40, (2), pp. 96-106

[2] Nusinov, E. and Pasco-Anderson, J.: ‘High performance multi-channel data compression chip’,

Proceedings of IEEE 1994 Custom Integrated Circuits Conference, 1994, pp. 203-206

[3] Ziv, J. and Lempel, A.: ‘A universal algorithm for sequential data compression’, IEEE Transactions on

Information Theory, 1977, IT-23, (3), pp. 337-343

Yong-Joo Kim
Kyu-Seok Kim
Ki-Young Choi
Department of Electronics Engineering
Seoul National University, Seoul, South Korea 151-742
List of captions to figures and tables

Fig. 1 Average DL(x,n) and speedup for real input data files

Fig. 2 Overall architecture (n = 4)

Fig. 3 Match length encoder

Table 1 Area and speed comparison with existing architectures


[Fig. 1: plot of degree of locality DL(x,n) in % (vertical axis, 99.5 to 100) against buffer length n in bytes (horizontal axis, 0 to 5000), with one curve each for x = 3, x = 7 and x = 15]

Fig. 1 Average DL(x,n) for real input data files

[Fig. 2: block diagram showing the data buffer and the encoding buffer (each holding x0 x1 x2 x3 y0 y1 y2), the string comparator (three '=' comparators producing z0, z1, z2), the match length encoder, the index counter, the index mux, the magnitude comparator (>), and the length and index registers; signals shown: initialize, clock, data, load, start, pulse, length, maxlength, pointer, done]

Fig. 2 Overall architecture (n = 4)


[Fig. 3: block diagram showing the small length encoder (inputs z0 ... zx, outputs w0 ... wm-1), the output mux producing length, the counter, the controller FSM, and the shift register loaded with zx zx+1 ... zn-2; signals shown: large_enable, select, start, load, enable, pulse, bit, clock]

Fig. 3 Match length encoder

Table 1 Area and speed comparison with existing architectures

                                        our          systolic [1]   CAM-based [2]
no. of comparators:
  character (=)                         n-1          n
  magnitude (>)                         1            n
no. of counters                         2            n
no. of multiplexers                     3            2n
no. of length/index/shift registers     1/1/1        n/n/0
amount of comb. logic                   O(1)         O(n)
no. of clock cycles for each            n + ∆n       3n             2n + 1
  maximum length                        (∆n << n)
