
2008 IEEE Conference on Soft Computing in Industrial Applications (SMCia/08), June 25-27, 2008, Muroran, JAPAN

Image Compression Using Wavelet Transform and Vector Quantization with Variable Block Size
Osamu Yamanaka, Student Member, IEEE, Tsuyoshi Yamaguchi, Junji Maeda, Member, IEEE,
and Yukinori Suzuki, Member, IEEE

Osamu Yamanaka is with the Department of Computer Science & Systems Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Japan (phone: +81-143-46-5435; fax: +81-143-46-5430; email: osamu@athena.csse.muroran-it.ac.jp).
Tsuyoshi Yamaguchi is with the Department of Computer Science & Systems Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Japan (phone: +81-143-46-5435; fax: +81-143-46-5430; email: tsuyoshi@athena.csse.muroran-it.ac.jp).
Kazuya Sasazaki is with the Department of Computer Science & Systems Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Japan (email: sasazaki@athena.csse.muroran-it.ac.jp).
Junji Maeda is with the Department of Computer Science & Systems Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Japan (email: junji@csse.muroran-it.ac.jp).
Yukinori Suzuki is with the Department of Computer Science & Systems Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Japan (email: yuki@csse.muroran-it.ac.jp).

978-1-4244-3782-5/08/$25.00 ©2008 IEEE

Authorized licensed use limited to: KAMARAJA COLLEGE OF ENGINEERING AND TECHNOLOGY. Downloaded on July 20, 2009 at 03:33 from IEEE Xplore. Restrictions apply.

Abstract: We introduce the discrete wavelet transform (DWT) to vector quantization (VQ) for image compression. The DWT is a multi-resolution analysis, and signal energy concentrates in specific DWT coefficients. This characteristic is useful for image compression. DWT coefficients are compressed using VQ with variable block size. To perform effective compression, blocks are merged by the algorithm proposed in this paper. Results of computational experiments show that our algorithm is effective compared with the previously proposed algorithm.

I. INTRODUCTION

Low-cost personal computers, the Internet, and mobile phones are now used all over the world. People can communicate with each other regardless of physical distance. The new communication tools are also changing business approaches and our way of life. New technologies have been developed in response to demands for data transmission bandwidth and storage space. However, these demands continue to outstrip the capacity of existing technologies. To use communication channels effectively, data compression technologies are essential [1], [2].

Multimedia data (e.g., images, video, voice, and music) stream through communication channels. Since images and videos require large bandwidth and storage capacity, technologies for compressing these data are essential for using communication channels effectively. We have been studying technologies to compress images. The purpose of image compression is to reduce the amount of data and to achieve a low-bit-rate digital representation without perceptual loss of image quality. JPEG (Joint Photographic Experts Group) has been widely used for image compression. We are developing image compression technologies based on vector quantization (VQ) for the following reasons. Although VQ requires a large amount of computation to construct a code book (CB), decoding requires only a lookup from the CB once the CB has been constructed. Therefore, the computational cost of decoding is remarkably smaller than that of constructing a CB. Furthermore, the image quality obtained by VQ is higher than that of JPEG in the region of high compression ratio. These properties make VQ attractive for image compression and its applications [3].

The wavelet transform (WT) is widely used in image compression. In JPEG 2000, the discrete WT is used as a core technology to compress still images. The WT is a multi-resolution analysis, and it decomposes images into wavelet coefficients and a scaling function. In the WT, signal energy concentrates in specific wavelet coefficients. This characteristic is useful for compressing images. In this paper, we describe the decomposition of images using the discrete wavelet transform (DWT) and the encoding of the images using VQ with variable block size. To compress an image effectively, we propose a block merging algorithm. The rest of the paper is organized as follows. A brief introduction to the DWT is given in section 2. In section 3, VQ with variable block size is described. A block merging algorithm is presented in section 4. Results of computational experiments to confirm the effectiveness of the proposed method are presented in section 5. Finally, conclusions are given in section 6.

II. DISCRETE WAVELET TRANSFORM

In this section, we briefly review the discrete wavelet transform (DWT). The Fourier transform (FT) computes an inner product between a signal $f(t)$ and an integral kernel composed of sine and cosine waves. The wavelet transform, on the other hand, computes an inner product between a signal $f(t)$ and wavelets. In the FT, we detect similarity between a given signal $f(t)$ and a sine or cosine wave. These sine and cosine waves extend infinitely in time. If a signal exists only locally on the time axis, the signal $f(t)$ may not be similar to a sine or cosine wave. A wavelet exists locally on the time axis, and it can therefore detect a function $f(t)$ that exists locally on the time axis. This is the basic idea behind using a wavelet [2], [4].

The wavelet transform (WT) is defined by an inner product between a wavelet function and a signal as
$$(W_\psi f)(b, a) = \frac{1}{\sqrt{a}} \int_{\mathbb{R}} f(t)\, \psi^{*}\left(\frac{t - b}{a}\right) dt, \tag{1}$$
where $\psi_{a,b}(t)$ is a wavelet function and $a, b \in \mathbb{R}$ ($a > 0$) are parameters for scale and translation, respectively [4]. This continuous wavelet function is discretized by the binary partition $a = 2^{j}$ and $b = 2^{j} k$. Then a discrete wavelet function is obtained:
$$\psi_{j,k}(t) = 2^{-j/2}\, \psi\left(2^{-j} t - k\right). \tag{2}$$
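As a numerical illustration of (2), the following sketch samples $\psi_{j,k}(t)$ and checks that the factor $2^{-j/2}$ keeps the norm equal to one, and that two wavelets taken at different levels have a vanishing inner product. The Haar wavelet, the function names, and the integration grid are assumptions made here for illustration; the text does not fix a particular $\psi$ at this point.

```python
def haar(t):
    """Haar mother wavelet (an illustrative choice, not fixed by the paper)."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def psi(j, k, t):
    """Discrete wavelet of (2): 2^(-j/2) * psi(2^(-j) * t - k)."""
    return 2.0 ** (-j / 2.0) * haar(2.0 ** (-j) * t - k)


def inner(j1, k1, j2, k2, t_max=8.0, steps=8192):
    """Riemann-sum approximation of the inner product on [0, t_max)."""
    dt = t_max / steps
    total = sum(psi(j1, k1, n * dt) * psi(j2, k2, n * dt) for n in range(steps))
    return total * dt


# Squared norm of psi_{1,0} and inner product of psi_{1,0} with psi_{0,0}.
norm_sq = inner(1, 0, 1, 0)
cross = inner(1, 0, 0, 0)
print(norm_sq, cross)
```

The grid step is a power of two, so the breakpoints of the Haar wavelet fall exactly on grid points and the Riemann sums are accurate up to floating-point rounding.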
Some classes of $\psi_{a,b}(t)$ satisfy orthogonality with respect to the parameters $a$ and $b$. If $\psi_{a,b}(t)$ satisfies orthogonality, a signal $f(t)$ can be expanded in a wavelet series such as
$$f(t) = \sum_{j} \sum_{k} w_k^{(j)} \psi_{j,k}(t), \tag{3}$$
where $w_k^{(j)}$ is a wavelet coefficient.

Furthermore, a signal is represented by a linear combination of scaling functions. An approximation $f_0(t)$ of a signal $f(t)$ is generated using a scaling function:
$$f_0(t) = \sum_{k} s_k \varphi(t - k), \tag{4}$$
where $\varphi(t)$ is a scaling function such that
$$\varphi(t) = \begin{cases} 1 & (0 \le t < 1) \\ 0 & (\text{otherwise}). \end{cases} \tag{5}$$
In this case, multiresolution level $j = 0$ is the highest level. This continuous function is discretized by binary partition:
$$\varphi_{j,k}(t) = 2^{-j/2}\, \varphi\left(2^{-j} t - k\right). \tag{6}$$
This scaling function also satisfies orthogonality for both translation and scaling. The $j$th-level approximated function, $f_j(t)$, is represented using $\varphi_{j,k}(t)$:
$$f_j(t) = \sum_{k} s_k^{(j)} \varphi_{j,k}(t), \tag{7}$$
where $s_k^{(j)}$ is a scaling coefficient. Here, we introduce the two-scale relations:
$$\varphi_{j,k}(t) = \sum_{n} p_{n-2k}\, \varphi_{j-1,n}(t), \tag{8}$$
where the sequence $p_n$ connects the $j$th-level scaling function and the $(j-1)$th-level scaling function. For the wavelet function $\psi_{j,k}(t)$, there is the following relation:
$$\psi_{j,k}(t) = \sum_{n} q_{n-2k}\, \varphi_{j-1,n}(t), \tag{9}$$
where $q_n$ connects the $j$th-level wavelet and the $(j-1)$th-level scaling function.

We now consider computing wavelet coefficients from a discrete sequence. We represent a sampled sequence as $f(n)$. As shown in (4), a continuous signal $f(t)$ is expanded as
$$f(t) \approx f_0(t) = \sum_{k} s_k^{(0)} \varphi(t - k), \tag{10}$$
$$s_k^{(0)} = \int_{-\infty}^{+\infty} f(t)\, \varphi_{0,k}(t)^{*}\, dt. \tag{11}$$
In (10), $s_k^{(0)}$ cannot be computed; Mallat therefore proposed that the sampled sequence $f(n)$ be regarded as $s_k^{(0)}$. Once $s_k^{(0)}$ is obtained, the scaling and wavelet coefficients can be computed using the two-scale relations (8) and (9):
$$s_k^{(j)} = \sum_{n} p^{*}_{n-2k}\, s_n^{(j-1)}, \tag{12}$$
$$w_k^{(j)} = \sum_{n} q_{n-2k}\, s_n^{(j-1)}. \tag{13}$$
The discrete wavelet transform (DWT) for two-dimensional data $f(m, n)$ is computed in the same way as for $f(n)$. We first carry out the DWT along the horizontal axis and then carry out the DWT along the vertical axis on the data obtained by the horizontal transform. Fig. 1 shows a Lenna image and its DWT with level $j = 2$.

Fig. 1. Image of Lenna and its DWT with j = 2.

III. VARIABLE BLOCK SIZE FOR VECTOR QUANTIZATION

Vector quantization (VQ) consists of an encoder, a code book (CB), and a decoder, as shown in Fig. 2. A CB first has to be designed for VQ. A sufficient number of training images are prepared, and each image is partitioned into rectangular blocks. These rectangular blocks form training vectors of fixed size. The sizes of the vectors are usually 2 × 2, 4 × 4, 8 × 8, etc. The training vectors are grouped into clusters on the basis of an optimality criterion, and the cluster centers provide the code vectors (CVs) of the CB. Fig. 2 shows a conceptual diagram for encoding an image using a CB. A block of pixels is extracted from an image as the input vector. Encoding involves choosing the CV in the CB that is closest to the input vector. The index of the CV is sent to the decoder through a communication channel. The decoder chooses the CV corresponding to the received index. VQ of an image is carried out by repeating the above process. In VQ, the larger the size of the CVs in a CB is, the lower is the compression rate (in bits per pixel) of the encoded image. On the other hand, the smaller the size of the CVs in a CB is, the higher is the quality of the encoded image. There is thus a trade-off between compression rate and quality of the encoded image. Vaisey and Gersho proposed variable block size segmentation, in which a top-down quadtree (QT) was employed to divide an image into blocks of variable size. QT decomposition is based on the homogeneity of local regions of an image. They obtained high-quality image reproduction at rates between 0.25 and 0.7 bits/pixel (bpp). Overall, their
algorithm is quite complex. Since their study, a number of researchers have studied VQ with variable block size using QT decomposition [5], [6], [7], [8], [9]. However, it is natural to assume that the complexity of local regions of an image is more essential than the homogeneity of local regions in constructing variable block size segmentation.

Fig. 2. Conceptual diagram of vector quantization.

Based on this idea, we proposed VQ of an image with variable block size using local fractal dimensions (LFDs) [10]. We usually pay attention to complex regions rather than homogeneous regions, and we consider complex regions to be perceptually essential for image compression.

A fractal is a phenomenon whose fundamental characteristic is invariance under different scales. A fractal is a geometrical pattern consisting of self-similar patterns. It also possesses infinite detail and is generally self-similar and independent of scale [11]. The fractal dimension is a quantitative measure of how densely a fractal occupies a space. Novianto et al. developed a method for estimating the local fractal dimension (LFD) of images [12]. The method is an optimization of the blanket method, and its appropriateness was confirmed experimentally. The blanket method covers an image surface $g(i, j)$ with a blanket bounded by a top surface $u_\varepsilon$ and a bottom surface $b_\varepsilon$ [12]:
$$u_\varepsilon(i, j) = \max\left\{ u_{\varepsilon-1}(i, j) + 1,\ \max_{|(m,n)-(i,j)| \le 1} u_{\varepsilon-1}(m, n) \right\}, \tag{14}$$
$$b_\varepsilon(i, j) = \min\left\{ b_{\varepsilon-1}(i, j) - 1,\ \min_{|(m,n)-(i,j)| \le 1} b_{\varepsilon-1}(m, n) \right\}, \tag{15}$$
where $g(i, j) = u_0(i, j) = b_0(i, j)$. If $\varepsilon$ is the number of blankets, the area of the blanket $A(\varepsilon)$ is computed by
$$A(\varepsilon) = \frac{\sum_{i,j} \left( u_\varepsilon(i, j) - b_\varepsilon(i, j) \right)}{2\varepsilon}. \tag{16}$$
The FD value, $D$, is computed from $A(\varepsilon) = F \varepsilon^{2-D}$, where $F$ is a constant. The FD value can be estimated from the linear fit of $\log A(\varepsilon)$ against $\log \varepsilon$ with the blanket scale ranging from 1 to $\varepsilon$; the slope should be equal to $2 - D$. Fig. 3 shows the image Airplane and its LFD map.

Fig. 3. Image "airplane" and its LFD map. The LFD values are mapped from the range of 2.0 - 3.0 to the range of 0 - 255. The brightness in the LFD map is proportional to the LFD value.

The training image is divided into blocks of sizes $L_1, L_2, \ldots, L_F$ using LFDs. The division is carried out by a method of discriminant analysis [13]. We divide the histogram of LFDs into $F$ clusters:
$$2.0 \le k_1 < k_2 < \cdots < k_{F-1} < 3.0. \tag{17}$$
The number of wavelet coefficients showing LFD level $i$ is denoted by $n_i$, and the total number of wavelet coefficients is $N$:
$$p_i = n_i / N \quad \left( i \in S_j,\ p_i \ge 0,\ \sum_{i=1}^{F} p_i = 1 \right). \tag{18}$$
The histogram of the LFDs is divided into $F$ classes:
$$C_j \ \text{for}\ S_j = [k_{j-1}, \ldots, k_j] \quad (j = 1, \ldots, F), \tag{19}$$
where $k_0 = 2.0$ and $k_F = 3.0$. The probabilities of class occurrence and the class mean levels, respectively, are given as
$$\omega_j = \Pr(C_j) = \sum_{i \in S_j} p_i = \omega(k_j) - \omega(k_{j-1}), \tag{20}$$
$$\mu_j = \sum_{i \in S_j} i \Pr(i \mid C_j) = \sum_{i \in S_j} \frac{i p_i}{\omega_j} = \frac{\mu(k_j) - \mu(k_{j-1})}{\omega(k_j) - \omega(k_{j-1})}, \tag{21}$$
where $\omega(k_j) = \sum_{i=1}^{k_j} p_i$, $\mu(k_j) = \sum_{i=1}^{k_j} i p_i$, and $\omega(0) = \mu(0) = 0$. The thresholds are chosen so as to maximize the following objective function:
$$\sigma_B^2(k_1, \ldots, k_{F-1}) = \sum_{j=1}^{F} \omega_j (\mu_j - \mu_T)^2, \tag{22}$$
where $\mu_T = \sum_{i=1}^{F} i p_i$. Fig. 4 shows a Lenna image (top left), its two-level WT (top right), the LFD map of the wavelet coefficients (bottom left), and the division of the LFD image with variable block size (bottom right).

IV. IMAGE SEGMENTATION

The compression rate of a decoded image is computed by
$$\mathrm{bpp} = \frac{n \cdot c}{N}, \tag{23}$$
where $n$ is the number of indexes of an encoded image, $c$ is the number of bits needed to represent the indexes of the CVs, and $N$ is the number of pixels of the original image. $c$ is computed by
$$c = \lceil \log_2 C \rceil, \tag{24}$$
where $C$ is the number of CVs in the CB and $\lceil \cdot \rceil$ denotes rounding up to the nearest whole number. As shown in (23), the compression rate is determined by $n$, $c$, and $N$. $c$ is predetermined by the number of CVs in the CB, and $N$ is predetermined by the input image. Therefore, to improve the compression rate, we have to reduce $n$. For VQ, we divide the image into blocks, so $n$ is equal to the number of blocks. We reduce $n$ using an image segmentation technique based on the seeded region growing method proposed by Shih and Cheng [14].

Fig. 4. The four images are the Lenna image (top left), its two-level WT (top right), the LFD map of the wavelet coefficients (bottom left), and the division of the LFD map with variable block size (bottom right).

The technique proposed by Shih and Cheng [14] consists of two parts: automatic seed selection and a region growing algorithm to segment color images. For automatic seed selection, the similarity of a pixel to its neighbors is computed as follows. The standard deviation of each of the Y, Cb, and Cr color components in a 3 × 3 window is
$$\sigma_x = \sqrt{\frac{1}{9} \sum_{i=1}^{9} (x_i - \bar{x})^2}, \tag{25}$$
where $\bar{x}$ is the mean value given by $\bar{x} = \frac{1}{9} \sum_{i=1}^{9} x_i$. The total standard deviation for all color components is
$$\sigma = \sigma_Y + \sigma_{Cb} + \sigma_{Cr}. \tag{26}$$
$\sigma$ is normalized to $[0, 1]$, and the normalized value is denoted by $\sigma_N$. Then, the similarity of a pixel to its neighbors is defined as
$$H = 1 - \sigma_N. \tag{27}$$
The relative Euclidean distance of a pixel to each of its neighbors is given as
$$d_i = \frac{\sqrt{(Y - Y_i)^2 + (Cb - Cb_i)^2 + (Cr - Cr_i)^2}}{\sqrt{Y^2 + Cb^2 + Cr^2}}, \tag{28}$$
where $i = 1, 2, \ldots, 8$. The maximum distance to its neighbors for each pixel is
$$d_{\max} = \max_{i=1}^{8} (d_i). \tag{29}$$
The seed pixel candidate must satisfy the following two conditions.
(1) It is necessary to have similarity higher than a threshold value that is determined by Otsu discriminant analysis.
(2) It is necessary to have the maximum relative Euclidean distance to its neighbors.

To apply this algorithm to an image divided into blocks of variable size, we consider a block of pixels instead of each pixel. The blocks are merged by an algorithm based on the region growing algorithm. We specify neighboring blocks as follows. We select one block indexed $i$. If there are three blocks of the same size on the right side (indexed $i + 1$), the lower side (indexed $i + 2$), and the opposite side (indexed $i + 3$) of block $i$, these three blocks are the neighboring blocks of block $i$. For these neighboring blocks, we compute the standard deviation
$$\sigma_{block} = \sqrt{\frac{1}{4} \sum_{j=1}^{BS} \sum_{l=i}^{i+4} \left( x_j^l - \bar{x}_j \right)^2}, \tag{30}$$
where $BS$ is the block size and $x_j^l$ is the $j$th pixel in block $l$. $\bar{x}_j$ is the mean value of the $j$th pixel and is computed as
$$\bar{x}_j = \frac{1}{4} \sum_{l=i}^{i+4} x_j^l. \tag{31}$$
Next, the relative Euclidean distance of block $i$ to each of its neighboring blocks is computed as
$$d_{block\,l} = \frac{\sqrt{\sum_{j=1}^{BS} \left[ (Y_j^i - Y_j^l)^2 + (Cb_j^i - Cb_j^l)^2 + (Cr_j^i - Cr_j^l)^2 \right]}}{\sqrt{\sum_{j=1}^{BS} \left[ (Y_j^i)^2 + (Cb_j^i)^2 + (Cr_j^i)^2 \right]}}, \tag{32}$$
where $Y_j^i$ stands for the $j$th pixel value of the Y component of the block $i$ that has the three neighboring blocks $l = i+1, i+2, i+3$, and $Y_j^l$ stands for the $j$th pixel value of the $l$th neighboring block. The maximum distance to its neighboring blocks is computed as
$$d_{block\,\max} = \max_{l} (d_{block\,l}). \tag{33}$$
We specify the block as a seed block when it satisfies conditions (1) and (2) stated above. To check conditions (1) and (2), the similarity and the maximum distance are also employed. Since there is high similarity between the seed block and its neighboring blocks, we merge these four blocks into a new block. This merging process is applied to all blocks comprising the image. For VQ, the merged blocks are encoded by a CB.

TABLE I
COMPARISON OF VQ WITH VARIABLE BLOCK SIZE PROPOSED PREVIOUSLY AND VQ WITH VARIABLE BLOCK SIZE USING A MERGING ALGORITHM. DATA ARE FOR THE Y COLOR COMPONENT.

                       Number of blocks by block size
Image      Method      2×2      4×4      8×8      PSNR       bpp
Lenna      previous    5188     1619     295      29.161     2.601318
           proposed    1704     2002     353      29.0884    2.229858
Airplane   previous    5260     1433     337      25.4878    2.592529
           proposed    1168     2077     368      25.1925    2.175415
Balloon    previous    4372     1819     232      33.3404    2.518433
           proposed    1224     2246     322      33.4197    2.197266
Sailboat   previous    6004     1719     219      26.9581    2.703857
           proposed    1248     2404     281      26.8869    2.214478
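
The entries of Table I can be compared programmatically. The sketch below is an illustration only: the dictionary layout and the helper names `relative_change` and `summarize` are assumptions introduced here, while the numbers are copied from the table.

```python
# Values copied from Table I: blocks of 2x2, 4x4, 8x8, then PSNR and bpp.
TABLE_I = {
    "Lenna": {"previous": (5188, 1619, 295, 29.161, 2.601318),
              "proposed": (1704, 2002, 353, 29.0884, 2.229858)},
    "Airplane": {"previous": (5260, 1433, 337, 25.4878, 2.592529),
                 "proposed": (1168, 2077, 368, 25.1925, 2.175415)},
    "Balloon": {"previous": (4372, 1819, 232, 33.3404, 2.518433),
                "proposed": (1224, 2246, 322, 33.4197, 2.197266)},
    "Sailboat": {"previous": (6004, 1719, 219, 26.9581, 2.703857),
                 "proposed": (1248, 2404, 281, 26.8869, 2.214478)},
}


def relative_change(old, new):
    """Signed change of `new` relative to `old`, in percent."""
    return 100.0 * (new - old) / old


def summarize(image):
    """Return total block counts and relative changes for one image."""
    prev = TABLE_I[image]["previous"]
    prop = TABLE_I[image]["proposed"]
    return {
        "blocks_previous": sum(prev[:3]),
        "blocks_proposed": sum(prop[:3]),
        "bpp_change_percent": relative_change(prev[4], prop[4]),
        "psnr_difference": prop[3] - prev[3],
    }


for name in TABLE_I:
    s = summarize(name)
    print(f"{name}: blocks {s['blocks_previous']} -> {s['blocks_proposed']}, "
          f"bpp {s['bpp_change_percent']:+.1f}%, "
          f"PSNR {s['psnr_difference']:+.3f}")
```

A negative bpp change together with a PSNR difference close to zero corresponds to a lower bit rate at nearly unchanged image quality.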

V. COMPUTATIONAL EXPERIMENTS

We carried out computational experiments to confirm the effectiveness of the proposed algorithm for VQ. The training image used to construct a CB is shown in Fig. 5. We constructed the CB according to the procedures described in section 3. The second-level DWT is computed as shown in Fig. 1. Then we computed the LFDs of the image consisting of the DWT coefficients except for the upper left image (lower frequency band). Based on the LFDs, the image was divided into blocks of three sizes: 2 × 2, 4 × 4, and 8 × 8. These blocks of pixels constitute the input vectors for VQ. We encoded four test images (Lenna, Airplane, Balloon, Sailboat) by VQ with variable block size. To use the merging algorithm, we divided the image of DWT coefficients into blocks of 2 × 2 and 4 × 4 in size. The division was carried out by the same procedure as for VQ with variable block size. We applied the merging algorithm described in section 4 to these images divided into two different block sizes. After applying the merging algorithm, there were blocks of three different sizes in the image: 2 × 2, 4 × 4, and 8 × 8. Then we encoded the image using VQ with variable block size. The upper left image was quantized by scalar quantization with 64 levels. A comparison of the two methods, VQ with variable block size proposed previously and VQ with variable block size using the merging algorithm, is shown in Table I.

Fig. 5. Training image to construct a CB.

As shown in Table I, for the image "Lenna", the number of 2 × 2 blocks obtained using the previous method is 5188, while that obtained using the proposed method is 1704, which is 32.8% of the previous number. However, the number of 4 × 4 blocks obtained by the proposed method increases by 23.6% compared with the number obtained by the previous method. The number of 8 × 8 blocks obtained by the proposed method increases by 19.7% compared with the number obtained by the previous method. As a result, the bpp of the proposed method decreases from 2.601318 to 2.229858, a reduction of about 14% compared with the previous method. In PSNR, the two decoded images show almost the same values. There is no substantial difference in image quality. For the other images, almost the same results were obtained. This can be confirmed from both Fig. 6 and Fig. 7. The experimental results show that our proposed method using a merging algorithm is effective for VQ with variable block size.

VI. CONCLUSIONS

We have proposed a block merging algorithm for VQ with variable block size. The algorithm is based on the method proposed by Shih and Cheng [14]. As shown in (23), the compression rate with VQ is determined by the number of indexes of an encoded image. Therefore, if this number can be reduced, the compression rate can be improved. This is the basic motivation for introducing a merging algorithm. Results of computational experiments show that the proposed algorithm is effective for VQ with variable block size.

REFERENCES

[1] K. Sayood, Introduction to Data Compression, Morgan Kaufmann Publishers: Boston, 2000.
[2] R. C. Gonzalez and R. E. Woods, Digital Image Processing (Third Edition), Pearson Prentice Hall: New Jersey, 2008.
[3] M. Fujibayashi, T. Nozawa, T. Nakayama, K. Mochizuki, M. Konda, K. Kotani, S. Sugawara, and T. Ohmi, A still-image encoder based on adaptive resolution vector quantization featuring needless calculation elimination architecture, IEEE Journal of Solid-State Circuits, vol. 38, no. 5, pp. 726-733, 2003.
[4] K. Nakano, K. Yamamoto, Y. Yoshida, Signal and Image Processing by Wavelet (in Japanese), Kyoritsu Shuppan: Tokyo, 1999.
[5] M. Lightstone, K. Rose, and S. K. Mitra, Locally optimal codebook design for quadtree-based vector quantization, Proc. IEEE ICASSP '95, pp. 2479-2482, Detroit, MI, 1995.
[6] G. J. Sullivan, R. L. Baker, Efficient quadtree coding of images and video, IEEE Trans. Image Processing, vol. 3, no. 3, pp. 327-331, 1994.
[7] C. Y. Wang, S. J. Liao, and L. W. Chang, Wavelet image coding using variable blocksize vector quantization with optimal quadtree segmentation, Signal Processing: Image Communication, vol. 15, pp. 879-890, 2000.
[8] C-C Chang, J-C Chuang, and C-Y Chung, Quadtree-segmented image compression method using vector quantization and cubic B-spline interpolation, The Imaging Science Journal, vol. 52, pp. 106-116, 2004.
[9] Y-C Hu, C-C Chang, Quadtree-segmented image coding schemes using vector quantization and block truncation coding, Optical Engineering, 39(2), pp. 464-471, 2000.
[10] K. Sasazaki, S. Saga, J. Maeda, and Y. Suzuki, Vector quantization of images with variable block size, Applied Soft Computing, vol. 8, pp. 634-645, 2008.
[11] M. F. Barnsley, Fractals Everywhere, Academic Press Professional: Boston, 1993.
[12] S. Novianto, Y. Suzuki, and J. Maeda, Near optimal estimation of local fractal dimension for image segmentation, Pattern Recognition Letters, vol. 24, pp. 365-374, 2003.
[13] N. Otsu, T. Kurita, I. Sekita, Pattern Recognition (in Japanese), Asakura Shoten: Tokyo, 1996.
[14] F. Y. Shih and S. Cheng, Image and Vision Computing, vol. 23, pp. 877-886, 2005.

Fig. 6. Lenna image (top left), the image compressed by the proposed
method (top right), and the image compressed by the previous method
(bottom left).

Fig. 7. Airplane image (top left), the image compressed by the proposed
method (top right), and the image compressed by the previous method
(bottom left).
