Abstract— We introduce the discrete wavelet transform (DWT) to vector quantization (VQ) for image compression. The DWT is a multi-resolution analysis, and the energy of a signal concentrates in specific DWT coefficients. This characteristic is useful for image compression. DWT coefficients are compressed using VQ with variable block size. To perform effective compression, blocks are merged by the algorithm proposed in this paper. Results of computational experiments show that our algorithm is effective compared with the previously proposed algorithm.

Osamu Yamanaka is with the Department of Computer Science & Systems Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Japan (phone: +81-143-46-5435; fax: +81-143-46-5430; email: osamu@athena.csse.muroran-it.ac.jp).
Tsuyoshi Yamaguchi is with the Department of Computer Science & Systems Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Japan (phone: +81-143-46-5435; fax: +81-143-46-5430; email: tsuyoshi@athena.csse.muroran-it.ac.jp).
Kazuya Sasazaki is with the Department of Computer Science & Systems Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Japan (email: sasazaki@athena.csse.muroran-it.ac.jp).
Junji Maeda is with the Department of Computer Science & Systems Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Japan (email: junji@csse.muroran-it.ac.jp).
Yukinori Suzuki is with the Department of Computer Science & Systems Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Japan (email: yuki@csse.muroran-it.ac.jp).
Authorized licensed use limited to: KAMARAJA COLLEGE OF ENGINEERING AND TECHNOLOGY. Downloaded on July 20, 2009 at 03:33 from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

The use of low-cost personal computers, the Internet, and mobile phones has spread all over the world. People can now communicate with each other beyond physical distances. The new communication tools are also changing business approaches and our way of life. New technologies have been developed in response to demands for data transmission bandwidth and storage space. However, these demands continue to outstrip the capacity of existing technologies. To use communication channels effectively, data compression technologies are essential [1], [2].

Multimedia data (e.g., images, videos, voices, and music) are streaming through communication channels. Since images and videos require large bandwidth and storage capacity, technologies for compression of these data are essential to use communication channels effectively. We have been studying technologies to compress images. The purpose of image compression technology is to reduce the amount of data and to achieve a low bit rate digital representation without perceptual loss of image quality. JPEG (Joint Photographic Experts Group) has recently been widely used for image compression. We are developing technologies to compress images based on vector quantization (VQ) for the following reasons. Although a VQ requires a large amount of computational cost to construct a code book (CB), decoding requires only lookup from the CB once a CB has been constructed. Therefore, the computational cost for decoding is remarkably smaller than that needed to construct a CB. Furthermore, image quality by a VQ is higher than that of JPEG in the region of high compression ratio. These facts are attractive points for image compression and applications [3].

Wavelet transform (WT) is a widely used image compression technique. In JPEG 2000, discrete WT is used as a core technology to compress still images. WT is a multi-resolution analysis, and it decomposes images into wavelet coefficients and scaling coefficients. In WT, the energy of a signal concentrates in specific wavelet coefficients. This characteristic is useful for compressing images. In this paper, we describe decomposition of images using discrete wavelet transform (DWT) and encode images using VQ with variable block size. To compress an image effectively, we propose a block merging algorithm. The rest of the paper is organized as follows. A brief introduction of DWT is given in section 2. In section 3, VQ with variable block size is described. A block merging algorithm is presented in section 4. Results of computational experiments to confirm the effectiveness of the proposed method are presented in section 5. Finally, conclusions are given in section 6.

II. DISCRETE WAVELET TRANSFORM

In this section, we briefly review discrete wavelet transform (DWT). Fourier transform (FT) computes an inner product between a signal $f(t)$ and an integral kernel that is composed of sine and cosine waves. On the other hand, wavelet transform computes an inner product between a signal $f(t)$ and wavelets. In FT, we detect similarity between a given signal $f(t)$ and a sine wave or cosine wave. These sine and cosine waves extend infinitely along the time axis. If a signal exists locally in the time axis, the signal $f(t)$ may not be similar to the sine or cosine wave. We use a wavelet that exists locally in the time axis, and it can therefore detect a function $f(t)$ existing locally in the time axis. This is a basic idea for using a wavelet [2], [4].

Wavelet transform (WT) is defined by an inner product between a wavelet function and a signal as
\[ (W_\psi f)(b, a) = \frac{1}{\sqrt{a}} \int_{R} f(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt, \tag{1} \]
where $\psi_{a,b}(t)$ is a wavelet function and $a, b \in R$ ($a > 0$) are parameters for scale and translation, respectively [4]. This
continuous wavelet function is digitized by binary partition for $a$ and $b$ such as $a = 2^j$ and $b = 2^j k$, respectively. Then a discrete wavelet function is obtained:
\[ \psi_{j,k}(t) = 2^{-j/2}\, \psi\!\left(2^{-j} t - k\right). \tag{2} \]
Some class of $\psi_{a,b}(t)$ satisfies orthogonality for parameters $a$ and $b$. If $\psi_{a,b}(t)$ satisfies orthogonality, a signal $f(t)$ can be expanded with a wavelet series such as
\[ f(t) = \sum_{j} \sum_{k} w_k^{(j)}\, \psi_{j,k}(t), \tag{3} \]
where $w_k^{(j)}$ is a wavelet coefficient.

Fig. 1. Image of Lenna and its DWT with j = 2.
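To make the expansion (3) concrete, the following is a minimal sketch of an orthonormal discrete wavelet decomposition and its inverse. It assumes the Haar wavelet, whose scaling function is the box function used in this section; the source does not state which wavelet was used in the experiments, so this choice and the function names `haar_dwt` and `haar_idwt` are illustrative assumptions only.

```python
import numpy as np

def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar decomposition of a 1-D signal.

    The signal length is assumed to be divisible by 2**levels.
    Returns the coarsest scaling coefficients and a list of wavelet
    coefficient arrays, finest level first.
    """
    s = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = s[0::2], s[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # wavelet coefficients
        s = (even + odd) / np.sqrt(2.0)              # scaling coefficients
    return s, details

def haar_idwt(s, details):
    """Invert haar_dwt: rebuild the signal from coarse to fine."""
    s = np.asarray(s, dtype=float)
    for w in reversed(details):
        out = np.empty(2 * s.size)
        out[0::2] = (s + w) / np.sqrt(2.0)
        out[1::2] = (s - w) / np.sqrt(2.0)
        s = out
    return s

# A piecewise-constant test signal (hypothetical data)
signal = [4, 4, 4, 4, 8, 8, 8, 8]
coarse, details = haar_dwt(signal, levels=3)
coeffs = np.concatenate([coarse] + details)
print("coefficients:", coeffs)
print("energy:", np.sum(coeffs ** 2), "vs", np.sum(np.square(signal)))
```

Because the transform is orthonormal, the sum of squared coefficients equals the signal energy, and for a smooth or piecewise-constant signal most of that energy ends up in a few coefficients. This is the property that makes the coefficients attractive for compression.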
Furthermore, a signal is represented by a linear combination of scaling functions. An approximated signal $f_0(t)$ of a signal $f(t)$ is generated using a scaling function
\[ f_0(t) = \sum_{k} s_k\, \varphi(t - k), \tag{4} \]
where $\varphi(t)$ is a scaling function such that
\[ \varphi(t) = \begin{cases} 1 & (0 \le t < 1) \\ 0 & (\text{otherwise}). \end{cases} \tag{5} \]
In this case, multiresolution level $j = 0$ is the highest level. This continuous function is digitized by binary partition
\[ \varphi_{j,k}(t) = 2^{-j/2}\, \varphi\!\left(2^{-j} t - k\right). \tag{6} \]
This scaling function also satisfies orthogonality for both translation and scaling. The $j$th level approximated function, $f_j(t)$, is represented using $\varphi_{j,k}(t)$:
\[ f_j(t) = \sum_{k} s_k^{(j)}\, \varphi_{j,k}(t), \tag{7} \]
where $s_k^{(j)}$ is a scaling coefficient. Here, we mention two two-scale relations:
\[ \varphi_{j,k}(t) = \sum_{n} p_{n-2k}\, \varphi_{j-1,n}(t), \tag{8} \]
where the sequence $p_n$ connects the $j$th level scaling function and the $(j-1)$th level scaling function. For the wavelet function $\psi_{j,k}(t)$, there is the following relation:
\[ \psi_{j,k}(t) = \sum_{n} q_{n-2k}\, \varphi_{j-1,n}(t), \tag{9} \]
where $q_n$ connects the $j$th level wavelet and the $(j-1)$th level scaling function.

We now consider computing wavelet coefficients from a discrete sequence. We represent a sampled sequence as $f(n)$. As shown in (4), a continuous signal $f(t)$ is expanded as
\[ f(t) \approx f_0(t) = \sum_{k} s_k^{(0)}\, \varphi(t - k), \tag{10} \]
\[ s_k^{(0)} = \int_{-\infty}^{+\infty} f(t)\, \varphi_{0,k}^{*}(t)\, dt. \tag{11} \]
In (10), $s_k^{(0)}$ cannot be computed. Mallat therefore proposed that the sampled sequence $f(n)$ can be regarded as $s_k^{(0)}$. Once we obtain $s_k^{(0)}$, scaling and wavelet coefficients can be computed using the two-scale relations (8) and (9):
\[ s_k^{(j)} = \sum_{n} p_{n-2k}^{*}\, s_n^{(j-1)}, \tag{12} \]
\[ w_k^{(j)} = \sum_{n} q_{n-2k}^{*}\, s_n^{(j-1)}. \tag{13} \]
Discrete wavelet transform (DWT) for two-dimensional data $f(m, n)$ is computed in the same way as for $f(n)$. We first carry out DWT in the direction of the horizontal axis and then carry out DWT in the direction of the vertical axis on the data obtained by the horizontal transform. Fig. 1 shows a Lenna image and its DWT with level $j = 2$.

III. VARIABLE BLOCK SIZE FOR VECTOR QUANTIZATION

Vector quantization (VQ) consists of an encoder, a code book (CB), and a decoder as shown in Fig. 2. A CB first has to be designed for a VQ. A sufficient number of training images are prepared and then each image is partitioned into rectangular blocks. These rectangular blocks form training vectors with fixed size. The sizes of vectors are usually 2 × 2, 4 × 4, 8 × 8, etc. The training vectors are grouped into clusters on the basis of an optimality criterion, and then cluster centers provide the code vectors (CVs) of the CB. Fig. 2 shows a conceptual diagram for encoding an image using a CB. A block of pixels is extracted from an image as the input vector. Encoding involves choosing the CV that is closest to the input vector from the CB. An index of the CV is sent to the decoder through a communication channel. The decoder chooses the CV corresponding to the received index. A VQ of an image is carried out by repeating the above process. In a VQ, the larger the size of CVs in a CB is, the lower is the compression rate of the encoded image. On the other hand, the smaller the size of CVs in a CB is, the higher is the quality of the encoded image. There is a trade-off between compression rate and quality of the encoded image. Vaisey and Gersho proposed variable block size segmentation, in which a top-down quadtree (QT) was employed to divide an image into blocks of variable size. A QT decomposition is based on homogeneity of local regions of an image. They obtained high-quality image reproduction at rates between 0.25 and 0.7 bits/pixel (bpp). Overall, their
Fig. 3. Image "airplane" and its LFD map. The LFD values are mapped from the range of 2.0 to 3.0 to the range of 0 to 255. The brightness in the LFD map is proportional to the LFD value.
Fig. 2. Conceptual diagram of vector quantization.
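The encoder and decoder of Fig. 2 can be sketched as follows. This is a minimal illustration with a fixed block size, assuming that "closest" means smallest squared Euclidean distance and that a code book is already available. The function names and the array layout are hypothetical and not taken from the paper, and code book training is not shown.

```python
import numpy as np

def image_to_blocks(image, block):
    """Split a 2-D array into non-overlapping block x block input vectors."""
    h, w = image.shape
    assert h % block == 0 and w % block == 0, "image must tile exactly"
    return (image.reshape(h // block, block, w // block, block)
                 .swapaxes(1, 2)
                 .reshape(-1, block * block))

def blocks_to_image(vectors, shape, block):
    """Inverse of image_to_blocks: place each vector back as a block."""
    h, w = shape
    return (vectors.reshape(h // block, w // block, block, block)
                   .swapaxes(1, 2)
                   .reshape(h, w))

def vq_encode(vectors, codebook):
    """Return the index of the nearest code vector for each input vector."""
    v = np.asarray(vectors, dtype=float)
    cb = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance between every input vector and every CV
    d2 = ((v[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def vq_decode(indices, codebook):
    """Decoding is a table lookup: replace each index by its code vector."""
    return np.asarray(codebook)[indices]
```

Only the indices are transmitted, so the cost of decoding is a lookup, while the search over all code vectors makes encoding the more expensive side.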
components) in 3 × 3 window in size is
\[ \sigma_x = \sqrt{\frac{1}{9} \sum_{i=1}^{9} (x_i - \bar{x})^2}, \tag{25} \]
\[ \sigma = \sigma_Y + \sigma_{Cb} + \sigma_{Cr}. \tag{26} \]
$\sigma$ is normalized to [0, 1], which is represented by $\sigma_N$. Then, similarity of a pixel to its neighbors is defined as
\[ H = 1 - \sigma_N. \tag{27} \]
The relative Euclidean distance of a pixel to each of its neighbors is given as
\[ d_i = \frac{\sqrt{(Y - Y_i)^2 + (Cb - Cb_i)^2 + (Cr - Cr_i)^2}}{\sqrt{Y^2 + Cb^2 + Cr^2}}, \tag{28} \]
where $i = 1, 2, ..., 8$. The maximum distance to its neighbors for each pixel is
\[ d_{max} = \max_{i=1}^{8} (d_i). \tag{29} \]
\[ \ldots \sum_{j=1}^{BS} (Y_j^i)^2 + (Cb_j^i)^2 + (Cr_j^i)^2 \tag{32} \]
where $Y_j^i$ stands for the $j$th pixel of the Y component of the block that has its three neighboring blocks, $l = i+1, i+2, i+3$. $Y_j^l$ stands for the $j$th pixel value of the $l$th neighboring block.

We specify the block as a seed block when the block satisfies conditions (1) and (2) stated above. To confirm conditions (1) and (2), similarity and maximum distance are also employed. Since there is high similarity between the seed block and its neighboring blocks, we merge these four blocks as a new block. This merging process is applied to all blocks comprising the image. For VQ, the merged blocks are encoded by a CB.
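The pixel-level quantities in (25) to (29) can be sketched as below for a single 3 × 3 window of (Y, Cb, Cr) values. This is only an illustration under stated assumptions: the normalization of σ is assumed to be division by the largest σ in the image (the surviving text does not say how σ is normalized), the centre pixel is assumed to be non-zero so that (28) is defined, and the function names are hypothetical.

```python
import numpy as np

def window_sigma(window):
    """Total standard deviation of a 3x3 window, as in (25) and (26).

    window has shape (3, 3, 3): rows, columns, (Y, Cb, Cr) channels.
    """
    flat = np.asarray(window, dtype=float).reshape(9, 3)
    # Per-channel population standard deviation: sqrt(1/9 * sum (x_i - mean)^2)
    return flat.std(axis=0).sum()

def window_dmax(window):
    """Maximum relative Euclidean distance from the centre pixel to its
    eight neighbours, as in (28) and (29). Assumes a non-zero centre pixel."""
    flat = np.asarray(window, dtype=float).reshape(9, 3)
    centre = flat[4]
    neighbours = np.delete(flat, 4, axis=0)
    d = np.sqrt(((neighbours - centre) ** 2).sum(axis=1))
    return (d / np.sqrt((centre ** 2).sum())).max()

def similarity(sigma, sigma_max):
    """Similarity H = 1 - sigma_N of (27); sigma_N = sigma / sigma_max
    is an assumed normalization, not confirmed by the source."""
    return 1.0 - sigma / sigma_max
```

A window taken from a flat region gives σ = 0 and d_max = 0, hence H = 1, which matches the intuition that blocks with high similarity and small maximum distance are the ones worth merging.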
TABLE I
COMPARISON OF THE METHODS: VQ WITH VARIABLE BLOCK SIZE PROPOSED PREVIOUSLY AND VQ WITH VARIABLE BLOCK SIZE USING A MERGING ALGORITHM. DATA ARE FOR THE Y COLOR COMPONENT.
V. COMPUTATIONAL EXPERIMENTS

We carried out computational experiments to confirm the effectiveness of the proposed algorithm for VQ. The training image to construct a CB is shown in Fig. 5. We constructed the CB according to the procedures described in section 3. The second level DWT is computed as shown in Fig. 1. Then we computed LFDs of the image which consists of DWT coefficients except for the upper left image (lower frequency band). Based on the LFDs, the image was divided into blocks of three sizes: 2 × 2, 4 × 4, and 8 × 8. These blocks of pixels constitute input vectors for VQ. We encoded five test images (Lenna, Airplane, Balloon, Sailboat) by VQ with variable block size. To use a merging algorithm, we divided the image of DWT coefficients into blocks of 2 × 2 and 4 × 4 in size. The division was carried out by the same procedure as VQ with variable block size. We applied the merging algorithm described in section 4 to these images divided into two different block sizes. After applying the merging algorithm, there were blocks of three different sizes in the image: 2 × 2, 4 × 4, and 8 × 8. Then we encoded the image using VQ with variable block size. The upper left image was quantized by scalar quantization with 64 levels. A comparison of the methods, VQ with variable block size proposed previously and VQ with variable block size using the merging algorithm, is shown in Table 1.

As shown in Table 1, in the image "Lenna", the number of blocks of 2 × 2 in size obtained using the previous method is 5188, while that obtained using the proposed method is 1709, which is about 33% of the previous count. However, the number of blocks of 4 × 4 in size obtained by the proposed method increases by 23.6% compared with the number obtained by the previous method. The number of blocks of 8 × 8 in size obtained by the proposed method increases by 19.7% compared with the number obtained by the previous method. As a result, bpp of the proposed method decreases by 12% compared with that of the previous method. In PSNR, the two decoded images show almost the same values. There is no substantial difference in image quality. For the other images, almost the same results were obtained. This can be confirmed from both Fig. 6 and Fig. 7. Experimental results show that our proposed method using a merging algorithm is effective for VQ with variable block size.

Fig. 5. Training image to construct a CB.

VI. CONCLUSIONS

We have proposed a block merging algorithm for VQ with variable block size. The algorithm is based on the method proposed by Shih and Cheng [14]. As shown in (23), compression rate with VQ is determined by the number of indexes of an encoded image. Therefore, if this number can be reduced, compression rate can be improved. This is the basic motivation for introducing a merging algorithm. Results of computational experiments show that the proposed algorithm is effective for VQ with variable block size.

REFERENCES

[1] K. Sayood, Introduction to data compression, Morgan Kaufmann Publishers: Boston, 2000.
[2] R. C. Gonzalez and R. E. Woods, Digital Image Processing (Third Edition), Pearson Prentice Hall: New Jersey, 2008.
[3] M. Fujibayashi, T. Nozawa, T. Nakayama, K. Mochizuki, M. Konda, K. Kotani, S. Sugawara, and T. Ohmi, A still-image encoder based on adaptive resolution vector quantization featuring needless calculation elimination architecture, IEEE Journal of Solid-State Circuits, vol. 38, no. 5, pp. 726-733, 2003.
[4] K. Nakano, K. Yamamoto, Y. Yoshida, Signal and Image Processing by Wavelet (in Japanese), Kyoritsu Shuppan: Tokyo, 1999.
[5] M. Lightstone, K. Rose, and S.K. Mitra, Locally optimal codebook design for quadtree-based vector quantization, Proc. IEEE ICASSP '95, pp. 2479-2482, Detroit, MI, 1995.
[6] G. J. Sullivan, R.L. Baker, Efficient quadtree coding of images and video, IEEE Trans. Image Processing, vol. 3, no. 3, pp. 327-331, 1994.
[7] C.Y. Wang, S.J. Liao, and L.W. Chang, Wavelet image coding using variable blocksize vector quantization with optimal quadtree segmentation, Signal Processing: Image Communication, vol. 15, pp. 879-890, 2000.
[8] C-C Chang, J-C Chuang, and C-Y Chung, Quadtree-segmented image compression method using vector quantization and cubic B-spline interpolation, The Imaging Science Journal, vol. 52, pp. 106-116, 2004.
[9] Y-C Hu, C-C Chang, Quadtree-segmented image coding schemes using vector quantization and block truncation coding, Optical Engineering, vol. 39, no. 2, pp. 464-471, 2000.
[10] K. Sasazaki, S. Saga, J. Maeda, and Y. Suzuki, Vector quantization of images with variable block size, Applied Soft Computing, vol. 8, pp. 634-645, 2008.
[11] M.F. Barnsley, Fractals everywhere, Academic Press Professional: Boston, 1993.
[12] S. Novianto, Y. Suzuki, and J. Maeda, Near optimal estimation of local fractal dimension for image segmentation, Pattern Recognition Letters, vol. 24, pp. 365-374, 2003.
[13] N. Otsu, T. Kurita, I. Sekita, Pattern Recognition (in Japanese), Asakura Shoten: Tokyo, 1996.
[14] F. Y. Shih and S. Cheng, Image and Vision Computing, vol. 23, pp. 877-886, 2005.
Fig. 6. Lenna image (top left), the image compressed by the proposed
method (top right), and the image compressed by the previous method
(bottom left).
Fig. 7. Airplane image (top left), the image compressed by the proposed
method (top right), and the image compressed by the previous method
(bottom left).