
VQ-DCT Based Image Compression: A New Hybrid Approach

S. Roy (1), A. K. Sen (2), N. Sinha (3)

(1) Department of Information Technology, Assam University, Silchar 788011, Assam, India.
(2) Department of Physics, Assam University, Silchar 788011, Assam, India.
(3) Department of Electrical Engineering, National Institute of Technology, Silchar 788010, Assam, India.

Correspondence: sudipta.it@gmail.com

Abstract:
A hybrid image compression method is proposed in this work based upon two compression techniques, namely Vector Quantization (VQ) and Discrete Cosine Transform (DCT). In this approach, the codebook is first generated with the help of ten different images using VQ. The final codebook is then generated using the DCT matrix and is ready to be used. Any image can be compressed using this codebook: appropriate codewords are generated for the selected image and ultimately the compressed form of the image is obtained. Decompression can also be performed to reconstruct the original image. The proposed approach is tested on standard images and its performance is compared with the standard VQ method. The performance of the proposed method is better, as evidenced by the higher PSNR obtained with the hybrid method compared to the VQ method.

Keywords:
Image compression, decompression, vector quantization, discrete cosine transform, DCT matrix, PSNR.

1. Introduction:
Image compression addresses the problem of reducing the amount of data required to represent a digital image. The underlying basis of the reduction process is the removal of redundant data. From a mathematical viewpoint, this amounts to transforming a 2-D pixel array into a statistically uncorrelated data set. The transformation is applied prior to storage or transmission of the image. Later the compressed image is decompressed to reconstruct the original image or an approximation of it (Gonzalez and Woods, 2006). The initial focus of research efforts in this field was on the development of analog methods for reducing video transmission bandwidth, a process called bandwidth compression. The advent of the digital computer and subsequent development of advanced integrated circuits caused the shift of interest from analog to digital compression approaches. With the relatively recent adoption of several key international image compression standards, the field has undergone significant growth through the practical application of the theoretical work that began in the 1940s, when C. E. Shannon and others first formulated the probabilistic view of information and its representation, transmission, and compression (Gonzalez and Woods, 2006; Chanda and Dutta Majumder, 2000; Taubman and Marcellin, 2002). Currently, image compression is recognized as an enabling technology. It is the natural technology for handling the increased spatial resolutions of today's imaging sensors and evolving broadcast television standards. Furthermore, image compression plays a major role in many important and diverse applications, including tele-video-conferencing, remote sensing, document and medical imaging, facsimile transmission (FAX), and the control of remotely piloted vehicles in military, space and hazardous waste management applications. So, an ever-expanding number of applications depend on the efficient manipulation, storage, and transmission of binary, gray-scale and color images (http://www.data-compression.com/vq.shtml).

Compression techniques can be broadly classified into two categories, namely loss-less compression and lossy compression. Let the digital signal be represented by g and let g' represent the decompressed form of the compressed digital signal g. Any discrepancy between g' and g is considered as error introduced by the compression technique. Usually the amount of error increases as the amount of data decreases. So, the objective of a compression technique is to achieve maximum compression without introducing objectionable error (Chanda and Dutta Majumder, 2000; Taubman and Marcellin, 2002). If the amount of error introduced is zero, we call it loss-less compression; otherwise it is a lossy compression. Loss-less compression is perfectly invertible, that is, the original image can be exactly recovered from its compressed representation. Principal loss-less compression strategies are Huffman coding, run-length coding, block coding, quad-tree coding and contour coding. In the case of lossy compression, perfect recovery of the original image is not possible, but the amount of data reduction is greater than with loss-less compression. Lossy compression is useful in applications in which a certain amount of error is an acceptable trade-off for increased compression performance, such as broadcast television, videoconferencing and facsimile transmission. All image compression techniques exploit the common characteristic of most images that neighboring pixels are correlated and therefore contain redundant information. The foremost task then is to find a less correlated representation of the image. Two fundamental components of compression are redundancy reduction and irrelevancy reduction. All compression techniques attempt to remove the redundant information to the possible extent and derive a less correlated representation of the image. Principal lossy compression strategies are transform compression, block truncation compression and Vector Quantization (VQ) compression (Chanda and Dutta Majumder, 2000; McGowan, http://www.jmcgowan.com/avialgo.html; Linde and Gray, 1980).

VQ is a powerful method for lossy compression of data such as sounds or images, because their vector representations often occupy only small fractions of their vector spaces. For example, in a 2-D gray-scale image the space of pixel pairs can be visualized as the [0,0]-[255,255] square in the plane: if the two components of each vector are taken as XY coordinates, a dot can be plotted for each vector found in the input image. In traditional coding methods based on the DCT (Kesavan, http://www.jmcgowan.com/avialgo.html; Rao and Yip, 1990; Cabeen and Gent; Ponomarenko et al., 2002), the level of compression and the amount of loss are determined by the quantization of the DCT coefficients. Losses in images with DCT based compression result from quantization of the DCT coefficients, and quantization is essential for compression of image information. The main advantage of the DCT is its energy compaction property: after transforming, the signal energy is concentrated in only a few DCT coefficients, hence most of the other coefficients become zero or negligibly small and can be ignored or truncated. Image compression research aims at reducing the number of bits needed to represent an image by removing the spatial and spectral redundancies as much as possible.
As the DC component of the DCT coefficients reflects the average energy of a pixel block and the AC components reflect pixel intensity changes, it is conceivable to index and retrieve images directly based on DCT coefficients. However, the index or representation would not be compact, as the number of DCT coefficients is equal to the number of pixels. Therefore it has been proposed to use the coefficients of some selected image windows, but the choice of windows affects the performance dramatically, as the objects of interest may be located anywhere in an image. Although VQ offers more compression, it is not widely implemented. This is due to two things: the first is the time it takes to generate the codebook, and the second is the speed of the search. Many algorithms have been proposed to increase the speed of the search; some of them reduce the mathematics used to determine the codeword that offers the minimum distortion, while others preprocess the codewords. Hence, it is proposed to compress an image first using the VQ method, which retains most of the information of the image while achieving compression, and secondly to redefine the codebook of the VQ method using the DCT matrix. This hybridization of VQ and DCT makes use of the good subjective performance of VQ and the high compression capability of the DCT, resulting in a more efficient algorithm for the compression of images than VQ alone.

In view of the above, the main objectives of the present work are:
1. To generate a codebook of images using the VQ method.
2. To redefine the codebook using the DCT matrix.
3. To compare the results with the standard VQ based method.
The rest of the paper is organized as follows: Section 2 introduces the concept of vectors, vector quantization and the formation of the DCT matrix. The hybridization is described in Section 3, and Section 4 presents the results and discussions. Conclusions are drawn in Section 5.

2. Background Theory

2.1 Vector Quantization (VQ)


Vector quantization (VQ) is a lossy data compression method based on the principle of block coding. It is a fixed-to-fixed length algorithm. A VQ is nothing more than an approximator. The idea is similar to that of rounding-off (say to the nearest integer) (Taubman and Marcellin, 2002; Kesavan, http://www.jmcgowan.com/avialgo.html). An example of a 1-dimensional VQ is shown in Figure 1.

Figure 1: Codewords in 1-dimensional space.

Here, every number less than -2 is approximated by -3. All numbers between -2 and 0 are approximated by -1. Every number between 0 and 2 is approximated by +1. Every number greater than 2 is approximated by +3. The approximate values are uniquely represented by 2 bits. This is a 1-dimensional, 2-bit VQ. It has a rate of 2 bits/dimension. In the above example, the stars are called codevectors.
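To make the mapping concrete, here is a minimal Python sketch of this 1-dimensional, 2-bit quantizer; the codevectors -3, -1, +1 and +3 are taken from the example above, and the function name is ours, purely for illustration:

```python
import numpy as np

# Codevectors of the 1-D, 2-bit quantizer described above.
codevectors = np.array([-3.0, -1.0, 1.0, 3.0])

def quantize_1d(x):
    """Return (index, approximation) for a scalar input x.
    Nearest-codevector search reproduces the thresholds -2, 0 and 2."""
    idx = int(np.argmin(np.abs(codevectors - x)))
    return idx, codevectors[idx]

print(quantize_1d(1.4))   # -> (2, 1.0): numbers between 0 and 2 map to +1
print(quantize_1d(-5.2))  # -> (0, -3.0): numbers below -2 map to -3
```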
A vector quantizer maps k-dimensional vectors in the vector space $\mathbb{R}^k$ into a finite set of vectors $Y = \{y_i : i = 1, 2, \ldots, N\}$. Each vector $y_i$ is called a code vector or a codeword, and the set of all the codewords is called a codebook. Associated with each codeword $y_i$ is a nearest neighbour region called its encoding region or Voronoi region (Chanda and Dutta Majumder, 2000), defined by

$$V_i = \{\, x \in \mathbb{R}^k : \|x - y_i\| \le \|x - y_j\| \ \text{for all } j \ne i \,\} \qquad (1)$$

The set of encoding regions partitions the entire space $\mathbb{R}^k$ such that

$$\bigcup_{i=1}^{N} V_i = \mathbb{R}^k, \qquad V_i \cap V_j = \emptyset \ \text{for } i \ne j$$

Thus the set of all encoding regions is called the partition of the space. As an example we take vectors in the two-dimensional case, without loss of generality, in Figure 2. In the figure, input vectors are marked with an x, codewords are marked with solid circles, and the Voronoi regions are separated by boundary lines. The figure shows some vectors in space. Associated with each cluster of vectors is a representative codeword. Each codeword resides in its own Voronoi region; these regions are separated by imaginary boundary lines in Figure 2. Given an input vector, the codeword chosen to represent it is the one in the same Voronoi region, that is, the codeword closest to the input vector in Euclidean distance. The Euclidean distance is defined by

$$d(x, y_i) = \sqrt{\sum_{j=1}^{k} (x_j - y_{ij})^2} \qquad (2)$$

where x_j is the j-th component of the input vector and y_{ij} is the j-th component of the codeword y_i. In Figure 2 there are 13 regions and 13 solid circles, each of which can be uniquely represented by 4 bits. Thus, this is a 2-dimensional, 4-bit VQ. Its rate is also 2 bits/dimension.

Figure 2: Codewords in 2-dimensional space.

2.2 Workings of VQ in compression

A vector quantizer is composed of two operations. The first is the encoder, and the second is the decoder (Gonzalez and Woods, 2006). The encoder takes an input vector and outputs the index of the codeword that offers the lowest distortion. In this case the lowest distortion is found by evaluating the Euclidean distance between the input vector and each codeword in the codebook. Once the closest codeword is found, the index of that codeword is sent through a channel (the channel could be computer storage, a communications channel, and so on). When the decoder receives the index of the codeword, it replaces the index with the associated codeword. Figure 3 shows a block diagram of the operation of the encoder and the decoder.

Figure 3: The encoder and decoder in a vector quantizer.


In Figure 3, an input vector is given, the closest codeword is found and the index of the codeword is sent through the channel. The decoder receives the index of the codeword, and outputs the codeword.
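The encoder/decoder operation described above amounts to a nearest-neighbour search over the codebook followed by a table lookup. A minimal Python/NumPy sketch is given below; the codebook here is random and purely illustrative, and the function names are ours, not part of the paper:

```python
import numpy as np

def vq_encode(vectors, codebook):
    """For each input vector, return the index of the codeword with the
    lowest distortion (smallest Euclidean distance)."""
    # distances[i, j] = squared distance between input vector i and codeword j
    diff = vectors[:, None, :] - codebook[None, :, :]
    distances = np.sum(diff ** 2, axis=2)
    return np.argmin(distances, axis=1)   # these indices are sent over the channel

def vq_decode(indices, codebook):
    """The decoder simply replaces each received index with its codeword."""
    return codebook[indices]

# Illustrative example: 100 random 64-dimensional vectors, 256 codewords.
rng = np.random.default_rng(0)
codebook = rng.random((256, 64))
vectors = rng.random((100, 64))
indices = vq_encode(vectors, codebook)
reconstructed = vq_decode(indices, codebook)
```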

2.3 DCT Process


The general overview of the DCT process is as below (a short sketch of step 1 is given after the list):
1. The image is broken into 8x8 blocks of pixels.
2. Working from left to right, top to bottom, the DCT is applied to each block.
3. Each block is quantized and a codebook is generated using the k-means algorithm.
4. Using the codebook and the procedure used in VQ, the image is compressed.
5. When desired, the image is reconstructed through decompression, a process that uses the Inverse Discrete Cosine Transform.
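As a small illustration of step 1, the following NumPy sketch splits an image into 8x8 blocks in the left-to-right, top-to-bottom order assumed above; it assumes the image dimensions are exact multiples of the block size, and the function name is ours:

```python
import numpy as np

def to_blocks(image, b=8):
    """Split a 2-D image into b x b blocks, scanned left to right, top to bottom.
    Returns an array of shape (num_blocks, b, b)."""
    h, w = image.shape
    return (image.reshape(h // b, b, w // b, b)
                 .swapaxes(1, 2)
                 .reshape(-1, b, b))

# A 512x512 image yields 64*64 = 4096 blocks of 8x8 pixels,
# i.e. 4096 vectors of 64 elements each, as used later in Section 3.
image = np.zeros((512, 512))
print(to_blocks(image).shape)   # -> (4096, 8, 8)
```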

2.4 The DCT Equation


The DCT equation (Eq. 3) computes the (i, j)-th entry of the DCT of an image:

$$D(i,j) = \frac{1}{\sqrt{2N}}\, C(i)\, C(j) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} p(x,y)\, \cos\!\left[\frac{(2x+1)\,i\,\pi}{2N}\right] \cos\!\left[\frac{(2y+1)\,j\,\pi}{2N}\right] \qquad (3)$$

$$C(u) = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{if } u = 0 \\[1mm] 1 & \text{if } u > 0 \end{cases} \qquad (4)$$

Here p(x,y) is the (x,y)-th element of the image represented by the matrix p, and N is the size of the block on which the DCT is done. The equation calculates one entry (i, j) of the transformed image from the pixel values of the original image matrix. For the standard 8x8 block, N equals 8 and x and y range from 0 to 7. Therefore D(i,j) is as given in equation (5):

$$D(i,j) = \frac{1}{4}\, C(i)\, C(j) \sum_{x=0}^{7} \sum_{y=0}^{7} p(x,y)\, \cos\!\left[\frac{(2x+1)\,i\,\pi}{16}\right] \cos\!\left[\frac{(2y+1)\,j\,\pi}{16}\right] \qquad (5)$$
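Equation (5) can be checked directly with a double sum; the following sketch computes a single coefficient D(i, j) of an 8x8 block exactly as the formula is written (it only mirrors the equation and is not an efficient DCT implementation):

```python
import numpy as np

def C(u):
    """Normalisation factor of equation (4)."""
    return 1.0 / np.sqrt(2.0) if u == 0 else 1.0

def dct_coefficient(p, i, j, N=8):
    """Compute D(i, j) of an N x N block p using equations (3)/(5) directly."""
    total = 0.0
    for x in range(N):
        for y in range(N):
            total += (p[x, y]
                      * np.cos((2 * x + 1) * i * np.pi / (2 * N))
                      * np.cos((2 * y + 1) * j * np.pi / (2 * N)))
    return (1.0 / np.sqrt(2.0 * N)) * C(i) * C(j) * total
```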

Because the DCT uses cosine functions, the resulting matrix depends on the horizontal, diagonal, and vertical frequencies. Therefore an image block with a lot of change in frequency has a very random looking resulting matrix, while an image matrix of just one colour has a resulting matrix with a large value for the first element and zeroes for the other elements.

2.5 The DCT Matrix

To get the matrix form of equation (3), we may use equation (6):

$$T(i,j) = \begin{cases} \dfrac{1}{\sqrt{N}} & \text{if } i = 0 \\[2mm] \sqrt{\dfrac{2}{N}}\, \cos\!\left[\dfrac{(2j+1)\,i\,\pi}{2N}\right] & \text{if } i > 0 \end{cases} \qquad (6)$$

The DCT matrix for a block of size 8x8 is listed in Table 1.

Table 1: DCT matrix (values rounded to four decimal places)

 .3536  .3536  .3536  .3536  .3536  .3536  .3536  .3536
 .4904  .4157  .2778  .0975 -.0975 -.2778 -.4157 -.4904
 .4619  .1913 -.1913 -.4619 -.4619 -.1913  .1913  .4619
 .4157 -.0975 -.4904 -.2778  .2778  .4904  .0975 -.4157
 .3536 -.3536 -.3536  .3536  .3536 -.3536 -.3536  .3536
 .2778 -.4904  .0975  .4157 -.4157 -.0975  .4904 -.2778
 .1913 -.4619  .4619 -.1913 -.1913  .4619 -.4619  .1913
 .0975 -.2778  .4157 -.4904  .4904 -.4157  .2778 -.0975

The first row (i = 0) of the matrix has all its entries equal to 1/sqrt(8) ≈ 0.3536, as given by equation (6). The rows of T form an orthonormal set, so T is an orthogonal matrix. When doing the inverse DCT the orthogonality of T is important, as the inverse of T is its transpose T', which is easy to calculate.
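The matrix of Table 1 can be regenerated from equation (6), and the orthogonality claim can be verified numerically; a short sketch (the function name dct_matrix is ours):

```python
import numpy as np

def dct_matrix(N=8):
    """Build the N x N DCT matrix T of equation (6)."""
    T = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            if i == 0:
                T[i, j] = 1.0 / np.sqrt(N)
            else:
                T[i, j] = np.sqrt(2.0 / N) * np.cos((2 * j + 1) * i * np.pi / (2 * N))
    return T

T = dct_matrix()
print(np.round(T, 4))                    # reproduces Table 1
print(np.allclose(T @ T.T, np.eye(8)))   # True: T is orthogonal, so its inverse is T'
```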

2.6 Doing the DCT on an 8x8 Block


The pixel values of a black-and-white image range from 0 to 255, where pure black is represented by 0 and pure white by 255. Thus it can be seen how a photo, illustration, etc. can be accurately represented using 256 shades of gray. Since an image comprises hundreds or even thousands of 8x8 blocks of pixels, the procedure described here for one 8x8 block is applied to all of them, in the order specified earlier. Because the DCT is designed to work on pixel values ranging from -128 to 127, the original block is leveled off by subtracting 128 from each entry. We are now ready to perform the Discrete Cosine Transform, which is accomplished by the matrix multiplication

$$D = T M T' \qquad (7)$$

In equation (7) the matrix M is first multiplied on the left by the DCT matrix T from the previous section; this transforms the rows. The columns are then transformed by multiplying on the right by the transpose of the DCT matrix, T'. This yields D. This block matrix now consists of 64 DCT coefficients, c_ij, where i and j range from 0 to 7. The top-left coefficient, c_00, correlates to the low frequencies of the original image block. As we move away from c_00 in all directions, the DCT coefficients correlate to higher and higher frequencies of the image block, and c_77 corresponds to the highest frequency. The human eye is most sensitive to low frequencies, and the results of the quantization step will reflect this fact.
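A minimal sketch of this step, assuming the illustrative dct_matrix helper from the previous sketch: the block is leveled off by subtracting 128 and then transformed with the two matrix multiplications of equation (7):

```python
import numpy as np

def dct_block(M, T):
    """Apply equation (7): level off the 8x8 pixel block and compute D = T M T'."""
    M_shifted = M.astype(float) - 128.0   # map the pixel range [0, 255] to [-128, 127]
    return T @ M_shifted @ T.T

def idct_block(D, T):
    """Invert the transform: M = T' D T, then add 128 back."""
    return T.T @ D @ T + 128.0
```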

2.7 Quantization
Our 8x8 block of DCT coefficients is now ready for compression by quantization. A remarkable and highly useful feature of this step is that varying levels of image compression and quality are obtainable through selection of specific quantization matrices. This enables the user to decide on quality levels ranging from 1 to 100, where 1 gives the poorest image quality and highest compression, while 100 gives the best quality and lowest compression. Quantization is achieved by dividing each element in the transformed image matrix D by the corresponding element in the quantization matrix, and then rounding to the nearest integer value.
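In code, quantization is a pointwise divide-and-round. The sketch below leaves the quantization matrix Q as a parameter, since the paper does not reproduce a specific matrix; the uniform matrix shown is purely illustrative:

```python
import numpy as np

def quantize(D, Q):
    """Divide each DCT coefficient by the corresponding entry of the
    quantization matrix Q and round to the nearest integer."""
    return np.round(D / Q).astype(int)

def dequantize(C, Q):
    """Approximate the original coefficients by multiplying back by Q."""
    return C * Q

# Purely illustrative quantization matrix (not taken from the paper);
# coarser matrices give more compression and lower quality.
Q = np.full((8, 8), 16.0)
```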

2.8 Measurement of performance


To judge the performance of a lossy compression technique we need to decide upon an error criterion. The error criteria commonly used may be classified into two broad groups: objective criteria and subjective criteria. The first group of measures needs mathematical formulation and is restricted to the statistical sense only, while it is very difficult to standardize the second group of measures as it involves human observers.

Objective criteria: For objective measurement we can use the Mean Squared Error (MSE) and the Peak Signal to Noise Ratio (PSNR). The MSE may be defined by equation (8):

$$MSE = \frac{1}{M} \sum_{i=1}^{M} (x_i - \hat{x}_i)^2 \qquad (8)$$

where M is the number of elements in the image. For example, if we wanted to find the MSE between the reconstructed and the original image, we would take the difference between the two images pixel by pixel, square the results, and average the results. The PSNR may be defined by equation (9):

$$PSNR = 10 \log_{10} \frac{(2^n - 1)^2}{MSE} \qquad (9)$$

where n is the number of bits per symbol. As an example, if we want to find the PSNR between two 256-gray-level images, we set n to 8 bits.

Subjective criteria: For subjective measurement the original image and the reconstructed image are shown to a large group of examiners. Each examiner assigns a grade to the reconstructed image with respect to the original image. These grades may be drawn from a subjective scale and may be divided as excellent, good, reasonable, poor, unacceptable. Based on the grades assigned by the examiners, an overall grade is assigned to the reconstructed image. The complement of this grade gives an idea of the subjective error.
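Equations (8) and (9) translate directly into code; a minimal sketch for n-bit images (n = 8 for 256 gray levels):

```python
import numpy as np

def mse(original, reconstructed):
    """Mean squared error of equation (8): average of squared pixel differences."""
    diff = original.astype(float) - reconstructed.astype(float)
    return np.mean(diff ** 2)

def psnr(original, reconstructed, n_bits=8):
    """Peak signal to noise ratio of equation (9), in dB."""
    peak = 2 ** n_bits - 1
    return 10.0 * np.log10(peak ** 2 / mse(original, reconstructed))
```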

3. The Hybrid Approach: VQ-DCT based Compression


The Discrete Cosine Transform is combined with VQ, and the new approach may therefore be considered as VQ-DCT based image compression.

3.1 Designing of the codebook


Designing a codebook that best represents the set of input vectors is NP-hard: it requires an exhaustive search for the best possible codewords in space, and the search cost increases exponentially as the number of codewords increases. We therefore resort to suboptimal codebook design schemes, and the one considered here is the simplest one. The codebook has been designed as follows. Ten images of size 512x512 are taken as the basic training material. The pixel values of the images are concatenated one after another to form a matrix of size 512x5120, which is divided into blocks of 8x8 pixels that are processed from left to right and top to bottom. The blocks are then stored as vectors, each of which is a 64-element array. So, for the obtained matrix of ten input images of size 512x5120, we get one file of size 40960x64 pixels, i.e. 40960 vectors. The initial codebook is initialized with some arbitrary values; the resultant codebook is different for different choices, and its size depends on the number of codewords in the codebook. Considering the first input vector, the Euclidean distance of that vector from all the codewords of the initial codebook is found and stored in a one-dimensional array whose length equals the number of codewords in the codebook. The minimum value in the array is found, and the vector is assigned to the region of that codeword. These operations are performed for all the input vectors. The average of the vectors falling in a particular region is then calculated and stored into the codebook as the new codeword. Following this procedure, all the codewords are modified, and the process continues until two consecutive iterations do not change the codewords in a significant manner, i.e. until the changes are within the limit of the considered tolerance. The DCT matrix along with its transpose is now used to generate the final codebook: the first 8x8 block of the generated codebook is taken, the DCT of the data is computed and kept in a transformation array of size 8x8, and the transformation array is written to the dctcodebook. All the data of the codebook are processed in this way, and the codebook with DCT values is ready for use.
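The following Python sketch outlines the codebook design described above as a standard k-means style iteration over the 40960 training vectors, followed by a DCT of every codeword. It is our reading of the procedure, not the authors' code; the names train_codebook and make_dct_codebook are ours, and the initial codebook is seeded with randomly chosen training vectors as one possible "arbitrary" choice:

```python
import numpy as np

def train_codebook(train_vecs, num_codewords=256, tol=1e-3, max_iter=100):
    """k-means style codebook design over 64-element block vectors: assign each
    training vector to its nearest codeword, replace each codeword by the mean
    of its region, and repeat until the change is within the tolerance."""
    train_vecs = np.asarray(train_vecs, dtype=float)
    rng = np.random.default_rng(0)
    # Arbitrary initial codebook: a random subset of the training vectors.
    codebook = train_vecs[rng.choice(len(train_vecs), num_codewords, replace=False)].copy()
    for _ in range(max_iter):
        # Squared Euclidean distances between every training vector and every codeword.
        d = (np.sum(train_vecs ** 2, axis=1)[:, None]
             - 2.0 * train_vecs @ codebook.T
             + np.sum(codebook ** 2, axis=1)[None, :])
        nearest = np.argmin(d, axis=1)          # Voronoi region of each vector
        new_cb = codebook.copy()
        for i in range(num_codewords):
            members = train_vecs[nearest == i]
            if len(members):                    # empty regions keep the old codeword
                new_cb[i] = members.mean(axis=0)
        done = np.max(np.abs(new_cb - codebook)) < tol
        codebook = new_cb
        if done:
            break
    return codebook

def make_dct_codebook(codebook, T):
    """Transform each 64-element codeword, viewed as an 8x8 block, with the DCT."""
    return np.array([(T @ cw.reshape(8, 8) @ T.T).ravel() for cw in codebook])
```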

3.2 Compression and Decompression of the images.


The image to be compressed is considered and divided into 8x8 blocks. The 8x8 blocks are then transformed into the transformation array using the DCT. The transformation array is converted into an array of size 4096x64 containing the DCT values, i.e. 4096 vectors. Each vector of this file is now compared with the codewords of the dctcodebook, and the index of the dctcodebook entry with the minimum Euclidean distance is stored. The index array is the compressed form of the image. To decompress the image, the index values of that image are used. From the indices, we can get the corresponding vectors in the dctcodebook, but what is ultimately desired is the original image. To obtain it, we can transform the dctcodebook vectors back into their original values using the inverse DCT or, in a simpler manner, reconstruct the image from the original (untransformed) codebook with the help of the index array. Thus the decompressed image is reconstructed, and the performance criterion PSNR is then calculated.
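Putting the pieces together, a sketch of the compression and decompression steps, reusing the illustrative helpers to_blocks, vq_encode and the DCT matrix T from the earlier sketches (again, our names, not the paper's). The level-off of Section 2.6 is omitted here so that the blocks are transformed in the same way as the codewords in the codebook sketch:

```python
import numpy as np

def compress(image, dct_cb, T):
    """Split the image into 8x8 blocks, DCT each block, and store for each block
    the index of the nearest codeword in the DCT codebook; the index array is
    the compressed image."""
    blocks = to_blocks(image).astype(float)        # 4096 blocks for a 512x512 image
    dct_vecs = np.array([(T @ b @ T.T).ravel() for b in blocks])
    return vq_encode(dct_vecs, dct_cb)

def decompress(indices, dct_cb, T, image_shape=(512, 512)):
    """Look up each codeword, apply the inverse DCT (T' D T), and reassemble the blocks."""
    blocks = np.array([T.T @ dct_cb[i].reshape(8, 8) @ T for i in indices])
    h, w = image_shape
    return (blocks.reshape(h // 8, w // 8, 8, 8)
                  .swapaxes(1, 2)
                  .reshape(h, w))
```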

4. Performance
The performance of the VQ based compression and the new VQ-DCT based compression approach is evaluated in terms of PSNR. The results of the two algorithms on different images, together with the size of the image, the block size and the number of codewords in the codebook, are presented in Table 2 for block size 8x8 and in Table 3 for block size 4x4.

Table 2: Performance in terms of PSNR with block size 8x8

Image      Size of the Image   Block Size   Number of Vectors   PSNR (VQ)    PSNR (proposed VQ-DCT)
Airplane   512x512             8x8          256                 25.899132    27.276513
Baboon     512x512             8x8          256                 20.858301    22.302915
Barb       512x512             8x8          256                 23.168871    25.725581
Boat       512x512             8x8          256                 25.358310    27.576528
Couple     512x512             8x8          256                 25.910507    27.983451
Kiel       512x512             8x8          256                 21.531078    24.938610
Lake       512x512             8x8          256                 22.300915    25.625083
Lena       512x512             8x8          256                 25.925383    28.076503
Man        512x512             8x8          256                 24.933610    25.925388
Peppers    512x512             8x8          256                 25.875566    27.276513
Zelda      512x512             8x8          256                 28.670153    29.929355

Table 3: Performance in terms of PSNR with block size 4x4

Image      Size of the Image   Block Size   Number of Vectors   PSNR (VQ)    PSNR (proposed VQ-DCT)
Airplane   512x512             4x4          256                 25.911146    27.566214
Baboon     512x512             4x4          256                 20.867771    22.675425
Barb       512x512             4x4          256                 23.907484    25.729589
Boat       512x512             4x4          256                 25.272032    27.546538
Couple     512x512             4x4          256                 25.953247    27.883403
Kiel       512x512             4x4          256                 21.953107    24.969231
Lake       512x512             4x4          256                 22.277294    25.547823
Lena       512x512             4x4          256                 26.312237    28.476073
Man        512x512             4x4          256                 24.189133    25.763429
Peppers    512x512             4x4          256                 26.082239    27.276513
Zelda      512x512             4x4          256                 28.725544    29.776512

It can be observed from Tables 2 and 3 that the performance of both algorithms is generally better with the smaller block size than with the larger one. When the proposed algorithm is compared with the standard VQ based method, the PSNR values for all the images are consistently higher with the proposed method, implying that the quality of the retrieved images is superior.

5. Conclusion
Standard images are compressed using both the standard VQ method and the proposed method with different block sizes, and the PSNR is obtained as a performance index for each case for comparison. It is observed that with the proposed image compression method the PSNR is improved for all the images, as is evident from Tables 2 and 3. Thus the quality of the reconstructed image is enhanced as the PSNR is increased. So, the new approach may be considered the superior one and can be used for the further development of compression and decompression tools.

References:
1. R. C. Gonzalez and R. E. Woods (2006), Digital Image Processing, Pearson Education, Second Impression.
2. http://www.data-compression.com/vq.shtml.
3. C. Christopoulos, A. Skodras and T. Ebrahimi (2000), The JPEG2000 still image coding system: an overview, IEEE Transactions on Consumer Electronics, Vol. 46, Issue 4, pp. 1103-1127.
4. B. Chanda and D. Dutta Majumder (2000), Digital Image Processing and Analysis, Prentice Hall Pvt. Ltd.
5. D. Taubman and M. Marcellin (2002), JPEG 2000: Image Compression Fundamentals, Standards and Practice, Boston: Kluwer.
6. John McGowan, AVI Overview: Video Compression Technologies, available at http://www.jmcgowan.com/avialgo.html.
7. Hareesh Kesavan, Choosing a DCT Quantization Matrix for JPEG Encoding, available at www.jmcgowan.com/avialgo.html.
8. K. R. Rao and P. Yip (1990), Discrete Cosine Transform: Algorithms, Advantages, Applications, Academic Press.
9. John McGowan, AVI Overview: Video Compression Technologies, available at http://www.jmcgowan.com/avialgo.html.
10. K. Cabeen and P. Gent, Image Compression and the Discrete Cosine Transform, available at http://online.redwoods.cc.ca.us/instruct/darnold/LAPROJ/Fall98/PKen/dct.pdf.
11. Y. Linde, A. Buzo and R. M. Gray (1980), "An algorithm for vector quantizer design", IEEE Transactions on Communications, Vol. COM-28, pp. 84-95.
12. N. Ponomarenko, V. Lukin, K. Egiazarian and J. Astola (2002), Partition Schemes in DCT Based Image Compression, Technical Report 3-2002, ISBN 952-15-0811-6, Tampere University of Technology, Finland.
