
IMAGE COMPRESSION USING DISCRETE COSINE TRANSFORM

A SEMINAR submitted in partial fulfillment for the award of the Degree of

Bachelor of Technology in the Department of Electronics & Communication Engineering

Supervisor Mr. Rishabh Sharma

Submitted by Nancy Singal

Department of Electronics & Communication Engineering

Global Institute of Technology, Jaipur
Rajasthan Technical University
2010-2011

CANDIDATE'S DECLARATION
I hereby declare that the work which is being presented in the SEMINAR entitled IMAGE COMPRESSION USING DCT, in partial fulfillment for the award of the Degree of Bachelor of Technology in the Department of Electronics & Communication Engineering, Global Institute of Technology, Jaipur, Rajasthan Technical University, Kota, is a record of my own interest, carried out under the guidance of Mr. J.P. Aggarwal.

Nancy Singal B.Tech Electronics & Communication University Roll No.:EGJEC0777 Enrolment No.:

ACKNOWLEDGEMENT
The beatitude, bliss and euphoria that accompany the successful completion of any task would be incomplete without expressing gratitude to the people who made it possible.

I feel immense pleasure in conveying my heartiest thanks and gratitude to the respected faculty member Mr. J.P. Aggarwal for his guidance, suggestions and encouragement.

This acknowledgement would not be complete if I failed to express my deep sense of obligation to almighty God and my family; without their help this work would not have been completed.

Last but not least, I thank all the concerned ones who directly or indirectly helped me in this work.

Signature Nancy Singal

Contents
Chapter I: Introduction
  1.1 Discrete Cosine Transform
    1.1.1 Description
    1.1.2 What is DCT
    1.1.3 Mathematical Analysis
    1.1.4 Classification
    1.1.5 Advantages
    1.1.6 Disadvantages
  1.2 Image Compression
    1.2.1 Description
    1.2.2 Types of Image Compression
      1.2.2.1 Lossless Compression
      1.2.2.2 Lossy Compression
    1.2.3 Discrete Cosine Transform
      1.2.3.1 Joint Photographic Expert Group
      1.2.3.2 Motion Picture Expert Group

Chapter II: Literature Survey
  2.1 Sub article X
    2.1.1 Sub-Sub article

Chapter III: Problem Identification
  3.1 Sub article Y
    3.1.1 Sub-Sub article

Chapter IV: Methodology

ABSTRACT

Image compression addresses the problem of reducing the amount of data required to represent a digital image. Compression is achieved by the removal of one or more of three basic data redundancies: (1) coding redundancy, which is present when less than optimal (i.e., the smallest length) code words are used; (2) interpixel redundancy, which results from correlations between the pixels of an image; and (3) psychovisual redundancy, which is due to data that is ignored by the human visual system (i.e., visually nonessential information). Huffman codes contain the smallest possible number of code symbols (e.g., bits) per source symbol (e.g., grey level value) subject to the constraint that the source symbols are coded one at a time. So Huffman coding, when combined with the technique of reducing image redundancies using the Discrete Cosine Transform (DCT), helps in compressing the image data to a very good extent.

The Discrete Cosine Transform (DCT) is an example of transform coding. The current JPEG standard uses the DCT as its basis. The DCT relocates the highest energies to the upper left corner of the image; the lesser energy or information is relocated into other areas. The DCT is fast: it can be quickly calculated and is best for images with smooth edges, like photos with human subjects. The DCT coefficients are all real numbers, unlike those of the Fourier Transform. The Inverse Discrete Cosine Transform (IDCT) can be used to retrieve the image from its transform representation. The Discrete Wavelet Transform (DWT) has gained widespread acceptance in signal processing and image compression. Because of their inherent multi-resolution nature, wavelet coding schemes are especially suitable for applications where scalability and tolerable degradation are important. Recently the JPEG committee released its new image coding standard, JPEG-2000, which is based upon the DWT.

CHAPTER-I INTRODUCTION

1.1 Discrete Cosine Transform


1.1.1 DESCRIPTION
Compressing an image is significantly different than compressing raw binary data. Of course, general-purpose compression programs can be used to compress images, but the result is less than optimal. The DCT has been widely used in image signal processing. The one-dimensional DCT is useful in processing one-dimensional signals such as speech waveforms. For analysis of two-dimensional (2D) signals such as images, we need a 2D version of the DCT, which is used especially in coding for compression because of its near-optimal performance. JPEG is a commonly used standard method of compression for photographic images. The name JPEG stands for Joint Photographic Experts Group, the name of the committee that created the standard. JPEG provides for lossy compression of images. Image compression is the application of data compression to digital images. In effect, the objective is to reduce redundancy in the image data in order to store or transmit the data in an efficient form. The best image quality at a given bit rate (or compression rate) is the main goal of image compression. The main objectives of this paper are: reducing the image storage space; easy maintenance and providing security; ensuring data loss does not affect image clarity; lower bandwidth requirements for transmission; and reducing cost.

1.1.2 WHAT IS DCT

A discrete cosine transform (DCT) expresses a sequence of finitely many data points in terms of a sum of cosine functions oscillating at different frequencies. DCTs are important to numerous applications in science and engineering, from lossy compression of audio and images (where small high-frequency components can be discarded) to spectral methods for the numerical solution of partial differential equations. The use of cosine rather than sine functions is critical in these applications: for compression, it turns out that cosine functions are much more efficient (as explained below, fewer are needed to approximate a typical signal), whereas for differential equations the cosines express a particular choice of boundary conditions. In particular, a DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers. DCTs are equivalent to DFTs of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even), where in some variants the input and/or output data are shifted by half a sample. There are eight standard DCT variants, of which four are common. The most common variant of discrete cosine transform is the type-II DCT, which is often called simply "the DCT"; its inverse, the type-III DCT, is correspondingly often called simply "the inverse DCT" or "the IDCT". Two related transforms are the discrete sine transform (DST), which is equivalent to a DFT of real and odd functions, and the modified discrete cosine transform (MDCT), which is based on a DCT of overlapping data.
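To make the "fewer coefficients are needed" claim concrete, the short Python sketch below (added here for illustration, not part of the original report; it assumes NumPy and SciPy are available) transforms a smooth signal with the DCT-II, keeps only the first few coefficients, and measures the reconstruction error:

```python
import numpy as np
from scipy.fftpack import dct, idct

# A smooth "typical" signal: a slow ramp plus a gentle bump.
n = np.arange(64)
x = 0.5 * n + 10 * np.exp(-((n - 20) ** 2) / 50.0)

# Orthonormal DCT-II of the signal.
X = dct(x, type=2, norm='ortho')

# Keep only the 8 lowest-frequency coefficients and invert.
X_trunc = np.zeros_like(X)
X_trunc[:8] = X[:8]
x_approx = idct(X_trunc, type=2, norm='ortho')

# The relative error stays small even though 87.5% of the
# coefficients were dropped.
err = np.linalg.norm(x - x_approx) / np.linalg.norm(x)
print(f"relative error with 8 of 64 coefficients: {err:.4f}")
```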

1.1.3 MATHEMATICAL ANALYSIS

DCT Equation
The DCT equation (Eq. 1) computes the (i, j)th entry of the DCT of an image:

$$D(i,j) = \frac{1}{\sqrt{2N}}\,C(i)\,C(j)\sum_{x=0}^{N-1}\sum_{y=0}^{N-1} p(x,y)\cos\!\left[\frac{(2x+1)i\pi}{2N}\right]\cos\!\left[\frac{(2y+1)j\pi}{2N}\right] \quad (1)$$

$$C(u) = \begin{cases}\dfrac{1}{\sqrt{2}} & \text{if } u = 0\\[4pt] 1 & \text{if } u > 0\end{cases} \quad (2)$$

p(x, y) is the (x, y)th element of the image represented by the matrix p. N is the size of the block that the DCT is done on. The equation calculates one entry D(i, j) of the transformed image from the pixel values of the original image matrix. For the standard 8x8 block that JPEG compression uses, N equals 8 and x and y range from 0 to 7. Therefore D(i, j) would be as in Equation (3):

$$D(i,j) = \frac{1}{4}\,C(i)\,C(j)\sum_{x=0}^{7}\sum_{y=0}^{7} p(x,y)\cos\!\left[\frac{(2x+1)i\pi}{16}\right]\cos\!\left[\frac{(2y+1)j\pi}{16}\right] \quad (3)$$

Because the DCT uses cosine functions, the resulting matrix depends on the horizontal and vertical frequencies. Therefore an image block with a lot of change in frequency has a very random-looking resulting matrix, while an image matrix of just one color has a resulting matrix with a large value for the first element and zeros for the others.

THE DCT MATRIX: To get the matrix form of Equation (1), we will use the following equation:

$$T(i,j) = \begin{cases}\dfrac{1}{\sqrt{N}} & \text{if } i = 0\\[4pt] \sqrt{\dfrac{2}{N}}\cos\!\left[\dfrac{(2j+1)i\pi}{2N}\right] & \text{if } i > 0\end{cases} \quad (4)$$

For an 8x8 block it results in this matrix (entries rounded to four decimal places):

     0.3536  0.3536  0.3536  0.3536  0.3536  0.3536  0.3536  0.3536
     0.4904  0.4157  0.2778  0.0975 -0.0975 -0.2778 -0.4157 -0.4904
     0.4619  0.1913 -0.1913 -0.4619 -0.4619 -0.1913  0.1913  0.4619
T =  0.4157 -0.0975 -0.4904 -0.2778  0.2778  0.4904  0.0975 -0.4157
     0.3536 -0.3536 -0.3536  0.3536  0.3536 -0.3536 -0.3536  0.3536
     0.2778 -0.4904  0.0975  0.4157 -0.4157 -0.0975  0.4904 -0.2778
     0.1913 -0.4619  0.4619 -0.1913 -0.1913  0.4619 -0.4619  0.1913
     0.0975 -0.2778  0.4157 -0.4904  0.4904 -0.4157  0.2778 -0.0975

The first row (i = 0) of the matrix has all entries equal to 1/√8 ≈ 0.3536, as expected from Equation (4). The columns of T form an orthonormal set, so T is an orthogonal matrix. When doing the inverse DCT the orthogonality of T is important, as the inverse of T is its transpose T', which is easy to calculate.
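As a quick numerical check of these properties (an illustrative sketch added here, assuming NumPy), the following builds T from Equation (4), verifies that its inverse is its transpose, and applies the 2-D block transform D = T p T':

```python
import numpy as np

N = 8

# Build the DCT matrix T from Equation (4).
T = np.zeros((N, N))
T[0, :] = 1.0 / np.sqrt(N)
for i in range(1, N):
    for j in range(N):
        T[i, j] = np.sqrt(2.0 / N) * np.cos((2 * j + 1) * i * np.pi / (2 * N))

# Orthogonality: T @ T.T is the identity, so inv(T) equals T.T.
assert np.allclose(T @ T.T, np.eye(N))

# 2-D DCT of an 8x8 block p, and exact reconstruction via the transpose.
p = np.random.randint(0, 256, (N, N)).astype(float) - 128  # level-shifted block
D = T @ p @ T.T          # forward 2-D DCT
p_back = T.T @ D @ T     # inverse 2-D DCT
assert np.allclose(p, p_back)
print(D.round(1))
```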

1.1.4 CLASSIFICATION

DCT-I

The DCT-I is exactly equivalent (up to an overall scale factor of 2) to a DFT of 2N−2 real numbers with even symmetry. For example, a DCT-I of N=5 real numbers abcde is exactly equivalent to a DFT of eight real numbers abcdedcb (even symmetry), divided by two. (In contrast, DCT types II-IV involve a half-sample shift in the equivalent DFT.) Note, however, that the DCT-I is not defined for N less than 2. (All other DCT types are defined for any positive N.) Thus, the DCT-I corresponds to the boundary conditions: xn is even around n=0 and even around n=N−1; similarly for Xk.

Some authors further multiply the x0 and xN−1 terms by √2, and correspondingly multiply the X0 and XN−1 terms by 1/√2. This makes the DCT-I matrix orthogonal, if one further multiplies by an overall scale factor of √(2/(N−1)), but breaks the direct correspondence with a real-even DFT.

DCT-II

The DCT-II is probably the most commonly used form, and is often simply referred to as "the DCT". This transform is exactly equivalent (up to an overall scale factor of 2) to a DFT of 4N real inputs of even symmetry where the even-indexed elements are zero. That is, it is half of the DFT of the 4N inputs yn, where y2n = 0, y2n+1 = xn for 0 ≤ n < N, and y4N−n = yn for 0 < n < 2N.

Some authors further multiply the X0 term by 1/√2 and multiply the resulting matrix by an overall scale factor of √(2/N) (see below for the corresponding change in DCT-III). This makes the DCT-II matrix orthogonal, but breaks the direct correspondence with a real-even DFT of half-shifted input. The DCT-II implies the boundary conditions: xn is even around n=−1/2 and even around n=N−1/2; Xk is even around k=0 and odd around k=N.

DCT-III

Because it is the inverse of DCT-II (up to a scale factor, see below), this form is sometimes simply referred to as "the inverse DCT" ("IDCT"). Some authors further multiply the x0 term by √2 and multiply the resulting matrix by an overall scale factor of √(2/N) (see above for the corresponding change in DCT-II), so that the DCT-II and DCT-III are transposes of one another. This makes the DCT-III matrix orthogonal, but breaks the direct correspondence with a real-even DFT of half-shifted output. The DCT-III implies the boundary conditions: xn is even around n=0 and odd around n=N; Xk is even around k=−1/2 and even around k=N−1/2.

DCT-IV

The DCT-IV matrix becomes orthogonal (and thus, being clearly symmetric, its own inverse) if one further multiplies by an overall scale factor of √(2/N). A variant of the DCT-IV, where data from different transforms are overlapped, is called the modified discrete cosine transform (MDCT) (Malvar, 1992).

The DCT-IV implies the boundary conditions: xn is even around n=−1/2 and odd around n=N−1/2; similarly for Xk.

DCT V-VIII
DCT types I-IV are equivalent to real-even DFTs of even order (regardless of whether N is even or odd), since the corresponding DFT is of length 2(N−1) (for DCT-I) or 4N (for DCT-II/III) or 8N (for DCT-IV). In principle, there are actually four additional types of discrete cosine transform (Martucci, 1994), corresponding essentially to real-even DFTs of logically odd order, which have factors of N+1/2 in the denominators of the cosine arguments. Equivalently, DCTs of types I-IV imply boundaries that are even/odd around either a data point for both boundaries or halfway between two data points for both boundaries. DCTs of types V-VIII imply boundaries that are even/odd around a data point for one boundary and halfway between two data points for the other boundary. However, these variants seem to be rarely used in practice. One reason, perhaps, is that FFT algorithms for odd-length DFTs are generally more complicated than FFT algorithms for even-length DFTs (e.g. the simplest radix-2 algorithms are only for even lengths), and this increased intricacy carries over to the DCTs. (The trivial real-even array, a length-one DFT (odd length) of a single number a, corresponds to a DCT-V of length N=1.)
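The DFT equivalences above are easy to verify numerically. The sketch below (added here for illustration, assuming SciPy; note that scale conventions differ between references, and with scipy.fftpack's unnormalized convention the type-I DCT matches the mirrored DFT directly, without the factor of two) checks the DCT-I/DFT relation and the DCT-II/DCT-III inverse pair:

```python
import numpy as np
from scipy.fftpack import dct, idct

# DCT-I of abcde versus the DFT of the even-symmetric extension abcdedcb.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])          # "abcde"
mirrored = np.concatenate([x, x[-2:0:-1]])        # "abcdedcb", length 2N-2 = 8
assert np.allclose(dct(x, type=1), np.fft.fft(mirrored).real[:5])

# DCT-III is the inverse of DCT-II (exactly, with orthonormal scaling).
y = np.random.rand(8)
Y = dct(y, type=2, norm='ortho')
assert np.allclose(dct(Y, type=3, norm='ortho'), y)
# Equivalently, scipy's idct with type=2 applies the type-III transform.
assert np.allclose(idct(Y, type=2, norm='ortho'), y)
```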

1.1.5 ADVANTAGES

1. The DCT is a pretty good technique for image compression; correctly using the advantages provided by the DCT is the key to achieving good results while keeping a good compression ratio.
2. The small-size DCT is suitable for mobile applications using low-power devices, where fast computation speed is required for real-time applications.
3. It enables low-complexity, high-fidelity image compression using a fixed threshold method.
4. The DCT is real-valued and provides a better approximation of a signal with fewer coefficients.
5. The DCT offers simplicity, satisfactory performance, and availability of special-purpose hardware for implementation.
6. The DCT is a widely used transformation for data compression. It is an orthogonal transform, which has a fixed set of (image-independent) basis functions, an efficient algorithm for computation, and good energy compaction and correlation reduction properties.
7. The DCT is fast. It can be quickly calculated and is best for images with smooth edges, like photos with human subjects.
8. DCT algorithms are capable of achieving a high degree of compression with only minimal loss of data. This scheme is effective only for compressing continuous-tone images in which the differences between adjacent pixels are usually small.
9. Studies have shown that the DCT provides better energy compaction than the DFT for most natural images (a numerical sketch follows this list).
10. The decorrelation characteristics of the DCT should render a decrease in the entropy (or self-information) of an image. This will, in turn, decrease the number of bits required to represent the image.
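As a rough illustration of point 9 (a sketch added here, not from the original report; it assumes SciPy and uses a simple ramp signal, whose periodic extension has a boundary jump that hurts the DFT), one can compare the residual energy left outside the leading coefficients of each transform:

```python
import numpy as np
from scipy.fftpack import dct

# A ramp: smooth inside the block, but discontinuous when tiled periodically.
x = np.arange(64, dtype=float)

# Orthonormal DCT-II: the implicit even extension stays continuous.
c = dct(x, type=2, norm='ortho')

# Unitary DFT: the implicit periodic extension has a jump at the boundary.
f = np.fft.fft(x) / np.sqrt(len(x))

for K in (4, 8):
    dct_tail = 1 - np.sum(c[:K] ** 2) / np.sum(c ** 2)
    kept = np.abs(f[:K]) ** 2
    kept[1:] *= 2                    # count each bin's conjugate twin as well
    dft_tail = 1 - np.sum(kept) / np.sum(np.abs(f) ** 2)
    print(f"K={K}: residual energy  DCT {dct_tail:.2e}  DFT {dft_tail:.2e}")
```

Even though the DFT is credited with the conjugate bins as well, the DCT leaves orders of magnitude less energy outside its first few coefficients for this kind of signal.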

1.1.6 DISADVANTAGES
1. Only the spatial correlation of the pixels inside a single 2-D block is considered; the correlation with pixels of the neighboring blocks is neglected.
2. It is impossible to completely decorrelate the blocks at their boundaries using the DCT.
3. Undesirable blocking artifacts affect the reconstructed images or video frames (at high compression ratios or very low bit rates).
4. Since the input image needs to be "blocked," correlation across the block boundaries is not eliminated. This results in noticeable and annoying "blocking artifacts," particularly at low bit rates.
5. At compression ratios above 30:1, JPEG performance rapidly deteriorates, while wavelet coders degrade gracefully well beyond ratios of 100:1. At higher compression ratios, image quality degrades because of the artifacts resulting from the block-based DCT scheme.
6. Frequently changing colors in dense spaces cannot be represented well with few coefficients. For example, a row of pixels alternating between black and white pixel-by-pixel is seen as a high frequency in the frequency domain. However, a high frequency cannot be represented with few coefficients, so dropping high-order coefficients from the DCT removes necessary detail. This is also the reason why diagrams are not compressed well with JPEG compression.
7. DCT-based encoding algorithms are always lossy by nature.
8. Removal of high-frequency coefficients removes certain frequencies that were originally present in the signal. After losing those frequencies, it is not possible to achieve perfect reconstruction.

1.2 IMAGE COMPRESSION

1.2.1 DESCRIPTION
The purpose of image compression is to represent images with less data in order to save storage costs or transmission time. Without compression, file size is significantly larger, usually several megabytes; with compression it is possible to reduce the file size to 10 percent of the original without noticeable loss in quality. Image compression can be lossless or lossy. Lossless compression means that you are able to reconstruct the exact original data from the compressed data; image quality is not reduced. Unlike lossless compression, lossy compression reduces image quality: you cannot get the original image back after using lossy compression methods, because some information is lost.

1.2.2 TYPES OF IMAGE COMPRESSION

1.2.2.1 Lossless Compression


The goal of lossless image compression is to represent an image signal with the smallest possible number of bits without loss of any information, thereby speeding up transmission and minimizing storage requirements. The number of bits representing the signal is typically expressed as an average bit rate (average number of bits per sample for still images, and average number of bits per second for video). By contrast, the goal of lossy compression is to achieve the best possible fidelity given an available communication or storage bit-rate capacity, or to minimize the number of bits representing the image signal subject to some allowable loss of information. In this way, a much greater reduction in bit rate can be attained as compared to lossless compression, which is necessary for enabling many real-time applications involving the handling and transmission of audiovisual information.

The function of compression is often referred to as coding, for short. Coding techniques are crucial for the effective transmission or storage of data-intensive visual information. In fact, a single uncompressed color image or video frame with a medium resolution of 500×500 pixels would require about 100 seconds for transmission over an Integrated Services Digital Network (ISDN) link having a capacity of 64,000 bits per second (64 Kbps). The resulting delay is intolerably large, considering that a delay as small as 1 to 2 seconds is needed to conduct an interactive slide show, and a much smaller delay (on the order of 0.1 second) is required for video transmission or playback.
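The 100-second figure is simple arithmetic (a worked check added here, assuming 24 bits per pixel for uncompressed color):

$$500 \times 500 \;\text{pixels} \times 24 \;\tfrac{\text{bits}}{\text{pixel}} = 6{,}000{,}000 \;\text{bits}, \qquad \frac{6{,}000{,}000\;\text{bits}}{64{,}000\;\text{bits/s}} \approx 94\;\text{s} \approx 100\;\text{s}.$$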

1.2.2.2 Lossy Compression


Lossy compression is a data encoding method which compresses data by discarding (losing) some of it. The procedure aims to minimise the amount of data that needs to be held, handled, and/or transmitted by a computer. Successive lossy versions of a photograph demonstrate how much data can be dispensed with, and how the picture becomes progressively coarser as the data that made up the original is discarded (lost). Typically, a substantial amount of data can be discarded before the result is sufficiently degraded to be noticed by the user. Lossy compression is most commonly used to compress multimedia data (audio, video, still images), especially in applications such as streaming media and internet telephony. By contrast, lossless compression is required for text and data files, such as bank records and text articles. In many cases it is advantageous to make a master lossless file which can then be used to produce compressed files for different purposes; for example, a multi-megabyte file can be used at full size to produce a full-page advertisement in a glossy magazine, and a 10 kilobyte lossy copy can be made for a small image on a web page.

1.2.3 DISCRETE COSINE TRANSFORM


The 2-D discrete cosine transform (DCT) is an invertible linear transform and is widely used in many practical image compression systems because of its compression performance and computational efficiency. The DCT converts data (image pixels) into sets of frequencies. The first frequencies in the set are the most meaningful; the latter, the least. The least meaningful frequencies can be stripped away based on allowable resolution loss. DCT-based image compression relies on two techniques to reduce the data required to represent the image. The first is quantization of the image's DCT coefficients; the second is entropy coding of the quantized coefficients. Quantization is the process of reducing the number of possible values of a quantity, thereby reducing the number of bits needed to represent it. Quantization is a lossy process and implies a reduction of the color information associated with each pixel in the image.
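The sketch below illustrates coefficient quantization on one 8x8 block (added here for illustration; it uses the example luminance quantization table from Annex K of the JPEG standard and assumes SciPy for the 2-D DCT):

```python
import numpy as np
from scipy.fftpack import dct, idct

# Example luminance quantization table from Annex K of the JPEG standard.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def dct2(block):
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

# A smooth 8x8 block (a diagonal gradient), level-shifted toward zero.
block = np.add.outer(np.arange(8.0), np.arange(8.0)) * 9 - 63

coeffs = dct2(block)
quantized = np.round(coeffs / Q)        # lossy step: most entries become 0
dequantized = quantized * Q             # decoder's approximate coefficients
reconstructed = idct2(dequantized) + 63

print("nonzero coefficients kept:", np.count_nonzero(quantized), "of 64")
```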

1.2.3.1 Joint Photographic Expert Group
The JPEG specification defines a minimal subset of the standard called baseline JPEG, which all JPEG-aware applications are required to support. This baseline uses an encoding scheme based on the Discrete Cosine Transform (DCT) to achieve compression. DCT is a generic name for a class of operations identified and published some years ago. DCT-based algorithms have since made their way into various compression methods. DCT-based encoding algorithms are always lossy by nature. DCT algorithms are capable of achieving a high degree of compression with only minimal loss of data. This scheme is effective only for compressing continuous-tone images in which the differences between adjacent pixels are usually small. In practice, JPEG works well only on images with depths of at least four or five bits per color channel. The baseline standard actually specifies eight bits per input sample. Data of lesser bit depth can be handled by scaling it up to eight bits per sample, but the results will be bad for low-bit-depth source data, because of the large jumps between adjacent pixel values. For similar reasons, color-mapped source data does not work very well, especially if the image has been dithered.

The JPEG compression scheme is divided into the following stages:
1. Transform the image into an optimal color space.
2. Downsample chrominance components by averaging groups of pixels together.
3. Apply a Discrete Cosine Transform (DCT) to blocks of pixels, thus removing redundant image data.
4. Quantize each block of DCT coefficients using weighting functions optimized for the human eye.
5. Encode the resulting coefficients (image data) using a Huffman variable word-length algorithm to remove redundancies in the coefficients.

Figure-1.3 summarizes these steps, and the following subsections look at each of them in turn. Note that JPEG decoding performs the reverse of these steps.

Figure 1.3: JPEG compression and decompression

Transform the image
The JPEG algorithm is capable of encoding images that use any type of color space. JPEG itself encodes each component in a color model separately, and it is completely independent of any color-space model. The best compression ratios result if a luminance/chrominance color space, such as YUV or YCbCr, is used. Most of the visual information to which human eyes are most sensitive is found in the high-frequency, gray-scale, luminance component (Y) of the YCbCr color space. The other two chrominance components (Cb and Cr) contain high-frequency color information to which the human eye is less sensitive, and most of this information can therefore be discarded. If a color space such as RGB were encoded directly, all three color components would need to be encoded at the same high quality, resulting in a poorer compression ratio. Gray-scale images do not have a color space as such and therefore do not require transforming.

The simplest way of exploiting the eye's lesser sensitivity to chrominance information is simply to use fewer pixels for the chrominance channels. For example, in an image nominally 1000x1000 pixels, we might use a full 1000x1000 luminance pixels but only 500x500 pixels for each chrominance component. In this representation, each chrominance pixel covers the same area as a 2x2 block of luminance pixels. We store a total of six pixel values for each 2x2 block (four luminance values, one each for the two chrominance channels), rather than the twelve values needed if each component is represented at full resolution. Remarkably, this 50 percent reduction in data volume has almost no effect on the perceived quality of most images. Equivalent savings are not possible with conventional color models such as RGB, because in RGB each color channel carries some luminance information and so any loss of resolution is quite visible. When the uncompressed data is supplied in a conventional format (equal resolution for all channels), a JPEG compressor must reduce the resolution of the chrominance channels by downsampling, or averaging together groups of pixels. The JPEG standard allows several different choices for the sampling ratios, or relative sizes, of the downsampled channels. The luminance channel is always left at full resolution (1:1 sampling). Typically both chrominance channels are downsampled 2:1 horizontally and either 1:1 or 2:1 vertically, meaning that a chrominance pixel covers the same area as either a 2x1 or a 2x2 block of luminance pixels.

Apply a Discrete Cosine Transform
The image data is divided up into 8x8 blocks of pixels. (From this point on, each color component is processed independently, so a "pixel" means a single value, even in a color image.) A DCT is applied to each 8x8 block. The DCT converts the spatial image representation into a frequency map: the low-order or "DC" term represents the average value in the block, while successive higher-order ("AC") terms represent the strength of more and more rapid changes across the width or height of the block. The highest AC term represents the strength of a cosine wave alternating from maximum to minimum at adjacent pixels.
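These statements about the DC and AC terms can be checked numerically; the sketch below (added for illustration, assuming SciPy's orthonormal DCT, under which the DC entry equals N times the block average) verifies both:

```python
import numpy as np
from scipy.fftpack import dct

def dct2(block):
    # Separable 2-D DCT-II with orthonormal scaling.
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

# DC term: a flat block transforms to a single (0, 0) entry equal to
# N * average (here 8 * 100) under the orthonormal scaling.
flat = np.full((8, 8), 100.0)
D = dct2(flat)
assert np.isclose(D[0, 0], 800.0) and np.count_nonzero(D.round(10)) == 1

# Highest AC term: the (7, 7) basis pattern flips sign at every adjacent
# pixel, and its transform is a single spike at position (7, 7).
u = np.cos((2 * np.arange(8) + 1) * 7 * np.pi / 16)
basis = np.outer(u, u)
C = dct2(basis).round(10)
print(np.argwhere(C != 0))   # -> [[7 7]]
```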

The DCT calculation is fairly complex; in fact, this is the most costly step in JPEG compression. The point of doing it is that we have now separated out the high- and low-frequency information present in the image. We can discard high-frequency data easily without losing low-frequency information. The DCT step itself is lossless except for roundoff errors.

Quantize each block
To discard an appropriate amount of information, the compressor divides each DCT output value by a "quantization coefficient" and rounds the result to an integer. The larger the quantization coefficient, the more data is lost, because the actual DCT value is represented less and less accurately. Each of the 64 positions of the DCT output block has its own quantization coefficient, with the higher-order terms being quantized more heavily than the low-order terms (that is, the higher-order terms have larger quantization coefficients). Furthermore, separate quantization tables are employed for luminance and chrominance data, with the chrominance data being quantized more heavily than the luminance data. This allows JPEG to exploit further the eye's differing sensitivity to luminance and chrominance.

It is this step that is controlled by the "quality" setting of most JPEG compressors. The compressor starts from a built-in table that is appropriate for a medium-quality setting and increases or decreases the value of each table entry in inverse proportion to the requested quality. The complete quantization tables actually used are recorded in the compressed file so that the decompressor will know how to (approximately) reconstruct the DCT coefficients. Selection of an appropriate quantization table is something of a black art. Most existing compressors start from a sample table developed by the ISO JPEG committee. It is likely that future research will yield better tables that provide more compression for the same perceived image quality.
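One widely used recipe for deriving a table from a quality setting is the scaling convention of the Independent JPEG Group's libjpeg, sketched below in Python (an illustration added here, not part of the original report; the base table would be the Annex K sample table mentioned above):

```python
import numpy as np

def scale_quant_table(base_table, quality):
    """Scale a base quantization table by a 1-100 quality setting,
    following the Independent JPEG Group's libjpeg convention."""
    quality = min(max(quality, 1), 100)
    # Below 50, scale entries up aggressively; above 50, scale down linearly.
    scale = 5000 / quality if quality < 50 else 200 - 2 * quality
    table = np.floor((base_table * scale + 50) / 100)
    return np.clip(table, 1, 255).astype(int)   # baseline allows 1..255
```

With this rule, quality 50 returns the base table unchanged, while quality 75 roughly halves every entry (finer quantization, hence better fidelity and less compression).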

Encode the resulting coefficients
The resulting coefficients contain a significant amount of redundant data. Huffman compression will losslessly remove the redundancies, resulting in smaller JPEG data. An optional extension to the JPEG specification allows arithmetic encoding to be used instead of Huffman for an even greater compression ratio. At this point, the JPEG data stream is ready to be transmitted across a communications channel or encapsulated inside an image file format.

PEPPER EXAMPLE
We can apply the DCT and quantization process to the pepper image.
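Before entropy coding, the quantized coefficients are reordered along the zigzag pattern so that the many trailing zeros group into long runs that code cheaply. A minimal sketch of that reordering (added here for illustration; the helper names are mine):

```python
import numpy as np

def zigzag_indices(n=8):
    """(row, col) visiting order of the JPEG zigzag scan."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def zigzag(block):
    """Flatten a quantized block into the zigzag sequence."""
    return np.array([block[r, c] for r, c in zigzag_indices(block.shape[0])])

# A typical quantized block: a few nonzero low-frequency values, rest zero.
demo = np.zeros((8, 8), dtype=int)
demo[0, 0], demo[0, 1], demo[1, 0], demo[1, 1] = 12, -3, 5, 2

seq = zigzag(demo)
last_nz = np.max(np.nonzero(seq))
print(seq[:8])   # nonzero values cluster at the front of the sequence
print(f"last nonzero at index {last_nz}; the remaining {63 - last_nz} "
      f"zeros collapse into one end-of-block (EOB) symbol")
```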

Figure-1.4 pepper

Each eight-by-eight block is hit by the DCT, resulting in the image shown in figure-1.5.

figure-1.5 DCT of pepper
Each element in each block of the image is then quantized using a quantization matrix of quality level 50. At this point many of the elements become zeroed out, and the image takes up much less space to store.

figure-1.6 Quantized DCT of pepper
The image can now be decompressed using the inverse discrete cosine transform. At quality level 50 there is almost no visible loss in this image, but there is high compression. At lower quality levels, the quality goes down by a lot, but the compression does not increase very much.

Figure-1.7 original pepper

Lossless JPEG compression

Figure-1.8 Quality 50 (84% zeros)

A question that commonly arises is "At what Q factor does JPEG become lossless?" The answer is "never." Baseline JPEG is a lossy method of compression regardless of adjustments you may make in the parameters. In fact, DCT-based encoders are always lossy, because roundoff errors are inevitable in the color conversion and DCT steps. You can suppress deliberate information loss in the downsampling and quantization steps, but you still won't get an exact recreation of the original bits. Further, this minimum-loss setting is a very inefficient way to use lossy JPEG.

The JPEG standard does offer a separate lossless mode. This mode has nothing in common with the regular DCT-based algorithms, and it is currently implemented only in a few commercial applications. JPEG lossless is a form of Predictive Lossless Coding using a 2D Differential Pulse Code Modulation (DPCM) scheme. The basic premise is that the value of a pixel is combined with the values of up to three neighboring pixels to form a predictor value. The predictor value is then subtracted from the original pixel value. When the entire bitmap has been processed, the resulting prediction residuals are compressed using either the Huffman or the binary arithmetic entropy encoding methods described in the JPEG standard. Lossless JPEG works on images with 2 to 16 bits per pixel, but performs best on images with 6 or more bits per pixel. For such images, the typical compression ratio achieved is 2:1. For image data with fewer bits per pixel, other compression schemes perform better.
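As a rough illustration of the DPCM idea (a sketch, not the standard's exact codec; lossless JPEG actually defines seven selectable predictors, of which the a + b - c rule below is one), the prediction residuals can be computed like this:

```python
import numpy as np

def dpcm_residuals(img):
    """Predict each pixel from its left (a), upper (b) and upper-left (c)
    neighbors using the a + b - c predictor, and return the residuals.
    Borders fall back to the single available neighbor (a simplification)."""
    img = img.astype(int)
    pred = np.zeros_like(img)
    pred[0, 1:] = img[0, :-1]                    # top row: left neighbor
    pred[1:, 0] = img[:-1, 0]                    # first column: upper neighbor
    pred[1:, 1:] = img[1:, :-1] + img[:-1, 1:] - img[:-1, :-1]   # a + b - c
    return img - pred

# Smooth images yield small residuals, which entropy coding compresses well;
# the decoder recovers the original exactly by inverting the prediction scan.
img = np.add.outer(np.arange(16), np.arange(16)) * 4   # a smooth gradient
res = dpcm_residuals(img)
print(np.abs(res).max(), "max residual vs", img.max(), "max pixel value")
```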
