
2011 Data Compression Conference

PARALLEL PROCESSING OF DCT ON GPU


Serpil Tokdemir and S. Belkasim
Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA

The Graphics Processing Unit (GPU) is increasingly becoming an important alternative
for many applications that require real-time processing. More interestingly, digital image
processing applications such as image compression are closer than ever to being
processed in real time. In this paper we explore the implementation of the discrete
cosine transform (DCT) on the GPU. Our study indicates a clear superiority of the GPU
over the CPU as a parallel processor for image compression using DCT. It also indicates
that an increase in image size considerably slows the CPU while leaving the GPU largely
unaffected.
Digital multimedia image compression techniques require real-time streaming. To
achieve this, parallel processing of both the compression and decompression stages is
an absolute necessity. Many image compression techniques contain sections that perform
the same computation over many pixels, which makes image compression a prime
target for acceleration on the GPU [1]. GPU acceleration of the DCT computation is
enabled by the emergence of GPU programming languages such as Cg and CUDA
(Compute Unified Device Architecture), which can be programmed using C-like syntax [2].
The DCT compression/decompression has been implemented on both the GPU and the CPU
for images with sizes ranging from 90 Kb to 300 Kb.
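The paper does not include source code; purely as an illustration, a minimal CUDA sketch
of a per-block 8x8 2D DCT kernel could look like the following (all names here are
hypothetical, and the implementation described above used Cg rather than CUDA):

#include <math.h>

#define BS 8  /* JPEG-style 8x8 DCT block */

/* Naive 2D DCT-II of one 8x8 image block per CUDA thread block.
   Each thread computes one frequency coefficient F(u,v).
   Hypothetical sketch only, not the authors' published kernel. */
__global__ void dct8x8(const float *in, float *out, int width)
{
    int u  = threadIdx.x;          /* horizontal frequency index */
    int v  = threadIdx.y;          /* vertical frequency index   */
    int bx = blockIdx.x * BS;      /* block origin in the image  */
    int by = blockIdx.y * BS;

    /* Stage the 8x8 pixel block in shared memory. */
    __shared__ float s[BS][BS];
    s[threadIdx.y][threadIdx.x] =
        in[(by + threadIdx.y) * width + (bx + threadIdx.x)];
    __syncthreads();

    const float PI = 3.14159265f;
    float cu = (u == 0) ? rsqrtf(2.0f) : 1.0f;   /* C(u) normalization */
    float cv = (v == 0) ? rsqrtf(2.0f) : 1.0f;   /* C(v) normalization */

    float sum = 0.0f;
    for (int y = 0; y < BS; ++y)
        for (int x = 0; x < BS; ++x)
            sum += s[y][x]
                 * cosf((2 * x + 1) * u * PI / (2.0f * BS))
                 * cosf((2 * y + 1) * v * PI / (2.0f * BS));

    out[(by + v) * width + (bx + u)] = 0.25f * cu * cv * sum;
}

Launching dct8x8<<<dim3(width / BS, height / BS), dim3(BS, BS)>>>(d_in, d_out, width)
assigns one 8x8 image block to each thread block, assuming the image dimensions are
multiples of 8; each thread then computes a single frequency coefficient.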
[Chart "DCT Performance": processing time (y-axis, "Time") for GPU Time versus CPU Time, plotted against image size (x-axis, "Size (Kb)").]
Figure 1: Original image. Figure 2: IDCT. Figure 3: Processing time versus image size.
A sample image and its inverse DCT version are shown in Figure 1 and Figure 2. As is
evident from the chart in Figure 3, the CPU time increases as the image size increases,
whereas an increase in image size results in only a negligible increase in GPU
processing time. We noticed that uploading the image from the CPU to the GPU requires
more time than the entire GPU computation itself. In light of this, we can conclude that
implementing DCT image compression on the GPU is far superior to computing it solely
on the CPU.
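The transfer-versus-compute observation can be quantified with CUDA events; a
hypothetical host-side sketch (buffer names h_img, d_in, d_out and the dct8x8 kernel
from the sketch above are illustrative, not from the paper) is:

/* Separate the CPU-to-GPU upload time from the DCT kernel time. */
cudaEvent_t t0, t1, t2;
cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);

cudaEventRecord(t0);
cudaMemcpy(d_in, h_img, bytes, cudaMemcpyHostToDevice);  /* upload */
cudaEventRecord(t1);
dct8x8<<<dim3(width / 8, height / 8), dim3(8, 8)>>>(d_in, d_out, width);
cudaEventRecord(t2);
cudaEventSynchronize(t2);

float upload_ms = 0.0f, kernel_ms = 0.0f;
cudaEventElapsedTime(&upload_ms, t0, t1);   /* transfer time */
cudaEventElapsedTime(&kernel_ms, t1, t2);   /* compute time  */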
Our experimental results clearly indicate that the GPU is much more efficient than the
CPU as a parallel processor for DCT image compression. They also indicate that an
increase in image size slows the CPU while the GPU is not affected. Although we gained
considerable time by processing the DCT blocks in parallel on the GPU, we encountered
some limitations with Cg, since Cg was not designed for general-purpose programming
on the GPU.

References
[1] J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU
Computing," Proceedings of the IEEE, vol. 96, no. 5, pp. 879-899, May 2008.
[2] V. Simek and R. R. ASN, "GPU Acceleration of 2D-DWT Image Compression in
MATLAB with CUDA," Proceedings of the 2nd UKSim European Symposium on
Computer Modelling and Simulation, Liverpool, UK, IEEE CS, pp. 274-277, 2008.
