
CUDA ARCHITECTURE

Prepared By:
Shubham Agrawal (120750107060)
Jainam Jain (120750107036)
Dwij Erda (120750107025)

Affiliated To:
SHANKERSINH VAGHELA BAPU
INSTITUTE OF TECHNOLOGY

INTRODUCTION
CUDA (Compute Unified Device
Architecture) is a parallel
computing platform and programming
model created by NVIDIA and
implemented by the graphics processing
units (GPUs) that they produce. CUDA
gives program developers direct access
to the virtual instruction set and
memory of the parallel computational
elements in CUDA GPUs.
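Using CUDA, the GPUs can be used for
general-purpose processing (i.e., not
exclusively graphics); this approach is
known as GPGPU. Unlike CPUs, however,
GPUs have a parallel throughput architecture
that emphasizes executing many concurrent
threads rather than a single thread very quickly.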

The CUDA platform is accessible to software
developers through CUDA-accelerated
libraries, compiler directives and extensions to
industry-standard programming languages,
including C, C++ and Fortran.
In the computer game industry, GPUs are used
not only for graphics rendering but also in game
physics calculations (physical effects like debris,
smoke, fire, fluids); examples
include PhysX and Bullet. CUDA has also been
used to accelerate non-graphical applications
in computational biology, cryptography and other
fields by an order of magnitude or more.
CUDA provides both a low-level API (Application
Programming Interface) and a higher-level API.
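
The programming model described above can be illustrated without a GPU. The following is a CPU-only C++ sketch, not CUDA code: it emulates how a grid of thread blocks covers an array, using the same global-index formula a CUDA kernel would use (block index times block size plus thread index). The function name emulated_vector_add and its parameters are hypothetical, chosen only for this illustration; on a real GPU the two loops would not exist, because every (block, thread) pair runs as its own hardware thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU-only emulation of how a CUDA grid covers an array.
// In real CUDA code the block index, block size and thread index are
// built-in variables; here they are ordinary loop counters (illustrative).
std::vector<float> emulated_vector_add(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       std::size_t block_dim) {
    assert(a.size() == b.size() && block_dim > 0);
    const std::size_t n = a.size();
    std::vector<float> c(n);
    // Round up so the last, partially filled block is still "launched".
    const std::size_t grid_dim = (n + block_dim - 1) / block_dim;
    for (std::size_t block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (std::size_t thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            // Same global-index formula a CUDA kernel would use.
            const std::size_t i = block_idx * block_dim + thread_idx;
            if (i < n) {  // guard: surplus threads in the last block do nothing
                c[i] = a[i] + b[i];
            }
        }
    }
    return c;
}
```

With five elements and a block size of 2, three blocks are emulated and the sixth "thread" is skipped by the bounds guard.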

Concept of CUDA (GPGPU)
Latency Processor + Throughput Processor

HISTORY
The initial CUDA SDK was made public on 15
February 2007, for Microsoft Windows
and Linux. Mac OS X support was later added in
version 2.0, which supersedes the beta released
14 February 2008. CUDA works with all Nvidia
GPUs from the G8x series onwards,
including GeForce, Quadro and the Tesla line.
CUDA is compatible with most standard
operating systems. Nvidia states that programs
developed for the G8x series will also work
without modification on all future Nvidia video
cards, due to binary compatibility.

EVALUATION
Nvidia Quadro
Quadro K6000
Quadro K5000
Quadro K4000
Quadro K2000D
Quadro K2000
Quadro K600
Quadro 6000
Quadro 5000
Quadro 4000
Quadro 2000
Quadro 600
Quadro FX

Nvidia Tesla
Tesla K40
Tesla K20X
Tesla K20
Tesla K10
Tesla C2050/2070
Tesla M2050/M2070
Tesla S2050
Tesla S1070
Tesla M1060
Tesla C1060
Tesla C870
Tesla D870
Tesla S870

Nvidia GeForce Mobile
GeForce GTX 880M
GeForce GTX 870M
GeForce GTX 860M
GeForce GTX 850M
GeForce 845M
GeForce 840M
GeForce 830M
GeForce GTX 780M
GeForce GTX 770M

Nvidia GeForce
GeForce GTX TITAN Z
GeForce GTX TITAN Black
GeForce GTX TITAN
GeForce GTX 780 Ti
GeForce GTX 780
GeForce GTX 770
GeForce GTX 760
GeForce GTX 750 Ti

PROCESSING FLOW

GPU ARCHITECTURE
Two Main Components
Global memory
  Analogous to RAM in a CPU server
  Accessible by both GPU and CPU
  Currently up to 6 GB
  Bandwidth currently up to 150 GB/s for Quadro and Tesla products
  ECC on/off option for Quadro and Tesla products
Streaming Multiprocessors (SMs)
  Perform the actual computations
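
To put the global memory figures in perspective, here is a small C++ sketch of a bandwidth-bound estimate: the minimum time needed to stream a buffer through global memory once. It assumes decimal gigabytes and that the quoted peak bandwidth is fully reached, which real kernels rarely manage; the helper name min_transfer_seconds is hypothetical.

```cpp
#include <cassert>

// Lower bound on the time to read `bytes` from global memory once,
// assuming the quoted peak bandwidth is fully achieved (an idealization).
inline double min_transfer_seconds(double bytes, double bytes_per_second) {
    assert(bytes >= 0.0 && bytes_per_second > 0.0);
    return bytes / bytes_per_second;
}

const double kGB = 1e9;                          // assumption: decimal gigabytes
const double kGlobalMemoryBytes = 6.0 * kGB;     // "up to 6 GB"
const double kPeakBandwidth = 150.0 * kGB;       // "up to 150 GB/s"

// One full pass over a 6 GB global memory at 150 GB/s: 6 / 150 = 0.04 s.
const double kFullSweepSeconds =
    min_transfer_seconds(kGlobalMemoryBytes, kPeakBandwidth);
```

Under these assumptions a kernel that touches every byte of a 6 GB memory once cannot finish in less than about 40 ms, however many cores are available.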

GPU ARCHITECTURE
Fermi: Streaming Multiprocessor (SM)
32 CUDA Cores per SM
32 fp32 ops/clock
16 fp64 ops/clock
32 int32 ops/clock
2 warp schedulers
Up to 1536 threads concurrently
4 special-function units
64 KB shared memory + L1 cache
32K 32-bit registers
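
The Fermi SM figures listed above imply a few derived numbers that are useful when sizing kernels. The C++ sketch below works them out. Two inputs are assumptions that do not appear in the slides: a warp size of 32 threads, and counting each fp32 operation as a fused multiply-add worth two floating-point operations. The clock frequency is left as a parameter because the slides do not state one, and all function names are hypothetical.

```cpp
#include <cassert>

constexpr int kMaxThreadsPerSM = 1536;       // "Up to 1536 threads concurrently"
constexpr int kRegistersPerSM = 32 * 1024;   // "32K 32-bit registers"
constexpr int kFp32OpsPerClock = 32;         // "32 fp32 ops/clock"
constexpr int kWarpSize = 32;                // assumption: not stated in the slides

// Warps that can be resident on one SM: 1536 / 32 = 48.
constexpr int max_resident_warps() {
    return kMaxThreadsPerSM / kWarpSize;
}

// Registers available per thread when all 1536 threads are resident:
// 32768 / 1536 = 21 (integer division; the exact ratio is about 21.3).
constexpr int registers_per_thread_at_full_occupancy() {
    return kRegistersPerSM / kMaxThreadsPerSM;
}

// Peak single-precision GFLOP/s of one SM at a given clock (in GHz),
// assuming every operation is an FMA counted as two flops.
inline double peak_fp32_gflops_per_sm(double clock_ghz) {
    assert(clock_ghz > 0.0);
    return kFp32OpsPerClock * 2.0 * clock_ghz;
}
```

A kernel that needs more than 21 registers per thread cannot keep all 1536 threads resident on one SM, which is one reason register usage affects occupancy.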

GPU ARCHITECTURE
Fermi: CUDA Core
Floating point & Integer unit
  IEEE 754-2008 floating-point standard
  Fused multiply-add (FMA) instruction for both single and double precision
Logic unit
Move, compare unit
Branch unit
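
The practical effect of a fused multiply-add is that a*b + c is rounded once instead of twice. The C++ sketch below demonstrates this on the CPU with the standard library's std::fma, which follows the same IEEE 754-2008 definition; it is not GPU code, and the struct and function names are hypothetical. The inputs are chosen so that the exact product, 1 - 2^-60, does not fit in a double and rounds to 1.0 when the multiplication is performed on its own.

```cpp
#include <cassert>
#include <cmath>

struct FmaDemo {
    double unfused;  // multiply, round, then add
    double fused;    // multiply and add with a single rounding
};

inline FmaDemo fma_rounding_demo() {
    const double eps = std::ldexp(1.0, -30);  // 2^-30
    const double a = 1.0 + eps;
    const double b = 1.0 - eps;
    // Exact product is 1 - 2^-60, which needs more than 53 significand bits.
    // `volatile` forces the rounded product to be stored, so the compiler
    // cannot silently fuse the two operations itself.
    volatile double product = a * b;  // rounds to exactly 1.0
    FmaDemo result;
    result.unfused = product - 1.0;          // 0.0: the tiny term was lost
    result.fused = std::fma(a, b, -1.0);     // -2^-60: the tiny term survives
    return result;
}
```

The unfused path returns 0.0 while the fused path returns -2^-60, the exact answer. This is why FMA support in both single and double precision matters for numerical code such as dot products and polynomial evaluation.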

Recent Applications
GEFORCE GTX TITAN Z
NVIDIA TABLET
NVIDIA SHIELD
NVIDIA SHADOWPLAY

REFERENCES

We are very thankful to:

Google, for images and gadgets
Wikipedia, for information
YouTube, for video
And, most importantly, the official Nvidia site:

www.nvidia.in

THANK YOU
