CUDA ARCHITECTURE
Prepared By:
Shubham Agrawal (120750107060)
Jainam Jain (120750107036)
Dwij Erda (120750107025)
Affiliated To:
SHANKERSINH VAGHELA BAPU
INSTITUTE OF TECHNOLOGY
INTRODUCTION
CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA and implemented by the graphics processing units (GPUs) that they produce. CUDA gives program developers direct access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs.
Using CUDA, the GPUs can be used for general-purpose processing (i.e., not exclusively graphics); this approach is known as GPGPU. Unlike CPUs, however, GPUs have a parallel throughput architecture that emphasizes executing many concurrent threads slowly, rather than executing a single thread very quickly.
Concept of CUDA (GPGPU)
Latency Processor (CPU) + Throughput Processor (GPU)
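As a concrete taste of this model, a minimal CUDA kernel (an illustrative sketch, not taken from the slides; `vecAdd` and all sizes are hypothetical) lets thousands of lightweight GPU threads each process one array element, while the CPU only orchestrates:

```cuda
#include <cuda_runtime.h>

// Each GPU thread computes one output element. The CPU (latency
// processor) launches the work; the GPU (throughput processor)
// supplies the massively parallel arithmetic.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    // Global index of this thread within the whole grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)               // guard: the grid may be larger than n
        c[i] = a[i] + b[i];
}

// Launch sketch: 256 threads per block, enough blocks to cover n.
// vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
```

The triple-angle-bracket launch syntax is what distinguishes CUDA C from plain C: it specifies how many blocks and threads execute the kernel in parallel.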
HISTORY
The initial CUDA SDK was made public on 15 February 2007, for Microsoft Windows and Linux. Mac OS X support was later added in version 2.0, which supersedes the beta released February 14, 2008. CUDA works with all Nvidia GPUs from the G8x series onwards, including the GeForce, Quadro and Tesla lines. CUDA is compatible with most standard operating systems. Nvidia states that programs developed for the G8x series will also work without modification on all future Nvidia video cards, due to binary compatibility.
EVOLUTION
Nvidia Quadro:
Quadro K6000, Quadro K5000, Quadro K4000, Quadro K2000D, Quadro K2000, Quadro K600, Quadro 6000, Quadro 5000, Quadro 4000, Quadro 2000, Quadro 600, Quadro FX

Nvidia Tesla:
Tesla K40, Tesla K20X, Tesla K20, Tesla K10, Tesla C2050/2070, Tesla M2050/M2070, Tesla S2050, Tesla S1070, Tesla M1060, Tesla C1060, Tesla C870, Tesla D870, Tesla S870

Nvidia GeForce:
GeForce GTX Titan Z, GeForce GTX TITAN Black, GeForce GTX TITAN, GeForce GTX 780 Ti, GeForce GTX 780, GeForce GTX 770, GeForce GTX 760, GeForce GTX 750 Ti

Nvidia GeForce Mobile:
GeForce GTX 880M, GeForce GTX 870M, GeForce GTX 780M, GeForce GTX 770M, GeForce 860M, GeForce 850M, GeForce GTX 845M, GeForce 840M, GeForce 830M
PROCESSING FLOW
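The processing-flow figure from the original slide is not reproduced here; the standard flow it depicts (copy input data from CPU memory to GPU memory, launch the kernel, the GPU executes it in parallel, copy the results back) can be sketched as follows, with `scale` a hypothetical kernel:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *d, float k, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= k;
}

int main(void) {
    const int n = 1024;
    float h[1024];                     // host (CPU) buffer
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // 1. Copy input data from CPU memory to GPU global memory.
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // 2. Launch the kernel; 3. the GPU executes it in parallel.
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);

    // 4. Copy the results back to CPU memory.
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[0] = %.1f\n", h[0]);
    cudaFree(d);
    return 0;
}
```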
GPU ARCHITECTURE
Two Main Components
Global memory
Analogous to RAM in a CPU server
Accessible by both GPU and CPU
Currently up to 6 GB
Bandwidth currently up to 150 GB/s for Quadro and Tesla products
ECC on/off option for Quadro and Tesla products
Streaming Multiprocessors (SMs)
Perform the actual computations
Each SM has its own control units, registers, execution pipelines and caches
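The figures quoted above (memory size, bandwidth, ECC) vary by product; a short device-query sketch reads them from the CUDA runtime at run time:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    cudaDeviceProp p;
    cudaGetDeviceProperties(&p, 0);    // query device 0
    printf("Name:          %s\n",     p.name);
    printf("Global memory: %zu MB\n", p.totalGlobalMem >> 20);
    printf("SM count:      %d\n",     p.multiProcessorCount);
    printf("ECC enabled:   %d\n",     p.ECCEnabled);
    return 0;
}
```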
GPU ARCHITECTURE
Fermi: Streaming Multiprocessor (SM)
32 CUDA Cores per SM
32 fp32 ops/clock
16 fp64 ops/clock
32 int32 ops/clock
2 warp schedulers
Up to 1536 threads concurrently
4 special-function units
64 KB shared mem + L1 cache
32K 32-bit registers
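The shared memory on each SM is directly programmable from kernels. A hedged sketch of the most common use, a block-level sum kept in fast on-chip `__shared__` storage instead of global memory (`blockSum` and the 256-thread block size are illustrative choices):

```cuda
#include <cuda_runtime.h>

// Each block reduces 256 input values to one partial sum using the
// SM's fast on-chip shared memory instead of repeated global loads.
__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float buf[256];                 // lives in SM shared memory
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    buf[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                           // wait until all loads finish

    // Tree reduction within the block, halving active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = buf[0];    // one partial sum per block
}
```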
GPU ARCHITECTURE
Fermi: CUDA Core
Floating point & integer unit
IEEE 754-2008 floating-point standard
Fused multiply-add (FMA) instruction for both single and double precision
Logic unit
Move, compare unit
Branch unit
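FMA computes a*b + c with a single rounding step, as IEEE 754-2008 requires; in device code it is exposed through the `fmaf` (single) and `fma` (double) math functions. A small illustrative kernel (names are hypothetical):

```cuda
#include <cuda_runtime.h>

__global__ void fmaDemo(const float *a, const float *b,
                        const float *c, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        // Fused multiply-add: one instruction and one rounding,
        // versus two roundings for a separate multiply then add.
        out[i] = fmaf(a[i], b[i], c[i]);
}
```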
Recent Applications
GEFORCE GTX TITAN Z
NVIDIA TABLET
NVIDIA SHIELD
NVIDIA SHADOWPLAY
REFERENCES
www.nvidia.in
THANK
YOU