Introduction to OpenCL and GPU Programming

Katharine Hyatt

February 12, 2012

Outline

1 Basics

2 GPGPU Concepts

3 Beginning Example

4 Getting Code Working

1 Basics

What is OpenCL?

Extension of the C programming language
Allows control of heterogeneous systems
Code runs on CPU, GPU, Cell
Open standard developed by the Khronos Group's OpenCL working group

Why use a GPU?

GPUs designed for massively parallel computing
Multiple-instruction-multiple-data architecture
Use asynchronous control between host and device
Effective for some CPU-infeasible problems
Far cheaper per GFLOP than CPUs

OpenCL vs Alternatives

OpenCL is a cross-platform open standard
CUDA has closed-source components
So far, CUDA only works on nVidia
MP is CPU-only
OpenCL ecosystem is less developed

What do you need?

Any computer with a CPU or Cell
Can also have a GPU (more parallelism!)
Developer driver and OpenCL compiler

2 GPGPU Concepts

How is it different?

GPUs have more restrictions than CPUs
Designed for one task, not many
Performance greatly affected by two factors:
    Memory access pattern
    Instruction configuration
Must keep track of memory spaces!

Talking to the GPU

All functions are controlled from CPU
CPU launches a GPU function (kernel)
CPU regains control before function finishes
Memory transfers can occur alongside computation

Kernels

Called from host - execute on device
Function instances execute concurrently on threads
Must tell device how many threads to use
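As a rough CPU-side illustration of these points (plain C, not OpenCL itself), a kernel can be pictured as a function that is handed its own index, and a launch as running one instance per index. The names `scale_kernel` and `launch_1d` are hypothetical, and a real device would run the instances concurrently rather than in a loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "kernel": each instance handles exactly one element. */
void scale_kernel(size_t gid, float *data, float factor)
{
    data[gid] *= factor;
}

/* Hypothetical launcher: the caller states how many instances to run.
 * A device would run them concurrently; this sketch runs them in order. */
void launch_1d(size_t num_instances, float *data, float factor)
{
    assert(data != NULL);
    for (size_t gid = 0; gid < num_instances; ++gid) {
        scale_kernel(gid, data, factor);
    }
}
```

Calling `launch_1d(4, data, 2.0f)` on a four-element array doubles every element, with each element touched by exactly one instance.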

More Kernels

Device performs identical operations on data
Launch kernels using task queue
Information about kernel given to device
    How many work-groups and work-items needed?
    Which arguments does kernel take?
Function definition passed as a string

    const char *particles = "__kernel void update_state(__global float4 *p …
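To make "passed as a string" concrete, here is a minimal sketch of kernel source held in an ordinary C string. The kernel `scale`, the variable `example_source`, and the helper `source_length` are hypothetical and are not the kernel used in this talk. Such a string is what `clCreateProgramWithSource` receives; its lengths argument may be NULL when the strings are null-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kernel source: the device code is just text on the host.
 * Adjacent string literals are concatenated by the C compiler. */
const char *example_source =
    "__kernel void scale(__global float *data, const float factor)\n"
    "{\n"
    "    size_t gid = get_global_id(0);\n"
    "    data[gid] *= factor;\n"
    "}\n";

/* Hypothetical helper: length of a null-terminated source string. */
size_t source_length(const char *source)
{
    assert(source != NULL);
    return strlen(source);
}
```

Because the source is only text until the program is built at runtime, mistakes in it show up when the program is built, not when the host code is compiled.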

Work-groups and Work-items


Logical structures used to group processing
Work-groups processed independently on device cores
Each work-group contains an integer number of wavefronts
Scheduling of work-groups handled by GPU
Can create more work-groups than cores
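The arithmetic behind choosing a number of work-groups can be sketched in plain C. The helper names below are hypothetical. The rounding reflects the OpenCL 1.x rule that the global size must be a multiple of the work-group size when a work-group size is specified, so a kernel launched this way would usually need to skip indices past the real element count.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups needed to cover num_items work-items. */
size_t num_work_groups(size_t num_items, size_t local_size)
{
    assert(local_size > 0);
    return (num_items + local_size - 1) / local_size;  /* ceiling division */
}

/* Global size rounded up to a whole number of work-groups. */
size_t padded_global_size(size_t num_items, size_t local_size)
{
    return num_work_groups(num_items, local_size) * local_size;
}
```

For 1000 particles and a work-group size of 64, this gives 16 work-groups and a padded global size of 1024.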

Wavefronts

Work-items per wavefront is device dependent
    nVidia and some AMD cards - 32
    Newer AMD cards - 64
Different instructions within a wavefront cause serialization
Different instructions between wavefronts are fine
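A deliberately simplified cost model shows why divergence inside a wavefront hurts. The function `passes_for_wavefront` is hypothetical, and it assumes that each side of an if/else taken by at least one work-item costs the whole wavefront one pass; real hardware is more nuanced.

```c
#include <assert.h>
#include <stddef.h>

/* Count the passes one wavefront needs for an if/else under the
 * simplified model: one pass per branch side that any work-item takes. */
int passes_for_wavefront(const int *takes_if, size_t wavefront_size)
{
    int any_if = 0;
    int any_else = 0;

    assert(takes_if != NULL && wavefront_size > 0);
    for (size_t i = 0; i < wavefront_size; ++i) {
        if (takes_if[i]) {
            any_if = 1;
        } else {
            any_else = 1;
        }
    }
    return any_if + any_else;  /* 1 if uniform, 2 if divergent */
}
```

Under this model, two wavefronts that each take a different branch uniformly cost one pass apiece, while a single wavefront whose work-items disagree pays for both branches.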

Synchronization

Two types of synchronization
    Work-group
        Work-items in wavefront execute same instruction
        No work-item proceeds until all finished
    Command
        Orders commands in instruction queue
        Changes to memory values are seen by subsequent commands
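Work-group synchronization can be emulated on a CPU by splitting the work into two loops. In this sketch (hypothetical names, plain C rather than OpenCL C) the boundary between the loops stands in for the point where a kernel would call a work-group barrier such as `barrier(CLK_LOCAL_MEM_FENCE)`.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 8  /* hypothetical work-group size */

/* Emulates one work-group reversing its GROUP_SIZE elements in place. */
void reverse_in_group(float *data)
{
    float scratch[GROUP_SIZE];  /* stands in for memory shared by the group */

    assert(data != NULL);

    /* Phase 1: every work-item writes its own element into scratch. */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {
        scratch[lid] = data[lid];
    }

    /* The end of the first loop plays the role of the barrier:
     * no work-item starts phase 2 until all have finished phase 1. */

    /* Phase 2: every work-item reads an element written by another. */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {
        data[lid] = scratch[GROUP_SIZE - 1 - lid];
    }
}
```

Without the synchronization point, a work-item in phase 2 could read a scratch slot that its neighbour has not written yet.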

Host and Device Memory Spaces

Generally, GPU cannot access CPU memory
CPU indirectly accesses GPU through API
Can map CPU pointers to GPU
Kernels on separate devices
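The idea of separate memory spaces can be mimicked in plain C with a buffer the host only touches through explicit copy calls. Everything prefixed `fake_` is hypothetical and merely stands in for the real API; no OpenCL is involved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for device memory: a separate allocation that the
 * host reaches only through the copy functions below. */
typedef struct {
    float *storage;
    size_t count;
} fake_device_buffer;

/* Allocate "device" storage and copy the host data into it. */
fake_device_buffer fake_create_buffer(const float *host_src, size_t count)
{
    fake_device_buffer buf;

    assert(host_src != NULL && count > 0);
    buf.storage = malloc(count * sizeof(float));
    assert(buf.storage != NULL);
    buf.count = count;
    memcpy(buf.storage, host_src, count * sizeof(float));  /* host -> device */
    return buf;
}

/* Copy the "device" contents back into a host array. */
void fake_read_buffer(const fake_device_buffer *buf, float *host_dst)
{
    assert(buf != NULL && host_dst != NULL);
    memcpy(host_dst, buf->storage, buf->count * sizeof(float));  /* device -> host */
}

/* Free the "device" storage. */
void fake_release_buffer(fake_device_buffer *buf)
{
    assert(buf != NULL);
    free(buf->storage);
    buf->storage = NULL;
    buf->count = 0;
}
```

Once the copy has been made, later changes to the host array are not visible on the "device" side until another explicit transfer happens, which is the bookkeeping the slide warns about.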

Within The Kernel

Kernel can discover information about itself
    Location within a work-group
    Which work-group contains the kernel instance
Kernel can access three types of memory
Will use this later during example
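In OpenCL C a kernel instance obtains this information from built-ins such as `get_global_id`, `get_group_id` and `get_local_id`. The relationship between them, for one dimension and a zero global offset, can be sketched in plain C; the type `work_item_location` and the function `locate` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t group_id;  /* which work-group contains this instance */
    size_t local_id;  /* location within that work-group */
} work_item_location;

/* Split a one-dimensional global index into group and local parts. */
work_item_location locate(size_t global_id, size_t local_size)
{
    work_item_location loc;

    assert(local_size > 0);
    loc.group_id = global_id / local_size;
    loc.local_id = global_id % local_size;
    return loc;
}
```

With a work-group size of 64, global index 70 belongs to work-group 1 at local position 6, and `group_id * local_size + local_id` recovers the global index.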

Kernel Memory

Global memory - RAM on GPU
    Far from computing chip - slow access
    More space than any other type
Local memory - shared within work-group
    Physically close to chip - fast access
    Small amount of space available

Kernel Memory

Private memory - unique to work-item
    Most non-local variables declared within kernel
    Stored in global memory - slow
Registers - unique to work-item
    Similar to CPU registers
    Physically close to chip - fast access

What does CPU control mean?

GPU memory managed from CPU code
All kernels launched from CPU

3 Beginning Example

Physical Problem

n point charges affected by potential
Source located at (0, 0)
Potential has a form depending on r
For now, particles don't interact
Write OpenCL to model system

Getting Started

Need to include relevant libraries
Initialize OpenCL API
Must detect and select usable devices
Set up command queue and context
Specify runtime compilation of kernels

Preprocessor

    #include <CL/cl.h>   // include the OpenCL library
    #include <stdio.h>
    #include <stdlib.h>  // malloc, used for the host arrays
    #include <math.h>    // cos, sin, M_PI, used for the initial positions

Starting OpenCL and finding a GPU

    cl_platform_id platform;  // finding an appropriate platform
    clGetPlatformIDs(1, &platform, NULL);  // only look for one
    cl_device_id device;  // finding an appropriate GPU
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);  // only …

Command queue and the context

    cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue(context, device, 0, NULL);

Building a kernel

    cl_program program = clCreateProgramWithSource(context, 1, &particles, NULL, NULL);
    clBuildProgram(program, 1, &device, NULL, NULL, NULL);
    // the name must match the kernel function defined in the source string
    cl_kernel kernel = clCreateKernel(program, "update_state", NULL);

Setting Up CPU Storage

Create initial state first on CPU
Must copy state to GPU
Use same data structure for arrays
Choose one efficient for device architecture
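One way to picture "the same data structure on both sides" is a four-float element that carries a particle's 2D position together with its charge. The type `vec4f` and both helpers below are hypothetical stand-ins for the OpenCL vector type; storing the charge in the fourth component follows the host code in this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side stand-in for a 4-component float vector. */
typedef struct {
    float x, y, z, w;
} vec4f;

/* Pack a 2D position and a charge into one element:
 * x and y hold the position, w holds the charge, z is unused here. */
vec4f pack_particle(float x, float y, float charge)
{
    vec4f p = { x, y, 0.0f, charge };
    return p;
}

/* Read the charge back out of a packed particle. */
float particle_charge(const vec4f *p)
{
    assert(p != NULL);
    return p->w;
}
```

Keeping each particle in one 16-byte element means the host array can be copied to the device as a single contiguous block, with no per-field repacking.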

Allocating and filling host arrays

    cl_float4 *h_pos = (cl_float4 *) malloc(num_particles * sizeof(cl_float4));
    cl_float4 *h_vel = (cl_float4 *) malloc(num_particles * sizeof(cl_float4));
    for (int ii = 0; ii < num_particles; ++ii) {
        // Storing the charge of particle in pos
        h_pos[ii].w = (float)(ii % 2) * (-1) * (ii % 3 + 1);
        // Let's start all our charges in a ring around the source
        h_pos[ii].x = 2 * cos(2 * M_PI * (float) ii / num_particles);
        h_pos[ii].y = 2 * sin(2 * M_PI * (float) ii / num_particles);
        // And keep them stationary
        h_vel[ii].x = 0.f;
        h_vel[ii].y = 0.f;
    }

Creating GPU Storage

Create typed buffers to store data
Copy data from host to device
Can do both with clCreateBuffer call
Want to pick appropriate data structure
    Vectors better than scalars on AMD
    Store position and velocities in float4

GPU Arrays

    cl_mem pos = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                num_particles * sizeof(cl_float4), h_pos, NULL);
    cl_mem vel = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                num_particles * sizeof(cl_float4), h_vel, NULL);

Calling the Kernel

Set kernel arguments
Push kernel launch into task queue
Launch kernel once for each iteration

Setting Arguments and Launching

    // Buffer arguments are passed as cl_mem handles
    clSetKernelArg(particles, 0, sizeof(cl_mem), &pos);
    clSetKernelArg(particles, 1, sizeof(cl_mem), &vel);
    clSetKernelArg(particles, 2, sizeof(float), &strength);
    clSetKernelArg(particles, 3, sizeof(float), &delta_t);
    // The kernel also takes the particle count (assumed here to be a cl_int)
    clSetKernelArg(particles, 4, sizeof(cl_int), &num_particles);
    for (int jj = 0; jj < num_iterations; ++jj)
    {
        // One work dimension, no offset, runtime-chosen work-group size, no events
        clEnqueueNDRangeKernel(queue, particles, 1, NULL, &g_work_size, NULL,
                               0, NULL, NULL);
    }
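The slides do not show how g_work_size is chosen. One common pattern, assumed here, is to round the particle count up to a multiple of the work-group size when an explicit local size is passed. The extra work-items this creates are the reason a kernel needs a bounds check. The helper name below is illustrative.

```c
#include <stddef.h>

/* Round n up to the next multiple of group.
 * group must be nonzero. */
size_t round_up_global_size(size_t n, size_t group)
{
    size_t remainder = n % group;
    return remainder == 0 ? n : n + (group - remainder);
}
```

For example, 1000 particles with a work-group size of 64 would give a global size of 1024, so 24 work-items have no particle to update.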

Writing the Kernel

Designate a function as a kernel using __kernel
Must designate where arguments reside (for example __global)
Particles don't interact, so use one array
One-to-one map between threads and elements
Need to find the thread number

Beginning Kernel

    __kernel void update_state(__global float4* pos, __global float4* vel,
                               float strength, float delta_t, int num_particles)
    {
        // Figure out which particle we are handling
        uint current = get_global_id(0);

Updating the State

Need to pick an integration scheme
Euler is easy, but unstable

$$x(t + \Delta t) = x(t) + \Delta t \, v_x(t)$$
$$v_x(t + \Delta t) = v_x(t) + \Delta t \, a_x(t)$$

Find the particle's position in polar coordinates
Update position, then velocity
Avoid array overruns
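To see why explicit Euler is called unstable, here is a small sketch in plain C. It applies the two update equations to a unit-frequency harmonic oscillator (a = -x), which is an illustrative stand-in and not the force law used in the slides. For this system each Euler step multiplies x^2 + v^2 by exactly (1 + dt^2), so the energy grows without bound even though the true solution conserves it. All names are hypothetical.

```c
#include <stddef.h>

/* State of a unit-frequency harmonic oscillator: x' = v, v' = -x. */
typedef struct { double x, v; } osc_state;

/* One explicit Euler step: both updates use values from time t. */
osc_state euler_step(osc_state s, double dt)
{
    osc_state next;
    next.x = s.x + dt * s.v;       /* x(t + dt) = x(t) + dt * v(t) */
    next.v = s.v + dt * (-s.x);    /* v(t + dt) = v(t) + dt * a(t), a = -x */
    return next;
}

/* Twice the energy, x^2 + v^2; constant for the exact solution. */
double energy2(osc_state s)
{
    return s.x * s.x + s.v * s.v;
}

/* Advance n steps and return the final state. */
osc_state euler_run(osc_state s, double dt, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        s = euler_step(s, dt);
    return s;
}
```

Starting from x = 1, v = 0 with dt = 0.1, the quantity x^2 + v^2 grows by a factor of 1.01 per step. A smaller time step slows the drift but does not remove it.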

Kernel Body

        if (current < num_particles)
        {
            // Calculate new acceleration
            float4 accel;
            // The slide cuts this line off after "+ po"; the y * y term is assumed
            accel.w = pos[current].w * strength /
                      (pos[current].x * pos[current].x + pos[current].y * pos[current].y);
            // atan2 takes y and x as separate arguments
            float theta = atan2(pos[current].y, pos[current].x);
            accel.x = accel.w * cos(theta);
            accel.y = accel.w * sin(theta);
            // Find new positions and velocities
            pos[current].x += delta_t * vel[current].x;
            pos[current].y += delta_t * vel[current].y;
            vel[current].x += delta_t * accel.x;
            vel[current].y += delta_t * accel.y;
        }
    }

Moving and Compiling

scp code to high-fructose-corn-syrup.csclub.uwaterloo.ca:
ssh in to this machine
Developer drivers and compiler are already installed
Two steps are necessary:
  Compile code with g++
  Link against the OpenCL library

Compilation Steps

    g++ -o mycode.o -DATI_OS_LINUX -c mycode.cl -I$ATISTREAMSDKROOT/include
    g++ -o mycode mycode.o -lOpenCL -L$ATISTREAMSDKROOT/lib/x86_64

Testing

Kernels are compiled just-in-time (JIT), so compiler options are passed at that point
gdb can be used to test the program
Breakpoints can also be set in the kernel
Let's see if the program works

Using gdb

Before launching gdb, set:
    AMD_OCL_BUILD_OPTIONS="-g -O0"
Then run:
    gdb mycode
Use r to run the code

Improvements

Incorporate OpenGL: graph particle positions
More accurate simulation: make particles interact
Use local memory to speed up the kernel
Do time iteration within the kernel
Use the AMD Profiler to analyze the code

Good Practices

Keep work-items within a wavefront instruction-coherent
Use local and register memory
Use an appropriate data structure for the architecture
Minimize control flow instructions within the kernel

Learning More

Khronos Group's OpenCL spec: http://www.khronos.org/opencl/
AMD's OpenCL tutorials and documentation: http://developer.amd.com/
NVIDIA's OpenCL sample code: http://developer.nvidia.com/opencl
Heterogeneous Computing with OpenCL (book): CSC has copies

Questions?

OpenCL contest

Two categories:
  Open submission - make something awesome!
  Problem - ...
Contest code party - March 2, 2012
Win a laptop or graphics card!
