PostgreSQL OpenCL Procedural Language #Pgeast March 2011

Introducing PgOpenCL
A New PostgreSQL
Procedural Language
Unlocking the Power of the GPU!
By
Tim Child
Bio
Tim Child
• 35 years experience of software development
• Formerly
• VP Oracle Corporation
• VP BEA Systems Inc.
• VP Informix
• Leader at Illustra, Autodesk, Navteq, Intuit, …
• 30+ years experience in 3D, CAD, GIS and DBMS
Terminology
Term Description
Procedure Language Language for SQL Procedures (e.g. PgPLSQL, Perl, TCL, Java, … )
GPU Graphics Processing Unit (highly specialized CPU for graphics)
GPGPU General Purpose GPU (non-graphics programming on a GPU)
CUDA Nvidia’s GPU programming environment
AMD Fusion Accelerated Processing Unit (AMD’s Hybrid CPU & GPU chip)
AVX Advanced Vector Extensions
Intel SandyBridge Multi-core CPU with AVX Instructions
ISO C99 Modern standard version of the C language
OpenCL Open Compute Language
OpenMP Open Multi-Processing (parallelizing compilers)
SIMD Single Instruction Multiple Data (Vector instructions )
SSE x86, x64 (Intel, AMD) Streaming SIMD Extensions
ALU Arithmetic Logic Unit CPU components that does Arithmetic
Kernel Functions that execute on a OpenCL Device

Work Item Instance of a Kernel
Workgroup A group of Work Items

Compute Intensive
Database Applications
• Bioinformatics
• Signal/Audio/Image Processing/Video
• Data Mining & Analytics
• Searching
• Sorting
• Spatial Selections and Joins
• Map/Reduce
• Scientific Computing
• Database Internal Operations
• Many Others …
GPU vs CPU
Vendor NVidia ATI Radeon Intel
Architecture Fermi Evergreen Nehalem
Cores 448 1600 4
Simple Simple Complex
Transistors 3.1 B 2.15 B 731 M
Clock 1.5 G Hz 851 M Hz 3 G Hz
Peak Float 1500 G 2720 G 96 G
Performance FLOP / s FLOP / s FLOP / s
Peak Double 750 G 544 G 48 G
Performance FLOP / s FLOP / s FLOP / s
Memory ~ 190 G / s ~ 153 G / s ~ 30 G / s
Bandwidth
Power 250 W > 250 W 80 W
Consumption
SIMD / Vector Many Many SSE4+
Instructions
Multi-Core Performance
5000 AMD
6990
Mar
2011
Source NVidia
Scalar vs. SIMD
Scalar Instruction
C=A+B 1 + 2 = 3
SIMD Instruction 1 3 5 7
+
Vector C = Vector A + Vector B 2 4 6 8
=
3 7 11 15
OpenCL
Vector lengths 2,4,8,16 for char, short, int, float, double
Scaling PostgreSQL Queries
on xPU’s
Multi-Core CPU Many Core GPU
PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL

Threads Threads Threads Threads Threads Threads Threads Thread Thread
PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL

Postgres Threads Threads Threads Thread Thread
Process
PgOpenCL PgOpenCL PgOpenCL PgOpenCL

PgOpenCL
Threads Threads Thread Thread
Threads
Using More
Transistors
Parallel
Programming Systems
Category CUDA OpenMP OpenCL
Language C C, Fortran C
Cross Platform X √ √
Standard Vendor OpenMP Khronos
CPU X √ √
GPU √ X √
Clusters X √ X
Compilation / Link Static Static Dynamic

What is OpenCL?
• OpenCL - Open Compute Language
– Subset of C 99
– Open Specification
– Proposed by Apple
– Many Companies Collaborated on the Specification
– Portable, Device Agnostic
– Specification maintained by Khronos Group
• PgOpenCL
– OpenCL as a PostgreSQL Procedural Language
System Overview
DBMS Server
PgOpenCL
PgOpenCL
Web HTTP Web SQL SQL
SQL
Browser Server Statement Procedure
Procedure
PCIe x2 Bus
TCP/IP
App
PostgreSQL GPGPU
Server
Disk I/O Tables

TCP/IP
PostgreSQL
Client
OpenCL
Language
• A subset of ISO C99
– - But without some C99 features such as standard C99 headers,
– function pointers, recursion, variable length arrays, and bit fields
• A superset of ISO C99 with additions for:
– - Work-items and Workgroups
– - Vector types
– - Synchronization
– - Address space qualifiers
• Also includes a large set of built-in functions
– - Image manipulation
– - Work-item manipulation,
– - Specialized math routines, etc.
PgOpenCL
Components
• New PostgreSQL Procedural Language
– Language Handler
• Determines N Dimensional Range Work Group Size ( Number of Threads)
• Maps Arguments to Buffers
• Retains Buffers for Reuse
• Calls Kernel Function
• Returns results
– Language Validator
• Creates Function – Kernel Binding
– Parameter Modes, Names, Types, Qualifiers and Attribute
• Compiles
– Syntax Checking
– Generate Program Binary
• New data types
– cl_double4, cl_double8, ….
• System Admin Pseudo-Tables
– Platform, Device, Run-Time, …
• Map – Increase Function
– Maps a Query to GPU sized Chunks
PgOpenCL
Admin
PGOpenCL
Function Declaration
CREATE or REPLACE FUNCTION VectorAdd(IN Id int[], IN a real[], IN B real[], OUT C
real[] )
AS $BODY$
__kernel void VectorAdd( __global int * id,

__global float *a,
__global float *b,
__global float *c)
{
int i = get_global_id(0); /* Query OpenCL for the Array Subscript **/
c[i] = a[i] + b[i];

}
$BODY$
Language PgOpenCL;
Running the SQL
--- Create a vectors table and insert values into columns a and b
Create table vectors ( id serial, a float[4], b

float[4], c float[4] );
Insert into vectors ( a, b) values ( …. ), ( … );
-- Select the OpenCL Platform and Device
Select OpenCL.SetPlatform(‘ATI Stream’, ‘GPU’);
-- Run the OpenCL Vector Add Function as an Update statement
Update vectors set vectors.c = temp.c
From
Select Vectoradd( Array( select Row( id, a, b,
C) from vectors) ) as temp
Where vectors.id = id;

PgOpenCL
Execution Model
A
Table
B
Select Table 100’s - 1000’s of

to Array Threads (Kernels)
xPU
VectorAdd(A, B)
A + B Returns C = C
Copy Unnest Array

Copy To Table
Table
C C C C C C C C C C C C C
Using
Re-Shaped Tables
100’s - 1000’s of
Table of Threads (Kernels) Table of
Arrays Arrays
A + B = C
A
C C C C
B
xPU
VectorAdd(A, B)
Returns C
A
C C C C
B
Copy
Copy
Today’s GPGPU
Challenges
• No Pre-emptive Multi-Tasking
• No Virtual Memory
• Limited Bandwidth to discrete GPGPU
– 1 – 8 G/s over PCIe Bus
• Hard to Program
– New Parallel Algorithms and constructs
– “New” C language dialect
• Immature Tools
– Compilers, IDE, Debuggers, Profilers - early years
• Data organization really matters
– Types, Structure, and Alignment
– SQL needs to Shape the Data
• Profiling and Debugging is not easy
Solves Well for Problem Sets with the Right Shape!

Making a Problem
Work for You
• Determine % Parallelism Possible
for ( i = 0, i < ∞, i++)
for ( j = 0; j < ∞; j++ )
for ( k = 0; k < ∞; k++ )
• Arrange data to fit available GPU RAM

• Ensure calculation time >> I/O transfer overhead
• Learn about Parallel Algorithms and the OpenCL language
• Learn new tools
• Carefully choose Data Types, Organization and Alignments
• Profile and Measure at Every Stage
Summarizing
PgOpenCL
• Supports Heterogeneous Parallel Compute Environments
• CPU’s, GPU’s, APU’s
• OpenCL
• Portable and high-performance framework
–Ideal for computationally intensive algorithms
–Access to all compute resources (CPU, APU, GPU)
–Well-defined computation/memory model
•Efficient parallel programming language
–C99 with extensions for task and data parallelism
–Rich set of built-in functions
•Open standard for heterogeneous parallel computing
• PgOpenCL
• Integrates PostgreSQL with OpenCL
• Provides Easy SQL Access ..
• APU, CPU, GPGPU Power
• Integrates OpenCL with …
• SQL + Web Apps(PHP, Ruby, … )
Computation Databases
PostgreSQL
OpenCL
Database
Open Internal
Source Operations
Release
More
Information
• PGOpenCL
• Twitter @3DMashUp
• www.scribd.com/3dmashup
• OpenCL
• www.khronos.org/opencl/
• www.amd.com/us/products/technologies/stream-technology/opencl/
• http://software.intel.com/en-us/articles/intel-opencl-sdk
• http://www.nvidia.com/object/cuda_opencl_new.html
• http://developer.apple.com/technologies/mac/snowleopard/opencl.html

PostgreSQL OpenCL Procedural Language #Pgeast March 2011

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

PostgreSQL OpenCL Procedural Language #Pgeast March 2011

Uploaded by

Copyright:

Available Formats

Introducing PgOpenCL

GPU Graphics Processing Unit (highly specialized CPU for graphics)

GPGPU General Purpose GPU (non-graphics programming on a GPU)

CUDA Nvidia’s GPU programming environment

AVX Advanced Vector Extensions

Intel SandyBridge Multi-core CPU with AVX Instructions

ISO C99 Modern standard version of the C language

OpenCL Open Compute Language

OpenMP Open Multi-Processing (parallelizing compilers)

SIMD Single Instruction Multiple Data (Vector instructions )

SSE x86, x64 (Intel, AMD) Streaming SIMD Extensions

ALU Arithmetic Logic Unit CPU components that does Arithmetic

Kernel Functions that execute on a OpenCL Device

Workgroup A group of Work Items

• Data Mining & Analytics

• Spatial Selections and Joins

• Database Internal Operations

PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL

PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL

PgOpenCL PgOpenCL PgOpenCL PgOpenCL

Compilation / Link Static Static Dynamic

Disk I/O Tables

__kernel void VectorAdd( __global int * id,

c[i] = a[i] + b[i];

Create table vectors ( id serial, a float[4], b

Select OpenCL.SetPlatform(‘ATI Stream’, ‘GPU’);

-- Run the OpenCL Vector Add Function as an Update statement

Update vectors set vectors.c = temp.c

Where vectors.id = id;

Select Table 100’s - 1000’s of

Copy Unnest Array

Solves Well for Problem Sets with the Right Shape!

• Arrange data to fit available GPU RAM

You might also like

kernel void VectorAdd( global int * id,