GPUmat User Guide: Version 0.1, April 2009

Contents
1 Introduction
1.1 About GPUs
1.2 System requirements
1.3 Credits and licensing
1.4 How to install
1.5 Terminology
1.6 Documentation overview
2 Quick start
2.1 Matrix addition example
2.2 Matrix multiplication example
2.3 FFT calculation example
2.4 Performance analysis
3 GPUmat overview
3.1 Starting the GPU environment
3.2 Creating a GPU variable
3.3 Performing calculations on the GPU
3.4 Porting existing Matlab code
3.5 Converting a GPU variable into a Matlab variable
3.6 GPUmat functions
3.7 GPU memory management
3.8 Coding guidelines
3.8.1 Memory transfers
3.8.2 Vectorized code and for-loops
3.8.3 Matlab and GPUsingle variables
3.9 Performance analysis
4 Developer’s section
4.1 The GPUsingle class
4.1.1 GPUsingle constructor
4.1.2 GPUsingle properties
4.1.3 GPUsingle methods
4.2 Low level GPU memory management
4.2.1 Memory management using the GPUsingle class
4.2.2 Memory management using low level functions
4.3 Complex numbers
4.4 CUBLAS functions
4.5 CUFFT functions
5 Frequently Asked Questions
5.1 What happens if GPUmat and Matlab variables are used together?
5.2 Is any Matlab function executed on GPU by using GPUsingle?
5.3 What operations should I perform on the GPU?
6 Function Reference
6.1 Functions - by category
6.1.1 GPU startup and management
6.1.2 GPU variables management
6.1.3 GPU memory management
6.1.4 Numerical functions
6.1.5 General information
6.1.6 Complex numbers
6.1.7 CUBLAS functions
6.1.8 CUDA Driver functions
6.1.9 CUFFT functions
6.1.10 CUDA run-time functions
6.2 Operators
6.2.1 A & B
6.2.2 A’
6.2.3 A == B
6.2.4 A >= B
6.2.5 A > B
6.2.6 A <= B
6.2.7 A < B
6.2.8 A - B
6.2.9 A / B
6.2.10 A * B
6.2.11 A ~= B
6.2.12 ~A
6.2.13 A | B
6.2.14 A + B
6.2.15 A .^ B
6.2.16 A ./ B
6.2.17 A(I)
6.2.18 A .* B
6.2.19 A .’
6.2.20 [A;B]
6.3 High level functions - alphabetical list
6.3.1 abs
6.3.2 acos
6.3.3 acosh
6.3.4 and
6.3.5 asin
6.3.6 asinh
6.3.7 atan
6.3.8 atanh
6.3.9 ceil
6.3.10 colon
6.3.11 conj
6.3.12 cos
6.3.13 cosh
6.3.14 ctranspose
6.3.15 display
6.3.16 double
6.3.17 eq
6.3.18 exp
6.3.19 fft
6.3.20 fft2
6.3.21 floor
6.3.22 ge
6.3.23 GPUinfo
6.3.24 GPUmem
6.3.25 GPUsingle
6.3.26 GPUstart
6.3.27 GPUsync
6.3.28 gt
6.3.29 ifft
6.3.30 ifft2
6.3.31 iscomplex
6.3.32 isempty
6.3.33 isreal
6.3.34 isscalar
6.3.35 ldivide
6.3.36 le
6.3.37 length
6.3.38 log
6.3.39 log10
6.3.40 log1p
6.3.41 log2
6.3.42 lt
6.3.43 minus
6.3.44 mrdivide
6.3.45 mtimes
6.3.46 ndims
6.3.47 ne
6.3.48 not
6.3.49 numel
6.3.50 ones
6.3.51 or
6.3.52 plus
6.3.53 power
6.3.54 rdivide
6.3.55 round
6.3.56 sin
6.3.57 single
6.3.58 sinh
6.3.59 size
6.3.60 sqrt
6.3.61 subsref
6.3.62 sum
6.3.63 tan
6.3.64 tanh
6.3.65 times
6.3.66 transpose
6.3.67 uminus
6.3.68 vertcat
6.3.69 zeros
6.4 Low level functions - alphabetical list
6.4.1 cublasAlloc
6.4.2 cublasCgemm
6.4.3 cublasCheckStatus
6.4.4 cublasError
6.4.5 cublasFree
6.4.6 cublasGetError
6.4.7 cublasGetVector
6.4.8 cublasInit
6.4.9 cublasIsamax
6.4.10 cublasIsamin
6.4.11 cublasResult
6.4.12 cublasSasum
6.4.13 cublasSaxpy
6.4.14 cublasScopy
6.4.15 cublasSdot
6.4.16 cublasSetVector
6.4.17 cublasSgemm
6.4.18 cublasShutdown
6.4.19 cublasSnrm2
6.4.20 cublasSrot
6.4.21 cublasSscal
6.4.22 cuCheckStatus
6.4.23 cudaCheckStatus
6.4.24 cudaGetDeviceCount
6.4.25 cudaGetDeviceMajorMinor
6.4.26 cudaGetDeviceMemory
6.4.27 cudaGetDeviceMultProcCount
6.4.28 cudaGetLastError
6.4.29 cudaSetDevice
6.4.30 cudaThreadSynchronize
6.4.31 cufftCheckStatus
6.4.32 cufftDestroy
6.4.33 cufftExecC2C
6.4.34 cufftExecC2R
6.4.35 cufftExecR2C
6.4.36 cufftPlan1d
6.4.37 cufftPlan2d
6.4.38 cufftPlan3d
6.4.39 cufftResult
6.4.40 cufftTransformDirections
6.4.41 cufftType
6.4.42 cuInit
6.4.43 cuMemGetInfo
6.4.44 getPtr
6.4.45 getSizeOf
6.4.46 getType
6.4.47 GPUallocVector
6.4.48 GPUdeviceInit
6.4.49 istrans
6.4.50 packfC2C
6.4.51 packfR2C
6.4.52 setComplex
6.4.53 setReal
6.4.54 setSize
6.4.55 unpackfC2C
6.4.56 unpackfC2R
Bibliography

Chapter 1
Introduction
GPUmat enables Matlab code to run on the Graphics Processing Unit
(GPU). The following is a summary of GPUmat's most important features:
• GPU computational power can be easily accessed from Matlab without
any GPU knowledge.
• Matlab code is directly executed on the GPU. The execution is
transparent to the user.
• GPUmat speeds up Matlab functions by using the GPU multi-processor
architecture.
• Existing Matlab code can be ported and executed on GPUs with few
modifications.
• GPU resources are accessed using the Matlab scripting language. The
fast code prototyping capability of the scripting language is combined
with the fast code execution on the GPU.
• GPUmat can be used as a Source Development Kit to create new
functions and extend the library functionality.
1.1 About GPUs

Although GPUs have traditionally been used only for computer graphics,
a recent technique called GPGPU (General-purpose computing on graphics
processing units) allows GPUs to perform numerical computations that
are usually handled by the CPU. The advantage of using GPUs for general
purpose computation is the speed-up that can be achieved thanks to the
parallel architecture of these devices.
One of the most promising GPGPU technologies is called CUDA SDK [1],
developed by NVIDIA. For further information about CUDA, GPGPU and
related topics please check [2] [3].
1.2 System requirements

GPUmat was tested under Windows and Linux with Matlab ver. R2007a
or newer installed. CUDA should be installed on the system. Follow the
instructions on NVIDIA’s CUDA website [2] to download and install the
software.
1.3 Credits and licensing

Copyright gp-you.ch. GPUmat is distributed as Freeware. By using GPUmat,
you accept all the terms and conditions specified in the license.txt file in the
GPUmat installation folder. Please send any suggestions, questions or bug
reports to gp-you@gp-you.ch.
1.4 How to install

To install the library unpack the downloaded package and follow these steps:
• STEP1: start Matlab and change directory to the folder where the
library was unpacked.
• STEP2: start GPUmat using the GPUstart command.
• STEP3 (optional but suggested): add the library path to the Matlab
path by using the "File->Set Path" menu. The Matlab documentation
describes how to add a new path. This step is not mandatory if the
GPUstart command is run from the directory where the library
was unpacked.
The GPUstart command should generate the following output in your
Matlab command window:
>> GPUstart
Starting GPU
- CUDA compute capability x.x
...done

If you get the following error, the GPUstart command was not found in
the Matlab path. Repeat the installation steps from STEP1 to STEP3.
>> GPUstart
??? Undefined function or variable ’GPUstart’.
The GPU environment will not work correctly if a CUDA compatible
graphics card and the CUDA toolkit are not installed on the system. In
that case you will probably get an error like the following:
Starting GPU
??? Invalid MEX-file
...
The specified module could not be found.
Error in ==> GPUstart at xx
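Before repeating the installation steps, it can help to check whether
Matlab can see the GPUstart command at all. The following is a minimal
sketch that uses only the standard Matlab functions exist and addpath.
The folder name stored in gpumatDir is a hypothetical placeholder and is
not a path defined by GPUmat.

```matlab
% Minimal sketch: check whether GPUstart is visible on the Matlab path.
% gpumatDir is a hypothetical placeholder; replace it with the folder
% where the GPUmat package was actually unpacked.
gpumatDir = fullfile(pwd, 'GPUmat');

found = exist('GPUstart', 'file') ~= 0;   % nonzero if a matching file is on the path
if ~found && exist(gpumatDir, 'dir') == 7
    addpath(gpumatDir);                   % adds the folder for the current session
    found = exist('GPUstart', 'file') ~= 0;
end

if found
    disp('GPUstart found on the Matlab path.');
else
    disp('GPUstart not found: repeat STEP1 to STEP3.');
end
```

This check only covers the "Undefined function or variable" error. It does
not detect a missing CUDA toolkit or an incompatible graphics card.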
1.5 Terminology
The following is a summary of common terms and concepts used in this
manual:
• GPU: Graphics Processing Unit. It is the graphics card. We assume
that the GPU is compatible with NVIDIA’s CUDA SDK.
• HOST: The computer where the GPU is installed.
• CPU: The Central Processing Unit installed on the HOST.
• GPU memory: the memory available on the GPU.
• CPU memory: the memory available on the HOST.
• CUDA capable GPU: a GPU compatible with NVIDIA CUDA SDK.
1.6 Documentation overview

This manual is organized as follows:

• Quick start: describes GPUmat basic concepts by using simple
examples.
• Overview: describes GPUmat high level functions.
• Developer’s section: describes low-level functions and how to
implement new functions in GPUmat.
The first two of these chapters (Chapters 2 and 3) contain enough
information to understand the basic concepts of the library and are
intended for users with at least some experience with Matlab. Chapter 4
is intended for users familiar with GPU programming concepts, in
particular with the CUDA SDK. The Function reference can be found in
Chapter 6.

Chapter 2
Quick start
The most important concepts about GPUmat are the following:

• GPU variables are allocated from Matlab using the GPUsingle class,
which corresponds to a single precision floating point variable.
Currently GPUmat supports only single precision floating point
variables; support for double precision is planned for the future.
• A GPUsingle is actually allocated on the GPU memory and is
available from Matlab like any other Matlab variable. In this manual
we call a GPUsingle a GPU variable, and we distinguish it from a
common Matlab variable, which is allocated on the CPU memory.
• GPUmat defines functions and operators that are called from Matlab
and executed on the GPU. These functions work on data allocated on
the GPU memory using the GPUsingle class.
The next example creates two single precision Matlab variables Ah and A,
allocated on the CPU memory and on the GPU memory respectively. Ah is
used to initialize A.
Ah = single(rand(100,100)); % Ah is on CPU memory
A = GPUsingle(Ah); % A is on GPU memory
In the above code the function single is used to create the single precision
Matlab array Ah, and similarly the GPUsingle function is used to create a
single precision GPU variable. If a double precision Matlab array is used to
initialize a GPUsingle variable, it is converted to single precision,
resulting in a loss of precision:
Ah = rand(100,100); % Ah is on CPU memory, double precision
A = GPUsingle(Ah); % A is on GPU memory, single precision
During the initialization of the GPU variable A, the data in the Matlab array
Ah is copied from the CPU memory to the GPU memory. The data transfer
is transparent to the user.
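The loss of precision mentioned above can be observed in plain Matlab,
without a GPU. The following sketch converts a double precision value to
single precision and measures the rounding error. It illustrates the
double to single conversion in general and does not call any GPUmat
function; the value 0.1 is an arbitrary choice.

```matlab
% Plain Matlab sketch (no GPU needed): precision lost when a double
% precision value is stored in single precision.
x   = 0.1;                     % double precision value
xs  = single(x);               % same value rounded to single precision
err = abs(double(xs) - x);     % rounding error introduced by the conversion

fprintf('rounding error:      %g\n', err);
fprintf('single precision eps: %g\n', eps('single'));
```

The relative error of such a conversion is bounded by half of
eps('single'), which is why results computed with GPUsingle variables
should be compared with a tolerance rather than tested for exact equality
against double precision results.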
There are several ways to create a GPUsingle, as explained in Section 3.2.
The command
A = colon(0,2,6,GPUsingle) % A is on GPU memory
results in
A =
0 2 4 6
Using the colon function to create a vector with arbitrary real increments
between the elements,
A = colon(0,.1,.5,GPUsingle) % A is on GPU memory
results in
A =
0 0.1000 0.2000 0.3000 0.4000 0.5000
In the following example, the function single is used to convert the GPU
variable A into the Matlab variable Ch. Every time a GPU variable is
converted into a Matlab variable, the data is copied from GPU memory to
CPU memory.
A = GPUsingle(Ah); % Create GPU variable A
Ch = single(A); % convert A (GPU) to Ch (CPU)
The following example shows:
• The creation of the GPU variable A, initialized with Matlab array Ah.
• The calculation of exp(A). The execution is on GPU and the result is
stored on the GPU variable C.
• The conversion of the result C into the Matlab variable Ch.

A = GPUsingle(Ah); % Create A (GPU) initialized with Ah (CPU)
C = exp(A); % exp(A) performed on GPU
Ch = single(C); % convert C (GPU) to Ch (CPU)
To visualize the contents of a GPUsingle, type the name of the variable in
the Matlab command window:
>> A = GPUsingle(rand(5));
>> A
ans =
0.8147 0.0975 0.1576 0.1419 0.6557
0.9058 0.2785 0.9706 0.4218 0.0357
0.1270 0.5469 0.9572 0.9157 0.8491
0.9134 0.9575 0.4854 0.7922 0.9340
0.6324 0.9649 0.8003 0.9595 0.6787
Single precision REAL GPU type.
The next sections show different examples: matrix addition, matrix
multiplication and FFT calculation.
2.1 Matrix addition example

The following code can be found in the QuickStart.m file located in the
examples folder, and it shows how to port existing Matlab code and run it
on the GPU. The example creates two variables A and B, adds them and stores
the result in the variable C. The original Matlab code is the following:
A = single(rand(100)); % A is on CPU memory

B = single(rand(100)); % B is on CPU memory
C = A+B; % executed on CPU. C is on CPU memory
The ported GPUmat code is the following:

A = GPUsingle(rand(100)); % A is on GPU memory

B = GPUsingle(rand(100)); % B is on GPU memory
C = A+B; % executed on GPU. C is on GPU memory
Please note the difference between the original code and the modified code.
Every Matlab variable has been converted to the GPUsingle class: "A =
single(rand(100))" becomes "A = GPUsingle(rand(100))".
Any operation on GPUsingle variables generates a GPUsingle, i.e. C
(in the modified code) is also a GPUsingle. Functions involving GPUsingle
variables, like A + B in the above example, are executed on the GPU. To
convert the GPU variables A, B and C into the Matlab variables Ah, Bh and
Ch use the function single, as follows:

Ah = single(A); % Ah is on HOST, A is on GPU
Bh = single(B); % Bh is on HOST, B is on GPU
Ch = single(C); % Ch is on HOST, C is on GPU
The following code shows a different way to initialize the arrays A and B by
using the colon function. The original Matlab code is the following:
A = single(colon(0,1,1000)); % A is on CPU memory

B = single(colon(0,1,1000)); % B is on CPU memory
C = A+B; % executed on CPU. C is on CPU memory
The ported GPUmat code is the following:
A = colon(0,1,1000,GPUsingle); % A is on GPU memory
B = colon(0,1,1000,GPUsingle); % B is on GPU memory
C = A+B; % executed on GPU. C is on GPU memory
The Matlab expression
A = single(colon(0,1,1000));
is equivalent to

A = single([0:1:1000]);
and creates a vector with single precision elements having values from 0 to
1000.
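The equivalence between the two expressions above can be checked in plain
Matlab. The following sketch is CPU-only and uses the standard colon,
isequal and numel functions; the variable names Ah1 and Ah2 are
illustrative and do not appear elsewhere in this guide.

```matlab
% Plain Matlab sketch (no GPU needed): the functional form colon(a,d,b)
% and the operator form a:d:b create the same vector.
Ah1 = single(colon(0,1,1000));   % functional form
Ah2 = single(0:1:1000);          % operator form

same = isequal(Ah1, Ah2);        % true if both vectors are identical
n    = numel(Ah1);               % number of elements, including both end points
fprintf('identical: %d, number of elements: %d\n', same, n);
```

The vector contains 1001 elements because both end points, 0 and 1000, are
included.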
Element-by-element operations, such as the matrix addition A + B,
are highly optimized for the GPU. It is suggested to use this kind of
operation, as explained in Section 3.8.
2.2 Matrix multiplication example

This section describes the code to perform the following tasks:
• Create A and B on the GPU memory.
• Multiply A and B and store the results in C.
• Convert the result C into the Matlab variable Ch.
A = GPUsingle(rand(100,100)); % A is on GPU memory

B = GPUsingle(rand(100,100)); % B is on GPU memory
C = A*B; % executed on GPU, C is on GPU memory
Ch = single(C); % Ch is on CPU memory
The equivalent code on the CPU is the following:
A = single(rand(100,100)); % A is on CPU memory

B = single(rand(100,100)); % B is on CPU memory
C = A*B; % executed on CPU, C is on CPU memory
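The * operator used in this section is the matrix product, which is
different from the element-by-element product .* used in Section 2.4. The
following plain Matlab sketch shows the difference on a small example. It
runs on the CPU only. That the same distinction applies to GPUsingle
variables is an assumption based on the separate A * B and A .* B entries
in the operator list of this guide.

```matlab
% Plain Matlab sketch (CPU only): matrix product versus
% element-by-element product.
A = single([1 2; 3 4]);
B = single([5 6; 7 8]);

P = A*B;    % matrix product: rows of A combined with columns of B
E = A.*B;   % element-by-element product

disp(P);    % [19 22; 43 50]
disp(E);    % [5 12; 21 32]
```

The matrix product requires the number of columns of A to match the number
of rows of B, while the element-by-element product requires A and B to
have the same size (or one of them to be a scalar).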
2.3 FFT calculation example

This section describes the code to perform the following tasks:
• Create two arrays A and B on the GPU.
• Calculate 1D FFT of A.
• Calculate 2D FFT of B.
• Transfer results from GPU into Matlab variables Ah and Bh.
A = GPUsingle(rand(1,100)); % GPU
B = GPUsingle(rand(100,100)); % GPU
%% 1D FFT
FFT_A = fft(A); % executed on GPU
%% 2D FFT
FFT_B = fft2(B); % executed on GPU
%% Convert GPU into Matlab variables

Ah = single(A); % Ah is on HOST
Bh = single(B); % Bh is on HOST
FFT_Ah = single(FFT_A); % FFT_Ah is on HOST
FFT_Bh = single(FFT_B); % FFT_Bh is on HOST
The equivalent code that executes the above operations entirely on the CPU
is the following:
A = single(rand(1,100)); % CPU
B = single(rand(100,100)); % CPU
%% 1D FFT
FFT_A = fft(A); % executed on CPU
%% 2D FFT
FFT_B = fft2(B); % executed on CPU
2.4 Performance analysis

The easiest way to evaluate performance in Matlab is to use the tic and
toc commands, as follows:
A = rand(1000,1000); % A is on CPU
B = rand(1000,1000); % B is on CPU
tic;A.*B;toc; % executed on CPU

The GPU code performance can be evaluated in a similar way by using tic,
toc and the GPUsync command, as follows:
A = GPUsingle(rand(1000,1000));
B = GPUsingle(rand(1000,1000));
tic;A.*B;GPUsync;toc;
The following example shows a simple Matlab script to compare the
execution time of the element-by-element multiplication between two
matrices A and B on the GPU and on the CPU.
N = 100:100:4000;
timecpu = zeros(1,length(N));
timegpu = zeros(1,length(N));
index=1;
for i=N
    Ah = single(rand(i)); % CPU
    A = GPUsingle(Ah); % GPU
    %% Execution on GPU
    tic;
    A.*A;
    GPUsync;
    timegpu(index) = toc;
    %% Execution on CPU
    tic;
    Ah.*Ah;
    timecpu(index) = toc;
    % increase index
    index = index +1;
end
The above code calculates the two vectors timecpu and timegpu that can be
used to evaluate the speed-up between the GPU and the CPU as follows:

speedup = timecpu./timegpu
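A single tic/toc measurement can be noisy, especially for small matrices.
One common remedy, sketched below, is to run the operation once without
timing it and then average several timed runs. The sketch is CPU-only so
that it runs without GPUmat; the number of repetitions nrep and the
matrix size are arbitrary illustrative choices and are not taken from the
GPUmat examples.

```matlab
% Sketch: average several timed runs instead of relying on a single
% tic/toc measurement. CPU-only so that it runs without GPUmat.
Ah   = single(rand(500));    % test matrix (size chosen arbitrarily)
nrep = 10;                   % number of timed repetitions (arbitrary)
t    = zeros(1, nrep);

Ch = Ah.*Ah;                 % warm-up run, not timed
for k = 1:nrep
    tic;
    Ch = Ah.*Ah;
    t(k) = toc;              % for GPU code, call GPUsync before toc
end
tavg = mean(t);
fprintf('average time over %d runs: %g s\n', nrep, tavg);
```

The same pattern can be applied to the GPU measurement in the script
above by placing the GPUsync command before toc, as done in that script.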

Chapter 3
GPUmat overview
GPUmat functions are grouped into high level and low level functions. High
level functions can be used in a similar way to existing Matlab functions,
while low level functions require some experience in GPU programming. For
example, low level functions can directly manage GPU memory, which is
handled automatically by high level functions. Low level functions
can also directly access CUDA libraries such as CUBLAS and CUFFT. The
detailed list of high level and low level functions can be found in Chapter 6.
GPUmat can be used in the following ways:
• As any other Matlab toolbox by using high level functions. This is the
easiest way to use GPUmat.
• As a GPU Source Development Kit, in order to integrate functions
that are not available in the library, by using both low and high level
functions.
This chapter describes how to use the GPUmat high level functions. Users
can find further information about low level functions in Chapter 4. The full
function reference is in Chapter 6. This chapter describes the following topics:
• Starting the GPU environment
• Creating a GPU variable
• Performing calculations on the GPU
• Converting a GPU variable into a Matlab variable
• GPUmat functions
• GPU memory management
• Compatibility between Matlab and GPUmat
• GPUmat code performance
3.1 Starting the GPU environment
Name            Description
GPUstart        Starts GPU environment and loads the required library components
GPUinfo         Prints information about available CUDA capable GPUs
GPUdeviceInit   Initializes a CUDA capable GPU device
Table 3.1: GPU management functions.
Table 3.1 shows functions used to start GPUmat and to manage the GPU.
The GPUstart command is used to start GPUmat. The system might have
more than one GPU installed. By default GPUstart selects the first available
GPU device. The command GPUinfo prints information about installed
GPUs:
GPUinfo
Found 1 devices
* Device N. 0
Compute capability is 1.1
Total memory is 255.6875MB
Mult. processors = 2
It is possible to select a different GPU by using the GPUdeviceInit
command.
3.2 Creating a GPU variable

A GPU variable is a Matlab variable that is allocated on GPU memory
and is created using the Matlab class GPUsingle. The GPUsingle class
is equivalent to the single precision real/complex type in Matlab. Double
precision type is supported by CUDA and some GPU devices, but is not
currently implemented in GPUmat.
Functions to create a GPUsingle variable are shown in table 3.2, and
explained in more detail in the next paragraphs. It is important to know
that a memory transfer between GPU and CPU is required if the GPU
variable is initialized with a Matlab array. A memory transfer is a time
consuming task and might reduce the performance of the code.
Function                                    Description
A = GPUsingle(Ah)                           Creates a GPU array A initialized with the Matlab array Ah. Requires GPU-CPU memory transfer.
A = zeros(size, GPUsingle)                  Creates a GPU array initialized with zeros.
A = ones(size, GPUsingle)                   Creates a GPU array initialized with ones.
A = colon(begin, spacing, end, GPUsingle)   Creates a regularly spaced GPU vector A with values in the range [begin:end].
C = vertcat(A,B) or C = [A;B]               Vertical concatenation. Can be applied to more than 2 GPU vectors.
Table 3.2: Functions used to create GPU variables.
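The cost of the memory transfer mentioned above can be measured with tic,
toc and GPUsync, in the same way as in Section 2.4. The following is a
sketch only: it assumes that GPUmat is installed and that GPUstart has
been run, and it uses only functions described in this guide. The matrix
size is an arbitrary choice, and the measured times depend on the
hardware.

```matlab
% Sketch, assuming GPUmat is installed and GPUstart has been run.
% Compares creating a GPU array from a Matlab array (requires a CPU-GPU
% memory transfer) with creating the same array directly on the GPU.
Ah = single(zeros(1000,1000));       % CPU array

tic;
A1 = GPUsingle(Ah);                  % data copied from CPU to GPU
GPUsync;
tTransfer = toc;

tic;
A2 = zeros(1000,1000,GPUsingle);     % created directly on the GPU
GPUsync;
tDirect = toc;

fprintf('with transfer: %g s, direct: %g s\n', tTransfer, tDirect);
```

Both A1 and A2 hold a 1000x1000 array of zeros; only the way the data
reaches the GPU memory differs.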
A = GPUsingle(Ah)
Creates a GPU single precision variable A initialized with the
Matlab array Ah. A has the same properties as Ah, such as
the size and the number of elements. Requires GPU-CPU
memory transfer.
Example:
Ah = single(rand(1000));% Ah is a Matlab variable

A = GPUsingle(Ah); % GPU variable
If the GPU variable is initialized with a double precision Matlab array Ah,

Ah = rand(1000); % Ah is a double precision Matlab variable

A = GPUsingle(Ah);% GPU variable
there will be a loss of precision in the conversion between double and single
precision.

A = colon(begin, spacing, end, GPUsingle)

Creates a GPU single precision variable A with values between
begin and end and spaced by spacing. This command is similar
to the Matlab colon command.
Example:
A = colon(0,2,1000,GPUsingle); % A is a GPU variable
The syntax to create a Matlab variable is very similar to the above code:
Ah = colon(0,2,1000); % Ah is a CPU variable
Existing GPU variables can also be used efficiently to create new ones.
The following example shows how to create a complex GPU variable using
the colon function:
A = colon(0,2,6,GPUsingle); % A is a real GPU variable
B = sqrt(-1)*A; % B is a complex GPU variable
C = 1 + B % C is B plus 1: the real part of every element is 1
The previous commands result in
>> A
ans =
0 2 4 6

>> B
ans =
0 0 + 2.0000i 0 + 4.0000i 0 + 6.0000i
Single precision COMPLEX GPU type.

>> C
ans =
1.0000 1.0000 + 2.0000i 1.0000 + 4.0000i 1.0000 + 6.0000i
Using the function colon is a very efficient way to create a GPU variable
because array values are directly created on the GPU memory without any
data transfer between CPU and GPU.
A = zeros(size, GPUsingle)
Has the same behavior as the Matlab zeros function. Creates a GPU
array initialized with zeros.
Example:
A = zeros(1,1000,GPUsingle); % A is a GPU variable
A = ones(size, GPUsingle)
Has the same behavior as the Matlab ones function. Creates a GPU
array initialized with ones.
Example:
A = ones(1,1000,GPUsingle); % A is a GPU variable
Examples of GPU variable creation can be found in the file
CreateGPUVariables.m, located in the example folder.
3.3 Performing calculations on the GPU

The following example explains the mechanism that allows Matlab functions
to be executed on the GPU.
A = GPUsingle(rand(10)); % A is on GPU
B = exp(A) % exp calculated on GPU
The exp function in the above code that is executed by Matlab is the one
implemented in GPUmat and not the built-in function. Matlab uses the
GPUmat function because the argument of the exp is a GPUsingle type.
The following example shows similar code executed on CPU:
A = single(rand(10)); % A is on CPU
B = exp(A) % exp calculated on CPU
From the above example we conclude that:

• Functions involving the GPUsingle type are executed on GPU by using
GPUmat functions.
• Not every Matlab function is defined in GPUmat. Using the GPUsingle
type therefore does not move arbitrary Matlab code to the GPU; only
code that uses functions defined in GPUmat is executed there (the
complete function reference can be found in Chapter 6).
GPUmat also implements Matlab operators, such as +, - and .*. This means
that algebraic expressions such as A + B are also defined in GPUmat and
executed on the GPU. GPUsingle operators are listed in Table 3.8. Here is
an example:
A = GPUsingle(rand(100,100)); %GPU variable

B = A/5 + A.*A*2 + 1; %run on GPU
C = A < B; %run on GPU
% Same operation performed on CPU

A = single(A); %CPU variable
B = A/5 + A.*A*2 + 1; %run on CPU
C = A < B; %run on CPU
3.4 Porting existing Matlab code

To port existing Matlab code, Matlab variables have to be converted to the
GPUsingle class. The easiest way is to call the GPUsingle constructor with
the existing Matlab variable, but this is not the most efficient approach
because it involves a memory transfer between CPU and GPU. Here is an
example:
Ah = [0:10:1000]; % Ah is on CPU
A = GPUsingle(Ah); % A is on GPU
The above code can be written more efficiently using the colon function, as
follows:
A = colon(0,10,1000,GPUsingle); % A is on GPU
Name      Description
a + b     Binary addition
a - b     Binary subtraction
-a        Unary minus
a.*b      Element-wise multiplication
a*b       Matrix multiplication
a./b      Right element-wise division
a.\b      Left element-wise division
a.^b      Element-wise power
a < b     Less than
a > b     Greater than
a <= b    Less than or equal to
a >= b    Greater than or equal to
a ~= b    Not equal to
a == b    Equality
a & b     Logical AND
a | b     Logical OR
~a        Logical NOT
a'        Complex conjugate transpose
a.'       Matrix transpose
Table 3.8: Operators defined for GPUsingle type
3.5 Converting a GPU variable into a Matlab variable
Although the GPUsingle variable is available from Matlab, its content is
stored on the GPU memory. Converting a GPU variable into a Matlab vari-
able means that we transfer the content of the variable from the GPU to
the CPU memory. The following example describes how to convert a GPU
variable A into a Matlab array Ah, by using the function single:

A = GPUsingle(Ah); %A is on GPU memory
Ah = single(A); %Ah is on HOST memory
To display the content of a GPU variable in the Matlab command window,
just type its name, as with any other Matlab array:
ans =
0.8147 0.0975 0.1576 0.1419 0.6557

0.9058 0.2785 0.9706 0.4218 0.0357
0.1270 0.5469 0.9572 0.9157 0.8491
0.9134 0.9575 0.4854 0.7922 0.9340
0.6324 0.9649 0.8003 0.9595 0.6787
Every time the content of a GPUsingle is read in Matlab, the system performs
a memory transfer from the GPU to the CPU. The same happens when
a GPUsingle is created and initialized using a Matlab array. Because of
the limited memory bandwidth between the HOST and the GPU, data
transfers between CPU and GPU may be time consuming and should
therefore be kept to a minimum.
3.6 GPUmat functions

GPUmat currently implements only a subset of Matlab functions. The most
important operators and numerical functions are implemented and users with
programming experience can extend the library by using low level and high
level functions that are available and documented in the library. Table 3.9
shows a short summary of implemented functions and operators.

Implemented functions          Example
Matlab operators               A = GPUsingle(rand(1000));
(A*B, A-B, A.*B, A+B, etc.)    B = GPUsingle(rand(1000));
                               C = A + B;
Numerical functions            A = GPUsingle(rand(1000));
(exp, sqrt, log, etc.)         B = GPUsingle(rand(1000));
                               C = exp(A);
                               D = sqrt(C) + B;
Fast Fourier Transform         RE = GPUsingle(rand(1000));
                               IM = i*GPUsingle(rand(1000));
                               C = fft(RE + IM);
Table 3.9: Some GPUmat functions.
3.7 GPU memory management

Memory is managed automatically by GPUmat. Any GPU variable is
destroyed automatically, following exactly the same life-cycle as any other
Matlab variable. Nevertheless, GPU memory is limited, and when necessary
the user can manually remove GPU variables by using the Matlab built-in
command clear. Table 3.10 lists the functions used to manage the GPU memory.
Name Description
clear Matlab built-in command, removes the specified variables
GPUmem Returns available GPU memory in bytes
Table 3.10: Functions used to manage the GPU memory
The following code shows a typical situation in which there is not enough
GPU memory and some variables must be removed manually:
A = GPUsingle(rand(6000,3000)); % A is on GPU
B = GPUsingle(rand(6000,3000)); % B is on GPU
C = GPUsingle(rand(6000,3000)); % C is on GPU
Device memory allocation error.
Available memory is 65274 KB, required 70312 KB
In the above example, the variable C cannot be allocated because there is
not enough GPU memory (see the error message). In this case another
variable, such as A or B, must be deleted. If A and B are also needed, then
the GPU card does not have enough memory to hold all the variables. To
delete a variable (for example A), use the clear command, as follows:
clear A
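The required size reported in the error message can be reproduced by hand: a real single precision element occupies 4 bytes, so a 6000x3000 array needs 6000*3000*4 bytes. The following sketch repeats this arithmetic in plain Matlab; the variable names are illustrative and not part of GPUmat:

```matlab
rows = 6000;
cols = 3000;
bytesPerElement = 4;                         % one single precision value

requiredBytes = rows * cols * bytesPerElement;
requiredKB    = floor(requiredBytes / 1024); % 70312, as in the error message

availableKB = 65274;                         % value taken from the error message
fits = requiredKB <= availableKB;            % false: C cannot be allocated
```

An estimate of this kind, compared with the free memory, indicates in advance whether a new variable is likely to fit.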
The file MemoryExample.m, located in the example folder, shows how to use
the memory management functions. It performs the following actions:
• Displays the available GPU memory.
• Creates a GPUsingle variable on the GPU workspace and displays the
available free memory.
• Cleans up the GPU variable and displays the available GPU memory
once more.
A very useful Matlab command is whos, which can be used to check
how many GPUsingle variables are in the Matlab workspace. The following
Matlab output shows the result of the whos command and the presence of
a GPUsingle A in the Matlab workspace:
>> whos
Name Size Bytes Class Attributes
A 1x1000000 924 GPUsingle

ans 1x1 4 uint32
3.8 Coding guidelines

To maximize execution performance, keep in mind the following points:
• Memory transfers. Avoid excessive memory transfers between GPU and
CPU memory.
• Vectorized operations and for-loops. The best performance in both
Matlab and GPUmat can be achieved by using vectorized operations
and avoiding for-loops. More information can be found at the following
link: Matlab Code Vectorization Guide
The next sections explain these points in more detail.
3.8.1 Memory transfers

The most time consuming task is the memory transfer from/to GPU, such as
initializing a GPUsingle variable with a Matlab array. Here is an example:
Ah = rand(1000); % Ah is on CPU memory

A = GPUsingle(Ah); % A is on GPU memory
In the above code, the variable Ah is used to initialize the GPU variable A,
which means that data is transferred from the CPU to the GPU memory.
Vice versa, when a GPU variable is converted into a Matlab variable there is
a memory transfer from the GPU to the CPU:

Ah = single(A); % Ah is on CPU memory
The fastest way to initialize or create a GPUsingle is to use existing variables
on the GPU memory to create other GPU variables, or to use functions
such as zeros or colon which directly create values on the GPU without
transferring data from Matlab. Please check Section 3.2 for more information
about creating new GPU variables with GPUmat.
3.8.2 Vectorized code and for-loops

Another way to improve the code performance is to avoid for loops by using
vectorized operations. For example:
for i=1:1e6
A = rand(3,3);
B = rand(3,3);
C = A.*B;

%% do something with C
end
The above code can be executed as-is on the GPU by converting A and B
to GPUsingle, as follows:
for i=1:1e6
A = GPUsingle(rand(3,3));
B = GPUsingle(rand(3,3));
C = A.*B;
%% do something with C
end
Nevertheless, matrix operations can be used instead of the for-loop by
creating two arrays with 3 x 3e6 elements and multiplying them
element-by-element:
A = GPUsingle(rand(3,3e6)); % A is on GPU
B = GPUsingle(rand(3,3e6)); % B is on GPU
C = A.*B; % C is on GPU
The following Matlab code performs the matrix addition C = A + B using a
for-loop statement.
A = rand(100);
B = rand(100);
C = zeros(100);
for i=1:size(A,1)
for j=1:size(B,2)
C(i,j) = A(i,j) + B(i,j);
end
end
To port the code to the GPU, it is suggested to use the element-by-element
addition instead of the for-loop:
A = GPUsingle(rand(100)); % A is on GPU
B = GPUsingle(rand(100)); % B is on GPU
C = A + B; % C is on GPU
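Before porting, it can be useful to confirm on the CPU that the vectorized form gives the same result as the loop. The following is a minimal sketch in plain Matlab, with no GPUmat calls; the names Cloop, Cvec and same are chosen here for illustration:

```matlab
A = rand(100);
B = rand(100);

% Loop version, as in the original code
Cloop = zeros(100);
for i = 1:size(A,1)
    for j = 1:size(A,2)
        Cloop(i,j) = A(i,j) + B(i,j);
    end
end

% Vectorized version: one element-by-element addition
Cvec = A + B;

same = isequal(Cloop, Cvec);   % both forms compute identical sums
```

Once the two forms agree, only the vectorized line needs to be moved to the GPU.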

3.8.3 Matlab and GPUsingle variables

Operations and functions involving Matlab and GPUsingle variables at the
same time are not defined, except operations involving GPUsingle and Matlab
scalars. The following is an example:
Ah = rand(5); % Ah is on CPU
A = GPUsingle(rand(5));% A is on GPU
Bh = 1; % Bh is on CPU
Ah + A
Unknown operation + between 'double' and 'GPUsingle'
A + Bh
ans =
1.8147 1.0975 1.1576 1.1419 1.6557

1.9058 1.2785 1.9706 1.4218 1.0357
1.1270 1.5469 1.9572 1.9157 1.8491
1.9134 1.9575 1.4854 1.7922 1.9340
1.6324 1.9649 1.8003 1.9595 1.6787
Adding Ah and A generates an error, whereas adding A and Bh is possible
because Bh is a scalar. A can be converted into a Matlab variable and added
to Ah or, in a similar way, Ah can be converted into a GPU variable and
added to A, as follows:
Ah = rand(5);
Ah + single(A); % A converted into Matlab
Ch = single(A); % A converted into Matlab Ch

Ah + Ch; % adding Ah and Ch
D = GPUsingle(Ah); % Ah converted into the GPUsingle D

A + D; % adding A and D
A + GPUsingle(Ah); % A added directly to GPUsingle(Ah)

3.9 Performance analysis

The easiest way to evaluate performance in Matlab is to use the tic and toc
commands, as follows:
A = rand(1000,1000); % A is on CPU
B = rand(1000,1000); % B is on CPU
tic;A.*B;toc; % executed on CPU
The GPU code performance can be evaluated in a similar way by using tic,
toc and the GPUsync command. The GPUsync command is used to
synchronize the GPU code: Matlab waits until the GPU execution is
completed. The execution of GPU code is asynchronous, i.e. control is
returned to Matlab after calling the GPUmat function, but this does not
necessarily mean that the GPU has finished its task. To force Matlab to wait
until the GPU has finished its task, the GPUsync command must be used.
Here is an example:
tic;A.*B;GPUsync;toc;
Elapsed time is 0.010231 seconds.
tic;A.*B;toc;
Elapsed time is 0.003808 seconds.
Asynchronous execution is entirely managed by GPUmat and is transparent
to the user.
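The timing pattern described in this section can be sketched as follows. This is an illustration only: it assumes the GPU environment has already been started with GPUstart, and the extra GPUsync before tic is an assumption added here so that pending GPU work is not included in the measurement:

```matlab
A = GPUsingle(rand(1000,1000));   % A is on GPU
B = GPUsingle(rand(1000,1000));   % B is on GPU

GPUsync;                          % wait for any pending GPU work
tic;
C = A.*B;                         % control may return before the GPU is done
GPUsync;                          % wait until the multiplication has finished
t = toc;
fprintf('Elapsed time is %f seconds.\n', t);
```

Without the second GPUsync, the measured time may cover only the launch of the operation and not its execution.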

Chapter 4
Developer’s section
This chapter explains how to use GPUmat low level functions. Low level
functions can be used for the following purposes:
• To develop new GPUsingle class methods and functions.
• To access CUDA libraries (CUBLAS, CUFFT, CUDA run-time).
• To directly access GPU memory by using low level memory management
functions.
4.1 The GPUsingle class

The GPUsingle class is used to create and initialize GPU variables, either
using the empty constructor or using an existing Matlab variable. Here is
an example:
Ah = rand(1000); % Matlab variable

B = GPUsingle(rand(100)); % GPU variable
The GPUsingle class implements a destructor, which frees GPU memory
that is no longer used. The life-time of a GPUsingle is the same as that of
any other Matlab variable. For example, if A holds an array of size 100x100,
a second assignment to A automatically deletes the previously created
variable and frees the corresponding GPU memory.
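A minimal sketch of this behavior is given below. It assumes the GPU environment has been started with GPUstart; the size of the second array (10x10) is chosen here only for illustration:

```matlab
A = GPUsingle(rand(100,100));   % allocates a 100x100 array on the GPU
A = GPUsingle(rand(10,10));     % the 100x100 array is deleted and its
                                % GPU memory is freed automatically
clear A;                        % explicit removal is also possible
```

No low level call is needed in either case, because the destructor runs whenever Matlab discards the variable.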
The following example introduces some of the properties of the GPUsingle
class: the low level function cublasGetVector is used to retrieve the
content of the GPUsingle A into the Matlab variable Ah.
A = GPUsingle([1 2; 3 4]);
% Ah should be single precision, because
% A is single precision
Ah = single(zeros(1,numel(A)));
[status Ah] = cublasGetVector (numel(A), ...
getSizeOf(A), getPtr(A), 1, Ah, 1);
cublasCheckStatus( status, ...
'Unable to retrieve variable values from GPU.');
Ah
ans =
1 3 2 4
In the result Ah the data is stored using column-major storage, the same
format as Matlab and Fortran. Complex numbers are stored by interleaving
real and imaginary parts in memory, as explained in section 4.3. In
the above example we use the CUBLAS function cublasGetVector ([4]) to
transfer the data from the GPU to the CPU memory. The function numel
returns the number of elements in A, getSizeOf returns the size of a single
element of A, and getPtr returns the pointer to the GPU memory.
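Column-major ordering can be checked in plain Matlab without a GPU. The sketch below (variable names are illustrative) linearizes the same 2x2 matrix with the colon operator, which reads the elements in storage order:

```matlab
M = [1 2; 3 4];
v = M(:).';          % elements in storage order, as a row vector
disp(v)              % displays 1 3 2 4, matching Ah above
```

The first column (1, 3) is stored before the second column (2, 4), which is why the vector retrieved from the GPU is not in row order.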

4.1.1 GPUsingle constructor

Constructor summary
A = GPUsingle(Ah)    Creates the GPUsingle A initialized with existing
                     Matlab array Ah
A = GPUsingle()      Creates an empty GPUsingle. Using this constructor
                     GPU memory is not allocated.
Constructor details
A = GPUsingle(Ah)
Creates a GPU variable A initialized with the Matlab array Ah.
A has the same properties as Ah, such as the size and the number
of elements.
Example:
Ah = rand(1000); % Matlab variable

B = GPUsingle(rand(100)); % GPU variable
A = GPUsingle()
Creates an empty GPU variable. GPU memory is not automatically
allocated and the following steps must be performed to
allocate the memory:
• step1: initialize the size of the array by using setSize.
• step2: set the type of the GPUsingle by using setComplex
or setReal if the stored data is complex or real respectively.
• step3: use the GPUallocVector function. Please note that
this function should be used only after step1 and step2.
There is no memory transfer between the CPU and the GPU
when using the empty constructor.

Example:
A = GPUsingle(); %empty constructor

setSize(A,[100 100]); %set variable size
setReal(A); %set variable as real
GPUallocVector(A); %allocate on GPU memory
A = GPUsingle(); %empty constructor

setSize(A,[10 10]); %set variable size
setComplex(A); %set variable as complex
GPUallocVector(A); %allocate on GPU memory

4.1.2 GPUsingle properties

Fields summary
GPUPTR Pointer to the GPU memory
COMPLEX Complex type flag
TRANS Transposed flag
SIZE Variable size
SIZEOF Datatype size (similar to sizeof in C)
Property details
GPUPTR
GPUPTR is the pointer to the GPU memory region.
The pointer is indirectly set by using GPUallocVector.
Its value can be retrieved by using the getPtr function.
Example:
N = 10;
A = GPUsingle(rand(1,N));
Isamin = cublasIsamin(N, getPtr(A), 1);
COMPLEX
COMPLEX is a flag and defines a complex GPUsingle.
Check section 4.3 for further information about complex
numbers representation. It is set using setComplex and
reset using setReal. Use iscomplex to check its value.
The flag must be set using setComplex before allocating
the variable memory using GPUallocVector. The flag
has no effect if set after calling GPUallocVector.
Example:
A = GPUsingle(rand(5)); % real
iscomplex(A)
A = GPUsingle(rand(5)+i*rand(5)); % complex
iscomplex(A)

TRANS
TRANS is an internal flag. Use the function istrans to
check whether the flag is set. The flag is set to 1
to identify a matrix that is virtually transposed, which
means that values are not exchanged in memory. For
some operations, such as matrix-matrix multiplication
with CUBLAS functions, there is no need to actually
transpose the matrix in memory (which is time consuming).
The high level function transpose(A) sets the
flag to 1, whereas transpose(A,1) performs the memory
operations needed to transpose the array elements.
High level functions correctly handle a GPUsingle with
TRANS set to 1, but low level functions do not. The
following example shows how the values of the array A
are stored in memory. Please note that data is stored in
column-major format.
Example:
A = GPUsingle([1 2; 3 4]);
A = transpose(A) % A = A.’ is the same
Ah = single(zeros(1,numel(A)));
[status Ah]= cublasGetVector (numel(A), ...
getSizeOf(A), getPtr(A), ...
1, Ah, 1);
cublasCheckStatus( status, 'Memory error.');
Ah
ans =
1 3
2 4
ans =
1 3 2 4

SIZE
SIZE stores the variable size. The functions to modify
it and to get its value are setSize and size respectively.
The SIZE must be defined before using GPUallocVector.
Modifying the SIZE on initialized variables has no effect
on memory values.
Example:
A = GPUsingle();
setSize(A,[100 100]);
GPUallocVector(A);
size(A)
4.1.3 GPUsingle methods

Set and Get methods summary
getPtr(A) Get GPUPTR
setSize(A,size) Set SIZE
size(A) Get SIZE
setReal(A) Set REAL type
isreal(A) Returns 1 if REAL
setComplex(A) Set COMPLEX type
iscomplex(A) Returns 1 if COMPLEX
istrans(A) Get TRANS flag

4.2 Low level GPU memory management

Memory management using high level functions is explained in section 3.7.
Memory management methods summary
GPUallocVector Allocates a variable on GPU memory.
GPU variables are managed in the following way:
• The GPUsingle implements a destructor which takes care of clearing
unused memory regions. There is no need to explicitly clean up the
GPU memory. If necessary it can be done using the Matlab clear
command.
• If the user creates a Matlab pointer to the GPU memory using low level
functions, the memory is not automatically cleaned when the variable
is not used anymore. In this case the user must manually clean the
GPU memory.
The above concepts are explained in the next sections.
4.2.1 Memory management using the GPUsingle class

The following code shows how to allocate and delete a GPUsingle.
clear A;
B = GPUsingle(); % creates empty GPUsingle

setReal(B); % REAL type
setSize(B,[100 100]); % must set GPUsingle size
GPUallocVector(B); % allocate GPU memory
clear B;
4.2.2 Memory management using low level functions

The following code shows how to allocate a variable with 100 single precision
floating point elements by using CUBLAS functions:

% create a new pointer

GPUptr = 0;
% allocate using cublasAlloc

SIZE_OF_FLOAT = 4;
NUMEL = 100;
[status GPUptr]= cublasAlloc(NUMEL,SIZE_OF_FLOAT,GPUptr);
cublasCheckStatus( status, 'Device memory allocation error');
The function cublasFree is used to free the memory:
status = cublasFree(GPUptr);
cublasCheckStatus( status, '!!!! memory free error (GPUptr)');
4.3 Complex numbers

A single precision complex number is represented with two floating point val-
ues, the real and imaginary part respectively. A complex vector is a sequence
of complex numbers, i.e. a sequence of interleaved real and imaginary values.
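The interleaved layout can be illustrated in plain Matlab. The sketch below only mimics the memory layout on the CPU; it does not call GPUmat, and the variable names are chosen here for illustration:

```matlab
z = [1+6i, 2+7i, 3+8i];              % complex vector with 3 elements

% Stack real parts over imaginary parts, then read column by column
interleaved = reshape([real(z); imag(z)], 1, []);
% interleaved is [1 6 2 7 3 8]: real, imaginary, real, imaginary, ...

% A complex vector therefore holds twice as many floating point values
nFloats = numel(interleaved);        % 6 = 2 * numel(z)
```

A complex variable consequently needs twice the memory of a real variable with the same number of elements.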
There are different methods to create a complex GPU variable:
• Initializing a GPUsingle with a Matlab complex number
• Using functions unpackfC2C and packfC2C (see function reference,
Chapter 6)
• Multiplying a real number by the imaginary unit
The above points are explained in the following example:
% 1) Initialize a GPUsingle with a Matlab complex array
Gh = rand(10) + sqrt(-1)*rand(10); %Matlab complex variable

G = GPUsingle(Gh); %GPU single complex
% 2) Using unpackfC2C to separate the values in A into
% B and C
A = GPUsingle([1 2 3 4 5] + sqrt(-1)*[6 7 8 9 10]);
B = GPUsingle(zeros(1,5));
C = GPUsingle(zeros(1,5));
unpackfC2C(A, B, C);
single(B)
single(C)
unpackfC2R(A, B);
single(B)
single(C)
% 3) Multiply a real array by the imaginary unit
Gh = rand(10); % Matlab real variable

G = GPUsingle(Gh)*sqrt(-1); % sqrt(-1) gives imaginary unit
4.4 CUBLAS functions

The following code shows how to use low level CUBLAS functions using
GPUmat wrappers. The code can be found in the file simpleCUBLAS.m,
located in the examples folder CUBLAS. Make sure that the GPU environment
was started using GPUstart before running the example.
function simpleCUBLAS
% This is the GPUmat translation of the code in the
% CUDA SDK projects called with the same name (simpleCUBLAS).
% The example shows how to access CUBLAS functions from GPUmat
SIZEOF_FLOAT = sizeoffloat();
%% Allocate HOST arrays and initialize with random numbers

N = 500;
h_A = single(rand(N));
h_B = single(rand(N));
h_C = single(rand(N));
%% Allocate GPU arrays

d_A = GPUsingle(h_A);
d_B = GPUsingle(h_B);

d_C = GPUsingle(h_C);
% Although d_A was already initialized with h_A values, we can

% call cublasSetVector to do that again
status = cublasSetVector(N*N, SIZEOF_FLOAT, ...
h_A, 1, getPtr(d_A), 1);
cublasCheckStatus( status, '!!!! device access error (write A)');
% Calculate reference in Matlab
alpha = 2.0;
h_C_ref = alpha * h_A*h_B;
% Execute on GPU
cublasSgemm('n','n', N, N, N, alpha, getPtr(d_A), ...
N, getPtr(d_B), N, 0.0, getPtr(d_C), N);
status = cublasGetError();
cublasCheckStatus( status, '!!!! kernel execution error.');
% Copy results back to HOST

h_C = single(d_C);
compareArrays(h_C_ref, h_C, 5e-6);
% Clean up GPU memory

% THERE IS NO NEED TO CLEAN UP MEMORY
% NEVERTHELESS, IF NECESSARY, ALWAYS USE
% CLEAR WITH GPUSINGLE
clear d_A
clear d_B
clear d_C
end
GPUmat defines wrappers to CUBLAS functions. The list of these functions
can be found in the function reference under the category CUBLAS functions
(Chapter 6). Some examples can be found in the example folder CUBLAS. In
general CUBLAS wrappers have the same interface as the original CUBLAS
functions. When a CUBLAS function needs a pointer to a GPU variable A,
the pointer is obtained using getPtr(A). For example:

r = cublasIsamax(numel(A),getPtr(A),1)
The original declaration of the CUBLAS function cublasIsamax is:
int cublasIsamax (int n, const float *x, int incx)
Note the mapping between variables in the above example:
int n -> numel(A)

const float *x -> getPtr(A)
int incx -> 1
The following code performs complex matrix-matrix multiplication using
cublasCgemm:
N = 10;
I = sqrt(-1);
A = GPUsingle(rand(N,N) + I*rand(N,N));
B = GPUsingle(rand(N,N) + I*rand(N,N));
% C needs to be complex as well, that's why we multiply by I
C = zeros(N,N,GPUsingle)*I;
% alpha is complex
alpha = 2.0+I*3.0;
beta = 0.0;
opA = 'n';
opB = 'n';
cublasCgemm(opA, opB, N, N, N, ...
alpha, getPtr(A), N, getPtr(B), ...
N, beta, getPtr(C), N);
% retrieve the CUBLAS status before checking it
status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');
C_mat = alpha * single(A)*single(B);
compareArrays(C_mat, single(C), 1e-6);
The original declaration of the CUBLAS function cublasCgemm is:
void cublasCgemm (char transa, char transb, int m, int n, int k,
cuComplex alpha, const cuComplex *A, int lda,
const cuComplex *B, int ldb, cuComplex beta,
cuComplex *C, int ldc)
Please note the mapping between variables in the above example:
char transa -> 'n'
char transb -> 'n'
int m -> N
int n -> N
int k -> N
cuComplex alpha -> 2.0+I*3.0
const cuComplex *A -> getPtr(A)
int lda -> N
const cuComplex *B -> getPtr(B)
int ldb -> N
cuComplex beta -> 0.0
cuComplex *C -> getPtr(C)
int ldc -> N
Complex numbers are stored interleaving real and imaginary values on the
GPU (see section 4.3), the same format expected by the cublasCgemm
function and by CUFFT functions. For a complete description of CUBLAS
functions check the CUDA CUBLAS manual. For a complete list of
implemented wrappers check the function reference (Chapter 6).
4.5 CUFFT functions

The following code shows how to call low level CUFFT functions using
GPUmat wrappers. The code can be found in the file simpleCUFFT.m,
located in the examples folder CUFFT. Make sure that the GPU environment
was started using GPUstart before testing the example.
%% CUFFT example
%% Allocate HOST arrays and initialize with random numbers
N = 512;
h_A = single(rand(1,N)+i*rand(1,N));
d_A = GPUsingle(h_A);
d_B = GPUsingle(h_A);
fftType = cufftType;
fftDir = cufftTransformDirections;
% FFT plan
plan = 0;
[status, plan] = cufftPlan1d(plan, numel(d_A), ...
fftType.CUFFT_C2C, 1);
cufftCheckStatus(status, 'Error in cufftPlan1d');
% Run GPU FFT
[status] = cufftExecC2C(plan, getPtr(d_A), getPtr(d_B), ...
fftDir.CUFFT_FORWARD);
cufftCheckStatus(status, 'Error in cufftExecC2C');
% Run GPU IFFT
[status] = cufftExecC2C(plan, getPtr(d_B), getPtr(d_A), ...
fftDir.CUFFT_INVERSE);
cufftCheckStatus(status, 'Error in cufftExecC2C');
% results should be scaled by 1/N if compared to CPU
h_B = 1/N*single(d_A);
[status] = cufftDestroy(plan);
cufftCheckStatus(status, 'Error in cufftDestroy');
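The 1/N scaling in the last step is needed because the CUFFT inverse transform is not normalized, whereas the Matlab ifft function divides by N. The effect can be reproduced in plain Matlab; in the sketch below, multiplying ifft by N stands in for an unnormalized inverse transform (an assumption made here to mimic the behavior on the CPU, with illustrative variable names):

```matlab
N = 512;
x = rand(1,N) + 1i*rand(1,N);     % complex input, as h_A above

X = fft(x);                       % forward transform
yUnscaled = N * ifft(X);          % stand-in for an unnormalized inverse
y = yUnscaled / N;                % scale by 1/N to recover the input

maxErr = max(abs(y - x));         % of the order of rounding error
```

After scaling, the round trip reproduces the input up to floating point rounding.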

Chapter 5
Frequently Asked Questions
5.1 What happens if GPUmat and Matlab variables are used together?
Operations and functions involving Matlab and GPUmat variables together
are not supported, with the exception of Matlab scalars (either real or
complex). Matlab scalars can be used together with GPUsingle without
converting them to GPUsingle. In the following example we add a Matlab
array Ah to a GPUsingle array A:
A = GPUsingle(rand(2)); % GPU
Ah = rand(2); % CPU
A + Ah;
??? Error using ==> ...
Unknown operation + between 'GPUsingle' and 'double'
Performing the same operation with a Matlab scalar does not generate any
error:
Ah = 1+2*i; % complex
A + Ah
ans =
1.3500 + 2.0000i 1.2511 + 2.0000i

1.1966 + 2.0000i 1.6160 + 2.0000i
The Matlab array Ah can be converted into a GPUsingle and added to A, as
follows:
Ah = rand(2); % CPU
A + GPUsingle(Ah)
ans =
1.0230 1.1167
1.2689 1.3425
The main concept is that CPU and GPU variables are stored in different
memory regions, so to operate on both we must first transfer the CPU
variable to the GPU or the other way around. Matlab scalars are an
exception, but the same does not work with GPU scalars, which cannot be
added directly to Matlab variables.
5.2 Is any Matlab function executed on GPU by using GPUsingle?
Only GPUmat functions, which are a subset of the existing Matlab functions,
are executed on GPU. The complete list of functions and operators can be
found in Chapter 6. If the user tries to use a function that is not defined
in GPUmat, an error message is generated. For example, the function trapz
is not defined and Matlab output is the following:
%
A = GPUsingle([1 2; 3 4]);
trapz(A)
??? Error using ==> ...
...
GPUmat is currently under development and new functions will be added
in future versions. Users can define new functions as explained in
Chapter 4, using GPUmat as a Software Development Kit (SDK).

5.3 What operations should I perform on the GPU?
The GPU is useful for computationally intensive operations, such as:
• Matrix-matrix multiplications
• Element-by-element operations such as the Matlab .* operator
• Fast Fourier Transform
The above operations should be performed on large arrays to use the GPU
efficiently. Coding guidelines can be found in Chapter 3.

Chapter 6
Function Reference
6.1 Functions - by category

6.1.1 GPU startup and management
Name Description
GPUdeviceInit Initializes a CUDA capable GPU device
GPUinfo Prints information about the GPU device
GPUstart Starts the GPU environment and loads required components
6.1.2 GPU variables management
Name Description
colon Colon
double Converts a GPU single precision variable into a Matlab double precision variable
GPUsingle GPUsingle constructor
GPUsync Wait until all GPU operations are completed
ones GPU single precision ones array
setComplex Set a GPUsingle as complex
setReal Set a GPUsingle as real
setSize Set GPUsingle size
single Converts a GPU variable into a Matlab single precision variable
zeros GPU single precision zeros array
6.1.3 GPU memory management
Name Description
GPUallocVector Variable allocation on GPU memory
GPUmem Returns the free memory (bytes) on selected GPU device
6.1.4 Numerical functions
Name Description
abs Absolute value
acos Inverse cosine
acosh Inverse hyperbolic cosine
and Logical AND
asin Inverse sine
asinh Inverse hyperbolic sine
atan Inverse tangent, result in radians
atanh Inverse hyperbolic tangent
ceil Round towards plus infinity
conj CONJ(X) is the complex conjugate of X
cos Cosine of argument in radians
cosh Hyperbolic cosine
ctranspose Complex conjugate transpose
eq Equal
exp Exponential
fft Discrete Fourier transform
fft2 Two-dimensional discrete Fourier Transform
floor Round towards minus infinity
ge Greater than or equal
gt Greater than
ifft Inverse discrete Fourier transform
ifft2 Two-dimensional inverse discrete Fourier transform
ldivide Left array divide
le Less than or equal
log Natural logarithm
log10 Common (base 10) logarithm
log1p Compute log(1+z) accurately
log2 Base 2 logarithm and dissect floating point number
lt Less than
minus Minus
mrdivide Slash or right matrix divide
mtimes Matrix multiply
ne Not equal
not Logical NOT
or Logical OR
plus Plus
power Array power
rdivide Right array divide
round Round towards nearest integer
sin Sine of argument in radians
sinh Hyperbolic sine
sqrt Square root
subsref Subscripted reference
sum Sum of elements
tan Tangent of argument in radians
tanh Hyperbolic tangent
times Array multiply
transpose Transpose
uminus Unary minus
vertcat Vertical concatenation
6.1.5 General information
Name Description
display Display GPU variable
getPtr Get GPUsingle pointer on GPU memory
getSizeOf Get the size of the GPU datatype (similar to sizeof in C)
getType Get the type of the GPU variable
iscomplex True for complex array
isempty True for empty GPUsingle array
isreal True for real array
isscalar True if array is a scalar
istrans True if GPUsingle TRANS flag is set to 1
length Length of vector
ndims Number of dimensions
numel Number of elements in an array or subscripted array expression.
size Size of array
6.1.6 Complex numbers
Name Description
packfC2C Pack two arrays into an interleaved complex array
packfR2C Transforms a real array into a complex array with zero imaginary parts.
unpackfC2C Unpack one complex array into two single precision arrays
unpackfC2R Transforms a complex array into a real array discarding the imaginary part
6.1.7 CUBLAS functions
Name Description
cublasAlloc Wrapper to CUBLAS cublasAlloc function
cublasCgemm Wrapper to CUBLAS cublasCgemm function
cublasCheckStatus Check the CUBLAS status.
cublasError Returns a structure with CUBLAS result codes
cublasFree Wrapper to CUBLAS cublasFree function
cublasGetError Wrapper to CUBLAS cublasGetError function
cublasGetVector Wrapper to CUBLAS cublasGetVector function
cublasInit Wrapper to CUBLAS cublasInit function
cublasIsamax Wrapper to CUBLAS cublasIsamax function
cublasIsamin Wrapper to CUBLAS cublasIsamin function
cublasResult Returns a structure with CUBLAS error results
cublasSasum Wrapper to CUBLAS cublasSasum function
cublasSaxpy Wrapper to CUBLAS cublasSaxpy function
cublasScopy Wrapper to CUBLAS cublasScopy function
cublasSdot Wrapper to CUBLAS cublasSdot function
cublasSetVector Wrapper to CUBLAS cublasSetVector function
cublasSgemm Wrapper to CUBLAS cublasSgemm function
cublasShutdown Wrapper to CUBLAS cublasShutdown function
cublasSnrm2 Wrapper to CUBLAS cublasSnrm2 function
cublasSrot Wrapper to CUBLAS cublasSrot function
cublasSscal Wrapper to CUBLAS cublasSscal function
6.1.8 CUDA Driver functions
Name Description
cuCheckStatus Check the CUDA DRV status.
cuInit Wrapper to CUDA driver function cuInit
cuMemGetInfo Wrapper to CUDA driver function cuMemGetInfo
6.1.9 CUFFT functions
Name Description
cufftCheckStatus Checks the CUFFT status
cufftDestroy Wrapper to CUFFT cufftDestroy function
cufftExecC2C Wrapper to CUFFT cufftExecC2C function
cufftExecC2R Wrapper to CUFFT cufftExecC2R function
cufftExecR2C Wrapper to CUFFT cufftExecR2C function
cufftPlan1d Wrapper to CUFFT cufftPlan1d function
cufftPlan2d Wrapper to CUFFT cufftPlan2d function
cufftResult Returns a structure with CUFFT result codes
cufftTransformDirections Returns a structure with CUFFT transform direction codes
cufftType Returns a structure with CUFFT transform type codes
6.1.10 CUDA run-time functions
Name Description
cudaCheckStatus Check the CUDA run-time status
cudaGetDeviceCount Wrapper to CUDA cudaGetDeviceCount function.
cudaGetDeviceMajorMinor Returns CUDA compute capability major and minor numbers.
cudaGetDeviceMemory Returns device total memory
cudaGetDeviceMultProcCount Returns device multi-processors count
cudaGetLastError Wrapper to CUDA cudaGetLastError function
cudaSetDevice Wrapper to CUDA cudaSetDevice function
cudaThreadSynchronize Wrapper to CUDA cudaThreadSynchronize function.

6.2. OPERATORS
6.2 Operators
Operators are used in mathematical expressions such as A + B. GPUmat
overloads Matlab operators for the GPUsingle class.
Name Description
a + b Binary addition
a - b Binary subtraction
-a Unary minus
a.*b Element-wise multiplication
a*b Matrix multiplication
a./b Right element-wise division
a.\b Left element-wise division
a.^b Element-wise power
a < b Less than
a > b Greater than
a <= b Less than or equal to
a >= b Greater than or equal to
a ~= b Not equal to
a == b Equality
a & b Logical AND
a | b Logical OR
~a Logical NOT
a' Complex conjugate transpose
a.' Matrix transpose

6.2.1 A & B
and - Logical AND
SYNTAX
R = A & B
R = and(A,B)
A - GPUsingle
B - GPUsingle
R - GPUsingle
DESCRIPTION
A & B performs a logical AND of arrays A and B and returns an
array containing elements set to either logical 1 (TRUE) or logical
0 (FALSE).
Real type supported since version 0.1
EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([0 1 10 2]);
R = A & B;
single(R)
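
As a rough CPU-side reference for the example above, the same operation can be run on ordinary Matlab single arrays. This sketch makes no GPUmat calls and assumes GPUmat follows the usual Matlab rule that any nonzero element counts as TRUE.

```matlab
% CPU reference using plain Matlab arrays (no GPUmat calls).
A = single([1 3 0 4]);
B = single([0 1 10 2]);
% Nonzero elements are treated as TRUE, zeros as FALSE.
R = A & B;
% Elements of R: 0 1 0 1
disp(single(R));
```

Comparing this output with single(R) from the GPU version is one way to check a port.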

6.2.2 A'
ctranspose - Complex conjugate transpose
SYNTAX
R = X'
R = ctranspose(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
X' is the complex conjugate transpose of X.
Complex type supported since version 0.1

EXAMPLE
X = GPUsingle(rand(10)+i*rand(10));
R = X'
R = ctranspose(X)

6.2.3 A == B
eq - Equal
SYNTAX
R = X == Y
R = eq(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A == B (eq(A, B)) does element by element comparisons between
A and B.

EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([1 0 0 4]);
R = A == B;
single(R)
R = eq(A, B);
single(R)

6.2.4 A >= B
ge - Greater than or equal
SYNTAX
R = X >= Y
R = ge(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A >= B (ge(A, B)) does element by element comparisons between
A and B.

EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([1 0 0 4]);
R = A >= B;
single(R)
R = ge(A, B);
single(R)

6.2.5 A > B
gt - Greater than
SYNTAX
R = X > Y
R = gt(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A > B (gt(A, B)) does element by element comparisons between
A and B.

EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([1 0 0 4]);
R = A > B;
single(R)
R = gt(A, B);
single(R)

6.2.6 A <= B
le - Less than or equal
SYNTAX
R = X <= Y
R = le(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A <= B (le(A, B)) does element by element comparisons between
A and B.

EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([1 0 0 4]);
R = A <= B;
single(R)
R = le(A, B);
single(R)

6.2.7 A < B
lt - Less than
SYNTAX
R = X < Y
R = lt(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A < B (lt(A, B)) does element by element comparisons between
A and B.

EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([1 0 0 4]);
R = A < B;
single(R)
R = lt(A, B);
single(R)

6.2.8 A - B
minus - Minus
SYNTAX
R = X - Y
R = minus(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
X - Y subtracts matrix Y from X. X and Y must have the same
dimensions unless one is a scalar. A scalar can be subtracted from
anything.

EXAMPLE
X = GPUsingle(rand(10));
Y = GPUsingle(rand(10));
R = Y - X

6.2.9 A / B
mrdivide - Slash or right matrix divide
SYNTAX
R = X / Y
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
Slash or right matrix divide.

EXAMPLE
A = GPUsingle(rand(10));
B = A / 5
MATLAB COMPATIBILITY
Only A / n, where n is a scalar, is supported.

6.2.10 A * B
mtimes - Matrix multiply
SYNTAX
R = X * Y
R = mtimes(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
X * Y (mtimes(X, Y)) is the matrix product of X and Y.

EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A * B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A * B

6.2.11 A ~= B
ne - Not equal
SYNTAX
R = X ~= Y
R = ne(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A ~= B (ne(A, B)) does element by element comparisons between
A and B.

EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([1 0 0 4]);
R = A ~= B;
single(R)
R = ne(A, B);
single(R)

6.2.12 ~A
not - Logical NOT
SYNTAX
R = ~X
X - GPUsingle
R - GPUsingle
DESCRIPTION
~A (not(A)) performs a logical NOT of input array A.
EXAMPLE
A = GPUsingle([1 3 0 4]);
R = ~A;
single(R)

6.2.13 A | B
or - Logical OR
SYNTAX
R = X | Y
R = or(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A | B (or(A, B)) performs a logical OR of arrays A and B.
EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([0 1 10 2]);
R = A | B;
single(R)
R = or(A, B);
single(R)

6.2.14 A + B
plus - Plus
SYNTAX
R = X + Y
R = plus(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
X + Y (plus(X, Y)) adds matrices X and Y. X and Y must have
the same dimensions unless one is a scalar (a 1-by-1 matrix). A
scalar can be added to anything.

EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A + B

6.2.15 A .^ B
power - Array power
SYNTAX
R = X .^ Y
R = power(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
Z = X.^Y denotes element-by-element powers.

EXAMPLE
A = GPUsingle(rand(10));
B = 2;
R = A .^ B
MATLAB COMPATIBILITY
Implemented for REAL exponents only.

6.2.16 A ./ B
rdivide - Right array divide
SYNTAX
R = X ./ Y
R = rdivide(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A./B denotes element-by-element division. A and B must have the
same dimensions unless one is a scalar. A scalar can be divided
with anything.

EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A ./ B

6.2.17 A(I)
subsref - Subscripted reference
SYNTAX
R = X(I)
X - GPUsingle
I - GPUsingle
R - GPUsingle
DESCRIPTION
A(I) (subsref) is an array formed from the elements of A specified
by the subscript vector I. The resulting array is the same size as
I except for the special case where A and I are both vectors. In
this case, A(I) has the same number of elements as I but has the
orientation of A.

EXAMPLE
A = GPUsingle([1 2 3 4 5]);
idx = GPUsingle([1 2]);
B = A(idx)
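
The size and orientation rule described above can be checked on the CPU with ordinary Matlab arrays. This is only a sketch of the standard Matlab indexing behaviour that the description refers to; it makes no GPUmat calls, and the variable names are illustrative.

```matlab
% CPU reference using plain Matlab arrays (no GPUmat calls).
A = single([1 2 3 4 5]);     % row vector
I_col = [1; 2];              % column vector of indices
B = A(I_col);                % A and I are both vectors: B is a row, like A
M = magic(3);                % [8 1 6; 3 5 7; 4 9 2]
C = M([1 2; 3 4]);           % matrix index: C has the size of the index
disp(size(B));               % 1 2
disp(C);                     % [8 3; 4 1]
```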

6.2.18 A .* B
times - Array multiply
SYNTAX
R = X .* Y
R = times(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
X.*Y denotes element-by-element multiplication. X and Y must
have the same dimensions unless one is a scalar. A scalar can be
multiplied into anything.

EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A .* B

6.2.19 A.'
transpose - Transpose
SYNTAX
R = X.'
R = transpose(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
X.' or transpose(X) is the non-conjugate transpose.

EXAMPLE
X = GPUsingle(rand(10)+i*rand(10));
R = X.'
R = transpose(X)

6.2.20 [A;B]
vertcat - Vertical concatenation
SYNTAX
R = [X;Y]
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
[A;B] is the vertical concatenation of matrices A and B. A and B
must have the same number of columns. Any number of matrices
can be concatenated within one pair of brackets.

EXAMPLE
A = [zeros(10,1,GPUsingle);colon(0,1,10,GPUsingle)'];
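
The concatenation rule above (same number of columns, any number of blocks) can be illustrated on the CPU with ordinary Matlab single arrays. This is a sketch only: it makes no GPUmat calls and the variable names are illustrative.

```matlab
% CPU reference using plain Matlab arrays (no GPUmat calls).
A = zeros(2, 3, 'single');   % 2-by-3
B = ones(1, 3, 'single');    % 1-by-3, same number of columns as A
C = single([7 8 9]);         % 1-by-3
R = [A; B; C];               % rows of A, then B, then C
disp(size(R));               % 4 3
```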

6.3. HIGH LEVEL FUNCTIONS - ALPHABETICAL LIST
6.3 High level functions - alphabetical list

6.3.1 abs
abs - Absolute value
SYNTAX
R = abs(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ABS(X) is the absolute value of the elements of X. When X is com-
plex, ABS(X) is the complex modulus (magnitude) of the elements
of X.

EXAMPLE
X = GPUsingle(rand(1,5)+i*rand(1,5));
R = abs(X)

6.3.2 acos
acos - Inverse cosine
SYNTAX
R = acos(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ACOS(X) is the arccosine of the elements of X. NaN (Not A Number)
results are obtained if ABS(x) > 1.0 for some element.
EXAMPLE
X = GPUsingle(rand(10));
R = acos(X)
MATLAB COMPATIBILITY
NaN is returned if ABS(x) > 1.0; in this case Matlab returns a
complex number. Not implemented for complex X.

6.3.3 acosh
acosh - Inverse hyperbolic cosine
SYNTAX
R = acosh(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ACOSH(X) is the inverse hyperbolic cosine of the elements of X.
EXAMPLE
X = GPUsingle(rand(10)) + 1;
R = acosh(X)
MATLAB COMPATIBILITY
NaN is returned if X < 1.0. Not implemented for complex X.

6.3.4 and
and - Logical AND
SYNTAX
R = A & B
R = and(A,B)
A - GPUsingle
B - GPUsingle
R - GPUsingle
DESCRIPTION
A & B performs a logical AND of arrays A and B and returns an
array containing elements set to either logical 1 (TRUE) or logical
0 (FALSE).
EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([0 1 10 2]);
R = A & B;
single(R)

6.3.5 asin
asin - Inverse sine
SYNTAX
R = asin(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ASIN(X) is the arcsine of the elements of X. NaN (Not A Number)
results are obtained if ABS(x) > 1.0 for some element.
EXAMPLE
X = GPUsingle(rand(10));
R = asin(X)
MATLAB COMPATIBILITY
NaN is returned if ABS(x) > 1.0; in this case Matlab returns a
complex number. Not implemented for complex X.

6.3.6 asinh
asinh - Inverse hyperbolic sine
SYNTAX
R = asinh(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ASINH(X) is the inverse hyperbolic sine of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = asinh(X)
MATLAB COMPATIBILITY
Not implemented for complex X.

6.3.7 atan
atan - Inverse tangent, result in radians
SYNTAX
R = atan(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ATAN(X) is the arctangent of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = atan(X)

6.3.8 atanh
atanh - Inverse hyperbolic tangent
SYNTAX
R = atanh(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ATANH(X) is the inverse hyperbolic tangent of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = atanh(X)

6.3.9 ceil
ceil - Round towards plus infinity
SYNTAX
R = ceil(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
CEIL(X) rounds the elements of X to the nearest integers towards
plus infinity.
EXAMPLE
X = GPUsingle(rand(1,5));
R = ceil(X)

6.3.10 colon
colon - Colon
SYNTAX
R = colon(J,K,GPUsingle)
R = colon(J,D,K,GPUsingle)
DESCRIPTION
COLON(J,K,GPUsingle) is the same as J:K and
COLON(J,D,K,GPUsingle) is the same as J:D:K. J:K is the
same as [J, J+1, ..., K]. J:K is empty if J > K. J:D:K is the
same as [J, J+D, ..., J+m*D] where m = fix((K-J)/D). J:D:K
is empty if D == 0, if D > 0 and J > K, or if D < 0 and J < K.

EXAMPLE
A = colon(1,2,10,GPUsingle)
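
The J:D:K rule above can be verified with ordinary Matlab on the CPU. The sketch below makes no GPUmat calls; it only rebuilds the sequence from the formula m = fix((K-J)/D) given in the description and shows two of the empty cases.

```matlab
% CPU reference using the plain Matlab colon operator (no GPUmat calls).
J = 1; D = 2; K = 10;
m = fix((K - J) / D);        % number of steps: 4
R_colon = J:D:K;             % [1 3 5 7 9]
R_manual = J + (0:m) * D;    % same values built from the formula
E1 = 5:1;                    % empty, because J > K
E2 = 1:-1:5;                 % empty, because D < 0 and J < K
disp(R_colon);
```

Note that the last element is J+m*D = 9, not K = 10, because K is not reachable from J in steps of D.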

6.3.11 conj
conj - Complex conjugate
SYNTAX
R = conj(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
For a complex X, CONJ(X) = REAL(X) - i*IMAG(X).

EXAMPLE
A = GPUsingle(rand(1,5) + i*rand(1,5));
B = conj(A)
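
The identity CONJ(X) = REAL(X) - i*IMAG(X) can be checked on the CPU with ordinary Matlab arrays. This sketch makes no GPUmat calls and uses 1i instead of i to avoid clashing with a variable named i.

```matlab
% CPU reference using plain Matlab arrays (no GPUmat calls).
X = [1+2i, 3-4i, 5];
R = conj(X);                          % [1-2i, 3+4i, 5]
R_manual = real(X) - 1i * imag(X);    % same result from the definition
disp(isequal(R, R_manual));           % 1
```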

6.3.12 cos
cos - Cosine of argument in radians
SYNTAX
R = cos(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
COS(X) is the cosine of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = cos(X)

6.3.13 cosh
cosh - Hyperbolic cosine
SYNTAX
R = cosh(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
COSH(X) is the hyperbolic cosine of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = cosh(X)

6.3.14 ctranspose
ctranspose - Complex conjugate transpose
SYNTAX
R = X'
R = ctranspose(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
X' is the complex conjugate transpose of X.

EXAMPLE
X = GPUsingle(rand(10)+i*rand(10));
R = X'
R = ctranspose(X)

6.3.15 display
display - Display GPU variable
SYNTAX
display(X)
X - GPUsingle
DESCRIPTION
Prints information about a GPUsingle variable. DISPLAY(X) is called
for the object X when the semicolon is not used to terminate a statement.
EXAMPLE
A = GPUsingle(rand(10));
display(A)
A

6.3.16 double
double - Converts a GPU single precision variable into a Matlab
double precision variable
SYNTAX
R = double(A)
A - GPUsingle variable
R - double precision Matlab variable
DESCRIPTION
B = DOUBLE(A) converts the content of the GPU single precision
variable A into a double precision Matlab array. The values keep
single precision accuracy; the conversion does not add precision.

EXAMPLE
A = GPUsingle(rand(100));
Ah = double(A);

6.3.17 eq
eq - Equal
SYNTAX
R = X == Y
R = eq(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A == B (eq(A, B)) does element by element comparisons between
A and B.

EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([1 0 0 4]);
R = A == B;
single(R)
R = eq(A, B);
single(R)

6.3.18 exp
exp - Exponential
SYNTAX
R = exp(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
EXP(X) is the exponential of the elements of X, e to the X. For
complex Z=X+i*Y, EXP(Z) = EXP(X)*(COS(Y)+i*SIN(Y)).

EXAMPLE
X = GPUsingle(rand(10));
R = exp(X)
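
The complex formula EXP(Z) = EXP(X)*(COS(Y)+i*SIN(Y)) can be checked numerically with ordinary Matlab on the CPU. This is a sketch with an arbitrary test value; it makes no GPUmat calls.

```matlab
% CPU reference using plain Matlab (no GPUmat calls).
Z = 0.5 + 2i;                               % arbitrary test value
R = exp(Z);
% Same value from the real and imaginary parts of Z.
R_manual = exp(real(Z)) * (cos(imag(Z)) + 1i * sin(imag(Z)));
err = abs(R - R_manual);                    % close to zero
disp(err);
```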

6.3.19 fft
fft - Discrete Fourier transform
SYNTAX
R = fft(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
FFT(X) is the discrete Fourier transform (DFT) of vector X.

EXAMPLE
X = GPUsingle(rand(1,100));
R = fft(X)

6.3.20 fft2
fft2 - Two-dimensional discrete Fourier Transform
SYNTAX
R = fft2(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
FFT2(X) returns the two-dimensional Fourier transform of matrix
X.

EXAMPLE
X = GPUsingle(rand(100));
R = fft2(X)

6.3.21 floor
floor - Round towards minus infinity
SYNTAX
R = floor(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
FLOOR(X) rounds the elements of X to the nearest integers towards
minus infinity.
EXAMPLE
X = GPUsingle(rand(1,5));
R = floor(X)

6.3.22 ge
ge - Greater than or equal
SYNTAX
R = X >= Y
R = ge(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A >= B (ge(A, B)) does element by element comparisons between
A and B.

EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([1 0 0 4]);
R = A >= B;
single(R)
R = ge(A, B);
single(R)

6.3.23 GPUinfo
GPUinfo - Prints information about the GPU device
SYNTAX
GPUinfo
DESCRIPTION
GPUinfo displays information about each CUDA capable device
installed on the system. Printed information includes total memory
and number of processors. GPUinfo(N) displays information about
the device with index N.
EXAMPLE
GPUinfo(0)
6.3.24 GPUmem
GPUmem - Returns the free memory (bytes) on selected GPU
device
SYNTAX
GPUmem
DESCRIPTION
Returns the free memory (bytes) on selected GPU device.
EXAMPLE
GPUmem
GPUmem/1024/1024

6.3.25 GPUsingle
GPUsingle - GPUsingle constructor
SYNTAX
R = GPUsingle()
R = GPUsingle(A)
A - Either a GPUsingle or a Matlab array
R - GPUsingle variable
DESCRIPTION
GPUsingle creates a Matlab variable whose data is allocated in
GPU memory. Operations on GPUsingle objects are executed on
the GPU.

EXAMPLE
GPUsingle(rand(100,100))
Ah = rand(100);
A = GPUsingle(Ah);
Bh = rand(100) + i*rand(100);
B = GPUsingle(Bh);

6.3.26 GPUstart
GPUstart - Starts the GPU environment and loads required com-
ponents
SYNTAX
GPUstart
DESCRIPTION
Start GPU environment and load required components.
EXAMPLE
GPUstart
6.3.27 GPUsync
GPUsync - Wait until all GPU operations are completed
SYNTAX
GPUsync
DESCRIPTION
Wait until all GPU operations are completed.
EXAMPLE
tic;A + B;GPUsync;toc;

6.3.28 gt
gt - Greater than
SYNTAX
R = X > Y
R = gt(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A > B (gt(A, B)) does element by element comparisons between
A and B.

EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([1 0 0 4]);
R = A > B;
single(R)
R = gt(A, B);
single(R)

6.3.29 ifft
ifft - Inverse discrete Fourier transform
SYNTAX
R = ifft(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
IFFT(X) is the inverse discrete Fourier transform of X.

EXAMPLE
X = GPUsingle(rand(1,100));
R = fft(X);
X = ifft(R);
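
The round trip in the example should reproduce the input up to rounding error. The sketch below shows this with the built-in Matlab fft and ifft on a CPU single array; it makes no GPUmat calls, and the accuracy GPUmat achieves for the same round trip is not stated here.

```matlab
% CPU reference using built-in Matlab fft/ifft (no GPUmat calls).
X = single([1 2 3 4]);
R = fft(X);                  % [10, -2+2i, -2, -2-2i]
Xr = ifft(R);                % inverse transform, includes the 1/N scaling
err = max(abs(Xr - X));      % close to zero in single precision
disp(err);
```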

6.3.30 ifft2
ifft2 - Two-dimensional inverse discrete Fourier transform
SYNTAX
R = ifft2(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
IFFT2(F) returns the two-dimensional inverse Fourier transform of
matrix F.

EXAMPLE
X = GPUsingle(rand(100));
R = fft2(X);
X = ifft2(R);

6.3.31 iscomplex
iscomplex - True for complex array
SYNTAX
R = iscomplex(X)
X - GPUsingle
R - logical (0 or 1)
DESCRIPTION
ISCOMPLEX(X) returns 1 if X has an imaginary part and 0
otherwise.

EXAMPLE
A = GPUsingle(rand(10));
iscomplex(A)
A = GPUsingle(rand(10)+i*rand(10));
iscomplex(A)

6.3.32 isempty
isempty - True for empty GPUsingle array
SYNTAX
R = isempty(X)
X - GPUsingle
DESCRIPTION
ISEMPTY(X) returns 1 if X is an empty GPUsingle array and 0
otherwise. An empty GPUsingle array has no elements, that is
prod(size(X))==0.

EXAMPLE
A = GPUsingle();
isempty(A)
A = GPUsingle(rand(10));
isempty(A)

6.3.33 isreal
isreal - True for real array
SYNTAX
R = isreal(X)
X - GPUsingle
DESCRIPTION
ISREAL(X) returns 1 if X does not have an imaginary part and 0
otherwise.

EXAMPLE
A = GPUsingle(rand(10));
isreal(A)
A = GPUsingle(rand(10)+i*rand(10));
isreal(A)

6.3.34 isscalar
isscalar - True if array is a scalar
SYNTAX
R = isscalar(X)
X - GPUsingle
DESCRIPTION
ISSCALAR(S) returns 1 if S is a 1x1 matrix and 0 otherwise.

EXAMPLE
A = GPUsingle(rand(10));
isscalar(A)
A = GPUsingle(1);
isscalar(A)

6.3.35 ldivide
ldivide - Left array divide
SYNTAX
R = X .\ Y
R = ldivide(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
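A.\B denotes element-by-element division. A and B must have the
same dimensions unless one is a scalar. A scalar can be divided
with anything.

EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A .\ B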

6.3.36 le
le - Less than or equal
SYNTAX
R = X <= Y
R = le(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A <= B (le(A, B)) does element by element comparisons between
A and B.

EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([1 0 0 4]);
R = A <= B;
single(R)
R = le(A, B);
single(R)

6.3.37 length
length - Length of vector
SYNTAX
R = length(X)
X - GPUsingle
DESCRIPTION
LENGTH(X) returns the length of vector X. It is equivalent to
MAX(SIZE(X)) for non-empty arrays and 0 for empty ones.

EXAMPLE
A = GPUsingle(rand(1,10));
length(A)
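
The equivalence with MAX(SIZE(X)) and the special case for empty arrays can be seen with ordinary Matlab arrays on the CPU. This sketch makes no GPUmat calls; the array shapes are arbitrary.

```matlab
% CPU reference using plain Matlab arrays (no GPUmat calls).
X = zeros(3, 7, 'single');
L = length(X);               % 7, the largest dimension
Lmax = max(size(X));         % also 7
E = zeros(0, 5);
Le = length(E);              % 0 for an empty array, although max(size(E)) is 5
disp([L Lmax Le]);
```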

6.3.38 log
log - Natural logarithm
SYNTAX
R = log(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
LOG(X) is the natural logarithm of the elements of X. NaN results
are produced if X is not positive.
EXAMPLE
X = GPUsingle(rand(10));
R = log(X)

6.3.39 log10
log10 - Common (base 10) logarithm
SYNTAX
R = log10(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
LOG10(X) is the base 10 logarithm of the elements of X. NaN results
are produced if X is not positive.
EXAMPLE
X = GPUsingle(rand(10));
R = log10(X)

6.3.40 log1p
log1p - Compute log(1+z) accurately
SYNTAX
R = log1p(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
LOG1P(Z) computes log(1+z). Only REAL values are accepted.
EXAMPLE
X = GPUsingle(rand(10));
R = log1p(X)

6.3.41 log2
log2 - Base 2 logarithm and dissect floating point number
SYNTAX
R = log2(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
Y = LOG2(X) is the base 2 logarithm of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = log2(X)

6.3.42 lt
lt - Less than
SYNTAX
R = X < Y
R = lt(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A < B (lt(A, B)) does element by element comparisons between
A and B.

EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([1 0 0 4]);
R = A < B;
single(R)
R = lt(A, B);
single(R)

6.3.43 minus
minus - Minus
SYNTAX
R = X - Y
R = minus(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
X - Y subtracts matrix Y from X. X and Y must have the same
dimensions unless one is a scalar. A scalar can be subtracted from
anything.

EXAMPLE
X = GPUsingle(rand(10));
Y = GPUsingle(rand(10));
R = Y - X

6.3.44 mrdivide
mrdivide - Slash or right matrix divide
SYNTAX
R = X / Y
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
Slash or right matrix divide.

EXAMPLE
A = GPUsingle(rand(10));
B = A / 5
MATLAB COMPATIBILITY
Only A / n, where n is a scalar, is supported.

6.3.45 mtimes
mtimes - Matrix multiply
SYNTAX
R = X * Y
R = mtimes(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
X * Y (mtimes(X, Y)) is the matrix product of X and Y.

EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A * B

6.3.46 ndims
ndims - Number of dimensions
SYNTAX
R = ndims(X)
X - GPUsingle
DESCRIPTION
N = NDIMS(X) returns the number of dimensions in the array X.
The number of dimensions in an array is always greater than or
equal to 2. Trailing singleton dimensions are ignored. Put simply,
it is LENGTH(SIZE(X)).

EXAMPLE
X = GPUsingle(rand(10));
ndims(X)
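
The two rules in the description (at least 2 dimensions, trailing singleton dimensions ignored) can be seen with ordinary Matlab arrays on the CPU. This sketch makes no GPUmat calls.

```matlab
% CPU reference using plain Matlab arrays (no GPUmat calls).
X = zeros(4, 3, 1, 1);       % trailing singleton dimensions are dropped
n1 = ndims(X);               % 2
Y = zeros(4, 3, 2);
n2 = ndims(Y);               % 3
n3 = ndims(5);               % 2, even for a scalar
n4 = length(size(Y));        % 3, same as ndims(Y)
disp([n1 n2 n3 n4]);
```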

6.3.47 ne
ne - Not equal
SYNTAX
R = X ~= Y
R = ne(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A ~= B (ne(A, B)) does element by element comparisons between
A and B.

EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([1 0 0 4]);
R = A ~= B;
single(R)
R = ne(A, B);
single(R)

6.3.48 not
not - Logical NOT
SYNTAX
R = ~X
X - GPUsingle
R - GPUsingle
DESCRIPTION
~A (not(A)) performs a logical NOT of input array A.
EXAMPLE
A = GPUsingle([1 3 0 4]);
R = ~A;
single(R)

6.3.49 numel
numel - Number of elements in an array or subscripted array ex-
pression.
SYNTAX
R = numel(X)
X - GPUsingle
R - number of elements
DESCRIPTION
N = NUMEL(A) returns the number of elements N in array A.

EXAMPLE
X = GPUsingle(rand(10));
numel(X)

6.3.50 ones
ones - GPU single precision ones array
SYNTAX
ones(N,GPUsingle)
ones(M,N,GPUsingle)
ones([M,N],GPUsingle)
ones(M,N,P,...,GPUsingle)
ones([M N P ...],GPUsingle)
DESCRIPTION
ones(N,GPUsingle) is an N-by-N GPU matrix of single precision
ones.
ones(M,N,GPUsingle) or ones([M,N],GPUsingle) is an M-by-N
GPU matrix of single precision ones.
ones(M,N,P,...,GPUsingle) or ones([M N P ...],GPUsingle)
is an M-by-N-by-P-by-... GPU array of single precision ones.
EXAMPLE
A = ones(10,GPUsingle)
B = ones(10, 10,GPUsingle)
C = ones([10 10],GPUsingle)

6.3.51 or
or - Logical OR
SYNTAX
R = X | Y
R = or(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A | B (or(A, B)) performs a logical OR of arrays A and B.
EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([0 1 10 2]);
R = A | B;
single(R)
R = or(A, B);
single(R)

6.3.52 plus
plus - Plus
SYNTAX
R = X + Y
R = plus(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
X + Y (plus(X, Y)) adds matrices X and Y. X and Y must have
the same dimensions unless one is a scalar (a 1-by-1 matrix). A
scalar can be added to anything.

EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A + B

6.3.53 power
power - Array power
SYNTAX
R = X .^ Y
R = power(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
Z = X.^Y denotes element-by-element powers.

EXAMPLE
A = GPUsingle(rand(10));
B = 2;
R = A .^ B
MATLAB COMPATIBILITY
Implemented for REAL exponents only.

6.3.54 rdivide
rdivide - Right array divide
SYNTAX
R = X ./ Y
R = rdivide(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
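A./B denotes element-by-element division. A and B must have the
same dimensions unless one is a scalar. A scalar can be divided
with anything.

EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A ./ B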

6.3.55 round
round - Round towards nearest integer
SYNTAX
R = round(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ROUND(X) rounds the elements of X to the nearest integers.
EXAMPLE
X = GPUsingle(rand(1,5));
R = round(X)

6.3.56 sin
sin - Sine of argument in radians
SYNTAX
R = sin(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
SIN(X) is the sine of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = sin(X)

6.3.57 single
single - Converts a GPU variable into a Matlab single precision
variable
SYNTAX
R = single(X)
X - GPUsingle
R - Matlab variable
DESCRIPTION
B = SINGLE(A) returns the contents of the GPU variable A into a
single precision Matlab array.

EXAMPLE
A = GPUsingle(rand(100))
Ah = single(A);

6.3.58 sinh
sinh - Hyperbolic sine
SYNTAX
R = sinh(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
SINH(X) is the hyperbolic sine of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = sinh(X)

6.3.59 size
size - Size of array
SYNTAX
R = size(X)
X - GPUsingle
DESCRIPTION
D = SIZE(X), for M-by-N matrix X, returns the two-element row
vector D = [M,N] containing the number of rows and columns in
the matrix.

EXAMPLE
X = GPUsingle(rand(10));
size(X)

6.3.60 sqrt
sqrt - Square root
SYNTAX
R = sqrt(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
SQRT(X) is the square root of the elements of X. NaN results are
produced if X is negative.
EXAMPLE
X = GPUsingle(rand(10));
R = sqrt(X)

6.3.61 subsref
subsref - Subscripted reference
SYNTAX
R = X(I)
X - GPUsingle
I - GPUsingle
R - GPUsingle
DESCRIPTION
A(I) (subsref) is an array formed from the elements of A specified
by the subscript vector I. The resulting array is the same size as
I except for the special case where A and I are both vectors. In
this case, A(I) has the same number of elements as I but has the
orientation of A.

EXAMPLE
A = GPUsingle([1 2 3 4 5]);
idx = GPUsingle([1 2]);
B = A(idx)

6.3.62 sum
sum - Sum of elements
SYNTAX
R = sum(X)
R = sum(X, DIM)
X - GPUsingle
DIM - integer
R - GPUsingle
DESCRIPTION
S = SUM(X) is the sum of the elements of the vector X. S =
SUM(X,DIM) sums along the dimension DIM.
Note: currently the performance of the sum(X,DIM) with DIM>1 is
3x or 4x better than the sum(X,DIM) with DIM=1.

EXAMPLE
X = GPUsingle(rand(5,10));
R = sum(X);
E = sum(X,2);
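
The effect of the DIM argument can be seen on a small CPU array with the built-in Matlab sum. This sketch makes no GPUmat calls; the values are arbitrary.

```matlab
% CPU reference using the built-in Matlab sum (no GPUmat calls).
X = single([1 2 3; 4 5 6]);
c = sum(X, 1);                    % sums down each column: [5 7 9]
r = sum(X, 2);                    % sums along each row:   [6; 15]
v = sum(single([1 2 3 4]));       % vector input: 10
disp(c);
disp(r);
```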

6.3.63 tan
tan - Tangent of argument in radians
SYNTAX
R = tan(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
TAN(X) is the tangent of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = tan(X)

6.3.64 tanh
tanh - Hyperbolic tangent
SYNTAX
R = tanh(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
TANH(X) is the hyperbolic tangent of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = tanh(X)

6.3.65 times
times - Array multiply
SYNTAX
R = X .* Y
R = times(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
X.*Y denotes element-by-element multiplication. X and Y must
have the same dimensions unless one is a scalar. A scalar can be
multiplied into anything.

EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A .* B

6.3.66 transpose
transpose - Transpose
SYNTAX
R = X.'
R = transpose(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
X.' or transpose(X) is the non-conjugate transpose.

EXAMPLE
X = GPUsingle(rand(10)+i*rand(10));
R = X.'
R = transpose(X)

6.3.67 uminus
uminus - Unary minus
SYNTAX
R = -X
R = uminus(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
-A negates the elements of A.

EXAMPLE
X = GPUsingle(rand(10));
R = -X
R = uminus(X)

6.3.68 vertcat
vertcat - Vertical concatenation
SYNTAX
R = [X;Y]
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
[A;B] is the vertical concatenation of matrices A and B. A and B
must have the same number of columns. Any number of matrices
can be concatenated within one pair of brackets.

EXAMPLE
A = [zeros(10,1,GPUsingle);colon(0,1,10,GPUsingle)'];

6.3.69 zeros
zeros - GPU single precision zeros array
SYNTAX
zeros(N,GPUsingle)
zeros(M,N,GPUsingle)
zeros([M,N],GPUsingle)
zeros(M,N,P,...,GPUsingle)
zeros([M N P ...],GPUsingle)
DESCRIPTION
zeros(N,GPUsingle) is an N-by-N GPU matrix of single precision
zeros.
zeros(M,N,GPUsingle) or zeros([M,N],GPUsingle) is an M-by-N
GPU matrix of single precision zeros.
zeros(M,N,P,...,GPUsingle) or zeros([M N P ...],GPUsingle)
is an M-by-N-by-P-by-... GPU array of single precision zeros.
EXAMPLE
A = zeros(10,GPUsingle)
B = zeros(10, 10,GPUsingle)
C = zeros([10 10],GPUsingle)

6.4. LOW LEVEL FUNCTIONS - ALPHABETICAL LIST
6.4 Low level functions - alphabetical list

6.4.1 cublasAlloc
cublasAlloc - Wrapper to CUBLAS cublasAlloc function
SYNTAX
[status d_A] = cublasAlloc(N,SIZE,d_A);

N - number of elements to allocate
SIZE - size of the elements to allocate
d_A - pointer to GPU memory
status - CUBLAS status
d_A - pointer to GPU memory
DESCRIPTION
Wrapper to CUBLAS cublasAlloc function.
Original function declaration:
cublasStatus
cublasAlloc (int n, int elemSize, void **devicePtr)
Mapping:
[status d_A] = cublasAlloc(N, SIZE, d_A)

N -> int n
SIZE -> int elemSize
d_A -> void **devicePtr
status -> cublasStatus
EXAMPLE
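N = 10;
SIZEOF_FLOAT = 4; % assumed: size in bytes of one single precision float
% GPU variable d_A
d_A = 0;
[status d_A] = cublasAlloc(N,SIZEOF_FLOAT,d_A);
cublasCheckStatus( status, ...
'!!!! device memory allocation error (d_A)');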

6.4.2 cublasCgemm
cublasCgemm - Wrapper to CUBLAS cublasCgemm function
DESCRIPTION
Wrapper to CUBLAS cublasCgemm function. Original function
declaration:
void cublasCgemm
(char transa, char transb, int m, int n, int k,
cuComplex alpha, const cuComplex *A, int lda,
const cuComplex *B, int ldb, cuComplex beta,
cuComplex *C, int ldc)
EXAMPLE
N = 10;
I = sqrt(-1);
A = GPUsingle(rand(N,N) + I*rand(N,N));
B = GPUsingle(rand(N,N) + I*rand(N,N));
% C needs to be complex as well
C = zeros(N,N,GPUsingle)*I;
% alpha is complex
alpha = 2.0+I*3.0;
beta = 0.0;
opA = 'n';
opB = 'n';
cublasCgemm(opA, opB, N, N, N, ...


6.4.3 cublasCheckStatus
cublasCheckStatus - Check the CUBLAS status.
DESCRIPTION
cublasCheckStatus(STATUS,MSG) returns EXIT_FAILURE (1) or
EXIT_SUCCESS (0) depending on the STATUS value, and throws an
error with message MSG.
EXAMPLE
cublasCheckStatus( status, 'Kernel execution error');
6.4.4 cublasError
cublasError - Returns a structure with CUBLAS result codes
DESCRIPTION
Returns a structure with CUBLAS result codes.
EXAMPLE
cublasError
ans =
CUBLAS_STATUS_SUCCESS: 0
CUBLAS_STATUS_NOT_INITIALIZED: 1
CUBLAS_STATUS_ALLOC_FAILED: 3
CUBLAS_STATUS_INVALID_VALUE: 7
CUBLAS_STATUS_ARCH_MISMATCH: 8
CUBLAS_STATUS_MAPPING_ERROR: 11
CUBLAS_STATUS_EXECUTION_FAILED: 13
CUBLAS_STATUS_INTERNAL_ERROR: 14

6.4.5 cublasFree
cublasFree - Wrapper to CUBLAS cublasFree function
DESCRIPTION
Wrapper to CUBLAS cublasFree function. Original function
declaration:
cublasStatus
cublasFree (const void *devicePtr)
Mapping:
status = cublasFree(d_A)
d_A -> const void *devicePtr
status -> cublasStatus
EXAMPLE
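N = 10;
SIZEOF_FLOAT = 4; % assumed: size in bytes of one single precision float
% GPU variable d_A
d_A = 0;
[status d_A] = cublasAlloc(N,SIZEOF_FLOAT,d_A);
cublasCheckStatus( status, ...
'!!!! device memory allocation error (d_A)');
% Clean up memory
status = cublasFree(d_A);
cublasCheckStatus( status, ...
'!!!! memory free error (d_A)');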

6.4.6 cublasGetError
cublasGetError - Wrapper to CUBLAS cublasGetError function
DESCRIPTION
Wrapper to CUBLAS cublasGetError function. Original function
declaration:
cublasStatus
cublasGetError (void)
EXAMPLE
status = cublasGetError();
cublasCheckStatus( status, 'Kernel execution error');

6.4.7 cublasGetVector
cublasGetVector - Wrapper to CUBLAS cublasGetVector function
DESCRIPTION
Wrapper to CUBLAS cublasGetVector function. Original function
declaration:
cublasStatus
cublasGetVector
(int n, int elemSize, const void *x, int incx,
void *y, int incy)
EXAMPLE
A = GPUsingle(rand(1,10));
% Ah must have the same type as A. GPUsingle is single
% precision floating point, so Ah must also be single
% precision.
Ah = single(zeros(size(A)));
% getSizeOf returns the size of the elements stored
% in A (for example float or complex)
[status Ah] = cublasGetVector(numel(A), getSizeOf(A), ...
getPtr(A), 1, Ah, 1);
cublasCheckStatus( status, 'Error.');
disp(Ah);

6.4.8 cublasInit
cublasInit - Wrapper to CUBLAS cublasInit function
DESCRIPTION
Wrapper to CUBLAS cublasInit function. Original function decla-
ration:
cublasStatus
cublasInit (void)
EXAMPLE
status = cublasInit;
cublasCheckStatus(status, 'Error.');

6.4.9 cublasIsamax
cublasIsamax - Wrapper to CUBLAS cublasIsamax function
DESCRIPTION
Wrapper to CUBLAS cublasIsamax function. Original function
declaration:
int
cublasIsamax (int n, const float *x, int incx)
Mapping:
RET = cublasIsamax(N, d_A, INCX)

N -> int n
d_A -> const float *x
INCX -> int incx
RET -> cublasIsamax result
EXAMPLE
%% High level implementation
N = 10;
A = GPUsingle(rand(1,N));
Isamax = cublasIsamax(N, getPtr(A), 1);
[value, Isamax_mat] = max(single(A));
compareArrays(Isamax, Isamax_mat, 1e-6);
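
For reference, the BLAS isamax operation returns the 1-based index of the first element with the largest absolute value, so comparing against max(single(A)) as above is only equivalent when A has no negative entries (true for rand). A CPU sketch of the general case with ordinary Matlab arrays, making no GPUmat or CUBLAS calls:

```matlab
% CPU reference for the isamax operation (no GPUmat or CUBLAS calls).
x = single([0.5 -3 2 3]);
[value, idx] = max(abs(x));      % first element of maximum magnitude
[value2, idx_plain] = max(x);    % differs when negative values dominate
disp([idx idx_plain]);           % 2 4
```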

6.4.10 cublasIsamin
cublasIsamin - Wrapper to CUBLAS cublasIsamin function
DESCRIPTION
Wrapper to CUBLAS cublasIsamin function. Original function dec-
laration:
int
cublasIsamin (int n, const float *x, int incx)
EXAMPLE
N = 10;
A = GPUsingle(rand(1,N));
Isamin = cublasIsamin(N, getPtr(A), 1);
[value, Isamin_mat] = min(single(A));
compareArrays(Isamin, Isamin_mat, 1e-6);
6.4.11 cublasResult
cublasResult - Returns a structure with CUBLAS error results
DESCRIPTION
Returns a structure with CUBLAS error results.

6.4.12 cublasSasum
cublasSasum - Wrapper to CUBLAS cublasSasum function
DESCRIPTION
Wrapper to CUBLAS cublasSasum function. Original function
declaration:
float
cublasSasum (int n, const float *x, int incx)
EXAMPLE
N = 10;
A = GPUsingle(rand(1,N));
Sasum = cublasSasum( N, getPtr(A), 1);
Sasum_mat = sum(single(A));
compareArrays(Sasum, Sasum_mat, 1e-6);
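
The BLAS asum operation is the sum of absolute values, so the comparison with sum(single(A)) above holds only for non-negative data such as the output of rand. A CPU sketch of the general reference value, using ordinary Matlab arrays and no GPUmat or CUBLAS calls:

```matlab
% CPU reference for the asum operation (no GPUmat or CUBLAS calls).
x = single([1 -2 3 -4]);
asum_ref = sum(abs(x));      % 10, the value asum is defined to return
plain_sum = sum(x);          % -2, differs because x has negative entries
disp([asum_ref plain_sum]);
```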

6.4.13 cublasSaxpy
cublasSaxpy - Wrapper to CUBLAS cublasSaxpy function
DESCRIPTION
Wrapper to CUBLAS cublasSaxpy function. Original function dec-
laration:
void
cublasSaxpy
(int n, float alpha, const float *x, int incx, float *y,
int incy)
EXAMPLE
N = 10;
A = GPUsingle(rand(1,N));
B = GPUsingle(rand(1,N));
alpha = 2.0;
Saxpy_mat = alpha * single(A) + single(B);
cublasSaxpy(N, alpha, getPtr(A), 1, getPtr(B), 1);
compareArrays(Saxpy_mat, single(B), 1e-6);
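
The BLAS axpy operation computes y = alpha*x + y and overwrites y, which is why the example compares against single(B) after the call. A CPU sketch with ordinary Matlab arrays, making no GPUmat or CUBLAS calls:

```matlab
% CPU reference for the axpy operation (no GPUmat or CUBLAS calls).
x = single([1 2 3]);
y = single([10 20 30]);
alpha = single(2);
y = alpha * x + y;           % y is updated in place: [12 24 36]
disp(y);
```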

6.4.14 cublasScopy
cublasScopy - Wrapper to CUBLAS cublasScopy function
DESCRIPTION
Wrapper to CUBLAS cublasScopy function. Original function dec-
laration:
void
cublasScopy
(int n, const float *x, int incx, float *y, int incy)
EXAMPLE
N = 10;
A = GPUsingle(rand(1,N));
B = GPUsingle(rand(1,N));
cublasScopy(N, getPtr(A), 1, getPtr(B), 1);
compareArrays(single(A), single(B), 1e-6);

6.4.15 cublasSdot
cublasSdot - Wrapper to CUBLAS cublasSdot function
DESCRIPTION
Wrapper to CUBLAS cublasSdot function. Original function dec-
laration:
float
cublasSdot
(int n, const float *x, int incx, const float *y, int incy)
EXAMPLE
N = 10;
A = GPUsingle(rand(1,N));
B = GPUsingle(rand(1,N));
Sdot_mat = sum(single(A).*single(B));
Sdot = cublasSdot(N, getPtr(A), 1, getPtr(B), 1);
compareArrays(Sdot_mat, Sdot, 1e-6);

6.4.16 cublasSetVector
cublasSetVector - Wrapper to CUBLAS cublasSetVector function
DESCRIPTION
Wrapper to CUBLAS cublasSetVector function. Original function
declaration:
cublasStatus
cublasSetVector
(int n, int elemSize, const void *x, int incx,
void *y, int incy)
EXAMPLE
B = single([1 2 3 4]);
% Create empty GPU variable A
A = GPUsingle();
setSize(A, size(B));
GPUallocVector(A);
status = cublasSetVector(numel(A), getSizeOf(A), ...
B, 1, getPtr(A), 1);
cublasCheckStatus( status, 'Error.');
disp(single(A));

6.4.17 cublasSgemm
cublasSgemm - Wrapper to CUBLAS cublasSgemm function
DESCRIPTION
Wrapper to CUBLAS cublasSgemm function. Original function
declaration:
void
cublasSgemm
(char transa, char transb, int m, int n, int k,
float alpha, const float *A, int lda,
const float *B, int ldb, float beta,
float *C, int ldc)
EXAMPLE
N = 10;
A = GPUsingle(rand(N,N));
B = GPUsingle(rand(N,N));
C = zeros(N,N,GPUsingle);
alpha = 2.0;
beta = 0.0;
opA = 'n';
opB = 'n';
cublasSgemm(opA, opB, N, N, N, ...
            alpha, getPtr(A), N, getPtr(B), N, ...
            beta, getPtr(C), N);

6.4.18 cublasShutdown
cublasShutdown - Wrapper to CUBLAS cublasShutdown function
DESCRIPTION
Wrapper to CUBLAS cublasShutdown function. Original function
declaration:
cublasStatus cublasShutdown(void)

6.4.19 cublasSnrm2
cublasSnrm2 - Wrapper to CUBLAS cublasSnrm2 function
DESCRIPTION
Wrapper to CUBLAS cublasSnrm2 function. Original function declaration:
float cublasSnrm2(int n, const float *x, int incx)
EXAMPLE
N = 10;
A = GPUsingle(rand(1,N));
Snrm2_mat = sqrt(sum(single(A).*single(A)));
Snrm2 = cublasSnrm2(N, getPtr(A), 1);
compareArrays(Snrm2_mat, Snrm2, 1e-6);

6.4.20 cublasSrot
cublasSrot - Wrapper to CUBLAS cublasSrot function
DESCRIPTION
Wrapper to CUBLAS cublasSrot function. Original function declaration:
void cublasSrot(int n, float *x, int incx, float *y, int incy,
                float sc, float ss)
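
This entry has no example, so the effect of the rotation is sketched here in plain Matlab on the CPU (no GPUmat calls). In BLAS, srot replaces each pair (x(i), y(i)) with (sc*x(i) + ss*y(i), -ss*x(i) + sc*y(i)); the sketch assumes unit strides, and the temporaries xn and yn are illustrative.

```matlab
% Reference semantics of srot with unit strides (plain Matlab, CPU only).
sc = single(cos(pi/6)); ss = single(sin(pi/6));   % rotation cosine and sine
x = single([1 0 2]);
y = single([0 1 2]);
xn =  sc * x + ss * y;    % new x, computed from the old x and y
yn = -ss * x + sc * y;    % new y, computed from the old x and y
x = xn; y = yn;           % both vectors are overwritten, as in cublasSrot
disp([x; y]);
```

Both outputs must be computed from the old values of x and y, which is why the sketch uses temporaries.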
6.4.21 cublasSscal
cublasSscal - Wrapper to CUBLAS cublasSscal function
DESCRIPTION
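Wrapper to CUBLAS cublasSscal function. Original function declaration:
void cublasSscal(int n, float alpha, float *x, int incx)
EXAMPLE
N = 10;
A = GPUsingle(rand(1,N));
alpha = 1/10.0;
A_mat = single(A)*alpha;
cublasSscal(N, alpha, getPtr(A), 1);
compareArrays(A_mat, single(A), 1e-6);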

6.4.22 cuCheckStatus
cuCheckStatus - Check the CUDA DRV status.
DESCRIPTION
cuCheckStatus(STATUS,MSG) returns EXIT_FAILURE (1) or
EXIT_SUCCESS (0) depending on the STATUS value, and throws an
error with message 'MSG'.
EXAMPLE
[status] = cuInit();
cuCheckStatus(status, 'Error initializing CUDA driver.');

6.4.23 cudaCheckStatus
cudaCheckStatus - Check the CUDA run-time status
DESCRIPTION
RET = cudaCheckStatus(STATUS,MSG) returns EXIT_FAILURE (1)
or EXIT_SUCCESS (0) depending on the STATUS value, and throws an
error with message 'MSG'.
EXAMPLE
status = cudaGetLastError();
cudaCheckStatus(status, 'Kernel execution error.');

6.4.24 cudaGetDeviceCount
cudaGetDeviceCount - Wrapper to CUDA cudaGetDeviceCount
function.
DESCRIPTION
Wrapper to CUDA cudaGetDeviceCount function.
EXAMPLE
count = 0;
[status,count] = cudaGetDeviceCount(count);
if (status ~= 0)
    error('Unable to get the number of devices');
end

6.4.25 cudaGetDeviceMajorMinor
cudaGetDeviceMajorMinor - Returns CUDA compute capability
major and minor numbers.
DESCRIPTION
Returns CUDA compute capability major and minor numbers.
[STATUS, MAJOR, MINOR] = cudaGetDeviceMajorMinor(DEV)
returns the compute capability number (major, minor) of the
device=DEV. STATUS is the result of the operation.
EXAMPLE
dev = 0;
[status,major,minor] = cudaGetDeviceMajorMinor(dev);
if (status ~= 0)
    error('Unable to get the compute capability');
end
major
minor

6.4.26 cudaGetDeviceMemory
cudaGetDeviceMemory - Returns device total memory
DESCRIPTION
[STATUS, TOTMEM] = cudaGetDeviceMemory(DEV) returns the total
memory of the device DEV. STATUS is the result of the operation.
EXAMPLE
dev = 0;
[status,totmem] = cudaGetDeviceMemory(dev);
if (status ~= 0)
    error('Error getting total memory');
end
totmem = totmem/1024/1024;
disp(['Total memory = ' num2str(totmem) ' MB']);

6.4.27 cudaGetDeviceMultProcCount
cudaGetDeviceMultProcCount - Returns device multi-processors
count
DESCRIPTION
[STATUS, COUNT] = cudaGetDeviceMultProcCount(DEV) returns the
number of multi-processors of the device DEV. STATUS is the result
of the operation.
EXAMPLE
dev = 0;
[status,count] = cudaGetDeviceMultProcCount(dev);
if (status ~= 0)
    error('Error getting number of multi-processors');
end
disp(['Multi-processors = ' num2str(count)]);

6.4.28 cudaGetLastError
cudaGetLastError - Wrapper to CUDA cudaGetLastError function
DESCRIPTION
[STATUS] = cudaGetLastError() returns the last error from a
run-time call. STATUS is the result of the operation. Original
function declaration:
cudaError_t cudaGetLastError(void)

6.4.29 cudaSetDevice
cudaSetDevice - Wrapper to CUDA cudaSetDevice function
DESCRIPTION
[STATUS] = cudaSetDevice(DEV) sets the device to DEV and returns
the result of the operation in STATUS. Original function declaration:
cudaError_t cudaSetDevice(int dev)

6.4.30 cudaThreadSynchronize
cudaThreadSynchronize - Wrapper to CUDA cudaThreadSynchronize
function.
DESCRIPTION
[STATUS] = cudaThreadSynchronize() waits until the GPU has completed
all previously requested tasks. STATUS is the result of the
operation. Original function declaration:
cudaError_t cudaThreadSynchronize(void)

6.4.31 cufftCheckStatus
cufftCheckStatus - Checks the CUFFT status
DESCRIPTION
cufftCheckStatus(STATUS,MSG) returns EXIT_FAILURE (1) or
EXIT_SUCCESS (0) depending on the STATUS value, and throws an
error with message 'MSG'. STATUS is compared to the possible CUFFT
results.
EXAMPLE
fftType = cufftType;
A = GPUsingle(rand(1,128)+sqrt(-1)*rand(1,128));
plan = 0;
type = fftType.CUFFT_C2C;
[status, plan] = cufftPlan1d(plan, numel(A), type, 1);
cufftCheckStatus(status, 'Error in cufftPlan1d');

6.4.32 cufftDestroy
cufftDestroy - Wrapper to CUFFT cufftDestroy function
DESCRIPTION
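Wrapper to CUFFT cufftDestroy function. Original function declaration:
cufftResult
cufftDestroy(cufftHandle plan);
EXAMPLE
fftType = cufftType;
I = sqrt(-1);
A = GPUsingle(rand(1,128)+I*rand(1,128));
plan = 0;
[status, plan] = cufftPlan1d(plan, numel(A), fftType.CUFFT_C2C, 1);
[status] = cufftDestroy(plan);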

6.4.33 cufftExecC2C
cufftExecC2C - Wrapper to CUFFT cufftExecC2C function
DESCRIPTION
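Wrapper to CUFFT cufftExecC2C function. Original function declaration:
cufftResult
cufftExecC2C(cufftHandle plan,
             cufftComplex *idata,
             cufftComplex *odata,
             int direction);
EXAMPLE
fftType = cufftType;
fftDir = cufftTransformDirections;
I = sqrt(-1);
A = GPUsingle(rand(1,128)+I*rand(1,128));
plan = 0;
type = fftType.CUFFT_C2C;
[status, plan] = cufftPlan1d(plan, numel(A), type, 1);
cufftCheckStatus(status, 'Error in cufftPlan1d');
dir = fftDir.CUFFT_FORWARD;
[status] = cufftExecC2C(plan, getPtr(A), getPtr(A), dir);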

6.4.34 cufftExecC2R
cufftExecC2R - Wrapper to CUFFT cufftExecC2R function
DESCRIPTION
Wrapper to CUFFT cufftExecC2R function. Original function declaration:
cufftResult
cufftExecC2R(cufftHandle plan,
             cufftComplex *idata,
             cufftReal *odata);

6.4.35 cufftExecR2C
cufftExecR2C - Wrapper to CUFFT cufftExecR2C function
DESCRIPTION
Wrapper to CUFFT cufftExecR2C function. Original function declaration:
cufftResult
cufftExecR2C(cufftHandle plan,
             cufftReal *idata,
             cufftComplex *odata);

6.4.36 cufftPlan1d
cufftPlan1d - Wrapper to CUFFT cufftPlan1d function
DESCRIPTION
Wrapper to CUFFT cufftPlan1d function. Original function declaration:
cufftResult
cufftPlan1d(cufftHandle *plan,
            int nx,
            cufftType type,
            int batch);
The original function returns only a cufftResult, whereas the wrapper
also returns the plan.
EXAMPLE
fftType = cufftType;
A = GPUsingle(rand(1,128)+sqrt(-1)*rand(1,128));
plan = 0;
[status, plan] = cufftPlan1d(plan, numel(A), fftType.CUFFT_C2C, 1);

6.4.37 cufftPlan2d
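cufftPlan2d - Wrapper to CUFFT cufftPlan2d function
DESCRIPTION
Wrapper to CUFFT cufftPlan2d function. Original function declaration:
cufftResult
cufftPlan2d(cufftHandle *plan,
            int nx, int ny,
            cufftType type);
EXAMPLE
fftType = cufftType;
I = sqrt(-1);
A = GPUsingle(rand(128,128)+I*rand(128,128));
plan = 0;
type = fftType.CUFFT_C2C;
% Vectors stored in column major format (FORTRAN)
s = size(A);
[status, plan] = cufftPlan2d(plan, s(2), s(1), type);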
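
The example passes s(2) before s(1) because Matlab stores matrices in column-major order, whereas CUFFT treats its input as a row-major (C) array with nx as the slowest-varying dimension. The following plain Matlab sketch (CPU only, no GPUmat calls) illustrates that the memory of an s(1)-by-s(2) Matlab matrix, read as a row-major array, is an s(2)-by-s(1) array; the names mem and Crow are illustrative.

```matlab
% Column-major Matlab storage seen as a row-major (C) array. Plain Matlab, CPU only.
A = reshape(1:6, 2, 3);        % 2-by-3 Matlab matrix
s = size(A);
mem = A(:).';                  % order of the elements in memory: 1 2 3 4 5 6
% A C program reading the same memory as an s(2)-by-s(1) row-major array
% sees the transpose of A: its rows are the columns of A.
nx = s(2); ny = s(1);
Crow = reshape(mem, ny, nx).'; % rebuild that nx-by-ny array row by row
disp(isequal(Crow, A.'));      % displays 1
```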

6.4.38 cufftPlan3d
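cufftPlan3d - Wrapper to CUFFT cufftPlan3d function
DESCRIPTION
Wrapper to CUFFT cufftPlan3d function. Original function declaration:
cufftResult
cufftPlan3d(cufftHandle *plan,
            int nx, int ny, int nz,
            cufftType type);
The original function returns only a cufftResult, whereas the wrapper
also returns the plan.
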
6.4.39 cufftResult
cufftResult - Returns a structure with CUFFT result codes
DESCRIPTION
Returns a structure with CUFFT result codes
EXAMPLE
cufftResult
ans =
CUFFT_SUCCESS: 0
CUFFT_INVALID_PLAN: 1
CUFFT_ALLOC_FAILED: 2
CUFFT_INVALID_TYPE: 3
CUFFT_INVALID_VALUE: 4
...

6.4.40 cufftTransformDirections
cufftTransformDirections - Returns a structure with CUFFT
transform direction codes
DESCRIPTION
Returns a structure with CUFFT transform direction codes.
CUFFT_FORWARD = -1; Forward FFT
CUFFT_INVERSE = 1; Inverse FFT
EXAMPLE
cufftTransformDirections
ans =
CUFFT_FORWARD: -1
CUFFT_INVERSE: 1
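
The two direction codes correspond to the sign of the exponent in the discrete Fourier transform. The sketch below is plain Matlab on the CPU (no GPUmat or CUFFT calls) and builds both transforms directly from that sign; it also assumes, as is the case for CUFFT, that the inverse transform is not normalized, so the result is divided by N by hand. The names dirFwd, dirInv, Wf and Wi are illustrative.

```matlab
% Direct DFT with a selectable exponent sign (plain Matlab, CPU only).
N = 8;
x = (1:N) + 1i*(N:-1:1);            % arbitrary complex test vector
k = 0:N-1;
dirFwd = -1; dirInv = 1;            % same values as CUFFT_FORWARD / CUFFT_INVERSE
Wf = exp(dirFwd * 2i*pi * (k.' * k) / N);   % forward DFT matrix
Wi = exp(dirInv * 2i*pi * (k.' * k) / N);   % inverse DFT matrix, not normalized
X  = x * Wf;                        % forward transform
xr = (X * Wi) / N;                  % unnormalized inverse, then divide by N
disp(max(abs(X - fft(x))));         % close to zero
disp(max(abs(xr - x)));             % close to zero
```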
6.4.41 cufftType
cufftType - Returns a structure with CUFFT transform type codes
DESCRIPTION
Returns a structure with CUFFT transform type codes.
EXAMPLE
cufftType
ans =
CUFFT_R2C: 42
CUFFT_C2R: 44
CUFFT_C2C: 41

6.4.42 cuInit
cuInit - Wrapper to CUDA driver function cuInit
DESCRIPTION
Wrapper to CUDA driver function cuInit.
6.4.43 cuMemGetInfo
cuMemGetInfo - Wrapper to CUDA driver function cuMemGetInfo
DESCRIPTION
Wrapper to CUDA driver function cuMemGetInfo.
EXAMPLE
freemem = 0;
c = 0;
[status, freemem, c] = cuMemGetInfo(freemem,c);

6.4.44 getPtr
getPtr - Get GPUsingle pointer on GPU memory
SYNTAX
R = getPtr(X)
X - GPU variable
R - the pointer to the GPU memory region
DESCRIPTION
This is a low-level function that returns the pointer to the GPU
memory of a GPUsingle object.

EXAMPLE
getPtr(A)

6.4.45 getSizeOf
getSizeOf - Get the size of the GPU datatype (similar to sizeof in
C)
SYNTAX
R = getSizeOf(X)
X - GPU variable
R - the size of the GPU variable datatype
DESCRIPTION
This is a low level function used to get the size of the datatype of
the GPU variable.

EXAMPLE
getSizeOf(A)

6.4.46 getType
getType - Get the type of the GPU variable
SYNTAX
R = getType(X)
X - GPU variable
R - the type of the GPU variable
DESCRIPTION
This is a low-level function used to get the type of the GPU variable
(FLOAT = 0, COMPLEX FLOAT = 1, DOUBLE = 2, COMPLEX DOUBLE = 3).

EXAMPLE
getType(A)

6.4.47 GPUallocVector
GPUallocVector - Variable allocation on GPU memory
SYNTAX
GPUallocVector(P)
P - GPUsingle
DESCRIPTION
P = GPUallocVector(P) allocates the required GPU memory for
P. The size of the allocated memory depends on the size of P.
A complex variable is allocated as an interleaved sequence of real
and imaginary values, so the memory size of a complex variable
on the GPU is numel(P)*2*SIZE_OF_FLOAT. The size of the variable
must be set before calling GPUallocVector.

EXAMPLE
% Real variable
A = GPUsingle();
setSize(A,[100 100]);
GPUallocVector(A);
% Complex variable
A = GPUsingle();
setSize(A,[100 100]);
setComplex(A);
GPUallocVector(A);
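
The interleaved layout described above can be pictured with a small plain Matlab sketch (CPU only, no GPUmat calls). It builds the sequence re, im, re, im, ... for a short complex vector and computes the corresponding size in bytes, assuming 4 bytes per single precision value; the names inter, sizeOfFloat and nbytes are illustrative.

```matlab
% Interleaved complex layout (plain Matlab, CPU only).
z = single([1+2i, 3+4i, 5+6i]);
inter = reshape([real(z); imag(z)], 1, []);   % re, im, re, im, ...
disp(inter);                                  % values 1 2 3 4 5 6
sizeOfFloat = 4;                              % bytes in a single precision value
nbytes = numel(z) * 2 * sizeOfFloat;          % numel(P)*2*SIZE_OF_FLOAT
% Recover the complex values from the interleaved buffer.
zr = complex(inter(1:2:end), inter(2:2:end));
```

This is the same arrangement that packfC2C produces from separate real and imaginary arrays, and that unpackfC2C splits back into two arrays.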

6.4.48 GPUdeviceInit
GPUdeviceInit - Initializes a CUDA capable GPU device
SYNTAX
GPUdeviceInit(dev)
dev - device number
DESCRIPTION
GPUdeviceInit(dev) initializes the GPU device dev, where dev is
an integer corresponding to the device number. Use GPUinfo to
list the available devices and their numbers.
EXAMPLE
GPUinfo
GPUdeviceInit(0)

6.4.49 istrans
istrans - True if GPUsingle TRANS flag is set to 1
SYNTAX
R = istrans(X)
X - GPUsingle
DESCRIPTION
ISTRANS(X) returns 1 if the TRANS flag of X is set.

EXAMPLE
istrans(A)
B = transpose(A);
istrans(B)

6.4.50 packfC2C
packfC2C - Pack two arrays into an interleaved complex array
SYNTAX
PACKFC2C(RE_IDATA, IM_IDATA, ODATA)
RE_IDATA - GPUsingle, real part
IM_IDATA - GPUsingle, imaginary part
ODATA - GPUsingle, complex
DESCRIPTION
PACKFC2C(RE_IDATA, IM_IDATA, ODATA) packs the values of
RE_IDATA and IM_IDATA into ODATA as shown in the example. The
type of the elements of ODATA is complex.

EXAMPLE
re = GPUsingle(rand(1,100));
im = GPUsingle(rand(1,100));
r = GPUsingle();
setComplex(r);
setSize(r,size(re));
GPUallocVector(r);
packfC2C(re,im,r);

6.4.51 packfR2C
packfR2C - Transforms a real array into a complex array with zero
imaginary part.
SYNTAX
PACKFR2C(RE_IDATA, ODATA)
RE_IDATA - GPUsingle, real part
ODATA - GPUsingle, complex
DESCRIPTION
PACKFR2C(RE_IDATA, ODATA) transforms RE_IDATA into the complex
array ODATA. The type of the elements of ODATA is complex. The
imaginary part of ODATA is set to zero.

EXAMPLE
re = GPUsingle(rand(1,100));
r = GPUsingle();
setComplex(r);
setSize(r,size(re));
GPUallocVector(r);
packfR2C(re,r);

6.4.52 setComplex
setComplex - Set a GPUsingle as complex
SYNTAX
setComplex(A)
A - GPUsingle
DESCRIPTION
setComplex(A) sets the GPUsingle A as complex. It should be called
before using GPUallocVector.

EXAMPLE
A = GPUsingle();
setSize(A,[10 10]);
setComplex(A);
GPUallocVector(A);

6.4.53 setReal
setReal - Set a GPUsingle as real
SYNTAX
setReal(A)
A - GPUsingle
DESCRIPTION
setReal(A) sets the GPUsingle A as real. It should be called before
using GPUallocVector.

EXAMPLE
A = GPUsingle();
setSize(A,[10 10]);
setReal(A);
GPUallocVector(A);

6.4.54 setSize
setSize - Set GPUsingle size
SYNTAX
setSize(A,SIZE)
A - GPUsingle
SIZE - the new size, given as a vector of dimensions
DESCRIPTION
setSize(A, SIZE) sets the size of A to SIZE.

EXAMPLE
A = GPUsingle();
setSize(A,[10 10]);

6.4.55 unpackfC2C
unpackfC2C - Unpack one complex array into two single precision
arrays
SYNTAX
UNPACKFC2C(IDATA, RE_ODATA, IM_ODATA)
IDATA - GPUsingle, complex
RE_ODATA - GPUsingle, real part
IM_ODATA - GPUsingle, imaginary part
DESCRIPTION
UNPACKFC2C(IDATA, RE_ODATA, IM_ODATA) unpacks the values of
IDATA into two arrays RE_ODATA and IM_ODATA as shown in the
example. The type of the elements of IDATA is complex.

EXAMPLE
r = GPUsingle(rand(1,100)+i*rand(1,100));
re = GPUsingle();
setReal(re);
setSize(re,size(r));
GPUallocVector(re);
im = GPUsingle();
setReal(im);
setSize(im,size(r));
GPUallocVector(im);
unpackfC2C(r,re,im);

6.4.56 unpackfC2R
unpackfC2R - Transforms a complex array into a real array,
discarding the imaginary part
SYNTAX
UNPACKFC2R(IDATA, RE_ODATA)
DESCRIPTION
UNPACKFC2R(IDATA, RE_ODATA) transforms the complex array
IDATA into the array RE_ODATA, discarding the imaginary part. The
type of the elements of IDATA is complex.

EXAMPLE
r = GPUsingle(rand(1,100)+i*rand(1,100));
re = GPUsingle();
setReal(re);
setSize(re,size(r));
GPUallocVector(re);
unpackfC2R(r,re);

Bibliography
[1] NVIDIA Cuda Programming Guide. NVIDIA Corporation.
[2] Cuda. http://www.nvidia.com/object/cuda_home.html#.
[3] Gpgpu. http://www.gpgpu.org.
[4] CUDA CUBLAS Library. NVIDIA Corporation.