
GPUmat User Guide

Version 0.1, April 2009


Contents


1 Introduction
1.1 About GPUs
1.2 System requirements
1.3 Credits and licensing
1.4 How to install
1.5 Terminology
1.6 Documentation overview

2 Quick start
2.1 Matrix addition example
2.2 Matrix multiplication example
2.3 FFT calculation example
2.4 Performance analysis

3 GPUmat overview
3.1 Starting the GPU environment
3.2 Creating a GPU variable
3.3 Performing calculations on the GPU
3.4 Porting existing Matlab code
3.5 Converting a GPU variable into a Matlab variable
3.6 GPUmat functions
3.7 GPU memory management
3.8 Coding guidelines
3.8.1 Memory transfers
3.8.2 Vectorized code and for-loops
3.8.3 Matlab and GPUsingle variables
3.9 Performance analysis


4 Developer’s section
4.1 The GPUsingle class
4.1.1 GPUsingle constructor
4.1.2 GPUsingle properties
4.1.3 GPUsingle methods
4.2 Low level GPU memory management
4.2.1 Memory management using the GPUsingle class
4.2.2 Memory management using low level functions
4.3 Complex numbers
4.4 CUBLAS functions
4.5 CUFFT functions

5 Frequently Asked Questions
5.1 What happens if GPUmat and Matlab variables are used together?
5.2 Is any Matlab function executed on GPU by using GPUsingle?
5.3 What operations should I perform on the GPU?

6 Function Reference
6.1 Functions - by category
6.1.1 GPU startup and management
6.1.2 GPU variables management
6.1.3 GPU memory management
6.1.4 Numerical functions
6.1.5 General information
6.1.6 Complex numbers
6.1.7 CUBLAS functions
6.1.8 CUDA Driver functions
6.1.9 CUFFT functions
6.1.10 CUDA run-time functions
6.2 Operators
6.2.1 A & B
6.2.2 A'
6.2.3 A == B
6.2.4 A >= B
6.2.5 A > B
6.2.6 A <= B
6.2.7 A < B
6.2.8 A - B
6.2.9 A / B
6.2.10 A * B
6.2.11 A ~= B

3 GPUmat Guide Version 0.1. Copyright gp-you.ch.



6.2.12 ~A
6.2.13 A | B
6.2.14 A + B
6.2.15 A .^ B
6.2.16 A ./ B
6.2.17 A(I)
6.2.18 A .* B
6.2.19 A.'
6.2.20 [A;B]
6.3 High level functions - alphabetical list
6.3.1 abs
6.3.2 acos
6.3.3 acosh
6.3.4 and
6.3.5 asin
6.3.6 asinh
6.3.7 atan
6.3.8 atanh
6.3.9 ceil
6.3.10 colon
6.3.11 conj
6.3.12 cos
6.3.13 cosh
6.3.14 ctranspose
6.3.15 display
6.3.16 double
6.3.17 eq
6.3.18 exp
6.3.19 fft
6.3.20 fft2
6.3.21 floor
6.3.22 ge
6.3.23 GPUinfo
6.3.24 GPUmem
6.3.25 GPUsingle
6.3.26 GPUstart
6.3.27 GPUsync
6.3.28 gt
6.3.29 ifft
6.3.30 ifft2
6.3.31 iscomplex


6.3.32 isempty
6.3.33 isreal
6.3.34 isscalar
6.3.35 ldivide
6.3.36 le
6.3.37 length
6.3.38 log
6.3.39 log10
6.3.40 log1p
6.3.41 log2
6.3.42 lt
6.3.43 minus
6.3.44 mrdivide
6.3.45 mtimes
6.3.46 ndims
6.3.47 ne
6.3.48 not
6.3.49 numel
6.3.50 ones
6.3.51 or
6.3.52 plus
6.3.53 power
6.3.54 rdivide
6.3.55 round
6.3.56 sin
6.3.57 single
6.3.58 sinh
6.3.59 size
6.3.60 sqrt
6.3.61 subsref
6.3.62 sum
6.3.63 tan
6.3.64 tanh
6.3.65 times
6.3.66 transpose
6.3.67 uminus
6.3.68 vertcat
6.3.69 zeros
6.4 Low level functions - alphabetical list
6.4.1 cublasAlloc
6.4.2 cublasCgemm


6.4.3 cublasCheckStatus
6.4.4 cublasError
6.4.5 cublasFree
6.4.6 cublasGetError
6.4.7 cublasGetVector
6.4.8 cublasInit
6.4.9 cublasIsamax
6.4.10 cublasIsamin
6.4.11 cublasResult
6.4.12 cublasSasum
6.4.13 cublasSaxpy
6.4.14 cublasScopy
6.4.15 cublasSdot
6.4.16 cublasSetVector
6.4.17 cublasSgemm
6.4.18 cublasShutdown
6.4.19 cublasSnrm2
6.4.20 cublasSrot
6.4.21 cublasSscal
6.4.22 cuCheckStatus
6.4.23 cudaCheckStatus
6.4.24 cudaGetDeviceCount
6.4.25 cudaGetDeviceMajorMinor
6.4.26 cudaGetDeviceMemory
6.4.27 cudaGetDeviceMultProcCount
6.4.28 cudaGetLastError
6.4.29 cudaSetDevice
6.4.30 cudaThreadSynchronize
6.4.31 cufftCheckStatus
6.4.32 cufftDestroy
6.4.33 cufftExecC2C
6.4.34 cufftExecC2R
6.4.35 cufftExecR2C
6.4.36 cufftPlan1d
6.4.37 cufftPlan2d
6.4.38 cufftPlan3d
6.4.39 cufftResult
6.4.40 cufftTransformDirections
6.4.41 cufftType
6.4.42 cuInit
6.4.43 cuMemGetInfo


6.4.44 getPtr
6.4.45 getSizeOf
6.4.46 getType
6.4.47 GPUallocVector
6.4.48 GPUdeviceInit
6.4.49 istrans
6.4.50 packfC2C
6.4.51 packfR2C
6.4.52 setComplex
6.4.53 setReal
6.4.54 setSize
6.4.55 unpackfC2C
6.4.56 unpackfC2R

Bibliography

Chapter 1
Introduction

GPUmat enables Matlab code to run on the Graphics Processing Unit (GPU). The following is a summary of the most important GPUmat features:

• GPU computational power can be easily accessed from Matlab without any GPU knowledge.

• Matlab code is directly executed on the GPU. The execution is transparent to the user.

• GPUmat speeds up Matlab functions by using the GPU multi-processor architecture.

• Existing Matlab code can be ported and executed on GPUs with few modifications.

• GPU resources are accessed using the Matlab scripting language. The fast code prototyping capability of the scripting language is combined with fast code execution on the GPU.

• GPUmat can be used as a Software Development Kit to create new functions and extend the library functionality.

1.1 About GPUs


Although GPUs have traditionally been used only for computer graphics, a recent technique called GPGPU (General-Purpose computing on Graphics Processing Units) allows GPUs to perform numerical computations usually handled by the CPU. The advantage of using GPUs for general purpose computation is the performance speed-up that can be achieved thanks to the parallel architecture of these devices.

One of the most promising GPGPU technologies is the CUDA SDK [1], developed by NVIDIA. For further information about CUDA, GPGPU and related topics, see [2] [3].

1.2 System requirements


GPUmat has been tested on Windows and Linux with Matlab R2007a or newer. CUDA must be installed on the system: follow the instructions on NVIDIA’s CUDA website [2] to download and install the software.

1.3 Credits and licensing


Copyright gp-you.ch. GPUmat is distributed as freeware. By using GPUmat, you accept all the terms and conditions specified in the license.txt file in the GPUmat installation folder. Please send any suggestions, questions or bug reports to gp-you@gp-you.ch.

1.4 How to install


To install the library, unpack the downloaded package and follow these steps:
• STEP1: start Matlab and change directory to the folder where the library was unpacked.

• STEP2: start GPUmat using the GPUstart command.

• STEP3 (optional but recommended): add the library path to the Matlab path by using the "File->Set Path" menu. The Matlab documentation describes how to add a new path. This step is not mandatory if the GPUstart command is run from the directory where the library was unpacked.
The GPUstart command should generate the following output in your
Matlab command window:

>> GPUstart
Starting GPU
- CUDA compute capability x.x
...done

If you get the following error, the GPUstart command was not found on the Matlab path. Repeat the installation steps STEP1 to STEP3.

>> GPUstart
??? Undefined function or variable 'GPUstart'.

The GPU environment will not work correctly unless a CUDA compatible graphics card and the CUDA toolkit are installed on the system. Otherwise, you will probably get an error similar to the following:

Starting GPU
??? Invalid MEX-file
...
The specified module could not be found.

Error in ==> GPUstart at xx

1.5 Terminology
The following is a summary of common terms and concepts used in this
manual:

• GPU: Graphics Processing Unit, i.e. the graphics card. We assume that the GPU is compatible with NVIDIA’s CUDA SDK.

• HOST: The computer where the GPU is installed.

• CPU: The Central Processing Unit installed on the HOST.

• GPU memory: the memory available on the GPU.

• CPU memory: the memory available on the HOST.

• CUDA capable GPU: a GPU compatible with NVIDIA CUDA SDK.

1.6 Documentation overview


This manual is organized as follows:

• Quick start: describes GPUmat basic concepts by using simple examples.

• Overview: describes GPUmat high level functions.

• Developer’s section: describes low level functions and how to implement new functions in GPUmat.

The first two chapters contain enough information to understand the basic concepts of the library and are intended for users with at least some experience with Matlab. Chapter 4 is intended for users familiar with GPU programming concepts, in particular with the CUDA SDK. The function reference can be found in Chapter 6.


Chapter 2
Quick start

The most important concepts about GPUmat are the following:


• GPU variables are allocated from Matlab using the GPUsingle class, which corresponds to a single precision floating point variable. Currently GPUmat supports only single precision floating point variables; support will be extended to double precision in the future.
• A GPUsingle is allocated in GPU memory and is available from Matlab like any other Matlab variable. In this manual a GPUsingle is called a GPU variable, to distinguish it from a common Matlab variable allocated in CPU memory.
• GPUmat defines functions and operators that are called from Matlab and executed on the GPU. These functions work on data allocated in GPU memory using the GPUsingle class.
The next example creates two single precision Matlab variables Ah and A,
allocated on the CPU memory and on the GPU memory respectively. Ah is
used to initialize A.

Ah = single(rand(100,100)); % Ah is on CPU memory
A = GPUsingle(Ah);          % A is on GPU memory

In the above code the function single is used to create the single precision Matlab array Ah, and similarly the GPUsingle function is used to create a single precision GPU variable. If a double precision Matlab array is used to initialize a GPUsingle variable, it is converted to single precision, resulting in a loss of precision:

Ah = rand(100,100); % Ah is on CPU memory, double precision
A = GPUsingle(Ah);  % A is on GPU memory, single precision
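To estimate how large this loss of precision is, the same rounding can be reproduced on the CPU alone. The sketch below assumes that GPUsingle(Ah) rounds each element in the same way as single(Ah), which the text implies but does not state explicitly; As and err are illustrative names.

```matlab
% Measure the rounding error introduced by converting double to single.
% Assumption: GPUsingle(Ah) applies the same rounding as single(Ah).
Ah  = rand(100,100);                    % double precision, CPU memory
As  = single(Ah);                       % single precision copy
err = max(abs(double(As(:)) - Ah(:)));  % largest absolute rounding error
fprintf('Largest rounding error: %g (eps(''single'') = %g)\n', ...
        err, eps('single'));
```

For values in [0, 1) the error stays below eps('single'), roughly 1.2e-7, which is negligible for many applications but not for all.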

During the initialization of the GPU variable A, the data in the Matlab array
Ah is copied from the CPU memory to the GPU memory. The data transfer
is transparent to the user.
There are several ways to create a GPUsingle, as explained in Section 3.2.
The command

A = colon(0,2,6,GPUsingle) % A is on GPU memory

results in

A =
0 2 4 6

Using the colon function to create a vector with arbitrary real increments
between the elements,

A = colon(0,.1,.5,GPUsingle) % A is on GPU memory

results in

A =
0 0.1000 0.2000 0.3000 0.4000 0.5000

In the following example, the function single is used to convert the GPU variable A into the Matlab variable Ch. Every time a GPU variable is converted into a Matlab variable, the data is copied from GPU memory to CPU memory.

Ah = single(rand(100,100)); % Ah is on CPU memory
A = GPUsingle(Ah);          % Create GPU variable A
Ch = single(A);             % convert A (GPU) to Ch (CPU)
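Because the transfer only copies data, a round trip through GPU memory should return exactly the single precision values that were sent. The sketch below checks this. It assumes GPUstart has already been run, detects GPUmat with which('GPUsingle'), and falls back to plain single on the CPU when GPUmat is not on the path; the useGPU flag and the fallback are illustration devices, not part of GPUmat.

```matlab
% Round trip CPU -> GPU -> CPU; the values should be unchanged.
% Assumption: GPUstart has been run if GPUmat is on the Matlab path.
useGPU = ~isempty(which('GPUsingle'));
Ah = single(rand(100,100));     % Ah is on CPU memory
if useGPU
  A = GPUsingle(Ah);            % copy CPU -> GPU
else
  A = Ah;                       % CPU fallback, no GPU involved
end
Ch = single(A);                 % copy GPU -> CPU (no-op for the fallback)
fprintf('Round trip exact: %d\n', isequal(Ch, Ah));
```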

The following example shows:

• The creation of the GPU variable A, initialized with Matlab array Ah.

• The calculation of exp(A). The execution is on the GPU and the result is stored in the GPU variable C.

• The conversion of the result C into the Matlab variable Ch.

Ah = single(rand(100,100)); % Ah is on CPU memory
A = GPUsingle(Ah); % Create A (GPU) initialized with Ah (CPU)
C = exp(A); % exp(A) performed on GPU
Ch = single(C); % convert C (GPU) to Ch (CPU)
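When porting code, it is useful to compare the GPU result with the CPU result. An exact comparison is not appropriate, because the GPU implementation of a function such as exp may differ from the CPU one in the last bits; the sketch below uses a relative tolerance instead. The tolerance value, the useGPU detection and the CPU fallback are assumptions made for illustration.

```matlab
% Compare a GPU result with the CPU reference within a relative tolerance.
% Assumption: GPUstart has been run if GPUmat is on the Matlab path.
useGPU = ~isempty(which('GPUsingle'));
Ah = single(rand(100,100));
if useGPU
  Ch = single(exp(GPUsingle(Ah)));   % exp executed on the GPU
else
  Ch = exp(Ah);                      % CPU fallback
end
Cref   = exp(Ah);                    % CPU reference
relErr = max(abs(Ch(:) - Cref(:)) ./ abs(Cref(:)));
fprintf('Max relative error: %g\n', relErr);
```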

To display the contents of a GPUsingle, type the name of the variable in the Matlab command window:

>> A = GPUsingle(rand(5));
>> A

ans =

0.8147 0.0975 0.1576 0.1419 0.6557
0.9058 0.2785 0.9706 0.4218 0.0357
0.1270 0.5469 0.9572 0.9157 0.8491
0.9134 0.9575 0.4854 0.7922 0.9340
0.6324 0.9649 0.8003 0.9595 0.6787

Single precision REAL GPU type.

The next sections show examples of matrix addition, matrix multiplication and FFT calculation.

2.1 Matrix addition example


The following code can be found in the QuickStart.m file located in the examples folder, and shows how to port existing Matlab code and run it on the GPU. The example creates two variables A and B, adds them and stores the result in the variable C. The original Matlab code is the following:

A = single(rand(100)); % A is on CPU memory
B = single(rand(100)); % B is on CPU memory
C = A+B; % executed on CPU. C is on CPU memory

The ported GPUmat code is the following:

A = GPUsingle(rand(100)); % A is on GPU memory
B = GPUsingle(rand(100)); % B is on GPU memory
C = A+B; % executed on GPU. C is on GPU memory

Note the difference between the original code and the modified code: every Matlab variable has been converted to the GPUsingle class, so "A = single(rand(100))" becomes "A = GPUsingle(rand(100))".
Any operation on GPUsingle variables generates a GPUsingle, i.e. C (in the modified code) is also a GPUsingle. Functions involving GPUsingle variables, like A + B in the above example, are executed on the GPU. To convert the GPU variables A, B and C into the Matlab variables Ah, Bh and Ch, use the function single, as follows:

A = GPUsingle(rand(100)); % A is on GPU memory
B = GPUsingle(rand(100)); % B is on GPU memory
C = A+B; % executed on GPU. C is on GPU memory

Ah = single(A); % Ah is on HOST, A is on GPU
Bh = single(B); % Bh is on HOST, B is on GPU
Ch = single(C); % Ch is on HOST, C is on GPU
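Since GPUmat overloads Matlab operators such as + for GPUsingle, the computation itself does not need to change when porting; only the creation of the inputs does. The sketch below illustrates this with two hypothetical names: toDevice, which selects GPUsingle when GPUmat is detected on the path (assuming GPUstart has been run) and single otherwise, and addAB, the unchanged computation.

```matlab
% Same computation code for CPU and GPU inputs; only the constructor differs.
useGPU = ~isempty(which('GPUsingle'));
if useGPU
  toDevice = @(x) GPUsingle(x);   % inputs go to GPU memory
else
  toDevice = @(x) single(x);      % inputs stay in CPU memory
end
addAB = @(A, B) A + B;            % hypothetical helper, unchanged by porting

Ah = single(rand(100));
Bh = single(rand(100));
C  = addAB(toDevice(Ah), toDevice(Bh));  % runs on the GPU if useGPU is true
Ch = single(C);                          % result back on the HOST
```

Keeping the device choice in one place makes it easy to run the same script on machines with and without a CUDA capable GPU.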

The following code shows a different way to initialize the arrays A and B by
using the colon function. The original Matlab code is the following:

A = single(colon(0,1,1000)); % A is on CPU memory
B = single(colon(0,1,1000)); % B is on CPU memory
C = A+B; % executed on CPU. C is on CPU memory

The ported GPUmat code is the following:

A = colon(0,1,1000,GPUsingle); % A is on GPU memory
B = colon(0,1,1000,GPUsingle); % B is on GPU memory
C = A+B; % executed on GPU. C is on GPU memory

The Matlab expression

A = single(colon(0,1,1000));

is equivalent to

A = single([0:1:1000]);

and creates a vector of single precision elements with values from 0 to 1000.
Element-by-element operations, such as the matrix addition A + B, are highly optimized for the GPU, and their use is recommended, as explained in Section 3.8.

2.2 Matrix multiplication example


This section describes the code to perform the following tasks:

• Create A and B on the GPU memory.

• Multiply A and B and store the results in C.

• Convert the result C into the Matlab variable Ch.

A = GPUsingle(rand(100,100)); % A is on GPU memory
B = GPUsingle(rand(100,100)); % B is on GPU memory
C = A*B; % executed on GPU, C is on GPU memory
Ch = single(C); % Ch is on CPU memory

The equivalent code on the CPU is the following:

A = single(rand(100,100)); % A is on CPU memory
B = single(rand(100,100)); % B is on CPU memory
C = A*B; % executed on CPU, C is on CPU memory

2.3 FFT calculation example


This section describes the code to perform the following tasks:

• Create two arrays A and B on the GPU.

• Calculate 1D FFT of A.

• Calculate 2D FFT of B.

• Transfer the input arrays and the FFT results from the GPU into the Matlab variables Ah, Bh, FFT_Ah and FFT_Bh.

A = GPUsingle(rand(1,100)); % GPU
B = GPUsingle(rand(100,100)); % GPU

%% 1D FFT
FFT_A = fft(A); % executed on GPU

%% 2D FFT
FFT_B = fft2(B); % executed on GPU

%% Convert GPU into Matlab variables
Ah = single(A); % Ah is on HOST
Bh = single(B); % Bh is on HOST
FFT_Ah = single(FFT_A); % FFT_Ah is on HOST
FFT_Bh = single(FFT_B); % FFT_Bh is on HOST

The equivalent code that executes the above operations entirely on the CPU is the following:

A = single(rand(1,100)); % CPU
B = single(rand(100,100)); % CPU

%% 1D FFT
FFT_A = fft(A); % executed on CPU

%% 2D FFT
FFT_B = fft2(B); % executed on CPU
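A quick consistency check for ported FFT code is to apply the forward transform followed by the inverse one and compare with the input. The sketch uses ifft, which is listed among the GPUmat functions in Chapter 6; the tolerance, the useGPU detection and the CPU fallback are assumptions made for illustration.

```matlab
% FFT round trip: ifft(fft(x)) should reproduce x up to rounding errors.
% Assumption: GPUstart has been run if GPUmat is on the Matlab path.
useGPU = ~isempty(which('GPUsingle'));
Ah = single(rand(1,100));
if useGPU
  A = GPUsingle(Ah);                  % input on the GPU
else
  A = Ah;                             % CPU fallback
end
Rh = single(ifft(fft(A)));            % forward and inverse FFT, result on HOST
roundTripErr = max(abs(Rh - Ah));     % Rh may be complex; abs handles that
fprintf('FFT round trip error: %g\n', roundTripErr);
```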

2.4 Performance analysis


The easiest way to evaluate performance in Matlab is to use the tic and toc commands, as follows:

A = rand(1000,1000); % A is on CPU
B = rand(1000,1000); % B is on CPU
tic;A.*B;toc; % executed on CPU

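The performance of GPU code can be evaluated in a similar way by using tic, toc and the GPUsync command. GPUsync waits until the GPU has finished all pending operations; without it, toc could be executed before the GPU computation is complete, and the measured time would be too short. For example: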

A = GPUsingle(rand(1000,1000));
B = GPUsingle(rand(1000,1000));
tic;A.*B;GPUsync;toc;
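A single measurement like the one above can be noisy, and the first call may include one-time initialization costs. The following sketch averages over several repetitions after an untimed warm-up run. The number of repetitions, the warm-up step, the sync handle and the CPU fallback are illustrative choices, not GPUmat requirements; it assumes GPUstart has been run when GPUmat is detected.

```matlab
% Average the time of A.*B over several runs, after an untimed warm-up.
useGPU = ~isempty(which('GPUsingle'));
nRep = 10;                          % number of timed repetitions (arbitrary)
if useGPU
  A = GPUsingle(rand(1000,1000));
  B = GPUsingle(rand(1000,1000));
  sync = @() GPUsync;               % wait for the GPU to finish
else
  A = single(rand(1000,1000));
  B = single(rand(1000,1000));
  sync = @() [];                    % nothing to wait for on the CPU
end
C = A.*B; sync();                   % warm-up run, not timed
tic;
for k = 1:nRep
  C = A.*B;
end
sync();                             % make sure all GPU work has finished
tAvg = toc/nRep;
fprintf('Average time per A.*B: %g s\n', tAvg);
```

Synchronizing once after the loop, rather than after every iteration, measures the throughput of the repeated operation.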

The following example shows a simple Matlab script that compares the execution time of the element-by-element multiplication of a matrix with itself on the GPU (A.*A) and on the CPU (Ah.*Ah), for increasing matrix sizes.

N = 100:100:4000;
timecpu = zeros(1,length(N));
timegpu = zeros(1,length(N));

index = 1;
for i = N
  Ah = single(rand(i)); % CPU
  A = GPUsingle(Ah);    % GPU

  %% Execution on GPU
  tic;
  A.*A;
  GPUsync;
  timegpu(index) = toc;

  %% Execution on CPU
  tic;
  Ah.*Ah;
  timecpu(index) = toc;

  % increase index
  index = index + 1;
end

The above code calculates the two vectors timecpu and timegpu that can be
used to evaluate the speed-up between the GPU and the CPU as follows:

speedup = timecpu./timegpu
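The speed-up vector can then be summarized, for example by finding the smallest matrix size at which the GPU was faster than the CPU. The helper below, firstGPUwin, is a hypothetical name used only for illustration; its inputs are the vectors N, timecpu and timegpu computed by the script above. The crossover point depends on the GPU, the CPU and the operation being measured.

```matlab
% Smallest size in N for which the GPU was faster than the CPU
% (speed-up greater than 1). Returns an empty matrix if it never was.
firstGPUwin = @(N, tcpu, tgpu) N(find(tcpu ./ tgpu > 1, 1));
```

For example, n0 = firstGPUwin(N, timecpu, timegpu) returns that size, and plot(N, timecpu./timegpu) shows the speed-up as a function of the matrix size.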


Chapter 3
GPUmat overview

GPUmat functions are grouped into high level and low level functions. High level functions can be used in the same way as existing Matlab functions, while low level functions require some experience in GPU programming. For example, low level functions can directly manage GPU memory, which is handled automatically by high level functions. Low level functions can also directly access CUDA libraries such as CUBLAS and CUFFT. The detailed list of high level and low level functions can be found in Chapter 6.
GPUmat can be used in the following ways:
• As any other Matlab toolbox by using high level functions. This is the
easiest way to use GPUmat.
• As a GPU Software Development Kit, to integrate functions that are not available in the library, by using both low and high level functions.
This chapter describes how to use the GPUmat high level functions. Users
can find further information about low level functions in Chapter 4. The full
function reference is in Chapter 6. This chapter describes the following topics:
• Starting the GPU environment
• Creating a GPU variable
• Performing calculations on the GPU
• Converting a GPU variable into a Matlab variable
• GPUmat functions
• GPU memory management
• Compatibility between Matlab and GPUmat
• GPUmat code performance

3.1 Starting the GPU environment

Name           Description
GPUstart       Starts the GPU environment and loads the required library components
GPUinfo        Prints information about available CUDA capable GPUs
GPUdeviceInit  Initializes a CUDA capable GPU device

Table 3.1: GPU management functions.

Table 3.1 shows functions used to start GPUmat and to manage the GPU.
The GPUstart command is used to start GPUmat. The system might have
more than one GPU installed. By default GPUstart selects the first available
GPU device. The command GPUinfo prints information about installed
GPUs:

GPUinfo
Found 1 devices
* Device N. 0
Compute capability is 1.1
Total memory is 255.6875MB
Mult. processors = 2

It is possible to select a different GPU by using the GPUdeviceInit command.

3.2 Creating a GPU variable


A GPU variable is a Matlab variable that is allocated in GPU memory and is created using the Matlab class GPUsingle. The GPUsingle class is equivalent to the single precision real/complex type in Matlab. Double precision is supported by CUDA and by some GPU devices, but is not currently implemented in GPUmat.
Functions to create a GPUsingle variable are shown in Table 3.2 and explained in more detail in the next paragraphs. Note that a memory transfer between GPU and CPU is required if the GPU variable is initialized with a Matlab array. A memory transfer is a time consuming task and might reduce the performance of the code.

Function                                    Description
A = GPUsingle(Ah)                           Creates a GPU array A initialized with the Matlab array Ah. Requires GPU-CPU memory transfer.
A = zeros(size, GPUsingle)                  Creates a GPU array initialized with zeros.
A = ones(size, GPUsingle)                   Creates a GPU array initialized with ones.
A = colon(begin, spacing, end, GPUsingle)   Creates a regularly spaced GPU vector A with values in the range [begin:end].
C = vertcat(A,B) or C = [A;B]               Vertical concatenation. Can be applied to more than 2 GPU vectors.

Table 3.2: Functions used to create GPU variables.

A = GPUsingle(Ah)
Creates a GPU single precision variable A initialized with the
Matlab array Ah. A has the same properties as Ah, such as
the size and the number of elements. Requires GPU-CPU
memory transfer.
Example:

Ah = single(rand(1000)); % Ah is a Matlab variable
A = GPUsingle(Ah);       % GPU variable

If the GPU variable is initialized with a double precision Matlab array Ah,

Ah = rand(1000);   % Ah is a double precision Matlab variable
A = GPUsingle(Ah); % GPU variable

there will be a loss of precision in the conversion between double and single
precision.
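The precision loss can be measured directly. The following is a minimal sketch, assuming GPUstart has been called and that double (listed in Chapter 6) copies a GPUsingle back into a Matlab double array; the variable names are only illustrative.

```matlab
% Sketch: rounding error introduced when a double precision Matlab array
% is stored in a GPUsingle (assumes GPUstart has been called).
Ah  = rand(1000);                     % double precision, values in [0,1)
A   = GPUsingle(Ah);                  % stored on the GPU in single precision
err = max(max(abs(Ah - double(A))));  % double(A) copies A back to the CPU
fprintf('max error = %g, eps(''single'') = %g\n', err, eps('single'));
```

For values in [0,1) the single precision rounding error is at most 2^-25, so err is expected to stay below eps('single') = 2^-23.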


A = colon(begin, spacing, end, GPUsingle)
Creates a GPU single precision variable A with values between
begin and end, spaced by spacing. This command is similar
to the Matlab colon command.
Example:

A = colon(0,2,1000,GPUsingle); % A is a GPU variable

The syntax to create a Matlab variable is very similar to the above code:

Ah = colon(0,2,1000); % Ah is a CPU variable

Existing GPU variables can also be used efficiently to create new ones. The following example shows how to create a complex GPU variable using the colon function:

A = colon(0,2,6,GPUsingle); % A is a real GPU variable
B = sqrt(-1)*A;             % B is a complex GPU variable
C = 1 + B                   % adds 1 to every element: the real parts of C are 1

The previous commands result in

>> A
ans =
0 2 4 6

Single precision REAL GPU type.


>> B
ans =
0 0 + 2.0000i 0 + 4.0000i 0 + 6.0000i

Single precision COMPLEX GPU type.


>> C
ans =
1.0000 1.0000 + 2.0000i 1.0000 + 4.0000i 1.0000 + 6.0000i
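To inspect C on the CPU, it can be copied back with single. This is a sketch, assuming (as Section 3.5 suggests) that single returns a complex single precision Matlab array for a complex GPUsingle.

```matlab
% Sketch: copy C back to the CPU and inspect its real and imaginary parts.
A  = colon(0,2,6,GPUsingle);
B  = sqrt(-1)*A;
C  = 1 + B;
Ch = single(C);        % CPU copy of C (complex, single precision)
re = real(Ch)          % expected: 1 1 1 1
im = imag(Ch)          % expected: 0 2 4 6
```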

Using the function colon is a very efficient way to create a GPU variable, because the array values are created directly in GPU memory without any data transfer between CPU and GPU.

A = zeros(size, GPUsingle)
Has the same behavior as Matlab zeros function. Creates a GPU
array with zeros.
Example:

A = zeros(1,1000,GPUsingle); % A is a GPU variable

A = ones(size, GPUsingle)
Has the same behavior as Matlab ones function. Creates a GPU
array with ones.
Example:

A = ones(1,1000,GPUsingle); % A is a GPU variable

Some examples of GPU variable creation can be found in the file CreateGPUVariables.m, located in the example folder.
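The functions in Table 3.2 can be combined without any CPU-GPU transfer of the data. A minimal sketch, assuming the GPU environment has been started, that builds GPU matrices from rows created with zeros and ones and concatenated with vertcat:

```matlab
% Sketch: concatenate GPU vectors created directly on the GPU.
A = zeros(1,5,GPUsingle);  % [0 0 0 0 0]
B = ones(1,5,GPUsingle);   % [1 1 1 1 1]
C = [A; B];                % same as vertcat(A,B): a 2x5 GPU matrix
D = vertcat(A, B, A);      % more than two vectors: a 3x5 GPU matrix
size(C)                    % expected: 2 5
size(D)                    % expected: 3 5
```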

3.3 Performing calculations on the GPU


The following example explains the mechanism that allows Matlab functions
to be executed on the GPU.

A = GPUsingle(rand(10)); % A is on GPU
B = exp(A) % exp calculated on GPU

In the above code, the exp function executed by Matlab is the one implemented in GPUmat, not the built-in function. Matlab calls the GPUmat function because the argument of exp is of class GPUsingle.
The following example shows similar code executed on CPU:

A = single(rand(10)); % A is on CPU
B = exp(A)            % exp calculated on CPU

From the above example we conclude that:

• Functions involving the GPUsingle type are executed on the GPU by using GPUmat functions.
• Not every Matlab function is defined in GPUmat. Therefore, using the GPUsingle type does not run every piece of Matlab code on the GPU; only code that uses functions defined in GPUmat is executed there (the complete function reference can be found in Chapter 6).
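Before porting, a quick way to check whether a function will run on the GPU is to ask Matlab whether it is a method of the GPUsingle class. A sketch using the standard Matlab functions class and ismethod; it assumes that ismethod sees the GPUmat methods of the GPUsingle class.

```matlab
% Sketch: check which implementation Matlab will use for a GPUsingle.
A = GPUsingle(rand(10));
class(exp(A))           % 'GPUsingle': the GPUmat exp was called
ismethod(A, 'exp')      % true: exp is implemented in GPUmat
ismethod(A, 'trapz')    % false: trapz is not defined (see Section 5.2)
```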
GPUmat also implements Matlab operators such as +, - and .*, so algebraic expressions such as A + B are also defined in GPUmat and executed on the GPU. The GPUsingle operators are listed in Table 3.8. Here is an example:

A = GPUsingle(rand(100,100)); % GPU variable
B = A/5 + A.*A*2 + 1;         % run on GPU
C = A < B;                    % run on GPU

% Same operations performed on CPU
A = single(A);                % CPU variable
B = A/5 + A.*A*2 + 1;         % run on CPU
C = A < B;                    % run on CPU
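Since the same expression can run on either side, the GPU result can be checked against the CPU one. A sketch, assuming single copies the GPU result back to the CPU; differences at the level of single precision rounding are to be expected.

```matlab
% Sketch: evaluate the same expression on the GPU and on the CPU.
Ah = single(rand(100,100));        % CPU data
A  = GPUsingle(Ah);                % GPU copy of the same data
B  = A/5 + A.*A*2 + 1;             % run on GPU
Bh = Ah/5 + Ah.*Ah*2 + 1;          % run on CPU
maxDiff = max(max(abs(single(B) - Bh)))
```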

3.4 Porting existing Matlab code


To port existing Matlab code, Matlab variables have to be converted to the GPUsingle class. The easiest way to do this is to initialize a GPUsingle with the existing Matlab variable, but this is not the most efficient approach, because it involves a memory transfer between CPU and GPU. Here is an example:

Ah = [0:10:1000]; % Ah is on CPU
A = GPUsingle(Ah); % A is on GPU

The above code can be written more efficiently using the colon function, as
follows:

A = colon(0,10,1000,GPUsingle); % A is on GPU

Name      Description
a + b     Binary addition
a - b     Binary subtraction
-a        Unary minus
a.*b      Element-wise multiplication
a*b       Matrix multiplication
a./b      Right element-wise division
a.\b      Left element-wise division
a.^b      Element-wise power
a < b     Less than
a > b     Greater than
a <= b    Less than or equal to
a >= b    Greater than or equal to
a ~= b    Not equal to
a == b    Equality
a & b     Logical AND
a | b     Logical OR
~a        Logical NOT
a'        Complex conjugate transpose
a.'       Matrix transpose

Table 3.8: Operators defined for GPUsingle type

3.5 Converting a GPU variable into a Matlab variable
Although a GPUsingle variable is accessible from Matlab, its content is stored in GPU memory. Converting a GPU variable into a Matlab variable means transferring the content of the variable from GPU memory to CPU memory. The following example shows how to convert a GPU variable A into a Matlab array Ah by using the function single:

A = GPUsingle(Ah); % A is on GPU memory
Ah = single(A);    % Ah is on HOST memory

To display the content of a GPU variable in the Matlab command window, just type its name as for any other Matlab array:

A = GPUsingle(rand(5)); % A is on GPU
A

ans =

0.8147 0.0975 0.1576 0.1419 0.6557
0.9058 0.2785 0.9706 0.4218 0.0357
0.1270 0.5469 0.9572 0.9157 0.8491
0.9134 0.9575 0.4854 0.7922 0.9340
0.6324 0.9649 0.8003 0.9595 0.6787

Single precision REAL GPU type.

Every time the content of a GPUsingle is read in Matlab, the system performs a memory transfer from the GPU to the CPU. The same happens when a GPUsingle is created and initialized with a Matlab array. Because of the limited memory bandwidth between the HOST and the GPU, these transfers may be time consuming and should therefore be kept to a minimum.
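The following sketch checks that a CPU-to-GPU-to-CPU round trip of single precision data only copies the values. It assumes that GPUsingle stores the data unmodified when the input array is already single precision.

```matlab
% Sketch: CPU -> GPU -> CPU round trip of single precision data.
Ah = single(rand(5));   % CPU array, already single precision
A  = GPUsingle(Ah);     % CPU -> GPU transfer
Bh = single(A);         % GPU -> CPU transfer
isequal(Ah, Bh)         % expected: true, the values are only copied
class(Bh)               % expected: 'single'
```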

3.6 GPUmat functions


GPUmat currently implements only a subset of Matlab functions. The most important operators and numerical functions are implemented, and users with programming experience can extend the library by using the low level and high level functions that it provides and documents. Table 3.9 shows a short summary of implemented functions and operators.

Implemented functions          Example
Matlab operators               A = GPUsingle(rand(1000));
(A*B, A-B, A.*B, A+B, etc.)    B = GPUsingle(rand(1000));
                               C = A + B;

Numerical functions            A = GPUsingle(rand(1000));
(exp, sqrt, log, etc.)         B = GPUsingle(rand(1000));
                               C = exp(A);
                               D = sqrt(C) + B;

Fast Fourier Transform         RE = GPUsingle(rand(1000));
                               IM = i*GPUsingle(rand(1000));
                               C = fft(RE + IM);

Table 3.9: Some GPUmat functions.
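The FFT entry of Table 3.9 can be checked against the Matlab fft. A minimal sketch, assuming that the GPUmat fft of a row vector, like the Matlab one, transforms along the vector; a relative error at the single precision level is expected.

```matlab
% Sketch: compare the GPU FFT with the Matlab FFT of the same data.
N  = 1024;
xh = single(rand(1,N) + sqrt(-1)*rand(1,N));  % CPU complex row vector
X  = fft(GPUsingle(xh));                      % FFT on the GPU
Xh = fft(xh);                                 % FFT on the CPU
relErr = max(abs(single(X) - Xh)) / max(abs(Xh))
```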

3.7 GPU memory management


Memory is managed automatically by GPUmat. Any GPU variable is automatically destroyed following exactly the same life-cycle as any other Matlab variable. Nevertheless, GPU memory is limited, and if necessary the user can manually remove GPU variables by using the Matlab built-in command clear. Table 3.10 shows functions to manage the GPU memory.

Name     Description
clear    Matlab built-in command, removes the specified variables
GPUmem   Returns available GPU memory in bytes

Table 3.10: Functions used to manage the GPU memory

The following code shows a typical situation where the GPU memory is
not enough, and some variables must be manually removed:


A = GPUsingle(rand(6000,3000)); % A is on GPU
B = GPUsingle(rand(6000,3000)); % B is on GPU
C = GPUsingle(rand(6000,3000)); % C is on GPU
Device memory allocation error.
Available memory is 65274 KB, required 70312 KB

In the above example, it is not possible to allocate the variable C because there is not enough GPU memory left (see the error message): a 6000 x 3000 single precision array requires 6000*3000*4 = 72,000,000 bytes, i.e. about 70312 KB, while only 65274 KB are available. In this case another variable, such as A or B, must be deleted. If A and B are also needed, the GPU card does not have enough memory for all the variables. To delete a variable (for example A), use the clear command, as follows:

clear A

Check the file MemoryExample.m, located in the example folder, to understand how to use the memory management functions. The file performs the following actions:

• Displays the available GPU memory.

• Creates a GPUsingle variable in the GPU workspace and displays the available free memory.

• Clears the GPU variable and displays the available GPU memory once more.
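The steps listed above can be sketched as follows. This is a hypothetical sketch, not the content of MemoryExample.m. It assumes GPUmem returns the free memory in bytes, as stated in Table 3.10, and uses the same size arithmetic as the error message above (4 bytes per single precision element, so sz = [6000 3000] would require 72,000,000 bytes).

```matlab
% Hypothetical sketch (not the content of MemoryExample.m).
sz = [1000 1000];
required = prod(sz) * 4;               % bytes: 4 per single precision element
freeBefore = GPUmem;                   % free GPU memory in bytes
if freeBefore < required
  error('Not enough GPU memory: %d bytes free, %d required.', ...
        freeBefore, required);
end
A = zeros(sz(1), sz(2), GPUsingle);    % created directly on the GPU
freeAfterAlloc = GPUmem;
clear A                                % the destructor frees the GPU memory
freeAfterClear = GPUmem;
fprintf('Free GPU memory: %d -> %d -> %d bytes\n', ...
        freeBefore, freeAfterAlloc, freeAfterClear);
```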

The Matlab command whos is useful for checking which GPUsingle variables are in the Matlab workspace. The following output of whos shows a GPUsingle A in the Matlab workspace:

>> whos
  Name      Size           Bytes  Class      Attributes

  A         1x1000000        924  GPUsingle
  ans       1x1                4  uint32

Note that the Bytes column reports the size of the Matlab object that describes the GPU variable, not the GPU memory it occupies: the 1,000,000 single precision elements of A use about 4 MB of GPU memory.

3.8 Coding guidelines


To maximize execution performance, keep in mind the following points:

• Memory transfers. Avoid excessive memory transfers between GPU and CPU memory.

• Vectorized operations and for-loops. The best performance in both Matlab and GPUmat is achieved by using vectorized operations and avoiding for-loops. More information can be found at the following link: Matlab Code Vectorization Guide

The next sections explain these points in more detail.

3.8.1 Memory transfers


The most time-consuming task is the memory transfer to or from the GPU, for example when a GPUsingle variable is initialized with a Matlab array:

Ah = rand(1000);   % Ah is on CPU memory
A = GPUsingle(Ah); % A is on GPU memory

In the above code, the variable Ah is used to initialize the GPU variable A,
which means that data is transferred from the CPU to the GPU memory.
Vice versa, when a GPU variable is converted into a Matlab variable there is
a memory transfer from the GPU to the CPU:

A = GPUsingle(rand(1000)); % A is on GPU memory
Ah = single(A);            % Ah is on CPU memory

The fastest way to create a GPUsingle is to build it from variables already in GPU memory, or to use functions such as zeros or colon, which create the values directly on the GPU without transferring data from Matlab. See Section 3.2 for more information about creating new GPU variables with GPUmat.
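The difference can be measured with tic, toc and GPUsync (described in Section 3.9). A sketch; the matrix size is arbitrary and the timings depend on the hardware, so no particular result is assumed.

```matlab
% Sketch: transfer from the CPU versus creation directly on the GPU.
N = 2000;
tic; A = GPUsingle(zeros(N)); GPUsync; tTransfer = toc;  % CPU array copied to GPU
tic; B = zeros(N, N, GPUsingle); GPUsync; tDirect = toc; % created on the GPU
fprintf('GPUsingle(zeros(N)): %.4f s, zeros(N,N,GPUsingle): %.4f s\n', ...
        tTransfer, tDirect);
```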

3.8.2 Vectorized code and for-loops


Another way to improve the code performance is to avoid for loops by using
vectorized operations. For example:

for i=1:1e6
  A = rand(3,3);
  B = rand(3,3);
  C = A.*B;
  %% do something with C
end

The above code can be executed as-is on the GPU by converting A and B
to GPUsingle, as follows:

for i=1:1e6
  A = GPUsingle(rand(3,3));
  B = GPUsingle(rand(3,3));
  C = A.*B;
  %% do something with C
end

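However, running this loop on the GPU is inefficient, because every iteration transfers two small arrays from the CPU to the GPU and performs only a tiny computation there. The for-loop can be replaced by a single vectorized operation: create two 3 x 3e6 arrays, which hold the 1e6 pairs of 3 x 3 matrices side by side, and multiply them element by element: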

A = GPUsingle(rand(3,3e6)); % A is on GPU
B = GPUsingle(rand(3,3e6)); % B is on GPU
C = A.*B; % C is on GPU
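The two versions can be timed against each other. This sketch uses n = 1000 iterations instead of 1e6 to keep the looped version short, and GPUsync so that pending GPU work is included in the measurement; results depend on the hardware.

```matlab
% Sketch: looped versus vectorized element-wise products on the GPU,
% with n = 1000 pairs of 3x3 matrices instead of 1e6.
n = 1000;
tic;
for k = 1:n
  A = GPUsingle(rand(3,3));   % two CPU-to-GPU transfers per iteration
  B = GPUsingle(rand(3,3));
  C = A.*B;
end
GPUsync; tLoop = toc;

tic;
A = GPUsingle(rand(3,3*n));   % n 3x3 blocks stored side by side
B = GPUsingle(rand(3,3*n));   % block k occupies columns 3*k-2 to 3*k
C = A.*B;
GPUsync; tVec = toc;
fprintf('loop: %.4f s, vectorized: %.4f s\n', tLoop, tVec);
```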

The following Matlab code performs the matrix addition C = A + B using a for-loop:

A = rand(100);
B = rand(100);
C = zeros(100);
for i=1:size(A,1)
  for j=1:size(B,2)
    C(i,j) = A(i,j) + B(i,j);
  end
end

To port this code to the GPU, use element-by-element addition instead of the for-loop:

A = GPUsingle(rand(100)); % A is on GPU
B = GPUsingle(rand(100)); % B is on GPU
C = A + B; % C is on GPU


3.8.3 Matlab and GPUsingle variables


Operations and functions involving Matlab and GPUsingle variables at the
same time are not defined, except operations involving GPUsingle and Matlab
scalars. The following is an example:

Ah = rand(5);           % Ah is on CPU
A = GPUsingle(rand(5)); % A is on GPU
Bh = 1;                 % Bh is on CPU
Ah + A
Unknown operation + between 'double' and 'GPUsingle'
A + Bh
ans =

1.8147 1.0975 1.1576 1.1419 1.6557
1.9058 1.2785 1.9706 1.4218 1.0357
1.1270 1.5469 1.9572 1.9157 1.8491
1.9134 1.9575 1.4854 1.7922 1.9340
1.6324 1.9649 1.8003 1.9595 1.6787

Single precision REAL GPU type.

Adding Ah and A generates an error, whereas adding A and Bh is possible because Bh is a scalar. To add the two arrays, either convert A into a Matlab variable and add it to Ah, or convert Ah into a GPU variable and add it to A, as follows:

Ah = rand(5);
A = GPUsingle(rand(5));

Ah + single(A);      % A converted into Matlab

Ch = single(A);      % A converted into the Matlab variable Ch
Ah + Ch;             % adding Ah and Ch

D = GPUsingle(Ah);   % Ah converted into the GPUsingle D
A + D;               % adding A and D

A + GPUsingle(Ah);   % A added directly to GPUsingle(Ah)
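When the same code must accept both Matlab arrays and GPUsingle inputs, the rule above can be wrapped in a small helper. toGPU is a hypothetical function, not part of GPUmat; it relies only on the standard Matlab functions isa and isscalar and on the GPUsingle constructor.

```matlab
function X = toGPU(X)
% TOGPU  Hypothetical helper (not part of GPUmat): return X as an operand
% that can be combined with GPUsingle variables. GPUsingle inputs and
% Matlab scalars are returned unchanged; other Matlab arrays are copied
% to the GPU.
if isa(X, 'GPUsingle') || isscalar(X)
  return;
end
X = GPUsingle(X);   % requires a CPU-to-GPU memory transfer
end
```

For example, A + toGPU(Ah) works whether Ah is a Matlab array, a Matlab scalar or already a GPUsingle.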


3.9 Performance analysis


The easiest way to evaluate performance in Matlab is to use the tic and toc commands, as follows:

A = rand(1000,1000); % A is on CPU
B = rand(1000,1000); % B is on CPU
tic;A.*B;toc; % executed on CPU

The GPU code performance can be evaluated in a similar way by using tic,
toc and the GPUsync command, as follows:

A = GPUsingle(rand(1000,1000));
B = GPUsingle(rand(1000,1000));
tic;A.*B;GPUsync;toc;

The GPUsync command synchronizes Matlab with the GPU, i.e. Matlab waits until the GPU execution is completed. GPU code is executed asynchronously: control returns to Matlab as soon as the GPUmat function has been called, but this does not necessarily mean that the GPU has finished its task. To force Matlab to wait until the GPU has finished, the GPUsync command must be used. Here is an example:

A = GPUsingle(rand(1000,1000));
B = GPUsingle(rand(1000,1000));
tic;A.*B;GPUsync;toc;
Elapsed time is 0.010231 seconds.
tic;A.*B;toc;
Elapsed time is 0.003808 seconds.

Without GPUsync, toc can be reached before the GPU has finished the element-wise multiplication, so the second measurement underestimates the actual execution time. Asynchronous execution is entirely managed by GPUmat and is transparent to the user.
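For more stable measurements, the timing can be averaged over several calls. gputime below is a hypothetical helper, not part of GPUmat; it uses only tic, toc and GPUsync, and takes the operation to time as a function handle.

```matlab
function t = gputime(f, nRuns)
% GPUTIME  Hypothetical helper (not part of GPUmat): average wall-clock
% time of the function handle f over nRuns calls, synchronizing with the
% GPU so that asynchronous GPU work is included in the measurement.
if nargin < 2
  nRuns = 10;
end
f(); GPUsync;          % warm-up call, excluded from the timing
tic;
for k = 1:nRuns
  f();
end
GPUsync;               % wait for all queued GPU operations
t = toc / nRuns;
end
```

Example: t = gputime(@() A.*B, 20).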



Chapter 4
Developer’s section

This chapter explains how to use GPUmat low level functions. Low level functions can be used for the following purposes:

• To develop new GPUsingle class methods and functions.

• To access CUDA libraries (CUBLAS, CUFFT, CUDA run-time).

• To directly access GPU memory by using low level memory management functions.

4.1 The GPUsingle class


The GPUsingle class is used to create and initialize GPU variables, either with the empty constructor or from an existing Matlab variable. Here is an example:

Ah = rand(1000);          % Matlab variable
A = GPUsingle(Ah);        % GPU variable
B = GPUsingle(rand(100)); % GPU variable

The GPUsingle class implements a destructor, which frees GPU memory that is no longer used. The lifetime of a GPUsingle is the same as that of any other Matlab variable. In the following example, the second assignment to A automatically deletes the previously created variable and frees the GPU memory occupied by the 100x100 array:

A = GPUsingle(rand(100));
A = GPUsingle(rand(10));

The following example introduces some of the properties of the GPUsingle class: the low level function cublasGetVector is used to copy the content of the GPUsingle A into the Matlab variable Ah.

A = GPUsingle([1 2; 3 4]);
% Ah should be single precision, because
% A is single precision
Ah = single(zeros(1,numel(A)));
[status Ah] = cublasGetVector (numel(A), ...
  getSizeOf(A), getPtr(A), 1, Ah, 1);
cublasCheckStatus( status, ...
  'Unable to retrieve variable values from GPU.');
Ah

ans =

1 3 2 4

In the result Ah the data is stored in column-major order, the same format as Matlab and Fortran. Complex numbers are stored with real and imaginary parts interleaved in memory, as explained in section 4.3. In the above example the CUBLAS function cublasGetVector ([4]) is used to transfer the data from GPU to CPU memory. The function numel returns the number of elements in A, the function getSizeOf returns the size in bytes of a single element of A, and the function getPtr returns the pointer to the GPU memory.
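The column-major layout can be verified with a non-square matrix by comparing the raw GPU buffer with Matlab's linear ordering M(:). A sketch reusing the calls from the example above:

```matlab
% Sketch: the GPU buffer follows Matlab's linear (column-major) order.
M  = single([1 2; 3 4; 5 6]);          % 3x2 matrix
A  = GPUsingle(M);
Ah = single(zeros(1, numel(A)));
[status Ah] = cublasGetVector(numel(A), getSizeOf(A), getPtr(A), 1, Ah, 1);
cublasCheckStatus(status, 'Unable to retrieve variable values from GPU.');
Ah                                      % expected: 1 3 5 2 4 6
isequal(Ah, M(:)')                      % expected: true
```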


4.1.1 GPUsingle constructor


Constructor summary
A = GPUsingle(Ah)   Creates the GPUsingle A initialized with the existing Matlab array Ah
A = GPUsingle()     Creates an empty GPUsingle. With this constructor GPU memory is not allocated.

Constructor details
A = GPUsingle(Ah)
Creates a GPU variable A initialized with the Matlab array Ah.
A has the same properties as Ah, such as the size and the number
of elements.
Example:

Ah = rand(1000);          % Matlab variable
A = GPUsingle(Ah);        % GPU variable
B = GPUsingle(rand(100)); % GPU variable

A = GPUsingle()
Creates an empty GPU variable. GPU memory is not automatically allocated, and the following steps must be performed to allocate the memory:

• step1: initialize the size of the array by using setSize.

• step2: set the type of the GPUsingle by using setComplex or setReal, if the stored data is complex or real respectively.

• step3: use the GPUallocVector function. Note that this function should be used only after step1 and step2.

There is no memory transfer between the CPU and the GPU when using the empty constructor.

Example:

A = GPUsingle();        % empty constructor
setSize(A,[100 100]);   % set variable size
setReal(A);             % set variable as real
GPUallocVector(A);      % allocate on GPU memory

A = GPUsingle();        % empty constructor
setSize(A,[10 10]);     % set variable size
setComplex(A);          % set variable as complex
GPUallocVector(A);      % allocate on GPU memory
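An empty GPUsingle allocated this way can then be filled from the CPU with the low level function cublasSetVector (used in Section 4.4). A minimal sketch, assuming setSize accepts the output of size and that getSizeOf returns the element size in bytes, as described in this chapter.

```matlab
% Sketch: allocate an empty GPUsingle, then fill it from the CPU.
Ah = single([1 2 3 4]);
A = GPUsingle();                 % empty constructor, no memory yet
setSize(A, size(Ah));            % step 1: size
setReal(A);                      % step 2: real type
GPUallocVector(A);               % step 3: allocate GPU memory
status = cublasSetVector(numel(Ah), getSizeOf(A), Ah, 1, getPtr(A), 1);
cublasCheckStatus(status, 'Unable to copy values to the GPU.');
single(A)                        % expected: 1 2 3 4
```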


4.1.2 GPUsingle properties


Fields summary
GPUPTR Pointer to the GPU memory
COMPLEX Complex type flag
TRANS Transposed flag
SIZE Variable size
SIZEOF Datatype size (similar to sizeof in C)

Property details
GPUPTR
GPUPTR is the pointer to the GPU memory region.
The pointer is indirectly set by using GPUallocVector.
Its value can be retrieved by using the getPtr function.
Example:

N = 10;
A = GPUsingle(rand(1,N));
Isamin = cublasIsamin(N, getPtr(A), 1);

COMPLEX
COMPLEX is a flag and defines a complex GPUsingle.
Check section 4.3 for further information about complex
numbers representation. It is set using setComplex and
reset using setReal. Use iscomplex to check its value.
The flag must be set using setComplex before allocating
the variable memory using GPUallocVector. The flag
has no effect if set after calling GPUallocVector.
Example:

A = GPUsingle(rand(5));
iscomplex(A)
A = GPUsingle(rand(5)+i*rand(5));
iscomplex(A)


TRANS
TRANS is an internal flag. Use the function istrans to check whether the flag is set or not. The flag is set to 1 to identify a matrix that is virtually transposed, which means that its values are not exchanged in memory. For some operations, such as matrix-matrix multiplication with CUBLAS functions, there is no need to actually transpose the matrix in memory (which is time consuming). The high level function transpose(A) sets the flag to 1, whereas the function transpose(A,1) performs the memory operations needed to actually transpose the array elements. High level functions correctly handle a GPUsingle with TRANS set to 1, but low level functions do not. The following example shows how the values of the array A are stored in memory. Note that data is stored in column-major format.
Example:

A = GPUsingle([1 2; 3 4]);
A = transpose(A) % A = A.' is the same
Ah = single(zeros(1,numel(A)));
[status Ah]= cublasGetVector (numel(A), ...
  getSizeOf(A), getPtr(A), ...
  1, Ah, 1);
cublasCheckStatus( status, 'Memory error.');
Ah

ans =

1 3
2 4

Single precision REAL GPU type.

ans =

1 3 2 4
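To obtain a matrix whose elements are actually rearranged in memory, transpose(A,1) can be used. The following sketch assumes that transpose(A,1) returns a new GPUsingle with the TRANS flag not set; the expected buffer follows from the column-major storage of [1 3; 2 4].

```matlab
% Sketch: effective transpose in GPU memory with transpose(A,1).
A  = GPUsingle([1 2; 3 4]);
At = transpose(A, 1);            % elements rearranged in GPU memory
istrans(At)                      % expected: 0
Ah = single(zeros(1, numel(At)));
[status Ah] = cublasGetVector(numel(At), getSizeOf(At), getPtr(At), 1, Ah, 1);
cublasCheckStatus(status, 'Memory error.');
Ah                               % expected: 1 2 3 4
```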


SIZE
SIZE stores the variable size. The functions to modify
it and to get its value are setSize and size respectively.
The SIZE must be defined before using GPUallocVector.
Modifying the SIZE on initialized variables has no effect
on memory values.
Example:

A = GPUsingle();
setSize(A,[100 100]);
setReal(A);
GPUallocVector(A);
size(A)

4.1.3 GPUsingle methods


Set and Get methods summary
getPtr(A) Get GPUPTR
setSize(A,size) Set SIZE
size(A) Get SIZE
setReal(A) Set REAL type
isreal(A) Returns 1 if REAL
setComplex(A) Set COMPLEX type
iscomplex(A) Returns 1 if COMPLEX
istrans(A) Get TRANS flag


4.2 Low level GPU memory management


Memory management using high level functions is explained in section 3.7.
Memory management methods summary
GPUallocVector Allocates a variable on GPU memory.

GPU variables are managed in the following way:

• The GPUsingle class implements a destructor which takes care of freeing unused memory regions. There is no need to explicitly clean up GPU memory. If necessary, it can be done using the Matlab clear command.

• If the user creates a pointer to GPU memory using low level functions, the memory is not automatically freed when the variable is no longer used. In this case the user must manually free the GPU memory.

These concepts are explained in the next sections.

4.2.1 Memory management using the GPUsingle class


The following code shows how to allocate and delete a GPUsingle.

A = GPUsingle(rand(100,100));
clear A;

B = GPUsingle();        % creates empty GPUsingle
setReal(B);             % REAL type
setSize(B,[100 100]);   % must set GPUsingle size
GPUallocVector(B);      % allocate GPU memory
clear B;

4.2.2 Memory management using low level functions


The following code shows how to allocate a variable with 100 single precision
floating point elements by using CUBLAS functions:

% create a new pointer
GPUptr = 0;

% allocate using cublasAlloc
SIZE_OF_FLOAT = 4;
NUMEL = 100;
[status GPUptr]= cublasAlloc(NUMEL,SIZE_OF_FLOAT,GPUptr);
cublasCheckStatus( status, 'Device memory allocation error');

The function cublasFree is used to free the memory:

status = cublasFree(GPUptr);
cublasCheckStatus( status, '!!!! memory free error (GPUptr)');
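The complete life-cycle of memory allocated with low level functions can be sketched as follows. It combines cublasAlloc and cublasFree with cublasSetVector and cublasGetVector (see Sections 4.1 and 4.4); the data is illustrative.

```matlab
% Sketch: allocate, write, read back and free raw GPU memory.
% Memory allocated with cublasAlloc is NOT freed automatically.
SIZE_OF_FLOAT = 4;
NUMEL = 100;
h_x = single(1:NUMEL);                          % CPU data to copy
GPUptr = 0;
[status GPUptr] = cublasAlloc(NUMEL, SIZE_OF_FLOAT, GPUptr);
cublasCheckStatus(status, 'Device memory allocation error');
status = cublasSetVector(NUMEL, SIZE_OF_FLOAT, h_x, 1, GPUptr, 1);
cublasCheckStatus(status, 'Unable to copy values to the GPU');
h_y = single(zeros(1, NUMEL));                  % CPU buffer for the result
[status h_y] = cublasGetVector(NUMEL, SIZE_OF_FLOAT, GPUptr, 1, h_y, 1);
cublasCheckStatus(status, 'Unable to copy values from the GPU');
status = cublasFree(GPUptr);                    % manual clean up
cublasCheckStatus(status, 'Memory free error (GPUptr)');
isequal(h_x, h_y)                               % expected: true
```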

4.3 Complex numbers


A single precision complex number is represented with two floating point values, the real and imaginary part respectively. A complex vector is a sequence of complex numbers, i.e. a sequence of interleaved real and imaginary values. There are different methods to create a complex GPU variable:

• Initializing a GPUsingle with a Matlab complex array

• Using the functions unpackfC2C and packfC2C (see function reference, Chapter 6)

• Multiplying a real array by the imaginary unit

These methods are shown in the following example:

% 1) Initialize a GPUsingle with a Matlab complex array
Gh = rand(10) + sqrt(-1)*rand(10); % Matlab complex variable
G = GPUsingle(Gh);                 % GPU single complex

% 2) Use unpackfC2C to separate the values in A into
%    B and C
A = GPUsingle([1 2 3 4 5] + sqrt(-1)*[6 7 8 9 10]);
B = GPUsingle(zeros(1,5));
C = GPUsingle(zeros(1,5));
unpackfC2C(A, B, C);
single(B)
single(C)
unpackfC2R(A, B);
single(B)
single(C)

% 3) Multiply a real array by the imaginary unit
Gh = rand(10);                % Matlab real variable
G = GPUsingle(Gh)*sqrt(-1);   % sqrt(-1) gives imaginary unit
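The interleaved layout can be observed by reading the raw buffer of a complex GPUsingle as an array of 2*numel(A) floats. A sketch, assuming the buffer holds numel(A) complex values of 2 floats each (SIZE_OF_FLOAT = 4 as in Section 4.2.2).

```matlab
% Sketch: read a complex GPUsingle as interleaved real/imaginary floats.
A = GPUsingle([1+2i 3+4i 5+6i]);      % complex GPUsingle, 3 elements
n = 2*numel(A);                       % 2 floats per complex element
SIZE_OF_FLOAT = 4;
Ah = single(zeros(1, n));
[status Ah] = cublasGetVector(n, SIZE_OF_FLOAT, getPtr(A), 1, Ah, 1);
cublasCheckStatus(status, 'Unable to retrieve variable values from GPU.');
Ah                                    % expected: 1 2 3 4 5 6
```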

4.4 CUBLAS functions


The following code shows how to use low level CUBLAS functions through GPUmat wrappers. The code can be found in the file simpleCUBLAS.m, located in the examples folder CUBLAS. Make sure that the GPU environment was started using GPUstart before running the example.

function simpleCUBLAS
% This is the GPUmat translation of the code in the
% CUDA SDK projects called with the same name (simpleCUBLAS).
% The example shows how to access CUBLAS functions from GPUmat

SIZEOF_FLOAT = sizeoffloat();

%% Allocate HOST arrays and initialize with random numbers
N = 500;

h_A = single(rand(N));
h_B = single(rand(N));
h_C = single(rand(N));

%% Allocate GPU arrays
d_A = GPUsingle(h_A);
d_B = GPUsingle(h_B);
d_C = GPUsingle(h_C);

% Although d_A was already initialized with h_A values, we can
% call cublasSetVector to do that again
status = cublasSetVector(N*N, SIZEOF_FLOAT, ...
  h_A, 1, getPtr(d_A), 1);
cublasCheckStatus( status, '!!!! device access error (write A)');

% Calculate reference in Matlab
alpha = 2.0;
h_C_ref = alpha * h_A*h_B;

% Execute on GPU
cublasSgemm('n','n', N, N, N, alpha, getPtr(d_A), ...
  N, getPtr(d_B), N, 0.0, getPtr(d_C), N);
status = cublasGetError();
cublasCheckStatus( status, '!!!! kernel execution error.');

% Copy results back to HOST
h_C = single(d_C);

compareArrays(h_C_ref, h_C, 5e-6);

% Clean up GPU memory
% THERE IS NO NEED TO CLEAN UP MEMORY
% NEVERTHELESS, IF NECESSARY, ALWAYS USE
% CLEAR WITH GPUSINGLE
clear d_A
clear d_B
clear d_C

end
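cublasSgemm computes C = alpha*op(A)*op(B) + beta*C, where op(X) is X for 'n' and the transpose of X for 't' (standard BLAS semantics). The following sketch, a variation of the example above, uses op(A) = A.' and beta = 1 and checks the result against the equivalent Matlab expression.

```matlab
% Sketch: C = alpha*A.'*B + beta*C with cublasSgemm, checked on the CPU.
N = 200;
h_A = single(rand(N)); h_B = single(rand(N)); h_C = single(rand(N));
d_A = GPUsingle(h_A); d_B = GPUsingle(h_B); d_C = GPUsingle(h_C);
alpha = 2.0; beta = 1.0;
cublasSgemm('t', 'n', N, N, N, alpha, getPtr(d_A), N, ...
            getPtr(d_B), N, beta, getPtr(d_C), N);   % d_C overwritten in place
status = cublasGetError();
cublasCheckStatus(status, '!!!! kernel execution error.');
h_C_ref = alpha * h_A.' * h_B + beta * h_C;          % CPU reference
relErr = max(max(abs(single(d_C) - h_C_ref))) / max(max(abs(h_C_ref)))
```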

GPUmat defines wrappers to CUBLAS functions. The list of these functions can be found in the function reference under the category CUBLAS functions (Chapter 6). Some examples can be found in the example folder CUBLAS. In general CUBLAS wrappers have the same interface as the original CUBLAS functions. When a CUBLAS function needs a pointer to a GPU variable A, the pointer is obtained using getPtr(A). For example:


A = GPUsingle(rand(1,100));
r = cublasIsamax(numel(A),getPtr(A),1)

The original declaration of the CUBLAS function cublasIsamax is:

int
cublasIsamax (int n, const float *x, int incx)

Note the mapping between variables in the above example:

int n           -> numel(A)
const float *x  -> getPtr(A)
int incx        -> 1
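CUBLAS follows the Fortran BLAS convention, so the index returned by cublasIsamax is 1-based and can be compared directly with the index returned by the Matlab max function. A sketch:

```matlab
% Sketch: cublasIsamax returns a 1-based index, like Matlab's max.
Ah = single(rand(1,100) - 0.5);      % values with both signs
A  = GPUsingle(Ah);
r  = cublasIsamax(numel(A), getPtr(A), 1);
[m, idx] = max(abs(Ah));             % index of the largest magnitude on CPU
isequal(r, idx)                      % expected: true
```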

The following code performs complex matrix-matrix multiplication using cublasCgemm:

N = 10;
I = sqrt(-1);
A = GPUsingle(rand(N,N) + I*rand(N,N));
B = GPUsingle(rand(N,N) + I*rand(N,N));

% C needs to be complex as well, that's why we multiply by I
C = zeros(N,N,GPUsingle)*I;

% alpha is complex
alpha = 2.0+I*3.0;
beta = 0.0;

opA = 'n';
opB = 'n';

cublasCgemm(opA, opB, N, N, N, ...
  alpha, getPtr(A), N, getPtr(B), ...
  N, beta, getPtr(C), N);

status = cublasGetError();
ret = cublasCheckStatus( status, ...
  '!!!! kernel execution error.');

C_mat = alpha * single(A)*single(B);
compareArrays(C_mat, single(C), 1e-6);

The original declaration of the CUBLAS function cublasCgemm is:

void cublasCgemm (char transa, char transb, int m, int n, int k,
                  cuComplex alpha, const cuComplex *A, int lda,
                  const cuComplex *B, int ldb, cuComplex beta,
                  cuComplex *C, int ldc)

Note the mapping between variables in the above example:

char transa         -> 'n'
char transb         -> 'n'
int m               -> N
int n               -> N
int k               -> N
cuComplex alpha     -> 2.0+I*3.0
const cuComplex *A  -> getPtr(A)
int lda             -> N
const cuComplex *B  -> getPtr(B)
int ldb             -> N
cuComplex beta      -> 0.0
cuComplex *C        -> getPtr(C)
int ldc             -> N

Complex numbers are stored on the GPU with interleaved real and imaginary values (see section 4.3), which is the format expected by cublasCgemm, by the other complex CUBLAS functions and by CUFFT. For a complete description of CUBLAS functions check the CUDA CUBLAS manual. For a complete list of implemented wrappers check the function reference (Chapter 6).

4.5 CUFFT functions


The following code shows how to call low level CUFFT functions through GPUmat wrappers. The code can be found in the file simpleCUFFT.m, located in the examples folder CUFFT. Make sure that the GPU environment was started using GPUstart before testing the example.

%% CUFFT example

%% Allocate HOST arrays and initialize with random numbers
N = 512;

h_A = single(rand(1,N)+i*rand(1,N));

d_A = GPUsingle(h_A);
d_B = GPUsingle(h_A);

fftType = cufftType;
fftDir = cufftTransformDirections;

% FFT plan
plan = 0;
[status, plan] = cufftPlan1d(plan, numel(d_A), ...
  fftType.CUFFT_C2C, 1);
cufftCheckStatus(status, 'Error in cufftPlan1D');

% Run GPU FFT
[status] = cufftExecC2C(plan, getPtr(d_A), getPtr(d_B), ...
  fftDir.CUFFT_FORWARD);
cufftCheckStatus(status, 'Error in cufftExecC2C');

% Run GPU IFFT
[status] = cufftExecC2C(plan, getPtr(d_B), getPtr(d_A), ...
  fftDir.CUFFT_INVERSE);
cufftCheckStatus(status, 'Error in cufftExecC2C');

% The CUFFT inverse transform is not normalized: scale by 1/N
% to recover h_A (Matlab's ifft includes this factor)
h_B = 1/N*single(d_A);

[status] = cufftDestroy(plan);
cufftCheckStatus(status, 'Error in cufftDestroy');
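The CUFFT results can be checked against the Matlab fft. A sketch based on the example above: the forward transform is compared with fft(h_A), and the inverse transform, scaled by 1/N, with the original data.

```matlab
% Sketch: verify the CUFFT forward and inverse transforms on the CPU.
N = 512;
h_A = single(rand(1,N) + sqrt(-1)*rand(1,N));
d_A = GPUsingle(h_A);
d_B = GPUsingle(h_A);                 % output buffer, complex like d_A
fftType = cufftType;
fftDir = cufftTransformDirections;
plan = 0;
[status, plan] = cufftPlan1d(plan, N, fftType.CUFFT_C2C, 1);
cufftCheckStatus(status, 'Error in cufftPlan1d');
status = cufftExecC2C(plan, getPtr(d_A), getPtr(d_B), fftDir.CUFFT_FORWARD);
cufftCheckStatus(status, 'Error in cufftExecC2C');
h_ref = fft(h_A);                     % CPU reference
errFwd = max(abs(single(d_B) - h_ref)) / max(abs(h_ref))
status = cufftExecC2C(plan, getPtr(d_B), getPtr(d_A), fftDir.CUFFT_INVERSE);
cufftCheckStatus(status, 'Error in cufftExecC2C');
errInv = max(abs(single(d_A)/N - h_A)) / max(abs(h_A))
status = cufftDestroy(plan);
cufftCheckStatus(status, 'Error in cufftDestroy');
```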



Chapter 5
Frequently Asked Questions

5.1 What happens if GPUmat and Matlab variables are used together?
Operations and functions involving Matlab and GPUmat variables together are not supported, with the exception of Matlab scalars (either real or complex). Matlab scalars can be used together with GPUsingle without converting them to GPUsingle. In the following example we add a Matlab array Ah to a GPUsingle array A:

A = GPUsingle(rand(2)); % GPU
Ah = rand(2);           % CPU
A + Ah;
??? Error using ==> ...
Unknown operation + between 'GPUsingle' and 'double'

Performing the same operation with a Matlab scalar does not generate any
error:

A = GPUsingle(rand(2)); % GPU
Ah = 1+2*i;             % complex
A + Ah
ans =

1.3500 + 2.0000i 1.2511 + 2.0000i
1.1966 + 2.0000i 1.6160 + 2.0000i

The Matlab array Ah can be converted into a GPUsingle and added to A, as follows:

A = GPUsingle(rand(2)); % GPU
Ah = rand(2); % CPU
A + GPUsingle(Ah)
ans =

1.0230 1.1167
1.2689 1.3425

Single precision REAL GPU type.

The main concept is that CPU and GPU variables are stored in different memory regions: to operate on both, the CPU variable must first be transferred to the GPU, or the GPU variable to the CPU. Matlab scalars are an exception, but the same does not apply to GPU scalars, which cannot be added directly to Matlab variables.

5.2 Is any Matlab function executed on GPU by using GPUsingle?
Only GPUmat functions, which are a subset of the existing Matlab functions, are executed on the GPU. The complete list of functions and operators is in Chapter 6. If the user calls a function that is not defined in GPUmat, an error message is generated. For example, the function trapz is not defined, and the Matlab output is the following:

A = GPUsingle([1 2; 3 4]);
trapz(A)
??? Error using ==> ...
...

GPUmat is currently under development and new functions will be added in future versions. Users can define new functions as explained in Chapter 4, using GPUmat as a Software Development Kit (SDK).

5.3 What operations should I perform on the GPU?
The GPU is useful for computationally intensive operations, such as:

• Matrix-matrix multiplications

• Element-by-element operations such as the Matlab .* operator

• Fast Fourier Transform

These operations use the GPU efficiently only when they are performed on
large arrays; on small arrays the fixed overhead of each GPU call can
outweigh the speed-up. Coding guidelines can be found in Chapter 3.
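A minimal sketch of how to check whether an operation benefits from the GPU is to time the same single precision matrix product on the CPU and, with useGPU set to true, on the GPU. The names useGPU, nRep and opFun are illustrative assumptions, not part of GPUmat, and the GPU branch assumes GPUmat has already been started with GPUstart. GPUsync is called before toc, as in the GPUsync example, so the measured time includes the GPU work.

```matlab
% Time one operation on the CPU or on the GPU (sketch).
useGPU = false;   % set to true after GPUstart (requires GPUmat and a CUDA GPU)
N = 512;          % problem size; small sizes usually favour the CPU
nRep = 5;         % repeat the operation to average out timer noise

Ah = single(rand(N));   % CPU data in single precision, like GPUsingle
Bh = single(rand(N));

if useGPU
  A = GPUsingle(Ah);    % transfer once, outside the timed region
  B = GPUsingle(Bh);
  opFun = @() A * B;    % matrix product executed on the GPU
else
  opFun = @() Ah * Bh;  % same product executed on the CPU
end

tic;
for k = 1:nRep
  R = opFun();
end
if useGPU
  GPUsync;              % wait for all GPU operations before stopping the timer
end
tAvg = toc / nRep;
fprintf('N = %d, average time per product: %g s\n', N, tAvg);
```

Running the script once with useGPU = false and once with useGPU = true for increasing N shows where the GPU starts to pay off. The crossover size depends on the hardware.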



Chapter 6
Function Reference

6.1 Functions - by category


6.1.1 GPU startup and management

Name Description
GPUdeviceInit Initializes a CUDA capable GPU device
GPUinfo Prints information about the GPU device
GPUstart Starts the GPU environment and loads required components

6.1.2 GPU variables management

Name Description
colon Colon
double Converts a GPU single precision variable into a Matlab double precision variable
GPUsingle GPUsingle constructor
GPUsync Wait until all GPU operations are completed
ones GPU single precision ones array
setComplex Set a GPUsingle as complex
setReal Set a GPUsingle as real
setSize Set GPUsingle size
single Converts a GPU variable into a Matlab single precision variable
zeros GPU single precision zeros array


6.1.3 GPU memory management

Name Description
GPUallocVector Variable allocation on GPU memory
GPUmem Returns the free memory (bytes) on selected GPU device

6.1.4 Numerical functions

Name Description
abs Absolute value
acos Inverse cosine
acosh Inverse hyperbolic cosine
and Logical AND
asin Inverse sine
asinh Inverse hyperbolic sine
atan Inverse tangent, result in radians
atanh Inverse hyperbolic tangent
ceil Round towards plus infinity
conj CONJ(X) is the complex conjugate of X
cos Cosine of argument in radians
cosh Hyperbolic cosine
ctranspose Complex conjugate transpose
eq Equal
exp Exponential
fft Discrete Fourier transform
fft2 Two-dimensional discrete Fourier Transform
floor Round towards minus infinity
ge Greater than or equal
gt Greater than
ifft Inverse discrete Fourier transform
ifft2 Two-dimensional inverse discrete Fourier transform
ldivide Left array divide
le Less than or equal


log Natural logarithm
log10 Common (base 10) logarithm
log1p Compute log(1+z) accurately
log2 Base 2 logarithm and dissect floating point number
lt Less than
minus Minus
mrdivide Slash or right matrix divide
mtimes Matrix multiply
ne Not equal
not Logical NOT
or Logical OR
plus Plus
power Array power
rdivide Right array divide
round Round towards nearest integer
sin Sine of argument in radians
sinh Hyperbolic sine
sqrt Square root
subsref Subscripted reference
sum Sum of elements
tan Tangent of argument in radians
tanh Hyperbolic tangent
times Array multiply
transpose Transpose
uminus Unary minus
vertcat Vertical concatenation

6.1.5 General information

Name Description
display Display GPU variable
getPtr Get GPUsingle pointer on GPU memory
getSizeOf Get the size of the GPU datatype (similar to sizeof in C)
getType Get the type of the GPU variable


iscomplex True for complex array
isempty True for empty GPUsingle array
isreal True for real array
isscalar True if array is a scalar
istrans True if GPUsingle TRANS flag is set to 1
length Length of vector
ndims Number of dimensions
numel Number of elements in an array or subscripted array expression.
size Size of array
size Size of array

6.1.6 Complex numbers

Name Description
packfC2C Pack two arrays into an interleaved complex array
packfR2C Transforms a real array into a complex array with zero imaginary part
unpackfC2C Unpack one complex array into two single precision arrays
unpackfC2R Transforms a complex array into a real array, discarding the imaginary part

6.1.7 CUBLAS functions

Name Description
cublasAlloc Wrapper to CUBLAS cublasAlloc function
cublasCgemm Wrapper to CUBLAS cublasCgemm function
cublasCheckStatus Check the CUBLAS status.
cublasError Returns a structure with CUBLAS result codes
cublasFree Wrapper to CUBLAS cublasFree function


cublasGetError Wrapper to CUBLAS cublasGetError function
cublasGetVector Wrapper to CUBLAS cublasGetVector function
cublasInit Wrapper to CUBLAS cublasInit function
cublasIsamax Wrapper to CUBLAS cublasIsamax function
cublasIsamin Wrapper to CUBLAS cublasIsamin function
cublasResult Returns a structure with CUBLAS error results
cublasSasum Wrapper to CUBLAS cublasSasum function
cublasSaxpy Wrapper to CUBLAS cublasSaxpy function
cublasScopy Wrapper to CUBLAS cublasScopy function
cublasSdot Wrapper to CUBLAS cublasSdot function
cublasSetVector Wrapper to CUBLAS cublasSetVector function
cublasSgemm Wrapper to CUBLAS cublasSgemm function
cublasShutdown Wrapper to CUBLAS cublasShutdown function
cublasSnrm2 Wrapper to CUBLAS cublasSnrm2 function
cublasSrot Wrapper to CUBLAS cublasSrot function
cublasSscal Wrapper to CUBLAS cublasSscal function

6.1.8 CUDA Driver functions

Name Description
cuCheckStatus Check the CUDA DRV status.
cuInit Wrapper to CUDA driver function cuInit
cuMemGetInfo Wrapper to CUDA driver function cuMemGetInfo

6.1.9 CUFFT functions

Name Description
cufftCheckStatus Checks the CUFFT status


cufftDestroy Wrapper to CUFFT cufftDestroy function
cufftExecC2C Wrapper to CUFFT cufftExecC2C function
cufftExecC2R Wrapper to CUFFT cufftExecC2R function
cufftExecR2C Wrapper to CUFFT cufftExecR2C function
cufftPlan1d Wrapper to CUFFT cufftPlan1d function
cufftPlan2d Wrapper to CUFFT cufftPlan2d function
cufftResult Returns a structure with CUFFT result codes
cufftTransformDirections Returns a structure with CUFFT transform direction codes
cufftType Returns a structure with CUFFT transform type codes

6.1.10 CUDA run-time functions

Name Description
cudaCheckStatus Check the CUDA run-time status
cudaGetDeviceCount Wrapper to CUDA cudaGetDeviceCount function.
cudaGetDeviceMajorMinor Returns CUDA compute capability major and minor numbers.
cudaGetDeviceMemory Returns device total memory
cudaGetDeviceMultProcCount Returns device multi-processors count
cudaGetLastError Wrapper to CUDA cudaGetLastError function
cudaSetDevice Wrapper to CUDA cudaSetDevice function
cudaThreadSynchronize Wrapper to CUDA cudaThreadSynchronize function.


6.2 Operators
Operators are used in mathematical expressions such as A + B. GPUmat
overloads the Matlab operators for the GPUsingle class.

Name Description
a + b Binary addition
a - b Binary subtraction
-a Unary minus
a.*b Element-wise multiplication
a*b Matrix multiplication
a./b Right element-wise division
a.\b Left element-wise division
a.^b Element-wise power
a < b Less than
a > b Greater than
a <= b Less than or equal to
a >= b Greater than or equal to
a ~= b Not equal to
a == b Equality
a & b Logical AND
a | b Logical OR
~a Logical NOT
a' Complex conjugate transpose
a.' Transpose (non-conjugate)
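The comparison and logical operators return GPUsingle arrays whose elements are 0 or 1, which the examples in this section display with single(R). Evaluating the same expressions on ordinary Matlab arrays shows the values those examples are expected to print. This CPU reference is a sketch and does not call GPUmat; the variable names are illustrative.

```matlab
% CPU reference for the comparison and logical examples in Section 6.2.
A = [1 2 0 4];
B = [1 0 0 4];
rEq  = single(A == B);   % [1 0 1 1]
rGe  = single(A >= B);   % [1 1 1 1]
rGt  = single(A >  B);   % [0 1 0 0]
rLe  = single(A <= B);   % [1 0 1 1]
rLt  = single(A <  B);   % [0 0 0 0]
rNe  = single(A ~= B);   % [0 1 0 0]
rOr  = single(A | B);    % [1 1 0 1]
rNot = single(~A);       % [0 0 1 0]
% The AND example uses different inputs; any nonzero value counts as true.
rAnd = single([1 3 0 4] & [0 1 10 2]);   % [0 1 0 1]
```

With GPUmat running, single(A == B) for GPUsingle inputs is expected to match rEq, and likewise for the other operators.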


6.2.1 A & B
and - Logical AND

SYNTAX

R = A & B
R = and(A,B)
A - GPUsingle
B - GPUsingle
R - GPUsingle

DESCRIPTION
A & B performs a logical AND of arrays A and B and returns an
array containing elements set to either logical 1 (TRUE) or logical
0 (FALSE).

Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 3 0 4]);
B = GPUsingle([0 1 10 2]);
R = A & B;
single(R)


6.2.2 A'
ctranspose - Complex conjugate transpose

SYNTAX

R = X'
R = ctranspose(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
X' is the complex conjugate transpose of X.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10)+i*rand(10));
R = X'
R = ctranspose(X)


6.2.3 A == B
eq - Equal

SYNTAX

R = X == Y
R = eq(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
A == B (eq(A, B)) does element by element comparisons between
A and B.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A == B;
single(R)
R = eq(A, B);
single(R)


6.2.4 A >= B
ge - Greater than or equal

SYNTAX

R = X >= Y
R = ge(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
A >= B (ge(A, B)) does element by element comparisons between
A and B.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A >= B;
single(R)
R = ge(A, B);
single(R)


6.2.5 A > B
gt - Greater than

SYNTAX

R = X > Y
R = gt(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
A > B (gt(A, B)) does element by element comparisons between
A and B.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A > B;
single(R)
R = gt(A, B);
single(R)


6.2.6 A <= B
le - Less than or equal

SYNTAX

R = X <= Y
R = le(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
A <= B (le(A, B)) does element by element comparisons between
A and B.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A <= B;
single(R)
R = le(A, B);
single(R)


6.2.7 A < B
lt - Less than

SYNTAX

R = X < Y
R = lt(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
A < B (lt(A, B)) does element by element comparisons between
A and B.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A < B;
single(R)
R = lt(A, B);
single(R)


6.2.8 A - B
minus - Minus

SYNTAX

R = X - Y
R = minus(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
X - Y subtracts matrix Y from X. X and Y must have the same
dimensions unless one is a scalar. A scalar can be subtracted from
anything.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
Y = GPUsingle(rand(10));
R = Y - X


6.2.9 A / B
mrdivide - Slash or right matrix divide

SYNTAX

R = X / Y
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
Slash or right matrix divide.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
B = A / 5

MATLAB COMPATIBILITY
Only A / n, where n is a scalar, is supported.


6.2.10 A * B
mtimes - Matrix multiply

SYNTAX

R = X * Y
R = mtimes(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
X * Y (mtimes(X, Y)) is the matrix product of X and Y.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A * B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A * B


6.2.11 A ~= B
ne - Not equal

SYNTAX

R = X ~= Y
R = ne(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
A ~= B (ne(A, B)) does element by element comparisons between
A and B.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A ~= B;
single(R)
R = ne(A, B);
single(R)


6.2.12 ~A
not - Logical NOT

SYNTAX

R = ~X
X - GPUsingle
R - GPUsingle

DESCRIPTION
~A (not(A)) performs a logical NOT of input array A.

Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
R = ~A;
single(R)


6.2.13 A | B
or - Logical OR

SYNTAX

R = X | Y
R = or(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
A | B (or(A, B)) performs a logical OR of arrays A and B.

Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A | B;
single(R)
R = or(A, B);
single(R)


6.2.14 A + B
plus - Plus

SYNTAX

R = X + Y
R = plus(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
X + Y (plus(X, Y)) adds matrices X and Y. X and Y must have
the same dimensions unless one is a scalar (a 1-by-1 matrix). A
scalar can be added to anything.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A + B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A + B


6.2.15 A .^ B
power - Array power

SYNTAX

R = X .^ Y
R = power(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
Z = X.^Y denotes element-by-element powers.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
B = 2;
R = A .^ B
A = GPUsingle(rand(10)+i*rand(10));
R = A .^ B

MATLAB COMPATIBILITY
Implemented for REAL exponents only.


6.2.16 A ./ B
rdivide - Right array divide

SYNTAX

R = X ./ Y
R = rdivide(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
A./B denotes element-by-element division. A and B must have the
same dimensions unless one of them is a scalar.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A ./ B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A ./ B


6.2.17 A(I)
subsref - Subscripted reference

SYNTAX

R = X(I)
X - GPUsingle
I - GPUsingle
R - GPUsingle

DESCRIPTION
A(I) (subsref) is an array formed from the elements of A specified
by the subscript vector I. The resulting array is the same size as
I except for the special case where A and I are both vectors. In
this case, A(I) has the same number of elements as I but has the
orientation of A.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 3 4 5]);
idx = GPUsingle([1 2]);
B = A(idx)
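The size rule for A(I) is easy to misread. The description above states the same rule Matlab uses, so the sketch below shows it on ordinary Matlab arrays; the results can serve as a reference when checking single(A(idx)) on the GPU. Whether every index shape is accepted by this GPUmat version is not stated here, so treat the matrix-index case as an assumption to verify.

```matlab
% Indexing rules for A(I), shown on CPU arrays.
a = [10 20 30 40 50];   % 1x5 row vector
iCol = [1; 3];          % 2x1 column index vector
r1 = a(iCol);           % both vectors: result takes the orientation of a -> [10 30]

M = magic(4);           % 4x4 matrix; linear indices run down the columns
iMat = [1 2; 3 4];      % 2x2 index matrix
r2 = M(iMat);           % result has the size of iMat -> [16 5; 9 4]
```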


6.2.18 A .* B
times - Array multiply

SYNTAX

R = X .* Y
R = times(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
X.*Y denotes element-by-element multiplication. X and Y must
have the same dimensions unless one is a scalar. A scalar can be
multiplied into anything.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A .* B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A .* B


6.2.19 A.'
transpose - Transpose

SYNTAX

R = X.'
R = transpose(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
X.' or transpose(X) is the non-conjugate transpose.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = X.'
R = transpose(X)
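For real data, X.' and X' give the same result; they differ only for complex data. The CPU sketch below shows the difference on a small complex vector, and the same relation is expected for GPUsingle inputs after converting the results with single.

```matlab
% Difference between X.' (transpose) and X' (ctranspose) for complex data.
x = [1+2i, 3-1i];             % 1x2 complex row vector
t = x.';                      % non-conjugate transpose -> [1+2i; 3-1i]
c = x';                       % conjugate transpose     -> [1-2i; 3+1i]
isSame = isequal(c, conj(t)); % X' equals conj(X.')
```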


6.2.20 [A;B]
vertcat - Vertical concatenation

SYNTAX

R = [X;Y]
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
[A;B] is the vertical concatenation of matrices A and B. A and B
must have the same number of columns. Any number of matrices
can be concatenated within one pair of brackets.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = [zeros(10,1,GPUsingle); colon(0,1,10,GPUsingle)'];


6.3 High level functions - alphabetical list


6.3.1 abs
abs - Absolute value

SYNTAX

R = abs(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
ABS(X) is the absolute value of the elements of X. When X is complex,
ABS(X) is the complex modulus (magnitude) of the elements of X.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(1,5)+i*rand(1,5));
R = abs(X)


6.3.2 acos
acos - Inverse cosine

SYNTAX

R = acos(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
ACOS(X) is the arccosine of the elements of X. NaN (Not A Number)
results are obtained if ABS(x) > 1.0 for some element.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = acos(X)

MATLAB COMPATIBILITY
NaN is returned if ABS(x) > 1.0; in this case Matlab returns a
complex number. Not implemented for complex X.
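When porting code that relies on Matlab returning a complex value for inputs outside [-1, 1], one option is to build a CPU reference that mimics the documented GPUmat behaviour, so the two results can be compared element by element. The sketch below does this on ordinary Matlab arrays; rRef is a hypothetical name, and the same approach applies to asin.

```matlab
% CPU reference that mimics the documented GPUmat acos behaviour.
x = single([-0.5 0.2 1.5]);
rMatlab = acos(x);          % Matlab: complex result, since 1.5 > 1
rRef = real(rMatlab);       % keep the real values for in-range inputs
rRef(abs(x) > 1) = NaN;     % GPUmat is documented to return NaN here
```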


6.3.3 acosh
acosh - Inverse hyperbolic cosine

SYNTAX

R = acosh(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
ACOSH(X) is the inverse hyperbolic cosine of the elements of X.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10)) + 1;
R = acosh(X)

MATLAB COMPATIBILITY
NaN is returned if X < 1.0. Not implemented for complex X.


6.3.4 and
and - Logical AND

SYNTAX

R = A & B
R = and(A,B)
A - GPUsingle
B - GPUsingle
R - GPUsingle

DESCRIPTION
A & B performs a logical AND of arrays A and B and returns an
array containing elements set to either logical 1 (TRUE) or logical
0 (FALSE).

Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 3 0 4]);
B = GPUsingle([0 1 10 2]);
R = A & B;
single(R)


6.3.5 asin
asin - Inverse sine

SYNTAX

R = asin(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
ASIN(X) is the arcsine of the elements of X. NaN (Not A Number)
results are obtained if ABS(x) > 1.0 for some element.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = asin(X)

MATLAB COMPATIBILITY
NaN is returned if ABS(x) > 1.0; in this case Matlab returns a
complex number. Not implemented for complex X.


6.3.6 asinh
asinh - Inverse hyperbolic sine

SYNTAX

R = asinh(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
ASINH(X) is the inverse hyperbolic sine of the elements of X.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = asinh(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.7 atan
atan - Inverse tangent, result in radians

SYNTAX

R = atan(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
ATAN(X) is the arctangent of the elements of X.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = atan(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.8 atanh
atanh - Inverse hyperbolic tangent

SYNTAX

R = atanh(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
ATANH(X) is the inverse hyperbolic tangent of the elements of X.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = atanh(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.9 ceil
ceil - Round towards plus infinity

SYNTAX

R = ceil(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
CEIL(X) rounds the elements of X to the nearest integers towards
plus infinity.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = ceil(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.10 colon
colon - Colon

SYNTAX

R = colon(J,K,GPUsingle)
R = colon(J,D,K,GPUsingle)

DESCRIPTION
COLON(J,K,GPUsingle) is the same as J:K and
COLON(J,D,K,GPUsingle) is the same as J:D:K. J:K is the
same as [J, J+1, ..., K]. J:K is empty if J > K. J:D:K is the
same as [J, J+D, ..., J+m*D] where m = fix((K-J)/D). J:D:K
is empty if D == 0, if D > 0 and J > K, or if D < 0 and J < K.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = colon(1,2,10,GPUsingle)
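As a worked instance of the J:D:K rule, the example above uses J = 1, D = 2, K = 10, so m = fix((10-1)/2) = 4 and the result is [1 3 5 7 9]. The CPU sketch below computes the same values from the formula and checks them against Matlab's own colon; the GPU comparison is left as a comment because it needs GPUmat.

```matlab
% Worked example of the J:D:K rule used by colon(J,D,K,GPUsingle).
J = 1; D = 2; K = 10;
m = fix((K - J) / D);        % fix(9/2) = 4 steps after J
expected = J + (0:m) * D;    % [J, J+D, ..., J+m*D] = [1 3 5 7 9]
cpu = J:D:K;                 % Matlab's colon gives the same values
% With GPUmat running, single(colon(J, D, K, GPUsingle)) is expected to match.
```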


6.3.11 conj
conj - CONJ(X) is the complex conjugate of X

SYNTAX

R = conj(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
For a complex X, CONJ(X) = REAL(X) - i*IMAG(X).

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(1,5) + i*rand(1,5));
B = conj(A)


6.3.12 cos
cos - Cosine of argument in radians

SYNTAX

R = cos(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
COS(X) is the cosine of the elements of X.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = cos(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.13 cosh
cosh - Hyperbolic cosine

SYNTAX

R = cosh(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
COSH(X) is the hyperbolic cosine of the elements of X.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = cosh(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.14 ctranspose
ctranspose - Complex conjugate transpose

SYNTAX

R = X'
R = ctranspose(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
X' is the complex conjugate transpose of X.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10)+i*rand(10));
R = X'
R = ctranspose(X)


6.3.15 display
display - Display GPU variable

SYNTAX

display(X)
X - GPUsingle

DESCRIPTION
Prints information about a GPUsingle variable. DISPLAY(X) is called for
the object X when the semicolon is not used to terminate a statement.

EXAMPLE

A = GPUsingle(rand(10));
display(A)
A


6.3.16 double
double - Converts a GPU single precision variable into a Matlab double
precision variable

SYNTAX

R = double(A)
A - GPUsingle variable
R - double precision Matlab variable

DESCRIPTION
R = DOUBLE(A) copies the content of the GPU single precision
variable A into a double precision Matlab array. No precision is lost
in the conversion, but the values keep single precision accuracy.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(100));
Ah = double(A);


6.3.17 eq
eq - Equal

SYNTAX

R = X == Y
R = eq(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
A == B (eq(A, B)) does element by element comparisons between
A and B.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A == B;
single(R)
R = eq(A, B);
single(R)


6.3.18 exp
exp - Exponential

SYNTAX

R = exp(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
EXP(X) is the exponential of the elements of X, e to the X. For
complex Z=X+i*Y, EXP(Z) = EXP(X)*(COS(Y)+i*SIN(Y)).

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(1,5)+i*rand(1,5));
R = exp(X)


6.3.19 fft
fft - Discrete Fourier transform

SYNTAX

R = fft(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
FFT(X) is the discrete Fourier transform (DFT) of vector X.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(1,5)+i*rand(1,5));
R = fft(X)
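Matlab defines the DFT of a length-N vector as X(k) = sum over n = 1..N of x(n)*exp(-2*pi*i*(k-1)*(n-1)/N). The CPU sketch below evaluates that sum directly with an N-by-N matrix and checks it against Matlab's fft. A GPU result, single(fft(GPUsingle(x))), could be compared with the same reference using a looser tolerance suited to single precision; that comparison is not shown because it needs GPUmat.

```matlab
% Direct evaluation of the DFT sum, checked against fft (CPU sketch).
x = rand(1,5) + 1i*rand(1,5);
N = numel(x);
n = 0:N-1;                      % time index (zero-based)
k = n.';                        % frequency index as a column
W = exp(-2i*pi*k*n/N);          % N-by-N DFT matrix
Xdft = (W * x.').';             % x.' is a non-conjugate transpose to a column
err = max(abs(Xdft - fft(x)));  % expected at round-off level
```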


6.3.20 fft2
fft2 - Two-dimensional discrete Fourier Transform

SYNTAX

R = fft2(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
FFT2(X) returns the two-dimensional Fourier transform of matrix
X.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(5,5)+i*rand(5,5));
R = fft2(X)


6.3.21 floor
floor - Round towards minus infinity

SYNTAX

R = floor(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
FLOOR(X) rounds the elements of X to the nearest integers towards
minus infinity.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(1,5));
R = floor(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.22 ge
ge - Greater than or equal

SYNTAX

R = X >= Y
R = ge(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
A >= B (ge(A, B)) does element by element comparisons between
A and B.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A >= B;
single(R)
R = ge(A, B);
single(R)


6.3.23 GPUinfo
GPUinfo - Prints information about the GPU device

SYNTAX

GPUinfo
GPUinfo(N)

DESCRIPTION
GPUinfo displays information about each CUDA capable device
installed on the system, including total memory and number of
processors. GPUinfo(N) displays information only about the device
with index N.

EXAMPLE

GPUinfo(0)

6.3.24 GPUmem
GPUmem - Returns the free memory (bytes) on selected GPU
device

SYNTAX

GPUmem

DESCRIPTION
Returns the free memory (bytes) on selected GPU device.

EXAMPLE

GPUmem
GPUmem/1024/1024
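In the example above, GPUmem/1024/1024 converts bytes into megabytes. Before creating a large GPUsingle, it can help to estimate the memory it needs and compare it with GPUmem. The sketch assumes 4 bytes per real single precision element and 8 bytes per complex element (real and imaginary parts stored interleaved, as the packfC2C description suggests). It ignores any extra memory GPUmat may allocate internally, so treat the estimate as a lower bound.

```matlab
% Estimate the GPU memory needed by an N-by-N single precision array.
N = 4096;                              % example size
bytesPerReal = 4;                      % one single precision value
bytesPerComplex = 2 * bytesPerReal;    % real and imaginary parts
needReal = N * N * bytesPerReal;       % 67108864 bytes (64 MB)
needComplex = N * N * bytesPerComplex; % 134217728 bytes (128 MB)
fprintf('real: %g MB, complex: %g MB\n', needReal/1024/1024, needComplex/1024/1024);
% With GPUmat started, compare against the free memory reported by GPUmem:
% if needComplex > GPUmem, warning('The array may not fit on the GPU.'); end
```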


6.3.25 GPUsingle
GPUsingle - GPUsingle constructor

SYNTAX

R = GPUsingle()
R = GPUsingle(A)
A - Either a GPUsingle or a Matlab array
R - GPUsingle variable

DESCRIPTION
GPUsingle creates a Matlab variable whose data is allocated in GPU
memory. Operations on GPUsingle objects are executed on the GPU.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

GPUsingle(rand(100,100))
Ah = rand(100);
A = GPUsingle(Ah);
Bh = rand(100) + i*rand(100);
B = GPUsingle(Bh);


6.3.26 GPUstart
GPUstart - Starts the GPU environment and loads required components

SYNTAX

GPUstart

DESCRIPTION
Starts the GPU environment and loads the required components.

EXAMPLE

GPUstart

6.3.27 GPUsync
GPUsync - Wait until all GPU operations are completed

SYNTAX

GPUsync

DESCRIPTION
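Waits until all previously issued GPU operations are completed. Call
GPUsync before reading a timer, as in the example, so that the
measured time includes the GPU work.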

EXAMPLE

A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
tic;A + B;GPUsync;toc;


6.3.28 gt
gt - Greater than

SYNTAX

R = X > Y
R = gt(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
A > B (gt(A, B)) does element by element comparisons between
A and B.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A > B;
single(R)
R = gt(A, B);
single(R)


6.3.29 ifft
ifft - Inverse discrete Fourier transform

SYNTAX

R = ifft(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
IFFT(X) is the inverse discrete Fourier transform of X.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(1,5)+i*rand(1,5));
R = fft(X);
X = ifft(R);


6.3.30 ifft2
ifft2 - Two-dimensional inverse discrete Fourier transform

SYNTAX

R = ifft2(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
IFFT2(F) returns the two-dimensional inverse Fourier transform of
matrix F.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(5,5)+i*rand(5,5));
R = fft2(X);
X = ifft2(R);


6.3.31 iscomplex
iscomplex - True for complex array

SYNTAX

R = iscomplex(X)
X - GPUsingle
R - logical (0 or 1)

DESCRIPTION
ISCOMPLEX(X) returns 1 if X has an imaginary part and 0
otherwise.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(5));
iscomplex(A)
A = GPUsingle(rand(5)+i*rand(5));
iscomplex(A)


6.3.32 isempty
isempty - True for empty GPUsingle array

SYNTAX

R = isempty(X)
X - GPUsingle
R - logical (0 or 1)

DESCRIPTION
ISEMPTY(X) returns 1 if X is an empty GPUsingle array and 0
otherwise. An empty GPUsingle array has no elements, that is
prod(size(X))==0.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle();
isempty(A)
A = GPUsingle(rand(5)+i*rand(5));
isempty(A)


6.3.33 isreal
isreal - True for real array

SYNTAX

R = isreal(X)
X - GPUsingle
R - logical (0 or 1)

DESCRIPTION
ISREAL(X) returns 1 if X does not have an imaginary part and 0
otherwise.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(5));
isreal(A)
A = GPUsingle(rand(5)+i*rand(5));
isreal(A)


6.3.34 isscalar
isscalar - True if array is a scalar

SYNTAX

R = isscalar(X)
X - GPUsingle
R - logical (0 or 1)

DESCRIPTION
ISSCALAR(X) returns 1 if X is a 1x1 matrix and 0 otherwise.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(5));
isscalar(A)
A = GPUsingle(1);
isscalar(A)


6.3.35 ldivide
ldivide - Left array divide

SYNTAX

R = X .\ Y
R = ldivide(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
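A.\B denotes element-by-element left division: each element of B
is divided by the corresponding element of A, so A.\B is the same
as B./A. A and B must have the same dimensions unless one of them
is a scalar, in which case the scalar is combined with every
element of the other array.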

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A .\ B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A .\ B
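The Python sketch below makes the direction of the two element-wise divisions explicit. It models ldivide and rdivide on plain lists, including the scalar case. The lists stand in for GPUsingle arrays and the helper names are hypothetical. Only the arithmetic is meant to match the functions documented here.

```python
def _is_scalar(v):
    return isinstance(v, (int, float, complex))

def _expand(a, b):
    """Pair up operands: equal-length lists, or one scalar used for every element."""
    if _is_scalar(a) and _is_scalar(b):
        return [a], [b]
    if _is_scalar(a):
        a = [a] * len(b)
    elif _is_scalar(b):
        b = [b] * len(a)
    if len(a) != len(b):
        raise ValueError("dimensions must agree unless one operand is a scalar")
    return a, b

def rdivide(a, b):
    """a ./ b: each element of a divided by the matching element of b."""
    a, b = _expand(a, b)
    return [x / y for x, y in zip(a, b)]

def ldivide(a, b):
    """a .\\ b: each element of b divided by the matching element of a."""
    a, b = _expand(a, b)
    return [y / x for x, y in zip(a, b)]

A = [1.0, 2.0, 4.0]
B = [2.0, 8.0, 2.0]
print(ldivide(A, B))    # [2.0, 4.0, 0.5]
print(rdivide(B, A))    # [2.0, 4.0, 0.5], same as ldivide(A, B)
print(ldivide(2.0, B))  # [1.0, 4.0, 1.0]
```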


6.3.36 le
le - Less than or equal

SYNTAX

R = X <= Y
R = le(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
A <= B (le(A, B)) does element by element comparisons between
A and B.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A <= B;
single(R)
R = le(A, B);
single(R)


6.3.37 length
length - Length of vector

SYNTAX

R = length(X)
X - GPUsingle

DESCRIPTION
LENGTH(X) returns the length of vector X. It is equivalent to
MAX(SIZE(X)) for non-empty arrays and 0 for empty ones.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(5));
length(A)


6.3.38 log
log - Natural logarithm

SYNTAX

R = log(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
LOG(X) is the natural logarithm of the elements of X. NaN results
are produced for negative elements of X, because complex results
are not supported.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = log(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.39 log10
log10 - Common (base 10) logarithm

SYNTAX

R = log10(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
LOG10(X) is the base 10 logarithm of the elements of X. NaN results
are produced for negative elements of X, because complex results
are not supported.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = log10(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.40 log1p
log1p - Compute log(1+z) accurately

SYNTAX

R = log1p(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
LOG1P(X) computes log(1+X) accurately, also for small values of X
where forming 1+X would lose precision. Only real values are accepted.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = log1p(X)
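Python's standard math module shows why log1p exists. It works in double precision. When x is much smaller than machine epsilon, 1+x rounds to exactly 1, so log(1+x) returns 0, while log1p keeps the information. In single precision (the GPUsingle type, machine epsilon about 1.2e-7) the same loss of accuracy already appears at much larger values of x.

```python
import math

x = 1e-17
naive = math.log(1.0 + x)  # 1.0 + 1e-17 rounds to 1.0 in double precision
accurate = math.log1p(x)   # evaluates log(1+x) without forming 1+x
print(naive)     # 0.0
print(accurate)  # approximately 1e-17
```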

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.41 log2
log2 - Base 2 logarithm and dissect floating point number

SYNTAX

R = log2(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
Y = LOG2(X) is the base 2 logarithm of the elements of X.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = log2(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.42 lt
lt - Less than

SYNTAX

R = X < Y
R = lt(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
A < B (lt(A, B)) does element by element comparisons between
A and B.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A < B;
single(R)
R = lt(A, B);
single(R)


6.3.43 minus
minus - Minus

SYNTAX

R = X - Y
R = minus(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
X - Y subtracts matrix Y from X. X and Y must have the same
dimensions unless one is a scalar. A scalar can be subtracted from
anything.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
Y = GPUsingle(rand(10));
R = Y - X


6.3.44 mrdivide
mrdivide - Slash or right matrix divide

SYNTAX

R = X / Y
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
Slash or right matrix divide.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
B = A / 5

MATLAB COMPATIBILITY
Only A / n, where n is a scalar, is supported.


6.3.45 mtimes
mtimes - Matrix multiply

SYNTAX

R = X * Y
R = mtimes(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
* (mtimes(X, Y)) is the matrix product of X and Y.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A * B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A * B


6.3.46 ndims
ndims - Number of dimensions

SYNTAX

R = ndims(X)
X - GPUsingle

DESCRIPTION
N = NDIMS(X) returns the number of dimensions in the array X.
The number of dimensions in an array is always greater than or
equal to 2. Trailing singleton dimensions are ignored. Put simply,
it is LENGTH(SIZE(X)).

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
ndims(X)


6.3.47 ne
ne - Not equal

SYNTAX

R = X ~= Y
R = ne(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
A ~= B (ne(A, B)) does element by element comparisons between
A and B.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A ~= B;
single(R)
R = ne(A, B);
single(R)


6.3.48 not
not - Logical NOT

SYNTAX

R = ~X
X - GPUsingle
R - GPUsingle

DESCRIPTION
~A (not(A)) performs a logical NOT of input array A.

Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
R = ~A;
single(R)


6.3.49 numel
numel - Number of elements in an array or subscripted array expression.

SYNTAX

R = numel(X)
X - GPUsingle
R - number of elements

DESCRIPTION
N = NUMEL(A) returns the number of elements N in array A.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
numel(X)


6.3.50 ones
ones - GPU single precision ones array

SYNTAX

ones(N,GPUsingle)
ones(M,N,GPUsingle)
ones([M,N],GPUsingle)
ones(M,N,P,...,GPUsingle)
ones([M N P ...],GPUsingle)

DESCRIPTION
ones(N,GPUsingle) is an N-by-N GPU matrix of single precision
ones.
ones(M,N,GPUsingle) or ones([M,N],GPUsingle) is an M-by-N
GPU matrix of single precision ones.
ones(M,N,P,...,GPUsingle) or ones([M N P ...],GPUsingle)
is an M-by-N-by-P-by-... GPU array of single precision ones.

Real type supported since version 0.1

EXAMPLE

A = ones(10,GPUsingle)
B = ones(10, 10,GPUsingle)
C = ones([10 10],GPUsingle)


6.3.51 or
or - Logical OR

SYNTAX

R = X | Y
R = or(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
A | B (or(A, B)) performs a logical OR of arrays A and B.

Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A | B;
single(R)
R = or(A, B);
single(R)


6.3.52 plus
plus - Plus

SYNTAX

R = X + Y
R = plus(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
X + Y (plus(X, Y)) adds matrices X and Y. X and Y must have
the same dimensions unless one is a scalar (a 1-by-1 matrix). A
scalar can be added to anything.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A + B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A + B


6.3.53 power
power - Array power

SYNTAX

R = X .^ Y
R = power(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
Z = X.^Y denotes element-by-element powers.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
B = 2;
R = A .^ B
A = GPUsingle(rand(10)+i*rand(10));
R = A .^ B

MATLAB COMPATIBILITY
Implemented for REAL exponents only.


6.3.54 rdivide
rdivide - Right array divide

SYNTAX

R = X ./ Y
R = rdivide(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
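A./B denotes element-by-element division: each element of A is
divided by the corresponding element of B. A and B must have the
same dimensions unless one of them is a scalar, in which case the
scalar is combined with every element of the other array.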

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A ./ B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A ./ B


6.3.55 round
round - Round towards nearest integer

SYNTAX

R = round(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
ROUND(X) rounds the elements of X to the nearest integers.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = round(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.56 sin
sin - Sine of argument in radians

SYNTAX

R = sin(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
SIN(X) is the sine of the elements of X.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = sin(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.57 single
single - Converts a GPU variable into a Matlab single precision
variable

SYNTAX

R = single(X)
X - GPUsingle
R - Matlab variable

DESCRIPTION
B = SINGLE(A) copies the contents of the GPU variable A into the
single precision Matlab array B.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(100))
Ah = single(A);


6.3.58 sinh
sinh - Hyperbolic sine

SYNTAX

R = sinh(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
SINH(X) is the hyperbolic sine of the elements of X.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = sinh(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.59 size
size - Size of array

SYNTAX

R = size(X)
X - GPUsingle

DESCRIPTION
D = SIZE(X), for M-by-N matrix X, returns the two-element row
vector D = [M,N] containing the number of rows and columns in
the matrix.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
size(X)


6.3.60 sqrt
sqrt - Square root

SYNTAX

R = sqrt(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
SQRT(X) is the square root of the elements of X. NaN results are
produced for negative elements of X.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = sqrt(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.61 subsref
subsref - Subscripted reference

SYNTAX

R = X(I)
X - GPUsingle
I - GPUsingle
R - GPUsingle

DESCRIPTION
A(I) (subsref) is an array formed from the elements of A specified
by the subscript vector I. The resulting array is the same size as
I except for the special case where A and I are both vectors. In
this case, A(I) has the same number of elements as I but has the
orientation of A.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle([1 2 3 4 5]);
idx = GPUsingle([1 2]);
B = A(idx)
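The shape rule described above can be sketched in Python as follows. Values are given in linear (column-major) order and indices are 1-based, as in Matlab. The function names are hypothetical. The sketch treats a 1x1 array as a scalar rather than as a vector, which is an assumption about that corner case.

```python
def is_vector(shape):
    """Row or column vector: 2-D with one dimension equal to 1 (1x1 excluded)."""
    shape = tuple(shape)
    return len(shape) == 2 and 1 in shape and shape != (1, 1)

def subsref(a_values, a_shape, idx_values, idx_shape):
    """Return (values, shape) of A(I) for 1-based linear indices I."""
    values = [a_values[k - 1] for k in idx_values]
    if is_vector(a_shape) and is_vector(idx_shape):
        # Special case: the result keeps the orientation of A
        shape = (1, len(values)) if a_shape[0] == 1 else (len(values), 1)
    else:
        shape = tuple(idx_shape)  # general case: same size as I
    return values, shape

A = [1, 2, 3, 4, 5]                        # 1-by-5 row vector
print(subsref(A, (1, 5), [1, 2], (2, 1)))  # ([1, 2], (1, 2))
```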


6.3.62 sum
sum - Sum of elements

SYNTAX

R = sum(X)
R = sum(X, DIM)
X - GPUsingle
DIM - integer
R - GPUsingle

DESCRIPTION
S = SUM(X) is the sum of the elements of the vector X. For a matrix
X, SUM(X) sums along the first dimension and returns a row vector
with the sum of each column. S = SUM(X,DIM) sums along the
dimension DIM.
Note: in the current version, sum(X,DIM) with DIM>1 is about 3 to
4 times faster than sum(X,DIM) with DIM=1.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(5,5)+i*rand(5,5));
R = sum(X);
E = sum(X,2);
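Matlab stores arrays column by column, and the CUBLAS examples later in this chapter rely on GPUsingle using the same layout. The Python sketch below shows what sum(X,1) and sum(X,2) compute on such a column-major buffer for a 2-D matrix. It illustrates the semantics only and says nothing about how GPUmat implements the reduction. The function name sum_dim is hypothetical.

```python
def sum_dim(data, rows, cols, dim):
    """Sum a rows-by-cols matrix stored column-major in data along dim 1 or 2."""
    if dim == 1:  # one sum per column -> 1-by-cols row vector
        return [sum(data[c * rows + r] for r in range(rows)) for c in range(cols)]
    if dim == 2:  # one sum per row -> rows-by-1 column vector
        return [sum(data[c * rows + r] for c in range(cols)) for r in range(rows)]
    raise ValueError("dim must be 1 or 2 in this sketch")

# The 2-by-3 matrix [1 2 3; 4 5 6] stored column by column
data = [1, 4, 2, 5, 3, 6]
print(sum_dim(data, 2, 3, 1))  # [5, 7, 9]
print(sum_dim(data, 2, 3, 2))  # [6, 15]
```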


6.3.63 tan
tan - Tangent of argument in radians

SYNTAX

R = tan(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
TAN(X) is the tangent of the elements of X.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = tan(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.64 tanh
tanh - Hyperbolic tangent

SYNTAX

R = tanh(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
TANH(X) is the hyperbolic tangent of the elements of X.

Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = tanh(X)

MATLAB COMPATIBILITY
Not implemented for complex X.


6.3.65 times
times - Array multiply

SYNTAX

R = X .* Y
R = times(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
X.*Y denotes element-by-element multiplication. X and Y must
have the same dimensions unless one is a scalar. A scalar can be
multiplied into anything.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A .* B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A .* B


6.3.66 transpose
transpose - Transpose

SYNTAX

R = X.'
R = transpose(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
X.' or transpose(X) is the non-conjugate transpose.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = X.'
R = transpose(X)
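The non-conjugate transpose X.' and Matlab's conjugate transpose X' differ only for complex data. This Python sketch on nested lists shows both. The helper names are hypothetical.

```python
def transpose(m):
    """Non-conjugate transpose (X.'): swap rows and columns only."""
    return [list(row) for row in zip(*m)]

def ctranspose(m):
    """Conjugate transpose (X'): swap rows and columns and conjugate."""
    return [[v.conjugate() for v in row] for row in zip(*m)]

X = [[1 + 2j, 3 - 1j],
     [0 + 1j, 4 + 0j]]
print(transpose(X))   # rows and columns swapped, imaginary parts unchanged
print(ctranspose(X))  # rows and columns swapped, imaginary parts negated
```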


6.3.67 uminus
uminus - Unary minus

SYNTAX

R = -X
R = uminus(X)
X - GPUsingle
R - GPUsingle

DESCRIPTION
-A negates the elements of A.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

X = GPUsingle(rand(10));
R = -X
R = uminus(X)


6.3.68 vertcat
vertcat - Vertical concatenation

SYNTAX

R = [X;Y]
X - GPUsingle
Y - GPUsingle
R - GPUsingle

DESCRIPTION
[A;B] is the vertical concatenation of matrices A and B. A and B
must have the same number of columns. Any number of matrices
can be concatenated within one pair of brackets.

Complex type supported since version 0.1


Real type supported since version 0.1

EXAMPLE

A = [zeros(10,1,GPUsingle);colon(0,1,10,GPUsingle)'];


6.3.69 zeros
zeros - GPU single precision zeros array

SYNTAX

zeros(N,GPUsingle)
zeros(M,N,GPUsingle)
zeros([M,N],GPUsingle)
zeros(M,N,P,...,GPUsingle)
zeros([M N P ...],GPUsingle)

DESCRIPTION
zeros(N,GPUsingle) is an N-by-N GPU matrix of single precision
zeros.
zeros(M,N,GPUsingle) or zeros([M,N],GPUsingle) is an M-by-N
GPU matrix of single precision zeros.
zeros(M,N,P,...,GPUsingle) or zeros([M N P ...],GPUsingle)
is an M-by-N-by-P-by-... GPU array of single precision zeros.

Real type supported since version 0.1

EXAMPLE

A = zeros(10,GPUsingle)
B = zeros(10, 10,GPUsingle)
C = zeros([10 10],GPUsingle)


6.4 Low level functions - alphabetical list


6.4.1 cublasAlloc
cublasAlloc - Wrapper to CUBLAS cublasAlloc function

SYNTAX

[status d_A] = cublasAlloc(N,SIZE,d_A);


N - number of elements to allocate
SIZE - size in bytes of each element
d_A - (input) variable that receives the GPU pointer
status - CUBLAS status
d_A - (output) pointer to the allocated GPU memory

DESCRIPTION
Wrapper to CUBLAS cublasAlloc function.
Original function declaration:

cublasStatus
cublasAlloc (int n, int elemSize, void **devicePtr)

Mapping:

[status d_A] = cublasAlloc(N, SIZE, d_A)


N -> int n
SIZE -> int elemSize
d_A -> void **devicePtr

status -> cublasStatus

EXAMPLE

N = 10;
SIZEOF_FLOAT = sizeoffloat();
% GPU variable d_A
d_A = 0;
[status d_A] = cublasAlloc(N,SIZEOF_FLOAT,d_A);
ret = cublasCheckStatus( status, ...
'!!!! device memory allocation error (d_A)');


6.4.2 cublasCgemm
cublasCgemm - Wrapper to CUBLAS cublasCgemm function

DESCRIPTION
Wrapper to CUBLAS cublasCgemm function. Original function
declaration:

void cublasCgemm
(char transa, char transb, int m, int n, int k,
cuComplex alpha, const cuComplex *A, int lda,
const cuComplex *B, int ldb, cuComplex beta,
cuComplex *C, int ldc)

EXAMPLE

N = 10;
I = sqrt(-1);
A = GPUsingle(rand(N,N) + I*rand(N,N));
B = GPUsingle(rand(N,N) + I*rand(N,N));
% C needs to be complex as well
C = zeros(N,N,GPUsingle)*I;

% alpha is complex
alpha = 2.0+I*3.0;
beta = 0.0;

opA = 'n';
opB = 'n';

cublasCgemm(opA, opB, N, N, N, ...
alpha, getPtr(A), N, getPtr(B), ...
N, beta, getPtr(C), N);

status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');


6.4.3 cublasCheckStatus
cublasCheckStatus - Check the CUBLAS status.

DESCRIPTION
cublasCheckStatus(STATUS,MSG) returns EXIT_FAILURE (1) or
EXIT_SUCCESS (0) depending on the STATUS value; if STATUS reports
an error, it throws an error with message MSG.

EXAMPLE

status = cublasGetError();
cublasCheckStatus( status, 'Kernel execution error');

6.4.4 cublasError
cublasError - Returns a structure with CUBLAS result codes

DESCRIPTION
Returns a structure with CUBLAS result codes.

EXAMPLE

cublasError

ans =

CUBLAS_STATUS_SUCCESS: 0
CUBLAS_STATUS_NOT_INITIALIZED: 1
CUBLAS_STATUS_ALLOC_FAILED: 3
CUBLAS_STATUS_INVALID_VALUE: 7
CUBLAS_STATUS_ARCH_MISMATCH: 8
CUBLAS_STATUS_MAPPING_ERROR: 11
CUBLAS_STATUS_EXECUTION_FAILED: 13
CUBLAS_STATUS_INTERNAL_ERROR: 14


6.4.5 cublasFree
cublasFree - Wrapper to CUBLAS cublasFree function

DESCRIPTION
Wrapper to CUBLAS cublasFree function.
Original function declaration:

cublasStatus
cublasFree (const void *devicePtr)

Mapping:

status = cublasFree(d_A)
d_A -> const void *devicePtr

status -> cublasStatus

EXAMPLE

N = 10;
SIZEOF_FLOAT = sizeoffloat();

% GPU variable d_A
d_A = 0;
[status d_A] = cublasAlloc(N,SIZEOF_FLOAT,d_A);
ret = cublasCheckStatus( status, ...
'!!!! device memory allocation error (d_A)');

% Clean up memory
status = cublasFree(d_A);
ret = cublasCheckStatus( status, ...
'!!!! memory free error (d_A)');


6.4.6 cublasGetError
cublasGetError - Wrapper to CUBLAS cublasGetError function

DESCRIPTION
Wrapper to CUBLAS cublasGetError function. Original function
declaration:

cublasStatus
cublasGetError (void)

EXAMPLE

status = cublasGetError();
cublasCheckStatus( status, 'Kernel execution error');


6.4.7 cublasGetVector
cublasGetVector - Wrapper to CUBLAS cublasGetVector function

DESCRIPTION
Wrapper to CUBLAS cublasGetVector function. Original function
declaration:

cublasStatus
cublasGetVector
(int n, int elemSize, const void *x, int incx,
void *y, int incy)

EXAMPLE

A = GPUsingle([1 2 3 4]);

% Ah must have the same type as the elements of A. GPUsingle
% stores single precision floating point values, so Ah must be
% a single precision Matlab array as well.
Ah = single(zeros(size(A)));

% getSizeOf returns the size of the elements stored in A
% (for example the size of a float or of a complex value)
[status Ah] = cublasGetVector (numel(A), getSizeOf(A),...
getPtr(A), 1, Ah, 1);
cublasCheckStatus( status, 'Error.');
disp(Ah);
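cublasGetVector copies N elements using the strides incx and incy, and so does cublasSetVector in the other direction. The Python sketch below models only the strided indexing for positive increments. Lists stand in for host and device memory. The element size argument has no counterpart, because Python lists hold whole values. The helper name strided_copy is hypothetical.

```python
def strided_copy(n, x, incx, y, incy):
    """Copy n elements: y[i*incy] = x[i*incx] for i = 0..n-1 (positive strides)."""
    if incx <= 0 or incy <= 0:
        raise ValueError("this sketch only models positive increments")
    for i in range(n):
        y[i * incy] = x[i * incx]
    return y

# Take every second element of x and store it contiguously in y
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = strided_copy(3, x, 2, [0.0] * 3, 1)
print(y)  # [1.0, 3.0, 5.0]
```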


6.4.8 cublasInit
cublasInit - Wrapper to CUBLAS cublasInit function

DESCRIPTION
Wrapper to CUBLAS cublasInit function. Original function
declaration:

cublasStatus
cublasInit (void)

EXAMPLE

status = cublasInit;
cublasCheckStatus(status, 'Error.');


6.4.9 cublasIsamax
cublasIsamax - Wrapper to CUBLAS cublasIsamax function

DESCRIPTION
Wrapper to CUBLAS cublasIsamax function. Original function
declaration:

int
cublasIsamax (int n, const float *x, int incx)

Mapping:

RET = cublasIsamax(N, d_A, INCX)


N -> int n
d_A -> const float *x
INCX -> int incx

RET -> cublasIsamax result

EXAMPLE

%% High level implementation
N = 10;
A = GPUsingle(rand(1,N));

Isamax = cublasIsamax(N, getPtr(A), 1);

status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');

% cublasIsamax looks for the largest absolute value
[value, Isamax_mat] = max(abs(single(A)));

compareArrays(Isamax, Isamax_mat, 1e-6);
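cublasIsamax follows the BLAS convention. It returns the 1-based index of the first element with the largest absolute value, so the matching Matlab call is max(abs(...)). Calling max(single(A)) alone gives the same index only because rand returns positive values. Below is a CPU reference in Python, assuming positive strides. The function name is hypothetical.

```python
def isamax(n, x, incx=1):
    """BLAS ISAMAX: 1-based index of the first element with the largest |x|."""
    if n < 1 or incx <= 0:
        return 0  # reference BLAS returns 0 for these arguments
    best_index, best_abs = 1, abs(x[0])
    for i in range(1, n):
        value = abs(x[i * incx])
        if value > best_abs:  # strict '>' keeps the first occurrence on ties
            best_index, best_abs = i + 1, value
    return best_index

print(isamax(4, [0.5, -3.0, 2.0, 3.0]))  # 2
```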


6.4.10 cublasIsamin
cublasIsamin - Wrapper to CUBLAS cublasIsamin function

DESCRIPTION
Wrapper to CUBLAS cublasIsamin function. Original function
declaration:

int
cublasIsamin (int n, const float *x, int incx)

EXAMPLE

N = 10;
A = GPUsingle(rand(1,N));

Isamin = cublasIsamin(N, getPtr(A), 1);

status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');

% cublasIsamin looks for the smallest absolute value
[value, Isamin_mat] = min(abs(single(A)));

compareArrays(Isamin, Isamin_mat, 1e-6);

6.4.11 cublasResult
cublasResult - Returns a structure with CUBLAS error results

DESCRIPTION
Returns a structure with CUBLAS error results.


6.4.12 cublasSasum
cublasSasum - Wrapper to CUBLAS cublasSasum function

DESCRIPTION
Wrapper to CUBLAS cublasSasum function.
Original function declaration:

float
cublasSasum (int n, const float *x, int incx)

EXAMPLE

N = 10;
A = GPUsingle(rand(1,N));

Sasum = cublasSasum( N, getPtr(A), 1);

status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');

% cublasSasum sums the absolute values of the elements
Sasum_mat = sum(abs(single(A)));
compareArrays(Sasum, Sasum_mat, 1e-6);


6.4.13 cublasSaxpy
cublasSaxpy - Wrapper to CUBLAS cublasSaxpy function

DESCRIPTION
Wrapper to CUBLAS cublasSaxpy function. Original function
declaration:

void
cublasSaxpy
(int n, float alpha, const float *x, int incx, float *y,
int incy)

EXAMPLE

N = 10;
A = GPUsingle(rand(1,N));
B = GPUsingle(rand(1,N));

alpha = 2.0;
Saxpy_mat = alpha * single(A) + single(B);

cublasSaxpy(N, alpha, getPtr(A), 1, getPtr(B), 1);

status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');

compareArrays(Saxpy_mat, single(B), 1e-6);
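cublasSaxpy overwrites y with alpha*x + y. That is why the example computes Saxpy_mat before the call and afterwards compares it with single(B). Below is a Python reference with positive strides. The function name is hypothetical.

```python
def saxpy(n, alpha, x, incx, y, incy):
    """BLAS SAXPY: overwrite y with alpha*x + y (positive strides only)."""
    if incx <= 0 or incy <= 0:
        raise ValueError("this sketch only models positive increments")
    for i in range(n):
        y[i * incy] = alpha * x[i * incx] + y[i * incy]
    return y

b = [1.0, 1.0, 1.0]
saxpy(3, 2.0, [1.0, 2.0, 3.0], 1, b, 1)
print(b)  # [3.0, 5.0, 7.0]
```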


6.4.14 cublasScopy
cublasScopy - Wrapper to CUBLAS cublasScopy function

DESCRIPTION
Wrapper to CUBLAS cublasScopy function. Original function
declaration:

void
cublasScopy
(int n, const float *x, int incx, float *y, int incy)

EXAMPLE

N = 10;
A = GPUsingle(rand(1,N));
B = GPUsingle(rand(1,N));

cublasScopy(N, getPtr(A), 1, getPtr(B), 1);

status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');

compareArrays(single(A), single(B), 1e-6);


6.4.15 cublasSdot
cublasSdot - Wrapper to CUBLAS cublasSdot function

DESCRIPTION
Wrapper to CUBLAS cublasSdot function. Original function
declaration:

float
cublasSdot
(int n, const float *x, int incx, const float *y, int incy)

EXAMPLE

N = 10;
A = GPUsingle(rand(1,N));
B = GPUsingle(rand(1,N));

Sdot_mat = sum(single(A).*single(B));
Sdot = cublasSdot(N, getPtr(A), 1, getPtr(B), 1);

status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');

compareArrays(Sdot_mat, Sdot, 1e-6);


6.4.16 cublasSetVector
cublasSetVector - Wrapper to CUBLAS cublasSetVector function

DESCRIPTION
Wrapper to CUBLAS cublasSetVector function. Original function
declaration:

cublasStatus
cublasSetVector
(int n, int elemSize, const void *x, int incx,
void *y, int incy)

EXAMPLE

B = single([1 2 3 4]);

% Create empty GPU variable A
A = GPUsingle();
setSize(A, size(B));
GPUallocVector(A);

status = cublasSetVector(numel(A), getSizeOf(A), ...
B, 1, getPtr(A), 1);
cublasCheckStatus( status, 'Error.');

disp(single(A));


6.4.17 cublasSgemm
cublasSgemm - Wrapper to CUBLAS cublasSgemm function

DESCRIPTION
Wrapper to CUBLAS cublasSgemm function. Original function
declaration:

void
cublasSgemm
(char transa, char transb, int m, int n, int k,
float alpha, const float *A, int lda,
const float *B, int ldb, float beta,
float *C, int ldc)

EXAMPLE

N = 10;
A = GPUsingle(rand(N,N));
B = GPUsingle(rand(N,N));
C = zeros(N,N,GPUsingle);

alpha = 2.0;
beta = 0.0;

opA = 'n';
opB = 'n';

cublasSgemm(opA, opB, N, N, N, ...
alpha, getPtr(A), N, getPtr(B), ...
N, beta, getPtr(C), N);

status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');

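cublasSgemm computes C = alpha*op(A)*op(B) + beta*C on column-major storage. Here op(X) is X for 'n' and the transpose of X for 't'. The arguments lda, ldb and ldc are the leading dimensions, that is, the number of rows of each stored matrix (N in the example above). The Python sketch below is a naive CPU reference of that definition on flat column-major lists. It is not how CUBLAS computes the product, and it models only the 'n'/'t' cases for real data. The function name is hypothetical.

```python
def sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """C = alpha*op(A)*op(B) + beta*C on flat column-major lists.

    op(A) is m-by-k and op(B) is k-by-n. Element (row i, column j) of a
    stored matrix with leading dimension ld is at index i + j*ld.
    """
    def op_a(i, p):
        return a[i + p * lda] if transa in ("n", "N") else a[p + i * lda]

    def op_b(p, j):
        return b[p + j * ldb] if transb in ("n", "N") else b[j + p * ldb]

    for j in range(n):
        for i in range(m):
            acc = sum(op_a(i, p) * op_b(p, j) for p in range(k))
            # As in BLAS, the old value of C is ignored when beta == 0
            old = beta * c[i + j * ldc] if beta != 0 else 0.0
            c[i + j * ldc] = alpha * acc + old
    return c

# A = [1 2; 3 4] and B = [5 6; 7 8], stored column by column
A = [1.0, 3.0, 2.0, 4.0]
B = [5.0, 7.0, 6.0, 8.0]
C = sgemm("n", "n", 2, 2, 2, 1.0, A, 2, B, 2, 0.0, [0.0] * 4, 2)
print(C)  # [19.0, 43.0, 22.0, 50.0], i.e. [19 22; 43 50]
```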

6.4.18 cublasShutdown
cublasShutdown - Wrapper to CUBLAS cublasShutdown function

DESCRIPTION
Wrapper to CUBLAS cublasShutdown function. Original function
declaration:

cublasStatus
cublasShutdown (void)

6.4.19 cublasSnrm2
cublasSnrm2 - Wrapper to CUBLAS cublasSnrm2 function

DESCRIPTION
Wrapper to CUBLAS cublasSnrm2 function. Original function
declaration:

float
cublasSnrm2 (int n, const float *x, int incx)

EXAMPLE

N = 10;
A = GPUsingle(rand(1,N));

Snrm2_mat = sqrt(sum(single(A).*single(A)));
Snrm2 = cublasSnrm2(N, getPtr(A),1);

status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');

compareArrays(Snrm2_mat, Snrm2, 1e-6);


6.4.20 cublasSrot
cublasSrot - Wrapper to CUBLAS cublasSrot function

DESCRIPTION
Wrapper to CUBLAS cublasSrot function.
Original function declaration:

void
cublasSrot (int n, float *x, int incx,
float *y, int incy, float sc,
float ss)
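Following the BLAS SROT definition, cublasSrot applies a plane (Givens) rotation to each pair (x(i), y(i)). The new x(i) is sc*x(i) + ss*y(i) and the new y(i) is sc*y(i) - ss*x(i). With sc = cos(theta) and ss = sin(theta), the rotation preserves the Euclidean length of each pair. Below is a Python reference with positive strides. The function name is hypothetical.

```python
import math

def srot(n, x, incx, y, incy, sc, ss):
    """BLAS SROT: rotate each pair (x[i], y[i]) in place (positive strides)."""
    for i in range(n):
        xi, yi = x[i * incx], y[i * incy]
        x[i * incx] = sc * xi + ss * yi
        y[i * incy] = sc * yi - ss * xi
    return x, y

x = [3.0, 1.0]
y = [4.0, 2.0]
before = [math.hypot(a, b) for a, b in zip(x, y)]
srot(2, x, 1, y, 1, 0.6, 0.8)  # cos(theta) = 0.6, sin(theta) = 0.8
after = [math.hypot(a, b) for a, b in zip(x, y)]
print(before, after)  # the lengths agree up to rounding
```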

6.4.21 cublasSscal
cublasSscal - Wrapper to CUBLAS cublasSscal function

DESCRIPTION
Wrapper to CUBLAS cublasSscal function.
Original function declaration:

void
cublasSscal (int n, float alpha, float *x, int incx)

EXAMPLE

N = 10;
A = GPUsingle(rand(1,N));

alpha = 1/10.0;
A_mat = single(A)*alpha;
cublasSscal(N, alpha, getPtr(A), 1);

status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');

compareArrays(A_mat, single(A), 1e-6);


6.4.22 cuCheckStatus
cuCheckStatus - Check the CUDA DRV status.

DESCRIPTION
cuCheckStatus(STATUS,MSG) returns EXIT_FAILURE (1) or
EXIT_SUCCESS (0) depending on the STATUS value; if STATUS reports
an error, it throws an error with message MSG.

EXAMPLE

[status]=cuInit();
cuCheckStatus( status, 'Error initializing CUDA driver.');

6.4.23 cudaCheckStatus
cudaCheckStatus - Check the CUDA run-time status

DESCRIPTION
RET = cudaCheckStatus(STATUS,MSG) returns EXIT_FAILURE (1)
or EXIT_SUCCESS (0) depending on the STATUS value; if STATUS
reports an error, it throws an error with message MSG.

EXAMPLE

status = cudaGetLastError();
cudaCheckStatus( status, 'Kernel execution error.');

6.4.24 cudaGetDeviceCount
cudaGetDeviceCount - Wrapper to CUDA cudaGetDeviceCount
function.

DESCRIPTION
Wrapper to CUDA cudaGetDeviceCount function.

EXAMPLE

count = 0;
[status,count] = cudaGetDeviceCount(count);
if (status ~=0)
error('Unable to get the number of devices');
end

6.4.25 cudaGetDeviceMajorMinor
cudaGetDeviceMajorMinor - Returns CUDA compute capability
major and minor numbers.

DESCRIPTION
Returns CUDA compute capability major and minor numbers.
[STATUS, MAJOR, MINOR] = cudaGetDeviceMajorMinor(DEV)
returns the compute capability (MAJOR.MINOR) of device DEV.
STATUS is the result of the operation.

EXAMPLE

dev = 0;
[status,major,minor] = cudaGetDeviceMajorMinor(dev);
if (status ~=0)
error('Unable to get the compute capability');
end

major
minor
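MAJOR and MINOR are best compared as a pair instead of being combined
into a decimal number. If a minor number ever had two digits, a float
such as 2.10 would compare below 2.9. The Python sketch below shows
the pair comparison. The function name and the default threshold
(1, 3) are illustrative assumptions only.

```python
def meets_capability(major, minor, required=(1, 3)):
    """True if compute capability MAJOR.MINOR is at least `required`."""
    return (major, minor) >= tuple(required)


print(meets_capability(1, 3))  # True
print(meets_capability(1, 1))  # False
print(meets_capability(2, 0))  # True
```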

6.4.26 cudaGetDeviceMemory
cudaGetDeviceMemory - Returns device total memory

DESCRIPTION
[STATUS, TOTMEM] = cudaGetDeviceMemory(DEV) returns the total
memory of device DEV, in bytes. STATUS is the result of the
operation.

EXAMPLE

dev = 0;
[status,totmem] = cudaGetDeviceMemory(dev);
if (status ~=0)
error('Error getting total memory');
end
totmem = totmem/1024/1024;
disp(['Total memory = ' num2str(totmem) ' MB']);

6.4.27 cudaGetDeviceMultProcCount
cudaGetDeviceMultProcCount - Returns the device multiprocessor
count

DESCRIPTION
[STATUS, COUNT] = cudaGetDeviceMultProcCount(DEV) returns the
number of multiprocessors of device DEV. STATUS is the result of
the operation.

EXAMPLE

dev = 0;
[status,count] = cudaGetDeviceMultProcCount(dev);
if (status ~=0)
error('Error getting the number of multiprocessors');
end
disp(['Mult. processors = ' num2str(count)]);

6.4.28 cudaGetLastError
cudaGetLastError - Wrapper to CUDA cudaGetLastError function

DESCRIPTION
[STATUS] = cudaGetLastError() returns in STATUS the last error
produced by a CUDA run-time call.
Original function declaration:

cudaError_t
cudaGetLastError(void)

6.4.29 cudaSetDevice
cudaSetDevice - Wrapper to CUDA cudaSetDevice function

DESCRIPTION
[STATUS] = cudaSetDevice(DEV) sets the device to DEV and
returns the result of the operation in STATUS.
Original function declaration:

cudaError_t
cudaSetDevice( int dev )

6.4.30 cudaThreadSynchronize
cudaThreadSynchronize - Wrapper to CUDA cudaThreadSynchronize
function.

DESCRIPTION
[STATUS] = cudaThreadSynchronize() blocks until the device has
completed all previously requested tasks. STATUS is the result of
the operation.
Original function declaration:

cudaError_t cudaThreadSynchronize(void)

6.4.31 cufftCheckStatus
cufftCheckStatus - Checks the CUFFT status

DESCRIPTION
cufftCheckStatus(STATUS,MSG) returns EXIT_FAILURE (1) or
EXIT_SUCCESS (0) depending on the value of STATUS, and throws an
error with message MSG if STATUS indicates a failure. STATUS is
compared against the possible CUFFT result codes (see cufftResult).

EXAMPLE

fftType = cufftType;
A = GPUsingle(rand(1,128));
plan = 0;
type = fftType.CUFFT_C2C;
[status, plan] = cufftPlan1d(plan, numel(A), type, 1);
cufftCheckStatus(status, 'Error in cufftPlan1D');
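The check performed by cufftCheckStatus can be sketched in Python as
follows. The code table is copied from the cufftResult entry. That
list is itself truncated, so unlisted codes are reported as unknown.
The names cufft_check_status and CufftError are hypothetical, and
raising a Python exception stands in for the Matlab error.

```python
# Codes listed in the cufftResult entry; that list is truncated,
# so any other value is reported as unknown.
CUFFT_RESULT_NAMES = {
    0: "CUFFT_SUCCESS",
    1: "CUFFT_INVALID_PLAN",
    2: "CUFFT_ALLOC_FAILED",
    3: "CUFFT_INVALID_TYPE",
    4: "CUFFT_INVALID_VALUE",
}
EXIT_SUCCESS = 0


class CufftError(RuntimeError):
    """Raised when a CUFFT status is not CUFFT_SUCCESS."""


def cufft_check_status(status, msg):
    """Return EXIT_SUCCESS for CUFFT_SUCCESS, otherwise raise with msg."""
    if status == 0:
        return EXIT_SUCCESS
    name = CUFFT_RESULT_NAMES.get(status, "unknown CUFFT result %d" % status)
    raise CufftError("%s (%s)" % (msg, name))


try:
    cufft_check_status(1, "Error in cufftPlan1D")
except CufftError as err:
    print(err)  # Error in cufftPlan1D (CUFFT_INVALID_PLAN)
```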

6.4.32 cufftDestroy
cufftDestroy - Wrapper to CUFFT cufftDestroy function

DESCRIPTION
Wrapper to CUFFT cufftDestroy function. Original function
declaration:

cufftResult
cufftDestroy(cufftHandle plan);

EXAMPLE

fftType = cufftType;
I = sqrt(-1);
A = GPUsingle(rand(1,128)+I*rand(1,128));
plan = 0;
type = fftType.CUFFT_C2C;
[status, plan] = cufftPlan1d(plan, numel(A), type, 1);
cufftCheckStatus(status, 'Error in cufftPlan1D');

[status] = cufftDestroy(plan);
cufftCheckStatus(status, 'Error in cufftDestroy');

6.4.33 cufftExecC2C
cufftExecC2C - Wrapper to CUFFT cufftExecC2C function

DESCRIPTION
Wrapper to CUFFT cufftExecC2C function. Original function
declaration:

cufftResult
cufftExecC2C(cufftHandle plan,
cufftComplex *idata,
cufftComplex *odata,
int direction);

EXAMPLE

fftType = cufftType;
fftDir = cufftTransformDirections;

I = sqrt(-1);

A = GPUsingle(rand(1,128)+I*rand(1,128));
plan = 0;
type = fftType.CUFFT_C2C;
[status, plan] = cufftPlan1d(plan, numel(A), type, 1);
cufftCheckStatus(status, 'Error in cufftPlan1D');

dir = fftDir.CUFFT_FORWARD;
% In-place transform: the same pointer is passed as input and output
[status] = cufftExecC2C(plan, getPtr(A), getPtr(A), dir);
cufftCheckStatus(status, 'Error in cufftExecC2C');

[status] = cufftDestroy(plan);
cufftCheckStatus(status, 'Error in cufftDestroy');
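To make the direction argument concrete, the following Python sketch
evaluates the discrete Fourier transform that cufftExecC2C computes,
using a direct O(N^2) sum instead of an FFT. It assumes the CUFFT
conventions: CUFFT_FORWARD (-1) and CUFFT_INVERSE (1) give the sign
of the exponent, and neither direction is normalized. A forward
transform followed by an inverse one therefore returns N times the
input. The function name dft_c2c is hypothetical. Unlike the in-place
call in the example, it returns a new list.

```python
import cmath

# Direction codes as listed by cufftTransformDirections.
CUFFT_FORWARD, CUFFT_INVERSE = -1, 1


def dft_c2c(data, direction):
    """Unnormalized DFT: X[k] = sum_n x[n] * exp(direction*2j*pi*k*n/N)."""
    size = len(data)
    return [
        sum(data[n] * cmath.exp(direction * 2j * cmath.pi * k * n / size)
            for n in range(size))
        for k in range(size)
    ]


x = [1, 2j, 3, -1]
spectrum = dft_c2c(x, CUFFT_FORWARD)
back = dft_c2c(spectrum, CUFFT_INVERSE)
# No normalization: back equals len(x) * x up to rounding errors.
print(max(abs(b - len(x) * v) for b, v in zip(back, x)))
```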

6.4.34 cufftExecC2R
cufftExecC2R - Wrapper to CUFFT cufftExecC2R function

DESCRIPTION
Wrapper to CUFFT cufftExecC2R function. Original function
declaration:

cufftResult
cufftExecC2R(cufftHandle plan,
cufftComplex *idata,
cufftReal *odata);

6.4.35 cufftExecR2C
cufftExecR2C - Wrapper to CUFFT cufftExecR2C function

DESCRIPTION
Wrapper to CUFFT cufftExecR2C function. Original function
declaration:

cufftResult
cufftExecR2C(cufftHandle plan,
cufftReal *idata,
cufftComplex *odata);
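The transform of real data is conjugate-symmetric, X[N-k] = conj(X[k]).
As a result, an R2C transform of N real values produces only the
first N/2+1 complex outputs (integer division), and C2R reconstructs
N real values from them. The Python sketch below illustrates this
with a direct DFT. The names r2c and c2r are hypothetical. As in
CUFFT, the result of c2r(r2c(x)) is unnormalized and equals N times x.

```python
import cmath


def dft(data, sign):
    """Unnormalized DFT with exponent sign `sign` (-1 forward, +1 inverse)."""
    size = len(data)
    return [
        sum(data[n] * cmath.exp(sign * 2j * cmath.pi * k * n / size)
            for n in range(size))
        for k in range(size)
    ]


def r2c(real_data):
    """Forward transform of N reals, keeping the N//2 + 1 non-redundant outputs."""
    return dft(real_data, -1)[: len(real_data) // 2 + 1]


def c2r(half_spectrum, size):
    """Inverse transform to `size` reals; unnormalized, so it returns N * x."""
    full = list(half_spectrum)
    # Rebuild the missing upper half from X[N-k] = conj(X[k]).
    for k in range(len(half_spectrum), size):
        full.append(half_spectrum[size - k].conjugate())
    return [v.real for v in dft(full, +1)]


x = [1.0, 2.0, 3.0, 4.0]
half = r2c(x)
print(len(half))          # 3
print(c2r(half, len(x)))  # approximately [4.0, 8.0, 12.0, 16.0]
```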

6.4.36 cufftPlan1d
cufftPlan1d - Wrapper to CUFFT cufftPlan1d function

DESCRIPTION
Wrapper to CUFFT cufftPlan1d function. Original function
declaration:

cufftResult
cufftPlan1d(cufftHandle *plan,
int nx,
cufftType type,
int batch);

The original function returns only a cufftResult, whereas the
wrapper also returns the plan.

EXAMPLE

fftType = cufftType;
I = sqrt(-1);
A = GPUsingle(rand(1,128)+I*rand(1,128));
plan = 0;
type = fftType.CUFFT_C2C;
[status, plan] = cufftPlan1d(plan, numel(A), type, 1);
cufftCheckStatus(status, 'Error in cufftPlan1D');

[status] = cufftDestroy(plan);
cufftCheckStatus(status, 'Error in cufftDestroy');

6.4.37 cufftPlan2d
cufftPlan2d - Wrapper to CUFFT cufftPlan2d function

DESCRIPTION
Wrapper to CUFFT cufftPlan2d function. Original function
declaration:

cufftResult
cufftPlan2d(cufftHandle *plan,
int nx, int ny,
cufftType type);

EXAMPLE

fftType = cufftType;
I = sqrt(-1);
A = GPUsingle(rand(128,128)+I*rand(128,128));
plan = 0;
% Matlab stores arrays in column-major (Fortran) order, while CUFFT
% expects row-major (C) order, so the dimensions are passed swapped
s = size(A);
type = fftType.CUFFT_C2C;
[status, plan] = cufftPlan2d(plan, s(2), s(1), type);
cufftCheckStatus(status, 'Error in cufftPlan2D');

[status] = cufftDestroy(plan);
cufftCheckStatus(status, 'Error in cufftDestroy');
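The swapped dimensions in the cufftPlan2d example follow from memory
layout. A Matlab array with s(1) rows and s(2) columns is stored
column by column. In memory this is identical to a row-major (C
order) array with s(2) rows and s(1) columns. The Python sketch below
checks that the two offset formulas agree. The helper names are
hypothetical.

```python
def col_major_offset(row, col, n_rows):
    """Offset of (row, col) in a column-major array (Matlab, Fortran)."""
    return row + col * n_rows


def row_major_offset(row, col, n_cols):
    """Offset of (row, col) in a row-major array (C)."""
    return row * n_cols + col


# Element (r, c) of a Matlab s1-by-s2 array sits where element (c, r)
# of a C s2-by-s1 array does, hence nx = s(2) and ny = s(1).
s1, s2 = 3, 5
same_layout = all(
    col_major_offset(r, c, s1) == row_major_offset(c, r, s1)
    for r in range(s1)
    for c in range(s2)
)
print(same_layout)  # True
```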

6.4.38 cufftPlan3d
cufftPlan3d - Wrapper to CUFFT cufftPlan3d function

DESCRIPTION
Wrapper to CUFFT cufftPlan3d function. Original function
declaration:

cufftResult
cufftPlan3d(cufftHandle *plan,
int nx, int ny, int nz,
cufftType type);

The original function returns only a cufftResult, whereas the
wrapper also returns the plan.

6.4.39 cufftResult
cufftResult - Returns a structure with CUFFT result codes

DESCRIPTION
Returns a structure with CUFFT result codes

EXAMPLE

cufftResult

ans =

CUFFT_SUCCESS: 0
CUFFT_INVALID_PLAN: 1
CUFFT_ALLOC_FAILED: 2
CUFFT_INVALID_TYPE: 3
CUFFT_INVALID_VALUE: 4
...

6.4.40 cufftTransformDirections
cufftTransformDirections - Returns a structure with CUFFT
transform direction codes

DESCRIPTION
Returns a structure with CUFFT transform direction codes.

CUFFT_FORWARD = -1; Forward FFT
CUFFT_INVERSE = 1; Inverse FFT

EXAMPLE

cufftTransformDirections

ans =

CUFFT_FORWARD: -1
CUFFT_INVERSE: 1

6.4.41 cufftType
cufftType - Returns a structure with CUFFT transform type codes

DESCRIPTION
Returns a structure with CUFFT transform type codes.

EXAMPLE

cufftType

ans =

CUFFT_R2C: 42
CUFFT_C2R: 44
CUFFT_C2C: 41

6.4.42 cuInit
cuInit - Wrapper to CUDA driver function cuInit

DESCRIPTION
Wrapper to CUDA driver function cuInit.

6.4.43 cuMemGetInfo
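cuMemGetInfo - Wrapper to CUDA driver function cuMemGetInfo

DESCRIPTION
Wrapper to CUDA driver function cuMemGetInfo, which reports the
free and total memory available to the current CUDA context, in
bytes. In the example below, FREEMEM receives the free memory and
C the total memory.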

EXAMPLE

freemem = 0;
c = 0;
[status, freemem, c] = cuMemGetInfo(freemem,c);
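The values returned by cuMemGetInfo are often turned into a
used-memory figure. The Python sketch below assumes, as in the CUDA
driver API, that the first value is the free memory and the second
(c in the example) is the total memory, both in bytes. It reports
megabytes using 1 MB = 1024*1024 bytes, as in the cudaGetDeviceMemory
example. summarize_memory is a hypothetical name.

```python
MB = 1024 * 1024  # bytes per megabyte, as in the cudaGetDeviceMemory example


def summarize_memory(free_bytes, total_bytes):
    """Return (used, free, total) device memory in MB."""
    used_bytes = total_bytes - free_bytes
    return used_bytes / MB, free_bytes / MB, total_bytes / MB


print(summarize_memory(256 * MB, 1024 * MB))  # (768.0, 256.0, 1024.0)
```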

6.4.44 getPtr
getPtr - Get the pointer to the GPU memory of a GPUsingle

SYNTAX

R = getPtr(X)
X - GPU variable
R - the pointer to the GPU memory region

DESCRIPTION
This is a low-level function that returns the pointer to the GPU
memory of a GPUsingle object.

Complex type supported since version 0.1
Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
getPtr(A)

6.4.45 getSizeOf
getSizeOf - Get the size of the GPU datatype (similar to sizeof
in C)

SYNTAX

R = getSizeOf(X)
X - GPU variable
R - the size of the GPU variable datatype

DESCRIPTION
This is a low-level function that returns the size, in bytes, of
the datatype of the GPU variable.

Complex type supported since version 0.1
Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
getSizeOf(A)

6.4.46 getType
getType - Get the type of the GPU variable

SYNTAX

R = getType(X)
X - GPU variable
R - the type of the GPU variable

DESCRIPTION
This is a low-level function that returns the type of the GPU
variable: FLOAT = 0, COMPLEX FLOAT = 1, DOUBLE = 2,
COMPLEX DOUBLE = 3.

Complex type supported since version 0.1
Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(10));
getType(A)

6.4.47 GPUallocVector
GPUallocVector - Variable allocation on GPU memory

SYNTAX

GPUallocVector(P)
P - GPUsingle

DESCRIPTION
GPUallocVector(P) allocates the GPU memory required by P. The
size of the allocation depends on the size of P, which must be
set (for example with setSize) before calling GPUallocVector.
A complex variable is stored as an interleaved sequence of real
and imaginary values, so it occupies numel(P)*2*sizeof(float)
bytes of GPU memory.

Complex type supported since version 0.1
Real type supported since version 0.1

EXAMPLE

A = GPUsingle();
setSize(A,[100 100]);
GPUallocVector(A);

A = GPUsingle();
setSize(A,[100 100]);
setComplex(A);
GPUallocVector(A);
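The allocation size described above can be computed directly. This
Python sketch assumes a 4-byte single-precision float and applies the
rule that a complex variable stores twice as many values as a real
one. gpu_alloc_bytes is a hypothetical helper, not a GPUmat function.

```python
SIZE_OF_FLOAT = 4  # bytes in a 32-bit single-precision float


def gpu_alloc_bytes(size, is_complex):
    """Bytes needed for a GPUsingle of the given size (interleaved if complex)."""
    numel = 1
    for dim in size:
        numel *= dim
    values_per_element = 2 if is_complex else 1  # real and imaginary parts
    return numel * values_per_element * SIZE_OF_FLOAT


print(gpu_alloc_bytes([100, 100], False))  # 40000
print(gpu_alloc_bytes([100, 100], True))   # 80000
```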

6.4.48 GPUdeviceInit
GPUdeviceInit - Initializes a CUDA capable GPU device

SYNTAX

GPUdeviceInit(dev)
dev - device number

DESCRIPTION
GPUdeviceInit(dev) initializes the GPU device dev, where dev is
an integer corresponding to the device number. Use GPUinfo to
list the available devices and their numbers.

EXAMPLE

GPUinfo
GPUdeviceInit(0)

6.4.49 istrans
istrans - True if GPUsingle TRANS flag is set to 1

SYNTAX

R = istrans(X)
X - GPUsingle
R - logical (0 or 1)

DESCRIPTION
ISTRANS(X) returns 1 if the TRANS flag of X is set, and 0
otherwise.

Complex type supported since version 0.1
Real type supported since version 0.1

EXAMPLE

A = GPUsingle(rand(5));
istrans(A)
B = transpose(A);
istrans(B)

6.4.50 packfC2C
packfC2C - Pack two arrays into an interleaved complex array

SYNTAX

PACKFC2C(RE_IDATA, IM_IDATA, ODATA)
RE_IDATA - GPUsingle, real part
IM_IDATA - GPUsingle, imaginary part
ODATA - GPUsingle, complex

DESCRIPTION
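PACKFC2C(RE_IDATA, IM_IDATA, ODATA) packs the values of
RE_IDATA and IM_IDATA into ODATA, interleaving them so that
element k of ODATA is RE_IDATA(k) + i*IM_IDATA(k). The elements
of ODATA are complex. In the example, ODATA is allocated with
setComplex, setSize and GPUallocVector before the call.

Complex type supported since version 0.1
Real type supported since version 0.1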

EXAMPLE

re = GPUsingle(rand(1,100));
im = GPUsingle(rand(1,100));

r = GPUsingle();
setComplex(r);
setSize(r,size(re));
GPUallocVector(r);

packfC2C(re,im,r);
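The interleaved layout produced by packfC2C (and by packfR2C, which
uses zero imaginary parts) can be sketched on the CPU with plain
Python lists. The function names are hypothetical. The sketch shows
only the element order, not the GPU implementation.

```python
def pack_c2c(re, im):
    """Interleave real and imaginary parts: [re0, im0, re1, im1, ...]."""
    if len(re) != len(im):
        raise ValueError("real and imaginary parts must have the same length")
    packed = []
    for real_part, imag_part in zip(re, im):
        packed.extend((real_part, imag_part))
    return packed


def pack_r2c(re):
    """Like pack_c2c, with every imaginary part set to zero."""
    return pack_c2c(re, [0.0] * len(re))


print(pack_c2c([1.0, 2.0], [3.0, 4.0]))  # [1.0, 3.0, 2.0, 4.0]
print(pack_r2c([5.0, 6.0]))              # [5.0, 0.0, 6.0, 0.0]
```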

6.4.51 packfR2C
packfR2C - Transforms a real array into a complex array with zero
imaginary parts.

SYNTAX

PACKFR2C(RE_IDATA, ODATA)
RE_IDATA - GPUsingle, real part
ODATA - GPUsingle, complex

DESCRIPTION
PACKFR2C(RE_IDATA, ODATA) transforms RE_IDATA into the complex
array ODATA. The elements of ODATA are complex, and their
imaginary parts are set to zero.

Complex type supported since version 0.1
Real type supported since version 0.1

EXAMPLE

re = GPUsingle(rand(1,100));
r = GPUsingle();
setComplex(r);
setSize(r,size(re));
GPUallocVector(r);

packfR2C(re,r);

6.4.52 setComplex
setComplex - Set a GPUsingle as complex

SYNTAX

setComplex(A)
A - GPUsingle

DESCRIPTION
setComplex(A) sets the GPUsingle A as complex. It should be
called before GPUallocVector.

Complex type supported since version 0.1
Real type supported since version 0.1

EXAMPLE

A = GPUsingle();
setSize(A,[10 10]);
setComplex(A);
GPUallocVector(A);

6.4.53 setReal
setReal - Set a GPUsingle as real

SYNTAX

setReal(A)
A - GPUsingle

DESCRIPTION
setReal(A) sets the GPUsingle A as real. It should be called
before GPUallocVector.

Complex type supported since version 0.1
Real type supported since version 0.1

EXAMPLE

A = GPUsingle();
setSize(A,[10 10]);
setReal(A);
GPUallocVector(A);

6.4.54 setSize
setSize - Set GPUsingle size

SYNTAX

setSize(A,SIZE)
A - GPUsingle
SIZE - size vector, for example [10 10]

DESCRIPTION
setSize(A, SIZE) sets the size of A to SIZE.

Complex type supported since version 0.1
Real type supported since version 0.1

EXAMPLE

A = GPUsingle();
setSize(A,[10 10]);

6.4.55 unpackfC2C
unpackfC2C - Unpack one complex array into two single-precision
arrays

SYNTAX

UNPACKFC2C(IDATA, RE_ODATA, IM_ODATA)

DESCRIPTION
UNPACKFC2C(IDATA, RE_ODATA, IM_ODATA) unpacks the values of
IDATA into the two arrays RE_ODATA (real parts) and IM_ODATA
(imaginary parts). The elements of IDATA are complex.

Complex type supported since version 0.1
Real type supported since version 0.1

EXAMPLE

r = GPUsingle(rand(1,100)+i*rand(1,100));
re = GPUsingle();
setReal(re);
setSize(re,size(r));
GPUallocVector(re);
im = GPUsingle();
setReal(im);
setSize(im,size(r));
GPUallocVector(im);
unpackfC2C(r,re,im);

6.4.56 unpackfC2R
unpackfC2R - Transforms a complex array into a real array,
discarding the imaginary part

SYNTAX

UNPACKFC2R(IDATA, RE_ODATA)

DESCRIPTION
UNPACKFC2R(IDATA, RE_ODATA) transforms the complex array IDATA
into the real array RE_ODATA, discarding the imaginary part. The
elements of IDATA are complex.

Complex type supported since version 0.1
Real type supported since version 0.1

EXAMPLE

r = GPUsingle(rand(1,100)+i*rand(1,100));
re = GPUsingle();
setReal(re);
setSize(re,size(r));
GPUallocVector(re);
unpackfC2R(r,re);
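The inverse operations, unpackfC2C and unpackfC2R, split an
interleaved [re0, im0, re1, im1, ...] sequence back into its parts.
Below is a minimal Python sketch with hypothetical function names. It
shows only the element order, not the GPU implementation.

```python
def unpack_c2c(packed):
    """Split [re0, im0, re1, im1, ...] into (real parts, imaginary parts)."""
    if len(packed) % 2:
        raise ValueError("interleaved data must have an even length")
    return packed[0::2], packed[1::2]


def unpack_c2r(packed):
    """Keep only the real parts, discarding the imaginary ones."""
    return unpack_c2c(packed)[0]


print(unpack_c2c([1.0, 3.0, 2.0, 4.0]))  # ([1.0, 2.0], [3.0, 4.0])
print(unpack_c2r([1.0, 3.0, 2.0, 4.0]))  # [1.0, 2.0]
```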

