Contents
1 Introduction
1.1 About GPUs
1.2 System requirements
1.3 Credits and licensing
1.4 How to install
1.5 Terminology
1.6 Documentation overview
2 Quick start
2.1 Matrix addition example
2.2 Matrix multiplication example
2.3 FFT calculation example
2.4 Performance analysis
3 GPUmat overview
3.1 Starting the GPU environment
3.2 Creating a GPU variable
3.3 Performing calculations on the GPU
3.4 Porting existing Matlab code
3.5 Converting a GPU variable into a Matlab variable
3.6 GPUmat functions
3.7 GPU memory management
3.8 Coding guidelines
3.8.1 Memory transfers
3.8.2 Vectorized code and for-loops
3.8.3 Matlab and GPUsingle variables
3.9 Performance analysis
4 Developer’s section
4.1 The GPUsingle class
4.1.1 GPUsingle constructor
4.1.2 GPUsingle properties
4.1.3 GPUsingle methods
4.2 Low level GPU memory management
4.2.1 Memory management using the GPUsingle class
4.2.2 Memory management using low level functions
4.3 Complex numbers
4.4 CUBLAS functions
4.5 CUFFT functions
6 Function Reference
6.1 Functions - by category
6.1.1 GPU startup and management
6.1.2 GPU variables management
6.1.3 GPU memory management
6.1.4 Numerical functions
6.1.5 General information
6.1.6 Complex numbers
6.1.7 CUBLAS functions
6.1.8 CUDA Driver functions
6.1.9 CUFFT functions
6.1.10 CUDA run-time functions
6.2 Operators
6.2.1 A & B
6.2.2 A'
6.2.3 A == B
6.2.4 A >= B
6.2.5 A > B
6.2.6 A <= B
6.2.7 A < B
6.2.8 A - B
6.2.9 A / B
6.2.10 A * B
6.2.11 A ~= B
6.2.12 ~A
6.2.13 A | B
6.2.14 A + B
6.2.15 A .^ B
6.2.16 A ./ B
6.2.17 A(I)
6.2.18 A .* B
6.2.19 A.'
6.2.20 [A;B]
6.3 High level functions - alphabetical list
6.3.1 abs
6.3.2 acos
6.3.3 acosh
6.3.4 and
6.3.5 asin
6.3.6 asinh
6.3.7 atan
6.3.8 atanh
6.3.9 ceil
6.3.10 colon
6.3.11 conj
6.3.12 cos
6.3.13 cosh
6.3.14 ctranspose
6.3.15 display
6.3.16 double
6.3.17 eq
6.3.18 exp
6.3.19 fft
6.3.20 fft2
6.3.21 floor
6.3.22 ge
6.3.23 GPUinfo
6.3.24 GPUmem
6.3.25 GPUsingle
6.3.26 GPUstart
6.3.27 GPUsync
6.3.28 gt
6.3.29 ifft
6.3.30 ifft2
6.3.31 iscomplex
Bibliography
• Existing Matlab code can be ported and executed on GPUs with few
modifications.
• GPU resources are accessed using the Matlab scripting language. The fast
code prototyping capability of the scripting language is combined with
fast code execution on the GPU.
One of the most promising GPGPU technologies is the CUDA SDK [1],
developed by NVIDIA. For further information about CUDA, GPGPU and
related topics, see [2] [3].
• STEP3 (optional but suggested): add the library path to the Matlab
path by using the "File->Set Path" menu. The Matlab documentation
describes how to add a new path. This step is not mandatory if the
GPUstart command is run from the directory where the library
was unpacked.
The GPUstart command should generate the following output in your
Matlab command window:
>> GPUstart
Starting GPU
- CUDA compute capability x.x
...done
If you get the following error, the GPUstart command was not found in
the Matlab path. Repeat the installation steps from STEP1 to STEP3.
>> GPUstart
??? Undefined function or variable 'GPUstart'.
Starting GPU
??? Invalid MEX-file
...
The specified module could not be found.
1.5 Terminology
The following is a summary of common terms and concepts used in this
manual:
The first two chapters contain enough information to understand the basic
concepts of the library and are intended for users with at least some
experience with Matlab. Chapter 4 is intended for users familiar with GPU
programming concepts, in particular with the CUDA SDK. The function
reference can be found in Chapter 6.
In the above code, the function single is used to create the single precision
Matlab array Ah; similarly, the GPUsingle function is used to create a
single precision GPU variable. If a double precision Matlab array is used to
initialize a GPUsingle variable, it is converted to single precision,
resulting in a loss of precision.
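The precision loss can be reproduced without a GPU. The following plain-Python sketch (not part of GPUmat) rounds a double precision value to IEEE 754 single precision, which is what happens to each element of a double precision Matlab array used to initialize a GPUsingle. The helper name to_single is hypothetical.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (IEEE 754 double) to IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]


value = 0.1
single_value = to_single(value)
print(repr(value))          # 0.1
print(repr(single_value))   # 0.10000000149011612
print(abs(single_value - value))  # the precision lost in the conversion
```

Values that are exactly representable in single precision, such as 0.5, survive the conversion unchanged.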
During the initialization of the GPU variable A, the data in the Matlab array
Ah is copied from the CPU memory to the GPU memory. The data transfer
is transparent to the user.
There are several ways to create a GPUsingle, as explained in Section 3.2.
The command
results in
A =
0 2 4 6
Using the colon function to create a vector with arbitrary real increments
between the elements,
results in
A =
0 0.1000 0.2000 0.3000 0.4000 0.5000
In the following example, the function single is used to convert the GPU
variable C into the Matlab variable Ch. Every time a GPU variable is
converted into a Matlab variable, the data is copied from GPU memory to CPU
memory.
• The creation of the GPU variable A, initialized with Matlab array Ah.
A = GPUsingle(rand(5));
ans =
Please note the difference between the original code and the modified code.
Every Matlab variable has been converted to the GPUsingle class: "A =
rand(100)" becomes "A = GPUsingle(rand(100))".
Any operation on GPUsingle variables generates a GPUsingle, i.e. C
(in the modified code) is also a GPUsingle. Functions involving GPUsingle
variables, like A + B in the above example, are executed on the GPU. To
convert the GPU variables A, B and C into the Matlab variables Ah, Bh and
Ch use the function single, as follows:
The following code shows a different way to initialize the arrays A and B by
using the colon function. The original Matlab code is the following:
A = single(colon(0,1,1000));
is equivalent to
A = single([0:1:1000]);
and creates a vector with single precision elements having values from 0 to
1000.
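The values produced by colon(begin, spacing, end) can be sketched in plain Python as follows. This is an approximation for illustration only: the hypothetical colon helper below computes begin + k*spacing with a small round-off tolerance, and Matlab's own implementation may handle round-off differently.

```python
import math


def colon(begin, spacing, end):
    """Approximate Matlab colon(begin, spacing, end): begin, begin+spacing, ...
    up to end (down to end for a negative spacing); empty if the range is empty."""
    if spacing == 0 or (end - begin) / spacing < 0:
        return []
    # The tolerance absorbs round-off when (end - begin) / spacing lands just
    # below an integer, e.g. 0.3 / 0.1 == 2.9999999999999996.
    count = math.floor((end - begin) / spacing + 1e-10) + 1
    return [begin + k * spacing for k in range(count)]


print(colon(0, 2, 6))            # [0, 2, 4, 6]
print(len(colon(0, 1, 1000)))    # 1001 elements, from 0 to 1000
```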
Element-by-element operations, such as the matrix addition A + B,
are highly optimized for the GPU. Use this kind of operation whenever
possible, as explained in Section 3.8.
• Calculate 1D FFT of A.
• Calculate 2D FFT of B.
A = GPUsingle(rand(1,100)); % GPU
B = GPUsingle(rand(100,100)); % GPU
%% 1D FFT
FFT_A = fft(A); % executed on GPU
%% 2D FFT
FFT_B = fft2(B); % executed on GPU
The equivalent code that executes above operations entirely on the CPU is
the following:
A = single(rand(1,100)); % CPU
B = single(rand(100,100)); % CPU
%% 1D FFT
FFT_A = fft(A); % executed on CPU
%% 2D FFT
FFT_B = fft2(B); % executed on CPU
A = rand(1000,1000); % A is on CPU
B = rand(1000,1000); % B is on CPU
tic;A.*B;toc; % executed on CPU
The GPU code performance can be evaluated in a similar way by using tic,
toc and the GPUsync command, as follows:
A = GPUsingle(rand(1000,1000));
B = GPUsingle(rand(1000,1000));
tic;A.*B;GPUsync;toc;
The following example shows a simple Matlab script to compare the
execution time of the element-by-element multiplication between two matrices
A and B on the GPU and on the CPU.
N = 100:100:4000;
timecpu = zeros(1,length(N));
timegpu = zeros(1,length(N));
index=1;
for i=N
Ah = single(rand(i)); % CPU
A = GPUsingle(Ah); % GPU
%% Execution on GPU
tic;
A.*A;
GPUsync;
timegpu(index) = toc;
%% Execution on CPU
tic;
Ah.*Ah;
timecpu(index) = toc;
% increase index
index = index +1;
end
The above code calculates the two vectors timecpu and timegpu that can be
used to evaluate the speed-up between the GPU and the CPU as follows:
speedup = timecpu./timegpu
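The same measure-then-divide structure can be sketched in plain Python, with two CPU implementations standing in for the CPU and GPU versions. This sketch only illustrates the benchmarking pattern and its function names are hypothetical; when timing GPUmat code, GPUsync must still be called before reading the timer, as in the Matlab script.

```python
import time


def benchmark(sizes, baseline, candidate):
    """Time two implementations for each problem size and return the
    per-size ratio baseline_time / candidate_time (like timecpu./timegpu)."""
    time_base, time_cand = [], []
    for n in sizes:
        data = [float(k) for k in range(n)]
        start = time.perf_counter()
        baseline(data)
        time_base.append(time.perf_counter() - start)
        start = time.perf_counter()
        candidate(data)
        time_cand.append(time.perf_counter() - start)
    # Element-wise division of the two timing vectors
    return [b / c for b, c in zip(time_base, time_cand)]


def squares_loop(data):
    out = []
    for x in data:
        out.append(x * x)
    return out


def squares_comprehension(data):
    return [x * x for x in data]


speedup = benchmark(range(10000, 50001, 10000), squares_loop, squares_comprehension)
print(speedup)
```

A value greater than 1 means the candidate implementation was faster for that problem size.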
GPUmat functions are grouped into high level and low level functions. High
level functions can be used in a similar way to existing Matlab functions,
whereas low level functions require some experience in GPU programming.
For example, low level functions can directly manage GPU memory, which
high level functions handle automatically. Low level functions can also
directly access CUDA libraries such as CUBLAS and CUFFT. The detailed
list of high level and low level functions can be found in Chapter 6.
GPUmat can be used in the following ways:
• As any other Matlab toolbox by using high level functions. This is the
easiest way to use GPUmat.
• As a GPU Source Development Kit, in order to integrate functions
that are not available in the library, by using both low and high level
functions.
This chapter describes how to use the GPUmat high level functions. Users
can find further information about low level functions in Chapter 4. The full
function reference is in Chapter 6. This chapter describes the following topics:
• Starting the GPU environment
• Creating a GPU variable
• Performing calculations on the GPU
• Converting a GPU variable into a Matlab variable
• GPUmat functions
• GPU memory management
• Compatibility between Matlab and GPUmat
• GPUmat code performance
Name Description
GPUstart Starts GPU environment and loads the required library components
GPUinfo Prints information about available CUDA capable GPUs
GPUdeviceInit Initializes a CUDA capable GPU device
Table 3.1 shows functions used to start GPUmat and to manage the GPU.
The GPUstart command is used to start GPUmat. The system might have
more than one GPU installed. By default GPUstart selects the first available
GPU device. The command GPUinfo prints information about installed
GPUs:
GPUinfo
Found 1 devices
* Device N. 0
Compute capability is 1.1
Total memory is 255.6875MB
Mult. processors = 2
that a memory transfer between GPU and CPU is required if the GPU
variable is initialized with a Matlab array. A memory transfer is a time
consuming task and might reduce the performance of the code.
Function                                    Description
A = GPUsingle(Ah)                           Creates a GPU array A initialized with the Matlab array Ah. Requires GPU-CPU memory transfer.
A = zeros(size, GPUsingle)                  Creates a GPU array initialized with zeros.
A = ones(size, GPUsingle)                   Creates a GPU array initialized with ones.
A = colon(begin, spacing, end, GPUsingle)   Creates a regularly spaced GPU vector A with values in the range [begin:end].
C = vertcat(A,B) or C = [A;B]               Vertical concatenation. Can be applied to more than 2 GPU vectors.
A = GPUsingle(Ah)
Creates a GPU single precision variable A initialized with the
Matlab array Ah. A has the same properties as Ah, such as
the size and the number of elements. Requires GPU-CPU
memory transfer.
Example:
If the GPU variable is initialized with a double precision Matlab array Ah,
there will be a loss of precision in the conversion between double and single
precision.
The syntax to create a Matlab variable is very similar to the above code:
Existing variables can also be used efficiently to create others. The
following example shows how to create a complex GPU variable using the colon
function:
>> A
ans =
0 2 4 6
Using the function colon is a very efficient way to create a GPU variable
because array values are directly created on the GPU memory without any
A = zeros(size, GPUsingle)
Has the same behavior as Matlab zeros function. Creates a GPU
array with zeros.
Example:
A = ones(size, GPUsingle)
Has the same behavior as Matlab ones function. Creates a GPU
array with ones.
Example:
A = GPUsingle(rand(10)); % A is on GPU
B = exp(A) % exp calculated on GPU
In the above code, Matlab executes the exp function implemented in GPUmat
rather than the built-in one, because the argument of exp is a GPUsingle.
The following example shows similar code executed on the CPU:
A = single(rand(10)); % A is on CPU
B = exp(A) % exp calculated on CPU
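Matlab selects the GPUmat version of exp because of the class of its argument. The same idea, dispatch on the argument type, is sketched below in plain Python with functools.singledispatch; the DeviceArray class is a hypothetical stand-in for GPUsingle and the code does not use a GPU.

```python
import math
from functools import singledispatch


class DeviceArray:
    """Hypothetical stand-in for a GPUsingle: values that 'live on the device'."""

    def __init__(self, values):
        self.values = [float(v) for v in values]


@singledispatch
def exp(x):
    """Default implementation, analogous to Matlab's built-in exp."""
    return [math.exp(v) for v in x]


@exp.register
def _(x: DeviceArray):
    """Selected automatically when the argument is a DeviceArray,
    analogous to GPUmat's exp being used for GPUsingle arguments."""
    return DeviceArray(math.exp(v) for v in x.values)


host_result = exp([0.0, 1.0])              # list -> default implementation
device_result = exp(DeviceArray([0.0]))    # DeviceArray -> overloaded version
print(type(host_result).__name__, type(device_result).__name__)  # list DeviceArray
```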
Ah = [0:10:1000]; % Ah is on CPU
A = GPUsingle(Ah); % A is on GPU
The above code can be written more efficiently using the colon function, as
follows:
A = colon(0,10,1000,GPUsingle); % A is on GPU
Name Description
a + b Binary addition
a - b Binary subtraction
-a Unary minus
a.*b Element-wise multiplication
a*b Matrix multiplication
a./b Right element-wise division
a.\b Left element-wise division
a.^b Element-wise power
a < b Less than
a > b Greater than
a <= b Less than or equal to
a >= b Greater than or equal to
a ~= b Not equal to
a == b Equality
a & b Logical AND
a | b Logical OR
~a Logical NOT
a' Complex conjugate transpose
a.' Matrix transpose
Table 3.8: Operators defined for GPUsingle type
A = GPUsingle(rand(5)); % A is on GPU
ans =
Every time the content of a GPUsingle is read in Matlab, the system performs
a memory transfer from the GPU to the CPU. The same happens when
a GPUsingle is created and initialized using a Matlab array. Because of
the limited memory bandwidth between the host and the GPU, data
transfers between CPU and GPU can be time consuming and should
therefore be kept to a minimum.
Name Description
clear Matlab built-in command, removes the specified variables
GPUmem Returns available GPU memory in bytes
The following code shows a typical situation where the GPU memory is
not enough, and some variables must be manually removed:
A = GPUsingle(rand(6000,3000)); % A is on GPU
B = GPUsingle(rand(6000,3000)); % B is on GPU
C = GPUsingle(rand(6000,3000)); % C is on GPU
Device memory allocation error.
Available memory is 65274 KB, required 70312 KB
clear A
• Cleans up the GPU variable and displays once more the available GPU
memory.
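The sizes in the error message above are consistent with 4 bytes per single precision element and 1 KB = 1024 bytes (an assumption inferred from the numbers, not stated by GPUmat). The hypothetical helper below reproduces the required amount for a 6000x3000 array.

```python
BYTES_PER_SINGLE = 4  # size of one IEEE 754 single precision value


def required_kb(rows: int, cols: int) -> float:
    """Memory needed by a rows x cols single precision array, in KB
    (assuming 1 KB = 1024 bytes)."""
    return rows * cols * BYTES_PER_SINGLE / 1024


need = required_kb(6000, 3000)
print(f"required {need} KB")  # required 70312.5 KB
```

The result exceeds the 65274 KB reported as available, which is why the allocation of C fails until A is cleared.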
A very useful Matlab command is whos, which can be used to check
how many GPUsingle variables are in the Matlab workspace. The following
Matlab output shows the result of the whos command and the presence of
a GPUsingle A in the Matlab workspace:
>> whos
Name Size Bytes Class Attributes
In the above code, the variable Ah is used to initialize the GPU variable A,
which means that data is transferred from the CPU to the GPU memory.
Vice versa, when a GPU variable is converted into a Matlab variable there is
a memory transfer from the GPU to the CPU:
for i=1:1e6
A = rand(3,3);
B = rand(3,3);
C = A.*B;
%% do something with C
end
The above code can be executed as-is on the GPU by converting A and B
to GPUsingle, as follows:
for i=1:1e6
A = GPUsingle(rand(3,3));
B = GPUsingle(rand(3,3));
C = A.*B;
%% do something with C
end
A = GPUsingle(rand(3,3e6)); % A is on GPU
B = GPUsingle(rand(3,3e6)); % B is on GPU
C = A.*B; % C is on GPU
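The loop and the single large operation compute the same values; the difference is the number of separate operations issued. The plain-Python sketch below (no GPU involved, and not a performance measurement) checks that processing 1000 independent 3x3 blocks one by one gives the same result as one element-wise product over the concatenated data. The helper name is hypothetical.

```python
import random


def elementwise_product(a, b):
    """Element-wise product of two equally sized flat lists (stand-in for A.*B)."""
    return [x * y for x, y in zip(a, b)]


random.seed(0)
n_blocks = 1000
blocks_a = [[random.random() for _ in range(9)] for _ in range(n_blocks)]
blocks_b = [[random.random() for _ in range(9)] for _ in range(n_blocks)]

# Loop version: one small operation per 3x3 block (n_blocks separate operations)
looped = [elementwise_product(a, b) for a, b in zip(blocks_a, blocks_b)]

# Batched version: concatenate all blocks and issue a single large operation
flat_a = [x for block in blocks_a for x in block]
flat_b = [x for block in blocks_b for x in block]
batched = elementwise_product(flat_a, flat_b)

# Same numbers, far fewer operations
assert batched == [x for block in looped for x in block]
```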
A = rand(100);
B = rand(100);
C = zeros(100);
for i=1:size(A,1)
for j=1:size(B,2)
C(i,j) = A(i,j) + B(i,j);
end
end
A = GPUsingle(rand(100)); % A is on GPU
B = GPUsingle(rand(100)); % B is on GPU
C = A + B; % C is on GPU
Ah = rand(5); % Ah is on CPU
A = GPUsingle(rand(5));% A is on GPU
Bh = 1; % Bh is on CPU
Ah + A
Unknown operation + between 'double' and 'GPUsingle'
A + Bh
ans =
Ah = rand(5);
A = GPUsingle(rand(5));
A = rand(1000,1000); % A is on CPU
B = rand(1000,1000); % B is on CPU
tic;A.*B;toc; % executed on CPU
The GPU code performance can be evaluated in a similar way by using tic,
toc and the GPUsync command, as follows:
A = GPUsingle(rand(1000,1000));
B = GPUsingle(rand(1000,1000));
tic;A.*B;GPUsync;toc;
Elapsed time is 0.010231 seconds.
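tic;A.*B;toc;
Elapsed time is 0.003808 seconds.
Without GPUsync, toc can be read before the GPU has completed the
multiplication, so the second measurement underestimates the actual
execution time.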
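The second measurement is shorter because the timer is read without waiting for the GPU. The analogy below, in plain Python, uses a thread pool as a stand-in for the GPU work queue and future.result() in the role of GPUsync; it illustrates the timing pitfall and is not how GPUmat is implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_operation():
    """Stand-in for a GPU kernel that takes a fixed amount of time."""
    time.sleep(0.2)
    return 42


with ThreadPoolExecutor(max_workers=1) as queue:
    start = time.perf_counter()
    future = queue.submit(slow_operation)
    # Without synchronization: only the time needed to enqueue the work
    t_no_sync = time.perf_counter() - start

    # With synchronization: wait for completion, as GPUsync does
    future.result()
    t_sync = time.perf_counter() - start

print(f"without sync: {t_no_sync:.4f} s, with sync: {t_sync:.4f} s")
```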
This chapter explains how to use GPUmat low level functions. Low level
functions can be used for the following purposes:
The GPUsingle class implements a destructor, which frees GPU memory
that is no longer used. The lifetime of a GPUsingle is the same as that of any
other Matlab variable. In the following example, the second assignment to
A automatically deletes the previously created variable and frees the
corresponding GPU memory occupied by a 100x100 array:
A = GPUsingle(rand(100));
A = GPUsingle(rand(10));
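The automatic release of memory on reassignment can be illustrated with a plain-Python sketch in which a destructor updates a hypothetical memory counter. It relies on CPython's reference counting, which runs `__del__` as soon as the last reference to the old object disappears; the class names are hypothetical.

```python
class FakeDeviceMemory:
    """Hypothetical allocator that tracks how many bytes are in use."""
    in_use = 0


class DeviceArray:
    """Hypothetical GPUsingle-like object that frees its memory when destroyed."""

    def __init__(self, rows, cols):
        self.nbytes = rows * cols * 4  # single precision elements
        FakeDeviceMemory.in_use += self.nbytes

    def __del__(self):
        # Runs when the last reference disappears (immediately in CPython)
        FakeDeviceMemory.in_use -= self.nbytes


A = DeviceArray(100, 100)
print(FakeDeviceMemory.in_use)  # 40000
A = DeviceArray(10, 10)         # the old 100x100 array is released here
print(FakeDeviceMemory.in_use)  # 400
```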
A = GPUsingle([1 2; 3 4]);
% Ah should be single precision, because
% A is single precision
Ah = single(zeros(1,numel(A)));
[status Ah] = cublasGetVector (numel(A), ...
getSizeOf(A), getPtr(A), 1, Ah, 1);
cublasCheckStatus( status, ...
'Unable to retrieve variable values from GPU.');
Ah
ans =
1 3 2 4
In the result Ah, the data is stored in column-major order, the same
format used by Matlab and Fortran. Complex numbers are stored with their
real and imaginary parts interleaved in memory, as explained in Section 4.3.
The above example uses the CUBLAS function cublasGetVector [4] to
transfer the data from GPU memory to CPU memory. The function numel
returns the number of elements in A, the function getSizeOf returns the
size of a single element of A, and the function getPtr returns the pointer
to the GPU memory.
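Column-major storage can be checked with a short plain-Python sketch: element (r, c) of a matrix with `rows` rows sits at 0-based offset r + c*rows. The helper names are hypothetical.

```python
def column_major(matrix):
    """Flatten a list-of-rows matrix in column-major order (Matlab/Fortran layout)."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(cols) for r in range(rows)]


def linear_index(r, c, rows):
    """0-based offset of element (r, c) in column-major storage."""
    return r + c * rows


A = [[1, 2],
     [3, 4]]
print(column_major(A))  # [1, 3, 2, 4], the same order returned by cublasGetVector
```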
Constructor details
A = GPUsingle(Ah)
Creates a GPU variable A initialized with the Matlab array Ah.
A has the same properties as Ah, such as the size and the number
of elements.
Example:
A = GPUsingle()
Creates an empty GPU variable. GPU memory is not automatically
allocated and the following steps must be performed to
allocate the memory:
Example:
Property details
GPUPTR
GPUPTR is the pointer to the GPU memory region.
The pointer is indirectly set by using GPUallocVector.
Its value can be retrieved by using the getPtr function.
Example:
N = 10;
A = GPUsingle(rand(1,N));
Isamin = cublasIsamin(N, getPtr(A), 1);
COMPLEX
COMPLEX is a flag and defines a complex GPUsingle.
Check section 4.3 for further information about complex
numbers representation. It is set using setComplex and
reset using setReal. Use iscomplex to check its value.
The flag must be set using setComplex before allocating
the variable memory using GPUallocVector. The flag
has no effect if set after calling GPUallocVector.
Example:
A = GPUsingle(rand(5));
iscomplex(A)
A = GPUsingle(rand(5)+i*rand(5));
iscomplex(A)
TRANS
TRANS is an internal flag. Use the function istrans to
check whether the flag is set or not. The flag is set to 1
to identify a matrix that is virtually transposed, which
means that values are not exchanged in memory. For
some operations, such as matrix-matrix multiplication
with CUBLAS functions, there is no need to actually
transpose the matrix in memory (which is time consuming).
The high level function transpose(A) sets the
flag to 1, whereas the function transpose(A,1)
performs the memory operations needed to actually
transpose the array elements. High level functions
correctly handle a GPUsingle with TRANS set to 1, but low level
functions do not. The following example shows how the
values of the array A are stored in memory. Please note
that data is stored in column-major format.
Example:
A = GPUsingle([1 2; 3 4]);
A = transpose(A) % A = A.’ is the same
Ah = single(zeros(1,numel(A)));
[status Ah]= cublasGetVector (numel(A), ...
getSizeOf(A), getPtr(A), ...
1, Ah, 1);
cublasCheckStatus( status, 'Memory error.');
Ah
ans =
1 3
2 4
ans =
1 3 2 4
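The TRANS flag can be mimicked with a small plain-Python class that keeps column-major storage untouched and swaps indices on access, in the spirit of the virtual transpose described above. This is a hypothetical sketch of the idea, not GPUmat's internal implementation.

```python
class MatrixView:
    """Column-major storage plus a 'trans' flag, similar in spirit to TRANS."""

    def __init__(self, data, rows, cols, trans=False):
        self.data, self.rows, self.cols, self.trans = data, rows, cols, trans

    def shape(self):
        return (self.cols, self.rows) if self.trans else (self.rows, self.cols)

    def get(self, r, c):
        # A virtual transpose swaps the indices instead of moving data
        if self.trans:
            r, c = c, r
        return self.data[r + c * self.rows]

    def transpose(self):
        """Virtual transpose: storage is shared and left unchanged."""
        return MatrixView(self.data, self.rows, self.cols, not self.trans)

    def materialize(self):
        """Physical transpose: build new column-major storage."""
        n_rows, n_cols = self.shape()
        return [self.get(r, c) for c in range(n_cols) for r in range(n_rows)]


A = MatrixView([1, 3, 2, 4], 2, 2)   # [1 2; 3 4] stored in column-major order
At = A.transpose()
print([[At.get(r, c) for c in range(2)] for r in range(2)])  # [[1, 3], [2, 4]]
print(At.data)           # [1, 3, 2, 4], memory unchanged
print(At.materialize())  # [1, 2, 3, 4]
```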
SIZE
SIZE stores the variable size. The functions to modify
it and to get its value are setSize and size respectively.
The SIZE must be defined before using GPUallocVector.
Modifying the SIZE on initialized variables has no effect
on memory values.
Example:
A = GPUsingle();
setSize(A,[100 100]);
GPUallocVector(A);
size(A)
• If the user creates a Matlab pointer to the GPU memory using low level
functions, the memory is not automatically cleaned when the variable
is not used anymore. In this case the user must manually clean the
GPU memory.
A = GPUsingle(rand(100,100));
clear A;
status = cublasFree(GPUptr);
cublasCheckStatus( status, '!!!! memory free error (GPUptr)');
C = GPUsingle(zeros(1,5));
unpackfC2C(A, B, C);
single(B)
single(C)
unpackfC2R(A, B);
single(B)
single(C)
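The data layout handled by packfC2C, packfR2C, unpackfC2C and unpackfC2R is sketched below on plain Python lists, assuming the interleaved real/imaginary format described in Section 4.3. The Python function names and argument orders are hypothetical and do not reproduce the GPUmat calling conventions.

```python
def pack_c2c(re, im):
    """Interleave real and imaginary parts: [re0, im0, re1, im1, ...]."""
    out = []
    for r, i in zip(re, im):
        out.extend((r, i))
    return out


def pack_r2c(re):
    """Real array -> interleaved complex array with zero imaginary parts."""
    return pack_c2c(re, [0.0] * len(re))


def unpack_c2c(packed):
    """Interleaved complex array -> (real parts, imaginary parts)."""
    return packed[0::2], packed[1::2]


def unpack_c2r(packed):
    """Interleaved complex array -> real parts only (imaginary parts discarded)."""
    return packed[0::2]


packed = pack_c2c([1.0, 2.0], [3.0, 4.0])
print(packed)  # [1.0, 3.0, 2.0, 4.0] represents 1+3i and 2+4i
```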
function simpleCUBLAS
% This is the GPUmat translation of the CUDA SDK project with the
% same name (simpleCUBLAS).
% The example shows how to access CUBLAS functions from GPUmat
SIZEOF_FLOAT = sizeoffloat();
N = 275;     % matrix dimension (value assumed for this example)
alpha = 1.0; % scaling factor applied to A*B (value assumed)
h_A = single(rand(N));
h_B = single(rand(N));
h_C = single(rand(N));
% Copy the input matrices to the GPU
d_A = GPUsingle(h_A);
d_B = GPUsingle(h_B);
d_C = GPUsingle(h_C);
% Execute on GPU: C = alpha*A*B + 0.0*C
cublasSgemm('n','n', N, N, N, alpha, getPtr(d_A), ...
N, getPtr(d_B), N, 0.0, getPtr(d_C), N);
status = cublasGetError();
cublasCheckStatus( status, '!!!! kernel execution error.');
end
A = GPUsingle(rand(1,100));
r = cublasIsamax(numel(A),getPtr(A),1)
int cublasIsamax (int n, const float *x, int incx)
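Like the reference BLAS routines, cublasIsamax and cublasIsamin return a 1-based index (Fortran convention) of the first element with the largest or smallest absolute value, reading n elements with stride incx. A plain-Python sketch of those semantics, with hypothetical function names:

```python
def isamax(n, x, incx=1):
    """1-based index of the first element with the largest |x|, or 0 if n < 1."""
    if n < 1 or incx < 1:
        return 0
    values = [abs(x[k * incx]) for k in range(n)]
    return values.index(max(values)) + 1


def isamin(n, x, incx=1):
    """1-based index of the first element with the smallest |x|, or 0 if n < 1."""
    if n < 1 or incx < 1:
        return 0
    values = [abs(x[k * incx]) for k in range(n)]
    return values.index(min(values)) + 1


print(isamax(3, [1.0, -5.0, 3.0]))  # 2, because |-5.0| is the largest
```

Because the index is 1-based, it can be used directly to index the corresponding Matlab array.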
N = 10;
I = sqrt(-1);
A = GPUsingle(rand(N,N) + I*rand(N,N));
B = GPUsingle(rand(N,N) + I*rand(N,N));
% alpha is complex
alpha = 2.0+I*3.0;
beta = 0.0;
opA = 'n';
opB = 'n';
status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');
Complex numbers are stored on the GPU with real and imaginary values
interleaved (see Section 4.3), which is the format expected by the
cublasCgemm function and by the CUFFT functions. For a complete
description of CUBLAS functions, check the CUDA CUBLAS manual. For a
complete list of implemented wrappers, check the function reference (Chapter 6).
%% CUFFT example
N = 128; % number of samples (value assumed for this example)
h_A = single(rand(1,N)+i*rand(1,N));
d_A = GPUsingle(h_A);
d_B = GPUsingle(h_A);
fftType = cufftType;
fftDir = cufftTransformDirections;
% FFT plan
plan = 0;
[status, plan] = cufftPlan1d(plan, numel(d_A), ...
fftType.CUFFT_C2C, 1);
cufftCheckStatus(status, 'Error in cufftPlan1D');
[status] = cufftDestroy(plan);
cufftCheckStatus(status, 'Error in cufftDestroy');
A = GPUsingle(rand(2)); % GPU
Ah = rand(2); % CPU
A + Ah;
??? Error using ==> ...
Unknown operation + between 'GPUsingle' and 'double'
Performing the same operation with a Matlab scalar does not generate any
error:
A = GPUsingle(rand(2)); % GPU
Ah = 1+2*i; % complex
A + Ah
ans =
A = GPUsingle(rand(2)); % GPU
Ah = rand(2); % CPU
A + GPUsingle(Ah)
ans =
1.0230 1.1167
1.2689 1.3425
The main point is that CPU and GPU variables are stored in different
memory regions: to perform an operation on both, you must first
transfer the CPU variable to the GPU, or the other way around. Matlab
scalars are an exception. The same does not apply to GPU scalars, which
cannot be added directly to Matlab variables.
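The mixing rule above can be sketched in plain Python with a hypothetical DeviceArray class: device arrays combine with other device arrays and with host scalars, while an operation with a host array raises an error, similar to GPUmat's "Unknown operation" message. This is an illustration only, not GPUmat code.

```python
class DeviceArray:
    """Hypothetical GPU array type used to illustrate the mixing rule."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        if isinstance(other, DeviceArray):
            # Both operands are on the device: allowed
            return DeviceArray(a + b for a, b in zip(self.values, other.values))
        if isinstance(other, (int, float, complex)):
            # Host scalars are accepted and applied to every element
            return DeviceArray(a + other for a in self.values)
        raise TypeError(
            f"Unknown operation + between 'DeviceArray' and '{type(other).__name__}'")

    __radd__ = __add__  # addition is commutative, so reuse the same rules


A = DeviceArray([1.0, 2.0])
with_scalar = A + 1.0                            # like A + Bh with a Matlab scalar
with_device = A + DeviceArray([0.5, 0.5])        # like A + GPUsingle(Ah)
try:
    A + [3.0, 4.0]                               # like A + Ah with a Matlab matrix
except TypeError as err:
    error_message = str(err)
print(error_message)
```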
5.2 Is any Matlab function executed on GPU by using GPUsingle?
A = GPUsingle([1 2; 3 4]);
trapz(A)
??? Error using ==> ...
...
• Matrix-matrix multiplications
Name Description
GPUdeviceInit Initializes a CUDA capable GPU device
GPUinfo Prints information about the GPU device
GPUstart Starts the GPU environment and loads required components
Name Description
colon Colon
double Converts a GPU single precision variable into a Matlab double precision variable
GPUsingle GPUsingle constructor
GPUsync Wait until all GPU operations are completed
ones GPU single precision ones array
setComplex Set a GPUsingle as complex
setReal Set a GPUsingle as real
setSize Set GPUsingle size
single Converts a GPU variable into a Matlab single precision variable
zeros GPU single precision zeros array
Name Description
GPUallocVector Variable allocation on GPU memory
GPUmem Returns the free memory (bytes) on selected GPU device
Name Description
abs Absolute value
acos Inverse cosine
acosh Inverse hyperbolic cosine
and Logical AND
asin Inverse sine
asinh Inverse hyperbolic sine
atan Inverse tangent, result in radians
atanh Inverse hyperbolic tangent
ceil Round towards plus infinity
conj CONJ(X) is the complex conjugate of X
cos Cosine of argument in radians
cosh Hyperbolic cosine
ctranspose Complex conjugate transpose
eq Equal
exp Exponential
fft Discrete Fourier transform
fft2 Two-dimensional discrete Fourier Transform
floor Round towards minus infinity
ge Greater than or equal
gt Greater than
ifft Inverse discrete Fourier transform
ifft2 Two-dimensional inverse discrete Fourier transform
ldivide Left array divide
le Less than or equal
Name Description
display Display GPU variable
getPtr Get GPUsingle pointer on GPU memory
getSizeOf Get the size of the GPU datatype (similar to sizeof in C)
getType Get the type of the GPU variable
Name Description
packfC2C Pack two arrays into an interleaved complex array
packfR2C Transforms a real array into a complex array with zero imaginary parts
unpackfC2C Unpack one complex array into two single precision arrays
unpackfC2R Transforms a complex array into a real array, discarding the imaginary part
Name Description
cublasAlloc Wrapper to CUBLAS cublasAlloc function
Name Description
cuCheckStatus Check the CUDA DRV status.
cuInit Wrapper to CUDA driver function cuInit
cuMemGetInfo Wrapper to CUDA driver function cuMemGetInfo
Name Description
cufftCheckStatus Checks the CUFFT status
Name Description
cudaCheckStatus Check the CUDA run-time status
cudaGetDeviceCount Wrapper to CUDA cudaGetDeviceCount function
cudaGetDeviceMajorMinor Returns CUDA compute capability major and minor numbers
cudaGetDeviceMemory Returns device total memory
cudaGetDeviceMultProcCount Returns device multi-processors count
cudaGetLastError Wrapper to CUDA cudaGetLastError function
cudaSetDevice Wrapper to CUDA cudaSetDevice function
cudaThreadSynchronize Wrapper to CUDA cudaThreadSynchronize function
6.2 Operators
Operators are used in mathematical expressions such as A + B. GPUmat
overloads Matlab operators for the GPUsingle class.
Name Description
a + b Binary addition
a - b Binary subtraction
-a Unary minus
a.*b Element-wise multiplication
a*b Matrix multiplication
a./b Right element-wise division
a.\b Left element-wise division
a.^b Element-wise power
a < b Less than
a > b Greater than
a <= b Less than or equal to
a >= b Greater than or equal to
a ~= b Not equal to
a == b Equality
a & b Logical AND
a | b Logical OR
~a Logical NOT
a' Complex conjugate transpose
a.' Matrix transpose
6.2.1 A & B
and - Logical AND
SYNTAX
R = A & B
R = and(A,B)
A - GPUsingle
B - GPUsingle
R - GPUsingle
DESCRIPTION
A & B performs a logical AND of arrays A and B and returns an
array containing elements set to either logical 1 (TRUE) or logical
0 (FALSE).
EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([0 1 10 2]);
R = A & B;
single(R)
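The element-wise semantics of A & B can be sketched in plain Python, treating any nonzero value as true and returning 1 or 0. The helper is hypothetical and runs on the CPU; it reproduces the result expected for the example vectors above.

```python
def logical_and(a, b):
    """Element-wise logical AND: nonzero counts as true; each result is 1 or 0."""
    return [1 if (x != 0 and y != 0) else 0 for x, y in zip(a, b)]


print(logical_and([1, 3, 0, 4], [0, 1, 10, 2]))  # [0, 1, 0, 1]
```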
6.2.2 A’
ctranspose - Complex conjugate transpose
SYNTAX
R = X'
R = ctranspose(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
X' is the complex conjugate transpose of X.
EXAMPLE
X = GPUsingle(rand(10)+i*rand(10));
R = X'
R = ctranspose(X)
6.2.3 A == B
eq - Equal
SYNTAX
R = X == Y
R = eq(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A == B (eq(A, B)) does element by element comparisons between
A and B.
EXAMPLE
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A == B;
single(R)
R = eq(A, B);
single(R)
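The comparison operators described in this section (==, >=, >, <=, < and ~=) all work element by element and share the same example vectors. The plain-Python sketch below applies each of them to those vectors, assuming (as for A & B) that true and false are returned as 1 and 0; the helper name compare is hypothetical.

```python
import operator

COMPARISONS = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
}


def compare(name, a, b):
    """Element-wise comparison returning 1 (true) or 0 (false) per element."""
    op = COMPARISONS[name]
    return [1 if op(x, y) else 0 for x, y in zip(a, b)]


A = [1, 2, 0, 4]
B = [1, 0, 0, 4]
for name in COMPARISONS:
    print(name, compare(name, A, B))
# eq [1, 0, 1, 1]
# ne [0, 1, 0, 0]
# lt [0, 0, 0, 0]
# le [1, 0, 1, 1]
# gt [0, 1, 0, 0]
# ge [1, 1, 1, 1]
```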
6.2.4 A >= B
ge - Greater than or equal
SYNTAX
R = X >= Y
R = ge(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A >= B (ge(A, B)) does element by element comparisons between
A and B.
EXAMPLE
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A >= B;
single(R)
R = ge(A, B);
single(R)
6.2.5 A > B
gt - Greater than
SYNTAX
R = X > Y
R = gt(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A > B (gt(A, B)) does element by element comparisons between
A and B.
EXAMPLE
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A > B;
single(R)
R = gt(A, B);
single(R)
6.2.6 A <= B
le - Less than or equal
SYNTAX
R = X <= Y
R = le(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A <= B (le(A, B)) does element by element comparisons between
A and B.
EXAMPLE
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A <= B;
single(R)
R = le(A, B);
single(R)
6.2.7 A < B
lt - Less than
SYNTAX
R = X < Y
R = lt(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A < B (lt(A, B)) does element by element comparisons between
A and B.
EXAMPLE
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A < B;
single(R)
R = lt(A, B);
single(R)
6.2.8 A - B
minus - Minus
SYNTAX
R = X - Y
R = minus(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
X - Y subtracts matrix Y from X. X and Y must have the same
dimensions unless one is a scalar. A scalar can be subtracted from
anything.
EXAMPLE
X = GPUsingle(rand(10));
Y = GPUsingle(rand(10));
R = Y - X
6.2.9 A / B
mrdivide - Slash or right matrix divide
SYNTAX
R = X / Y
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
Slash or right matrix divide.
EXAMPLE
A = GPUsingle(rand(10));
B = A / 5
MATLAB COMPATIBILITY
Only A / n, where n is a scalar, is supported.
6.2.10 A * B
mtimes - Matrix multiply
SYNTAX
R = X * Y
R = mtimes(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
* (mtimes(X, Y)) is the matrix product of X and Y.
EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A * B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A * B
6.2.11 A ~= B
ne - Not equal
SYNTAX
R = X ~= Y
R = ne(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A ~= B (ne(A, B)) does element by element comparisons between
A and B.
EXAMPLE
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A ~= B;
single(R)
R = ne(A, B);
single(R)
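For reference, the values that single(R) should display in the comparison examples of this section can be reproduced with Matlab's built-in operators on ordinary single arrays. This CPU-side sketch does not use GPUmat and assumes that the GPUsingle operators follow Matlab's element-wise semantics.

```matlab
% CPU reference for the GPUsingle comparison examples (no GPU needed).
% Built-in comparisons return logical arrays; single() converts them to
% the 0/1 single values shown by single(R) for a GPUsingle R.
A = single([1 2 0 4]);
B = single([1 0 0 4]);

R_eq = single(A == B)   % [1 0 1 1]
R_ge = single(A >= B)   % [1 1 1 1]
R_gt = single(A >  B)   % [0 1 0 0]
R_le = single(A <= B)   % [1 0 1 1]
R_lt = single(A <  B)   % [0 0 0 0]
R_ne = single(A ~= B)   % [0 1 0 0]
```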
6.2.12 ~A
not - Logical NOT
SYNTAX
R = ~X
X - GPUsingle
R - GPUsingle
DESCRIPTION
~A (not(A)) performs a logical NOT of input array A.
EXAMPLE
A = GPUsingle([1 2 0 4]);
R = ~A;
single(R)
6.2.13 A | B
or - Logical OR
SYNTAX
R = X | Y
R = or(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A | B (or(A, B)) performs a logical OR of arrays A and B.
EXAMPLE
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A | B;
single(R)
R = or(A, B);
single(R)
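The logical operators treat any nonzero element as true and zero as false. The following sketch reproduces the expected output of the &, | and ~ examples with built-in Matlab operators on the CPU, again assuming that GPUmat matches Matlab's behaviour.

```matlab
% CPU reference for the logical operator examples.
A1 = single([1 3 0 4]);  B1 = single([0 1 10 2]);   % inputs of the & example
A2 = single([1 2 0 4]);  B2 = single([1 0 0 4]);    % inputs of the | and ~ examples

R_and = single(A1 & B1)   % [0 1 0 1]
R_or  = single(A2 | B2)   % [1 1 0 1]
R_not = single(~A2)       % [0 0 1 0]
```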
6.2.14 A + B
plus - Plus
SYNTAX
R = X + Y
R = plus(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
X + Y (plus(X, Y)) adds matrices X and Y. X and Y must have
the same dimensions unless one is a scalar (a 1-by-1 matrix). A
scalar can be added to anything.
EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A + B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A + B
6.2.15 A .^ B
power - Array power
SYNTAX
R = X .^ Y
R = power(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
Z = X.^Y denotes element-by-element powers.
EXAMPLE
A = GPUsingle(rand(10));
B = 2;
R = A .^ B
A = GPUsingle(rand(10)+i*rand(10));
R = A .^ B
MATLAB COMPATIBILITY
Implemented for REAL exponents only.
6.2.16 A ./ B
rdivide - Right array divide
SYNTAX
R = X ./ Y
R = rdivide(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A./B denotes element-by-element division. A and B must have the
same dimensions unless one of them is a scalar; a scalar operand
is combined with every element of the other operand.
EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A ./ B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A ./ B
6.2.17 A(I)
subsref - Subscripted reference
SYNTAX
R = X(I)
X - GPUsingle
I - GPUsingle
R - GPUsingle
DESCRIPTION
A(I) (subsref) is an array formed from the elements of A specified
by the subscript vector I. The resulting array is the same size as
I except for the special case where A and I are both vectors. In
this case, A(I) has the same number of elements as I but has the
orientation of A.
EXAMPLE
A = GPUsingle([1 2 3 4 5]);
idx = GPUsingle([1 2]);
B = A(idx)
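The shape rule in the description is easiest to see with two cases. The sketch below uses ordinary Matlab arrays and a double index vector instead of a GPUsingle index; it assumes that GPUsingle indexing follows the same rule as Matlab.

```matlab
% Shape rule of A(I), checked with built-in indexing on the CPU.
I = [1; 2];                  % 2-by-1 column vector of indices

A = single([1 2 3 4 5]);     % 1-by-5 row vector
B = A(I)                     % A and I are both vectors: B takes A's orientation (1-by-2)

M = single(magic(3));        % 3-by-3 matrix, not a vector
C = M(I)                     % result has the size of I (2-by-1)
```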
6.2.18 A .* B
times - Array multiply
SYNTAX
R = X .* Y
R = times(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
X.*Y denotes element-by-element multiplication. X and Y must
have the same dimensions unless one is a scalar. A scalar can be
multiplied into anything.
EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A .* B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A .* B
6.2.19 A.'
transpose - Transpose
SYNTAX
R = X.'
R = transpose(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
X.' or transpose(X) is the non-conjugate transpose.
EXAMPLE
X = GPUsingle(rand(10));
R = X.'
R = transpose(X)
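The difference between ' and .' only shows up for complex data. This sketch compares the two operators on a small complex single matrix with built-in Matlab on the CPU; for real X both give the same result.

```matlab
% ' (ctranspose) conjugates the elements, .' (transpose) does not.
X  = single([1+2i, 3; 4, 5-1i]);
Rc = X'      % conjugate transpose: [1-2i 4; 3 5+1i]
Rt = X.'     % plain transpose:     [1+2i 4; 3 5-1i]
```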
6.2.20 [A;B]
vertcat - Vertical concatenation
SYNTAX
R = [X;Y]
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
[A;B] is the vertical concatenation of matrices A and B. A and B
must have the same number of columns. Any number of matrices
can be concatenated within one pair of brackets.
EXAMPLE
A = [zeros(10,1,GPUsingle);colon(0,1,10,GPUsingle)'];
6.3.1 abs
abs - Absolute value
SYNTAX
R = abs(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ABS(X) is the absolute value of the elements of X. When X is
complex, ABS(X) is the complex modulus (magnitude) of the elements
of X.
EXAMPLE
X = GPUsingle(rand(1,5)+i*rand(1,5));
R = abs(X)
6.3.2 acos
acos - Inverse cosine
SYNTAX
R = acos(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ACOS(X) is the arccosine of the elements of X. NaN (Not A Number)
results are obtained if ABS(x) > 1.0 for some element.
EXAMPLE
X = GPUsingle(rand(10));
R = acos(X)
MATLAB COMPATIBILITY
NaN is returned if ABS(X) > 1.0; in this case Matlab returns a
complex number. Not implemented for complex X.
6.3.3 acosh
acosh - Inverse hyperbolic cosine
SYNTAX
R = acosh(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ACOSH(X) is the inverse hyperbolic cosine of the elements of X.
EXAMPLE
X = GPUsingle(rand(10)) + 1;
R = acosh(X)
MATLAB COMPATIBILITY
NaN is returned if X < 1.0. Not implemented for complex X.
6.3.4 and
and - Logical AND
SYNTAX
R = A & B
R = and(A,B)
A - GPUsingle
B - GPUsingle
R - GPUsingle
DESCRIPTION
A & B performs a logical AND of arrays A and B and returns an
array containing elements set to either logical 1 (TRUE) or logical
0 (FALSE).
EXAMPLE
A = GPUsingle([1 3 0 4]);
B = GPUsingle([0 1 10 2]);
R = A & B;
single(R)
6.3.5 asin
asin - Inverse sine
SYNTAX
R = asin(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ASIN(X) is the arcsine of the elements of X. NaN (Not A Number)
results are obtained if ABS(x) > 1.0 for some element.
EXAMPLE
X = GPUsingle(rand(10));
R = asin(X)
MATLAB COMPATIBILITY
NaN is returned if ABS(X) > 1.0; in this case Matlab returns a
complex number. Not implemented for complex X.
6.3.6 asinh
asinh - Inverse hyperbolic sine
SYNTAX
R = asinh(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ASINH(X) is the inverse hyperbolic sine of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = asinh(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
6.3.7 atan
atan - Inverse tangent, result in radians
SYNTAX
R = atan(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ATAN(X) is the arctangent of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = atan(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
6.3.8 atanh
atanh - Inverse hyperbolic tangent
SYNTAX
R = atanh(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ATANH(X) is the inverse hyperbolic tangent of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = atanh(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
6.3.9 ceil
ceil - Round towards plus infinity
SYNTAX
R = ceil(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
CEIL(X) rounds the elements of X to the nearest integers towards
plus infinity.
EXAMPLE
X = GPUsingle(rand(10));
R = ceil(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
6.3.10 colon
colon - Colon
SYNTAX
R = colon(J,K,GPUsingle)
R = colon(J,D,K,GPUsingle)
DESCRIPTION
COLON(J,K,GPUsingle) is the same as J:K and
COLON(J,D,K,GPUsingle) is the same as J:D:K. J:K is the
same as [J, J+1, ..., K]. J:K is empty if J > K. J:D:K is the
same as [J, J+D, ..., J+m*D] where m = fix((K-J)/D). J:D:K
is empty if D == 0, if D > 0 and J > K, or if D < 0 and J < K.
EXAMPLE
A = colon(1,2,10,GPUsingle)
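The J:D:K rule can be checked by evaluating the formula directly. The sketch below does this in plain Matlab for the values used in the example. It covers only the non-empty case; the empty cases follow the conditions listed in the description.

```matlab
% J:D:K = [J, J+D, ..., J+m*D] with m = fix((K-J)/D), evaluated on the CPU.
J = 1; D = 2; K = 10;
m = fix((K - J) / D);          % fix(9/2) = 4 steps
R = J + (0:m) * D              % [1 3 5 7 9]
same = isequal(R, J:D:K)       % true: matches Matlab's built-in colon
```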
6.3.11 conj
conj - CONJ(X) is the complex conjugate of X
SYNTAX
R = conj(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
For a complex X, CONJ(X) = REAL(X) - i*IMAG(X).
EXAMPLE
A = GPUsingle(rand(1,5) + i*rand(1,5));
B = conj(A)
6.3.12 cos
cos - Cosine of argument in radians
SYNTAX
R = cos(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
COS(X) is the cosine of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = cos(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
6.3.13 cosh
cosh - Hyperbolic cosine
SYNTAX
R = cosh(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
COSH(X) is the hyperbolic cosine of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = cosh(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
6.3.14 ctranspose
ctranspose - Complex conjugate transpose
SYNTAX
R = X'
R = ctranspose(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
X' is the complex conjugate transpose of X.
EXAMPLE
X = GPUsingle(rand(10)+i*rand(10));
R = X'
R = ctranspose(X)
6.3.15 display
display - Display GPU variable
SYNTAX
display(X)
X - GPUsingle
DESCRIPTION
Prints information about the GPUsingle variable X. DISPLAY(X) is
called for the object X when the semicolon is not used to terminate
a statement.
EXAMPLE
A = GPUsingle(rand(10));
display(A)
A
6.3.16 double
double - Converts a GPU single precision variable into a Matlab
double precision variable
SYNTAX
R = double(A)
A - GPUsingle variable
R - double precision Matlab variable
DESCRIPTION
R = DOUBLE(A) copies the content of the GPU single precision
variable A into a double precision Matlab array. The conversion
itself is exact, but A holds only single precision data, so values
that originated as double precision data have already been rounded.
EXAMPLE
A = GPUsingle(rand(100));
Ah = double(A);
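Because GPUsingle data is stored in single precision, a double array sent to the GPU and read back with double() is generally not identical to the original. The sketch below mimics that rounding on the CPU with Matlab's single(); that GPUsingle rounds exactly like single() is an assumption made for illustration.

```matlab
% Double -> single rounds; single -> double is exact, so the first
% rounding error survives the round trip.
x   = 0.1;                     % double precision value
xs  = single(x);               % value a GPUsingle would hold (assumed same rounding)
xd  = double(xs);              % read back as double
err = abs(xd - x)              % about 1.5e-9, not zero
lossless = isequal(double(single(xd)), xd)   % true: xd is exactly representable in single
```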
6.3.17 eq
eq - Equal
SYNTAX
R = X == Y
R = eq(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A == B (eq(A, B)) does element by element comparisons between
A and B.
EXAMPLE
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A == B;
single(R)
R = eq(A, B);
single(R)
6.3.18 exp
exp - Exponential
SYNTAX
R = exp(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
EXP(X) is the exponential of the elements of X, e to the X. For
complex Z=X+i*Y, EXP(Z) = EXP(X)*(COS(Y)+i*SIN(Y)).
EXAMPLE
X = GPUsingle(rand(1,5)+i*rand(1,5));
R = exp(X)
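The complex formula in the description can be verified numerically. This is a plain Matlab check in double precision, independent of GPUmat.

```matlab
% EXP(X+iY) = EXP(X)*(COS(Y)+i*SIN(Y)), checked element by element.
z   = [0.5+1.0i, -0.3+2.0i];
lhs = exp(z);
rhs = exp(real(z)) .* (cos(imag(z)) + 1i*sin(imag(z)));
maxErr = max(abs(lhs - rhs))   % tiny: rounding error only
```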
6.3.19 fft
fft - Discrete Fourier transform
SYNTAX
R = fft(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
FFT(X) is the discrete Fourier transform (DFT) of vector X.
EXAMPLE
X = GPUsingle(rand(1,5)+i*rand(1,5));
R = fft(X)
6.3.20 fft2
fft2 - Two-dimensional discrete Fourier Transform
SYNTAX
R = fft2(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
FFT2(X) returns the two-dimensional Fourier transform of matrix
X.
EXAMPLE
X = GPUsingle(rand(5,5)+i*rand(5,5));
R = fft2(X)
6.3.21 floor
floor - Round towards minus infinity
SYNTAX
R = floor(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
FLOOR(X) rounds the elements of X to the nearest integers towards
minus infinity.
EXAMPLE
X = GPUsingle(rand(1,5));
R = floor(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
6.3.22 ge
ge - Greater than or equal
SYNTAX
R = X >= Y
R = ge(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A >= B (ge(A, B)) does element by element comparisons between
A and B.
EXAMPLE
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A >= B;
single(R)
R = ge(A, B);
single(R)
6.3.23 GPUinfo
GPUinfo - Prints information about the GPU device
SYNTAX
GPUinfo
GPUinfo(N)
DESCRIPTION
GPUinfo displays information about each CUDA-capable device
installed on the system, including total memory and number of
processors. GPUinfo(N) displays information about the device
with index N.
EXAMPLE
GPUinfo(0)
6.3.24 GPUmem
GPUmem - Returns the free memory (bytes) on the selected GPU
device
SYNTAX
GPUmem
DESCRIPTION
Returns the free memory, in bytes, on the selected GPU device.
EXAMPLE
GPUmem
GPUmem/1024/1024
6.3.25 GPUsingle
GPUsingle - GPUsingle constructor
SYNTAX
R = GPUsingle()
R = GPUsingle(A)
A - Either a GPUsingle or a Matlab array
R - GPUsingle variable
DESCRIPTION
GPUsingle creates a Matlab variable whose data is allocated in GPU
memory. Operations on GPUsingle objects are executed on the GPU.
EXAMPLE
GPUsingle(rand(100,100))
Ah = rand(100);
A = GPUsingle(Ah);
Bh = rand(100) + i*rand(100);
B = GPUsingle(Bh);
6.3.26 GPUstart
GPUstart - Starts the GPU environment and loads required components
SYNTAX
GPUstart
DESCRIPTION
Starts the GPU environment and loads the required components.
EXAMPLE
GPUstart
6.3.27 GPUsync
GPUsync - Wait until all GPU operations are completed
SYNTAX
GPUsync
DESCRIPTION
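Waits until all GPU operations are completed. Call it before
stopping a timer, as in the example, so that the measured time
includes the completion of the GPU operations.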
EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
tic;A + B;GPUsync;toc;
6.3.28 gt
gt - Greater than
SYNTAX
R = X > Y
R = gt(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A > B (gt(A, B)) does element by element comparisons between
A and B.
EXAMPLE
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A > B;
single(R)
R = gt(A, B);
single(R)
6.3.29 ifft
ifft - Inverse discrete Fourier transform
SYNTAX
R = ifft(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
IFFT(X) is the inverse discrete Fourier transform of X.
EXAMPLE
X = GPUsingle(rand(1,5)+i*rand(1,5));
R = fft(X);
X = ifft(R);
6.3.30 ifft2
ifft2 - Two-dimensional inverse discrete Fourier transform
SYNTAX
R = ifft2(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
IFFT2(F) returns the two-dimensional inverse Fourier transform of
matrix F.
EXAMPLE
X = GPUsingle(rand(5,5)+i*rand(5,5));
R = fft2(X);
X = ifft2(R);
6.3.31 iscomplex
iscomplex - True for complex array
SYNTAX
R = iscomplex(X)
X - GPUsingle
R - logical (0 or 1)
DESCRIPTION
ISCOMPLEX(X) returns 1 if X has an imaginary part and 0
otherwise.
EXAMPLE
A = GPUsingle(rand(5));
iscomplex(A)
A = GPUsingle(rand(5)+i*rand(5));
iscomplex(A)
6.3.32 isempty
isempty - True for empty GPUsingle array
SYNTAX
R = isempty(X)
X - GPUsingle
R - logical (0 or 1)
DESCRIPTION
ISEMPTY(X) returns 1 if X is an empty GPUsingle array and 0
otherwise. An empty GPUsingle array has no elements, that is
prod(size(X))==0.
EXAMPLE
A = GPUsingle();
isempty(A)
A = GPUsingle(rand(5)+i*rand(5));
isempty(A)
6.3.33 isreal
isreal - True for real array
SYNTAX
R = isreal(X)
X - GPUsingle
R - logical (0 or 1)
DESCRIPTION
ISREAL(X) returns 1 if X does not have an imaginary part and 0
otherwise.
EXAMPLE
A = GPUsingle(rand(5));
isreal(A)
A = GPUsingle(rand(5)+i*rand(5));
isreal(A)
6.3.34 isscalar
isscalar - True if array is a scalar
SYNTAX
R = isscalar(X)
X - GPUsingle
R - logical (0 or 1)
DESCRIPTION
ISSCALAR(S) returns 1 if S is a 1x1 matrix and 0 otherwise.
EXAMPLE
A = GPUsingle(rand(5));
isscalar(A)
A = GPUsingle(1);
isscalar(A)
6.3.35 ldivide
ldivide - Left array divide
SYNTAX
R = X .\ Y
R = ldivide(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A.\B denotes element-by-element left division, that is, B./A:
each element of B is divided by the corresponding element of A.
A and B must have the same dimensions unless one of them is a
scalar.
EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A .\ B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A .\ B
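Note the argument order of left division: A.\B divides B by A. A quick CPU check with Matlab's built-in operators:

```matlab
% Left division A.\B gives the same result as right division B./A.
A  = single([2 4 5]);
B  = single([6 8 10]);
R1 = A .\ B    % [3 2 2]
R2 = B ./ A    % same values
```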
6.3.36 le
le - Less than or equal
SYNTAX
R = X <= Y
R = le(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A <= B (le(A, B)) does element by element comparisons between
A and B.
EXAMPLE
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A <= B;
single(R)
R = le(A, B);
single(R)
6.3.37 length
length - Length of vector
SYNTAX
R = length(X)
X - GPUsingle
DESCRIPTION
LENGTH(X) returns the length of vector X. It is equivalent to
MAX(SIZE(X)) for non-empty arrays and 0 for empty ones.
EXAMPLE
A = GPUsingle(rand(5));
length(A)
6.3.38 log
log - Natural logarithm
SYNTAX
R = log(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
LOG(X) is the natural logarithm of the elements of X. NaN results
are produced for negative elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = log(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
6.3.39 log10
log10 - Common (base 10) logarithm
SYNTAX
R = log10(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
LOG10(X) is the base 10 logarithm of the elements of X. NaN results
are produced for negative elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = log10(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
6.3.40 log1p
log1p - Compute log(1+z) accurately
SYNTAX
R = log1p(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
LOG1P(X) computes log(1+X) accurately, also for small X, where
forming 1+X first would lose precision. Only real values are
accepted.
EXAMPLE
X = GPUsingle(rand(10));
R = log1p(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
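The benefit of log1p is clearest in single precision, the precision used by GPUsingle. In this plain Matlab sketch, 1+x rounds to exactly 1, so the naive formula loses the whole result while log1p keeps it.

```matlab
% log(1+x) versus log1p(x) for a small single precision x.
x        = single(1e-8);
naive    = log(1 + x)    % 0: 1 + x rounds to exactly 1 in single precision
accurate = log1p(x)      % close to 1e-8
```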
6.3.41 log2
log2 - Base 2 logarithm and dissect floating point number
SYNTAX
R = log2(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
Y = LOG2(X) is the base 2 logarithm of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = log2(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
6.3.42 lt
lt - Less than
SYNTAX
R = X < Y
R = lt(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A < B (lt(A, B)) does element by element comparisons between
A and B.
EXAMPLE
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A < B;
single(R)
R = lt(A, B);
single(R)
6.3.43 minus
minus - Minus
SYNTAX
R = X - Y
R = minus(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
X - Y subtracts matrix Y from X. X and Y must have the same
dimensions unless one is a scalar. A scalar can be subtracted from
anything.
EXAMPLE
X = GPUsingle(rand(10));
Y = GPUsingle(rand(10));
R = Y - X
6.3.44 mrdivide
mrdivide - Slash or right matrix divide
SYNTAX
R = X / Y
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
Slash or right matrix divide.
EXAMPLE
A = GPUsingle(rand(10));
B = A / 5
MATLAB COMPATIBILITY
Only A / n, where n is a scalar, is supported.
6.3.45 mtimes
mtimes - Matrix multiply
SYNTAX
R = X * Y
R = mtimes(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
* (mtimes(X, Y)) is the matrix product of X and Y.
EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A * B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A * B
6.3.46 ndims
ndims - Number of dimensions
SYNTAX
R = ndims(X)
X - GPUsingle
DESCRIPTION
N = NDIMS(X) returns the number of dimensions in the array X.
The number of dimensions in an array is always greater than or
equal to 2. Trailing singleton dimensions are ignored. Put simply,
it is LENGTH(SIZE(X)).
EXAMPLE
X = GPUsingle(rand(10));
ndims(X)
6.3.47 ne
ne - Not equal
SYNTAX
R = X ~= Y
R = ne(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A ~= B (ne(A, B)) does element by element comparisons between
A and B.
EXAMPLE
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A ~= B;
single(R)
R = ne(A, B);
single(R)
6.3.48 not
not - Logical NOT
SYNTAX
R = ~X
X - GPUsingle
R - GPUsingle
DESCRIPTION
~A (not(A)) performs a logical NOT of input array A.
EXAMPLE
A = GPUsingle([1 2 0 4]);
R = ~A;
single(R)
6.3.49 numel
numel - Number of elements in an array or subscripted array
expression.
SYNTAX
R = numel(X)
X - GPUsingle
R - number of elements
DESCRIPTION
N = NUMEL(A) returns the number of elements N in array A.
EXAMPLE
X = GPUsingle(rand(10));
numel(X)
6.3.50 ones
ones - GPU single precision ones array
SYNTAX
ones(N,GPUsingle)
ones(M,N,GPUsingle)
ones([M,N],GPUsingle)
ones(M,N,P,...,GPUsingle)
ones([M N P ...],GPUsingle)
DESCRIPTION
ones(N,GPUsingle) is an N-by-N GPU matrix of single precision
ones.
ones(M,N,GPUsingle) or ones([M,N],GPUsingle) is an M-by-N
GPU matrix of single precision ones.
ones(M,N,P,...,GPUsingle) or ones([M N P ...],GPUsingle)
is an M-by-N-by-P-by-... GPU array of single precision ones.
EXAMPLE
A = ones(10,GPUsingle)
B = ones(10, 10,GPUsingle)
C = ones([10 10],GPUsingle)
6.3.51 or
or - Logical OR
SYNTAX
R = X | Y
R = or(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A | B (or(A, B)) performs a logical OR of arrays A and B.
EXAMPLE
A = GPUsingle([1 2 0 4]);
B = GPUsingle([1 0 0 4]);
R = A | B;
single(R)
R = or(A, B);
single(R)
6.3.52 plus
plus - Plus
SYNTAX
R = X + Y
R = plus(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
X + Y (plus(X, Y)) adds matrices X and Y. X and Y must have
the same dimensions unless one is a scalar (a 1-by-1 matrix). A
scalar can be added to anything.
EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A + B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A + B
6.3.53 power
power - Array power
SYNTAX
R = X .^ Y
R = power(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
Z = X.^Y denotes element-by-element powers.
EXAMPLE
A = GPUsingle(rand(10));
B = 2;
R = A .^ B
A = GPUsingle(rand(10)+i*rand(10));
R = A .^ B
MATLAB COMPATIBILITY
Implemented for REAL exponents only.
6.3.54 rdivide
rdivide - Right array divide
SYNTAX
R = X ./ Y
R = rdivide(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
A./B denotes element-by-element division. A and B must have the
same dimensions unless one of them is a scalar; a scalar operand
is combined with every element of the other operand.
EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A ./ B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A ./ B
6.3.55 round
round - Round towards nearest integer
SYNTAX
R = round(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
ROUND(X) rounds the elements of X to the nearest integers.
EXAMPLE
X = GPUsingle(rand(10));
R = round(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
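ceil, floor and round differ mainly on negative values and on halves. The sketch below shows Matlab's built-in behaviour on a few single values; that GPUmat rounds halves the same way (away from zero) is an assumption, not something stated in this manual.

```matlab
% Reference behaviour of the rounding functions (Matlab built-ins, CPU).
x = single([-2.5 -1.5 -1.2 1.2 1.5 2.5]);
c = ceil(x)    % towards +Inf:                  [-2 -1 -1  2  2  3]
f = floor(x)   % towards -Inf:                  [-3 -2 -2  1  1  2]
r = round(x)   % nearest, halves away from 0:   [-3 -2 -1  1  2  3]
```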
6.3.56 sin
sin - Sine of argument in radians
SYNTAX
R = sin(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
SIN(X) is the sine of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = sin(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
6.3.57 single
single - Converts a GPU variable into a Matlab single precision
variable
SYNTAX
R = single(X)
X - GPUsingle
R - Matlab variable
DESCRIPTION
R = SINGLE(A) copies the contents of the GPU variable A into a
single precision Matlab array.
EXAMPLE
A = GPUsingle(rand(100))
Ah = single(A);
6.3.58 sinh
sinh - Hyperbolic sine
SYNTAX
R = sinh(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
SINH(X) is the hyperbolic sine of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = sinh(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
6.3.59 size
size - Size of array
SYNTAX
R = size(X)
X - GPUsingle
DESCRIPTION
D = SIZE(X), for M-by-N matrix X, returns the two-element row
vector D = [M,N] containing the number of rows and columns in
the matrix.
EXAMPLE
X = GPUsingle(rand(10));
size(X)
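size, ndims, numel and length are related as stated in their descriptions. The following plain Matlab sketch checks these relations on a 2-by-3-by-4 single array; a GPUsingle of the same shape is assumed to report the same values.

```matlab
% Relations between size, ndims, numel and length on a CPU array.
X = ones(2, 3, 4, 'single');
s = size(X)          % [2 3 4]
n = ndims(X)         % 3, equal to length(size(X))
e = numel(X)         % 24, equal to prod(size(X))
L = length(X)        % 4, equal to max(size(X)) for a non-empty array
n2 = ndims(ones(2, 3, 1, 'single'))   % 2: trailing singleton dimensions are ignored
```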
6.3.60 sqrt
sqrt - Square root
SYNTAX
R = sqrt(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
SQRT(X) is the square root of the elements of X. NaN results are
produced for negative elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = sqrt(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
6.3.61 subsref
subsref - Subscripted reference
SYNTAX
R = X(I)
X - GPUsingle
I - GPUsingle
R - GPUsingle
DESCRIPTION
A(I) (subsref) is an array formed from the elements of A specified
by the subscript vector I. The resulting array is the same size as
I except for the special case where A and I are both vectors. In
this case, A(I) has the same number of elements as I but has the
orientation of A.
EXAMPLE
A = GPUsingle([1 2 3 4 5]);
idx = GPUsingle([1 2]);
B = A(idx)
6.3.62 sum
sum - Sum of elements
SYNTAX
R = sum(X)
R = sum(X, DIM)
X - GPUsingle
DIM - integer
R - GPUsingle
DESCRIPTION
S = SUM(X) is the sum of the elements of the vector X. S =
SUM(X,DIM) sums along the dimension DIM.
Note: currently sum(X,DIM) with DIM>1 is 3 to 4 times faster than
sum(X,DIM) with DIM=1.
EXAMPLE
X = GPUsingle(rand(5,5)+i*rand(5,5));
R = sum(X);
E = sum(X,2);
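To make the DIM argument concrete, this CPU sketch sums a small 2-by-3 single matrix along each dimension with Matlab's built-in sum.

```matlab
% sum along dimension 1 (columns) and dimension 2 (rows).
X = single([1 3 5; 2 4 6]);   % 2-by-3
colSums = sum(X)       % DIM = 1 for this matrix: 1-by-3 row [3 7 11]
rowSums = sum(X, 2)    % DIM = 2: 2-by-1 column [9; 12]
```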
6.3.63 tan
tan - Tangent of argument in radians
SYNTAX
R = tan(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
TAN(X) is the tangent of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = tan(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
6.3.64 tanh
tanh - Hyperbolic tangent
SYNTAX
R = tanh(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
TANH(X) is the hyperbolic tangent of the elements of X.
EXAMPLE
X = GPUsingle(rand(10));
R = tanh(X)
MATLAB COMPATIBILITY
Not implemented for complex X.
6.3.65 times
times - Array multiply
SYNTAX
R = X .* Y
R = times(X,Y)
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
X.*Y denotes element-by-element multiplication. X and Y must
have the same dimensions unless one is a scalar. A scalar can be
multiplied into anything.
EXAMPLE
A = GPUsingle(rand(10));
B = GPUsingle(rand(10));
R = A .* B
A = GPUsingle(rand(10)+i*rand(10));
B = GPUsingle(rand(10)+i*rand(10));
R = A .* B
6.3.66 transpose
transpose - Transpose
SYNTAX
R = X.'
R = transpose(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
X.' or transpose(X) is the non-conjugate transpose.
EXAMPLE
X = GPUsingle(rand(10));
R = X.'
R = transpose(X)
6.3.67 uminus
uminus - Unary minus
SYNTAX
R = -X
R = uminus(X)
X - GPUsingle
R - GPUsingle
DESCRIPTION
-A negates the elements of A.
EXAMPLE
X = GPUsingle(rand(10));
R = -X
R = uminus(X)
6.3.68 vertcat
vertcat - Vertical concatenation
SYNTAX
R = [X;Y]
X - GPUsingle
Y - GPUsingle
R - GPUsingle
DESCRIPTION
[A;B] is the vertical concatenation of matrices A and B. A and B
must have the same number of columns. Any number of matrices
can be concatenated within one pair of brackets.
EXAMPLE
A = [zeros(10,1,GPUsingle);colon(0,1,10,GPUsingle)'];
6.3.69 zeros
zeros - GPU single precision zeros array
SYNTAX
zeros(N,GPUsingle)
zeros(M,N,GPUsingle)
zeros([M,N],GPUsingle)
zeros(M,N,P,...,GPUsingle)
zeros([M N P ...],GPUsingle)
DESCRIPTION
zeros(N,GPUsingle) is an N-by-N GPU matrix of single precision
zeros.
zeros(M,N,GPUsingle) or zeros([M,N],GPUsingle) is an M-by-N
GPU matrix of single precision zeros.
zeros(M,N,P,...,GPUsingle) or zeros([M N P ...],GPUsingle)
is an M-by-N-by-P-by-... GPU array of single precision zeros.
EXAMPLE
A = zeros(10,GPUsingle)
B = zeros(10, 10,GPUsingle)
C = zeros([10 10],GPUsingle)
6.4.1 cublasAlloc
cublasAlloc - Wrapper to CUBLAS cublasAlloc function
SYNTAX
DESCRIPTION
Wrapper to CUBLAS cublasAlloc function.
Original function declaration:
cublasStatus
cublasAlloc (int n, int elemSize, void **devicePtr)
Mapping:
EXAMPLE
N = 10;
SIZEOF_FLOAT = sizeoffloat();
% GPU variable d_A
d_A = 0;
[status d_A] = cublasAlloc(N,SIZEOF_FLOAT,d_A);
ret = cublasCheckStatus( status, ...
'!!!! device memory allocation error (d_A)');
6.4.2 cublasCgemm
cublasCgemm - Wrapper to CUBLAS cublasCgemm function
DESCRIPTION
Wrapper to CUBLAS cublasCgemm function. Original function
declaration:
void cublasCgemm
(char transa, char transb, int m, int n, int k,
cuComplex alpha, const cuComplex *A, int lda,
const cuComplex *B, int ldb, cuComplex beta,
cuComplex *C, int ldc)
EXAMPLE
N = 10;
I = sqrt(-1);
A = GPUsingle(rand(N,N) + I*rand(N,N));
B = GPUsingle(rand(N,N) + I*rand(N,N));
% C needs to be complex as well
C = zeros(N,N,GPUsingle)*I;
% alpha is complex
alpha = 2.0+I*3.0;
beta = 0.0;
opA = 'n';
opB = 'n';
status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');
6.4.3 cublasCheckStatus
cublasCheckStatus - Check the CUBLAS status.
DESCRIPTION
cublasCheckStatus(STATUS,MSG) returns EXIT_SUCCESS (0) or
EXIT_FAILURE (1) depending on the STATUS value and, on failure,
throws an error with message MSG.
EXAMPLE
status = cublasGetError();
cublasCheckStatus( status, 'Kernel execution error');
6.4.4 cublasError
cublasError - Returns a structure with CUBLAS result codes
DESCRIPTION
Returns a structure with CUBLAS result codes.
EXAMPLE
cublasError
ans =
CUBLAS_STATUS_SUCCESS: 0
CUBLAS_STATUS_NOT_INITIALIZED: 1
CUBLAS_STATUS_ALLOC_FAILED: 3
CUBLAS_STATUS_INVALID_VALUE: 7
CUBLAS_STATUS_ARCH_MISMATCH: 8
CUBLAS_STATUS_MAPPING_ERROR: 11
CUBLAS_STATUS_EXECUTION_FAILED: 13
CUBLAS_STATUS_INTERNAL_ERROR: 14
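A numeric status returned by a CUBLAS wrapper can be translated into one of the names above. The sketch below writes the codes out as a struct so that it runs without a GPU; in GPUmat the same struct would come from cublasError. The variable names are illustrative only.

```matlab
% Look up the name of a CUBLAS status code (codes copied from the list above).
codes = struct('CUBLAS_STATUS_SUCCESS', 0, ...
               'CUBLAS_STATUS_NOT_INITIALIZED', 1, ...
               'CUBLAS_STATUS_ALLOC_FAILED', 3, ...
               'CUBLAS_STATUS_INVALID_VALUE', 7, ...
               'CUBLAS_STATUS_ARCH_MISMATCH', 8, ...
               'CUBLAS_STATUS_MAPPING_ERROR', 11, ...
               'CUBLAS_STATUS_EXECUTION_FAILED', 13, ...
               'CUBLAS_STATUS_INTERNAL_ERROR', 14);
status = 3;                                  % example status value
names  = fieldnames(codes);                  % cell array of status names
values = cellfun(@(f) codes.(f), names);     % numeric code of each name
match  = names(values == status);
if isempty(match)
  name = 'unknown status';
else
  name = match{1};
end
disp(name)
```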
6.4.5 cublasFree
cublasFree - Wrapper to CUBLAS cublasFree function
DESCRIPTION
Wrapper to CUBLAS cublasFree function.
Original function declaration:
cublasStatus
cublasFree (const void *devicePtr)
Mapping:
status = cublasFree(d_A)
d_A -> const void *devicePtr
EXAMPLE
N = 10;
SIZEOF_FLOAT = sizeoffloat();
% Clean up memory
status = cublasFree(d_A);
ret = cublasCheckStatus( status, ...
'!!!! memory free error (d_A)');
6.4.6 cublasGetError
cublasGetError - Wrapper to CUBLAS cublasGetError function
DESCRIPTION
Wrapper to CUBLAS cublasGetError function. Original function
declaration:
cublasStatus
cublasGetError (void)
EXAMPLE
status = cublasGetError();
cublasCheckStatus( status, 'Kernel execution error');
6.4.7 cublasGetVector
cublasGetVector - Wrapper to CUBLAS cublasGetVector function
DESCRIPTION
Wrapper to CUBLAS cublasGetVector function. Original function
declaration:
cublasStatus
cublasGetVector
(int n, int elemSize, const void *x, int incx,
void *y, int incy)
EXAMPLE
A = GPUsingle([1 2 3 4]);
6.4.8 cublasInit
cublasInit - Wrapper to CUBLAS cublasInit function
DESCRIPTION
Wrapper to CUBLAS cublasInit function. Original function
declaration:
cublasStatus
cublasInit (void)
EXAMPLE
status = cublasInit;
cublasCheckStatus(status, 'Error.');
6.4.9 cublasIsamax
cublasIsamax - Wrapper to CUBLAS cublasIsamax function
DESCRIPTION
Wrapper to CUBLAS cublasIsamax function. Original function
declaration:
int
cublasIsamax (int n, const float *x, int incx)
Mapping:
EXAMPLE
6.4.10 cublasIsamin
cublasIsamin - Wrapper to CUBLAS cublasIsamin function
DESCRIPTION
Wrapper to CUBLAS cublasIsamin function. Original function
declaration:
int
cublasIsamin (int n, const float *x, int incx)
EXAMPLE
N = 10;
A = GPUsingle(rand(1,N));
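cublasIsamax and cublasIsamin return a 1-based index into the n elements selected by incx, and ties resolve to the smallest index. The following CPU reference in plain Matlab illustrates those semantics for a real vector and positive incx; it does not call CUBLAS.

```matlab
% Index of the largest / smallest magnitude among x(1), x(1+incx), ...
x    = single([0.5 -3 2 3 -0.1 1]);
n    = 3;  incx = 2;                 % selects x(1), x(3), x(5)
xs   = x(1 + (0:n-1) * incx);        % [0.5 2 -0.1]
[~, iMax] = max(abs(xs))             % 2: |2| is the largest magnitude
[~, iMin] = min(abs(xs))             % 3: |-0.1| is the smallest magnitude
[~, iTie] = max(abs(x))              % 2: |-3| and |3| tie, first index wins
```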
6.4.11 cublasResult
cublasResult - Returns a structure with CUBLAS error results
DESCRIPTION
Returns a structure with CUBLAS error results.
6.4.12 cublasSasum
cublasSasum - Wrapper to CUBLAS cublasSasum function
DESCRIPTION
Wrapper to CUBLAS cublasSasum function.
Original function declaration:
float
cublasSasum (int n, const float *x, int incx)
EXAMPLE
N = 10;
A = GPUsingle(rand(1,N));
Sasum_mat = sum(single(A));
compareArrays(Sasum, Sasum_mat, 1e-6);
6.4.13 cublasSaxpy
cublasSaxpy - Wrapper to CUBLAS cublasSaxpy function
DESCRIPTION
Wrapper to CUBLAS cublasSaxpy function. Original function
declaration:
void
cublasSaxpy
(int n, float alpha, const float *x, int incx, float *y,
int incy)
EXAMPLE
N = 10;
A = GPUsingle(rand(1,N));
B = GPUsingle(rand(1,N));
alpha = 2.0;
Saxpy_mat = alpha * single(A) + single(B);
status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');
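cublasSaxpy computes y = alpha*x + y over n strided elements. This plain Matlab sketch reproduces the operation on the CPU, assuming positive increments, and can serve as a reference value when checking the GPU result.

```matlab
% SAXPY on strided vectors: y(iy) = alpha*x(ix) + y(iy).
n = 3; alpha = single(2);
x = single([1 2 3 4 5 6]); incx = 2;   % uses x(1), x(3), x(5) = [1 3 5]
y = single([10 20 30]);    incy = 1;   % uses y(1), y(2), y(3)
ix = 1 + (0:n-1) * incx;
iy = 1 + (0:n-1) * incy;
y(iy) = alpha * x(ix) + y(iy)          % [12 26 40]
```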
6.4.14 cublasScopy
cublasScopy - Wrapper to CUBLAS cublasScopy function
DESCRIPTION
Wrapper to CUBLAS cublasScopy function. Original function
declaration:
void
cublasScopy
(int n, const float *x, int incx, float *y, int incy)
EXAMPLE
N = 10;
A = GPUsingle(rand(1,N));
B = GPUsingle(rand(1,N));
6.4.15 cublasSdot
cublasSdot - Wrapper to CUBLAS cublasSdot function
DESCRIPTION
Wrapper to CUBLAS cublasSdot function. Original function
declaration:
float
cublasSdot
(int n, const float *x, int incx, const float *y, int incy)
EXAMPLE
N = 10;
A = GPUsingle(rand(1,N));
B = GPUsingle(rand(1,N));
Sdot_mat = sum(single(A).*single(B));
Sdot = cublasSdot(N, getPtr(A), 1, getPtr(B), 1);
status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');
6.4.16 cublasSetVector
cublasSetVector - Wrapper to CUBLAS cublasSetVector function
DESCRIPTION
Wrapper to CUBLAS cublasSetVector function. Original function
declaration:
cublasStatus
cublasSetVector
(int n, int elemSize, const void *x, int incx,
void *y, int incy)
EXAMPLE
B = single([1 2 3 4]);
disp(single(A));
6.4.17 cublasSgemm
cublasSgemm - Wrapper to CUBLAS cublasSgemm function
DESCRIPTION
Wrapper to CUBLAS cublasSgemm function. Original function
declaration:
void
cublasSgemm
(char transa, char transb, int m, int n, int k,
float alpha, const float *A, int lda,
const float *B, int ldb, float beta,
float *C, int ldc)
EXAMPLE
N = 10;
A = GPUsingle(rand(N,N));
B = GPUsingle(rand(N,N));
C = zeros(N,N,GPUsingle);
alpha = 2.0;
beta = 0.0;
opA = 'n';
opB = 'n';
status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');
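cublasSgemm (and cublasCgemm for complex data) computes C = alpha*op(A)*op(B) + beta*C, where op is selected by the 'n', 't' or 'c' characters. The CPU reference below expresses this in plain Matlab; the struct of function handles is just one illustrative way to map the characters to operations.

```matlab
% GEMM reference: C = alpha*op(A)*op(B) + beta*C.
% 'n' = no transpose, 't' = transpose, 'c' = conjugate transpose.
opFun = struct('n', @(M) M, 't', @(M) M.', 'c', @(M) M');
A = single([1 2; 3 4]);
B = single([5 6; 7 8]);
C = zeros(2, 2, 'single');
alpha = 2; beta = 0;
opA = 'n'; opB = 't';
fa = opFun.(opA);                      % handle for op(A)
fb = opFun.(opB);                      % handle for op(B)
C = alpha * fa(A) * fb(B) + beta * C   % 2*A*B.' = [34 46; 78 106]
```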
6.4.18 cublasShutdown
cublasShutdown - Wrapper to CUBLAS cublasShutdown function
DESCRIPTION
Wrapper to CUBLAS cublasShutdown function. Original function
declaration:
cublasStatus
cublasShutdown (void)
6.4.19 cublasSnrm2
cublasSnrm2 - Wrapper to CUBLAS cublasSnrm2 function
DESCRIPTION
Wrapper to CUBLAS cublasSnrm2 function. Original function
declaration:
float
cublasSnrm2 (int n, const float *x, int incx)
EXAMPLE
N = 10;
A = GPUsingle(rand(1,N));
Snrm2_mat = sqrt(sum(single(A).*single(A)));
Snrm2 = cublasSnrm2(N, getPtr(A),1);
status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');
6.4.20 cublasSrot
cublasSrot - Wrapper to CUBLAS cublasSrot function
DESCRIPTION
Wrapper to CUBLAS cublasSrot function.
Original function declaration:
void
cublasSrot (int n, float *x, int incx,
float *y, int incy, float sc,
float ss)
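The manual gives no calling example for cublasSrot, so the sketch below shows only the plane (Givens) rotation it applies. Here sc and ss are the cosine and sine of the rotation angle, and both vectors are updated from their original values. The sketch is a plain Matlab CPU reference with unit strides. It does not show the GPUmat calling sequence, and the input values are illustrative.

```matlab
% CPU reference for the plane rotation applied by cublasSrot (unit strides):
%   x(i) <-  sc*x(i) + ss*y(i)
%   y(i) <- -ss*x(i) + sc*y(i)
% Both updates use the ORIGINAL x and y values.
x = single([1 2]);
y = single([3 4]);
sc = 0.6;                     % cos(theta)
ss = 0.8;                     % sin(theta), so sc^2 + ss^2 = 1
xr = sc*x + ss*y;             % approximately [3.0 4.4]
yr = -ss*x + sc*y;            % approximately [1.0 0.8]
disp([xr; yr])
```

Because sc^2 + ss^2 = 1, each pair (x(i), y(i)) keeps its Euclidean length.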
6.4.21 cublasSscal
cublasSscal - Wrapper to CUBLAS cublasSscal function
DESCRIPTION
Wrapper to CUBLAS cublasSscal function.
Original function declaration:
void
cublasSscal (int n, float alpha, float *x, int incx)
EXAMPLE
N = 10;
A = GPUsingle(rand(1,N));
alpha = 1/10.0;
A_mat = single(A)*alpha;
cublasSscal(N, alpha, getPtr(A), 1);
status = cublasGetError();
ret = cublasCheckStatus( status, ...
'!!!! kernel execution error.');
6.4.22 cuCheckStatus
cuCheckStatus - Check the CUDA DRV status.
DESCRIPTION
cuCheckStatus(STATUS,MSG) returns EXIT_FAILURE (1) or
EXIT_SUCCESS (0) depending on the STATUS value. If STATUS
indicates a failure, it also throws an error with message MSG.
EXAMPLE
[status]=cuInit();
cuCheckStatus( status, 'Error initializing CUDA driver.');
6.4.23 cudaCheckStatus
cudaCheckStatus - Check the CUDA run-time status
DESCRIPTION
RET = cudaCheckStatus(STATUS,MSG) returns EXIT_FAILURE (1)
or EXIT_SUCCESS (0) depending on the STATUS value. If STATUS
indicates a failure, it also throws an error with message MSG.
EXAMPLE
status = cudaGetLastError();
cudaCheckStatus( status, 'Kernel execution error.');
6.4.24 cudaGetDeviceCount
cudaGetDeviceCount - Wrapper to CUDA cudaGetDeviceCount
function.
DESCRIPTION
Wrapper to CUDA cudaGetDeviceCount function.
EXAMPLE
count = 0;
[status,count] = cudaGetDeviceCount(count);
if (status ~=0)
error('Unable to get the number of devices');
end
6.4.25 cudaGetDeviceMajorMinor
cudaGetDeviceMajorMinor - Returns CUDA compute capability
major and minor numbers.
DESCRIPTION
Returns CUDA compute capability major and minor numbers.
[STATUS, MAJOR, MINOR] = cudaGetDeviceMajorMinor(DEV)
returns the compute capability (MAJOR, MINOR) of device DEV.
STATUS is the result of the operation.
EXAMPLE
dev = 0;
[status,major,minor] = cudaGetDeviceMajorMinor(dev);
if (status ~=0)
error('Unable to get the compute capability');
end
major
minor
6.4.26 cudaGetDeviceMemory
cudaGetDeviceMemory - Returns device total memory
DESCRIPTION
[STATUS, TOTMEM] = cudaGetDeviceMemory(DEV) returns the total
memory of device DEV, in bytes. STATUS is the result of the
operation.
EXAMPLE
dev = 0;
[status,totmem] = cudaGetDeviceMemory(dev);
if (status ~=0)
error('Error getting total memory');
end
totmem = totmem/1024/1024;
disp(['Total memory = ' num2str(totmem) ' MB']);
6.4.27 cudaGetDeviceMultProcCount
cudaGetDeviceMultProcCount - Returns device multi-processors
count
DESCRIPTION
[STATUS, COUNT] = cudaGetDeviceMultProcCount(DEV) returns
the number of multiprocessors of device DEV. STATUS is the result
of the operation.
EXAMPLE
dev = 0;
[status,count] = cudaGetDeviceMultProcCount(dev);
if (status ~=0)
error('Error getting number of multiprocessors');
end
disp(['Multiprocessors = ' num2str(count) ]);
6.4.28 cudaGetLastError
cudaGetLastError - Wrapper to CUDA cudaGetLastError function
DESCRIPTION
[STATUS] = cudaGetLastError() returns the last error produced by
a CUDA run-time call and resets it to cudaSuccess. STATUS is the
result of the operation.
Original function declaration:
cudaError_t
cudaGetLastError(void)
6.4.29 cudaSetDevice
cudaSetDevice - Wrapper to CUDA cudaSetDevice function
DESCRIPTION
[STATUS] = cudaSetDevice(DEV) sets the device to DEV and returns
the result of the operation in STATUS.
Original function declaration:
cudaError_t
cudaSetDevice( int dev )
6.4.30 cudaThreadSynchronize
cudaThreadSynchronize - Wrapper to CUDA cudaThreadSynchronize
function.
DESCRIPTION
[STATUS] = cudaThreadSynchronize() blocks until the device has
completed all preceding requested tasks. STATUS is the result of
the operation.
Original function declaration:
cudaError_t cudaThreadSynchronize(void)
6.4.31 cufftCheckStatus
cufftCheckStatus - Checks the CUFFT status
DESCRIPTION
cufftCheckStatus(STATUS,MSG) returns EXIT_FAILURE (1) or
EXIT_SUCCESS (0) depending on the STATUS value. If STATUS
indicates a failure, it also throws an error with message MSG.
STATUS is compared with the possible CUFFT result codes.
EXAMPLE
fftType = cufftType;
A = GPUsingle(rand(1,128));
plan = 0;
type = fftType.CUFFT_C2C;
[status, plan] = cufftPlan1d(plan, numel(A), type, 1);
cufftCheckStatus(status, 'Error in cufftPlan1D');
6.4.32 cufftDestroy
cufftDestroy - Wrapper to CUFFT cufftDestroy function
DESCRIPTION
Wrapper to CUFFT cufftDestroy function. Original function declaration:
cufftResult
cufftDestroy(cufftHandle plan);
EXAMPLE
fftType = cufftType;
I = sqrt(-1);
A = GPUsingle(rand(1,128)+I*rand(1,128));
plan = 0;
type = fftType.CUFFT_C2C;
[status, plan] = cufftPlan1d(plan, numel(A), type, 1);
cufftCheckStatus(status, 'Error in cufftPlan1D');
[status] = cufftDestroy(plan);
cufftCheckStatus(status, 'Error in cufftDestroy');
6.4.33 cufftExecC2C
cufftExecC2C - Wrapper to CUFFT cufftExecC2C function
DESCRIPTION
Wrapper to CUFFT cufftExecC2C function. Original function declaration:
cufftResult
cufftExecC2C(cufftHandle plan,
cufftComplex *idata,
cufftComplex *odata,
int direction);
EXAMPLE
fftType = cufftType;
fftDir = cufftTransformDirections;
I = sqrt(-1);
A = GPUsingle(rand(1,128)+I*rand(1,128));
plan = 0;
type = fftType.CUFFT_C2C;
[status, plan] = cufftPlan1d(plan, numel(A), type, 1);
cufftCheckStatus(status, 'Error in cufftPlan1D');
dir = fftDir.CUFFT_FORWARD;
[status] = cufftExecC2C(plan, getPtr(A),getPtr(A), dir);
cufftCheckStatus(status, 'Error in cufftExecC2C');
[status] = cufftDestroy(plan);
cufftCheckStatus(status, 'Error in cufftDestroy');
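The example transforms A in place (input and output pointers are the same). CUFFT transforms are unnormalized, so a forward transform followed by an inverse one returns the input multiplied by the number of elements N. The sketch below models this round trip on the CPU with Matlab's fft, which uses the same negative exponent sign as CUFFT_FORWARD (-1). It makes no GPUmat calls. The data and N are illustrative.

```matlab
% CPU model of a CUFFT C2C forward/inverse round trip.
N = 8;
x = single(rand(1,N) + 1i*rand(1,N));
X = fft(x);           % forward transform, same sign convention as CUFFT_FORWARD
x_back = N*ifft(X);   % CUFFT_INVERSE does not divide by N, so it yields N*x
x_rec = x_back/N;     % explicit scaling recovers the input
disp(max(abs(x_rec - x)))
```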
6.4.34 cufftExecC2R
cufftExecC2R - Wrapper to CUFFT cufftExecC2R function
DESCRIPTION
Wrapper to CUFFT cufftExecC2R function. Original function declaration:
cufftResult
cufftExecC2R(cufftHandle plan,
cufftComplex *idata,
cufftReal *odata);
6.4.35 cufftExecR2C
cufftExecR2C - Wrapper to CUFFT cufftExecR2C function
DESCRIPTION
Wrapper to CUFFT cufftExecR2C function. Original function declaration:
cufftResult
cufftExecR2C(cufftHandle plan,
cufftReal *idata,
cufftComplex *odata);
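The manual gives no example for the real transforms. The point to keep in mind is that for N real inputs an R2C transform stores only the N/2+1 non-redundant complex outputs. A C2R transform takes those N/2+1 values back to N real outputs, again without dividing by N. The sketch below models both directions on the CPU with Matlab's fft and ifft. It assumes an even N and makes no GPUmat calls.

```matlab
% CPU model of CUFFT real-to-complex (R2C) and complex-to-real (C2R)
% transforms for an even length N.
N = 8;
x = single(rand(1,N));
X = fft(x);                             % full spectrum, N complex values
Xhalf = X(1:N/2+1);                     % the N/2+1 values an R2C transform keeps
% The missing bins follow from Hermitian symmetry: X(k) = conj(X(N+2-k)).
Xfull = [Xhalf, conj(Xhalf(N/2:-1:2))];
y = N*real(ifft(Xfull));                % unnormalized C2R result: N*x
disp(max(abs(y - N*x)))
```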
6.4.36 cufftPlan1d
cufftPlan1d - Wrapper to CUFFT cufftPlan1d function
DESCRIPTION
Wrapper to CUFFT cufftPlan1d function. Original function declaration:
cufftResult
cufftPlan1d(cufftHandle *plan,
int nx,
cufftType type,
int batch);
EXAMPLE
fftType = cufftType;
I = sqrt(-1);
A = GPUsingle(rand(1,128)+I*rand(1,128));
plan = 0;
type = fftType.CUFFT_C2C;
[status, plan] = cufftPlan1d(plan, numel(A), type, 1);
cufftCheckStatus(status, 'Error in cufftPlan1D');
[status] = cufftDestroy(plan);
cufftCheckStatus(status, 'Error in cufftDestroy');
6.4.37 cufftPlan2d
cufftPlan2d - Wrapper to CUFFT cufftPlan2d function
DESCRIPTION
Wrapper to CUFFT cufftPlan2d function. Original function declaration:
cufftResult
cufftPlan2d(cufftHandle *plan,
int nx, int ny,
cufftType type);
EXAMPLE
fftType = cufftType;
I = sqrt(-1);
A = GPUsingle(rand(128,128)+I*rand(128,128));
plan = 0;
% Matlab stores arrays in column-major (Fortran) order, so the
% dimensions are passed in reverse order: nx = s(2), ny = s(1)
s = size(A);
type = fftType.CUFFT_C2C;
[status, plan] = cufftPlan2d(plan, s(2), s(1),type);
cufftCheckStatus(status, 'Error in cufftPlan2D');
[status] = cufftDestroy(plan);
cufftCheckStatus(status, 'Error in cufftDestroy');
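The reversed dimensions in the example come from memory layout. CUFFT follows the C row-major convention, in which the last dimension (ny) is contiguous. In Matlab the first dimension (rows) is contiguous. The CPU-only sketch below shows where Matlab places element (r,c) in memory. It uses an illustrative 2-by-3 matrix and no GPUmat calls. By the same reasoning, a three-dimensional array would presumably be passed to cufftPlan3d as s(3), s(2), s(1), although the manual shows no such example.

```matlab
% Column-major layout of a Matlab matrix (CPU only).
A = reshape(single(1:6), 2, 3);   % A = [1 3 5; 2 4 6]
s = size(A);                      % s = [2 3]: s(1) rows, s(2) columns
mem = A(:).';                     % memory order: 1 2 3 4 5 6, column by column
r = 2; c = 3;
offset = (r-1) + (c-1)*s(1);      % zero-based offset of A(r,c); rows vary fastest
disp(mem(offset+1) == A(r,c))
```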
6.4.38 cufftPlan3d
cufftPlan3d - Wrapper to CUFFT cufftPlan3d function
DESCRIPTION
Wrapper to CUFFT cufftPlan3d function. Original function declaration:
cufftResult
cufftPlan3d(cufftHandle *plan,
int nx, int ny, int nz,
cufftType type);
6.4.39 cufftResult
cufftResult - Returns a structure with CUFFT result codes
DESCRIPTION
Returns a structure with CUFFT result codes
EXAMPLE
cufftResult
ans =
CUFFT_SUCCESS: 0
CUFFT_INVALID_PLAN: 1
CUFFT_ALLOC_FAILED: 2
CUFFT_INVALID_TYPE: 3
CUFFT_INVALID_VALUE: 4
...
6.4.40 cufftTransformDirections
cufftTransformDirections - Returns a structure with CUFFT
transform direction codes
DESCRIPTION
Returns a structure with CUFFT transform direction codes.
EXAMPLE
cufftTransformDirections
ans =
CUFFT_FORWARD: -1
CUFFT_INVERSE: 1
6.4.41 cufftType
cufftType - Returns a structure with CUFFT transform type codes
DESCRIPTION
Returns a structure with CUFFT transform type codes.
EXAMPLE
cufftType
ans =
CUFFT_R2C: 42
CUFFT_C2R: 44
CUFFT_C2C: 41
6.4.42 cuInit
cuInit - Wrapper to CUDA driver function cuInit
DESCRIPTION
Wrapper to CUDA driver function cuInit.
6.4.43 cuMemGetInfo
cuMemGetInfo - Wrapper to CUDA driver function cuMemGetInfo
DESCRIPTION
Wrapper to CUDA driver function cuMemGetInfo.
EXAMPLE
freemem = 0;
totmem = 0;
[status, freemem, totmem] = cuMemGetInfo(freemem, totmem);
6.4.44 getPtr
getPtr - Get GPUsingle pointer on GPU memory
SYNTAX
R = getPtr(X)
X - GPU variable
R - the pointer to the GPU memory region
DESCRIPTION
This is a low-level function that returns the pointer to the GPU
memory of a GPUsingle object.
EXAMPLE
A = GPUsingle(rand(10));
getPtr(A)
6.4.45 getSizeOf
getSizeOf - Get the size of the GPU datatype (similar to sizeof in
C)
SYNTAX
R = getSizeOf(X)
X - GPU variable
R - the size of the GPU variable datatype
DESCRIPTION
This is a low-level function that returns the size of the datatype
of the GPU variable.
EXAMPLE
A = GPUsingle(rand(10));
getSizeOf(A)
6.4.46 getType
getType - Get the type of the GPU variable
SYNTAX
R = getType(X)
X - GPU variable
R - the type of the GPU variable
DESCRIPTION
This is a low-level function that returns the type of the GPU
variable (FLOAT = 0, COMPLEX FLOAT = 1, DOUBLE = 2, COMPLEX
DOUBLE = 3).
EXAMPLE
A = GPUsingle(rand(10));
getType(A)
6.4.47 GPUallocVector
GPUallocVector - Variable allocation on GPU memory
SYNTAX
GPUallocVector(P)
P - GPUsingle
DESCRIPTION
GPUallocVector(P) allocates the GPU memory required by P. The
size of the allocation depends on the size of P. A complex variable
is allocated as an interleaved sequence of real and imaginary
values, so the memory size of a complex variable on the GPU is
numel(P)*2*SIZE_OF_FLOAT. The size of the variable must be set
(with setSize) before calling GPUallocVector.
EXAMPLE
A = GPUsingle();
setSize(A,[100 100]);
GPUallocVector(A);
A = GPUsingle();
setSize(A,[100 100]);
setComplex(A);
GPUallocVector(A);
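As a worked instance of the size rule, the sketch below computes the bytes allocated for the two 100-by-100 variables in the example. It assumes SIZE_OF_FLOAT is 4 bytes (IEEE single precision) and is plain Matlab arithmetic, not a GPUmat call.

```matlab
% Bytes allocated on the GPU for a real and a complex 100x100 GPUsingle,
% assuming SIZE_OF_FLOAT = 4 bytes.
sz = [100 100];
SIZE_OF_FLOAT = 4;
realBytes = prod(sz)*SIZE_OF_FLOAT;        % 10000 elements * 4 = 40000 bytes
complexBytes = prod(sz)*2*SIZE_OF_FLOAT;   % real and imaginary parts: 80000 bytes
fprintf('real: %d bytes, complex: %d bytes\n', realBytes, complexBytes);
```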
6.4.48 GPUdeviceInit
GPUdeviceInit - Initializes a CUDA capable GPU device
SYNTAX
GPUdeviceInit(dev)
dev - device number
DESCRIPTION
GPUdeviceInit(dev) initializes the GPU device dev, where dev is
an integer corresponding to the device number. Use GPUinfo to
list the available devices and their numbers.
EXAMPLE
GPUinfo
GPUdeviceInit(0)
6.4.49 istrans
istrans - True if GPUsingle TRANS flag is set to 1
SYNTAX
R = istrans(X)
X - GPUsingle
R - logical (0 or 1)
DESCRIPTION
ISTRANS(X) returns 1 if the TRANS flag of X is set, and 0 otherwise.
EXAMPLE
A = GPUsingle(rand(5));
istrans(A)
B = transpose(A);
istrans(B)
6.4.50 packfC2C
packfC2C - Pack two arrays into an interleaved complex array
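SYNTAX
PACKFC2C(RE_IDATA, IM_IDATA, ODATA)
RE_IDATA - GPUsingle, real part
IM_IDATA - GPUsingle, imaginary part
ODATA - GPUsingle, complex
DESCRIPTION
PACKFC2C(RE_IDATA, IM_IDATA, ODATA) packs the values of
RE_IDATA and IM_IDATA into ODATA as shown in the example. The
type of elements of ODATA is complex.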
EXAMPLE
re = GPUsingle(rand(1,100));
im = GPUsingle(rand(1,100));
r = GPUsingle();
setComplex(r);
setSize(r,size(re));
GPUallocVector(r);
packfC2C(re,im,r);
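GPUallocVector describes complex GPU variables as an interleaved sequence of real and imaginary values. The CPU-only sketch below shows that layout for small illustrative arrays, so it is clear what ODATA holds after packing. It models the memory order with plain Matlab and does not call packfC2C.

```matlab
% CPU model of the interleaved complex layout produced by packing.
re = single([1 2 3]);
im = single([10 20 30]);
packed = reshape([re(:).'; im(:).'], 1, []);  % 1 10 2 20 3 30
z = complex(re, im);                          % same values as a Matlab complex array
disp(packed)
```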
6.4.51 packfR2C
packfR2C - Transforms a real array into a complex array with zero
imaginary parts.
SYNTAX
PACKFR2C(RE_IDATA, ODATA)
RE_IDATA - GPUsingle, real part
ODATA - GPUsingle, complex
DESCRIPTION
PACKFR2C(RE_IDATA, ODATA) transforms RE_IDATA into the complex
array ODATA. The type of elements of ODATA is complex. The
imaginary part of ODATA is set to zero.
EXAMPLE
re = GPUsingle(rand(1,100));
r = GPUsingle();
setComplex(r);
setSize(r,size(re));
GPUallocVector(r);
packfR2C(re,r);
6.4.52 setComplex
setComplex - Set a GPUsingle as complex
SYNTAX
setComplex(A)
A - GPUsingle
DESCRIPTION
setComplex(A) sets the GPUsingle A as complex. It should be called
before GPUallocVector.
EXAMPLE
A = GPUsingle();
setSize(A,[10 10]);
setComplex(A);
GPUallocVector(A);
6.4.53 setReal
setReal - Set a GPUsingle as real
SYNTAX
setReal(A)
A - GPUsingle
DESCRIPTION
setReal(A) sets the GPUsingle A as real. It should be called before
GPUallocVector.
EXAMPLE
A = GPUsingle();
setSize(A,[10 10]);
setReal(A);
GPUallocVector(A);
6.4.54 setSize
setSize - Set GPUsingle size
SYNTAX
setSize(A,SIZE)
A - GPUsingle
SIZE - size vector, for example [10 10]
DESCRIPTION
setSize(A, SIZE) sets the size of A to SIZE.
EXAMPLE
A = GPUsingle();
setSize(A,[10 10]);
6.4.55 unpackfC2C
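unpackfC2C - Unpacks a complex array into two single-precision
arrays
SYNTAX
UNPACKFC2C(IDATA, RE_ODATA, IM_ODATA)
IDATA - GPUsingle, complex
RE_ODATA - GPUsingle, real part
IM_ODATA - GPUsingle, imaginary part
DESCRIPTION
UNPACKFC2C(IDATA, RE_ODATA, IM_ODATA) unpacks the values of
IDATA into the two arrays RE_ODATA and IM_ODATA as shown in the
example. The type of elements of IDATA is complex.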
EXAMPLE
r = GPUsingle(rand(1,100)+i*rand(1,100));
re = GPUsingle();
setReal(re);
setSize(re,size(r));
GPUallocVector(re);
im = GPUsingle();
setReal(im);
setSize(im,size(r));
GPUallocVector(im);
unpackfC2C(r,re,im);
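Unpacking is the inverse of the interleaved layout. Real parts occupy the odd positions (1, 3, 5, ...) and imaginary parts the even ones. The sketch below is a plain Matlab CPU model of that split on an illustrative array and does not call unpackfC2C.

```matlab
% CPU model of splitting an interleaved complex array into two real arrays.
packed = single([1 10 2 20 3 30]);   % re1 im1 re2 im2 re3 im3
re = packed(1:2:end);                % odd positions: 1 2 3
im = packed(2:2:end);                % even positions: 10 20 30
disp([re; im])
```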
6.4.56 unpackfC2R
unpackfC2R - Transforms a complex array into a real array, discarding
the imaginary part
SYNTAX
UNPACKFC2R(IDATA, RE_ODATA)
DESCRIPTION
UNPACKFC2R(IDATA, RE_ODATA) transforms the complex array
IDATA into the real array RE_ODATA, discarding the imaginary part.
The type of elements of IDATA is complex.
EXAMPLE
r = GPUsingle(rand(1,100)+i*rand(1,100));
re = GPUsingle();
setReal(re);
setSize(re,size(r));
GPUallocVector(re);
unpackfC2R(r,re);