
Linear ODE Solver using GPU Computation

B.E. Project Report


Submitted in partial fulfillment of the requirements

For the degree of

Bachelor of Engineering

(Electronics Engineering)
by

Ameya Choukhande (13EE1153)

Animesh Bhattacharya (13EE1127)

Manoj Gurav (13EE1102)

Rohan Chavan (13EE1057)

Supervisor
Dr. Vishwesh Vyawahare

Department of Electronics Engineering


Ramrao Adik Institute of Technology,
Sector 7, Nerul , Navi Mumbai
(Affiliated to University of Mumbai)
April 2017
Ramrao Adik Education Society's
Ramrao Adik Institute of Technology
(Affiliated to the University of Mumbai)
Dr. D. Y. Patil Vidyanagar, Sector 7, Nerul, Navi Mumbai 400 706.

Certificate
This is to certify that the project report titled

Linear ODE Solver using GPU Computation


is a bonafide work done by

Ameya Choukhande (13EE1153)


Animesh Bhattacharya (13EE1127)
Manoj Gurav (13EE1102)
Rohan Chavan (13EE1057)
and is submitted in partial fulfillment of the requirements for the
degree of

Bachelor of Engineering
(Electronics Engineering)
to the
University of Mumbai.

Examiner 1 Examiner 2 Supervisor

Project Coordinator Head of Department Principal

Declaration

I declare that this written submission represents my ideas in my own words and
where others' ideas or words have been included, I have adequately cited and referenced
the original sources. I also declare that I have adhered to all principles of academic
honesty and integrity and have not misrepresented or fabricated or falsified any
idea/data/fact/source in my submission. I understand that any violation of the above will be
cause for disciplinary action by the Institute and can also evoke penal action from the
sources which have thus not been properly cited or from whom proper permission has not
been taken when needed.

(Signature)

(Name of student and Roll No.)

Date:
Abstract

In this project we are going to use the MATLAB software to develop programs and
algorithms to solve some mathematical problems using methods of numerical analysis, and
subsequently implement them on a Graphics Processing Unit (GPU) system to achieve faster
computation and higher throughput in the least amount of time.
Numerical analysis is an area of mathematics and computer science that creates, analyzes,
and implements algorithms for obtaining numerical solutions to problems involving
continuous variables. The formal academic area of numerical analysis ranges from quite
theoretical mathematical studies to computer science issues. MATLAB is one of the most
widely used mathematical computing environments in technical computing. It has an
interactive environment which provides high performance computing (HPC) procedures
and is easy to use.
After developing our programs in MATLAB we transfer them to a GPU system to utilize
its parallel computing feature. A GPU has a number of threads, and each thread can
execute a different program. This helps us achieve significantly faster computation than
on a normal CPU system. A GPU is a highly parallel computing device. It is designed to
accelerate the analysis of large datasets such as image, video and voice processing,
or to increase performance in graphics rendering and computer games. The GPU has
gained significant popularity as a powerful tool for high performance computing (HPC)
because of its low cost, flexibility and accessibility.

Keywords:
MATLAB, GPU, Numerical methods, GPGPU, Ordinary differential equations, Initial value
problems.

Contents

Abstract iii

List of Figures vi

1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Organisation of the report . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Literature Survey 3

3 Overview 5
3.1 Historical Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.2 Numerical Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.3 Why Study Numerical Methods? . . . . . . . . . . . . . . . . . . . . . . . 5
3.4 Numerical Methods used . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.4.1 Euler's method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.4.2 Modified Euler's method . . . . . . . . . . . . . . . . . . . . . . . . 7
3.4.3 Runge-Kutta method of fourth order . . . . . . . . . . . . . . . . . 8
3.4.4 Application of Numerical Methods . . . . . . . . . . . . . . . . . . 8
3.5 Graphics Processing Unit (GPU) . . . . . . . . . . . . . . . . . . . . . . . 9
3.5.1 System specifications . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.6 MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.7 Parallel Computing Toolbox . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.8 Using GPU in Matlab Parallel Computing Toolbox . . . . . . . . . . . . . 15

4 System Methodology 18
4.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.1.1 Implementation by Euler Method . . . . . . . . . . . . . . . . . . . 20

5 Result and Conclusion 24


5.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

6 Appendix 27
6.1 Implementation by Euler Method . . . . . . . . . . . . . . . . . . . . . . . 27
6.2 Implementation by Modified Euler Method . . . . . . . . . . . . . . . . . . 28
6.3 Implementation by Runge-Kutta(fourth order) . . . . . . . . . . . . . . . . 29

6.4 Implementation by Euler Method . . . . . . . . . . . . . . . . . . . . . . . 30
6.5 Implementation by Modified Euler Method . . . . . . . . . . . . . . . . . . 31
6.6 Implementation by Runge-Kutta(fourth order) . . . . . . . . . . . . . . . . 32
6.7 Implementation by Euler Method . . . . . . . . . . . . . . . . . . . . . . . 34
6.8 Implementation by Modified Euler Method . . . . . . . . . . . . . . . . . . 35
6.9 Implementation by Runge-Kutta(fourth order) . . . . . . . . . . . . . . . . 36

Bibliography 39

Acknowledgments 41

List of Figures

3.1 Different Numerical Methods [1] . . . . . . . . . . . . . . . . . . . . . . . . 6


3.2 Graphical representation of Euler's method . . . . . . . . . . . . . . . . . . 7
3.3 Graphical representation of modified Euler's method . . . . . . . . . . . . . 7
3.4 Graphical representation of RK4 method . . . . . . . . . . . . . . . . . . . 8
3.5 Difference between CPU and GPU [27] . . . . . . . . . . . . . . . . . . . . 9
3.6 Diff. between CPU and GPU [4] . . . . . . . . . . . . . . . . . . . . . . . . 10
3.7 GPU Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.8 Understanding Matlab Environment [22] . . . . . . . . . . . . . . . . . . . 13
3.9 Use of M-file [22] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.10 Acceleration by GPU [3] . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.11 Matrix creation in CPU [5] . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.12 Implementation using GPU [5] . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.13 Execution of a program in GPU [29] . . . . . . . . . . . . . . . . . . . . . 17

4.1 Project flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18


4.2 Profiler showing lines taking most amount of time . . . . . . . . . . . . . . 21
4.3 Profiler showing line by line analysis of time taken by the complete code . 21

5.1 Superposition of exact solution with numerical solution . . . . . . . . . . . 24


5.2 Graphical variation of Speed up with respect to various sample sizes . . . . 25
5.3 Speedup of ODE functions using different numerical methods . . . . . . . . 26

Abbreviations

GPU Graphics Processing Unit

CPU Central Processing Unit

MATLAB Matrix Laboratory

PCT Parallel Computing Toolbox

IDE Integrated Development Environment

SPMD Single Program Multiple Data

ODE Ordinary Differential Equation

GPGPU General Purpose Graphics Processing Unit


Chapter 1

Introduction

A large problem can usually be divided into smaller tasks that operate together in order
to create a solution. Take for example painting a house. Say you need to buy 5 liters of
paint and 5 brushes before having to paint the whole house. You can either run out and
buy everything and paint the whole house yourself, or you can get help from friends or
hire painters.
You probably want to do the latter, get help. In order to save time, you go out and buy
the paint, and another person gets the brushes. Then you get help from 4 persons, each of
whom will paint one wall of the house. This will save you time because you get help from
many persons, working on the same solution in parallel.
This applies to computing as well. Say you want to add two vectors v(x,y,z) and u(x,y,z),
where v=(1,2,3) and u=(4,5,6). You do this by computing v + u = (1,2,3)+(4,5,6) = (1+4,
2+5, 3+6) = (5,7,9). You can do this yourself, one calculation at a time, but as you probably
can see, this problem can be divided into smaller problems. You can have one person
adding the x components together, another adding the y components together and a third
adding the z components together.
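As a minimal MATLAB illustration of this idea (the variable names below are only for this example, not part of the report's code), the three component additions can be written as a single element-wise statement:

v = [1 2 3];        % first vector
u = [4 5 6];        % second vector
w = v + u           % element-wise addition gives w = [5 7 9]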

1.1 Motivation
The main motivation for this project is the recent work being done in this field.
Making processing speeds faster is one of the crucial needs in today's world.
When we look at various complex mathematical computations, we tend to create
structures that will help us perform the calculations faster, and this is where the concept
of parallel working comes into action. A Graphics Processing Unit, or GPU, is the technology
that carries out this task in computers.

1.2 Objective
In this project, we intend to show how a numerical method can be executed in a shorter
duration of time using a GPU. We will take some mathematical functions and implement them
on both a CPU and a GPU and compare their speeds, thus giving an analytical measure
of the factor by which the speed improves.
1.3 Problem Definition
The basic task which we have to fulfill in this project is developing algorithms and
subsequent programs for some numerical methods, particularly for ordinary differential
equations (ODEs). This is the fundamental requirement for successful implementation on the
GPU, as some methods may contain dependent terms, and our major challenge will be to work
out a way to vectorize them so that they can be implemented in parallel by the GPU.
Given the ordinary differential equation functions, we then construct the sequential
MATLAB code for them and furthermore vectorize it. Finally, we compare the processing
time of both approaches.

1.4 Organisation of the report


In the next chapter we will talk about the literature survey done for the research of our
project. Chapter 3 is the overview, which includes the historical background of
numerical methods and their significance, a brief description of the GPU, and a detailed
account of the software that we are going to use, which is MATLAB, and finally the Parallel
Computing Toolbox and its use for GPU computing.
Following the overview, the system methodology for the whole operation is explained.
Thereafter, the implementation for the operation is provided, covering the problem
statement (the functions) and the solution (implementation in scalar and vector code).
Finally, the results, conclusions and the future scope of the project conclude
this report.
This vectorized coding will come in useful when we use the PCT, or Parallel
Computing Toolbox. With optimisation in sight, we shall procure all the
required analytics before we jump onto the GPU implementation.

Chapter 2

Literature Survey

K. A. Stroud says that Numerical analysis involves the study of methods of computing
numerical data. In many problems this implies producing a sequence of approximations
by repeating the procedure again and again. People who employ numerical methods for
solving problems have to worry about the following issues: the rate of convergence (how
long does it take for the method to find the answer), the accuracy (or even validity) of
the answer, and the completeness of the response (do other solutions, in addition to the
one found, exist).[10]

Numerical methods provide approximations to the problems in question. No matter how
accurate they are, they do not, in most cases, provide the exact answer. In some
instances working out the exact answer by a different approach may not be possible or
may be too time consuming, and it is in these cases where numerical methods are most
often used. [12]

With reference to Kendall E. Atkinson, differential equations are among the most
important mathematical tools used in producing models in the physical sciences, biological
sciences, and engineering. Ordinary Differential Equations can be solved by numerical
methods such as Euler's method, Runge-Kutta methods and the families of Adams-Bashforth
and Adams-Moulton methods where accuracy of the result is of less importance than
speed of processing. [8]

GPU computing is the use of a GPU to do general purpose scientific and engineering
computing. As given in the Proceedings of the IEEE (Volume 96, Issue 5), John D. Owens, Mike
Houston and others discussed that the CPU and GPU together form a heterogeneous computing
model. The sequential part of the application runs on the CPU and the computationally-
intensive part runs on the GPU. From the user's perspective, the application just runs
faster because it is using the high performance of the GPU to boost performance. [14]

A simple way to understand the difference between a GPU and a CPU is to compare
how they process tasks. A CPU consists of a few cores optimized for sequential serial
processing while a GPU has a massively parallel architecture consisting of thousands of
smaller, more efficient cores designed for handling multiple tasks simultaneously and pro-
cessing parallel workloads efficiently.[3]

MATLAB is optimized for operations involving matrices and vectors. According to the R2016b
MATLAB Documentation, the process of revising loop-based, scalar-oriented code to use
MATLAB matrix and vector operations is called vectorization. [4]

According to Nicholas Ide, Parallel Computing Toolbox Technical Expert, the Parallel
Computing Toolbox lets you solve computationally and data-intensive problems using
multicore processors, GPUs, and computer clusters. High-level constructs such as parallel
for-loops, special array types, and parallelized numerical algorithms let you parallelize
MATLAB applications without CUDA programming. You can use the toolbox with Simulink to run
multiple simulations of a model in parallel. The toolbox lets you use the full processing
power of multicore desktops by executing applications on workers (MATLAB computational
engines) that run locally. You can run parallel applications interactively or in batch. [4]

Steven C. Chapra has discussed various types of mathematical models and numerical
methods like ordinary differential equations (ODEs), initial value problems, etc. Further,
he has developed their algorithms and implemented them in the programming environment
of MATLAB so that the numerical methods can be used to solve problems in
engineering and science.
MATLAB offers a nice combination of handy programming features with powerful built-in
numerical capabilities. Its M-file programming environment allows us to implement
moderately complicated algorithms in a structured and coherent fashion. [1]

Jung W. Suh and Youngmin Kim have talked about how the implementation of MATLAB
codes can be accelerated using powerful Graphics Processing Units (GPUs) for
computationally heavy projects after simple simulations. Since MATLAB uses a vector/matrix
representation of data, which is suitable for parallel processing, it can benefit a lot from
GPU acceleration. They have further delved into MATLAB programming on GPU-supported
computers and how, with the help of vectorization, speedup can be achieved
compared to implementation on normal CPUs. [5]

In our project we are going to combine the above two approaches and try to develop
algorithms for a few ordinary differential equations (ODEs) and make vectorized
codes for their implementation in MATLAB using the Parallel Computing Toolbox. Lastly,
we are going to run these codes on a GPU-supported computer for faster computation and
throughput. This will help us to analyse our results by comparing the time of execution
and the eventual speedup compared to normal CPUs.

Chapter 3

Overview

3.1 Historical Background


Numerical algorithms are at least as old as the Egyptian Rhind papyrus (c. 1650 BC),
which describes a root-finding method for solving a simple equation. Ancient Greek
mathematicians made many further advancements in numerical methods. In particular,
Eudoxus of Cnidus (c. 400-350 BC) created and Archimedes (c. 285-212/211 BC) perfected
the method of exhaustion for calculating lengths, areas, and volumes of geometric figures.
When used as a method to find approximations, it is in much the spirit of modern
numerical integration; and it was an important precursor to the development of calculus by
Isaac Newton (1642-1727) and Gottfried Leibniz (1646-1716).

3.2 Numerical Methods


Numerical methods have been around for a long time. However, the usage of numerical
methods was limited due to the lengthy hand calculations involved in their implementa-
tion. In our current society the application of numerical analysis and numerical methods
occurs in just about every field of science and engineering. This is due in part to the
rapidly changing digital computer industry. Digital computers have provided a fast com-
putational device for the development and implementation of numerical methods which
can handle a variety of difficult mathematical problems.
Numerical methods are techniques by which mathematical problems are formulated so
that they can be solved with arithmetic and logical operations. Because digital comput-
ers excel at performing such operations, numerical methods are sometimes referred to as
computer mathematics.
In the pre-computer era, the time and drudgery of implementing such calculations seri-
ously limited their practical use. However, with the advent of fast, inexpensive digital
computers, the role of numerical methods in engineering and scientific problem solving
has exploded.

3.3 Why Study Numerical Methods?


Numerical methods are capable of handling large systems of equations, nonlinearities,
and complicated geometries that are not uncommon in engineering and science
and that are often impossible to solve analytically with standard calculus.

Numerical methods allow us to use canned software with insight.

Many problems cannot be approached using canned programs. If we are conversant
with numerical methods, and are adept at computer programming, we can design
our own programs to solve problems without having to buy or commission expensive
software.

Because numerical methods are expressly designed for computer implementation,
they are ideal for illustrating the computer's powers and limitations. When we
successfully implement numerical methods on a computer, and then apply them to
solve otherwise intractable problems, we get a demonstration of how computers can
serve our professional development.

Numerical methods provide a vehicle for us to reinforce our understanding of
mathematics. Because one function of numerical methods is to reduce higher mathematics
to basic arithmetic operations, enhanced understanding and insight can result from
this alternative perspective.

Figure 3.1: Different Numerical Methods [1]

3.4 Numerical Methods used


We have solved Ordinary Differential Equations (ODEs) using numerical methods in our
project. The analytic methods of solving differential equations are applicable only to a
limited class of equations, and often the differential equations appearing in physical
problems do not belong to any of these familiar types, so one is obliged to resort to
numerical methods. We have used the following three methods to solve ODEs:

3.4.1 Euler's method


Consider the differential equation

dy/dx = f(x, y), where y(x0) = y0    (3.1)

This slope estimate can be substituted into the equation

yi+1 = yi + h f(xi, yi)    (3.2)

This formula is referred to as Euler's method (or the Euler-Cauchy or point-slope method).
A new value of y is predicted using the slope (equal to the first derivative at the original
value of x) to extrapolate linearly over the step size h.

Figure 3.2: Graphical representation of Euler's method
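A minimal MATLAB sketch of equation (3.2), assuming the right-hand side is supplied as a function handle f; the interval, initial value and step count below are illustrative values only, not the ones used later in the report:

f  = @(x, y) 3*sin(2*x)*cos(3*x);   % example right-hand side f(x, y)
x0 = 0; x1 = 10; y0 = 1; n = 1000;  % interval, initial value, number of steps
h  = (x1 - x0)/n;                   % step size
x  = x0; y = y0;
for i = 1:n
    y = y + h*f(x, y);              % Euler update, equation (3.2)
    x = x + h;
end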

3.4.2 Modified Euler's method


The process used in Euler's method is very slow, and to obtain reasonable accuracy we need
to take a smaller value of h. Because of this restriction, the method is unsuitable for
practical use, and hence a modification of it, known as the modified Euler's method, is
used, which gives more accurate results. For equation 3.1, the general solution is

yi+1 = yi + (h/2)[f(xi, yi) + f(xi+1, yi+1)]    (3.3)

Figure 3.3: Graphical representation of modified Euler's method

3.4.3 Runge-Kutta method of fourth order
To overcome the inefficiency and unsuitability of Euler's method, which arise from the
requirement of a small h for attaining reasonable accuracy, Runge-Kutta methods are
designed to give greater accuracy, and they possess the advantage of requiring only the
function values at some selected points on the sub-interval. Considering equation 3.1, the
general solution is

yi+1 = yi + (1/6)[k1 + 2k2 + 2k3 + k4]    (3.4)

where

k1 = h f(xi, yi)
k2 = h f(xi + h/2, yi + k1/2)
k3 = h f(xi + h/2, yi + k2/2)
k4 = h f(xi + h, yi + k3)    (3.5)

Figure 3.4: Graphical representation of RK4 method
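A minimal sketch of a single Runge-Kutta (fourth order) step following equations (3.4) and (3.5), again assuming the right-hand side is available as a function handle f and using illustrative values for the current point x, y and step size h:

f = @(x, y) 3*sin(2*x)*cos(3*x);              % example right-hand side f(x, y)
x = 0; y = 1; h = 0.001;                      % illustrative current point and step size
k1 = h*f(x, y);
k2 = h*f(x + h/2, y + k1/2);
k3 = h*f(x + h/2, y + k2/2);
k4 = h*f(x + h, y + k3);
y_next = y + (1/6)*(k1 + 2*k2 + 2*k3 + k4);   % equation (3.4)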

3.4.4 Application of Numerical Methods


Problems involving the analysis of certain ecological and economical approximations for
better prediction use numerical methods to solve the problem-specific equations that are
generated.

Various complex circuits can be modelled as an RLC circuit, whose response
can be found by solving the corresponding differential equation using numerical methods.

Astronomical projects involve trajectory control, which uses numerical methods to
solve the equations that describe the path.

Location services use numerical methods to approximate the location of the device.

3.5 Graphics Processing Unit (GPU)
A graphics processing unit (GPU), also occasionally called a visual processing unit (VPU),
is a specialized electronic circuit designed to rapidly manipulate and alter memory to
accelerate the creation of images in a frame buffer intended for output to a display.
GPUs are used in embedded systems, mobile phones, personal computers, workstations,
and game consoles. Modern GPUs are very efficient at manipulating computer graphics
and image processing, and their highly parallel structure makes them more efficient than
general-purpose CPUs for algorithms where the processing of large blocks of data is done
in parallel.
The term GPU was popularized by Nvidia in 1999, who marketed the GeForce 256 as
the world's first GPU, or Graphics Processing Unit. The GPU's advanced capabilities
were originally used primarily for 3D game rendering. But now those capabilities are
being harnessed more broadly to accelerate computational workloads in areas such as
financial modeling, cutting-edge scientific research and oil and gas exploration. GPUs are
optimized for taking huge batches of data and performing the same operation over and
over very quickly, unlike PC microprocessors, which tend to skip all over the place.

Comparison between GPU and CPU

Architecturally, the CPU is composed of just a few cores with lots of cache memory that
can handle a few software threads at a time. In contrast, a GPU is composed of hundreds
of cores that can handle thousands of threads simultaneously. The ability of a GPU
with 100+ cores to process thousands of threads can accelerate some software by 100x
over a CPU alone. What's more, the GPU achieves this acceleration while being more
power- and cost-efficient than a CPU. GPUs have thousands of cores to process parallel
workloads efficiently.

Figure 3.5: Difference between CPU and GPU [27]

In hybrid CPU-GPU systems, CPUs and GPUs are used together in a heterogeneous
co-processing computing model. Computationally-intensive parts which can be processed in
a massively parallel manner are accelerated by the GPU in order to benefit from its high
computing performance, while the CPU, among other tasks, works on the sequential
algorithms. Overall, the application runs faster, and the sharing of tasks makes the
processing of computationally-intensive algorithms very efficient. The performance
advantage of graphics processing units makes this technology particularly interesting for
scientific applications. GPU computing is the use of a GPU to do general-purpose scientific
and engineering computing. Its introduction opened new doors in the areas of research and
science.

Figure 3.6: Diff. between CPU and GPU [4]


Due to their massively parallel architecture, using GPUs enables the completion of
computationally intensive assignments much faster compared with conventional CPUs.
This is why GPU computing has enormous potential, particularly in areas where data- and
compute-intensive basic research requires the processing of large volumes of measurement
data.
The CPU (central processing unit) has often been called the brains of the PC. But
increasingly, that brain is being enhanced by another part of the PC, the GPU (graphics
processing unit), which is its soul.

3.5.1 System specifications


The specifications of the system on which our project is executed are as follows:
Operating System: Windows 10 Pro (64-bit)
Processor and graphics: Intel Core i7-7500U (2.7 GHz, up to 3.5 GHz, 2 cores)
+ NVIDIA GeForce 930MX (2 GB DDR3 dedicated)
Memory: 16 GB DDR4-2133 SDRAM (2 x 8 GB)
MATLAB Version: MATLAB 8.5 (R2015a)

3.6 MATLAB
MATLAB is a programming language developed by MathWorks. It started out as a matrix
programming language where linear algebra programming was simple. MATLAB (matrix
laboratory) is a fourth-generation high-level programming language and interactive
environment for numerical computation, visualization and programming.

Figure 3.7: GPU Specifications

MATLAB was developed primarily by Cleve Moler in the 1970s. It was derived from the
FORTRAN subroutine packages LINPACK (linear system package) and EISPACK (eigensystem
package) for linear and eigenvalue systems. It was rewritten in C in the 1980s with more
functionality, including plotting routines. The MathWorks Inc. was created in 1984 to
market and continue development of MATLAB.
MATLAB has numerous built-in commands and math functions that help us in mathematical
calculations, generating plots, and performing numerical methods.

MATLAB's Power of Computational Mathematics

MATLAB is used in every facet of computational mathematics. Following are some of the
mathematical calculations where it is most commonly used:

Dealing with Matrices and Arrays

2-D and 3-D Plotting and graphics

Linear Algebra

Statistics

Data Analysis

Calculus and Differential Equations

Numerical Calculations

Integration

Transforms

Curve Fitting
Features of MATLAB

Following are the basic features of MATLAB:

It is a high-level language for numerical computation, visualization and application
development.

It also provides an interactive environment for iterative exploration, design and
problem solving.

It provides a vast library of mathematical functions for linear algebra, statistics,
Fourier analysis, filtering, optimization, numerical integration and solving ordinary
differential equations.

It provides built-in graphics for visualizing data and tools for creating custom plots.

MATLAB's programming interface gives development tools for improving code quality and
maintainability and maximizing performance.

It provides tools for building applications with custom graphical interfaces.

It provides functions for integrating MATLAB-based algorithms with external
applications and languages such as C, Java, .NET and Microsoft Excel.
Uses of MATLAB

MATLAB is widely used as a computational tool in science and engineering, encompassing
the fields of physics, chemistry, math and all engineering streams. It is used in a range of
applications including
Signal Processing and Communications

Image and Video Processing

Control Systems

Test and Measurement

Computational Finance

Computational Biology
User Interface Of Matlab

The MATLAB development IDE can be launched from the icon created on the desktop. The
main working window in MATLAB is called the desktop. When MATLAB is started, the
desktop appears in its default layout.
The desktop has the following panels:
Current Folder : This panel allows you to access the project folders and files.
Command Window : This is the main area where commands can be entered at the
command line. It is indicated by the command prompt (>>).
Workspace : The workspace shows all the variables created and/or imported from files.
Command History : This panel shows or reruns commands that are entered at the
command line.

Figure 3.8: Understanding Matlab Environment [22]

The M Files

MATLAB allows writing two kinds of program files


Scripts : Script files are program files with a .m extension. In these files, you write a
series of commands which you want to execute together. Scripts do not accept inputs and do
not return any outputs. They operate on data in the workspace.
Functions : Function files are also program files with a .m extension. Functions can
accept inputs and return outputs. Internal variables are local to the function.
The MATLAB editor or any other text editor can be used to create .m files. A script file
contains multiple sequential lines of MATLAB commands and function calls. A script can
be run by typing its name at the command line.
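For example, a small function file (the file name square_elements.m is hypothetical and used only for illustration) could look as follows; saving it on the MATLAB path and typing square_elements([1 2 3 4]) at the command prompt returns [1 4 9 16]:

function s = square_elements(v)
% SQUARE_ELEMENTS  Return the element-wise square of the input vector.
    s = v.^2;
end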

3.7 Parallel Computing Toolbox


In scientific computing, MATLAB is a visualization environment which covers numerical
analysis, matrix operations, signal processing and graphical display. It contains rich
toolbox functions and can provide good solutions to the simulation and calculation problems
encountered in many studies. However, MATLAB's computing efficiency is low: compared with
other high-level languages, MATLAB program execution is slow. MATLAB can provide support
for NVIDIA GPUs through the Parallel Computing Toolbox. This support allows engineers and
scientists to make MATLAB computations faster, without having to reprogram their code or
increase equipment cost.

Figure 3.9: Use of M-file [22]
MATLAB has mostly been written for serial computation. That means it is run on
a single computer having a single Central Processing Unit (CPU); the problem is divided
into a number of instructions in series, and the instructions are executed sequentially.
Parallel computing, on the other hand, is a computing method which executes many
computations (processes) simultaneously. The principle of parallel computing is that a
large problem can often be divided into smaller pieces, which are then solved concurrently
(in parallel). In other words, parallel computing is the use of multiple compute resources
to solve a computational problem simultaneously. The main advantages of parallel computing
are:
1) saving time and/or money;
2) solving larger problems;
3) providing concurrency;
4) using non-local resources;
5) overcoming the limits of serial computing.
MATLAB provides useful tools for parallel processing through the Parallel Computing
Toolbox. The toolbox provides diverse methods for parallel processing, such as multiple
computers working via a network, several cores in multicore machines, and cluster
computing, as well as GPU parallel processing. Within the scope of this project, we focus
more on the GPU part of the Parallel Computing Toolbox. One of the good things about the
toolbox is that we can take advantage of GPUs without explicit CUDA programming or C-MEX
programming. However, this comes with a heavy price tag, since we have to install the
Parallel Computing Toolbox. This section discusses GPU processing for built-in MATLAB
functions as well as for non-built-in MATLAB functions.

Figure 3.10: Acceleration by GPU [3]

Key Features of PCT are:

Parallel for-loops (parfor) for running task-parallel algorithms on multiple processors
(a small sketch is given after this list)

Support for CUDA-enabled NVIDIA GPUs

Full use of multicore processors on the desktop via workers that run locally

Computer cluster and grid support (with MATLAB Distributed Computing Server)

Interactive and batch execution of parallel applications

Distributed arrays and single program multiple data (spmd) constructs for large
dataset handling and data-parallel algorithms.
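As a small sketch of the task-parallel style mentioned in the first feature above (the loop body is arbitrary and only illustrative), a parfor loop distributes independent iterations across the available workers:

parpool;                    % open a pool of local workers, if one is not already open
s = zeros(1, 100);
parfor k = 1:100
    % each iteration is independent, so it can be run on any worker
    s(k) = sum(sin(k*(1:1e6)));
end
delete(gcp('nocreate'));    % shut down the pool when done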

3.8 Using GPU in Matlab Parallel Computing Toolbox
MATLAB was one of the early adopters of GPUs in its products, even when GPU
development was still in its infancy. MATLAB has developed GPU capabilities in its
Parallel Computing Toolbox, and users can easily modify their codes to take advantage
of the GPU hardware installed on a computer. MATLAB's GPU support works only with NVIDIA
CUDA GPUs.

The MATLAB parallel.gpu.GPUArray Class

MATLAB developers have encapsulated all the GPU functionalities within the Parallel
Computing Toolbox into a class called the parallel.gpu.GPUArray class. Here is a simple
example to illustrate the ease of using the class in a MATLAB program. The example
creates a random 4x4 matrix and computes the sine of the random values.
All of this is done on the main CPU. To perform the operation on the GPU, the values
first need to be transferred to the GPU, using the gpuArray() function. Next, the sin()
operation (overloaded to run on the GPU processors automatically) will operate on the
transferred data, within the GPU workspace. The final function, gather(), transfers the
data back from the GPU to the local CPU workspace. The example illustrates
how MATLAB has overloaded many of its built-in functions to run on the GPU hardware.
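A minimal code sketch of the example described above (the variable names are ours; figures 3.11 and 3.12 show the corresponding screenshots):

A = rand(4, 4);          % create a random 4x4 matrix in the CPU workspace
B = sin(A);              % sine computed on the CPU

Ag = gpuArray(A);        % transfer the data to the GPU
Bg = sin(Ag);            % overloaded sin() runs on the GPU processors
B2 = gather(Bg);         % transfer the result back to the local CPU workspace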

Figure 3.11: Matrix creation in CPU [5]

Figure 3.12: Implementation using GPU [5]

To find out which MATLAB functions are overloaded to use the GPUArray class, type:
methods(parallel.gpu.GPUArray) at the MATLAB prompt.

Does GPU speed up MATLAB programs?

The GPU is a specialised piece of distributed computing hardware within a computer system.
Therefore, the normal criteria for distributed computing apply to the GPU as well.
Programs that are massively parallel (i.e. that can be subdivided into many individual
sub-programs) will run well on the GPU. In addition to this, the bottleneck on the GPU
is the data transfer between the main CPU memory to and from the GPU memory. The
GPU cannot access the CPU memory directly, and this is one of the drawbacks of GPU
computing.

Various commands of PCT used to access GPU

1. gpuArray
It is used to create an array on GPU.
Syntax :
G = gpuArray(X)

Description :
G = gpuArray(X) copies the numeric array X to the GPU and returns a gpuArray
object. One can operate on this array by passing its gpuArray to the feval method of
a CUDA kernel object, or by using one of the methods defined for gpuArray objects
in Establish Arrays on a GPU. The MATLAB array X must be numeric (for example:
single, double, int8, etc.) or logical, and the GPU device must have sufficient
free memory to store the data. If the input argument is already a gpuArray, the
output is the same as the input. The gather command can be used to retrieve the array
from the GPU to the MATLAB workspace.

2. gather
It is used to transfer distributed array or gpuArray to local workspace.
Syntax :
X = gather(A)

Description :
X = gather(A) can operate inside an spmd (single program multiple data) statement,
pmode, or communicating job to gather together the elements of a codistributed
array, or outside an spmd statement to gather the elements of a distributed array.
If you execute this inside an spmd statement, pmode, or communicating job, X is a
replicated array with all the elements of the array on every worker. If you execute
this outside an spmd statement, X is an array in the local workspace, with the
elements transferred from the multiple workers.
X = gather(distributed(X)) returns the original array X. For a gpuArray input, X
= gather(A) transfers the array elements from the GPU to the local workspace.
If the input argument to gather is not a distributed, a codistributed, or a gpuArray,
the output is the same as the input.

Figure 3.13: Execution of a program in GPU [29]

Chapter 4

System Methodology

Figure 4.1: Project flowchart


4.1 Implementation
The basic segmentation of this project contains four major phases:
The first phase includes choosing an appropriate mathematical function to implement in
MATLAB. It is really crucial that the functions that are picked are analytically clear
to the user. Knowing the range of outputs plays an important part in rectifying
the program as suitable. Thus, the functions that we are going to eventually use, or a
part of them, had already been evaluated so that the pilot values or the approach could be
estimated. Moreover, the functions that are taken here correspond to differential
equations, which are the main focus of this project.
The second phase includes writing the scalar codes for the functions. When you add,
subtract, multiply or divide a matrix by a number, this is called a scalar operation.
Scalar operations produce a new matrix with the same number of rows and columns, with
each element of the original matrix added to, subtracted from, multiplied by or divided
by the number.
We intend to use these simple methods to implement our functions. Once a function
is successfully programmed, we use the Run and Time and tic/toc features of
MATLAB to get an idea of how much time is taken by the different steps of the function
when the program is running in real time. This lets us decide clearly which steps take
the most time, so that instead of working towards optimisation of the whole program at
once, we can segregate the priorities for optimisation.
The third phase includes the vectorization of these functions. Vector operations in MATLAB
allow you to apply a single command to an entire array; in effect, the single command is
applied over and over again to every element of the array. Vectorized operations are
equivalent to for loops, and all vectorized operations can be replaced with for loops. A
vector operation in MATLAB is the ability to write condensed code that applies an action
to every element of an array with a single line of code. Vectorized operators look like
the basic math operators and generally do almost the same thing, but when applied to an
array (or matrix), these operations are performed over every element of the array (very
similar to the notion of a loop).
In this phase, we vectorize the given functions, that is, we eliminate the loops that are
present in the program without changing the final output. Then we use the features
Run and Time and tic/toc again to get the run-time data of the program.
The fourth phase includes a comparison of the timing data of both methods, that
is, the same function implemented by scalar coding and by vector coding. The main
observation that we wish to draw is the amount of time that the processor saves while
executing the same code in the vector form with respect to the scalar form. These
observations are crucial since our final target is to make the processing of our
functions more and more agile and swift, for which purpose we will eventually use the
parallel processing of the GPU as well.

An important part of our software implementation involves the development of sequential
and vectorized codes.

Sequential codes : They are codes which execute in an ordered sequence, that
is, one after the other with respect to a predetermined condition. For example, take a
vector v = [1 2 3 4] and let us write a sequential code in MATLAB to square every element
of the vector as shown,

v = [1 2 3 4];
for i = 1:4
    v(i) = v(i)^2;
end
This code will square every element of the vector v sequentially, that is, one element at a
time.

Vectorized codes : They are codes which work on every element of a matrix
or vector in parallel, that is, at the same time. Taking the same example as above, let us
write a vectorized code for squaring the elements of vector v as shown,
v = [1 2 3 4];
v = v.^2;
This code will work on all four elements at the same time. Thus, as a result, vectorized
codes are faster and less time-consuming than sequential codes.
Tic-toc command : The TIC and TOC functions of MATLAB work together to measure
elapsed time. tic, by itself, saves the current time that TOC uses later to measure the
time elapsed between the two. toc, by itself, displays the elapsed time, in seconds, since
the most recent execution of the TIC command.
TSTART = tic saves the time to an output argument, TSTART. The numeric value of
TSTART is only useful as an input argument for a subsequent call to TOC.
T = toc; saves the elapsed time in T as a double scalar. toc(TSTART) measures the time
elapsed since the TIC command that generated TSTART.
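A short usage sketch of these timing commands (the operation being timed is arbitrary and only for illustration):

tstart = tic;            % save the starting time
x = rand(1, 1e7);
y = cumsum(x);           % operation being timed
elapsed = toc(tstart);   % elapsed time in seconds since tstart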

The sample implementation of the Euler method on a particular ODE, with its sequential,
vectorized and GPU-executed codes and their respective speedups, is shown in the
following section.

Given ODE

dy/dx = 3 sin(2x) cos(3x)    (4.1)

Exact solution of the ODE

y(x) = (12(10 tan(x/2)^2 - 20 tan(x/2)^4 + 30 tan(x/2)^6
       - 5 tan(x/2)^8 + 1)) / (5(tan(x/2)^2 + 1)^5) - 7/5    (4.2)

4.1.1 Implementation by Euler Method


Sequential code

Sequential codes are written using iterative for loops. Here we evaluate the output solution
matrix based on the input samples. We have taken 10^7 samples, that is, the operation
written inside the for loop is executed 10^7 times. The sequential code for the solution of
ODE 4.1 is as follows,
n = 10000000;
t0 = 0; t1 = 10; y0 = 1;
h = (t1 - t0)/n;
t(1) = t0; y(1) = y0;
tic
for i = 1:n
    t(i+1) = t(i) + h;
    y(i+1) = y(i) + h*(3*sin(2*t(i))*cos(3*t(i)));
end;
sequential_time = toc;
Y_sequential = y;
Profiler
To find the part of the code which takes the maximum time, we use the Run and Time feature
of MATLAB. This opens up the Profiler, which gives us a line-by-line analysis of the time
taken by all parts of the code. The part which takes the maximum time is vectorized and
subsequently given to the GPU for speedup.

Figure 4.2: Profiler showing lines taking most amount of time

Figure 4.3: Profiler showing line by line analysis of time taken by the complete code
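The same line-by-line timing report can also be generated from the command line with the profile commands (a minimal sketch; euler_seq.m is a hypothetical file name for the sequential script above):

profile on               % start collecting profiling data
euler_seq                % run the script to be analysed
profile viewer           % open the Profiler report with line-by-line timings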

Vectorized code

In contrast to sequential code, where each value of a matrix is evaluated one after the
other, in a vectorized code all the values of a matrix are evaluated together at the same
time, that is in parallel. Hence, a vectorized code takes considerably less time than a
sequential code.
Vectorization of a code helps us to further do its parallel processing on a GPU. The
vectorized version of the earlier sequential code is as shown,
n = 10000000;
t0 = 0; t1 = 10; y0 = 1;
h = (t1 - t0)/n;
t(1) = t0; z(1) = y0;
i = 1:n;
tic
t = t0:h:t1;
z(i+1) = h*(3.*sin(2*t(i)).*cos(3*t(i)));
Y_parallel = cumsum(z);
parallel_time = toc;

GPU code

For getting speedup, the same vectorized code written previously is executed on a GPU
system, but before that the required input variables are transferred from the CPU to the
GPU using the gpuArray() command. After the code is executed by the GPU, the calculated
output values are transferred from the GPU to the CPU using the gather() command. The GPU
code for the vectorized code is as shown,
n = 10000000;
h = gpuArray((10 - 0)/n);
t = gpuArray(0:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(2:length(t)-1);
tic
z(i+1) = h*(3.*sin(2*t(i)).*cos(3*t(i)));
y_gpu = cumsum(z);
parallel_gpu_time = toc;
Y_gpu = gather(y_gpu);
For our project we have taken eight ODE functions and developed sequential
codes to solve them using the Euler, Modified Euler and Runge-Kutta (fourth order) methods.

Then we did their profiling using the Run and Time feature of MATLAB and developed their
vectorized codes for speedup. These codes were subsequently converted for implementation
on the GPU, to get maximum speedup.
The speedup analysis of the eight functions using all three methods is given in chapter 5.

For sample implementation and analysis we have taken one particular ODE out of the
eight and compared the speedup of its sequential, vectorized and GPU-executed codes
using all three numerical methods. Their respective codes are given in chapter 6.

The speedup analysis of the above ODE and the comparison with its exact solution are shown
in chapter 5.

Chapter 5

Result and Conclusion

5.1 Results
We solved ODE 4.1 using the numerical methods implemented in MATLAB and compared the
numerical solution with its exact solution (equation 4.2).

Figure 5.1: Superposition of exact solution with numerical solution

Figure 5.1 shows the superposition of graphs of exact solution given by equation 4.2
(shown in green) with that of the numerical solution of the ODE solved by any one of the
three numerical methods (shown in red).

ODE 4.1 was solved using the codes for the Euler, Modified Euler and Runge-Kutta (fourth
order) methods, and the respective speedup comparison for the sequential, vectorized and
GPU codes is tabulated in the following table.

Method                        Parallel vs           GPU vs             GPU vs
                              sequential speedup    parallel speedup   sequential speedup
Euler method                  6.6157                2.0224             13.479
Modified Euler method         3.3212                3.7298             12.3875
Runge-Kutta (fourth order)    4.5523                2.6322             11.9827

Table 1. Speedup Comparison

Figure 5.2: Graphical variation of Speed up with respect to various sample sizes

Also, the graphical variation of the speedup of ODE 4.1 with respect to various sample
sizes for all three numerical methods is shown in figure 5.2.

Similarly, we have developed codes for, and analysed the speedup of eight such ODE
functions using all the three numerical methods, as shown in figure 5.3.

5.2 Conclusion
The main conclusion we can draw is in the difference of processing times. Clearly, the
time taken by the vectorized code in all the cases was less than that of its scalar
counterpart.

Therefore, removing the loops from the program helped us to lessen the run time of
the functions. Proving it for multiple problems simply implies that any function, when
programmed in vectorized form, will take less time to generate the same precise output
in the end.

Figure 5.3: Speedup of ODE functions using different numerical methods

Moreover, it can be clearly seen from the analysis how the GPU implementation improved
the processing speed of the functions. It can also be concluded that an increase in the
complexity of the functions increases the speed-up. Another important observation was
related to the sample size: an improved speed-up was achieved on the GPU for a larger
sample size, in comparison with its sequential and vectorized counterparts.

On the hardware side, the execution speed offered by the GPU depends on its number
of cores and its Compute Capability. The higher these two parameters are, the higher,
logically, the speedup offered by the GPU will be.
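These properties can be inspected from MATLAB itself; a minimal sketch using the Parallel Computing Toolbox (note that the multiprocessor count is reported rather than the raw core count):

g = gpuDevice;                 % select and query the current GPU
disp(g.Name)                   % device name, e.g. the GeForce 930MX used here
disp(g.ComputeCapability)      % compute capability of the device
disp(g.MultiprocessorCount)    % number of streaming multiprocessors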

Chapter 6

Appendix

The sequential, vectorized and GPU codes developed for the ODEs solved using the Euler,
Modified Euler and Runge-Kutta (fourth order) methods are as follows:
Given ODE 1

dy/dx = x^2 - 2x + 6    (6.1)

6.1 Implementation by Euler Method


Sequential code

n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
y(1) = y0;
tic
for i = 1:n
    t(i+1) = t(i) + h;
    y(i+1) = y(i) + h*(t(i)^2 - 2*t(i) + 6);
end;
sequential_time = toc
Y_sequential = y;

Vectorized code

n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
z(1) = y0;
i = 1:n;
tic
t = t0:h:t1;
z(i+1) = h*(t(i).^2 - 2*t(i) + 6);
Y_parallel = cumsum(z);
parallel_time = toc;

GPU code

n = 10000000;
h = gpuArray((10 - 0)/n);
t = gpuArray(0:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(2:length(t)-1);
tic
z(i+1) = h*(t(i).^2 - 2*t(i) + 6);
Y_gpu = cumsum(z);
parallel_gpu_time = toc

6.2 Implementation by Modified Euler Method


Sequential code

n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
y(1) = y0;
tic
for i = 1:n
    t(i+1) = t(i) + h;
    y(i+1) = y(i) + (h/2)*((t(i)^2 - 2*t(i) + 6) + (t(i+1)^2 - 2*t(i+1) + 6));
end
sequential_time = toc
Y_sequential = y;

Vectorized code

n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
z(1) = y0;
z = zeros(1, length(t));
z(1) = 1;
i = 1:n;
t = t0:h:t1;
tic
z(i+1) = (h/2)*((t(i).^2 - 2.*t(i) + 6) + (t(i+1).^2 - 2.*t(i+1) + 6));
Y_parallel = cumsum(z);
parallel_time = toc

GPU code

n = 10000000;
h = gpuArray((10 - 0)/n);
t = gpuArray(0:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(2:length(t)-1);
tic
z(i+1) = (h/2)*((t(i).^2 - 2.*t(i) + 6) + (t(i+1).^2 - 2.*t(i+1) + 6));
Y_gpu = cumsum(z);
parallel_gpu_time = toc

6.3 Implementation by Runge-Kutta(fourth order)


Sequential code

n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
y(1) = y0;
tic
for i = 1:n
    k1 = h*(t(i)^2 - 2*t(i) + 6);
    k2 = h*((t(i)+h/2)^2 - 2*(t(i)+h/2) + 6);
    k3 = h*((t(i)+h/2)^2 - 2*(t(i)+h/2) + 6);
    k4 = h*((t(i)+h)^2 - 2*(t(i)+h) + 6);
    k = 1/6*(k1 + 2*k2 + 2*k3 + k4);
    t(i+1) = t(i) + h;
    y(i+1) = y(i) + k;
end;
sequential_time = toc
Y_sequential = y;

Vectorized code

n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
z(1) = y0;
tic
i = 1:n;
t = t0:h:t1;
k1 = h*(t(i).^2 - 2.*t(i) + 6);
k2 = h*((t(i)+h/2).^2 - 2.*(t(i)+h/2) + 6);
k3 = h*((t(i)+h/2).^2 - 2.*(t(i)+h/2) + 6);
k4 = h*((t(i)+h).^2 - 2.*(t(i)+h) + 6);
k = 1/6*(k1 + 2*k2 + 2*k3 + k4);
z(i+1) = k;
Y_parallel = cumsum(z);
parallel_time = toc

GPU code

n = 10000000;
h = gpuArray((10 - 0)/n);
t = gpuArray(0:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(2:length(t)-1);
tic
k1 = h*(t(i).^2 - 2.*t(i) + 6);
k2 = h*((t(i)+h/2).^2 - 2.*(t(i)+h/2) + 6);
k3 = h*((t(i)+h/2).^2 - 2.*(t(i)+h/2) + 6);
k4 = h*((t(i)+h).^2 - 2.*(t(i)+h) + 6);
k = 1/6*(k1 + 2*k2 + 2*k3 + k4);
z(i+1) = k;
Y_gpu = cumsum(z);
parallel_gpu_time = toc
Given ODE 2

dy/dx = 2 exp(x) - 4x^3    (6.2)

6.4 Implementation by Euler Method


Sequential code

n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
y(1) = y0;
tic
for i = 1:n
    t(i+1) = t(i) + h;
    y(i+1) = y(i) + h*(2*exp(t(i)) - 4*t(i)^3);
end;
sequential_time = toc
Y_sequential = y;

Vectorized code

n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
z(1) = y0;
i = 1:n;
tic
t = t0:h:t1;
z(i+1) = h*(2*exp(t(i)) - 4*t(i).^3);
Y_parallel = cumsum(z);
parallel_time = toc

GPU code

n = 10000000;
h = gpuArray((10 - 0)/n);
t = gpuArray(0:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(2:length(t)-1);
tic
z(i+1) = h*(2*exp(t(i)) - 4*t(i).^3);
Y_gpu = cumsum(z);
parallel_gpu_time = toc

6.5 Implementation by Modified Euler Method


Sequential code

n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
y(1) = y0;
tic
for i = 1:n
    t(i+1) = t(i) + h;
    y(i+1) = y(i) + (h/2)*((2*exp(t(i)) - 4*t(i)^3) + (2*exp(t(i+1)) - 4*t(i+1)^3));
end
sequential_time = toc
Y_sequential = y;

Vectorized code

n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
z(1) = y0;
z = zeros(1, length(t));
z(1) = 1;
i = 1:n;
t = t0:h:t1;
tic
z(i+1) = (h/2)*((2*exp(t(i)) - 4.*t(i).^3) + (2*exp(t(i+1)) - 4.*t(i+1).^3));
Y_parallel = cumsum(z);
parallel_time = toc

GPU code

n = 10000000;
h = gpuArray((10 - 0)/n);
t = gpuArray(0:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(2:length(t)-1);
tic
z(i+1) = (h/2)*((2*exp(t(i)) - 4.*t(i).^3) + (2*exp(t(i+1)) - 4.*t(i+1).^3));
Y_gpu = cumsum(z);
parallel_gpu_time = toc

6.6 Implementation by Runge-Kutta(fourth order)


Sequential code

n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
y(1) = y0;
tic
for i = 1:n
    k1 = h*(2*exp(t(i)) - 4*t(i)^3);
    k2 = h*(2*exp(t(i)+h/2) - 4*(t(i)+h/2)^3);
    k3 = h*(2*exp(t(i)+h/2) - 4*(t(i)+h/2)^3);
    k4 = h*(2*exp(t(i)+h) - 4*(t(i)+h)^3);
    k = 1/6*(k1 + 2*k2 + 2*k3 + k4);
    t(i+1) = t(i) + h;
    y(i+1) = y(i) + k;
end;
sequential_time = toc
Y_sequential = y;

Vectorized code

n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
z(1) = y0;
tic
i = 1:n;
t = t0:h:t1;
k1 = h*(2*exp(t(i)) - 4*t(i).^3);
k2 = h*(2*exp(t(i)+h/2) - 4*(t(i)+h/2).^3);
k3 = h*(2*exp(t(i)+h/2) - 4*(t(i)+h/2).^3);
k4 = h*(2*exp(t(i)+h) - 4*(t(i)+h).^3);
k = 1/6*(k1 + 2*k2 + 2*k3 + k4);
z(i+1) = k;
Y_parallel = cumsum(z);
parallel_time = toc

GPU code

n = 10000000;
h = gpuArray((10 - 0)/n);
t = gpuArray(0:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(2:length(t)-1);
tic
k1 = h*(2*exp(t(i)) - 4*t(i).^3);
k2 = h*(2*exp(t(i)+h/2) - 4*(t(i)+h/2).^3);
k3 = h*(2*exp(t(i)+h/2) - 4*(t(i)+h/2).^3);
k4 = h*(2*exp(t(i)+h) - 4*(t(i)+h).^3);
k = 1/6*(k1 + 2*k2 + 2*k3 + k4);
z(i+1) = k;
Y_gpu = cumsum(z);
parallel_gpu_time = toc
Given ODE 3

dy/dx = log(x)^4 - 3 log(5x) + exp(4x) sin(2x) + cos(x)    (6.3)

6.7 Implementation by Euler Method


Sequential code

n = 10000000;
t0 = 1; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
y(1) = y0;
tic
for i = 1:n
    t(i+1) = t(i) + h;
    y(i+1) = y(i) + h*((log(t(i)))^4 - 3*log(5*t(i)) ...
        + exp(4*t(i))*sin(2*t(i)) + cos(t(i)));
end;
sequential_time = toc;
Y_sequential = y;

Vectorized code

n = 10000000;
t0 = 1; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
z(1) = y0;
i = 1:n;
tic
t = t0:h:t1;
z(i+1) = h*((log(t(i))).^4 - 3*log(5*t(i)) + exp(4*t(i)).*sin(2*t(i)) ...
    + cos(t(i)));
Y_parallel = cumsum(z);
parallel_time = toc;

GPU code

n = 10000000;
h = gpuArray((10 - 1)/n);
t = gpuArray(1:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(2:length(t)-1);
tic
z(i+1) = h*((log(t(i))).^4 - 3*log(5*t(i)) + exp(4*t(i)).*sin(2*t(i)) ...
    + cos(t(i)));
Y_gpu = cumsum(z);
parallel_gpu_time = toc;

6.8 Implementation by Modified Euler Method


Sequential code

n = 10000000;
t0 = 1; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
y(1) = y0;
tic
for i = 1:n
    t(i+1) = t(i) + h;
    %ynew=y(i)+h*f(x(i),y(i));
    y(i+1) = y(i) + (h/2)*(((log(t(i)))^4 - 3*log(5*t(i)) ...
        + exp(4*t(i))*sin(2*t(i)) + cos(t(i))) ...
        + ((log(t(i+1)))^4 - 3*log(5*t(i+1)) ...
        + exp(4*t(i+1))*sin(2*t(i+1)) + cos(t(i+1))));
end
sequential_time = toc
Y_sequential = y;

Vectorized code

n = 10000000;
t0 = 1; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
z(1) = y0;
z = zeros(1, length(t));
z(1) = 1;
i = 1:n;
t = t0:h:t1;
%ynew=y(i)+h*f(x(i),y(i));
tic
z(i+1) = (h/2)*(((log(t(i))).^4 - 3*log(5*t(i)) ...
    + exp(4*t(i)).*sin(2*t(i)) + cos(t(i))) ...
    + ((log(t(i+1))).^4 - 3*log(5*t(i+1)) ...
    + exp(4*t(i+1)).*sin(2*t(i+1)) + cos(t(i+1))));
Y_parallel = cumsum(z);
parallel_time = toc

GPU code

n = 10000000;
h = gpuArray((10 - 1)/n);
t = gpuArray(1:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(2:length(t)-1);
tic
z(i+1) = (h/2)*(((log(t(i))).^4 - 3*log(5*t(i)) ...
    + exp(-4*t(i)).*sin(2*t(i)) + cos(t(i))) ...
    + ((log(t(i+1))).^4 - 3*log(5*t(i+1)) ...
    + exp(-4*t(i+1)).*sin(2*t(i+1)) + cos(t(i+1))));
Y_gpu = cumsum(z);
parallel_gpu_time = toc
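
Because f in (6.3) does not depend on y, the modified-Euler (Heun) corrector needs no predictor iteration and the increment reduces to the trapezoidal rule (h/2)*(f(t_i) + f(t_{i+1})); this is also why the cumsum vectorization remains legitimate here. Written against the shared anonymous function f sketched after (6.3) (an assumption of ours, not code from the report), the update becomes:

% trapezoidal increments via a shared right-hand-side handle f
z(i+1) = (h/2)*(f(t(i)) + f(t(i+1)));
Y_gpu = cumsum(z);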

6.9 Implementation by Runge-Kutta Method (Fourth Order)


Sequential code

n = 10000000;
t0 = 1; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
y(1) = y0;
tic
for i = 1:n
    k1 = h*((log(t(i)))^4 - 3*log(5*t(i)) ...
        + exp(-4*t(i))*sin(2*t(i)) + cos(t(i)));
    k2 = h*((log(t(i)+h/2))^4 - 3*log(5*(t(i)+h/2)) ...
        + exp(-4*(t(i)+h/2))*sin(2*(t(i)+h/2)) + cos(t(i)+h/2));
    k3 = h*((log(t(i)+h/2))^4 - 3*log(5*(t(i)+h/2)) ...
        + exp(-4*(t(i)+h/2))*sin(2*(t(i)+h/2)) + cos(t(i)+h/2));
    k4 = h*((log(t(i)+h))^4 - 3*log(5*(t(i)+h)) ...
        + exp(-4*(t(i)+h))*sin(2*(t(i)+h)) + cos(t(i)+h));
    k = (1/6)*(k1 + 2*k2 + 2*k3 + k4);
    t(i+1) = t(i) + h;
    y(i+1) = y(i) + k;
end
sequential_time = toc;
Y_sequential = y;

Vectorized code

n = 10000000;
t0 = 1; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
z(1) = y0;
tic
i = 1:n;
t = t0:h:t1;
k1 = h*((log(t(i))).^4 - 3*log(5*t(i)) ...
    + exp(-4*t(i)).*sin(2*t(i)) + cos(t(i)));
k2 = h*((log(t(i)+h/2)).^4 - 3*log(5*(t(i)+h/2)) ...
    + exp(-4*(t(i)+h/2)).*sin(2*(t(i)+h/2)) + cos(t(i)+h/2));
k3 = h*((log(t(i)+h/2)).^4 - 3*log(5*(t(i)+h/2)) ...
    + exp(-4*(t(i)+h/2)).*sin(2*(t(i)+h/2)) + cos(t(i)+h/2));
k4 = h*((log(t(i)+h)).^4 - 3*log(5*(t(i)+h)) ...
    + exp(-4*(t(i)+h)).*sin(2*(t(i)+h)) + cos(t(i)+h));
k = (1/6)*(k1 + 2*k2 + 2*k3 + k4);
z(i+1) = k;
Y_parallel = cumsum(z);
parallel_time = toc;

GPU code

n = 10000000;
h = gpuArray((10 - 1)/n);
t = gpuArray(1:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(2:length(t)-1);
tic
k1 = h*((log(t(i))).^4 - 3*log(5*t(i)) ...
    + exp(-4*t(i)).*sin(2*t(i)) + cos(t(i)));
k2 = h*((log(t(i)+h/2)).^4 - 3*log(5*(t(i)+h/2)) ...
    + exp(-4*(t(i)+h/2)).*sin(2*(t(i)+h/2)) + cos(t(i)+h/2));
k3 = h*((log(t(i)+h/2)).^4 - 3*log(5*(t(i)+h/2)) ...
    + exp(-4*(t(i)+h/2)).*sin(2*(t(i)+h/2)) + cos(t(i)+h/2));
k4 = h*((log(t(i)+h)).^4 - 3*log(5*(t(i)+h)) ...
    + exp(-4*(t(i)+h)).*sin(2*(t(i)+h)) + cos(t(i)+h));
k = (1/6)*(k1 + 2*k2 + 2*k3 + k4);
z(i+1) = k;
Y_gpu = cumsum(z);
parallel_gpu_time = toc;
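
With all three variants executed, the timing variables collected above can be turned into a simple speedup and accuracy summary. A minimal sketch (assuming sequential_time, parallel_time, parallel_gpu_time, Y_sequential and Y_gpu are all still in the workspace from the runs above):

Y_host = gather(Y_gpu);   % copy the GPU result back to host memory
fprintf('vectorized speedup over loop: %.2fx\n', sequential_time/parallel_time);
fprintf('GPU speedup over loop       : %.2fx\n', sequential_time/parallel_gpu_time);
fprintf('max |Y_gpu - Y_sequential|  : %e\n', max(abs(Y_host - Y_sequential)));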

Acknowledgments

Our first and sincere appreciation goes to our H.O.D., project guide and project coordinator, Dr. Vishwesh Vyawahare, for all we have learned from him and for his continuous help and support at every stage of this dissertation. We would like to thank Mr. Parag Patil and Mr. Navin Singhaniya for explaining the concepts of GPU programming and resolving our doubts from time to time, and Mr. Narendrakumar Dasre for clearing our doubts regarding differential equations and pointing us to their real-life applications.
Special thanks also go to our Principal, Dr. Ramesh Vasappanavara, for providing the necessary infrastructure and facilities.

Date Signature
