Bachelor of Engineering
(Electronics Engineering)
by
Supervisor
Dr. Vishwesh Vyawahare
Certificate
This is to certify that the project report titled
Bachelor of Engineering
(Electronics Engineering)
to the
University of Mumbai.
Declaration
I declare that this written submission represents my ideas in my own words and
where others' ideas or words have been included, I have adequately cited and referenced
the original sources. I also declare that I have adhered to all principles of academic
honesty and integrity and have not misrepresented or fabricated or falsified any
idea/data/fact/source in my submission. I understand that any violation of the above will
be cause for disciplinary action by the Institute and can also evoke penal action from the
sources which have thus not been properly cited or from whom proper permission has not
been taken when needed.
(Signature)
Date:
Abstract
In this project we use MATLAB software to develop programs and algorithms to
solve some mathematical problems using methods of numerical analysis, and
subsequently implement them on a Graphics Processing Unit (GPU) system to achieve
faster computation and higher throughput.
Numerical analysis is an area of mathematics and computer science that creates, ana-
lyzes, and implements algorithms for obtaining numerical solutions to problems involving
continuous variables. The formal academic area of numerical analysis ranges from quite
theoretical mathematical studies to computer science issues. MATLAB is one of the most
widely used mathematical computing environments in technical computing. It has an
interactive environment which provides high performance computing (HPC) procedures
and is easy to use.
After developing our programs in MATLAB we transfer them to a GPU system to utilize
its parallel computing capability. A GPU has a number of threads, where each thread can
execute a different program. This helps us achieve significantly faster computation than
a normal CPU system. A GPU is a highly parallel computing device. It is designed to
accelerate the analysis of large datasets, such as image, video and voice processing,
or to increase performance in graphics rendering and computer games. The GPU has
gained significant popularity as a powerful tool for high performance computing (HPC)
because of its low cost, flexibility and accessibility.
Keywords:
MATLAB, GPU, Numerical methods, GPGPU, Ordinary differential equations, Initial value
problems.
Contents
Abstract iii
List of Figures vi
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Organisation of the report . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Literature Survey 3
3 Overview 5
3.1 Historical Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.2 Numerical Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.3 Why Study Numerical Methods? . . . . . . . . . . . . . . . . . . . . . . . 5
3.4 Numerical Methods used . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.4.1 Euler's method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.4.2 Modified Euler's method . . . . . . . . . . . . . . . . . . . . . . . 7
3.4.3 Runge-Kutta method of fourth order . . . . . . . . . . . . . . . . . 8
3.4.4 Application of Numerical Methods . . . . . . . . . . . . . . . . . . 8
3.5 Graphics Processing Unit (GPU) . . . . . . . . . . . . . . . . . . . . . . . 9
3.5.1 System specifications . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.6 MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.7 Parallel Computing Toolbox . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.8 Using GPU in Matlab Parallel Computing Toolbox . . . . . . . . . . . . . 15
4 System Methodology 18
4.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.1.1 Implementation by Euler Method . . . . . . . . . . . . . . . . . . . 20
6 Appendix 27
6.1 Implementation by Euler Method . . . . . . . . . . . . . . . . . . . . . . . 27
6.2 Implementation by Modified Euler Method . . . . . . . . . . . . . . . . . . 28
6.3 Implementation by Runge-Kutta (fourth order) . . . . . . . . . . . . . . 29
6.4 Implementation by Euler Method . . . . . . . . . . . . . . . . . . . . . . . 30
6.5 Implementation by Modified Euler Method . . . . . . . . . . . . . . . . . . 31
6.6 Implementation by Runge-Kutta (fourth order) . . . . . . . . . . . . . . 32
6.7 Implementation by Euler Method . . . . . . . . . . . . . . . . . . . . . . . 34
6.8 Implementation by Modified Euler Method . . . . . . . . . . . . . . . . . . 35
6.9 Implementation by Runge-Kutta (fourth order) . . . . . . . . . . . . . . 36
Bibliography 39
Acknowledgments 41
List of Figures
Chapter 1
Introduction
A large problem can usually be divided into smaller tasks that operate together in order
to create a solution. Take, for example, painting a house. Say you need to buy 5 liters of
paint and 5 brushes before painting the whole house. You can either run out and
buy everything and paint the whole house yourself, or you can get help from friends or
hire painters.
You probably want to do the latter: get help. In order to save time, you go out and buy
the paint, and another person gets the brushes. Then you get help from four people who
will each paint one wall of the house. This will save you time because you get help from
many people, working on the same solution in parallel.
This applies to computing as well. Say you want to add two vectors v(x,y,z) and u(x,y,z),
where v=(1,2,3) and u=(4,5,6). You do this by computing v + u = (1,2,3)+(4,5,6) = (1+4,
2+5, 3+6) = (5,7,9). You can do this yourself, one calculation at a time, but as you can
probably see, this problem can be divided into smaller problems. You can have one person
adding the x components together, another adding the y components together and a third
adding the z components together.
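In MATLAB, the environment used throughout this report, this elementwise addition is a single vectorized statement (the variable names here are only illustrative):

```matlab
v = [1 2 3];
u = [4 5 6];
w = v + u;   % all components are added elementwise: w = [5 7 9]
```

Conceptually, each component addition is an independent task that could be handed to a separate worker, which is exactly the kind of parallelism exploited later in this report.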
1.1 Motivation
The main motivation for this project is the recent work being done in this
field. Making processing speeds faster is one of the crucial needs in today's world.
When we look at various complex mathematical computations, we tend to create
structures that will help us perform calculations faster, and this is where the concept of
parallel working comes into action. A Graphics Processing Unit, or GPU, is the technology
that carries out this task in computers.
1.2 Objective
In this project, we intend to show how a numerical method can be solved in a shorter
duration of time using a GPU. We will take some mathematical functions and implement
them on both a CPU and a GPU and compare their speeds, thus showing analytically
by what factor the speed is improved by the implementation.
1.3 Problem Definition
The basic task which we have to fulfill in this project is developing algorithms and
subsequent programs for some numerical methods, particularly for ordinary differential
equations (ODEs). This is the fundamental requirement for successful implementation on
a GPU, as some methods may contain dependent terms, and our major challenge will be
to work out a way to vectorize them so that they can be implemented in parallel by the GPU.
Given the ordinary differential equation functions, we then construct the sequential
MATLAB code for them and furthermore vectorize it. Finally we compare the processing
time of both approaches.
Chapter 2
Literature Survey
K. A. Stroud says that Numerical analysis involves the study of methods of computing
numerical data. In many problems this implies producing a sequence of approximations
by repeating the procedure again and again. People who employ numerical methods for
solving problems have to worry about the following issues: the rate of convergence (how
long does it take for the method to find the answer), the accuracy (or even validity) of
the answer, and the completeness of the response (do other solutions, in addition to the
one found, exist).[10]
According to Kendall E. Atkinson, differential equations are among the most
important mathematical tools used in producing models in the physical sciences, biological
sciences, and engineering. Ordinary differential equations can be solved by numerical
methods such as Euler's method, Runge-Kutta methods and the families of Adams-Bashforth
and Adams-Moulton methods, where accuracy of the result is of less importance than
speed of processing.[8]
GPU computing is the use of a GPU to do general purpose scientific and engineering
computing. As given in the Proceedings of the IEEE (volume 96, issue 5), John D. Owens,
Mike Houston and others discussed that the CPU and GPU together form a heterogeneous
computing model. The sequential part of the application runs on the CPU and the
computationally intensive part runs on the GPU. From the user's perspective, the
application just runs faster because it is using the high performance of the GPU to boost
performance.[14]
A simple way to understand the difference between a GPU and a CPU is to compare
how they process tasks. A CPU consists of a few cores optimized for sequential serial
processing while a GPU has a massively parallel architecture consisting of thousands of
smaller, more efficient cores designed for handling multiple tasks simultaneously and pro-
cessing parallel workloads efficiently.[3]
MATLAB is optimized for operations involving matrices and vectors. As per the R2016b
MATLAB Documentation, the process of revising loop-based, scalar-oriented code to use
MATLAB matrix and vector operations is called vectorization.[4]
Steven C. Chapra discussed various types of mathematical models and numerical
methods like ordinary differential equations (ODEs), initial value problems, etc. Further,
he tried to develop their algorithms and implement them in the programming
environment of MATLAB so that the numerical methods can be used to solve problems in
engineering and science.
MATLAB offers a nice combination of handy programming features with powerful built-
in numerical capabilities. Its M-file programming environment allows us to implement
moderately complicated algorithms in a structured and coherent fashion.[1]
Jung W. Suh and Youngmin Kim have talked about how the implementation of MATLAB
codes can be accelerated using powerful Graphics Processing Units (GPUs) for
computationally heavy projects after simple simulations. Since MATLAB uses a
vector/matrix representation of data, which is suitable for parallel processing, it can
benefit a lot from GPU acceleration. They have further delved into MATLAB programming
on GPU-supported computers and how, with the help of vectorization, speedup can be
achieved compared to implementation on normal CPUs.[5]
In our project we are going to combine the above two approaches and try to develop
algorithms for a few ordinary differential equations (ODEs) and make vectorized
codes for their implementation in MATLAB using the Parallel Computing Toolbox. Lastly,
we are going to run these codes on a GPU-supported computer for faster computation and
throughput. This will help us to analyse our results by comparing the time of execution
and the eventual speedup compared to normal CPUs.
Chapter 3
Overview
3.4.1 Euler's method
This estimate can be substituted into the equation

yi+1 = yi + f(xi, yi)h

This formula is referred to as Euler's method (or the Euler-Cauchy or point-slope method).
A new value of y is predicted using the slope (equal to the first derivative at the original
value of x) to extrapolate linearly over the step size h.
3.4.3 Runge-Kutta method of fourth order
To overcome the inefficiency and unsuitability of Euler's method due to the requirement
of a small h for attaining reasonable accuracy, Runge-Kutta methods are designed to
give greater accuracy, and they possess the advantage of requiring only the function
values at some selected points on the sub-interval. Considering equation 3.1, the general
solution is

yi+1 = yi + (1/6)(k1 + 2k2 + 2k3 + k4)

where
k1 = h f(xi, yi)
k2 = h f(xi + h/2, yi + k1/2)        (3.5)
k3 = h f(xi + h/2, yi + k2/2)
k4 = h f(xi + h, yi + k3)
3.4.4 Application of Numerical Methods
Various complex circuits can be modelled as an RLC circuit, whose response can be found
by solving the governing differential equation using numerical methods.
Location services use numerical methods to approximate the location of the device.
3.5 Graphics Processing Unit (GPU)
A graphics processing unit (GPU), also occasionally called visual processing unit (VPU),
is a specialized electronic circuit designed to rapidly manipulate and alter memory to
accelerate the creation of images in a frame buffer intended for output to a display.
GPUs are used in embedded systems, mobile phones, personal computers, workstations,
and game consoles. Modern GPUs are very efficient at manipulating computer graphics
and image processing, and their highly parallel structure makes them more efficient than
general-purpose CPUs for algorithms where the processing of large blocks of data is done
in parallel.
The term GPU was popularized by Nvidia, which in 1999 marketed the GeForce 256 as
the world's first GPU, or Graphics Processing Unit. The GPU's advanced capabilities
were originally used primarily for 3D game rendering, but now those capabilities are
being harnessed more broadly to accelerate computational workloads in areas such as
financial modeling, cutting-edge scientific research and oil and gas exploration. GPUs are
optimized for taking huge batches of data and performing the same operation over and
over very quickly, unlike PC microprocessors, which tend to skip all over the place.
Architecturally, the CPU is composed of just a few cores with lots of cache memory that
can handle a few software threads at a time. In contrast, a GPU is composed of hundreds
of cores that can handle thousands of threads simultaneously. The ability of a GPU
with 100+ cores to process thousands of threads can accelerate some software by 100x
over a CPU alone. What's more, the GPU achieves this acceleration while being more
power- and cost-efficient than a CPU. GPUs have thousands of cores to process parallel
workloads efficiently. In hybrid CPU-GPU systems, CPUs and GPUs are used together in
a heterogeneous co-processing computing model. Computationally intensive parts which
can be processed in a massively parallel manner are accelerated by the GPU, in order to
benefit from its high computing performance, while the CPU, among other tasks, works
on sequential algorithms. Overall, the application runs faster, and the sharing of tasks
makes the processing of computationally intensive algorithms very efficient. The
performance advantage of graphics processing units makes this technology particularly
interesting for scientific applications. GPU computing is the use of a GPU to do general
purpose scientific and engineering computing. Its introduction opened new doors in the
field of scientific computing.
Figure 3.6: Difference between CPU and GPU [4]
3.6 MATLAB
MATLAB is a programming language developed by MathWorks. It started out as a matrix
programming language in which linear algebra programming was simple. MATLAB can be
used for a range of computations, including:
Figure 3.7: GPU Specifications
Linear Algebra
Statistics
Data Analysis
Numerical Calculations
Integration
Transforms
Curve Fitting
Features of MATLAB
It provides built-in graphics for visualizing data and tools for creating custom plots.
MATLAB's programming interface gives development tools for improving code quality
and maintainability and for maximizing performance.
It provides functions for integrating MATLAB-based algorithms with external
applications and languages such as C, Java, .NET and Microsoft Excel.
Uses of MATLAB
Control Systems
Computational Finance
Computational Biology
User Interface Of Matlab
The MATLAB development IDE can be launched from the icon created on the desktop. The
main working window in MATLAB is called the desktop. When MATLAB is started, the
desktop appears in its default layout.
The desktop has the following panels:
Current Folder: This panel allows you to access the project folders and files.
Command Window: This is the main area where commands can be entered at the
command line.
Figure 3.8: Understanding Matlab Environment [22]
The M Files
Figure 3.9: Use of M-file [22]
MATLAB provides support for NVIDIA GPUs through the Parallel Computing Toolbox.
This support allows engineers and scientists to make MATLAB computations faster,
without having to do reprogramming or increase equipment cost.
MATLAB was written mostly for serial computation. That means it is run on a single
computer having a single Central Processing Unit (CPU). The problem is divided into a
number of serial instructions, which are executed sequentially. Parallel computing, by
contrast, is a computing method which executes many computations (processes)
simultaneously. The principle of parallel computing is that a large problem can often be
divided into smaller pieces, which are then solved concurrently (in parallel). In other
words, parallel computing is the use of multiple compute resources to solve a
computational problem simultaneously. The main advantages of parallel computing are:
1) save time and/or money;
2) solve larger problems;
3) provide concurrency;
4) use non-local resources;
5) overcome the limits of serial computing.
MATLAB provides useful tools for parallel processing from the Parallel Computing
Toolbox. The toolbox provides diverse methods for parallel processing, such as multiple
computers working via a network, several cores in multicore machines, cluster computing
as well as GPU parallel processing. Within the scope of this project, we focus more on
the GPU part of the Parallel Computing Toolbox. One of the good things about the toolbox
is that we can take advantage of GPUs without explicit CUDA programming or C-MEX
programming. However, this comes at the price of having to install the Parallel Computing
Toolbox. This chapter discusses GPU processing for built-in MATLAB functions.
Figure 3.10: Acceleration by GPU [3]
Full use of multicore processors on the desktop via workers that run locally.
Computer cluster and grid support (with MATLAB Distributed Computing Server).
Distributed arrays and single program multiple data (spmd) constructs for large
dataset handling and data-parallel algorithms.
MATLAB developers have encapsulated all the GPU functionality within the Parallel
Computing Toolbox into a class called the parallel.gpu.GPUArray class. Here is a simple
example to illustrate the ease of using the class in your MATLAB program. The example
creates a random 4x4 matrix and computes the sine of its values.
All this is done on the main CPU. To perform the operation on the GPU, the values
first need to be transferred to the GPU, using the gpuArray() function. Next, the sin()
operation (overloaded to run on the GPU processors automatically) will operate on the
transferred data, within the GPU workspace. The final function, gather(), transfers the
data back from the GPU to the local CPU workspace. The example illustrates how
MATLAB has overloaded many of its built-in functions to run on the GPU hardware.
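A minimal sketch of the workflow just described would look like the following; the exact array size and variable names in Figure 3.11 may differ:

```matlab
A = rand(4, 4);       % random 4x4 matrix created in the CPU workspace
G = gpuArray(A);      % transfer the data to the GPU
S = sin(G);           % overloaded sin() executes on the GPU
result = gather(S);   % transfer the result back to the CPU workspace
```

Because sin() is overloaded for gpuArray inputs, no CUDA code is needed; the same MATLAB syntax runs on either device depending on where the data lives.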
Figure 3.11: Matrix creation in CPU [5]
To find out which MATLAB functions are overloaded to use the GPUArray class, type:
methods(parallel.gpu.GPUArray) at the MATLAB prompt.
1. gpuArray
It is used to create an array on GPU.
Syntax :
G = gpuArray(X)
Description :
G = gpuArray(X) copies the numeric array X to the GPU, and returns a gpuArray
object. One can operate on this array by passing its gpuArray to the feval method of
a CUDA kernel object, or by using one of the methods defined for gpuArray objects
in Establish Arrays on a GPU. The MATLAB array X must be numeric (for example:
single, double, int8, etc.) or logical, and the GPU device must have sufficient free
memory to store the data. If the input argument is already a gpuArray, the output is
the same as the input. The gather command can be used to retrieve the array from
the GPU to the MATLAB workspace.
2. gather
It is used to transfer distributed array or gpuArray to local workspace.
Syntax :
X = gather(A)
Description :
X = gather(A) can operate inside an spmd (single program multiple data) statement,
pmode, or communicating job to gather together the elements of a codistributed
array, or outside an spmd statement to gather the elements of a distributed array.
If you execute this inside an spmd statement, pmode, or communicating job, X is a
replicated array with all the elements of the array on every worker. If you execute
this outside an spmd statement, X is an array in the local workspace, with the
elements transferred from the multiple workers.
X = gather(distributed(X)) returns the original array X. For a gpuArray input,
X = gather(A) transfers the array elements from the GPU to the local workspace.
If the input argument to gather is not a distributed, a codistributed, or a gpuArray,
the output is the same as the input.
Chapter 4
System Methodology
Sequential codes: These are codes which execute in an ordered sequence, that is, one
after the other with respect to a predetermined condition. For example, take a vector
v = [1 2 3 4] and let us write a sequential code in MATLAB to square every element of
the vector, as shown:
v=[1 2 3 4];
for i=1:4
    v(i)=v(i)^2;
end
This code will square every element of the vector v sequentially, that is, one element at
a time.
Vectorized codes: These are codes which work on every element of a matrix or vector
in parallel, that is, at the same time. Taking the same example, let us write a vectorized
code for squaring the elements of vector v, as shown:
v=[1 2 3 4];
v=v.^2;
This code will work on all four elements at the same time. Thus, as a result, the
vectorized codes are faster and less time-consuming than sequential codes.
Tic-toc command: The TIC and TOC functions of MATLAB work together to measure
elapsed time. tic, by itself, saves the current time that TOC uses later to measure the
time elapsed between the two. toc, by itself, displays the elapsed time, in seconds, since
the most recent execution of the TIC command.
TSTART = tic saves the time to an output argument, TSTART. The numeric value of
TSTART is only useful as an input argument for a subsequent call to TOC.
T = toc; saves the elapsed time in T as a double scalar. toc(TSTART) measures the time
elapsed since the TIC command that generated TSTART.
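As a small illustration of the timing pattern used throughout this report (the timed operation here is just a placeholder):

```matlab
tic                        % start the stopwatch
y = cumsum(rand(1, 1e7));  % some work to be timed
elapsed = toc;             % elapsed time in seconds since the tic
```

The same tic/elapsed-time pattern surrounds the sequential, vectorized and GPU codes in the sections that follow.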
A sample implementation of the Euler method on a particular ODE, its sequential,
vectorized and GPU executed codes, and their respective speedups are shown in the
following section.
Given ODE

dy/dx = 3 sin(2x) cos(3x)        (4.1)
Exact solution of the ODE

y(x) = (12(10 tan(x/2)^2 - 20 tan(x/2)^4 + 30 tan(x/2)^6
- 5 tan(x/2)^8 + 1))/(5(tan(x/2)^2 + 1)^5) - 7/5        (4.2)
Sequential codes are written using iterative for loops. Here we evaluate the output
solution matrix based on input samples. We have taken 10^7 samples, that is, the
operation written inside the for loop is executed 10^7 times. The sequential code for the
solution of ODE 4.1 is as follows:
n=10000000;
t0=0; t1=10; y0=1;
h=(t1-t0)/n;
t(1)=t0; y(1)=y0;
tic
for i=1:n
    t(i+1)=t(i)+h;
    y(i+1)=y(i)+h*(3*sin(2*t(i))*cos(3*t(i)));
end;
sequential_time=toc;
Y_sequential=y;
Profiler
To find the part of the code which takes maximum time, we use the Run and Time feature
of Matlab. This opens up a Profiler which gives us a line by line analysis of the time taken
by all parts of the code. The part which takes maximum time is vectorized and
subsequently given to the GPU for speedup.
Figure 4.3: Profiler showing line by line analysis of time taken by the complete code
Vectorized code
In contrast to sequential code, where each value of a matrix is evaluated one after the
other, in a vectorized code all the values of a matrix are evaluated together at the same
time, that is in parallel. Hence, a vectorized code takes considerably less time than a
sequential code.
Vectorization of a code helps us to further do its parallel processing on a GPU. The
vectorized version of the earlier sequential code is as shown,
n=10000000;
t0=0; t1=10; y0=1;
h=(t1-t0)/n;
t(1)=t0; z(1)=y0;
i=1:n;
tic
t=t0:h:t1;
z(i+1)=h*(3.*sin(2*t(i)).*cos(3*t(i)));
Y_parallel=cumsum(z);
parallel_time=toc;
GPU code
For getting speedup, the same vectorized code written previously is executed on a GPU
system, but before that the required input variables are transferred from the CPU to the
GPU using the gpuArray() command. After the code is executed by the GPU, the
calculated output values are transferred from the GPU to the CPU using the gather()
command. The GPU code for the vectorized code is as shown:
n=10000000;
h=gpuArray((10-0)/n);
t=gpuArray(0:h:10);
z=gpuArray(zeros(1,length(t)));
z(1)=1;
i=gpuArray(2:length(t)-1);
tic
z(i+1)=h*(3.*sin(2*t(i)).*cos(3*t(i)));
y_gpu=cumsum(z);
parallel_gpu_time=toc;
Y_gpu=gather(y_gpu);
For our project we have taken eight ODE functions and developed sequential codes to
solve them using the Euler, Modified Euler and Runge-Kutta (fourth order) methods.
Then we did their profiling using the Run and Time feature of Matlab and developed their
vectorized codes for speedup. These codes were subsequently converted for their
implementation on GPU, to get maximum speedup.
The speedup analysis of the eight functions using all three methods is given in chapter 5.
For sample implementation and analysis we have taken one particular ODE out of the
eight and compared the speedup of its sequential, vectorized and GPU executed codes
using all three numerical methods. Their respective codes are given in chapter 6.
The speedup analysis of the above ODE and its comparison with the exact solution are
shown in chapter 5.
Chapter 5
5.1 Results
We solved ODE 4.1 using numerical methods implemented in Matlab and compared the
numerical solution with its exact solution (equation 4.2).
Figure 5.1 shows the superposition of the graph of the exact solution given by equation 4.2
(shown in green) with that of the numerical solution of the ODE solved by any one of the
three numerical methods (shown in red).
The ODE 4.1 was solved using codes for the Euler, Modified Euler and Runge-Kutta
(fourth order) methods, and their respective speedup comparison for sequential,
vectorized and GPU codes is tabulated in the following table.
Figure 5.2: Graphical variation of Speed up with respect to various sample sizes
Also, the graphical variation of the speedup of ODE 4.1 with respect to various sample
sizes for all three numerical methods is shown in figure 5.2.
Similarly, we have developed codes for, and analysed the speedup of, eight such ODE
functions using all three numerical methods, as shown in figure 5.3.
5.2 Conclusion
The main conclusion we can draw is in the difference of processing times. Clearly, the
time taken by the vectorized codes was in all cases less than that of their scalar
counterparts. Therefore, removing the loops from the program helped us to lessen the
run time of the functions. Demonstrating this for multiple problems implies that any
function, when programmed in vectorized form, will take less time to generate the same
precise output in the end.
Figure 5.3: Speedup of ODE functions using different numerical methods
Moreover, it can be clearly seen from the analysis how the GPU implementation
improved the processing speed of the functions. It can also be concluded that an increase
in the complexity of the functions increases the speed-up. Another important observation
concerns the sample size: an improved speed-up was achieved on the GPU for a larger
sample size, in comparison with its sequential and vectorized counterparts.
On the hardware side, the execution speed offered by the GPU depends on its number
of cores and its Compute Capability. The higher these two parameters are, the higher,
logically, will be the speedup offered by the GPU.
Chapter 6
Appendix
The sequential, vectorized and GPU codes developed for the ODEs to be solved using the
Euler, Modified Euler and Runge-Kutta (fourth order) methods are as follows:
Given ODE 1

dy/dx = x^2 - 2x + 6        (6.1)
6.1 Implementation by Euler Method
n=10000000;
t0=0; t1=10;
y0=1;
h=(t1-t0)/n;
t(1)=t0;
y(1)=y0;
tic
for i=1:n
    t(i+1)=t(i)+h;
    y(i+1)=y(i)+h*(t(i)^2-2*t(i)+6);
end;
sequential_time=toc
Y_sequential=y;
Vectorized code
n=10000000;
t0=0; t1=10;
y0=1;
h=(t1-t0)/n;
t(1)=t0;
z(1)=y0;
i=1:n;
tic
t=t0:h:t1;
z(i+1)=h*(t(i).^2-2*t(i)+6);
Y_parallel=cumsum(z);
parallel_time=toc;
GPU code
n=10000000;
h=gpuArray((10-0)/n);
t=gpuArray(0:h:10);
z=gpuArray(zeros(1,length(t)));
z(1)=1;
i=gpuArray(2:length(t)-1);
tic
z(i+1)=h*(t(i).^2-2*t(i)+6);
Y_gpu=cumsum(z);
parallel_gpu_time=toc
6.2 Implementation by Modified Euler Method
n=10000000;
t0=0; t1=10;
y0=1;
h=(t1-t0)/n;
t(1)=t0;
y(1)=y0;
tic
for i=1:n
    t(i+1)=t(i)+h;
    y(i+1)=y(i)+(h/2)*((t(i)^2-2*t(i)+6)+(t(i+1)^2-2*t(i+1)+6));
end
sequential_time=toc
Y_sequential=y;
Vectorized code
n=10000000;
t0=0; t1=10;
y0=1;
h=(t1-t0)/n;
t(1)=t0;
z(1)=y0;
z=zeros(1,length(t));
z(1)=1;
i=1:n;
t=t0:h:t1;
tic
z(i+1)=(h/2)*((t(i).^2-2.*t(i)+6)+(t(i+1).^2-2.*t(i+1)+6));
Y_parallel=cumsum(z);
parallel_time=toc
GPU code
n=10000000;
h=gpuArray((10-0)/n);
t=gpuArray(0:h:10);
z=gpuArray(zeros(1,length(t)));
z(1)=1;
i=gpuArray(2:length(t)-1);
tic
z(i+1)=(h/2)*((t(i).^2-2.*t(i)+6)+(t(i+1).^2-2.*t(i+1)+6));
Y_gpu=cumsum(z);
parallel_gpu_time=toc
6.3 Implementation by Runge-Kutta (fourth order)
n=10000000;
t0=0; t1=10;
y0=1;
h=(t1-t0)/n;
t(1)=t0;
y(1)=y0;
tic
for i=1:n
    k1=h*(t(i)^2-2*t(i)+6);
    k2=h*((t(i)+h/2)^2-2*(t(i)+h/2)+6);
    k3=h*((t(i)+h/2)^2-2*(t(i)+h/2)+6);
    k4=h*((t(i)+h)^2-2*(t(i)+h)+6);
    k=1/6*(k1+2*k2+2*k3+k4);
    t(i+1)=t(i)+h;
    y(i+1)=y(i)+k;
end;
sequential_time=toc
Y_sequential=y;
Vectorized code
n=10000000;
t0=0; t1=10;
y0=1;
h=(t1-t0)/n;
t(1)=t0;
z(1)=y0;
tic
i=1:n;
t=t0:h:t1;
k1=h*(t(i).^2-2.*t(i)+6);
k2=h*((t(i)+h/2).^2-2.*(t(i)+h/2)+6);
k3=h*((t(i)+h/2).^2-2.*(t(i)+h/2)+6);
k4=h*((t(i)+h).^2-2.*(t(i)+h)+6);
k=1/6*(k1+2*k2+2*k3+k4);
z(i+1)=k;
Y_parallel=cumsum(z);
parallel_time=toc
GPU code
n=10000000;
h=gpuArray((10-0)/n);
t=gpuArray(0:h:10);
z=gpuArray(zeros(1,length(t)));
z(1)=1;
i=gpuArray(2:length(t)-1);
tic
k1=h*(t(i).^2-2.*t(i)+6);
k2=h*((t(i)+h/2).^2-2.*(t(i)+h/2)+6);
k3=h*((t(i)+h/2).^2-2.*(t(i)+h/2)+6);
k4=h*((t(i)+h).^2-2.*(t(i)+h)+6);
k=1/6*(k1+2*k2+2*k3+k4);
z(i+1)=k;
Y_gpu=cumsum(z);
parallel_gpu_time=toc
Given ODE 2

dy/dx = 2 exp(x) - 4x^3        (6.2)
6.4 Implementation by Euler Method
n=10000000;
t0=0; t1=10;
y0=1;
h=(t1-t0)/n;
t(1)=t0;
y(1)=y0;
tic
for i=1:n
    t(i+1)=t(i)+h;
    y(i+1)=y(i)+h*(2*exp(t(i))-4*t(i)^3);
end;
sequential_time=toc
Y_sequential=y;
Vectorized code
n=10000000;
t0=0; t1=10;
y0=1;
h=(t1-t0)/n;
t(1)=t0;
z(1)=y0;
i=1:n;
tic
t=t0:h:t1;
z(i+1)=h*(2*exp(t(i))-4*t(i).^3);
Y_parallel=cumsum(z);
parallel_time=toc
GPU code
n = 10000000;
h = gpuArray((10 - 0)/n);
t = gpuArray(0:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(1:length(t)-1);   % step indices; z(1) already holds y0
tic
z(i+1) = h*(2*exp(-t(i)) - 4*t(i).^3);
Y_gpu = cumsum(z);
parallel_gpu_time = toc
n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
y(1) = y0;
tic
for i = 1:n
    t(i+1) = t(i) + h;
    y(i+1) = y(i) + (h/2)*((2*exp(-t(i)) - 4*t(i)^3) ...
        + (2*exp(-t(i+1)) - 4*t(i+1)^3));
end
sequential_time = toc
Y_sequential = y;
Vectorized code
n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
i = 1:n;
t = t0:h:t1;
z = zeros(1, length(t));   % preallocate after t is fully defined
z(1) = y0;
tic
z(i+1) = (h/2)*((2*exp(-t(i)) - 4.*t(i).^3) ...
    + (2*exp(-t(i+1)) - 4.*t(i+1).^3));
Y_parallel = cumsum(z);
parallel_time = toc
GPU code
n = 10000000;
h = gpuArray((10 - 0)/n);
t = gpuArray(0:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(1:length(t)-1);   % step indices; z(1) already holds y0
tic
z(i+1) = (h/2)*((2*exp(-t(i)) - 4.*t(i).^3) ...
    + (2*exp(-t(i+1)) - 4.*t(i+1).^3));
Y_gpu = cumsum(z);
parallel_gpu_time = toc
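When the right-hand side does not involve y, each Heun step is exactly one trapezoid, so the whole integration collapses to a cumulative trapezoidal sum. A short NumPy cross-check of that equivalence (Python used only for illustration; f follows Eq. (6.2) as reconstructed here, with a deliberately small n):

```python
import numpy as np

# Heun's method for dy/dx = 2*exp(-x) - 4*x^3, y(0) = 1, over [0, 10].
f = lambda x: 2*np.exp(-x) - 4*x**3
n = 1000
x = np.linspace(0.0, 10.0, n + 1)
h = 10.0 / n

# Sequential Heun loop.
y = np.empty(n + 1)
y[0] = 1.0
for i in range(n):
    y[i + 1] = y[i] + (h/2) * (f(x[i]) + f(x[i + 1]))

# Vectorized equivalent: per-step trapezoids, then a cumulative sum.
z = (h/2) * (f(x[:-1]) + f(x[1:]))
y_vec = np.concatenate(([1.0], 1.0 + np.cumsum(z)))

print(np.allclose(y, y_vec))  # prints: True
```

This is the structure the vectorized and GPU Heun listings above exploit: the expensive part (evaluating f at every grid point) is embarrassingly parallel, and only the cheap cumsum is inherently sequential.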
n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
y(1) = y0;
tic
for i = 1:n
    k1 = h*(2*exp(-t(i)) - 4*t(i)^3);
    k2 = h*(2*exp(-(t(i) + h/2)) - 4*(t(i) + h/2)^3);
    k3 = h*(2*exp(-(t(i) + h/2)) - 4*(t(i) + h/2)^3);
    k4 = h*(2*exp(-(t(i) + h)) - 4*(t(i) + h)^3);
    k = 1/6*(k1 + 2*k2 + 2*k3 + k4);
    t(i+1) = t(i) + h;
    y(i+1) = y(i) + k;
end
sequential_time = toc
Y_sequential = y;
Vectorized code
n = 10000000;
t0 = 0; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
z(1) = y0;
tic
i = 1:n;
t = t0:h:t1;
k1 = h*(2*exp(-t(i)) - 4*t(i).^3);
k2 = h*(2*exp(-(t(i) + h/2)) - 4*(t(i) + h/2).^3);
k3 = h*(2*exp(-(t(i) + h/2)) - 4*(t(i) + h/2).^3);
k4 = h*(2*exp(-(t(i) + h)) - 4*(t(i) + h).^3);
k = 1/6*(k1 + 2*k2 + 2*k3 + k4);
z(i+1) = k;
Y_parallel = cumsum(z);
parallel_time = toc
GPU code
n = 10000000;
h = gpuArray((10 - 0)/n);
t = gpuArray(0:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(1:length(t)-1);   % step indices; z(1) already holds y0
tic
k1 = h*(2*exp(-t(i)) - 4*t(i).^3);
k2 = h*(2*exp(-(t(i) + h/2)) - 4*(t(i) + h/2).^3);
k3 = h*(2*exp(-(t(i) + h/2)) - 4*(t(i) + h/2).^3);
k4 = h*(2*exp(-(t(i) + h)) - 4*(t(i) + h).^3);
k = 1/6*(k1 + 2*k2 + 2*k3 + k4);
z(i+1) = k;
Y_gpu = cumsum(z);
parallel_gpu_time = toc
Given ODE 3

dy/dx = (log(x))^4 - 3 log(5x) + exp(-4x) sin(2x) + cos(x)      (6.3)
n = 10000000;
t0 = 1; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
y(1) = y0;
tic
for i = 1:n
    t(i+1) = t(i) + h;
    y(i+1) = y(i) + h*((log(t(i)))^4 - 3*log(5*t(i)) ...
        + exp(-4*t(i))*sin(2*t(i)) + cos(t(i)));
end
sequential_time = toc
Y_sequential = y;
Vectorized code
n = 10000000;
t0 = 1; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
z(1) = y0;
i = 1:n;
tic
t = t0:h:t1;
z(i+1) = h*((log(t(i))).^4 - 3*log(5*t(i)) ...
    + exp(-4*t(i)).*sin(2*t(i)) + cos(t(i)));
Y_parallel = cumsum(z);
parallel_time = toc
GPU code
n = 10000000;
h = gpuArray((10 - 1)/n);
t = gpuArray(1:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(1:length(t)-1);   % step indices; z(1) already holds y0
tic
z(i+1) = h*((log(t(i))).^4 - 3*log(5*t(i)) ...
    + exp(-4*t(i)).*sin(2*t(i)) + cos(t(i)));
Y_gpu = cumsum(z);
parallel_gpu_time = toc
n = 10000000;
t0 = 1; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
y(1) = y0;
tic
for i = 1:n
    t(i+1) = t(i) + h;
    %ynew = y(i) + h*f(x(i), y(i));
    y(i+1) = y(i) + (h/2)*(((log(t(i)))^4 - 3*log(5*t(i)) + exp(-4*t(i)) ...
        *sin(2*t(i)) + cos(t(i))) + ((log(t(i+1)))^4 - 3*log(5*t(i+1)) ...
        + exp(-4*t(i+1))*sin(2*t(i+1)) + cos(t(i+1))));
end
sequential_time = toc
Y_sequential = y;
Vectorized code
n = 10000000;
t0 = 1; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
i = 1:n;
t = t0:h:t1;
z = zeros(1, length(t));   % preallocate after t is fully defined
z(1) = y0;
%ynew = y(i) + h*f(x(i), y(i));
tic
z(i+1) = (h/2)*(((log(t(i))).^4 - 3*log(5*t(i)) + exp(-4*t(i)) ...
    .*sin(2*t(i)) + cos(t(i))) + ((log(t(i+1))).^4 - 3*log(5*t(i+1)) ...
    + exp(-4*t(i+1)).*sin(2*t(i+1)) + cos(t(i+1))));
Y_parallel = cumsum(z);
parallel_time = toc
GPU code
n = 10000000;
h = gpuArray((10 - 1)/n);
t = gpuArray(1:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(1:length(t)-1);   % step indices; z(1) already holds y0
tic
z(i+1) = (h/2)*(((log(t(i))).^4 - 3*log(5*t(i)) + exp(-4*t(i)) ...
    .*sin(2*t(i)) + cos(t(i))) + ((log(t(i+1))).^4 - 3*log(5*t(i+1)) ...
    + exp(-4*t(i+1)).*sin(2*t(i+1)) + cos(t(i+1))));
Y_gpu = cumsum(z);
parallel_gpu_time = toc
n = 10000000;
t0 = 1; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
y(1) = y0;
tic
for i = 1:n
    k1 = h*((log(t(i)))^4 - 3*log(5*t(i)) + exp(-4*t(i))*sin(2*t(i)) ...
        + cos(t(i)));
    k2 = h*((log(t(i) + h/2))^4 - 3*log(5*(t(i) + h/2)) + exp(-4*(t(i) + h/2)) ...
        *sin(2*(t(i) + h/2)) + cos(t(i) + h/2));
    k3 = h*((log(t(i) + h/2))^4 - 3*log(5*(t(i) + h/2)) + exp(-4*(t(i) + h/2)) ...
        *sin(2*(t(i) + h/2)) + cos(t(i) + h/2));
    k4 = h*((log(t(i) + h))^4 - 3*log(5*(t(i) + h)) + exp(-4*(t(i) + h)) ...
        *sin(2*(t(i) + h)) + cos(t(i) + h));
    k = 1/6*(k1 + 2*k2 + 2*k3 + k4);
    t(i+1) = t(i) + h;
    y(i+1) = y(i) + k;
end
sequential_time = toc
Y_sequential = y;
Vectorized code
n = 10000000;
t0 = 1; t1 = 10;
y0 = 1;
h = (t1 - t0)/n;
t(1) = t0;
z(1) = y0;
tic
i = 1:n;
t = t0:h:t1;
k1 = h*((log(t(i))).^4 - 3*log(5*t(i)) + exp(-4*t(i)).*sin(2*t(i)) ...
    + cos(t(i)));
k2 = h*((log(t(i) + h/2)).^4 - 3*log(5*(t(i) + h/2)) + exp(-4*(t(i) + h/2)) ...
    .*sin(2*(t(i) + h/2)) + cos(t(i) + h/2));
k3 = h*((log(t(i) + h/2)).^4 - 3*log(5*(t(i) + h/2)) + exp(-4*(t(i) + h/2)) ...
    .*sin(2*(t(i) + h/2)) + cos(t(i) + h/2));
k4 = h*((log(t(i) + h)).^4 - 3*log(5*(t(i) + h)) + exp(-4*(t(i) + h)) ...
    .*sin(2*(t(i) + h)) + cos(t(i) + h));
k = 1/6*(k1 + 2*k2 + 2*k3 + k4);
z(i+1) = k;
Y_parallel = cumsum(z);
parallel_time = toc
GPU code
n = 10000000;
h = gpuArray((10 - 1)/n);
t = gpuArray(1:h:10);
z = gpuArray(zeros(1, length(t)));
z(1) = 1;
i = gpuArray(1:length(t)-1);   % step indices; z(1) already holds y0
tic
k1 = h*((log(t(i))).^4 - 3*log(5*t(i)) + exp(-4*t(i)).*sin(2*t(i)) ...
    + cos(t(i)));
k2 = h*((log(t(i) + h/2)).^4 - 3*log(5*(t(i) + h/2)) + exp(-4*(t(i) + h/2)) ...
    .*sin(2*(t(i) + h/2)) + cos(t(i) + h/2));
k3 = h*((log(t(i) + h/2)).^4 - 3*log(5*(t(i) + h/2)) + exp(-4*(t(i) + h/2)) ...
    .*sin(2*(t(i) + h/2)) + cos(t(i) + h/2));
k4 = h*((log(t(i) + h)).^4 - 3*log(5*(t(i) + h)) + exp(-4*(t(i) + h)) ...
    .*sin(2*(t(i) + h)) + cos(t(i) + h));
k = 1/6*(k1 + 2*k2 + 2*k3 + k4);
z(i+1) = k;
Y_gpu = cumsum(z);
parallel_gpu_time = toc
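The three methods trade extra evaluations of f per step for accuracy: Euler is first order, Heun second order, and RK4 fourth order. The sketch below (illustrative Python, not part of the report's MATLAB listings; a deliberately small n) computes the end-point error of each method on ODE 1, whose exact solution is known in closed form:

```python
import numpy as np

# Accuracy comparison on dy/dt = t^2 - 2t + 6, y(0) = 1,
# with exact solution y(t) = t^3/3 - t^2 + 6t + 1.
f = lambda t: t**2 - 2*t + 6
exact = lambda t: t**3/3 - t**2 + 6*t + 1

def solve(n, step):
    """March from t = 0 to t = 10 in n steps using the given step rule."""
    h = 10.0 / n
    t = np.linspace(0.0, 10.0, n + 1)
    y = np.empty(n + 1)
    y[0] = 1.0
    for i in range(n):
        y[i + 1] = y[i] + step(t[i], h)
    return y

def euler(t, h):
    return h * f(t)

def heun(t, h):
    return (h/2) * (f(t) + f(t + h))

def rk4(t, h):
    k1 = h * f(t)
    k2 = h * f(t + h/2)
    k3 = k2            # k3 equals k2 because f is independent of y
    k4 = h * f(t + h)
    return (k1 + 2*k2 + 2*k3 + k4) / 6

errs = {}
for name, step in [("Euler", euler), ("Heun", heun), ("RK4", rk4)]:
    y = solve(200, step)
    errs[name] = abs(y[-1] - exact(10.0))
print(errs)  # error shrinks sharply from Euler to Heun to RK4
```

For this quadratic right-hand side the RK4 increment reduces to Simpson's rule, so its error is essentially rounding noise, while Euler and Heun show their expected first- and second-order behaviour.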
Bibliography

[1] Steven C. Chapra, Applied Numerical Methods with MATLAB for Engineers and Scientists.

[2] Rudra Pratap, Getting Started with MATLAB: A Quick Introduction for Scientists and Engineers.

[3] NVIDIA. Product news from NVIDIA India. Retrieved 9 September 2016, from http://www.nvidia.in/object/gpu-computing-in.html

[5] Jung W. Suh, Youngmin Kim, Accelerating MATLAB with GPU Computing: A Primer with Examples.

[9] B.S. Grewal, Numerical Methods in Engineering and Science: with Programs in C and C++.

[10] K.A. Stroud, Engineering Mathematics, 5th Edition, Palgrave, 2001.

[11] Amos Gilat, MATLAB: An Introduction with Applications, 2nd Edition, John Wiley & Sons, 2004.

[12] H. Anton, I. Bivens, S. Davis, Calculus, 8th Edition, John Wiley & Sons, 2005.

[14] John D. Owens, Mike Houston, David Luebke, Simon Green, John E. Stone, and James C. Phillips, GPU Computing, Proceedings of the IEEE, May 2008.

[15] John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger, Aaron E. Lefohn, and Tim Purcell, A Survey of General-Purpose Computation on Graphics Hardware, Computer Graphics Forum, March 2007.

[17] J.D. Owens, M. Houston, D. Luebke, S. Green, J.E. Stone and J.C. Phillips, GPU Computing, Proc. IEEE, vol. 96, May 2008.

[18] T. Preis, P. Virnau, W. Paul and J.J. Schneider, GPU Accelerated Monte Carlo Simulation of the 2D and 3D Ising Model, Journal of Computational Physics.

[19] X. Bai, John Junkins, Solving Initial Value Problems by the Picard-Chebyshev Method with NVIDIA GPUs, Advances in the Astronautical Sciences, January 2010.

[20] Karsten Ahnert, Denis Demidov and Mario Mulansky, Solving Ordinary Differential Equations on GPUs. Retrieved 11 September 2016, from http://www.mariomulansky.de/data/uploads/ncwg.pdf

[21] Yiyu Cai, Simon See, GPU Computing and Applications: A Collection of State of the Art Research on GPU Computing and Application, Springer, November 2014.

[23] I. Podlubny, Parallel algorithms for initial and boundary value problems for linear ordinary differential equations and their systems, Kybernetika, vol. 32.

[24] W. Jie Liu, Chunye Gong, Weimin Bao, Guojian Tang and Yuewen Jiang, Solving the Caputo Fractional Reaction-Diffusion Equation on GPU, published 17 June 2014.

[25] K. Diethelm, An efficient parallel algorithm for the numerical solution of fractional differential equations, Fractional Calculus and Applied Analysis, vol. 14, no. 3, 2011.

[26] K. Diethelm, N.J. Ford, A.D. Freed, Yu. Luchko, Algorithms for the fractional calculus: A selection of numerical methods. Retrieved 11 September 2016, from http://www.sciencedirect.com/science/article/pii/S0045782504002981
Acknowledgments

My first and sincere appreciation goes to my H.O.D., project guide and project coordinator, Dr. Vishwesh Vyawahare, for all I have learned from him and for his continuous help and support at all stages of this dissertation. I would like to thank Mr. Parag Patil and Mr. Navin Singhaniya for explaining the concepts of GPU programming and resolving our doubts from time to time, and Mr. Narendrakumar Dasre for resolving our doubts regarding differential equations and introducing us to their real-life applications.
Special thanks also to our Principal, Dr. Ramesh Vasappanavara, for providing the necessary infrastructure and facilities.

Date                                                  Signature