
Convolution: Mathematically, convolution is an array operation in which each output data element is a weighted sum of a collection of neighboring input elements. The weights used in the weighted sum calculation are defined by an input mask array, commonly referred to as the convolution kernel. This name conflicts with CUDA kernel functions, so to avoid confusion we will refer to these mask arrays as convolution masks. The same convolution mask is used for all elements of the array. In other words, the convolution of two functions f(t) and g(t), written f(t)*g(t), is the area under the product of the two curves as one of them is reversed and shifted across the other.
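The weighted-sum definition above can be illustrated with a minimal host-side sketch. The helper name `weighted_sum_at` and the sample values are assumptions chosen for illustration and do not come from the original text; the sketch only handles output elements whose whole window lies inside the input array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original text): computes one output
// element as the weighted sum of the inputs centered on index i.
// Assumes an odd mask length and a window that fits entirely inside `in`.
float weighted_sum_at(const std::vector<float>& in,
                      const std::vector<float>& mask,
                      std::size_t i) {
    assert(mask.size() % 2 == 1);           // symmetric mask: 2*n + 1 weights
    const std::size_t n = mask.size() / 2;  // neighbors on each side of i
    assert(i >= n && i + n < in.size());    // window must not leave the array
    float sum = 0.0f;
    for (std::size_t j = 0; j < mask.size(); ++j) {
        sum += in[i - n + j] * mask[j];     // weight j applies to input i-n+j
    }
    return sum;
}
```

For example, with the input {1, 2, 3, 4, 5, 6, 7} and the mask {3, 4, 5, 4, 3}, the element at index 2 is 1*3 + 2*4 + 3*5 + 4*4 + 5*3 = 57.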

IMPLEMENTATION OF CONVOLUTION: The first step is to define the major input parameters for the kernel. We assume that the 1D convolution kernel receives five arguments: pointer to input array N, pointer to input mask M, pointer to output array P, size of the mask Mask_Width, and size of the input and output arrays Width. Thus, we have the following setup:

    __global__ void convolution_1D_basic_kernel(float *N, float *M, float *P, int Mask_Width, int Width) {
        // kernel body
    }

The second step is to determine and implement the mapping of threads to output elements. The output array is one dimensional, so a simple and good approach is to organize the threads into a 1D grid and have each thread in the grid calculate one output element. As far as the output elements are concerned, this is the same arrangement as the vector addition example. Therefore, we can use the following statement to calculate an output element index from the block index, block dimension, and thread index for each thread:

    int i = blockIdx.x*blockDim.x + threadIdx.x;

Once we have determined the output element index, we can access the input N elements and the mask M elements using offsets to the output element index. For simplicity, we assume that Mask_Width is an odd number and the convolution is symmetric, that is, Mask_Width is 2*n+1 where n is an integer. The calculation of P[i] will use N[i-n], N[i-n+1], ..., N[i-1], N[i], N[i+1], ..., N[i+n-1], N[i+n]. We can use a simple loop to do this calculation in the kernel:

    float Pvalue = 0;
    int N_start_point = i - (Mask_Width/2);
    for (int j = 0; j < Mask_Width; j++) {
        if (N_start_point + j >= 0 && N_start_point + j < Width) {
            Pvalue += N[N_start_point + j]*M[j];
        }
    }
    P[i] = Pvalue;

The variable Pvalue allows all intermediate results to be accumulated in a register to save DRAM bandwidth. The for loop accumulates all the contributions from the neighboring elements to the output P element. The if statement tests whether any of the input N elements used are ghost elements, that is, elements that fall outside the bounds of the N array. Such elements are skipped, which is equivalent to treating them as 0.
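To check the output of the basic algorithm without a GPU, the same loop can be run sequentially on the host. The following is a minimal sketch under that assumption. The function name `convolution_1d_reference` is hypothetical, and the outer loop over i stands in for the one-thread-per-output mapping described above.

```cpp
#include <cassert>
#include <vector>

// Host-side reference (an assumption for checking results, not the
// original kernel). Each iteration of the outer loop plays the role of
// one thread i; out-of-range (ghost) inputs are skipped, i.e. treated as 0.
std::vector<float> convolution_1d_reference(const std::vector<float>& N,
                                             const std::vector<float>& M) {
    assert(M.size() % 2 == 1);  // Mask_Width is assumed to be 2*n + 1
    const int width = static_cast<int>(N.size());
    const int mask_width = static_cast<int>(M.size());
    std::vector<float> P(N.size(), 0.0f);
    for (int i = 0; i < width; ++i) {
        float p_value = 0.0f;                  // accumulator, like Pvalue
        const int start = i - mask_width / 2;  // like N_start_point
        for (int j = 0; j < mask_width; ++j) {
            const int k = start + j;
            if (k >= 0 && k < width) {
                p_value += N[k] * M[j];
            }
        }
        P[i] = p_value;
    }
    return P;
}
```

With the input values 0 through 15 and the mask {2, 3, 4, 5, 6} that appear in the source code section of this document, this reference yields 17, 32, 50, 70, 90, 110, 130, 150, 170, 190, 210, 230, 250, 270, 194, 128. The first two and last two values are smaller than the interior pattern because part of their window consists of ghost elements.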

SOURCE CODE:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>
#include <helper_string.h>  // checkCmdLineFlag, getCmdLineArgumentInt (CUDA Samples helper header)

#define TILE_SIZE 4         // threads per block; must match blockSize in main
#define MAX_MASK_WIDTH 5    // compile-time constant, required to size the shared array

__global__ void convolution_1D_basic_kernel(float *N, float *P, float *M, int Mask_Width, int Width) {
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    __shared__ float N_ds[TILE_SIZE + MAX_MASK_WIDTH - 1];
    int n = Mask_Width/2;

    // Left halo: the last n threads of the block load the tail of the previous tile
    int halo_index_left = (blockIdx.x - 1)*blockDim.x + threadIdx.x;
    if (threadIdx.x >= blockDim.x - n) {
        N_ds[threadIdx.x - (blockDim.x - n)] = (halo_index_left < 0) ? 0 : N[halo_index_left];
    }

    // Center elements of the tile
    N_ds[n + threadIdx.x] = N[blockIdx.x*blockDim.x + threadIdx.x];

    // Right halo: the first n threads of the block load the head of the next tile
    int halo_index_right = (blockIdx.x + 1)*blockDim.x + threadIdx.x;
    if (threadIdx.x < n) {
        N_ds[n + blockDim.x + threadIdx.x] = (halo_index_right >= Width) ? 0 : N[halo_index_right];
    }
    __syncthreads();

    float Pvalue = 0;
    for (int j = 0; j < Mask_Width; j++) {
        Pvalue += N_ds[threadIdx.x + j]*M[j];
    }
    P[i] = Pvalue;
}

int main(int argc, char **argv) {
    int devID = 0;
    if (checkCmdLineFlag(argc, (const char **)argv, "device")) {
        devID = getCmdLineArgumentInt(argc, (const char **)argv, "device");
        // Sets the device
        cudaSetDevice(devID);
    }

    cudaError_t error;
    cudaDeviceProp deviceProp;
    error = cudaGetDevice(&devID);
    // Checks whether the device is valid, i.e. whether the error is cudaSuccess
    if (error != cudaSuccess) {
        // Prints the error when cudaGetDevice returned an error
        printf("cudaGetDevice returned error code %d, line(%d)\n", error, __LINE__);
    }

    // Gets the device properties
    error = cudaGetDeviceProperties(&deviceProp, devID);

    // If the device is in compute mode prohibited
    if (deviceProp.computeMode == cudaComputeModeProhibited) {
        // Prints the error message
        fprintf(stderr, "Error: device is running in <Compute Mode Prohibited>, no threads can use ::cudaSetDevice().\n");
        exit(EXIT_SUCCESS);
    }

    // If the error is not cudaSuccess
    if (error != cudaSuccess) {
        printf("cudaGetDeviceProperties returned error code %d, line(%d)\n", error, __LINE__);
    } else {
        // Prints the GPU device and its properties
        printf("GPU Device %d: \"%s\" with compute capability %d.%d\n\n", devID, deviceProp.name, deviceProp.major, deviceProp.minor);
    }

    int n = 16;
    float P[16];
    float N[16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
    float M[5] = {2,3,4,5,6};
    float *d_N, *d_M, *d_P;
    cudaMalloc((void **)&d_N, 16*sizeof(float));
    cudaMalloc((void **)&d_M, 5*sizeof(float));
    cudaMalloc((void **)&d_P, 16*sizeof(float));
    cudaMemcpy(d_N, N, 16*sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_M, M, 5*sizeof(float), cudaMemcpyHostToDevice);

    int blockSize, gridSize;
    int MASK_WIDTH, WIDTH;
    MASK_WIDTH = 5;
    WIDTH = 16;  // length of the input and output arrays

    // Number of threads in each thread block (must equal TILE_SIZE)
    blockSize = 4;

    // Number of thread blocks in grid
    gridSize = (int)ceil((float)n/blockSize);

    convolution_1D_basic_kernel<<<gridSize, blockSize>>>(d_N, d_P, d_M, MASK_WIDTH, WIDTH);

    cudaMemcpy(P, d_P, 16*sizeof(float), cudaMemcpyDeviceToHost);
    printf("Output: P:\n");
    for (int i = 0; i < 16; i++) {
        printf("%f ", P[i]);  // P holds floats, so %f rather than %d
    }
    printf("\n");

    // Release device memory
    cudaFree(d_N);
    cudaFree(d_M);
    cudaFree(d_P);

    // The host arrays N, P and M live on the stack, so they need no free()
    return 0;
}
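The index arithmetic used by the kernel above to fill the shared tile (left halo, center elements, right halo) can be checked on the host. The following is a minimal CPU emulation, offered as a sketch and not as the original implementation. The function name `convolution_1d_tiled_emulation` and the `tile_size` parameter are hypothetical, and the emulation assumes, as the kernel does, that the array length is a multiple of the tile size.

```cpp
#include <cassert>
#include <vector>

// CPU emulation of the tiled scheme (hypothetical helper). Each "block"
// copies tile_size center elements plus n halo elements per side into a
// local tile, then computes its outputs from that tile only.
std::vector<float> convolution_1d_tiled_emulation(const std::vector<float>& N,
                                                   const std::vector<float>& M,
                                                   int tile_size) {
    assert(M.size() % 2 == 1 && tile_size > 0);
    const int width = static_cast<int>(N.size());
    const int mask_width = static_cast<int>(M.size());
    const int n = mask_width / 2;
    assert(width % tile_size == 0);  // the kernel has no guard for a partial block
    std::vector<float> P(N.size(), 0.0f);
    std::vector<float> tile(tile_size + 2 * n);  // plays the role of N_ds
    for (int block = 0; block < width / tile_size; ++block) {
        const int first = block * tile_size - n;  // global index held in tile[0]
        for (int t = 0; t < tile_size + 2 * n; ++t) {
            const int k = first + t;
            tile[t] = (k >= 0 && k < width) ? N[k] : 0.0f;  // ghost elements become 0
        }
        for (int t = 0; t < tile_size; ++t) {  // t plays the role of threadIdx.x
            float p_value = 0.0f;
            for (int j = 0; j < mask_width; ++j) {
                p_value += tile[t + j] * M[j];
            }
            P[block * tile_size + t] = p_value;
        }
    }
    return P;
}
```

In this emulation, tile[t] always holds the global element block*tile_size - n + t. The three shared-memory stores in the kernel compute that same mapping, split across the threads that load the left halo, the center and the right halo.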

SEQUENCE DIAGRAM:

FUNCTION DIAGRAM:
