
PROJECT SUPERVISOR

DR. PRASANNA MISRA

RAVI SHANKAR MEHTA (IEC2014087)


DEEPAK BAIRWA (IEC2014044)
ALOK RAJ (IBM2014036)

1).Abstract
Division is a complex operation because, unlike the other arithmetic operations, it has an anomaly: division by
zero is undefined. Division plays an important role in many computations, e.g. image compression. We therefore
implemented a gate-level circuit that divides two 16-bit numbers. The basic building block is a binary adder, which is
extended to a 16-bit adder and then to the divider. The divider can be implemented on an FPGA; we analyse the design
and its schematics and obtain a power analysis of the design.
2).FPGA
A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a
designer after manufacturing, hence "field-programmable". The FPGA configuration is generally specified using a
hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC). (Circuit
diagrams were previously used to specify the configuration, as they were for ASICs, but this is increasingly rare.)
FPGAs contain an array of programmable logic blocks and a hierarchy of reconfigurable interconnects that allow the
blocks to be "wired together", like many logic gates that can be inter-wired in different configurations. Logic blocks
can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In
most FPGAs, logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of
memory.
3).DIVISION
Division algorithms can be grouped into two classes according to their iterative operator. The first class, where
subtraction is the iterative operator, contains many familiar algorithms (such as non-restoring division). These are
relatively slow, as their execution time is proportional to the operand (divisor) length. The second, higher-speed
class uses multiplication as the iterative operator. Here the algorithm converges quadratically;
its execution time is proportional to log2 of the divisor length.
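The text does not name a specific multiplicative algorithm. As an illustration only, the sketch below assumes Newton-Raphson reciprocal iteration, one well-known member of this class, and uses exact fractions to show what quadratic convergence means: the error 1 - d*x is squared at every step, so the number of correct bits roughly doubles per iteration. The function name and the starting guess are hypothetical.

```python
from fractions import Fraction


def reciprocal_errors(d, x0, steps):
    """Return the error 1 - d*x before each Newton-Raphson step for 1/d.

    Assumed iteration (not taken from the report): x <- x * (2 - d * x).
    """
    d = Fraction(d)
    x = Fraction(x0)
    errors = []
    for _ in range(steps):
        errors.append(1 - d * x)   # error of the current estimate
        x = x * (2 - d * x)        # one multiplicative refinement step
    return errors


# Example: approximate 1 / 0.75 starting from the guess x0 = 1.
errs = reciprocal_errors(Fraction(3, 4), 1, 5)
for k, e in enumerate(errs):
    print(k, e)
```

Because each step squares the error, a w-bit result needs on the order of log2(w) iterations, which is the behaviour the paragraph above describes.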
Division by subtraction
This algorithm is simple and well known, because it is the long division taught in elementary school. It is
described below:
1. Align the divisor with the most significant bit of the dividend and call the shifted value div. Set i to the
   number of shifts. Set the remainder R to the dividend and the quotient Q to zero.
2. Shift Q one bit left and compute T = R - div.
   If T >= 0, set Q(0) and let R = T.
   If T < 0, clear Q(0).
3. Shift div one bit right and decrement i.
4. If i = -1, Q is the quotient and R is the remainder; otherwise repeat from step 2.

This algorithm has many variations, but its function is always the same.
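A minimal software sketch of division by subtraction may make the steps concrete. It assumes that R starts as the dividend, that Q is shifted left and div is shifted right (the shift directions used by the first Verilog implementation in section 4), and that the operands are unsigned. The function name is hypothetical.

```python
def divide_by_subtraction(dividend, divisor):
    """Shift-and-subtract division of unsigned integers; returns (Q, R)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero is undefined")

    # Step 1: align the divisor with the MSB of the dividend.
    shifts = max(dividend.bit_length() - divisor.bit_length(), 0)
    div = divisor << shifts
    i = shifts
    r = dividend          # running remainder
    q = 0                 # quotient, built MSB first

    while i >= 0:
        # Step 2: trial subtraction decides the next quotient bit.
        q <<= 1
        t = r - div
        if t >= 0:
            q |= 1        # set Q(0)
            r = t
        # Step 3: move to the next lower bit position.
        div >>= 1
        i -= 1
    # Step 4: i == -1, so q and r are final.
    return q, r


print(divide_by_subtraction(100, 7))   # (14, 2)
```

The loop runs once per alignment shift plus one, so its execution time grows linearly with the operand length, as stated for the subtractive class.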

Division by multiplying
This algorithm is described mainly in [1] and uses multiplication and comparison/subtraction. For a 16-bit
division it requires a 16-bit multiplication with a 32-bit result. With N the dividend, D the divisor and Q the
quotient, the algorithm is described below. Here Q(8-n) denotes the quotient bit currently being tested.
1. Set Q(8-n) = 1 and the bits to the right of Q(8-n) to 0.
2. Calculate D*Q.
3. Calculate N - (D*Q), or compare N with D*Q.
   a. If N - (D*Q) >= 0, i.e. N >= D*Q, leave Q(8-n) at 1.
   b. If N - (D*Q) < 0, i.e. N < D*Q, clear Q(8-n) to 0.
4. Increment n and repeat from step 2 until all quotient bits have been determined.

As mentioned above, the multiplier result must be twice as wide as its operands, so wider divisions need 32-bit
multipliers with a 64-bit result, which is not a very good solution for an FPGA. However, if division of integers
longer than 16 bits is not required, the 16-bit version can be realized on an Artix-7 FPGA.
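The trial-multiplication steps can be sketched in software as follows. This is an illustrative model under stated assumptions, not the implementation from [1]: the operands are unsigned, the quotient is `width` bits wide (16 here), and bits are tested from the most significant to the least significant. The function name and the `width` parameter are hypothetical.

```python
def divide_by_multiplying(n, d, width=16):
    """Find Q bit by bit using only multiplication and comparison.

    Returns (Q, R) with n == d * Q + R and 0 <= R < d.
    """
    if d == 0:
        raise ZeroDivisionError("division by zero is undefined")
    q = 0
    for bit in range(width - 1, -1, -1):
        trial = q | (1 << bit)   # step 1: tentatively set the current bit
        product = d * trial      # step 2: D*Q, up to 2*width bits wide
        if product <= n:         # step 3: compare N with D*Q
            q = trial            #   3a: keep the bit at 1
        # 3b: otherwise the bit stays 0
    return q, n - d * q


print(divide_by_multiplying(100, 7))   # (14, 2)
```

Each of the `width` iterations needs one multiplication whose product is twice as wide as the operands, which is why the multiplier size dominates the hardware cost of this method.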

4).CODE
First Implementation and its Report
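`timescale 1ns / 1ps
// Sequential shift-and-subtract divider. The quotient and remainder are packed
// into the 32-bit word "binary" and shown on the seven-segment display.
module simple_divider(dividend,start,clk,d1r,d2r,a,b,c,d,e,f,g,dp,an,ready);
  input wire [15:0] dividend;
  input start, clk, d1r, d2r;        // d1r and d2r are not used inside this module
  output a,b,c,d,e,f,g,dp;
  output ready;
  output [7:0] an;
  reg [15:0] quotient;
  reg [15:0] dividend_copy, divider_copy, diff;
  wire [15:0] divisor;
  wire [15:0] remainder = dividend_copy[15:0];
  reg [4:0] bit;                     // iteration counter
  wire ready = !bit;                 // high when the counter is zero
  wire [31:0] binary;
  bch myb(binary,clk,clear,a,b,c,d,e,f,g,dp,an);   // "clear" is an implicit net with no driver
  wire out_clk;
  clk_div myclk(clk,clear,out_clk);  // out_clk is not used
  assign divisor = 16'b0011101010111010;           // fixed divisor (decimal 15034)
  initial bit = 0;
  assign binary = {quotient,remainder};
  always @( posedge clk )
    if( ready && start ) begin
      // Load phase.
      bit = 16;
      quotient = 0;
      // Each concatenation below is 32 bits wide; assigning it to a 16-bit
      // register keeps only the low 16 bits.
      dividend_copy = {1'b0,dividend,15'd0};
      divider_copy = {1'b0,divisor,15'd0};
    end else begin
      // One shift-and-subtract step per clock.
      diff = dividend_copy - divider_copy;
      quotient = quotient << 1;
      if( !diff[15] ) begin            // diff[15] is used as the sign bit
        dividend_copy = diff;
        quotient[0] = 1'd1;
      end
      divider_copy = divider_copy >> 1;
      bit = bit - 1;
    end
endmodule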
module clk_div
#(
  parameter WIDTH = 14, // width of the counter register
  parameter N = 5000    // clk_out toggles once every N cycles of clk
)
(clk,reset, clk_out);
  input clk;
  input reset;
  output clk_out;
  reg [WIDTH-1:0] r_reg;
  wire [WIDTH-1:0] r_nxt;
  reg clk_track;
  always @(posedge clk or posedge reset)
  begin
    if (reset)
    begin
      r_reg <= 0;
      clk_track <= 1'b0;
    end
    else if (r_nxt == N)
    begin
      r_reg <= 0;
      clk_track <= ~clk_track;
    end
    else
      r_reg <= r_nxt;
  end
  assign r_nxt = r_reg+1;
  assign clk_out = clk_track;
endmodule

module bch(binary,clk,clr,a,b,c,d,e,f,g,dp,an);
  input [31:0]binary;
  input clk,clr;
  reg [3:0]i0,i1,i2,i3,i4,i5,i6,i7;
  output a,b,c,d,e,f,g,dp;
  output [7:0]an;
  sevenseg mys(clk,clr,i0,i1,i2,i3,i4,i5,i6,i7,a,b,c,d,e,f,g,dp,an);
  // split the 32-bit word into eight 4-bit hex digits; i7 is the most significant
  always@(binary)
  begin
    {i7,i6,i5,i4,i3,i2,i1,i0} = binary;
  end
endmodule

module sevenseg(
  input clock, reset,
  input [3:0]in0,in1,in2,in3,in4,in5,in6,in7, // the 8 hex digits, one for each display
  output a, b, c, d, e, f, g, dp, // the individual LED outputs for the seven segments and the decimal point
  output [7:0] an // the 8 bit digit enable signal
);
  localparam N = 40;
  reg [N-1:0]count; // N bit counter; with N = 40 its three MSBs advance once every 2^37 clock cycles
  always @ (posedge clock or posedge reset)
  begin
    if (reset)
      count <= 0;
    else
      count <= count + 1;
  end
  reg [3:0]sseg; // the 4 bit register that holds the digit to output
  reg [7:0]an_temp; // register for the 8 bit enable; the selected digit's bit is driven low
  always @ (*)
  begin
    case(count[N-1:N-3]) // using only the 3 MSBs of the counter
      3'b000 :
      begin
        sseg = in0;
        an_temp = 8'b11111110;
      end
      3'b001:
      begin
        sseg = in1;
        an_temp = 8'b11111101;
      end
      3'b010:
      begin
        sseg = in2;
        an_temp = 8'b11111011;
      end
      3'b011:
      begin
        sseg = in3;
        an_temp = 8'b11110111;
      end
      3'b100:
      begin
        sseg = in4;
        an_temp = 8'b11101111;
      end
      3'b101:
      begin
        sseg = in5;
        an_temp = 8'b11011111;
      end
      3'b110:
      begin
        sseg = in6;
        an_temp = 8'b10111111;
      end
      3'b111:
      begin
        sseg = in7;
        an_temp = 8'b01111111;
      end
    endcase
  end
  assign an = an_temp;
  reg [6:0] sseg_temp; // 7 bit register to hold the segment pattern of the selected digit
  always @ (*)
  begin
    case(sseg)
      4'b0000 : sseg_temp = 7'b1000000; //to display 0
      4'b0001 : sseg_temp = 7'b1111001; //to display 1
      4'b0010 : sseg_temp = 7'b0100100; //to display 2
      4'b0011 : sseg_temp = 7'b0110000; //to display 3
      4'b0100 : sseg_temp = 7'b0011001; //to display 4
      4'b0101 : sseg_temp = 7'b0010010; //to display 5
      4'b0110 : sseg_temp = 7'b0000010; //to display 6
      4'b0111 : sseg_temp = 7'b1111000; //to display 7
      4'b1000 : sseg_temp = 7'b0000000; //to display 8
      4'b1001 : sseg_temp = 7'b0010000; //to display 9
      4'b1010 : sseg_temp = 7'b0001000; //to display A
      4'b1011 : sseg_temp = 7'b0011100; //to display B
      4'b1100 : sseg_temp = 7'b1000110; //to display C
      4'b1101 : sseg_temp = 7'b0100011; //to display D
      4'b1110 : sseg_temp = 7'b0000110; //to display E
      4'b1111 : sseg_temp = 7'b0001100; //to display F
      default : sseg_temp = 7'b0111111; //dash
    endcase
  end
  assign {g, f, e, d, c, b, a} = sseg_temp; // map the register bits onto the segment outputs
  assign dp = 1'b1; // the decimal point is not needed, so it is turned off on every digit
endmodule

Synthesis Report
Starting RTL Elaboration : Time (s): cpu = 00:00:09 ; elapsed = 00:00:11 . Memory (MB): peak = 303.012 ; gain = 96.363
Parameter WIDTH bound to: 14 - type: integer
Parameter N bound to: 5000 - type: integer
Finished RTL Elaboration : Time (s): cpu = 00:00:10 ; elapsed = 00:00:13 . Memory (MB): peak = 340.367 ; gain = 133.719
Finished Constraint Validation : Time (s): cpu = 00:00:24 ; elapsed = 00:00:28 . Memory (MB): peak = 642.441 ; gain = 435.793

Start Loading Part and Timing Information
Loading part: xc7a100tcsg324-3
Finished Loading Part and Timing Information : Time (s): cpu = 00:00:24 ; elapsed = 00:00:28 . Memory (MB): peak = 642.441 ; gain = 435.793
Start Applying 'set_property' XDC Constraints
Finished applying 'set_property' XDC Constraints : Time (s): cpu = 00:00:24 ; elapsed = 00:00:28 . Memory (MB): peak = 642.441 ; gain = 435.793
Finished RTL Optimization Phase 2 : Time (s): cpu = 00:00:24 ; elapsed = 00:00:28 . Memory (MB): peak = 642.441 ; gain = 435.793

Report RTL Partitions:

Detailed RTL Component Info :
+---Adders :
       3 Input     16 Bit       Adders := 1
       2 Input     14 Bit       Adders := 1
       2 Input      5 Bit       Adders := 1
+---Registers :
                   16 Bit    Registers := 3
                    5 Bit    Registers := 1
                    1 Bit    Registers := 1
+---Muxes :
       2 Input     16 Bit        Muxes := 1
       8 Input      8 Bit        Muxes := 1
      16 Input      7 Bit        Muxes := 1
       8 Input      4 Bit        Muxes := 1
       2 Input      1 Bit        Muxes := 1
Finished RTL Component Statistics
Module clk_div
Detailed RTL Component Info :
+---Adders :
       2 Input     14 Bit       Adders := 1
+---Registers :
                    1 Bit    Registers := 1
Finished RTL Hierarchical Component Statistics

Report Cell Usage:
+------+-------+------+
|      |Cell   |Count |
+------+-------+------+
|1     |BUFG   |     1|
|2     |CARRY4 |    14|
|3     |LUT1   |    58|
|4     |LUT2   |     2|
|5     |LUT3   |     1|
|6     |LUT4   |     9|
|7     |LUT5   |    10|
|8     |LUT6   |     1|
|9     |MUXF7  |     4|
|10    |FDRE   |    72|
|11    |FDSE   |     1|
|12    |IBUF   |     3|
|13    |OBUF   |    15|
|14    |OBUFT  |     2|
+------+-------+------+

Report Instance Areas:
+------+---------+---------+------+
|      |Instance |Module   |Cells |
+------+---------+---------+------+
|1     |top      |         |   193|
|2     |  myb    |bch      |   109|
|3     |    mys  |sevenseg |   109|
+------+---------+---------+------+

Slice Logic
+-------------------------+------+-------+-----------+-------+
|        Site Type        | Used | Fixed | Available | Util% |
+-------------------------+------+-------+-----------+-------+
| Slice LUTs*             |   36 |     0 |     63400 |  0.06 |
|   LUT as Logic          |   36 |     0 |     63400 |  0.06 |
|   LUT as Memory         |    0 |     0 |     19000 |  0.00 |
| Slice Registers         |   73 |     0 |    126800 |  0.06 |
|   Register as Flip Flop |   73 |     0 |    126800 |  0.06 |
|   Register as Latch     |    0 |     0 |    126800 |  0.00 |
| F7 Muxes                |    4 |     0 |     31700 |  0.01 |
| F8 Muxes                |    0 |     0 |     15850 |  0.00 |
+-------------------------+------+-------+-----------+-------+

IO and GT Specific
+-----------------------------+------+-------+-----------+-------+
|          Site Type          | Used | Fixed | Available | Util% |
+-----------------------------+------+-------+-----------+-------+
| Bonded IOB                  |   20 |     0 |       210 |  9.52 |
| Bonded IPADs                |    0 |     0 |         2 |  0.00 |
| PHY_CONTROL                 |    0 |     0 |         6 |  0.00 |
| PHASER_REF                  |    0 |     0 |         6 |  0.00 |
| OUT_FIFO                    |    0 |     0 |        24 |  0.00 |
| IN_FIFO                     |    0 |     0 |        24 |  0.00 |
| IDELAYCTRL                  |    0 |     0 |         6 |  0.00 |
| IBUFDS                      |    0 |     0 |       202 |  0.00 |
| PHASER_OUT/PHASER_OUT_PHY   |    0 |     0 |        24 |  0.00 |
| PHASER_IN/PHASER_IN_PHY     |    0 |     0 |        24 |  0.00 |
| IDELAYE2/IDELAYE2_FINEDELAY |    0 |     0 |       300 |  0.00 |
| ILOGIC                      |    0 |     0 |       210 |  0.00 |
| OLOGIC                      |    0 |     0 |       210 |  0.00 |
+-----------------------------+------+-------+-----------+-------+

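Global Clock:
+-------+-------+-----------------+------------+----------------+--------------+-------------+--------------+----------------------+
| GlbID | SrcId | Driver Type/Pin | Constraint | Site           | Clock Region | Clock Loads | Clock Period | Clock                |
+-------+-------+-----------------+------------+----------------+--------------+-------------+--------------+----------------------+
| G0    | SRC0  | BUFG/O          | None       | BUFGCTRL_X0Y16 |            1 |          73 |           10 | clk_IBUF_BUFG_inst/O |
+-------+-------+-----------------+------------+----------------+--------------+-------------+--------------+----------------------+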

Second Implementation
`timescale 1ns / 1ps
// Combinational restoring divider: the loop is unrolled into WIDTH
// subtract/restore stages by synthesis.
module division(A,Res);
  //the size of input and output ports of the division module is generic.
  parameter WIDTH = 16;
  //input and output ports.
  input [WIDTH-1:0] A;
  output [WIDTH-1:0] Res;
  //internal variables
  reg [WIDTH-1:0] Res = 0;
  reg [WIDTH-1:0] a1,b1;
  reg [WIDTH:0] p1;
  wire [WIDTH-1:0] B;
  assign B=16'b0000000000000111;   // fixed divisor (decimal 7)
  integer i;
  always@ (A or B)
  begin
    //initialize the variables.
    a1 = A;
    b1 = B;
    p1= 0;
    for(i=0;i < WIDTH;i=i+1) begin //start the for loop
      p1 = {p1[WIDTH-2:0],a1[WIDTH-1]};   // shift the next dividend bit into the partial remainder
      a1[WIDTH-1:1] = a1[WIDTH-2:0];      // shift the dividend/quotient register left
      p1 = p1-b1;                         // trial subtraction
      if(p1[WIDTH-1] == 1) begin          // bit WIDTH-1 is used as the sign bit
        a1[0] = 0;
        p1 = p1 + b1; end                 // restore the partial remainder
      else
        a1[0] = 1;
    end
    Res = a1;                             // a1 now holds the quotient
  end
endmodule

Adders :
       2 Input     16 Bit       Adders := 16
       2 Input     15 Bit       Adders := 15

+-------------------------+------+-------+-----------+-------+
|        Site Type        | Used | Fixed | Available | Util% |
+-------------------------+------+-------+-----------+-------+
| Slice LUTs*             |  393 |     0 |     63400 |  0.62 |
|   LUT as Logic          |  393 |     0 |     63400 |  0.62 |
|   LUT as Memory         |    0 |     0 |     19000 |  0.00 |
| Slice Registers         |    0 |     0 |    126800 |  0.00 |
|   Register as Flip Flop |    0 |     0 |    126800 |  0.00 |
|   Register as Latch     |    0 |     0 |    126800 |  0.00 |
| F7 Muxes                |    0 |     0 |     31700 |  0.00 |
| F8 Muxes                |    0 |     0 |     15850 |  0.00 |
+-------------------------+------+-------+-----------+-------+

+----------+------+---------------------+
| Ref Name | Used | Functional Category |
+----------+------+---------------------+
| LUT1     |  236 |                 LUT |
| LUT2     |  173 |                 LUT |
| CARRY4   |  109 |          CarryLogic |
| OBUF     |   16 |                  IO |
| IBUF     |   16 |                  IO |
+----------+------+---------------------+
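To check the loop of the division module by hand, a bit-accurate software model can help. The sketch below is an assumption-laden mirror of the Verilog statements (same register widths, same use of bit WIDTH-1 as the sign bit), written in Python so it can be run directly; the function name is hypothetical and the divisor is passed as an argument instead of being hard-wired to 7. Because the sign test looks at bit WIDTH-1, the model is only expected to be correct for divisors from 1 up to 2^(WIDTH-1) - 1.

```python
def restoring_divide(a, b, width=16):
    """Model of the division module's for loop; returns (quotient, remainder)."""
    mask_w = (1 << width) - 1          # a1, b1 are `width` bits wide
    mask_p = (1 << (width + 1)) - 1    # p1 is `width + 1` bits wide
    a1 = a & mask_w
    b1 = b & mask_w
    p1 = 0
    for _ in range(width):
        # p1 = {p1[WIDTH-2:0], a1[WIDTH-1]}
        p1 = ((p1 & ((1 << (width - 1)) - 1)) << 1) | (a1 >> (width - 1))
        # a1[WIDTH-1:1] = a1[WIDTH-2:0]; bit 0 keeps its old value for now
        a1 = ((a1 << 1) & mask_w) | (a1 & 1)
        # trial subtraction, wrapped to the width of p1
        p1 = (p1 - b1) & mask_p
        if (p1 >> (width - 1)) & 1:    # negative result: restore
            a1 &= ~1                   # a1[0] = 0
            p1 = (p1 + b1) & mask_p
        else:
            a1 |= 1                    # a1[0] = 1
    return a1, p1


print(restoring_divide(100, 7))   # (14, 2)
```

After the last iteration the partial remainder p1 holds the remainder, although the Verilog module only drives the quotient to its Res output.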

+-----------------------------+------+-------+-----------+-------+
|          Site Type          | Used | Fixed | Available | Util% |
+-----------------------------+------+-------+-----------+-------+
| Bonded IOB                  |   32 |     0 |       210 | 15.24 |
| Bonded IPADs                |    0 |     0 |         2 |  0.00 |
| PHY_CONTROL                 |    0 |     0 |         6 |  0.00 |
| PHASER_REF                  |    0 |     0 |         6 |  0.00 |
| OUT_FIFO                    |    0 |     0 |        24 |  0.00 |
| IN_FIFO                     |    0 |     0 |        24 |  0.00 |
| IDELAYCTRL                  |    0 |     0 |         6 |  0.00 |
| IBUFDS                      |    0 |     0 |       202 |  0.00 |
| PHASER_OUT/PHASER_OUT_PHY   |    0 |     0 |        24 |  0.00 |
| PHASER_IN/PHASER_IN_PHY     |    0 |     0 |        24 |  0.00 |
| IDELAYE2/IDELAYE2_FINEDELAY |    0 |     0 |       300 |  0.00 |
| ILOGIC                      |    0 |     0 |       210 |  0.00 |
| OLOGIC                      |    0 |     0 |       210 |  0.00 |
+-----------------------------+------+-------+-----------+-------+

+----------------+-----------+----------+-----------+-----------------+
| On-Chip        | Power (W) | Used     | Available | Utilization (%) |
+----------------+-----------+----------+-----------+-----------------+
| Clocks         |     0.000 |        3 |       --- |             --- |
| Slice Logic    |     0.025 |      520 |       --- |             --- |
|   LUT as Logic |     0.017 |      393 |     63400 |            0.62 |
|   CARRY4       |     0.008 |      109 |     15850 |            0.69 |
|   Others       |     0.000 |        2 |       --- |             --- |
| Signals        |     0.043 |      506 |       --- |             --- |
| I/O            |     0.405 |       32 |       210 |           15.24 |
| Static Power   |     0.099 |          |           |                 |
| Total          |     0.571 |          |           |                 |
+----------------+-----------+----------+-----------+-----------------+

5).Simulation Result

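6).Comparison Summary

+-----+----------------+------------------------------------------+------------------------------------------+
| Sl. | Parameter      | Implementation 1                         | Implementation 2                         |
+-----+----------------+------------------------------------------+------------------------------------------+
| 1   | RTL Components | Adders: 3 Input 16 Bit := 1              | Adders: 2 Input 16 Bit := 16             |
|     |                |         2 Input 14 Bit := 1              |         2 Input 15 Bit := 15             |
|     |                |         2 Input  5 Bit := 1              |                                          |
|     |                | Registers: 5 in total                    |                                          |
|     |                | Muxes: 5 in total                        |                                          |
+-----+----------------+------------------------------------------+------------------------------------------+
| 2   | Slice LUTs*    | 36 (0.06%)                               | 393 (0.62%)                              |
+-----+----------------+------------------------------------------+------------------------------------------+
| 3   | LUT as Logic   | 36 (0.06%)                               | 393 (0.62%)                              |
+-----+----------------+------------------------------------------+------------------------------------------+
| 4   | Bonded IOB     | 20 (9.52%)                               | 32 (15.24%)                              |
+-----+----------------+------------------------------------------+------------------------------------------+
| 5   | Area           | North Dir 1x1 Area, Max Cong = 27.027%   | North Dir 1x1 Area, Max Cong = 36.036%   |
|     |                | South Dir 1x1 Area, Max Cong = 18.018%   | South Dir 1x1 Area, Max Cong = 30.6306%  |
|     |                | East Dir 1x1 Area, Max Cong = 26.4706%   | East Dir 1x1 Area, Max Cong = 20.5882%   |
|     |                | West Dir 1x1 Area, Max Cong = 10.2941%   | West Dir 1x1 Area, Max Cong = 17.6471%   |
|     |                | No congested regions                     | No congested regions                     |
+-----+----------------+------------------------------------------+------------------------------------------+
| 6   | Power          | Total On-Chip Power (W): 0.114           | Total On-Chip Power (W): 0.571           |
|     |                | Dynamic (W): 0.017                       | Dynamic (W): 0.472                       |
|     |                | Device Static (W): 0.097                 | Device Static (W): 0.099                 |
|     |                | Effective TJA (C/W): 4.6                 | Effective TJA (C/W): 4.6                 |
|     |                | Max Ambient (C): 99.5                    | Max Ambient (C): 97.4                    |
|     |                | Junction Temperature (C): 25.5           | Junction Temperature (C): 27.6           |
+-----+----------------+------------------------------------------+------------------------------------------+
| 7   | Delay          | Setup: 8 failing endpoints               | Setup: 14 failing endpoints              |
|     |                |   Worst Slack: -1.497 ns                 |   Worst Slack: -41.961 ns                |
|     |                |   Total Violation: -9.4984 ns            |   Total Violation: -358.154 ns           |
|     |                | Hold: 0 failing endpoints                | Hold: 0 failing endpoints                |
|     |                |   Worst Slack: 0.175 ns                  |   Worst Slack: 2.787 ns                  |
|     |                |   Total Violation: 0.000 ns              |   Total Violation: 0.000 ns              |
+-----+----------------+------------------------------------------+------------------------------------------+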
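The setup slack figures can be turned into a rough estimate of the shortest period each design could meet, using the usual relation: achievable period = constraint period - worst setup slack. The sketch below is only an estimate under an assumption: it applies the 10 ns clock period reported for the first implementation to both designs, although the report does not state which timing constraint was applied to the second, purely combinational design. The function names are hypothetical.

```python
def achievable_period_ns(constraint_ns, worst_setup_slack_ns):
    """Shortest period the design could meet, given its worst setup slack."""
    return constraint_ns - worst_setup_slack_ns


def max_frequency_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


CLOCK_PERIOD_NS = 10.0   # assumption: the same 10 ns constraint for both designs

for name, slack_ns in (("Implementation 1", -1.497),
                       ("Implementation 2", -41.961)):
    period = achievable_period_ns(CLOCK_PERIOD_NS, slack_ns)
    print(f"{name}: about {period:.3f} ns, {max_frequency_mhz(period):.2f} MHz")
```

Under that assumption the sequential divider misses the 10 ns target by about 1.5 ns per cycle, while the unrolled combinational divider needs roughly 52 ns for a single pass through its 16 subtract/restore stages.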

7).References

1).An algorithm facilitating Fast BCD Division on Low End Processors using Ancient Indian Vedic Mathematics Sutras,
Diganta Sengupta, Mahamuda Sultana, Atai Chaudhuri, INSPEC Accession Number: 13272569, Date of Conference: 28-29 Dec. 2012
2).Binary Division Algorithm and Implementation in VHDL, Ing. Filip ADAMEC, Ing. Tom FRYZA, Ph.D. [online]
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5158757
3).Xilinx, reference guide for Artix-7 FPGAs, DS181, Artix-7 FPGAs Data Sheet: DC and AC Switching Characteristics
4).Verilog HDL: A Guide to Digital Design and Synthesis (IEEE 1364-2001 compliant) [book], Samir Palnitkar
5).Space Efficient Division on FPGAs, Donald G. Bailey [online]
http://seat.massey.ac.nz/research/centres/SPRG/pdfs/2006_ENZCon_206.pdf
6).Binary Division Algorithm and High Speed Deconvolution Algorithm (Based on Ancient Indian Vedic
Mathematics), Surabhi Jain, Mukul Pancholi, Harsh Garg, Sandeep Saini [online]
https://www.researchgate.net/publication/264118516_Binary_division_algorithm_and_high_speed_deconvolution_algorithm_Based_on_Ancient_Indian_Vedic_Mathematics
