
Introduction to FPGA Devices

ECE 645 Computer Arithmetic

Copyright 2012 Xilinx



George Mason University

World of Integrated Circuits


Integrated Circuits
  Full-Custom ASICs
  Semi-Custom ASICs
  User Programmable
    PLD
      PAL
      PLA
      PML
    FPGA
      LUT (Look-Up Table)
      MUX
      Gates

Two competing implementation approaches


ASIC (Application Specific Integrated Circuit)
  designed all the way from behavioral description to physical layout
  designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry

FPGA (Field Programmable Gate Array)
  no physical layout design; design ends with a bitstream used to configure a device
  bought off the shelf and reconfigured by designers themselves

What is an FPGA?
[Figure: FPGA layout showing Configurable Logic Blocks, I/O Blocks, and Block RAMs]

Which Way to Go?


ASICs
  High performance
  Low power
  Low cost in high volumes

FPGAs
  Off-the-shelf
  Low development cost
  Short time to market
  Reconfigurability


Other FPGA Advantages


Manufacturing cycle for an ASIC is very costly, lengthy, and engages lots of manpower
  Mistakes not detected at design time have a large impact on development time and cost
  FPGAs are perfect for rapid prototyping of digital circuits
Easy upgrades, as in the case of software
Unique applications
  reconfigurable computing

Major FPGA Vendors


SRAM-based FPGAs
  Xilinx, Inc.
  Altera Corp.
    (Xilinx and Altera together share over 60% of the market)
  Atmel
  Lattice Semiconductor
Flash & antifuse FPGAs
  Actel Corp.
  Quick Logic Corp.

Xilinx

Primary products: FPGAs and the associated CAD software
  Programmable Logic Devices
  ISE Alliance and Foundation Series Design Software
Main headquarters in San Jose, CA
Fabless* Semiconductor and Software Company
  UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
  Seiko Epson (Japan)
  TSMC (Taiwan)


Xilinx FPGA Families


Old families
  XC3000, XC4000, XC5200
  Old 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern designs.

High-performance families
  Virtex (0.22 µm)
  Virtex-E, Virtex-EM (0.18 µm)
  Virtex-II, Virtex-II PRO (0.13 µm)
  Virtex-4 (0.09 µm)

Low Cost Family
  Spartan/XL derived from XC4000
  Spartan-II derived from Virtex
  Spartan-IIE derived from Virtex-E
  Spartan-3


Xilinx FPGA Block Diagram


CLB Structure


CLB Slice Structure


Each slice contains two sets of the following:
  Four-input LUT
    Any 4-input logic function,
    or 16-bit x 1 sync RAM,
    or 16-bit shift register
  Carry & Control
    Fast arithmetic logic
    Multiplier logic
    Multiplexer logic
  Storage element
    Latch or flip-flop
    Set and reset
    True or inverted inputs
    Sync. or async. control

LUT (Look-Up Table) Functionality


Look-Up tables are primary elements for logic implementation
Each LUT can implement any function of 4 inputs

First example (y depends only on x1 and x2):

x1 x2 x3 x4 | y
 0  0  0  0 | 1
 0  0  0  1 | 1
 0  0  1  0 | 1
 0  0  1  1 | 1
 0  1  0  0 | 1
 0  1  0  1 | 1
 0  1  1  0 | 1
 0  1  1  1 | 1
 1  0  0  0 | 1
 1  0  0  1 | 1
 1  0  1  0 | 1
 1  0  1  1 | 1
 1  1  0  0 | 0
 1  1  0  1 | 0
 1  1  1  0 | 0
 1  1  1  1 | 0

Second example:

x1 x2 x3 x4 | y
 0  0  0  0 | 0
 0  0  0  1 | 1
 0  0  1  0 | 0
 0  0  1  1 | 0
 0  1  0  0 | 0
 0  1  0  1 | 1
 0  1  1  0 | 0
 0  1  1  1 | 1
 1  0  0  0 | 0
 1  0  0  1 | 1
 1  0  1  0 | 0
 1  0  1  1 | 0
 1  1  0  0 | 1
 1  1  0  1 | 1
 1  1  1  0 | 0
 1  1  1  1 | 0

[Figure: a LUT with inputs x1, x2, x3, x4 and output y holds each truth table; the first table is equivalent to a single gate with inputs x1 and x2]
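The look-up behavior can be sketched in behavioral VHDL. This is only an illustration, not vendor code: the entity name LUT4_SKETCH, the generic INIT, and the choice of x1 as the most significant bit of the row index are assumptions made here. With that ordering, the second truth table corresponds to X"32A2" and the first to X"0FFF".

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical model of a 4-input LUT: a 16-entry table indexed by the inputs.
entity LUT4_SKETCH is
  generic (
    -- Bit i holds the output for truth-table row i (row 0 = "0000").
    -- X"32A2" encodes the second example table; X"0FFF" would encode the first.
    INIT : STD_LOGIC_VECTOR(15 downto 0) := X"32A2"
  );
  port (
    X1, X2, X3, X4 : in  STD_LOGIC;
    Y              : out STD_LOGIC
  );
end LUT4_SKETCH;

architecture BEHAVIORAL of LUT4_SKETCH is
  signal ROW : STD_LOGIC_VECTOR(3 downto 0);
begin
  -- Assumption: x1 is the most significant bit of the row number.
  ROW <= X1 & X2 & X3 & X4;
  -- The "logic function" is simply a read of the stored table.
  Y <= INIT(to_integer(unsigned(ROW)));
end BEHAVIORAL;
```

Changing the function means changing only the 16 stored bits, which is what the configuration bitstream does for each LUT.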


5-Input Functions implemented using two LUTs


One CLB Slice can implement any function of 5 inputs
Logic function is partitioned between two LUTs
F5 multiplexer selects LUT

[Figure: slice with two LUT/ROM/RAM blocks (inputs A4-A1, WS, DI); the lower block is driven by F4-F1. Both outputs feed the F5 multiplexer, which is selected by BX. Other labels in the figure: GXOR, G, D, nBX]


5-Input Functions implemented using two LUTs


X5 X4 X3 X2 X1 | Y      X5 X4 X3 X2 X1 | Y
 0  0  0  0  0 | 0       1  0  0  0  0 | 0
 0  0  0  0  1 | 1       1  0  0  0  1 | 0
 0  0  0  1  0 | 0       1  0  0  1  0 | 0
 0  0  0  1  1 | 0       1  0  0  1  1 | 0
 0  0  1  0  0 | 1       1  0  1  0  0 | 0
 0  0  1  0  1 | 1       1  0  1  0  1 | 0
 0  0  1  1  0 | 0       1  0  1  1  0 | 0
 0  0  1  1  1 | 0       1  0  1  1  1 | 1
 0  1  0  0  0 | 1       1  1  0  0  0 | 0
 0  1  0  0  1 | 0       1  1  0  0  1 | 1
 0  1  0  1  0 | 0       1  1  0  1  0 | 0
 0  1  0  1  1 | 1       1  1  0  1  1 | 1
 0  1  1  0  0 | 1       1  1  1  0  0 | 0
 0  1  1  0  1 | 1       1  1  1  0  1 | 1
 0  1  1  1  0 | 1       1  1  1  1  0 | 0
 0  1  1  1  1 | 1       1  1  1  1  1 | 0

[Figure: two LUTs, one per half of the truth table, feeding a multiplexer that drives OUT]
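The partitioning of the 32-row table above can be sketched as two 16-entry tables plus a 2-to-1 multiplexer. This is a behavioral illustration under stated assumptions: the names FUNC5_SKETCH, XIN, LUT_LOW, and LUT_HIGH are hypothetical, and the constants were derived from the Y column of the table with X4 taken as the most significant bit of the row index.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical 5-input function built from two 4-input tables and a mux.
entity FUNC5_SKETCH is
  port (
    XIN : in  STD_LOGIC_VECTOR(5 downto 1);  -- XIN(5) = X5 ... XIN(1) = X1
    Y   : out STD_LOGIC
  );
end FUNC5_SKETCH;

architecture BEHAVIORAL of FUNC5_SKETCH is
  -- Rows with X5 = 0 (left half of the table), bit i = output for row i.
  constant LUT_LOW  : STD_LOGIC_VECTOR(15 downto 0) := X"F932";
  -- Rows with X5 = 1 (right half of the table).
  constant LUT_HIGH : STD_LOGIC_VECTOR(15 downto 0) := X"2A80";
  signal F, G : STD_LOGIC;
begin
  -- Both tables are addressed by the same four inputs X4..X1.
  F <= LUT_LOW (to_integer(unsigned(XIN(4 downto 1))));
  G <= LUT_HIGH(to_integer(unsigned(XIN(4 downto 1))));
  -- The fifth input plays the role of the F5 multiplexer select.
  Y <= G when XIN(5) = '1' else F;
end BEHAVIORAL;
```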


Distributed RAM
CLB LUT configurable as Distributed RAM
  A LUT equals 16x1 RAM
  Implements Single and Dual-Ports
  Cascade LUTs to increase RAM size
Synchronous write
Synchronous/Asynchronous read
  Accompanying flip-flops used for synchronous read

[Figure: LUTs and the RAM primitives they can form]
  One LUT  = RAM16X1S (inputs D, WE, WCLK, A0-A3; output O)
  Two LUTs = RAM32X1S (inputs D, WE, WCLK, A0-A4; output O)
          or RAM16X2S (inputs D0, D1, WE, WCLK, A0-A3; outputs O0, O1)
          or RAM16X1D (inputs D, WE, WCLK, A0-A3, DPRA0-DPRA3; outputs SPO, DPO)
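The write and read behavior listed above (synchronous write, asynchronous read) can be sketched in behavioral VHDL. The entity name RAM_16X1_INFERRED is hypothetical, and whether a given synthesis tool maps this description to a LUT-based RAM depends on the tool and its settings.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical behavioral model of a 16 x 1 single-port distributed RAM.
entity RAM_16X1_INFERRED is
  port (
    CLK      : in  STD_LOGIC;
    WE       : in  STD_LOGIC;
    ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
    DATA_IN  : in  STD_LOGIC;
    DATA_OUT : out STD_LOGIC
  );
end RAM_16X1_INFERRED;

architecture BEHAVIORAL of RAM_16X1_INFERRED is
  -- Sixteen 1-bit locations, cleared at start-up.
  signal MEM : STD_LOGIC_VECTOR(15 downto 0) := (others => '0');
begin
  -- Synchronous write: the addressed bit changes only on a rising clock edge.
  process (CLK)
  begin
    if rising_edge(CLK) then
      if WE = '1' then
        MEM(to_integer(unsigned(ADDR))) <= DATA_IN;
      end if;
    end if;
  end process;

  -- Asynchronous read: the output follows the address without a clock.
  DATA_OUT <= MEM(to_integer(unsigned(ADDR)));
end BEHAVIORAL;
```

Registering DATA_OUT in a clocked process would model the synchronous read that uses the accompanying flip-flop.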


Shift Register
Each LUT can be configured as shift register
  Serial in, serial out
Dynamically addressable delay up to 16 cycles
For programmable pipeline
Cascade for greater cycle delays
Use CLB flip-flops to add depth

[Figure: LUT drawn as a chain of D flip-flops with clock enable (CE); inputs IN, CE, CLK; DEPTH[3:0] selects which stage drives OUT]
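A dynamically addressable delay line of this kind can be sketched as follows. The entity and port names (SHIFT_REG_16, DIN, DOUT) are hypothetical, and the mapping onto a LUT-based shift register is left to the synthesis tool.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical 16-stage shift register with a selectable output tap.
entity SHIFT_REG_16 is
  port (
    CLK   : in  STD_LOGIC;
    CE    : in  STD_LOGIC;
    DIN   : in  STD_LOGIC;
    DEPTH : in  STD_LOGIC_VECTOR(3 downto 0);
    DOUT  : out STD_LOGIC
  );
end SHIFT_REG_16;

architecture BEHAVIORAL of SHIFT_REG_16 is
  signal SR : STD_LOGIC_VECTOR(15 downto 0) := (others => '0');
begin
  -- Serial in: on each enabled clock edge every bit moves one stage along.
  process (CLK)
  begin
    if rising_edge(CLK) then
      if CE = '1' then
        SR <= SR(14 downto 0) & DIN;
      end if;
    end if;
  end process;

  -- Serial out: DEPTH = "0000" gives a 1-cycle delay, "1111" gives 16 cycles.
  DOUT <= SR(to_integer(unsigned(DEPTH)));
end BEHAVIORAL;
```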


Shift Register
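Register-rich FPGA
  Allows for addition of pipeline stages to increase throughput
Data paths must be balanced to keep desired functionality

[Figure: two parallel 64-bit data paths. One passes through Operation A (4 cycles) and Operation B (8 cycles), 12 cycles in total; the other passes through Operation C (3 cycles). The result is a 9-cycle imbalance between the paths]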



Carry & Control Logic


[Figure: slice schematic. Each of the two look-up tables (inputs G4-G1 and F4-F1) feeds a Carry & Control Logic block and a storage element (D, CK, EC, S, R, Q). The carry chain runs from CIN to COUT. Other slice signals shown: F5IN, BY, SR, XB, YB, X, Y, CLK, CE]


Fast Carry Logic

Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
Carry logic is independent of normal logic and routing resources

[Figure: dedicated carry logic routing running from LSB to MSB]


Accessing Carry Logic

All major synthesis tools can infer carry logic for arithmetic functions
  Addition (SUM <= A + B)
  Subtraction (DIFF <= A - B)
  Comparators (if A < B then)
  Counters (count <= count + 1)
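The four constructs listed above can be written out as a small, self-contained design. This is a sketch: the entity name ARITH_SKETCH, the generic N, and the choice of unsigned arithmetic from NUMERIC_STD are assumptions, and the actual use of carry logic is decided by the synthesis tool.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical example of arithmetic that tools can map to carry chains.
entity ARITH_SKETCH is
  generic (N : positive := 8);
  port (
    CLK    : in  STD_LOGIC;
    A, B   : in  STD_LOGIC_VECTOR(N-1 downto 0);
    SUM    : out STD_LOGIC_VECTOR(N-1 downto 0);
    DIFF   : out STD_LOGIC_VECTOR(N-1 downto 0);
    A_LT_B : out STD_LOGIC;
    COUNT  : out STD_LOGIC_VECTOR(N-1 downto 0)
  );
end ARITH_SKETCH;

architecture BEHAVIORAL of ARITH_SKETCH is
  signal CNT : unsigned(N-1 downto 0) := (others => '0');
begin
  -- Addition and subtraction (results wrap modulo 2**N).
  SUM  <= std_logic_vector(unsigned(A) + unsigned(B));
  DIFF <= std_logic_vector(unsigned(A) - unsigned(B));

  -- Comparator.
  A_LT_B <= '1' when unsigned(A) < unsigned(B) else '0';

  -- Free-running counter.
  process (CLK)
  begin
    if rising_edge(CLK) then
      CNT <= CNT + 1;
    end if;
  end process;
  COUNT <= std_logic_vector(CNT);
end BEHAVIORAL;
```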


Block RAM
Most efficient memory implementation
  Dedicated blocks of memory
Ideal for most memory requirements
  4 to 104 memory blocks
    18 kbits = 18,432 bits per block
  Use multiple blocks for larger memories
Builds both single and true dual-port RAMs

[Figure: Spartan-II True Dual-Port Block RAM with Port A and Port B]
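A single-port memory that fills one 18,432-bit block (1024 words of 18 bits) can be described behaviorally as below. The entity name BRAM_1KX18_SKETCH and its port names are hypothetical; the synchronous read is the property that usually lets tools choose a dedicated memory block, but the mapping is tool-dependent.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical 1024 x 18 single-port RAM with synchronous read.
entity BRAM_1KX18_SKETCH is
  port (
    CLK  : in  STD_LOGIC;
    EN   : in  STD_LOGIC;
    WE   : in  STD_LOGIC;
    ADDR : in  STD_LOGIC_VECTOR(9 downto 0);
    DI   : in  STD_LOGIC_VECTOR(17 downto 0);
    DO   : out STD_LOGIC_VECTOR(17 downto 0)
  );
end BRAM_1KX18_SKETCH;

architecture BEHAVIORAL of BRAM_1KX18_SKETCH is
  type RAM_TYPE is array (0 to 1023) of STD_LOGIC_VECTOR(17 downto 0);
  signal RAM : RAM_TYPE := (others => (others => '0'));
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      if EN = '1' then
        if WE = '1' then
          RAM(to_integer(unsigned(ADDR))) <= DI;
        end if;
        -- Registered read: on a write, DO shows the previously stored word.
        DO <= RAM(to_integer(unsigned(ADDR)));
      end if;
    end if;
  end process;
end BEHAVIORAL;
```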



Spartan-3 Block RAM Amounts


Block RAM Port Aspect Ratios


Block RAM Port Aspect Ratios


Configuration    Width (data + parity)   Address range
16k x 1          1                       0 to 16,383
8k x 2           2                       0 to 8,191
4k x 4           4                       0 to 4,095
2k x (8+1)       8+1                     0 to 2,047
1024 x (16+2)    16+2                    0 to 1,023

Dual Port Block RAM


Dual-Port Bus Flexibility


Each port can be configured with a different data bus width
Provides easy data width conversion without any additional logic

[Figure: RAMB4_S4_S16 block]
  Port A (In: 1K-Bit Depth, Out: 18-Bit Width): WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[17:0], DOA[17:0]
  Port B (In: 2k-Bit Depth, Out: 9-Bit Width): WEB, ENB, RSTB, CLKB, ADDRB[8:0], DIB[15:0], DOB[8:0]

Two Independent Single-Port RAMs


Added advantage of True Dual-Port
  No wasted RAM Bits
Can split a Dual-Port 16K RAM into two Single-Port 8K RAMs
  Simultaneous independent access to each RAM
To access the lower RAM
  Tie the MSB address bit to Logic Low
To access the upper RAM
  Tie the MSB address bit to Logic High

[Figure: RAMB4_S1_S1 block]
  Port A (In: 8K-Bit Depth, Out: 1-Bit Width), address driven by VCC, ADDR[12:0]: WEA, ENA, RSTA, CLKA, ADDRA[12:0], DIA[0], DOA[0]
  Port B (In: 8K-Bit Depth, Out: 1-Bit Width), address driven by GND, ADDR[12:0]: WEB, ENB, RSTB, CLKB, ADDRB[12:0], DIB[0], DOB[0]


New 18 x 18 Embedded Multiplier


Fast arithmetic functions
  Optimized to implement multiply / accumulate modules
18 x 18 signed multiplier
Fully combinatorial
Optional registers with CE & RST (pipeline)
Independent from adjacent block RAM


18 x 18 Multiplier
Embedded 18-bit x 18-bit multiplier
  2's complement signed operation
Multipliers are organized in columns

[Figure: 18 x 18 Multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and Output (36 bits)]

Note: See Virtex-II Data Sheet for updated performances
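A signed 18 x 18 multiplication with a combinational output and an optional pipeline register can be sketched as follows. The entity name MULT18X18_SKETCH and its port names are hypothetical; whether the tool places this in an embedded multiplier block depends on the target device and tool settings.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical signed 18 x 18 multiplier with an optional output register.
entity MULT18X18_SKETCH is
  port (
    CLK   : in  STD_LOGIC;
    CE    : in  STD_LOGIC;
    RST   : in  STD_LOGIC;
    A, B  : in  STD_LOGIC_VECTOR(17 downto 0);
    P     : out STD_LOGIC_VECTOR(35 downto 0);  -- combinational product
    P_REG : out STD_LOGIC_VECTOR(35 downto 0)   -- pipelined product
  );
end MULT18X18_SKETCH;

architecture BEHAVIORAL of MULT18X18_SKETCH is
  signal PROD : signed(35 downto 0);
begin
  -- 2's complement multiply: 18 bits x 18 bits gives a 36-bit result.
  PROD <= signed(A) * signed(B);
  P    <= std_logic_vector(PROD);

  -- Pipeline register with clock enable and synchronous reset.
  process (CLK)
  begin
    if rising_edge(CLK) then
      if RST = '1' then
        P_REG <= (others => '0');
      elsif CE = '1' then
        P_REG <= std_logic_vector(PROD);
      end if;
    end if;
  end process;
end BEHAVIORAL;
```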



Basic I/O Block Structure


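[Figure: IOB schematic with three flip-flops (D, Q, EC, SR) sharing Clock and Set/Reset, each with its own enable: a three-state FF on the Three-State Control path, an output FF on the Output Path, and an input FF on the Input Path, which offers both Direct Input and Registered Input]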


IOB Functionality
IOB provides interface between the package pins and CLBs
Each IOB can work as uni- or bi-directional I/O
Outputs can be forced into High Impedance
Inputs and outputs can be registered
  advised for high-performance I/O
Inputs can be delayed
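A bi-directional pin with registered input, output, and three-state control can be described as below. The names BIDIR_IO_SKETCH, PAD, OE, DOUT, and DIN are hypothetical, and whether the registers are actually placed inside the IOB depends on tool options and constraints.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Hypothetical bi-directional pad with registered data and output enable.
entity BIDIR_IO_SKETCH is
  port (
    CLK  : in    STD_LOGIC;
    OE   : in    STD_LOGIC;  -- '1' drives the pad, '0' releases it
    DOUT : in    STD_LOGIC;  -- value from internal logic to the pad
    DIN  : out   STD_LOGIC;  -- registered value read from the pad
    PAD  : inout STD_LOGIC
  );
end BIDIR_IO_SKETCH;

architecture BEHAVIORAL of BIDIR_IO_SKETCH is
  signal OE_REG   : STD_LOGIC := '0';
  signal DOUT_REG : STD_LOGIC := '0';
begin
  -- Register the three-state control, the output data, and the input data.
  process (CLK)
  begin
    if rising_edge(CLK) then
      OE_REG   <= OE;
      DOUT_REG <= DOUT;
      DIN      <= PAD;
    end if;
  end process;

  -- Output forced into high impedance when not enabled.
  PAD <= DOUT_REG when OE_REG = '1' else 'Z';
end BEHAVIORAL;
```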



Routing Resources
[Figure: array of CLBs interconnected through Programmable Switch Matrices (PSM)]


Clock Distribution


Spartan-3 FPGA Family Members


FPGA Nomenclature


Device Part Marking


We're Using: XC3S100-4FG256


Virtex-II 1.5V Architecture

[Figure: device layout showing Configurable Logic Blocks, I/O Blocks, and columns of 18 x 18 Multipliers and Block RAMs]

Virtex-II 1.5V
Device     CLB Array  Slices   Maximum I/O  BlockRAM (18kb)  Multiplier Blocks  Distributed RAM bits
XC2V40     8x8        256      88           4                4                  8,192
XC2V80     16x8       512      120          8                8                  16,384
XC2V250    24x16      1,536    200          24               24                 49,152
XC2V500    32x24      3,072    264          32               32                 98,304
XC2V1000   40x32      5,120    432          40               40                 163,840
XC2V1500   48x40      7,680    528          48               48                 245,760
XC2V2000   56x48      10,752   624          56               56                 344,064
XC2V3000   64x56      14,336   720          96               96                 458,752
XC2V4000   80x72      23,040   912          120              120                737,280
XC2V6000   96x88      33,792   1,104        144              144                1,081,344
XC2V8000   112x104    46,592   1,108        168              168                1,490,944



Virtex-II Block SelectRAM


Virtex-II BRAM is 18 kbits
  Additional parity bits available in selected configurations

Width  Depth    Address  Data    Parity
1      16,384   [13:0]   [0]     N/A
2      8,192    [12:0]   [1:0]   N/A
4      4,096    [11:0]   [3:0]   N/A
9      2,048    [10:0]   [7:0]   [0]
18     1,024    [9:0]    [15:0]  [1:0]
36     512      [8:0]    [31:0]  [3:0]

[Figure: dual-port block with its port signals]
  Port A: WEA, ENA, SSRA, CLKA, ADDRA[# : 0], DIA[# : 0], DIPA[# : 0], DOA[# : 0], DOPA[# : 0]
  Port B: WEB, ENB, RSTB, CLKB, ADDRB[# : 0], DIB[# : 0], DIPB[# : 0], DOB[# : 0], DOPB[# : 0]


Using Library Components in VHDL Code


RAM 16x1 (1)


library IEEE;
use IEEE.STD_LOGIC_1164.all;
library UNISIM;
use UNISIM.all;

entity RAM_16X1_DISTRIBUTED is
  port(
    CLK      : in  STD_LOGIC;
    WE       : in  STD_LOGIC;
    ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
    DATA_IN  : in  STD_LOGIC;
    DATA_OUT : out STD_LOGIC
  );
end RAM_16X1_DISTRIBUTED;

RAM 16x1 (2)


architecture RAM_16X1_DISTRIBUTED_STRUCTURAL of RAM_16X1_DISTRIBUTED is

  -- Initial contents, attached to the instance labelled RAM_16X1_S_1
  attribute INIT : string;
  attribute INIT of RAM_16X1_S_1 : label is "F0C1";

  -- Component declaration of the "ram16x1s(ram16x1s_v)" unit
  -- File name contains "ram16x1s" entity: ./src/unisim_vital.vhd
  component ram16x1s
    generic(
      INIT : BIT_VECTOR(15 downto 0) := X"0000");
    port(
      O    : out std_ulogic;
      A0   : in  std_ulogic;
      A1   : in  std_ulogic;
      A2   : in  std_ulogic;
      A3   : in  std_ulogic;
      D    : in  std_ulogic;
      WCLK : in  std_ulogic;
      WE   : in  std_ulogic);
  end component;


RAM 16x1 (3)


begin

  RAM_16X1_S_1: ram16x1s
    generic map (INIT => X"F0C1")
    port map (
      O    => DATA_OUT,
      A0   => ADDR(0),
      A1   => ADDR(1),
      A2   => ADDR(2),
      A3   => ADDR(3),
      D    => DATA_IN,
      WCLK => CLK,
      WE   => WE
    );

end RAM_16X1_DISTRIBUTED_STRUCTURAL;


RAM 16x8 (1)


library IEEE;
use IEEE.STD_LOGIC_1164.all;
library UNISIM;
use UNISIM.all;

entity RAM_16X8_DISTRIBUTED is
  port(
    CLK      : in  STD_LOGIC;
    WE       : in  STD_LOGIC;
    ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
    DATA_IN  : in  STD_LOGIC_VECTOR(7 downto 0);
    DATA_OUT : out STD_LOGIC_VECTOR(7 downto 0)
  );
end RAM_16X8_DISTRIBUTED;

RAM 16x8 (2)


architecture RAM_16X8_DISTRIBUTED_STRUCTURAL of RAM_16X8_DISTRIBUTED is

  -- The all-zero INIT value is passed to every instance through its generic map.
  -- No INIT attribute specification is given here: the instance label lives
  -- inside the generate statement and cannot be named from this declarative part.

  -- Component declaration of the "ram16x1s(ram16x1s_v)" unit
  -- File name contains "ram16x1s" entity: ./src/unisim_vital.vhd
  component ram16x1s
    generic(
      INIT : BIT_VECTOR(15 downto 0) := X"0000");
    port(
      O    : out std_ulogic;
      A0   : in  std_ulogic;
      A1   : in  std_ulogic;
      A2   : in  std_ulogic;
      A3   : in  std_ulogic;
      D    : in  std_ulogic;
      WCLK : in  std_ulogic;
      WE   : in  std_ulogic);
  end component;

RAM 16x8 (3)


begin

  -- One 16x1 RAM per data bit; all eight share the address, clock, and write enable
  GENERATE_MEMORY: for I in 0 to 7 generate
    RAM_16X1_S_1: ram16x1s
      generic map (INIT => X"0000")
      port map (
        O    => DATA_OUT(I),
        A0   => ADDR(0),
        A1   => ADDR(1),
        A2   => ADDR(2),
        A3   => ADDR(3),
        D    => DATA_IN(I),
        WCLK => CLK,
        WE   => WE
      );
  end generate;

end RAM_16X8_DISTRIBUTED_STRUCTURAL;

ROM 16x1 (1)


library IEEE;
use IEEE.STD_LOGIC_1164.all;
library UNISIM;
use UNISIM.all;

entity ROM_16X1_DISTRIBUTED is
  port(
    ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
    DATA_OUT : out STD_LOGIC
  );
end ROM_16X1_DISTRIBUTED;


ROM 16x1 (2)


architecture ROM_16X1_DISTRIBUTED_STRUCTURAL of ROM_16X1_DISTRIBUTED is

  -- ROM contents, attached to the instance labelled ROM_16X1_S_1
  attribute INIT : string;
  attribute INIT of ROM_16X1_S_1 : label is "F0C1";

  -- The ROM is built from the RAM primitive with its write port disabled
  component ram16x1s
    generic(
      INIT : BIT_VECTOR(15 downto 0) := X"0000");
    port(
      O    : out std_ulogic;
      A0   : in  std_ulogic;
      A1   : in  std_ulogic;
      A2   : in  std_ulogic;
      A3   : in  std_ulogic;
      D    : in  std_ulogic;
      WCLK : in  std_ulogic;
      WE   : in  std_ulogic);
  end component;

  -- Constant logic low used to tie off D, WCLK, and WE
  signal Low : std_ulogic := '0';


ROM 16x1 (3)


begin

  ROM_16X1_S_1: ram16x1s
    generic map (INIT => X"F0C1")
    port map (
      O    => DATA_OUT,
      A0   => ADDR(0),
      A1   => ADDR(1),
      A2   => ADDR(2),
      A3   => ADDR(3),
      D    => Low,
      WCLK => Low,
      WE   => Low
    );

end ROM_16X1_DISTRIBUTED_STRUCTURAL;
