
Optimized VLSI Implementation of Digital Filters

M. TECH PROJECT

Submitted in partial fulfillment of


The requirements for the degree of

MASTER OF TECHNOLOGY

IN

INSTRUMENTATION AND CONTROL


(Process Instrumentation)

Krunal H. Bhavsar

M0629P03

Department of Instrumentation and Control


College of Engineering, Pune-411005

(2007-08)
ACCEPTANCE CERTIFICATE

DEPARTMENT OF INSTRUMENTATION AND CONTROL

COLLEGE OF ENGINEERING, PUNE


(An Autonomous institute of Govt. of Maharashtra)

The project entitled "Optimized VLSI Implementation of Digital Filters",


submitted by Krunal H. Bhavsar, Roll No. M0629P03, is accepted for
evaluation.

Project Guide
(Prof. D. N. Sonawane)

Head
Department of Instrumentation and Control

Date: Date:

DISSERTATION APPROVAL CERTIFICATE

DEPARTMENT OF INSTRUMENTATION AND CONTROL

COLLEGE OF ENGINEERING, PUNE


(An Autonomous institute of Govt. of Maharashtra)

The dissertation entitled "Optimized VLSI Implementation of Digital Filters",


submitted by Krunal H. Bhavsar, Roll No. M0629P03, is approved for the
degree of Master of Technology in Instrumentation and Control (Process
Instrumentation).

Project Guide
Prof. D. N. Sonawane

Head
Department of Instrumentation and Control

Examiner

Date:

Abstract

Digital filters are an essential part of DSP applications such as digital cameras,
mobile phones and portable media players. Until now, DSP processors have been the
preferred choice of developers because of features such as built-in MAC units and
floating point engines. DSP processors also have disadvantages: they are serial in
nature, they are not reconfigurable, and for a simple operation such as filtering
most of the chip remains unutilised. FPGAs are an attractive platform for these
applications because of their parallel nature, reconfigurable architecture and
on-chip resources such as dedicated DSP blocks, the IBM PowerPC processor, Ethernet
MAC, RocketIO and floating point processor. DSP algorithms consist mainly of
adders, multipliers, counters, etc. I present a complete optimized architecture for
an FIR filter that performs all kinds of filter operations inside the FPGA. This
architecture is common to all kinds of filters with different sampling frequencies,
cutoff frequencies and numbers of bits for data and coefficients. Its performance is
compared with the filter available in the Simulink signal processing toolbox. The
filter can also use coefficient values generated from Simulink. Designs of a
sigma-delta ADC and DAC implemented in the FPGA are also given, and their
performance is compared with the on-board ADC and DAC. The design of a floating
point MAC unit is also given, which performs an 8-operation circular convolution
in 302 clock cycles.

Contents

Acceptance certificate i

Approval certificate ii

Abstract iii

1 Introduction 1

2 Literature Survey 4

3 Conceptual Explanation of Dissertation Topic 11


3.1 Introduction to Field Programmable Gate Array . . . . . . . . . . . 12
3.2 Floating point unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.3 Multiply Accumulator . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 Sigma-Delta Modulation . . . . . . . . . . . . . . . . . . . . . . . . 19
3.4.1 ADC . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.4.2 DAC . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.5 Implementing FIR/IIR Filter . . . . . . . . . . . . . . . . . . . . . . 21

4 Hardware - Software Implementation 25


4.1 Spartan 3E Starter Kit . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2 Spartan 3A Starter Kit . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3 Xilinx FPGA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.4 JTAG Programmer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.5 SPI interface of ADC and DAC with FPGA . . . . . . . . . . . . . . 32
4.5.1 SPI interface of Programmable Amplifier(LTC6912-1) . . . . . 32
4.5.2 SPI Interface of ADC (LTC1407A-1 ) . . . . . . . . . . . . . . 34
4.5.3 SPI Interface of DAC (LTC2624) . . . . . . . . . . . . . . . . 35
4.6 Configure Spartan 3E FPGA through on-board Platform Flash PROM 38
4.7 Hardware in the loop . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.8 Fixed Point MAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.9 Floating Point MAC . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.10 Implementation of ADC and DAC . . . . . . . . . . . . . . . . 46
4.10.1 DAC: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.10.2 ADC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.11 Implementation of FIR Filter . . . . . . . . . . . . . . . . . . . . . . 51

5 Results 57
5.1 Fixed Point MAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2 Floating Point MAC . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.3 DAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.4 FIR filter results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

6 Case Study: Signal Conditioning for Magnetostrictive Level Transmitter 61
6.1 Working Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.2 Features, Specifications . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.2.1 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.2.2 Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.3 Challenges and Solutions . . . . . . . . . . . . . . . . . . . . . . . . . 64
6.4 Board Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

7 Conclusion 70

8 Future Scope 71

References 74

Acknowledgements 75

List of Figures

1.1 Basic Setup of Digital Filter . . . . . . . . . . . . . . . . . . . . . . . 2

3.1 Basic Architecture of FPGA . . . . . . . . . . . . . . . . . . . . . . . 13


3.2 Floating Point Numbering System . . . . . . . . . . . . . . . . . . . . 14
3.3 Floating Point Addition Algorithm . . . . . . . . . . . . . . . . . . . 16
3.4 Floating Point Multiplication . . . . . . . . . . . . . . . . . . . . . . 18
3.5 Multiply Accumulator . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.6 Block diagram of Sigma-Delta ADC . . . . . . . . . . . . . . . . . . . 20
3.7 Sigma Delta DAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.8 FIR Direct form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.9 IIR Direct form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4.1 Spartan 3E Starter Kit . . . . . . . . . . . . . . . . . . . . . . . . . 26


4.2 Spartan 3A Starter Kit . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3 Spartan 3 Family FPGA . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4 JTAG Connection for multiple devices . . . . . . . . . . . . . . . . . 30
4.5 Circuit diagram of JTAG programmer for FPGA . . . . . . . . . . . 31
4.6 JTAG connection with the board . . . . . . . . . . . . . . . . . . . . 32
4.7 Programmable Amplifier and ADC connection with FPGA in Spartan 3E Starter Kit . . . 33
4.8 Bit transfer pattern between Amplifier and FPGA . . . . . . . . . . . 34
4.9 Timing Diagram of Amplifier SPI communication . . . . . . . . . . . 35
4.10 Master slave connection of ADC with FPGA in SPI mode . . . . . . 35
4.11 Bit pattern to be transferred to ADC from FPGA . . . . . . . . . . . 36
4.12 Timing diagram of ADC SPI communication . . . . . . . . . . . . . . 36
4.13 Connection of DAC to FPGA in Spartan 3E starter kit . . . . . . . . 37
4.14 Bit transfer pattern for DAC to FPGA . . . . . . . . . . . . . . . . . 37
4.15 Timing diagram of DAC communication with FPGA . . . . . . . . . 38
4.16 Hardware Co-Simulation . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.17 Block Diagram of Fixed point MAC . . . . . . . . . . . . . . . . . . . 41
4.18 Timing Diagram of fixed point MAC . . . . . . . . . . . . . . . . . . 42
4.19 Block Diagram of Floating point MAC . . . . . . . . . . . . . . . . . 43
4.20 Simulink diagram for floating point MAC . . . . . . . . . . . . . . . . 44
4.21 Timing Diagram for floating point MAC . . . . . . . . . . . . . . . . 45
4.22 Low pass filter connection with DAC . . . . . . . . . . . . . . . . . . 46
4.23 PWM generated by DAC . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.24 Low pass filter output of DAC . . . . . . . . . . . . . . . . . . . . . . 49
4.25 Connection for ADC . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.26 Block Diagram of ADC implemented . . . . . . . . . . . . . . . . . . 51
4.27 Multichannel ADC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.28 Block Diagram of FIR filter . . . . . . . . . . . . . . . . . . . . . . . 53
4.29 GUI for FIR filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.30 Memory Control and Address Generator . . . . . . . . . . . . . . . . 55
4.31 Upscaling and Concatenation . . . . . . . . . . . . . . . . . . . . . . 55
4.32 MAC unit connection . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.1 Resource usage for DAC . . . . . . . . . . . . . . . . . . . . . . . . . 58


5.2 FIR filter frequency and time response . . . . . . . . . . . . . . . . . 60
5.3 Magnitude response of FIR filter . . . . . . . . . . . . . . . . . . . . 60

6.1 Probe and float design of Magnetostrictive level transmitter . . . . . 62


6.2 Reflected wave from float when float at different positions . . . . . . . 64
6.3 Detecting the peak of sine pulse . . . . . . . . . . . . . . . . . . . . . 65
6.4 Pulse Detection setup . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.5 Board Schematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.6 Application diagram of voltage regulator . . . . . . . . . . . . . . . . 69

List of Tables

3.1 Ranges Of Floating Point Number . . . . . . . . . . . . . . . . . . . . 14

5.1 Resources used for DAC . . . . . . . . . . . . . . . . . . . . . . . . . 58


5.2 Comparison of DAC . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

Chapter 1

Introduction

Today, systems ranging from video and audio applications and process plant data
handling to security systems such as face, iris and voice recognition rely on
filters for the removal of noise. In signal processing, the function of a filter is
to remove unwanted parts of the signal, such as random noise, or to extract useful
parts of the signal, such as the components lying within a certain frequency range.
There are two main kinds of filter, analog and digital.
They are quite different in their physical makeup and in how they work. An analog
filter uses analog electronic circuits made up from components such as resistors,
capacitors and op amps to produce the required filtering effect. Such filter circuits
are widely used in such applications as noise reduction, video signal enhancement,
graphic equalizers in hi-fi systems, and many other areas.
A digital filter uses a digital processor to perform numerical calculations on
sampled values of the signal. The processor may be a general-purpose computer
such as a PC, or a specialized DSP (Digital Signal Processor) chip. The analog
input signal must first be sampled and digitized using an ADC (analog to digital
converter). The resulting binary numbers, representing successive sampled values
of the input signal, are transferred to the processor, which carries out numerical
calculations on them. These calculations typically involve multiplying the input
values by constants and adding the products together. If necessary, the results of
these calculations, which now represent sampled values of the filtered signal, are
output through a DAC (digital to analog converter) to convert the signal back to
analog form. In a digital filter, the signal is represented by a sequence of numbers,
rather than a voltage or current.
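
The multiply-and-add calculation described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a direct-form FIR structure; the function name `fir_filter` and the 4-tap moving-average coefficients are hypothetical and are not taken from the design presented in this work.

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n-k]."""
    history = [0.0] * len(coeffs)      # delay line, newest sample first
    output = []
    for x in samples:
        history = [x] + history[:-1]   # shift the new sample into the delay line
        acc = 0.0
        for h, past in zip(coeffs, history):
            acc += h * past            # one multiply-accumulate (MAC) per tap
        output.append(acc)
    return output


# Hypothetical 4-tap moving average applied to a unit step input
taps = [0.25, 0.25, 0.25, 0.25]
step = [1.0] * 6
print(fir_filter(taps, step))   # prints [0.25, 0.5, 0.75, 1.0, 1.0, 1.0]
```

Each output sample costs one multiply-accumulate per coefficient, which is why the MAC unit dominates the hardware cost of a filter.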

The basic setup of a digital filter is shown in Fig. 1.1.

Figure 1.1: Basic Setup of Digital Filter

Today digital filters are more popular than analog filters because of advantages
such as the following:

Digital filters are programmable, i.e. their operation is determined by a program
stored in the processor's memory. This means the digital filter can easily be
changed without affecting the circuitry (hardware). An analog filter can only
be changed by redesigning the filter circuit.

Digital filters are easily designed, tested and implemented on a general-purpose
computer or workstation.

The characteristics of analog filter circuits (particularly those containing active
components) are subject to drift and are dependent on temperature. Digital
filters do not suffer from these problems, and so are extremely stable with
respect both to time and temperature.

Unlike their analog counterparts, digital filters can handle low frequency sig-
nals accurately. As the speed of DSP/FPGA technology continues to increase,
digital filters are being applied to high frequency signals in the RF (radio
frequency) domain, which in the past was the exclusive preserve of analog
technology.

Digital filters are very much more versatile in their ability to process signals
in a variety of ways; this includes the ability of some types of digital filter to
adapt to changes in the characteristics of the signal.

Digital filters also have some disadvantages:

The speed of a digital filter depends on the speed of the ADCs and DACs used in
the design. In simulation this is not a major problem, but in actual hardware
speed is a major concern.

The accuracy of the filter depends on the resolution and accuracy of the ADCs and
DACs.

Today many types of digital processors are available with different resources, such
as DSPs from Texas Instruments, the dsPIC family from Microchip and FPGAs from
various vendors. Apart from FPGAs, these processors are neither reconfigurable nor
parallel in nature, so they take much more time to perform an operation. If the
tasks are executed in parallel, the operation becomes much faster. It is therefore
important to select a processor that suits the application.

Chapter 2

Literature Survey

Implementing fixed point algorithms in an FPGA is easy compared to floating point
operations. All the software tools that map DSP algorithms onto FPGAs, such as
System Generator and AccelDSP, convert the signals coming from MATLAB into
fixed point. However, the range of the fixed point number system is very limited.
Today, processors built on the latest technology come with a floating point engine,
which makes floating point operations easy and fast. So the first decision for an
application is which number system to use: floating point or fixed point.
Texas Instruments TMS320C62x DSPs are fixed point processors, while
TMS320C67x DSPs are floating point processors [1]. IEEE 754 is the standard for the
floating point number system. There are two types of floating point numbers,
single precision and double precision. The single precision format is 32 bits wide,
while double precision is 64 bits wide. Floating point operations in an FPGA take
many more resources than fixed point operations.

Standard floating point numbers are represented using an exponent and a mantissa
in the following format:

$(\text{sign bit})\ \text{mantissa} \times \text{base}^{\,\text{exponent} + \text{bias}}$

The mantissa is a binary, positive fixed-point value. Generally, the fixed point is
located after the first bit, $m_0$, so that mantissa $= m_0.m_1 m_2 \ldots m_n$, where $m_i$ is the
$i$th bit of the mantissa. The floating point number is normalized when $m_0$ is one. The
exponent, combined with a bias, sets the range of representable values. A common
value for the bias is $-2^{k-1}$, where $k$ is the bit-width of the exponent. The IEEE
floating point standard makes floating point unit implementation portable and the
precision of the results predictable. Many VLSI designers have worked on implementing
floating point operations such as addition [2] and square root [3] in FPGAs. In [4] the author
has implemented floating point addition and multiplication on the Xilinx 4020E,
Xilinx 6062XL and Xilinx 40250XV, and compared their performance. Adders
with different architectures, such as the standard floating point adder, which contains
the steps

Exponent difference

Pre-shift for mantissa alignment

Mantissa addition and subtraction

Post-shift for result normalization

Rounding

and the Leading-One-Predictor (LOP) adder, are implemented in [5]. The LOP adder
requires more area than the standard adder but results in higher throughput. In the
LOP adder the leading one is detected by the LUT function
F = (Ai Bi ) & Ai1 &Bi1 . Rounding and normalization are also taken into
account in the design of the adder. Xilinx Core Generator also includes all these
operations as IP, which can be used as needed.
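
The five steps of the standard adder listed above can be traced in software. The sketch below is a behavioural model only, not the hardware architecture of [5]: it assumes IEEE 754 single-precision operands that are normal and non-zero, ignores overflow, underflow, infinities and NaNs, and uses three extra guard/round/sticky bits for round-to-nearest-even. The names `fp32_add`, `f32_to_bits` and `bits_to_f32` are illustrative.

```python
import struct


def f32_to_bits(x):
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits):
    """Interpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fp32_add(a_bits, b_bits):
    """Add two normal, non-zero float32 patterns (assumption: no overflow/underflow)."""
    sa, ea, ma = a_bits >> 31, (a_bits >> 23) & 0xFF, (a_bits & 0x7FFFFF) | 0x800000
    sb, eb, mb = b_bits >> 31, (b_bits >> 23) & 0xFF, (b_bits & 0x7FFFFF) | 0x800000
    ma <<= 3                       # append guard, round and sticky bits
    mb <<= 3
    if (ea, ma) < (eb, mb):        # keep the larger magnitude in operand a
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma
    # Step 1: exponent difference
    shift = ea - eb
    # Step 2: pre-shift the smaller mantissa, folding lost bits into the sticky bit
    sticky = 1 if mb & ((1 << shift) - 1) else 0
    mb = (mb >> shift) | sticky
    # Step 3: mantissa addition or subtraction
    m = ma + mb if sa == sb else ma - mb
    if m == 0:
        return 0                   # exact cancellation gives +0.0
    # Step 4: post-shift so that the leading one sits at bit 26
    e = ea
    if m >> 27:
        m = (m >> 1) | (m & 1)
        e += 1
    while not (m >> 26):
        m <<= 1
        e -= 1
    # Step 5: round to nearest, ties to even
    grs, m = m & 0b111, m >> 3
    if grs > 4 or (grs == 4 and m & 1):
        m += 1
        if m >> 24:                # rounding overflowed the mantissa
            m >>= 1
            e += 1
    return (sa << 31) | (e << 23) | (m & 0x7FFFFF)


print(bits_to_f32(fp32_add(f32_to_bits(1.5), f32_to_bits(2.25))))   # prints 3.75
```

The normalization loop in step 4 is the part that a leading-one predictor is meant to shorten, since the shift amount is otherwise known only after the subtraction completes.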
It is important to compare the performance of FPGAs with that of CPUs for floating
point operations. According to Moore's law, the number of transistors doubles
roughly every 18 months. By the end of 2009, CMOS technology is expected to reach
45 nm. Today the Virtex family is available at 65 nm, while the Spartan family
is available at 90 nm. Every two years the feature size for CMOS technology
drops by over 40% [6]. This translates into a doubling of transistors per unit area
and a doubling of clock frequency every two years. Unlike CPUs, FPGAs have a
high degree of hardware configurability. Thus, while CPU designers must select a
resource allocation and a memory hierarchy that performs well across a range of
applications, FPGA designers can leave many of those choices to the application
designer. Simultaneously, the dataflow nature of computation implemented in field
programmable gate arrays (FPGAs) overcomes some of the issues with the memory
wall. There is no instruction fetch and much more local state can be maintained (i.e.
there is a larger register set). Thus, data retrieved from memory is much more likely
to stay in the FPGA until the application is done with it. As such, applications
implemented in FPGAs are free to utilize the improvements in area that accompany
Moore's law. The floating-point performance of FPGAs has been increasing more
rapidly than that of commodity CPUs. Using the Moore's law factors of 2× the
area and 2× the clock rate every two years, one would expect a 4× increase in FPGA
floating-point performance every two years. This is significantly faster than the 4×
increase in CPU performance every three years. Architectural changes to FPGAs
have the potential to accelerate (or decelerate) the improvement in FPGA
floating-point performance. For example, the introduction of 18×18 multipliers into
the Spartan series, as well as DSP48 slices in Virtex, dramatically reduces the area
needed to implement floating point operations.
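
The two growth rates quoted above can be put on a common per-year basis. This is only a back-of-the-envelope restatement of those figures, assuming both trends are purely exponential; the symbol $g$ for the annual growth factor is introduced here for illustration.

$$g_{\text{FPGA}} = 4^{1/2} = 2, \qquad g_{\text{CPU}} = 4^{1/3} \approx 1.59$$

Over six years these compound to $4^{3} = 64\times$ for FPGAs against $4^{2} = 16\times$ for CPUs.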
Several techniques and tools make it easier to implement DSP algorithms
in FPGAs. These tools include Handel-C from Celoxica, System Generator and
AccelDSP from Xilinx, Polis, C++ extensions such as SystemC, and Java class
libraries such as JHDL [7].
The MAC (multiply-accumulate) unit is a very important part of any processor because
it carries out mathematical operations such as addition, multiplication, subtraction
and division. In a digital signal processor there is only one MAC unit, so
complex algorithms take more time to execute. This can be overcome
by using several MAC units that work in parallel. In [8], two architectures
for implementing the MAC unit are compared: Distributed Arithmetic (DA) [9] and
the Residue Number System (RNS). The authors compare the performance of DA, RNS
and a combined DA-RNS design. In the serial DA architecture the bits of the two
operands are supplied one at a time, i.e. serially, so the basic serial architecture
requires N clock cycles to process an N-bit operand. This architecture is therefore
slow and is useful mainly where very few resources are available. Now that FPGAs
have become cheaper, cost is less of an issue. In RNS, modulo adders are used. For
this the operand must first be converted into log2 m form, and the result must then
be converted back into a normal number. Finally, the two architectures
are combined to produce DA-RNS. The total delay reported is 17.327 ns, i.e.
a maximum supported clock rate of 57.71 MHz, which is very low.
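
The bit-serial behaviour of distributed arithmetic can be illustrated with a short model. The sketch below follows the common textbook look-up-table formulation and is not the specific design of [8] or [9]; it assumes unsigned samples of `nbits` bits and constant integer coefficients, and the name `da_dot_product` is hypothetical.

```python
def da_dot_product(coeffs, samples, nbits):
    """Bit-serial distributed arithmetic: sum of coeffs[j] * samples[j]."""
    k = len(coeffs)
    # Look-up table: entry 'addr' holds the sum of the coefficients whose
    # address bit is 1. In hardware this table replaces the multipliers.
    lut = [sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
           for addr in range(1 << k)]
    acc = 0
    for bit in range(nbits):               # one clock cycle per sample bit
        addr = 0
        for j, x in enumerate(samples):
            addr |= ((x >> bit) & 1) << j  # gather bit 'bit' of every sample
        acc += lut[addr] << bit            # shift-and-accumulate
    return acc


# Hypothetical 4-tap example with 8-bit unsigned samples
print(da_dot_product([3, -5, 7, 2], [10, 200, 33, 255], 8))
```

The loop runs once per sample bit, which matches the N clock cycles for an N-bit operand noted above; the table grows as 2^k with the number of taps k, which is the resource cost traded for removing the multipliers.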
Another technique is delayed addition [10]. When an arithmetic calculation is
carried out in a RISC microprocessor, each instruction typically has two source
operands and one result. In many computations, however, the result of one arith-
metic instruction is just an intermediate result in a long series of calculations. For
example, dot product and other long summations use a long series of integer or
floating-point operations to compute a final result. While FPGA designs often suf-
fer from much slower clock rates than custom VLSI, configurable hardware allows
us to make specialized hardware for these cases; with this, we can optimize the
pipelining characteristics for the particular computation. A typical multiplier in a
full-custom integrated circuit has three stages. First, it uses Booth encoding to
generate the partial products. Second, it uses one or more levels of Wallace tree
compression to reduce the number of partial products to two. Third, it uses a fi-
nal adder to add these two numbers and get the result. For such a multiplier, the
third stage, performing the final add, generally takes about one-third of the total
multiplication time. If implemented using FPGAs, stage 3 could become an even
greater bottleneck because of the carry propagation problem. It is hard to apply fast
adder techniques to speed up carry propagation within the constraints of current
FPGAs. In this design a Wallace tree is used in place of simple multiplication, because
multiplication consists of a number of additions, which take more time to execute.
To pipeline the additions, the bit-array approach is used instead of the bit-serial
approach. The bit-serial approach suits low-resource platforms but results
in low throughput. The bit-array approach delivers one product every clock cycle, but
its resource usage is considerably higher than that of the bit-serial approach.
One level of Wallace tree is composed of arrays of 3-2 adders (or compressors).
The logic of a 3-2 adder is the same as a full adder except the carry-out from the
previous bit has now become an external input. For each bit of a 3-2 adder, the
logic is:
$S[i] = A_1[i] \oplus A_2[i] \oplus A_3[i]$
$C[i] = A_1[i]A_2[i] + A_2[i]A_3[i] + A_3[i]A_1[i]$
For the whole array, $S + 2C = A_1 + A_2 + A_3$.
S and C are partial results that are referred to here as the pseudo-sum.
They can be combined during a final addition phase to compute a true sum. The
total number of inputs across an entire level of a 3-2 adder array is the same as
the bit-width of the inputs. In some Wallace tree designs, 4-2 adder arrays have also
been used because they reduce the number of compressor levels required [11]. Each
bit of such an array is composed of a 4-2 adder. The typical logic is:

$C_{out}[i] = A_1[i]A_2[i] + A_2[i]A_3[i] + A_3[i]A_1[i]$

$S[i] = A_1[i] \oplus A_2[i] \oplus A_3[i] \oplus A_4[i] \oplus C_{in}[i]$

$C[i] = (A_1[i] \oplus A_2[i] \oplus A_3[i] \oplus A_4[i])\,C_{in}[i] + \overline{(A_1[i] \oplus A_2[i] \oplus A_3[i] \oplus A_4[i])}\,A_4[i]$
For the whole array, $S + 2C = A_1 + A_2 + A_3 + A_4$. For an integer MAC unit, the
implementation is straightforward because integers are fixed-point and are therefore
aligned. The design in [10] looks exactly like a traditional multiplier design with Booth
encoding and Wallace tree except that a 4-2 adder array is inserted into the pipeline
before the final addition.
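
The compressor logic and the pseudo-sum accumulation can be checked with a small word-level model. This is a sketch under simplifying assumptions rather than the pipeline of [10]: operands are non-negative Python integers, each bitwise operator acts on all bit positions at once, and the function names are illustrative.

```python
def compress_3_2(a1, a2, a3):
    """One level of 3-2 adders applied to whole words."""
    s = a1 ^ a2 ^ a3                         # per-bit sum
    c = (a1 & a2) | (a2 & a3) | (a3 & a1)    # per-bit carry (majority)
    return s, c


def compress_4_2(a1, a2, a3, a4):
    """One level of 4-2 adders; Cout of bit i feeds Cin of bit i+1."""
    c_out = (a1 & a2) | (a2 & a3) | (a3 & a1)
    c_in = c_out << 1
    x = a1 ^ a2 ^ a3 ^ a4
    s = x ^ c_in
    c = (x & c_in) | (~x & a4)
    return s, c


def mac_carry_save(pairs):
    """Accumulate products as a pseudo-sum (S, C); one true addition at the end."""
    s, c = 0, 0
    for a, b in pairs:
        # The carry word has weight 2, so it re-enters shifted left by one.
        s, c = compress_3_2(s, c << 1, a * b)
    return s + (c << 1)                      # final carry-propagate addition


s, c = compress_3_2(5, 6, 7)
print(s + 2 * c)                                   # prints 18
print(mac_carry_save([(3, 4), (5, 6), (7, 8)]))    # prints 98
```

No carry propagates across bit positions inside the loop, so each accumulation step has the delay of a single compressor level regardless of word width.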
As in the integer MAC, the design repeatedly executes pseudosum = pseudosum + incoming
operand. Each incoming operand is an IEEE single-precision floating-point
number, with 1-bit sign, 8-bit exponent (EXP[7-0]) and 23-bit fraction. For sim-
plicity of discussion, we consider the exponent bits as three subfields: high-order
exponent, a decision bit, and low-order exponent. High-order exponent refers to
the EXP[7-6], the decision bit is EXP[5], and low-order exponent refers to EXP[4-
0]. We take different actions according to the value of these three fields. Like the
traditional adder, our design first extends the 23-bit fraction into 24-bit mantissa.
However, unlike the traditional adder, we choose not to align the incoming operand
and the current pseudosum directly because that way the alignment process could
easily become the bottleneck of the whole pipeline. In a traditional adder, the
incoming operand interacts with the accumulated pseudosum throughout the align-
ment process, which makes further pipelining impossible. Instead, we keep summary
information about the high-order exponent of the accumulated result and align its
mantissa to a fixed boundary according to its low-order exponent. Even with this
design [10], however, the floating point MAC reaches only 40 MHz, which is very low
compared to other processors.
Some VLSI designs for MAC units have been reported by Intel engineers that operate at
6.2 GHz and 5 GHz [11], [12], but these designs are not implemented in FPGAs and
concern back-end rather than front-end design. They are built in 90 nm CMOS
technology. In [13] an FIR filter is developed using a MAC that is built
from 4 multipliers. The inverted form is well-suited for achieving a high sampling
rate even for higher order filters. This is possible because the throughput does not
depend strongly on the number of taps due to extensive pipelining. The fact that the
multipliers occupy a large area, however, might render the implementation of higher
order filters impractical. It has been shown that a high performance FIR filter with
substantial number of taps can be implemented on FPGAs by approximating the
filter coefficients to a sum or difference of two power-of-two terms. Implementation
of digital filters may be simplified by using only a limited number of power-of-two
terms so that only a small number of shift and add operations is required. A variety
of techniques have been proposed [13] to minimize the deterioration of the frequency
response due to these constraints. Such coefficient optimization techniques yield
performance sufficient for most practical applications. When the size of the chip is
a constraint, the arithmetic resources need to be shared at the expense of speed.
The authors have also implemented an IIR filter with a simple architecture using a MAC,
and compared their design with and without pipelining. However, this design is only
for the fixed point number system.
An FIR filter is essentially built from a MAC. An FIR filter can also be developed
on a DSP using the MAC that is built into the processor. So all
the techniques used for the MAC can be applied to develop FIR or IIR filters. In [14] a
bit-serial approach is used to develop an FIR filter. The maximum
frequency obtained is 33 MHz for fixed point numbers, which is too low. In [15]
the author has implemented an FIR filter with the Distributed Arithmetic algorithm
and a bit-serial approach. Again, for a 16-tap FIR filter the
maximum frequency is 55.9 MHz and 288 bits of memory are used. In fir27 the FIR
filter is designed with parallel MAC and DA FIR structures specifically for Xilinx
FPGAs using SysGen. With this design the maximum throughput is obtained and compared
for different numbers of taps and for the Virtex and Spartan families. The performance
of this design is satisfactory, but it is limited to the fixed point number
system because the SysGen blocks cannot support floating point. Optimization
of resources is also very important. In [16] the Dempster-Macleod (DM) algorithm is
proposed to reduce the number of adders needed to implement an FIR filter. It works
as follows. The number 115 can be represented as

$115 = 2^6 + 2^5 + 2^4 + 2^1 + 2^0$

This binary representation requires four adders. We can reduce the number of adders
by one using the canonical signed digit (CSD) representation:
$115 = 2^7 - 2^4 + 2^2 - 2^0$
This requires two subtractors and one adder, so one adder is saved. Using the
approach of Dempster and Macleod, the number of adders can be reduced further
by factoring the coefficient. The factorization below evaluates to 105 rather than
115, so it is shown for the coefficient 105:

$105 = 7 \times 15 = (2^3 - 2^0)(2^4 - 2^0)$

In this factored form the Dempster and Macleod (DM) algorithm uses only two
subtractors. The multiplication is accomplished by cascading the two circuits, so
the number of adders used to design the filter is reduced. However, this design is
not implemented for floating point numbers. In fixed point this type of conversion
is much easier than in floating point.
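
The adder counts for the binary and CSD forms of 115 can be reproduced with a short routine. This is a generic CSD recoding sketch, not the Dempster-Macleod algorithm of [16]; the function names `to_csd` and `adder_count` are illustrative.

```python
def to_csd(n):
    """Return the CSD digits (-1, 0, +1) of a positive integer, LSB first."""
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)    # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def adder_count(digits):
    """Adders or subtractors needed to combine the non-zero shifted terms."""
    return sum(1 for d in digits if d) - 1


binary = [int(b) for b in reversed(bin(115)[2:])]
csd = to_csd(115)
print(csd)                                        # prints [-1, 0, 1, 0, -1, 0, 0, 1]
print(adder_count(binary), adder_count(csd))      # prints 4 3
```

Reading the digits from the least significant end gives $-2^0 + 2^2 - 2^4 + 2^7 = 115$, the same CSD form as above.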
Until now, no one has proposed a digital filter architecture that includes the
ADC and DAC inside the FPGA. Sigma-delta modulators are useful for implementing the
ADC and DAC inside the FPGA because in a sigma-delta modulator most of the work is
done in the digital domain. [17] shows the implementation of a sigma-delta DAC with
the help of a low pass filter outside the FPGA. The Xilinx application notes XAPP154
(for the DAC) and XAPP155 are useful for implementing a sigma-delta ADC and DAC
inside an FPGA. In [17] the DAC is implemented in a Spartan II (XC2S300E). Analog
Devices also manufactures ADCs and DACs based on sigma-delta modulation.
Application note AN-283 [18] from Analog Devices describes the operation of such
ADCs and DACs.

Chapter 3

Conceptual Explanation of
Dissertation Topic

The most common approaches to the implementation of digital filtering algorithms
are general purpose digital signal processing chips, or special purpose digital filtering
chips and application-specific integrated circuits (ASICs) for higher rates. Here
I suggest an approach to the implementation of digital filter algorithms on field
programmable gate arrays (FPGAs).
The architecture of these filters has been largely determined by the target appli-
cations of the particular implementations. Several widely used digital signal proces-
sors such as the Texas Instruments TMS320, Motorola 56000, and Analog Devices
ADSP-2100 families have been designed to efficiently implement filtering operations
at audio rates. These devices are extremely flexible, but are limited in performance.
High performance designs for filtering at sampling rates above 100 MHz have also
been demonstrated using CMOS and BiCMOS technologies, using approaches ranging
from full custom to traditional factory-configured gate arrays. These efforts have
produced high performance designs for specific application domains.
Field programmable gate arrays (FPGAs) can be used to alleviate some of the
problems with the custom approach. FPGAs are programmable logic devices which
bear a significant resemblance to traditional custom gate arrays. While there are
a variety of approaches to FPGA implementation, some of the more popular series
consist of an array of arbitrarily programmable function blocks, with configurable
routing resources which are used to interconnect these blocks. Many of the most
popular FPGAs are in-system programmable, which allows the modification of the
operation of the device through simple reprogramming.
To date, different architectures for floating point MAC units and filters have been
proposed, but no one has tried combining the ADC, DAC and filter inside the FPGA.
I have tried to implement all three components inside a single FPGA, with a few
analog devices outside the FPGA.
Based on the comparison of FPGAs with DSP processors, general purpose processors
and ASICs, my main objective is to implement a digital filter with ADC and
DAC in an FPGA, optimized in terms of area, power and speed. The sub-objectives
are:

Implement floating point unit: To perform floating point mathematical functions
such as addition, multiplication and subtraction.

Implement floating point MAC: By arranging the floating point unit and internal
RAM to develop the digital filter.

Implement digital filter: To filter the noise from the input signal.

Implement ADC/DAC: To convert the analog signal into digital form for the
filter, and to convert the digital output of the filter back into analog form.

Optimization: Optimize the whole design in terms of area, power and speed.

3.1 Introduction to Field Programmable Gate Array
A field-programmable gate array is a semiconductor device containing programmable
logic components called logic blocks, and programmable interconnects. Logic
blocks can be programmed to perform the function of basic logic gates such as
AND and XOR, or more complex combinational functions such as decoders or
mathematical functions. In most FPGAs, the logic blocks also include memory
elements, which may be simple flip-flops or more complete blocks of memory. A
hierarchy of programmable interconnects allows logic blocks to be interconnected as
needed by the system designer, somewhat like a one-chip programmable breadboard.
Logic blocks and interconnects can be programmed by the customer or designer,
after the FPGA is manufactured, to implement any logical function, hence the name
"field-programmable".
The historical roots of FPGAs are in complex programmable logic devices (CPLDs)
of the early to mid 1980s. A Xilinx co-founder, Ross Freeman, invented the field
programmable gate array in 1984. CPLDs and FPGAs include a relatively large
number of programmable logic elements. CPLD logic gate densities range from the
equivalent of several thousand to tens of thousands of logic gates, while FPGAs
typically range from tens of thousands to several million. A recent trend has been
to take the coarse-grained architectural approach a step further by combining the
logic blocks and interconnects of traditional FPGAs with embedded microproces-
sors and related peripherals to form a complete system on a programmable chip.
Examples of such hybrid technologies can be found in the Xilinx Virtex-II PRO and
Virtex-4 devices, which include one or more PowerPC processors embedded within
the FPGA's logic fabric. The Atmel FPSLIC is another such device, which uses an
AVR processor in combination with Atmel's programmable logic architecture. The
basic structure of an FPGA is shown in fig 3.1.

Figure 3.1: Basic Architecture of FPGA

3.2 Floating point unit


Before turning to floating point operations, we first review the IEEE standard
for floating point:
The IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754) is the most
widely-used standard for floating point computation. The standard defines formats
for representing floating-point numbers (including negative zero and denormal num-
bers) and special values (infinities and NaNs) together with a set of floating-point
operations that operate on these values. It also specifies four rounding modes and
five exceptions (including when the exceptions occur, and what happens when they
do occur). IEEE 754 specifies four formats for representing floating-point values:
single-precision (32-bit), double-precision (64-bit), single-extended precision
(≥ 43-bit, not commonly used) and double-extended precision (≥ 79-bit, usually
implemented with 80 bits). The full title of the standard is IEEE Standard for Binary
Floating-Point Arithmetic (ANSI/IEEE Std 754-1985), and it is also known as IEC
60559:1989, Binary floating-point arithmetic for microprocessor systems. A single
precision binary floating point number is stored in 32 bits and a double precision
number in 64 bits. The layout is shown in fig 3.2.

Figure 3.2: Floating Point Numbering System

The first bit (bit 31) is the sign: if it is 1 the number is negative, and if it is 0
the number is positive. The next 8 bits (bits 30 to 23) hold the exponent, i.e. the
power of 2, stored with a bias of 127. The last 23 bits (bits 22 to 0) hold the
fraction part. Similarly, a double precision binary floating point number has 11
exponent bits and 52 fraction (mantissa) bits.
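The field layout described above can be checked with a few lines of Python. This is only an illustrative sketch, not part of the hardware design; the function name `decode_single` is hypothetical, and the standard `struct` module is used to obtain the 32-bit pattern.

```python
import struct


def decode_single(x):
    """Split a number into the sign, exponent and fraction fields of its
    IEEE 754 single-precision representation."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 30 to 23, biased by 127
    fraction = bits & 0x7FFFFF       # bits 22 to 0
    return sign, exponent, fraction


def value_of_normalized(sign, exponent, fraction):
    """Rebuild the value of a normalized number from its three fields."""
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)


sign, exponent, fraction = decode_single(21.44)
print(sign, f"{exponent:08b}", f"{fraction:023b}")
```

For 21.44 the exponent field is 131 (4 plus the bias of 127), and rebuilding the value from the three fields gives 21.44 to within single-precision accuracy.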

Ranges of Floating Point Numbers:


The range of positive floating point numbers can be split into normalized
numbers (which preserve the full precision of the mantissa) and denormalized
numbers (discussed later), which use only a portion of the fraction's precision.
The range of floating point numbers is shown in Table 3.1.
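Type | Denormalized | Normalized | Approximate Decimal
Single Precision | $\pm 2^{-149}$ to $(1 - 2^{-23}) \times 2^{-126}$ | $\pm 2^{-126}$ to $(2 - 2^{-23}) \times 2^{127}$ | $\pm 10^{-44.85}$ to $10^{38.53}$
Double Precision | $\pm 2^{-1074}$ to $(1 - 2^{-52}) \times 2^{-1022}$ | $\pm 2^{-1022}$ to $(2 - 2^{-52}) \times 2^{1023}$ | $\pm 10^{-323.3}$ to $10^{308.3}$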

Table 3.1: Ranges Of Floating Point Number

Since the sign of floating point numbers is given by a special leading bit, the
range for negative numbers is given by the negation of the above values.
Special Values:

Zero: Zero is not directly representable in the straight format, due to the
assumption of a leading 1 (we'd need to specify a true zero mantissa to
yield a value of zero). Zero is a special value denoted with an exponent
field of zero and a fraction field of zero. Note that -0 and +0 are distinct
values, though they both compare as equal.
Denormalized: If the exponent is all 0s, but the fraction is non-zero (else
it would be interpreted as zero), then the value is a denormalized number,
which does not have an assumed leading 1 before the binary point. Thus,
this represents a number $(-1)^{s} \times 0.f \times 2^{-126}$, where s is the sign bit and f is
the fraction. For double precision, denormalized numbers are of the form
$(-1)^{s} \times 0.f \times 2^{-1022}$. From this you can interpret zero as a special type of
denormalized number.
Infinity: The values +infinity and -infinity are denoted with an exponent
of all 1s and a fraction of all 0s. The sign bit distinguishes between
negative infinity and positive infinity. Being able to denote infinity as
a specific value is useful because it allows operations to continue past
overflow situations. Operations with infinite values are well defined in
IEEE floating point.
Not A Number: The value NaN (Not a Number) is used to represent a
value that does not represent a real number. NaNs are represented by a
bit pattern with an exponent of all 1s and a non-zero fraction. There are
two categories of NaN: QNaN (Quiet NaN) and SNaN (Signalling NaN).
A QNaN is a NaN with the most significant fraction bit set. QNaNs
propagate freely through most arithmetic operations. These values pop
out of an operation when the result is not mathematically defined. An
SNaN is a NaN with the most significant fraction bit clear. It is used to
signal an exception when used in operations. SNaNs can be handy to
assign to uninitialized variables to trap premature usage. Semantically,
QNaNs denote indeterminate operations, while SNaNs denote invalid
operations.
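The special values listed above can be told apart by looking only at the exponent and fraction fields. The following sketch (single precision only; the function name `classify_single` is hypothetical) illustrates the decision order:

```python
def classify_single(bits):
    """Classify a 32-bit IEEE 754 single-precision bit pattern."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        # All-zero exponent: zero or a denormalized number.
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0xFF:
        # All-ones exponent: infinity or NaN.
        if fraction == 0:
            return "infinity"
        # The most significant fraction bit separates QNaN from SNaN.
        return "QNaN" if fraction & 0x400000 else "SNaN"
    return "normalized"


for pattern in (0x00000000, 0x00000001, 0x7F800000, 0x7FC00000, 0x7F800001):
    print(f"{pattern:#010x}", classify_single(pattern))
```

The sign bit plays no role in the classification, so +0 and -0, or +infinity and -infinity, fall into the same class.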

Floating Point Addition: The steps of floating point addition are shown
in fig 3.3 and are explained here with an example. As fig 3.3 shows,
floating point addition consists of five steps. Suppose we want to add
two numbers, 21.44 (A) + 7.24 (B).

Figure 3.3: Floating Point Addition Algorithm

Step 0: Convert both numbers into binary format.

21.44 = 10101.01110000101... = 1.010101110000101... $\times 2^{4}$

7.24 = 111.0011110101110... = 1.110011110101110... $\times 2^{2}$

Adding the bias, these exponents become

4 + 127 = 131 = 10000011

2 + 127 = 129 = 10000001

The numbers are stored according to the IEEE floating point standard.
The first significant bit is always a 1 and is not stored as part of the
significand.

21.44(A) = 0 10000011 01010111000010100011111

7.24(B) = 0 10000001 11001111010111000010100

Compare Exponents: The exponents of the two registers are subtracted.
The difference is positive, indicating that the exponent in register A (on
the left) is larger. Control selects the exponent from register A (by
asserting 0 at the multiplexer on the left) to pass to the next section of the
adder to be used as the preliminary result for the exponent.
Shift Smaller Number Right: The difference between the two exponents is
2, indicating that the significand in register B must be shifted right two places.
Before entering the ALU or the shift register, the 23-bit significands are
expanded to 32 bits by inserting the leading implicit 1 and filling in
leading 0s. (To provide for roundoff, trailing 0s may also be appended to
the original 23 bits.) Control selects the (expanded) contents of register B
to be placed in the shift register and the contents of register A (expanded)
to be sent directly to the ALU. The contents of register B are shifted right
two places and the two terms are added.
In this example, the 23 bits of the significand are mapped into bits 24
to 2 during the process of expanding to 32 bits. Bits 0 and 1 are set to 0
initially and used for calculating roundoff. The implicit leading 1 is set
in bit 25 and bits 26 to 31 hold leading 0s. The input to the ALU after
shifting is given below. (Note: since the last two bits
of the significand in register B are both 0, shifting right just moves these
two 0s into the additional trailing bits.) After shifting B two bits to the right,
the mantissa of B (24 bits, including the leading bit) becomes:
001110011110101110000101
Add: The two mantissas of A and B are added. This is a simple binary addition
of two numbers. With the leading 1 of A restored (101010111000010100011111),
the sum of the two mantissas is

111001010111000010100100

Normalize: After adding the two mantissas the MSB must be 1. If it is not,
the mantissa is shifted left until the MSB is 1 (or right by one place if the
addition produced a carry out), and the exponent is adjusted accordingly.
In this example the MSB is already 1, so the exponent stays at 10000011 and
the result is 0 10000011 11001010111000010100100 (28.68).
Rounding: Rounding modes are used when the exact result of a floating-point
operation (or a conversion to floating-point format) would need
more significant digits than there are digits in the significand. A rounding
method maps the ideal (infinitely precise) result of an arithmetic
operation to a nearby representable value and gives that representation
as the result. The rounding methods are:
Round to nearest (the default; by far the most common mode)
Round up (toward +∞; negative results round toward zero)
Round down (toward -∞; negative results round away from zero)
Round toward zero (sometimes called chop mode; it is similar to
the common behavior of float-to-integer conversions, which convert
-3.9 to -3)
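The addition steps described above can be followed in software. The sketch below is a simplified behavioural model, not the hardware implementation of this work: it assumes both operands are positive and normalized, ignores overflow and special values, and uses three extra low-order bits (guard, round, sticky) with round to nearest, ties to even. All function names are hypothetical.

```python
import struct

GUARD_BITS = 3  # guard, round and sticky bits kept below the 24-bit significand


def float_to_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def add_positive_singles(a_bits, b_bits):
    """Add two positive, normalized single-precision bit patterns."""
    exp_a, exp_b = (a_bits >> 23) & 0xFF, (b_bits >> 23) & 0xFF
    # Restore the implicit leading 1 and append the extra rounding bits.
    sig_a = ((a_bits & 0x7FFFFF) | 0x800000) << GUARD_BITS
    sig_b = ((b_bits & 0x7FFFFF) | 0x800000) << GUARD_BITS

    # Compare exponents: keep the operand with the larger exponent in A.
    if exp_a < exp_b:
        exp_a, exp_b, sig_a, sig_b = exp_b, exp_a, sig_b, sig_a

    # Shift the smaller number right; OR any lost bits into the sticky bit.
    shift = exp_a - exp_b
    lost = sig_b & ((1 << shift) - 1)
    sig_b = (sig_b >> shift) | (1 if lost else 0)

    # Add the aligned significands.
    total = sig_a + sig_b
    exponent = exp_a

    # Normalize: a carry out means one right shift and exponent + 1.
    if total >> (24 + GUARD_BITS):
        total = (total >> 1) | (total & 1)
        exponent += 1

    # Round to nearest, ties to even.
    significand = total >> GUARD_BITS
    remainder = total & ((1 << GUARD_BITS) - 1)
    half = 1 << (GUARD_BITS - 1)
    if remainder > half or (remainder == half and significand & 1):
        significand += 1
        if significand >> 24:  # rounding carried into a 25th bit
            significand >>= 1
            exponent += 1

    return (exponent << 23) | (significand & 0x7FFFFF)


result = add_positive_singles(float_to_bits(21.44), float_to_bits(7.24))
print(f"{result:#010x}")
```

For the example operands 21.44 and 7.24 the model returns 0x41e570a4, the single-precision pattern of 28.68.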
Floating Point Multiplication: Floating point addition was described
in detail with an example in the previous section. For floating point
multiplication only the algorithm is given, in fig 3.4.
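Since the multiplication algorithm is given only as a figure, a textbook version of the steps may help as a reference. The sketch below is an assumption on my part and may differ in detail from fig 3.4: the sign bits are XORed, the biased exponents are added and one bias is subtracted, the 24-bit significands are multiplied, and the product is normalized and rounded to nearest (ties to even). It assumes normalized operands and no overflow or underflow; the function names are hypothetical.

```python
import struct


def float_to_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def multiply_singles(a_bits, b_bits):
    """Multiply two normalized single-precision bit patterns."""
    sign = (a_bits ^ b_bits) >> 31                 # sign of the product
    exp_a, exp_b = (a_bits >> 23) & 0xFF, (b_bits >> 23) & 0xFF
    sig_a = (a_bits & 0x7FFFFF) | 0x800000         # restore implicit leading 1
    sig_b = (b_bits & 0x7FFFFF) | 0x800000

    exponent = exp_a + exp_b - 127                 # remove one copy of the bias
    product = sig_a * sig_b                        # 47- or 48-bit product

    # Normalize: a product in [2, 4) needs one extra right shift.
    if product >> 47:
        shift = 24
        exponent += 1
    else:
        shift = 23

    # Round to nearest, ties to even, on the bits shifted out.
    significand = product >> shift
    remainder = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and significand & 1):
        significand += 1
        if significand >> 24:                      # rounding carried out
            significand >>= 1
            exponent += 1

    return (sign << 31) | (exponent << 23) | (significand & 0x7FFFFF)


product_bits = multiply_singles(float_to_bits(-1.5), float_to_bits(2.0))
print(struct.unpack(">f", struct.pack(">I", product_bits))[0])
```

Unlike addition, no exponent comparison or alignment shift is needed, which is why the multiplier datapath is simpler apart from the wide significand multiplier.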

Figure 3.4: Floating Point Multiplication

3.3 Multiply Accumulator


MAC (Multiply Accumulator) is a basic part of any digital processor, and all
recent digital processors are equipped with a MAC unit. Such processors usually
contain only one MAC unit, so the multiply-accumulate operations are carried out
serially, which makes these processors slow. The basic MAC unit operation is
described in equation 3.1.

$$Q = \sum_{n=0}^{x} (\pm 1)\, A(n)\, B(n) \qquad (3.1)$$
Q is the primary data output of the core. A and B are multiplied together and the
product added or subtracted from the current result. A simplified schematic of the
core is shown in Fig 3.5.
Speed can be increased by using multiple MAC units which work in parallel.
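As a behavioural illustration of the MAC operation and of the gain from parallel units, consider the sketch below. It is not a model of the actual core; the function names and the way the work is split between units (interleaving the operand pairs) are assumptions made for the example.

```python
def mac(a_values, b_values, subtract=False):
    """Multiply pairs of operands and accumulate the products.
    With subtract=True each product is subtracted from the result instead."""
    q = 0.0
    for a, b in zip(a_values, b_values):
        product = a * b
        q = q - product if subtract else q + product
    return q


def parallel_mac(a_values, b_values, units):
    """Split the operand pairs over several MAC units and add the partial sums.
    In hardware the units would run concurrently, so each one needs only
    about len(a_values) / units cycles."""
    partial_sums = [mac(a_values[k::units], b_values[k::units])
                    for k in range(units)]
    return sum(partial_sums)


print(mac([1, 2, 3], [4, 5, 6]))                     # 1*4 + 2*5 + 3*6
print(parallel_mac([1, 2, 3, 4], [5, 6, 7, 8], 2))   # same sum, two units
```

With floating point operands the parallel version may differ from the serial one in the last bits, because the additions are performed in a different order.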
Figure 3.5: Multiply Accumulator

3.4 ΔΣ Modulation
Delta-Sigma (ΔΣ) modulation, which is also called Sigma-Delta (ΣΔ) modulation,
is a kind of analog-to-digital or digital-to-analog conversion characterized
by integrating (Σ) differences (Δ). An analog to digital converter (ADC) or DAC
circuit which implements this technique can be easily realized using low-cost CMOS
processes, such as the processes used to produce digital integrated circuits. Delta
Sigma ADCs and DACs are easy to implement in the digital domain. The benefit
of a Delta Sigma converter is that it moves most of the conversion process into
digital form, which makes it possible to combine high performance analog with
digital processing. The analog part uses a single comparator, an integrator and a
1-bit DAC. In this work the DAC and integrator are inside the FPGA and the
comparator is an LM339.

3.4.1 ADC
A Sigma-Delta ADC is shown in fig 3.6. The order of a Sigma-Delta ADC is
determined by the number of integrators it contains. First order Sigma-Delta
converters (fig 3.6a) are stable, but for second or higher order converters
stability must be taken into account. On the other hand, the accuracy of second
or higher order converters is higher than that of first order converters, so the
selection must be made on the basis of the application. The basic block diagrams
of the first order (3.6a) and second order (3.6b) ADC are shown in fig 3.6.
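The behaviour of a first order Sigma-Delta modulator can be illustrated with a short simulation. This is an idealized sketch under my own assumptions (input normalized to the range -1 to +1, an ideal integrator, a comparator threshold at zero and a 1-bit DAC feeding back +1 or -1); it is not a model of the FPGA and LM339 circuit, and the function names are hypothetical.

```python
def sigma_delta_modulate(samples):
    """First order Sigma-Delta modulator: returns one output bit per sample."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback           # integrate (sigma) the difference (delta)
        bit = 1 if integrator >= 0 else 0    # comparator
        feedback = 1.0 if bit else -1.0      # 1-bit DAC in the feedback path
        bits.append(bit)
    return bits


def average_level(bits):
    """Crude decimation: average the bit stream, mapping 1 to +1 and 0 to -1."""
    return sum(2 * b - 1 for b in bits) / len(bits)


stream = sigma_delta_modulate([0.5] * 4000)
print(average_level(stream))
```

For a constant input the average of the bit stream approaches the input value; a real converter replaces the plain average with a digital low-pass (decimation) filter.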
(a) First Order Sigma-Delta ADC

(b) Second Order Sigma-Delta ADC

Figure 3.6: Block diagram of Sigma-Delta ADC

3.4.2 DAC
Digital to analog converters (DACs) convert a binary number into a voltage directly
proportional to the value of the binary number. A variety of applications use DACs
including waveform generators and programmable voltage sources. A Delta-Sigma
DAC uses digital techniques. Consequently, it is impervious to temperature change,
and may be implemented in programmable logic. Delta-Sigma DACs are actually
high-speed single bit DACs. Using digital feedback, a string of pulses is generated.
The average duty cycle of the pulse string is proportional to the value of the binary
input. The analog signal is created by passing the pulse string through an analog
low-pass filter. Delta-Sigma DACs are used extensively in audio applications. They
are suited for low frequency applications that require relatively high accuracy. As
is standard practice, the DAC binary input in this implementation is an unsigned
number with zero representing the lowest voltage level. The analog voltage output
is also positive only. A zero on the input produces zero volts at the output. All
ones on the input cause the output to nearly reach reference voltage. Basic block
diagram of Sigma Delta DAC is given in fig. 3.7
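The relation between the binary input and the duty cycle of the pulse string can be illustrated with a simple accumulator model. This is an assumed first order structure chosen for illustration (the input is added to an accumulator every clock and the carry out is the output bit); the design shown in fig 3.7 may differ in detail, and the function name is hypothetical.

```python
def sigma_delta_dac_bits(value, width, cycles):
    """Generate the 1-bit pulse string for an unsigned input of `width` bits."""
    mask = (1 << width) - 1
    accumulator = 0
    pulses = []
    for _ in range(cycles):
        accumulator += value
        pulses.append(accumulator >> width)   # carry out is the output bit
        accumulator &= mask                   # keep only `width` bits
    return pulses


pulses = sigma_delta_dac_bits(64, 8, 256)
print(sum(pulses) / len(pulses))   # duty cycle of the pulse string
```

With an 8-bit input of 64 the duty cycle over 256 clocks is 64/256, an input of zero gives no pulses, and an input of all ones gives 255 pulses in 256 clocks, so the filtered output nearly reaches the reference voltage.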

Figure 3.7: Sigma Delta DAC

Features of DAC are listed below:

Conversion with fewer passive components.

Can be used in programmable designs.

Less dependent on temperature.

Number of passive components required does not depend on word length.

Suitable for low frequency applications that require relatively high accuracy.

3.5 Implementing FIR/IIR Filter


In signal processing, there are many instances in which an input signal to a system
contains extra unnecessary content or additional noise which can degrade the quality
of the desired portion. In such cases we may remove or filter out the useless samples.
For example, in the case of the telephone system, there is no reason to transmit
very high frequencies since most speech falls within the band of 400 to 3,400 Hz.
Therefore, in this case, all frequencies above and below that band are filtered out.
The frequency band between 400 and 3,400 Hz, which isn't filtered out, is known as
the passband, and the frequency band that is blocked out is known as the stopband.
FIR (Finite Impulse Response) filters are one of the primary types of filters used in
Digital Signal Processing. FIR filters are said to be finite because they do not have
any feedback. Therefore, if you send an impulse through the system (a single spike),
the output will invariably become zero as soon as the impulse has run through
the filter.
The basic form of the FIR filter is given in expanded form in 3.2 and in general
(summation) form in 3.3.

$$y[n] = h[0]\,x[n] + h[1]\,x[n-1] + \cdots + h[N]\,x[n-N] \qquad (3.2)$$

$$y[n] = \sum_{i=0}^{N} h[i]\,x[n-i] \qquad (3.3)$$

Similarly, the basic form of the IIR filter is given in expanded form in 3.4 and in
general (summation) form in 3.5, with the coefficient a[0] normalized to 1.

$$y[n] = b[0]\,x[n] + b[1]\,x[n-1] + \cdots + b[P]\,x[n-P] - a[1]\,y[n-1] - \cdots - a[Q]\,y[n-Q] \qquad (3.4)$$

$$y[n] = \sum_{i=0}^{P} b[i]\,x[n-i] - \sum_{j=1}^{Q} a[j]\,y[n-j] \qquad (3.5)$$
From the equations shown above, the FIR filter can be described in direct form
as shown in fig 3.8 and the IIR filter can be described in direct form as shown in fig 3.9.
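The direct form difference equations can be written out directly in software. The sketch below is a plain reference model, not the FPGA implementation; it assumes that samples before n = 0 are zero and, for the IIR filter, that the coefficient a[0] is normalized to 1. The function names are hypothetical.

```python
def fir_filter(h, x):
    """Direct form FIR: y[n] is the sum over i of h[i] * x[n - i]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, coeff in enumerate(h):
            if n - i >= 0:                 # samples before n = 0 are taken as zero
                acc += coeff * x[n - i]
        y.append(acc)
    return y


def iir_filter(b, a, x):
    """Direct form IIR with a[0] assumed to be 1: feed-forward terms on x
    minus feedback terms on earlier outputs."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, coeff in enumerate(b):
            if n - i >= 0:
                acc += coeff * x[n - i]
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * y[n - j]     # feedback from earlier outputs
        y.append(acc)
    return y


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(fir_filter([0.5, 0.3, 0.2], impulse))   # impulse response equals the taps
print(iir_filter([1.0], [1.0, -0.5], impulse))
```

Feeding an impulse into the FIR model returns the coefficients followed by zeros, while the IIR model with a single feedback coefficient produces a response that keeps decaying by a factor of 0.5 per sample and never becomes exactly zero.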

Figure 3.8: FIR Direct form

An FIR filter has a number of useful properties which sometimes make it prefer-
able to an infinite impulse response filter. FIR filters:

Are inherently stable. This is due to the fact that all the poles are located at
the origin and thus are located within the unit circle.
Figure 3.9: IIR Direct form

Require no feedback. This means that any rounding errors are not com-
pounded by summed iterations. The same relative error occurs in each calcu-
lation.

Can be designed to be linear phase, which means the phase change is proportional
to the frequency. This is usually desired for phase-sensitive applications,
for example crossover filters and mastering, where transparent filtering
is required.

There are a few terms used to describe the behavior and performance of FIR
filter including the following:

Filter Coefficients - The set of constants, also called tap weights, used to
multiply against delayed sample values. For an FIR filter, the filter coefficients
are, by definition, the impulse response of the filter.

Impulse Response - A filter's time domain output sequence when the input is
an impulse. An impulse is a single unity-valued sample followed and preceded
by zero-valued samples. For an FIR filter, the impulse response is the set of
filter coefficients.

Tap - The number of FIR taps, typically N, tells us a couple of things about
the filter. Most importantly, it tells us the amount of memory needed, the
number of calculations required, and the amount of filtering that the filter can
do. Basically, more taps result in better stopband attenuation
(less of the part we want filtered out), less rippling (less variation in the
passband), and steeper rolloff (a shorter transition between the passband and
the stopband).

Multiply-Accumulate (MAC) - In the context of FIR filters, a MAC is the
operation of multiplying a coefficient by the corresponding delayed data sample
and accumulating the result. There is usually one MAC per tap.

Many software tools and digital signal processors are available with which an FIR
filter can be implemented easily. In MATLAB an FIR filter can be implemented with
ready-made functions, and in SIMULINK the FDA tool is available, in which the user
only has to define the frequencies of interest and the type of the filter, and the
filter is generated automatically. Such a filter can be generated in just 10 to 15
minutes. However, the user has no control over this design, and it exists only in
software form. SIMULINK also provides Real-Time Workshop, with which the filter can
be downloaded to a DSP processor kit such as the TMS320C6713, after which the filter
is ready to use. Implementing such a filter on an FPGA is not as easy, but there are
many advantages to implementing a filter in an FPGA because of its parallelism and
reconfigurability.
There are generally five steps in the design of a digital filter:

Specify the filter requirements.

Calculate the filter coefficients.

Use a suitable structure to represent the filter.

Analyze the effects of finite word length on the filter's performance.

Implement the filter in software and/or hardware.

Some of the steps are grouped together when an automated tool such as the FDA tool is used.
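To make the second and fourth design steps concrete, the sketch below calculates low-pass coefficients and then quantizes them to a finite word length. The method (a Hamming-windowed sinc) and the fixed point quantization are assumptions chosen for illustration; they are not the FDA tool's algorithm and not the floating point format used in this work. The function names and parameters are hypothetical.

```python
import math


def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc low-pass coefficients.
    cutoff is the cutoff frequency divided by the sampling frequency (0 to 0.5)."""
    mid = (num_taps - 1) / 2
    h = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        h.append(ideal * window)
    total = sum(h)
    return [c / total for c in h]          # scale for unity gain at DC


def quantize(coeffs, frac_bits):
    """Round each coefficient to a fixed point value with frac_bits fraction bits."""
    scale = 1 << frac_bits
    return [round(c * scale) / scale for c in coeffs]


h = lowpass_fir(21, 0.1)
h_q = quantize(h, 15)
print(max(abs(a - b) for a, b in zip(h, h_q)))   # worst-case coefficient error
```

The coefficients come out symmetric, which gives the linear phase property mentioned earlier, and the quantization error per coefficient is bounded by half of one least significant bit.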

Chapter 4

Hardware - Software
Implementation

Hardware Implementation:
The hardware which I have used is listed below:

Spartan 3E starter kit

Spartan 3A starter kit

Logic Analyzer (ADM 8002)

JTAG programmer

4.1 Spartan 3E Starter Kit


Some of the features of the Spartan 3E starter kit are given below.

Devices Supported

Spartan-3E (XC3S500E-4FG320C)
CoolRunner-II (XC2C64A-5VQ44C)
Platform Flash (XCF04S-VO20C)

Clocks: 50 MHz crystal clock oscillator

Memory

16Mbit SPI Flash

64MByte DDR SDRAM

Connectors and Interfaces

Ethernet 10/100 PHY
JTAG USB download
Two 9-pin RS-232 serial ports
PS/2-style mouse/keyboard port
Rotary encoder with push button
Four slide switches
Eight individual LED outputs
Four momentary-contact push buttons
100-pin Hirose expansion connection ports
Three 6-pin expansion connectors

Display

16 character - 2 Line LCD

Figure 4.1: Spartan 3E starter kit

4.2 Spartan 3A Starter Kit
Some of the features of the Spartan 3A starter kit are given below.

Devices Supported

Spartan-3A (XC3S700A-FG484)
Platform Flash (XCF04S-VO20C)

Clocks

50 MHz crystal clock oscillator


open slot for optional user-installed clock

Memory

Mbit Platform Flash PROM


32Mx16 DDR2 SDRAM
32 Mbit parallel Flash
2-16 Mbit SPI Flash Devices

Connectors and Interfaces:

Ethernet 10/100 PHY
JTAG USB download
Two 9-pin RS-232 serial ports
PS/2-style mouse/keyboard port
Rotary encoder with push button
Four slide switches
Eight individual LED outputs
Four momentary-contact push buttons
100-pin Hirose expansion connection ports
Three 6-pin expansion connectors
15-pin VGA connector capable of 4,096 colors

Display

16 character - 2 Line LCD


Analog Interface Devices

4-channel D/A converter


2-channel A/D converter
Signal Amplifier

Figure 4.2: Spartan 3A Starter Kit

4.3 Xilinx FPGA


The Spartan-3 generation of FPGAs (fig 4.3) offers a choice of five platforms, each
delivering a unique cost-optimized balance of programmable logic, connectivity, and
dedicated hard IP for low-cost applications. The platforms are described below; each
is optimized for a specific application domain for lowest system cost.

Spartan-3A platform - For applications where I/O count and capabilities
matter more than logic density. Ideal for bridging, differential signaling and
memory interfacing applications.

Figure 4.3: Spartan 3 Family FPGA

Spartan-3E platform - For applications where logic density matters more
than I/O count. Ideal for logic integration, DSP co-processing and embedded
control.

Spartan-3 platform - For applications where both high logic density and high
I/O count are important. Ideal for highly integrated data-processing applications.

Spartan-3AN platform - For applications where non-volatile system integration,
security or large user Flash is required. Breakthrough marriage of uncompromised
SRAM FPGA and Flash technologies. Outperforms non-volatile FPGAs
with unparalleled Flash reliability combined with performance and features
previously available only in SRAM FPGAs. Industry-leading security helps
prevent reverse engineering, cloning, and unauthorized overbuilding. Superior
system flexibility with up to 11 Mb of on-chip user Flash memory.

Spartan-3A DSP platform - For applications where integrated DSP MACs
and expanded memory are required. Supports high density designs with up
to 53K logic cells and robust on-chip memory. Over 20 GMACS DSP performance
for under $30, utilizing cost-optimized integrated DSP48A slices. Ideal for
designs requiring low-cost FPGAs for signal processing applications such as
military radio, surveillance cameras, medical imaging, etc. Significant gains
in application efficiency using highly parallel architectures.

4.4 JTAG Programmer
Joint Test Action Group (JTAG) is the usual name used for the IEEE 1149.1
standard entitled Standard Test Access Port and Boundary-Scan Architecture for
test access ports used for testing printed circuit boards using boundary scan. JTAG
was an industry group formed in 1985 to develop a method to test populated circuit
boards after manufacture. A JTAG interface is a special four/five-pin interface
added to a chip, designed so that multiple chips on a board can have their JTAG
lines daisy-chained together, and a test probe need only connect to a single JTAG
port to have access to all chips on a circuit board. The connector pins are

TDI (Test Data In)

TDO (Test Data Out)

TCK (Test Clock)

TMS (Test Mode Select)

TRST (Test Reset) optional.

The basic connections for multiple devices in a single chain are shown in fig 4.4.

Figure 4.4: JTAG Connection for multiple devices

Since only one data line is available, the protocol is necessarily serial like SPI.
The clock input is at the TCK pin. Configuration is performed by manipulating a
state machine one bit at a time through a TMS pin. One bit of data is transferred
in and out per TCK clock pulse at the TDI and TDO pins, respectively. Different
instruction modes can be loaded to read the chip ID, sample input pins, drive
(or float) output pins, manipulate chip functions, or bypass (pipe TDI to TDO
to logically shorten chains of multiple chips). The operating frequency of TCK
varies depending on the chip, but it is typically 10-100 MHz (100-10 ns per bit).
When performing boundary scan on integrated circuits, the signals manipulated
are between different functional blocks of the chip, rather than between different
chips. The TRST pin is an optional active-low reset to the test logic - usually
asynchronous, but sometimes synchronous, depending on the chip. If the pin is not
available, the test logic can be reset by clocking in a reset instruction synchronously.
Data presented to TDI must be valid for some chip-specific Setup time before and
Hold time after the rising edge of TCK. TDO data is valid for some chip-specific
time after the falling edge of TCK. Xilinx JTAG programmers are available in
two models: (i) connection through USB and (ii) connection through the parallel port.
Their cost is $199 and $111 respectively. I built a JTAG programming cable which
programs through the parallel port. A USB-connected JTAG cable is more difficult to
develop because it needs a USB controller. The iMPACT software can be used for
programming the FPGA through the parallel port JTAG programmer. It supports all
FPGA families as well as CPLDs. The circuit diagram of this programmer is
shown in fig 4.5. Its six connection points connect to the six JTAG pins on the
board to be configured, as shown in fig 4.6.

Figure 4.5: Circuit diagram of JTAG programmer for FPGA


Figure 4.6: JTAG connection with the board
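The serial nature of the JTAG chain and the effect of the bypass instruction mentioned above can be illustrated with a small model. It is a simplified sketch: each device in bypass is represented only by its 1-bit bypass register (initially 0), one bit moves per TCK pulse, and the TMS state machine and edge timing are left out. The function name is hypothetical.

```python
def shift_through_bypass_chain(num_devices, tdi_bits):
    """Shift bits into TDI of the first device and collect TDO of the last
    device, with every device in the chain in bypass mode."""
    bypass = [0] * num_devices              # one 1-bit bypass register per device
    tdo_bits = []
    for bit in tdi_bits:                    # one TCK pulse per bit
        tdo_bits.append(bypass[-1])         # last device drives TDO
        bypass = [bit] + bypass[:-1]        # every register passes its bit along
    return tdo_bits


print(shift_through_bypass_chain(3, [1, 0, 1, 1, 0, 0, 0]))
```

With three devices in bypass the pattern shifted in at TDI appears at TDO three clocks later, which is why putting unused devices in bypass shortens the effective chain to one bit per device.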

4.5 SPI interface of ADC and DAC with FPGA


4.5.1 SPI interface of Programmable Amplifier(LTC6912-1)
The Spartan-3E Starter Kit board includes a two-channel analog capture circuit,
consisting of a programmable scaling pre-amplifier and an analog-to-digital converter
(ADC). Analog inputs are supplied on the J7 header. The analog capture circuit
consists of a Linear Technology LTC6912-1 programmable preamplifier that scales
the incoming analog signal on header J7. The output of pre-amplifier connects to
a Linear Technology LTC1407A-1 ADC. Both the pre-amplifier and the ADC are
serially programmed or controlled by the FPGA. The detailed connections are shown
in fig 4.7.
The Serial Peripheral Interface (SPI) is formally described as being a full-duplex,
synchronous, character-oriented channel employing a 4-wire interface. As each bit
is transmitted by the master, the slave also transmits a bit allowing one byte to be
passed in each direction at the same time. The signals between Amplifier and FPGA
are shown below in fig 4.5.1. Communication is only possible with the LTC6912-1
device when the select signal (AMP-CS) is Low. Therefore the PicoBlaze master is
responsible for driving AMP-CS Low before transmitting and receiving a command
byte and then driving AMP-CS High. It is the act of driving AMP-CS High which
actually causes the amplifier to use the new gain setting for both channels. Looking
specifically at the LTC6912-1 Amplifier, each communication is formed of 1 byte or
8-bits. Inside the Amplifier, the SPI interface is formed by an 8-bit shift register. As
a new 8-bit command byte is transmitted to it, the byte previously sent is echoed
back to the master. The bit transfer pattern between amplifier and FPGA is shown
in fig 4.8.

Figure 4.7: Programmable Amplifier and ADC connection with FPGA in Spartan
3E starter kit

The timing diagram is given in fig 4.9. All timings shown in the diagram are in
nanoseconds.
This timing diagram has been created approximately to scale assuming that the
highest speed SCK is being used (minimum of 50ns Low and 50ns High). The
LTC6912-1 captures data (SDI) on the rising edge of SCK, so the data needs to be
valid for at least 30ns before the rising edge. The LTC6912-1 outputs data (AMP-
DO) on the falling edge of SCK. This output may take up to 85ns, so if the AMP-DO
value needs to be read, then it is advisable to delay the reading of this signal as long
as possible or operate at a slower clock speed. As the timing diagram indicates, it is
definitely not possible for the master to read AMP-DO using the next rising edge
of SCK if the maximum clock rate is being used.

Figure 4.8: Bit transfer pattern between Amplifier and FPGA
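The echo behaviour of the amplifier's shift register, and the role of the select signal, can be illustrated with a small behavioural model. This is a sketch under simplifying assumptions: timing is ignored, bits are sent most significant bit first, and the slave latches the shifted-in value when the select line goes High. The class and function names are hypothetical and do not correspond to any Xilinx or Linear Technology code.

```python
class ShiftRegisterSlave:
    """SPI slave built around an N-bit shift register (8 bits for the amplifier)."""

    def __init__(self, width=8):
        self.width = width
        self.shift_reg = 0     # bits currently in the shift register
        self.latched = 0       # value in use by the device
        self.selected = False

    def set_cs(self, level):
        """Select line: Low selects the device, the return to High latches."""
        if level == 0:
            self.selected = True
        else:
            if self.selected:
                self.latched = self.shift_reg
            self.selected = False

    def clock(self, data_in):
        """One SCK cycle: shift one bit in, return the bit shifted out."""
        if not self.selected:
            raise RuntimeError("device is not selected")
        data_out = (self.shift_reg >> (self.width - 1)) & 1
        self.shift_reg = ((self.shift_reg << 1) | data_in) & ((1 << self.width) - 1)
        return data_out


def transfer(slave, word):
    """Master side: drive select Low, exchange one word, drive select High."""
    slave.set_cs(0)
    received = 0
    for i in reversed(range(slave.width)):
        received = (received << 1) | slave.clock((word >> i) & 1)
    slave.set_cs(1)
    return received


amplifier = ShiftRegisterSlave(width=8)
print(hex(transfer(amplifier, 0x11)))   # nothing sent before, so zero comes back
print(hex(transfer(amplifier, 0x22)))   # the previous byte 0x11 is echoed
```

The same model with a width of 32 reproduces the echo of the previous command word described for the D/A converter.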

4.5.2 SPI Interface of ADC (LTC1407A-1 )


Although the LTC1407A-1 device has a serial interface with a couple of pins named
similarly to those of other SPI devices, it is a little different to work with, and it
is useful that PicoBlaze can be programmed to accommodate its special requirements.
The rising edge of the AD-CONV signal is used to trigger the sampling of the analogue
inputs, to start the conversion and to start the serial data transfer. This is quite
different from the chip select found on most SPI devices. The accurate timing of
regularly spaced AD-CONV pulses would be fundamental to any digital signal processing
(DSP) algorithm, but more sporadic sampling can be useful in monitoring applications.
The signal transfer between ADC and FPGA is shown in fig 4.10. The actual
connection of the LTC1407A-1 with the FPGA in the Spartan 3E starter kit is shown
in fig 4.7.
A typical communication requires 34 cycles to be provided by the master follow-
ing the rising edge of AD-CONV. Of these 34 cycles, 6 cycles result in SDO being
driven tri-state (high impedance) as indicated by the grey boxes and timing diagram
below. The remaining cycles transfer the two 14-bit signed values most significant
bit first. The bit transfer pattern between ADC and FPGA is shown in fig. 4.11.

Figure 4.9: Timing Diagram of Amplifier SPI communication

Figure 4.10: Master slave connection of ADC with FPGA in SPI mode

The SDO output of the LTC1407A-1 device changes as a result of the rising
edge of the applied SCK clock. This again is different to most slave devices which
change their output on the falling edge of SCK. Since the device does not have a
conventional chip select control, it is also vital that adequate SCK cycles are applied
to ensure the SDO output is left in tristate (high impedance) such that it can not
interfere with other devices sharing the SPI bus. It is therefore advisable to use the
typical 34 cycle sequence. The timing diagram for ADC communication is shown in
fig. 4.12
The maximum sample rate supported by the LTC1407A-1 is 1.5MHz. This is
only possible if the maximum rate of SCK is used for conversion and communication.
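Decoding the 34-cycle frame described above can be sketched as follows. The text states that 6 of the 34 cycles are tri-state and that the rest carry two 14-bit signed values, most significant bit first; the exact position of the tri-state cycles used here (two before, two between and two after the values) is my assumption and should be checked against the LTC1407A-1 data sheet and fig 4.11. The function names are hypothetical.

```python
def to_signed14(value):
    """Interpret a 14-bit two's complement value as a signed integer."""
    value &= 0x3FFF
    return value - 0x4000 if value & 0x2000 else value


def decode_adc_frame(bits):
    """Decode one 34-cycle frame given as a list of 0, 1 or None (tri-state).
    Assumed layout: 2 tri-state, 14 bits of channel 0, 2 tri-state,
    14 bits of channel 1, 2 tri-state."""
    if len(bits) != 34:
        raise ValueError("a frame has 34 SCK cycles")

    def field(start):
        value = 0
        for bit in bits[start:start + 14]:   # most significant bit first
            value = (value << 1) | bit
        return to_signed14(value)

    return field(2), field(18)


def encode_adc_frame(ch0, ch1):
    """Build a frame in the same assumed layout (used here only for testing)."""
    def field(value):
        return [(value >> i) & 1 for i in range(13, -1, -1)]
    return [None] * 2 + field(ch0) + [None] * 2 + field(ch1) + [None] * 2


print(decode_adc_frame(encode_adc_frame(100, -1)))
```

Keeping the tri-state cycles in the model makes it explicit that all 34 SCK cycles must be supplied even though only 28 of them carry data.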

4.5.3 SPI Interface of DAC (LTC2624)


The Spartan-3E Starter Kit board includes an SPI-compatible, four-channel, serial
Digital-to-Analog Converter (DAC). The DAC device is a Linear Technology
LTC2624 quad DAC with 12-bit unsigned resolution. The four outputs from the
DAC appear on the J5 header, which uses the Digilent 6-pin Peripheral Module
format. The DAC and the header are located immediately above the Ethernet

Figure 4.11: Bit pattern to be transfer to ADC from FPGA

Figure 4.12: Timing diagram of ADC SPI communication

RJ-45 connector. The FPGA uses a Serial Peripheral Interface
(SPI) to communicate digital values to each of the four DAC channels. The SPI
bus is a full-duplex, synchronous, character-oriented channel employing a simple
four-wire interface. A bus master (the FPGA in this example) drives the bus clock
signal (SPI_SCK) and transmits serial data (SPI_MOSI) to the selected bus
slave (the DAC in this example). At the same time, the bus slave provides serial
data (SPI_MISO) back to the bus master. The connection of the DAC to the FPGA in
the Spartan 3E starter kit is shown in fig 4.13.
Looking specifically at the LTC2624 D/A converter, each communication consists
of 4 bytes, or 32 bits. Inside the D/A converter, the SPI interface is formed
by a 32-bit shift register. As a new 32-bit command word, made up of command,
address and data fields, is transmitted to it, the 32-bit word previously sent is
echoed back to the master. This response can be ignored when using the D/A
converter; however, it is useful for confirming that correct communication is
taking place. The bit transfer pattern is shown in fig. 4.14.

Figure 4.13: Connection of DAC to FPGA in Spartan 3E starter kit

Figure 4.14: Bit transfer pattern for DAC to FPGA

Each bit is transmitted or received relative to the SCK clock. The system is fully
static and any clock rate up to the maximum of 50 MHz supported by the LTC2624
is possible. All timing parameters in the LTC2624 data sheet should be checked
when working at or close to the maximum speed. The LTC2624 captures data (SDI)
on the rising edge of SCK, so the data needs to be valid for at least 4 ns relative
to the rising edge. The timing diagram of the DAC interface is shown in fig. 4.15.
The LTC2624 changes the output data (SDO) in response to the falling edge of SCK,
allowing the master to read the value at or near the next rising edge. Note that
SDO must be read on the first clock following the enable being asserted
(DAC-CS = 0), otherwise bit 31 will be missed.

Figure 4.15: Timing diagram of DAC communication with FPGA

In theory the SPI interface allows command words to be transmitted at a rate
slightly higher than 1.5 M-words/second (50 MHz / 32 bits per word is about
1.56 M-words/second). Even if this is used to set all four channels individually,
this rate would exceed the conversion rate actually supported by the D/A converter,
so some spacing between commands is necessary.
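
As an illustration of the 32-bit command word, the following Python sketch packs the command, address and data fields and lists the bits MSB first, in the order in which they would be shifted out on SPI_MOSI. The field positions (8 don't-care bits, 4-bit command, 4-bit address, 12-bit data, 4 don't-care bits) and the example command value 0b0011 are assumptions based on a reading of the LTC2624 data sheet and are not stated in this text; the function names are hypothetical.

```python
def ltc2624_word(command: int, address: int, data: int) -> int:
    """Pack a 32-bit word (assumed layout: 8 x | 4 cmd | 4 addr | 12 data | 4 x)."""
    if not (0 <= command < 16 and 0 <= address < 16 and 0 <= data < 4096):
        raise ValueError("field out of range")
    return (command << 20) | (address << 16) | (data << 4)


def msb_first_bits(word: int, width: int = 32) -> list:
    """Return the bits of the word in transmission order, bit 31 first."""
    return [(word >> i) & 1 for i in range(width - 1, -1, -1)]


def max_word_rate(sck_hz: float, bits_per_word: int = 32) -> float:
    """Upper bound on command words per second for a given SCK rate."""
    return sck_hz / bits_per_word


# Example: assumed command 0b0011, channel address 0, full-scale 12-bit data.
word = ltc2624_word(0b0011, 0, 0xFFF)
bits = msb_first_bits(word)
rate = max_word_rate(50e6)
```

With SCK at 50 MHz, max_word_rate gives 1 562 500 words per second, which agrees with the "slightly higher than 1.5 M-words/second" figure quoted above.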

4.6 Configuring the Spartan 3E FPGA through the on-board Platform Flash PROM


The Spartan 3E does not contain on-chip nonvolatile program memory, so at every
power-up the FPGA must be configured from an external memory source.
The Spartan-3E Starter Kit board supports a variety of FPGA configuration
options:

Download FPGA designs directly to the Spartan-3E FPGA via JTAG, using
the onboard USB interface. The on-board USB-JTAG logic also provides in-
system programming for the on-board Platform Flash PROM and the Xilinx
XC2C64A CPLD. SPI serial Flash and StrataFlash programming are per-
formed separately.

Program the on-board 4 Mbit Xilinx XCF04S serial Platform Flash PROM,
then configure the FPGA from the image stored in the Platform Flash PROM
using Master Serial mode.

Program the on-board 16 Mbit ST Microelectronics SPI serial Flash PROM,
then configure the FPGA from the image stored in the SPI serial Flash PROM
using SPI mode.

Program the on-board 128 Mbit Intel StrataFlash parallel NOR Flash PROM,
then configure the FPGA from the image stored in the Flash PROM using
BPI Up or BPI Down configuration modes. Further, an FPGA application
can dynamically load two different FPGA configurations using the
Spartan-3E FPGA's MultiBoot mode.

Every FPGA has mode selection pins, M0, M1 and M2, which select the
configuration mode. These pins act as mode selection pins during configuration;
in run mode they can be used as general-purpose I/O pins. The Spartan 3E starter
kit contains an on-board Xilinx Platform Flash PROM (XCF04S). I have configured
the FPGA through this memory, so there is no need to configure the FPGA manually
through the JTAG port at every power-up; the bitstream is downloaded into the
FPGA automatically.

4.7 Hardware in the loop


Hardware-in-the-Loop (HIL) simulation is a technique that is used increasingly in
the development and test of complex real-time embedded systems. The purpose of
HIL simulation is to provide an effective platform for developing and testing real-
time embedded systems. HIL simulation provides an effective platform by adding
the complexity of the plant under control to the test platform. The complexity of the
plant under control is included in test and development by adding a mathematical
representation of all related dynamic systems. FPGA hardware can be included in
simulations controlled by the MATLAB language and its Simulink design tools. At
the push of a button, the Xilinx System Generator for DSP software tool produces
an implementation of model that is ready to run in hardware. System Generator
uses a special co-simulation block to control the design hardware during Simulink
simulations. The co-simulation block looks just like traditional System Generator
blocks. Its ports have names and types that match the ports on the original System
Generator subsystem. The co-simulation flow for this architecture is shown in
fig. 4.16. The simulation behavior of the co-simulation block is bit and
cycle-accurate when compared to the behavior of the original subsystem.

Figure 4.16: Hardware Co-Simulation

Software Implementation

The software tools used for implementing the filter in the FPGA are listed below:

ISE WebPACK Software: ISE WebPACK is a downloadable solution for FPGA and CPLD
design, offering HDL synthesis and simulation, implementation, device fitting,
and JTAG programming. It provides the features and functionality of the ISE
Foundation design tools, with the same easy-to-use design environment, at no
cost. It includes Core Generator and the Constraints Editor, which is used to
define the I/O pins and their state in the different states of the FPGA.

System Generator: This tool is the industry's leading high-level tool for
designing high-performance DSP systems using FPGAs. It can be used to:

Develop highly parallel systems with the industry's most advanced FPGAs
Provide system modeling and automatic code generation from Simulink
and MATLAB (The MathWorks, Inc.)
Integrate the RTL, embedded, IP, MATLAB and hardware components
of a DSP system

It is a key component of the Xilinx XtremeDSP Tools Package and the XtremeDSP
Development and Starter Kits.

Core Generator: Core Generator is included in ISE WebPACK. It provides IP cores
for many ready-made functions; the user only has to select the core required by
the application. The cores cannot be modified and must be used as they are.

MATLAB: MATLAB is needed because the System Generator toolbox is im-
plemented in Simulink.

4.8 Fixed Point MAC


The FPGA works with fixed-point arithmetic. I have developed a fixed-point MAC
that works in parallel. A DSP processor generally has only one MAC unit, which
works serially, so the time needed to execute the mathematical operations is
longer. If multiple MAC units work in parallel, the speed increases considerably.
I have implemented multiple MAC units that work in parallel, with each MAC unit
performing a different number of operations. The resulting waveforms are discussed
in the next chapter. The block diagram of the fixed-point MAC is shown in fig.
4.17.

Figure 4.17: Block Diagram of Fix point MAC

The timing diagram for this MAC unit with 4 operations is shown in fig 4.18.
Here one value of A and one value of B are fed in on every clock cycle. For a
7-operation MAC, the time needed from First Data to Ready, i.e. to the end of the
MAC operation, is 9 clock cycles; similarly, for a 6-operation MAC the time needed
is 8 clock cycles. These waveforms are checked with ModelSim.
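
The behaviour of one fixed-point MAC unit can be sketched in Python as follows. This is only a software model under stated assumptions: the Q-format scaling with a chosen number of fraction bits is a generic fixed-point convention, the helper names are hypothetical, and the cycle-count model (number of operations plus 2) is inferred from the two figures quoted above (7 operations in 9 cycles, 6 operations in 8 cycles) rather than taken from the hardware description.

```python
def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize a real value to a signed fixed-point integer."""
    return int(round(x * (1 << frac_bits)))


def mac_fixed(a, b, frac_bits: int) -> float:
    """Multiply-accumulate two sequences using integer arithmetic only."""
    acc = 0
    for ai, bi in zip(a, b):
        acc += to_fixed(ai, frac_bits) * to_fixed(bi, frac_bits)
    # Each product carries 2 * frac_bits fraction bits.
    return acc / (1 << (2 * frac_bits))


def mac_cycles(n_ops: int, overhead: int = 2) -> int:
    """Assumed latency model: one operand pair per clock plus fixed overhead."""
    return n_ops + overhead


def parallel_cycles(op_counts) -> int:
    """MAC units running in parallel finish when the longest one finishes."""
    return max(mac_cycles(n) for n in op_counts)


result = mac_fixed([0.5, 0.25], [0.5, 2.0], frac_bits=8)
```

Under this model, two units with 6 and 7 operations running in parallel finish together in 9 clock cycles, which is the time of the longer unit alone.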

Figure 4.18: Timing Diagram of fix point MAC

4.9 Floating Point MAC


Today many DSP processors with a floating-point MAC are available from Texas
Instruments, Motorola and many other chip manufacturers. I have developed a
floating-point MAC and implemented circular convolution using this MAC. The block
diagram of the floating-point circular convolution using the MAC that I have
developed is given in fig 4.19.
The basic parts of the floating-point circular convolution are fix to floating
point conversion, floating-point addition, floating-point multiplication,
register, RAM and control signals.

Fix to floating point conversion: I have developed the whole algorithm in
System Generator. The FPGA works only on fixed-point data, so when
double-precision floating-point data is applied from MATLAB, it is first
converted into fixed point by the Gateway In block, for which the number of bits
of the integer part and of the fraction part must be selected. I have used Core
Generator's floating-point operation IP for the floating-point operations and
then included it in System Generator through a black box.

Floating point addition and multiplication: Addition and multiplication are
the basic parts of the multiply-accumulator. In the fixed-point MAC unit the
accumulator works as the adder, but in the floating-point design I use a
floating-point multiplier and an external register, in place of an accumulator,
to store the MAC value. I use the Core Generator floating-point operation block
for these operations with a latency of 2 clock cycles. The output is therefore
available 2 clock cycles after the input data is applied, and a done bit is set
for handshaking purposes. The control signal controls this handshaking. The
results of the individual addition and multiplication blocks are given in the
results.

Figure 4.19: Block Diagram of Floating point MAC

Single Port RAM: I have used RAM to store the data coming from the fix to
floating point conversion. The data is stored first, and then the
multiplications start. When the first multiplication completes, the first
addition starts; while the first addition is in progress, the second
multiplication is also in progress. I use the internal RAM of the FPGA for this
pipelining. Integers or arrays could also be used to store these values, but
they are built from logic gates, which is wasteful when RAM is already available
in the FPGA. So I have used internal RAM. With a single RAM it is impossible to
read the values of A and B in the same clock cycle, so I use two separate RAMs
for A and B. Reading A and B simultaneously saves a total of 8x8 = 64 clock
cycles. A dual port RAM, in which all signals are separate for A and B, could
also be used, but the resource usage of two single port RAMs and one dual port
RAM is the same, so I use two single port RAMs instead of one dual port RAM.

Control signal: A control signal is needed to control the operation of the RAM,
the floating-point adder and the multiplier. First, as the data is converted
from fixed to floating point, it is written into the RAM, i.e. the RAM is in
write mode. When the conversion is complete the RAM switches to read mode, i.e.
data is read from the RAM. The control signals then sequence the adder and the
multiplier, i.e. they take care of the pipelining.
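
The computation carried out by the parts listed above can be sketched in software as follows. This is a plain Python reference model of N-point circular convolution built from multiply-accumulate steps, not the System Generator design itself; the function name is hypothetical, and the two input lists stand in for the A and B RAMs.

```python
def circular_convolution(a, b):
    """N-point circular convolution using one multiply-accumulate per step."""
    n = len(a)
    if len(b) != n:
        raise ValueError("both inputs must have the same length")
    out = []
    mac_ops = 0
    for i in range(n):
        acc = 0.0  # register reset before each output sample
        for k in range(n):
            # One read from each RAM, one multiply and one add.
            acc += a[k] * b[(i - k) % n]
            mac_ops += 1
        out.append(acc)
    return out, mac_ops


# Convolving with a one-sample delay rotates the sequence by one position.
y, ops = circular_convolution([1, 2, 3, 4], [0, 1, 0, 0])
_, ops8 = circular_convolution([0.0] * 8, [0.0] * 8)
```

For N = 8 the model performs 8 x 8 = 64 multiply-accumulate steps, which matches the number of simultaneous A and B reads mentioned for the two-RAM arrangement.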

I have implemented circular convolution using the floating-point MAC. Its Simulink
diagram is given in fig. 4.20, where only the blocks of the major operations are
shown; the sub-blocks inside them are not shown.

Figure 4.20: Simulink diagram for floating point MAC

In the diagram above every operation block is marked with an arrow. The blue
parts are functional blocks that perform some operation, while the yellow parts
are used only for input/output or for converting the data type. The timing
diagram of the 8-operation circular MAC for the above program is shown in
fig 4.21.

Figure 4.21: Timing Diagram for floating point MAC

Only the first operation out of 8 is shown here because all 8 signals cannot be
shown in one diagram. The labels on the left side show the operation. The main
signals of this operation are listed below:

1 control/out1: Memory location to be read from A RAM, which stores the
value of A.

1 control/out2: Memory location to be read from B RAM, which stores the
value of B.

A RAM: Value read from RAM, which shows the floating-point representation
of A.

B RAM: Value read from RAM, which shows the floating-point representation
of B.

1mult/out1: Shows the product of the values A and B, which is passed to the
adder.

1 add: Shows the sum of the values coming from the multiplier.

1register: Stores the values coming from the floating-point adder.

1 control: Pulse needed to reset the register and the adder.

4.10 Implementation of ADC and DAC


An FPGA works only with digital signals, so for control purposes it is mandatory
to use an ADC and a DAC with the FPGA. However, when the application requires
low current and little board space, and resources are left unused in the FPGA,
an external ADC and DAC are not affordable. One solution is to build the ADC and
DAC inside the FPGA with a few components external to the FPGA. Delta Sigma ADCs
and DACs are easy to implement in the digital domain. The benefit of a Delta
Sigma converter is that it moves most of the conversion process into digital
form, which makes it possible to combine high-performance analog with digital
processing. The analog part uses a single comparator, an integrator and a 1-bit
DAC. In this case I use a DAC and an integrator that are inside the FPGA and an
LM339 as the comparator.

4.10.1 DAC:
The basic connections of the Delta Sigma DAC are given in the figure below.
A low pass filter consisting of one resistor and one capacitor is connected to the
output pin of the FPGA. The output range of the DAC is 0 to VCCO, where
VCCO is the voltage supplied to the I/O bank. It can be 3.3 V, 2.5 V or 1.2 V. Here
I have selected a VCCO of 3.3 V. The connection of the low pass filter to the FPGA
is shown in fig 4.22.

Figure 4.22: Low pass filter connection with DAC

Here one resistor and one capacitor form a low pass filter, which filters the
PWM-like signal coming from the FPGA and gives a voltage proportional to DACin.
A DAC with any number of bits can be made without much change to the program, so
the resolution can be traded off against resource usage. This kind of flexibility
is not available with an external DAC or ADC. The output of the DAC can be
calculated from the equation given below:

$$V_{OUT} = \frac{DACin}{2^{(MSBI+1)}} \times V_{CCO}\ \text{volts}$$

For example, for an 8-bit DAC (MSBI = 7) the lowest VOUT is 0 V, when DACin is
0. The highest VOUT is 255/256 VCCO volts, when DACin is FF (hexadecimal). In this
type of DAC a lower settling time is obtained by using a higher clock frequency.
For this purpose the internal Delay Locked Loop can be used to supply double or
quadruple the external clock frequency to the DAC block.
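
The output equation and the PWM-like bit stream can be illustrated with a short Python sketch. The modulator shown is a generic first-order accumulator (the carry out of the accumulator is the output bit), used here as an assumption; it is not necessarily the exact structure of the implemented DAC, and the function names are hypothetical.

```python
def dac_vout(dac_in: int, msbi: int, vcco: float = 3.3) -> float:
    """Ideal filtered output voltage: DACin / 2^(MSBI+1) * VCCO."""
    return dac_in / (1 << (msbi + 1)) * vcco


def delta_sigma_bits(dac_in: int, n_bits: int, n_clocks: int) -> list:
    """Generic first-order modulator: output 1 whenever the accumulator overflows."""
    modulus = 1 << n_bits
    if not 0 <= dac_in < modulus:
        raise ValueError("dac_in out of range")
    acc = 0
    bits = []
    for _ in range(n_clocks):
        acc += dac_in
        if acc >= modulus:
            acc -= modulus
            bits.append(1)
        else:
            bits.append(0)
    return bits


# Over 2^N clocks the number of ones equals DACin, so the average of the
# bit stream times VCCO reproduces the output equation.
stream = delta_sigma_bits(64, 8, 256)
average_v = sum(stream) / len(stream) * 3.3
```

For an 8-bit DAC with DACin = 64 the stream contains 64 ones in 256 clocks, so the filtered output is one quarter of VCCO.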
R must be 2.5 kΩ or greater to ensure rail-to-rail switching with an error of 1%
or less. The value of R should be kept low relative to the impedance of the load
so that the change in current through the capacitor due to loading becomes
negligible. The filter time constant (τ = RC) must be high enough to greatly
attenuate the individual pulses in the pulse string. On the other hand,
a high time constant may also attenuate the desired low-frequency output signal.
These potentially conflicting requirements are analyzed separately. The worst-case
peak-to-peak filter noise for an 8-bit DAC can be expressed as follows:

$$PPN_{FS} = \left(1 - e^{-1/(f\tau)}\right) \times \frac{1 - e^{-255/(f\tau)}}{1 - e^{-256/(f\tau)}} \times 256$$

where
$PPN_{FS}$ = peak-to-peak noise expressed as a fraction of step voltage
$f$ = DAC clock frequency
$\tau$ = filter time constant, RC.
The cutoff frequency of the simple RC filter may be expressed as:

$$f_c = \frac{1}{2\pi\tau}$$

where:
$f_c$ = filter cutoff frequency
$\tau$ = filter time constant, RC.
To resolve each DACin sample to the full precision of a Delta-Sigma DAC, the sample
rate, i.e., the rate at which DACin changes, must be less than or equal to
$1/2^{(MSBI+1)}$ of the CLK frequency. In some applications, such as a programmable
voltage source, this is not an issue.
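
The three relations above can be evaluated numerically with the following Python sketch. The R and C values (3 kΩ and 0.01 µF) are the filter values used later in this section; the 50 MHz DAC clock is only an assumed example value, and the function names are hypothetical.

```python
import math


def ppnfs(f_clk: float, tau: float) -> float:
    """Worst-case peak-to-peak noise of an 8-bit DAC as a fraction of one step."""
    x = f_clk * tau
    return ((1 - math.exp(-1 / x))
            * (1 - math.exp(-255 / x)) / (1 - math.exp(-256 / x))
            * 256)


def cutoff_frequency(tau: float) -> float:
    """Cutoff frequency of the simple RC filter, fc = 1 / (2 * pi * tau)."""
    return 1 / (2 * math.pi * tau)


def max_dacin_rate(f_clk: float, msbi: int) -> float:
    """Highest DACin update rate that keeps full precision."""
    return f_clk / (1 << (msbi + 1))


R = 3e3        # ohms
C = 0.01e-6    # farads
tau = R * C    # 30 microseconds
f_clk = 50e6   # assumed example clock, not taken from the text

noise = ppnfs(f_clk, tau)
fc = cutoff_frequency(tau)
rate = max_dacin_rate(f_clk, 7)
```

With these values the filter cutoff is roughly 5.3 kHz and the full-precision update rate of an 8-bit DAC is the clock frequency divided by 256. Increasing τ lowers the noise term but also lowers the cutoff frequency, which is the trade-off described above.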

The DAC generates a PWM-like waveform corresponding to DACin, which is shown in
fig 4.23 for different values of DACin.

(a) PWM generated by DAC

(b) PWM generated by DAC with different input

Figure 4.23: PWM generated by DAC

After adding the low pass filter with values of 3 kΩ and 0.01 µF, the output of
the DAC is as shown in fig 4.24.

4.10.2 ADC
The ADC is built with the help of the DAC. It is a successive approximation type
ADC that takes the output of the DAC as its reference. A simple diagram of the
ADC is shown in fig 4.25. To convert an analog value into digital form, this
Delta Sigma based ADC needs one low pass filter, one comparator and one DAC. The
DAC is implemented in the FPGA as described above. The analog level is determined
by performing a serial binary voltage search, starting at the middle of the
voltage range. For each complete sample, only the upper bit of the DAC input is
initially set, which drives the reference voltage to midrange.

Figure 4.24: Low pass filter output of DAC

Depending on the output of the comparator,
the upper bit is then cleared or it remains set, and the next most significant bit
of the DAC input is set. This process continues for each bit of the DAC input.
The DAC is one bit wider than the ADC output. This is required in order for the
lowest numbered bit of the ADC output to be significant. When all of the bits have
been sampled, the upper bits of the register feeding the DAC are transferred to the
ADC output register. Because of the serial nature of both the DAC and the analog
sampling process, this ADC is useful only on signals that are changing at a fairly
low rate. A typical high-precision application for this ADC would be monitoring a
physical metric, such as ambient air temperature or water pressure. It can also be
used for applications with lower precision at a higher sampling rate, such as utility
voice recording (for example, a telephone answering machine). If the analog input
voltage changes during the sampling process, it effectively causes the sample point
to randomly move. This adds a noise component that becomes larger as the input
frequency increases. For many applications, the strength of the additional noise will
be so low that it will be acceptable. This noise component can be removed with an
external sample and hold circuit for the analog input signal. The ADC sample rate
may be expressed as follows:

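$$ADC_{SR} = \frac{f_{clk}}{2^{(MSBI+1)} \times (Fstm+1) \times (MSBI+1)}\ \text{samples/second}$$

Where:
MSBI = index of the most significant bit of the DAC input, which equals the number of bits of the ADC
Fstm = user-supplied constant loaded into the filter settle time multiplier counter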
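
As a quick numerical check of the sample-rate expression, the following Python sketch evaluates it for one set of values. The 50 MHz clock and Fstm = 0 are assumed example values only, and the function name is hypothetical.

```python
def adc_sample_rate(f_clk: float, msbi: int, fstm: int) -> float:
    """ADC sample rate: f_clk / (2^(MSBI+1) * (Fstm+1) * (MSBI+1))."""
    return f_clk / ((1 << (msbi + 1)) * (fstm + 1) * (msbi + 1))


# Assumed example: 50 MHz clock, MSBI = 7, no extra settle time (Fstm = 0).
fast = adc_sample_rate(50e6, 7, 0)
# A larger Fstm lengthens each bit sample time and lowers the rate.
slow = adc_sample_rate(50e6, 7, 15)
```

Each additional bit of resolution roughly halves the achievable rate, and Fstm scales it down further by a factor of Fstm + 1.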
Figure 4.25: Connection for ADC

The block diagram of the ADC is given in fig. 4.26. In addition to the DAC, the
ADC is comprised of the following major elements:

DAC sample counter: This is a binary up counter that is the same width as
the DAC input. When the DAC input changes, it requires a minimum of one
complete cycle of this counter to resolve the new value at the output.

Filter settle time multiplier counter: In high-precision applications, a large


value for RC is selected for the DAC low pass filter to minimize noise on the
reference voltage. When this down-counter is at zero, it is loaded with Fstm,
a constant provided by the user. Bit sample time is effectively multiplied by
Fstm+1, so the user can configure the bit sample rate to match the filter settle
time characteristics. The width of this counter is configurable at design time
with constant MSBR. A width of four bits is adequate for most applications.

Mask shifter: This register, which is the same width as the DAC input, end-
lessly rotates a single bit right. This is effectively the state machine that
controls the bit sample sequence, implemented so that correct values can eas-
ily be loaded into the Reference shifter.

Reference shifter: This register drives the DAC input. It always starts a sample
with only the upper bit set. When only the upper bit is set, the comparator
output will be true if the analog input is greater than 1/2 VCCO, and it will be
false if the analog input is less than 1/2 VCCO. If the comparator output is
true, the upper bit remains set, and the next lower bit is set. If the comparator
output is false, the upper bit clears, and the next lower bit is set. This process
continues all the way to the LSB, which causes voltage ADCref to home in on
the analog input voltage.

Figure 4.26: Block Diagram of ADC implemented

ADCout register: This register snaps the high order bits of the Reference
shifter when the sample is left justified; that is, the comparator output that
was sensed when only the upper bit of the DAC input was set is in the MSB.
The ADCout register is one bit shorter than the Reference shifter, making the
LSB in ADCout accurate to 1/2 LSB.
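
The binary search performed by the Mask shifter, Reference shifter and ADCout register can be modelled in a few lines of Python. This is a behavioural sketch under simplifying assumptions: the DAC and low pass filter are treated as ideal (the reference settles exactly to code / 2^width x VCCO), the comparator is ideal, and the function name is hypothetical.

```python
def sar_adc(v_in: float, vcco: float, adc_bits: int) -> int:
    """Successive approximation search using a DAC one bit wider than the ADC."""
    dac_bits = adc_bits + 1
    reference = 0
    # The mask starts at the upper bit and moves one position right per step.
    for bit in range(dac_bits - 1, -1, -1):
        trial = reference | (1 << bit)
        v_ref = trial / (1 << dac_bits) * vcco   # ideal, fully settled DAC output
        if v_in > v_ref:                         # comparator output true
            reference = trial                    # bit remains set
        # otherwise the bit is cleared (reference is left unchanged)
    # ADCout takes the high-order bits and drops the extra LSB of the DAC.
    return reference >> 1


low = sar_adc(0.0, 3.3, 8)
mid = sar_adc(1.0, 3.3, 8)
high = sar_adc(3.3, 3.3, 8)
```

The loop runs once per DAC bit, which is why the sample rate expression contains the factor (MSBI + 1) in addition to the DAC settling terms.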
A multi-channel ADC is also implemented for 2 channels. The only additional
requirement is one comparator for each channel; the multiplexer and demultiplexer
are built inside the FPGA. The basic diagram is shown in fig 4.27.

4.11 Implementation of FIR Filter


The MAC-based fixed-point FIR filter is implemented for both the Spartan 3E and
the Spartan 3A. The FIR filter is built around the MAC unit. The block diagram of
the FIR filter is shown in fig 4.28.
Any kind of FIR filter can be implemented with this design. The user can select
the sampling frequency, the number of bits of the data and of the coefficients,
and the number of fraction bits.

Figure 4.27: Multichannel ADC

There are two inputs to the FIR equation: the sampled data and the coefficients.
In this design the data is sampled as specified by the user and the coefficients
are generated by FDA Tool. The GUI of the filter is shown in fig. 4.29; the user
has to enter values in the fields listed below. The basic blocks of the block
diagram are described after the GUI fields.

Filter Coefficients: If the FDA Tool is used, the user has to write
xlfda_numerator(FDA Tool) in this field; otherwise the values of the filter
coefficients have to be entered manually.

Coefficient Width: The width of the coefficients.

Coefficient Binary Point: The binary point position of the coefficients.

Data Width: The width of the data.

Data Binary Point: The binary point position of the data.

Sampling Frequency (Hz): The sampling frequency in Hertz. It can differ from
the one used in the FDA Tool.

The data width must be smaller than the coefficient width; otherwise the design
reports an error.

Figure 4.28: Block Diagram of FIR filter

Memory Control and Address Generator: This block provides the address and the
read or write command to the memory block. When the WN bit of the memory is 1,
the RAM is in write mode, i.e. the data is stored in the memory location given
by the address port. When the WN bit is 0, the RAM is in read mode, i.e. the
output of the RAM is the data in the memory location given by the address port.
This bit is controlled by this block. I use two separate counters, one for the
data and one for the coefficients. The coefficient counter starts from the length
of the coefficient set. If the order of the filter is 43, the counter starts from
43 and stops at location 2 x 43 - 1 = 85. The output of the counter is compared
with 2 x 43 - 1 = 85; when the counter reaches that value the comparator output
goes high, and this bit is used to control the data counter and the reset of the
register. A counter value of 2 x 43 - 1 = 85 therefore indicates that the
register is reset and the next operation starts. The connection of the Memory
Control and Address Generator is shown in fig. 4.30.
In the same way, the data counter starts from 0 and ends at 43 - 1 = 42.
The delay blocks are used for pipelining.

Figure 4.29: GUI for FIR filter

Upsampling and Concatenation: The signal is sampled as per the user
specifications. Upsampling is used to increase the sample rate of the input,
which is then stored and passed to the concatenation. Concatenation is used to
add the difference in bits between the data width and the coefficient width. This
is needed because RAM A and RAM B of the dual port memory cannot be given
different widths. If the data width is 10 and the coefficient width is 12, then
12 - 10 = 2 bits have to be added to the data to make it equal to the coefficient
width. Interpretation and Reinterpretation blocks are used to match the binary
points of the data and the coefficients. The connection for upsampling and
concatenation is shown in fig. 4.31.

Figure 4.30: Memory Control and Address Generator

Figure 4.31: Upsampling and Concatenation

RAM: The RAM is used to store the data and the coefficients of the filter.
I use a dual port memory so that the data and the coefficients can be accessed at
the same time. With a single-port block RAM, two separate clock cycles would be
needed to access the data and the coefficient, whereas here both are accessed in
a single clock cycle. The dual port RAM is divided into two separate RAMs with
separate control lines and outputs. I use the RAM as a ROM for the coefficients,
because once the coefficients are written they remain unchanged and do not need
to be written again. The data line and the selection line of block B are
therefore both tied to 0, i.e. block B of the RAM is always in read mode. The
coefficients are written into RAM B initially, so nothing needs to be written to
RAM B afterwards; only the address changes with time. The length of the RAM
depends on the FIR order, so the length of the RAM must change with the order of
the FIR filter. Here it changes as the number of taps is selected in the FDA
Tool. Initially the locations from 1 to the length of the filter are zero, and
the coefficients are stored from the next location up to 2 x (length of filter).

Multiplier: The multiplier is used to multiply the data and the coefficients. The
connection of the multiplier, accumulator and register is shown in fig. 4.32.

Figure 4.32: MAC unit connection

Accumulator: The accumulator adds the products of the data and the coefficients.
It keeps adding until the reset is asserted. When the coefficient counter reaches
its final value, it asserts the reset bit of the accumulator; the accumulator is
reset to zero and the next operation starts.

Register: The output of the accumulator is stored in the register. The register
is reset in the same way as the accumulator. After the register there are a
downsample block and a convert block. Upsampling was used earlier to make the
system faster, so downsampling has to be used to return to the original sample
rate. The output is therefore downsampled and then passed out of the filter.

This filter design is compared with the filter that is developed with the same
FDA Tool and is readily available in the Signal Processing toolbox of Simulink.
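
The data flow through the memory, multiplier, accumulator and register described above can be summarised by a small Python model. It is a behavioural sketch only: a single list stands in for the dual port RAM, with the data region in the first half and the coefficient region in the second half, the data region is shifted on every new sample for simplicity (the hardware uses address counters instead), and the function name is hypothetical.

```python
def fir_mac(samples, coeffs):
    """MAC-based FIR filter: one accumulate pass over all taps per input sample."""
    n = len(coeffs)
    # Locations 0..n-1 hold data (initially zero), n..2n-1 hold coefficients.
    mem = [0.0] * n + list(coeffs)
    out = []
    for x in samples:
        # Store the new sample; older samples move one location along.
        mem[1:n] = mem[0:n - 1]
        mem[0] = x
        acc = 0.0  # accumulator reset at the start of each pass
        for k in range(n):
            # Data and coefficient are read in the same step (dual port access).
            acc += mem[k] * mem[n + k]
        out.append(acc)  # register captures the final accumulator value
    return out


impulse_response = fir_mac([1, 0, 0, 0], [0.5, 0.25, 0.25])
step_response = fir_mac([1, 1, 1, 1], [0.5, 0.5])
```

Feeding a unit impulse returns the coefficients themselves, which is a convenient first check when comparing against a reference filter.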

Chapter 5

Results

The hardware and software implementation is described in chapter 4. In this
chapter the results and comparisons are presented.

5.1 Fixed Point MAC


The timing diagram for the multiple fixed-point MAC is shown in fig 4.18.
I have implemented two MAC units that work in parallel, each with a different
number of operations: one MAC unit has 6 operations and the other has 7. The
total time needed to complete both MAC operations is 9 clock cycles. With only
the 6-operation MAC, the time would be 8 clock cycles.

5.2 Floating Point MAC


The timing diagram of the floating-point MAC is shown in fig 4.21. The resources
used to implement this 8-point circular convolution are given below:

Slices: 1057

FFs: 407

LUTs: 1949

MULT 18x18: 4

Block RAM: 4

The total number of clock cycles needed to complete this 8-point circular
convolution is 302. The same 8-point circular convolution takes 4075 clock cycles
on the TMS320C6713, so my design is 4075/302 = 13.49 times faster than the DSP
processor in terms of clock cycles. The speed can be increased further by using
multiple MAC units that work in parallel.

5.3 DAC
The PWM-like signals from the DAC are shown in fig 4.23a and 4.23b. The generated
signal is not a proper PWM signal: its frequency and amplitude also change with
the input. The output of the low pass filter is shown in fig. 4.24.
The selection guidelines for the resistor and capacitor are given in section
4.10.1. The settling period for this DAC is 4 Seconds. The settling time of the
SPI DAC is 3 Seconds for a half-step change. The resource usage of both DAC
implementations is given in table 5.1, and the bar chart of the resource usage is
shown in fig. 5.1.

Resources        DAC          SPI DAC      Resources Available

FF               31           10           9,312
LUT              33           9            9,312
Slices           21           5            4,656
Gate Count       473          196          500,000
Settling Time    4 Seconds    3 Seconds

Table 5.1: Resources used for DAC

Figure 5.1: Resource usage for DAC

The DAC0808 has also been interfaced with the Spartan 3E starter kit. The
comparison of inputs to outputs for all three DACs is shown in table 5.2. Note
that the SPI DAC is a 12-bit DAC while the DAC0808 is an 8-bit DAC. The DAC that
I have implemented is also an 8-bit DAC, but its number of bits can easily be
changed.

% input    O/P of SPI DAC    O/P of DAC    O/P of DAC0808    Counted O/P
10         0.34              0.33          0.33              0.33032
20         0.65              0.65          0.66              0.6599
30         0.98              0.99          1.00              0.9963
40         1.30              1.31          1.32              1.3198
50         1.65              1.65          1.66              1.65
60         2.0               1.99          2.0               1.9927
70         2.30              2.30          2.32              2.30
80         2.65              2.64          2.66              2.65
90         2.96              2.97          3.02              2.97
100        3.29              3.3           3.3               3.3

Table 5.2: Comparison of DAC

5.4 FIR filter results


The designed FIR filter is implemented for different cutoff frequencies, sampling
frequencies, and data and coefficient bit widths. It is implemented as lowpass,
highpass, bandpass and bandstop filters for different frequencies. All filters
are checked with different sampling frequencies and different pass and stop
frequencies, and the design behaves like the block readily available in the
Signal Processing toolbox of Simulink. The FFT response of both designs is given
in fig. 5.2a for a lowpass filter with a sampling frequency of 500 Hz and a
cutoff frequency of 50 Hz, where the input is the sum of two sine waves of 30 Hz
and 65 Hz. The time-domain output of the FIR filter is shown in fig. 5.2b, and
the error between the FPGA output and the MATLAB output is shown in fig ??.
The magnitude responses of the FIR filter as a lowpass filter (5.3a), highpass
filter (5.3b), bandpass filter (5.3c) and bandstop filter (5.3d) are given in
fig. 5.3.

(a) Comparison of filter in FPGA vs. Simulink (b) FIR filter output

Figure 5.2: FIR filter frequency and time response

(a) Lowpass filter (b) Highpass filter

(c) Bandpass filter (d) Bandstop filter

Figure 5.3: Magnitude response of FIR filter

The resource usage for the design described above is as follows:

Slices: 113

FFs: 181

BRAMs: 1

LUTs: 107

18 x 18 multipliers: 1

Chapter 6

Case Study: Signal Conditioning for Magnetostrictive Level Transmitter

The Magnetostrictive Level Transmitter is a two wire level transmitter under
development by SBEM Pvt Ltd., Pune, a leading manufacturer of all kinds of level
and flow transmitters. The probe is developed by them, and my part is to develop
the signal conditioning circuit for that probe. The working of the probe is
described below.

6.1 Working Principle


This transmitter is used for continuous, remote liquid level measurement and based
on position monitoring of a magnetic float following the magnetostrictive principle.
The probe and the float design is shown in fig. 6.1

Figure 6.1: Probe and float design of Magnetostrictive level transmitter

The measuring process is initiated by a current impulse. This current generates


an axial magnetic field (3) along the length of a wire (1) made of magnetostrictive
material, which is held under tension inside the guide tube. The float, which sits on
the liquid surface, is fitted with permanent magnets (4). When the pulse reaches the
float the two magnetic fields interact and a torsional force results. A torsional stress
wave (5) is induced in the wire. A piezoceramic converter in the transmitter housing
(2) at the end of the wire converts this into an electrical signal. By measuring the
elapsed transit time, it is possible to determine the start point of the torsional stress
wave and therefore the float position with a high degree of accuracy.

6.2 Features,Specifications
6.2.1 Features
Accurate Level, Interface, temperature measurement

Suitable for small capacity Petroleum Product Tanks

High Reliability and Extended Life


Level Resolution: 1 mm

No maintenance

Flameproof and weatherproof enclosure

Intrinsically safe probe

Cable entry 1/2 137

Terminals suitable for 1.5 sq.mm

6.2.2 Specifications
For Level Measurement

Accuracy : 1mm

Resolution : 1mm

Hysteresis : 0.5mm

Linearity : 1mm

Pressure : Atmospheric

Operating Temp. : 25to + 100 C

Measuring Range : Up to 5000 mm

Density Range : 0.7 to 1.2 gm/cc

For Temperature Measurement

Sensor : Pt 100

Range :0 100 C

Accuracy : 0.5 C

Resolution : 0.1 C

Power Supply : 24 V DC supply given from Petroquant

6.3 Challenges and Solutions
Challenges
As described in the Working Principle, detecting the pulse reflected from the
float is essential for detecting the level of the liquid. This waveform is shown
in fig 6.2.

(a) Reflected wave from float when the float is very low

(b) Reflected wave from float when the float is high

Figure 6.2: Reflected wave from float when float at different positions

The reflected wave is the positive half of a sine wave and its amplitude depends on
the position of the float. When the float is at the lowest level the amplitude is 2 V,
and when the float is at the highest level the amplitude is 5 V. The challenges
in designing the signal conditioning are:

To detect the peak of the positive sine pulse.

In the worst case the interval between two pulses can be 1.5 seconds.

It is a two-wire transmitter, so the current available for it is very small. In the
worst case the available current may be only 4 mA, and of this only 2.5 to 3 mA
is available for signal conditioning because the rest is used for
other purposes.

For 1 mm accuracy, the required accuracy in terms of time is 30 ns.

Solutions
For detecting the peak of the pulse I am going to use a comparator. One input of the
comparator will be the signal coming from the probe and the other input will be the output
of a DAC, which will be controlled by a processor. Once the comparator
detects the pulse, the DAC input is held steady and lead and lag pulses
are generated, which convert the sine pulse into a square pulse. The peak of the pulse lies
midway between the lead and lag pulses. In this way the peak of the sine pulse is
detected. This procedure is shown in fig. 6.3.

Figure 6.3: Detecting the peak of sine pulse
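
The midpoint idea described above can be illustrated with a small software model. This is a sketch only, not the hardware design: it replaces the DAC-controlled comparator with a fixed threshold applied to sampled data, and all names are hypothetical. The lead edge is the first sample above the threshold, the lag edge is the last, and the peak is taken as the midpoint of the two.

```python
import math


def half_sine_pulse(n_samples: int, amplitude: float) -> list:
    """Positive half of a sine wave, sampled at n_samples points."""
    return [amplitude * math.sin(math.pi * i / (n_samples - 1))
            for i in range(n_samples)]


def peak_index_from_crossings(samples, threshold):
    """Estimate the peak position as the midpoint of the comparator pulse.

    The comparator output is modelled as 'sample > threshold'. Returns None
    when the pulse never exceeds the threshold.
    """
    above = [i for i, value in enumerate(samples) if value > threshold]
    if not above:
        return None
    lead, lag = above[0], above[-1]   # rising and falling edges of the square pulse
    return (lead + lag) / 2


if __name__ == "__main__":
    # 2 V and 5 V are the amplitudes quoted in the text for a low and a high float.
    for amplitude in (2.0, 5.0):
        pulse = half_sine_pulse(101, amplitude)
        print(amplitude, peak_index_from_crossings(pulse, threshold=1.0))
```

Because the pulse is symmetric, the midpoint does not move when the amplitude changes, which is why this approach tolerates the 2 V to 5 V variation with float position.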

For this purpose the settling time of the DAC must be less than 5 µs. The
onboard DAC of the Spartan 3E starter kit, which I have described in previous sections,
is too slow, with a full-step settling time of 7 µs, and its current consumption
of 4.7 mA is more than the available current. So there is no way to
use an external DAC for this purpose. The one solution that I have found is to use
the DAC implemented inside the FPGA, which I have described in the previous sections.
The connection of this DAC with the comparator for pulse peak detection is shown in fig. 6.4.
The biggest challenge is to detect the peak with 30 ns accuracy. The circuit
for everything other than pulse detection is already available with them; it includes the LCD,
HART and Profibus protocols, keypad and other functions, and works on the I2C protocol. They
have implemented all of these in a PIC24F series microcontroller, which is a 16-bit
controller. But no PIC is available with the required speed: for this accuracy
the processor speed must be greater than 40 MHz. So I proposed to use an FPGA. Normally
an FPGA does not have non-volatile PROM inside it, so it must be configured at every
power-up. This requires an external flash PROM, which increases the current
requirement and also takes nearly 100 ms to configure the FPGA, which is very high.
So I proposed to use the Spartan 3AN series FPGA, which includes on-chip non-volatile
flash PROM and has a configuration time of about 100 µs. The FPGA I selected is the
XC3S50AN.

Figure 6.4: Pulse Detection setup

However, when the FPGA is continuously ON, the current consumed by the FPGA
alone is much higher than the current available for use. So I have decided to use the
Hibernate mode of the FPGA, in which I can save 40% of the current, which will meet my
specifications. After detecting the pulse peak I will count the number of clock
cycles after which the peak is detected, which is directly proportional to the level, i.e. the
position of the float. After detecting all 4 pulses I will transmit these counts to the PIC,
and the PIC will process them according to its algorithm. This design is developed in ISE and
System Generator.
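
The arithmetic behind the clock requirement and the count-to-level conversion can be sketched as follows. The 30 ns per mm figure is the one stated in the challenges above, and the 50 MHz clock is the oscillator named in the board design; the function names are hypothetical, and the way the PIC combines the four counts is not specified in this report, so it is not modelled here.

```python
NS_PER_MM = 30.0          # timing equivalent of 1 mm, as stated in the text
CLOCK_HZ = 50_000_000     # 50 MHz board oscillator


def min_clock_hz(ns_per_mm: float = NS_PER_MM) -> float:
    """Lowest counter clock whose period does not exceed the time for 1 mm."""
    return 1e9 / ns_per_mm


def counts_to_level_mm(counts: int,
                       clock_hz: float = CLOCK_HZ,
                       ns_per_mm: float = NS_PER_MM) -> float:
    """Convert a clock-cycle count into a distance in mm (proportional model)."""
    elapsed_ns = counts * 1e9 / clock_hz
    return elapsed_ns / ns_per_mm


if __name__ == "__main__":
    print(min_clock_hz() / 1e6)        # minimum clock in MHz for 1 mm steps
    print(counts_to_level_mm(1500))    # level for 1500 counted cycles
```

Under these assumptions a clock of about 33.3 MHz is the minimum, which is consistent with the requirement of more than 40 MHz, and one period of the 50 MHz clock (20 ns) corresponds to less than 1 mm.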

6.4 Board Design
The FPGA I used is the XC3S50AN, which contains 50K system gates. The specifications of
the XC3S50AN are given below.

Equivalent logic cells: 1,584

Slices: 704

Distributed RAM: 11 Kbits

Block RAM: 54 Kbits

Multipliers: 3

DCMs: 2

Maximum user I/O: 108

Bitstream size: 427K

In-system flash bits: 1M

This device is available in the TQG144 package, which has 144 pins. The circuit
diagram is shown in fig. 6.5. The power regulator used is the LP3906 from National
Semiconductor. The LP3906 is a multi-function, programmable Power Management
Unit, optimized for low power FPGAs, microprocessors and DSPs. This device
integrates two highly efficient 1.5 A step-down DC/DC converters with dynamic voltage
management (DVM), two 300 mA linear regulators and a 400 kHz I2C compatible
interface to allow a host controller access to the internal control registers of the
LP3906. National Power Expert for Xilinx is a software tool available from National
which helps to select the voltage regulator for an application depending on its
voltage and current requirements. WEBENCH is an online design tool which
helps to design the whole power supply. The LP3906 is available in a 24-pin LLP
package. The application diagram of the regulator is shown in fig. 6.6.
The input to this regulator is 5 V DC. The specifications of the voltage regulator
are given below.

1.5 A output current

Programmable Vout

Buck1: 0.8 V to 2.0 V

Buck2: 1.0 V to 3.5 V

Up to 96% efficiency

2 MHz PWM switching frequency

3% output accuracy

Automatic soft start

Figure 6.5: Board Schematics

Figure 6.6: Application diagram of voltage regulator

The oscillator used is the SG8002JF, which is a 50 MHz oscillator from Epson. It is a
programmable high frequency crystal oscillator. Its output frequency
can be from 1 MHz to 125 MHz. The output of the oscillator is connected to GCLK0 of
the FPGA, which is a global clock input for the FPGA. The board will be configured
the first time through the JTAG parallel cable, as explained in chapter 5. After that the
FPGA will be configured from the internal flash PROM every time. Connectors are
provided for connecting the FPGA with the other peripherals. The DONE, PROG_B
and SUSPEND pins are also connected to connectors so that they can be used in different
modes like the Hibernate and the Suspend mode. Entering or leaving the Hibernate mode
or the Suspend mode will be controlled by the PIC. The counted
values of the four peaks will be transmitted serially from the FPGA to the PIC.

Chapter 7

Conclusion

The range of floating point numbers is larger than that of fixed point numbers for the
same word length. Floating point operations are more difficult to implement and
take more resources and time than fixed point operations, but it
is still advantageous to implement a floating point number system because of its wider range
and better accuracy than fixed point. I have implemented multiple MAC units which
work in parallel, which makes the system faster. Two or more operations can
be done in parallel in this design. A floating point MAC unit is also implemented, which
is used to implement the 8 operation circular convolution. This design is faster than
other architectures like DA, RNS and Delay Addition. It takes 302 clock cycles to
complete this operation, whereas the same design takes 4075 clock cycles on the TMS320C6317.
The FIR filter is implemented using the MAC unit and dual port RAM
for different sampling frequencies and different cutoff frequencies, and it works
well. One block is developed in which the user can enter all the characteristics of the filter.
This design is also checked with a sampling frequency of 44.1 kHz, which is used
in audio processing, mobile phones etc. Sigma Delta type
ADC and DAC are implemented inside the FPGA with a few analog components outside
the FPGA. A 24-bit ADC and a 25-bit DAC are developed and
their performance is compared with the onboard ADC and DAC available on the
Spartan 3E starter kit. The settling time of the DAC is 4 µs and the fastest
sampling time of the ADC is 40 µs. These can be improved further with the help of the DCM
(Digital Clock Manager) available inside the FPGA. The performance of the whole design,
including the floating point MAC, FIR filter, ADC and DAC, in terms
of speed, resource usage and power can be improved further by using a Virtex family FPGA.
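
As a reminder of what the MAC unit computes in the circular convolution mentioned above, a minimal software reference model is sketched below. It is not the hardware implementation described in this report: each inner-loop step is one multiply-accumulate, the function name is hypothetical, and no attempt is made to model the parallel MAC units or the clock-cycle counts.

```python
def circular_convolution(x, h):
    """N-point circular convolution built from multiply-accumulate steps."""
    n = len(x)
    if len(h) != n:
        raise ValueError("both sequences must have the same length")
    y = []
    for k in range(n):
        acc = 0.0                           # accumulator of the MAC
        for m in range(n):
            acc += x[m] * h[(k - m) % n]    # one multiply-accumulate operation
        y.append(acc)
    return y


if __name__ == "__main__":
    # Convolving with a unit impulse delayed by one sample rotates the input.
    print(circular_convolution([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 0.0]))
```

A reference model of this kind is useful mainly for checking the output of a hardware MAC design against known results.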

Chapter 8

Future Scope

This design is implemented in Spartan 3E and Spartan 3A FPGAs. These
FPGAs do not contain DSP48 blocks, which provide a MAC inside the FPGA. MAC
units are readily available in the DSP48 block, but I have developed my own MAC unit instead
of using DSP48. By using the Virtex family the design will perform faster with less
usage of resources. The design of the ADC and DAC can be used as it is for audio
applications. By adding more channels we can sample many signals with only
one ADC or DAC. If a faster sampling time than the one available is needed, multiple
ADCs and DACs can be put in a single FPGA, depending on the resources available and
the need. This filter design can be used in applications like developing a
control algorithm in which there is a need to filter the signal coming from the
plant, such as vibration signals, signals from a thermocouple etc. The only need is to change
the coefficients and the number of taps of the filter. The floating point MAC unit can
be used for floating point matrix operations like matrix multiplication, inversion,
addition etc. These designs can be optimized further by using different architectures.

Acknowledgements

I want to thank all those who directly or indirectly made my project a great
learning experience, showing me the values and imparting the skills and hard work
required for the project. To make any work successful, along with hard work and
sincere effort, proper guidance and support are essential. I would like
to express my sincere gratitude to those respected personalities who generously
helped me to complete this dissertation; simple thanks would not
suffice.
I take this opportunity to thank my project guide, Prof. D. N. Sonawane, who
provided me all kinds of support at any time, at any stage of the project. His
valuable suggestions and helping hand will always be remembered in the future
path of my career. I would like to thank Prof. S. D. Agashe, Head, Department of
Instrumentation and Control, and Prof. D. N. Sonawane, who allowed me to do this
project. I would also like to thank Mr. Bedarkar, Managing Director of SBEM Pvt.
Ltd., who found me able to do the project and helped me to develop the FPGA board.
I would like to thank all the professors and staff members who directly or indirectly
helped me.
