M. TECH PROJECT
MASTER OF TECHNOLOGY
IN
Krunal H. Bhavsar
M0629P03
(2007-08)
ACCEPTANCE CERTIFICATE
Head
Project Guide
Department of
(Prof. D. N. Sonawane)
Instrumentation and Control
Date: Date:
DISSERTATION APPROVAL CERTIFICATE
Head
Project Guide Department of
Examiner
Prof. D. N. Sonawane Instrumentation and
Control
Date:
Abstract
DSP filters are an essential part of DSP applications such as digital cameras, mobile
phones and portable media players. Until now, DSP processors have been the preferred
choice of developers because of features such as built-in MAC units and
floating point engines. DSP processors nevertheless have disadvantages: they are
serial in nature, they are not reconfigurable, and for a simple operation such as
filtering most of the chip goes unused. FPGAs are an attractive platform for these
applications because of their parallel nature, reconfigurable architecture and embedded
resources such as dedicated DSP blocks, the IBM PowerPC processor, Ethernet MAC,
RocketIO and floating point processors. DSP algorithms consist mainly of adders,
multipliers, counters and similar blocks. I present a complete optimized architecture
for an FIR filter which performs all kinds of filter operations inside the FPGA. This
architecture is common to all kinds of filters with different sampling frequencies,
cutoff frequencies and numbers of bits for data and coefficients. Its performance is
compared with the filter available in the Simulink Signal Processing Toolbox. The
filter can use any set of coefficients, including coefficient values generated by
Simulink. Designs of an ADC and a DAC implemented in the FPGA are
also given, and their performance is compared with
the onboard ADC and DAC. The design of a floating point MAC unit is also
given; it performs an 8-operation circular convolution in 302 clock cycles.
Contents
Acceptance certificate
Approval certificate
Abstract
1 Introduction
2 Literature Survey
4.8 Fixed Point MAC
4.9 Floating Point MAC
4.10 Implementation of ADC and DAC
4.10.1 DAC
4.10.2 ADC
4.11 Implementation of FIR Filter
5 Results
5.1 Fixed Point MAC
5.2 Floating Point MAC
5.3 DAC
5.4 FIR filter results
7 Conclusion
8 Future Scope
References
Acknowledgements
List of Figures
4.18 Timing Diagram of fixed point MAC
4.19 Block Diagram of Floating point MAC
4.20 Simulink diagram for floating point MAC
4.21 Timing Diagram for floating point MAC
4.22 Low pass filter connection with DAC
4.23 PWM generated by DAC
4.24 Low pass filter output of DAC
4.25 Connection for ADC
4.26 Block Diagram of ADC implemented
4.27 Multichannel ADC
4.28 Block Diagram of FIR filter
4.29 GUI for FIR filter
4.30 Memory Control and Address Generator
4.31 Upscaling and Concatenation
4.32 MAC unit connection
List of Tables
Chapter 1
Introduction
Today, in all types of systems, from video and audio applications to
process-plant data and security systems such as face, iris and voice
recognition, filters are essential for removing noise. In signal processing,
the function of a filter is to remove unwanted parts of the signal, such as random
noise, or to extract useful parts of the signal, such as the components lying within
a certain frequency range. There are two main kinds of filter, analog and digital.
They are quite different in their physical makeup and in how they work. An analog
filter uses analog electronic circuits made up from components such as resistors,
capacitors and op amps to produce the required filtering effect. Such filter circuits
are widely used in applications such as noise reduction, video signal enhancement,
graphic equalizers in hi-fi systems, and many other areas.
A digital filter uses a digital processor to perform numerical calculations on
sampled values of the signal. The processor may be a general-purpose computer
such as a PC, or a specialized DSP (Digital Signal Processor) chip. The analog
input signal must first be sampled and digitized using an ADC (analog to digital
converter). The resulting binary numbers, representing successive sampled values
of the input signal, are transferred to the processor, which carries out numerical
calculations on them. These calculations typically involve multiplying the input
values by constants and adding the products together. If necessary, the results of
these calculations, which now represent sampled values of the filtered signal, are
output through a DAC (digital to analog converter) to convert the signal back to
analog form. In a digital filter, the signal is represented by a sequence of numbers,
rather than a voltage or current.
The basic setup of a digital filter is shown in fig. 1.1.
Today digital filters are more popular than analog filters because of advantages
such as the following:
Unlike their analog counterparts, digital filters can handle low frequency signals
accurately. As the speed of DSP/FPGA technology continues to increase,
digital filters are being applied to high frequency signals in the RF (radio
frequency) domain, which in the past was the exclusive preserve of analog
technology.
Digital filters are very much more versatile in their ability to process signals
in a variety of ways; this includes the ability of some types of digital filter to
adapt to changes in the characteristics of the signal.
Digital filters also have some disadvantages, such as the following:
The speed of a digital filter depends on the speed of the ADCs and DACs used in
the design. In simulation this is not a major problem, but in actual hardware
speed is a major problem.
Many types of digital processors are available today with different resources, such as
DSPs from Texas Instruments, the dsPIC family from Microchip, and FPGAs from various
vendors. Conventional processors, however, are neither reconfigurable nor parallel in
nature, so they take much more time to perform an operation. If
the tasks are executed in parallel, the operation becomes much
faster. For this kind of application it is therefore important to
select a processor that suits the application.
Chapter 2
Literature Survey
Standard floating point numbers are represented using an exponent and a mantissa
in the following format:
$$(\text{signbit})\ \text{mantissa} \times \text{base}^{(\text{exponent} + \text{bias})}$$
The mantissa is a binary, positive fixed-point value. Generally, the fixed point is
located after the first bit, $m_0$, so that mantissa = $m_0.m_1 m_2 \ldots m_n$, where $m_i$ is the
$i$th bit of the mantissa. The floating point number is normalized when $m_0$ is one. The
exponent, combined with a bias, sets the range of representable values. A common
value for the bias is $-2^{k-1}$, where $k$ is the bit-width of the exponent. The IEEE
floating point standard makes floating point unit implementation portable and the
precision of the results predictable. Many VLSI researchers have worked on implementing
floating point operations such as addition [2] and square root [3] in FPGAs. In [4] the author
has implemented floating point addition and multiplication on the Xilinx 4020E,
Xilinx 6062XL and Xilinx 40250XV, and compared their performance. An adder
with different architectures, such as the standard floating point adder, which contains
steps like
Exponent Difference
Rounding
Leading-One-Predictor (LOP)
is implemented in [5]. The LOP adder requires more area than the standard adder but results
in higher throughput. In the LOP adder
the leading one is detected by the LUT function
F = (Ai Bi ) & Ai1 &Bi1 . Rounding and normalization are also taken into
account in the design of the adder. The Xilinx Core Generator also includes all of these
operations as IP, which can be used as needed.
It is important to compare the performance of FPGAs with that of CPUs for floating
point operations. According to Moore's law, the number of transistors
doubles every 18 months. By the end of 2009, CMOS technology will be at the 45 nm
node. Today the Virtex family is available at 65 nm, while the Spartan family
is available at 90 nm. Every two years the feature size for CMOS technology
drops by over 40% [6]. This translates into a doubling of transistors per unit area
and a doubling of clock frequency every two years. Unlike CPUs, FPGAs have a
high degree of hardware configurability. Thus, while CPU designers must select a
resource allocation and a memory hierarchy that performs well across a range of
applications, FPGA designers can leave many of those choices to the application
designer. Simultaneously, the dataflow nature of computation implemented in field
programmable gate arrays (FPGAs) overcomes some of the issues with the memory
wall. There is no instruction fetch and much more local state can be maintained (i.e.
there is a larger register set). Thus, data retrieved from memory is much more likely
to stay in the FPGA until the application is done with it. As such, applications
implemented in FPGAs are free to utilize the improvements in area that accompany
Moore's law. The floating-point performance of FPGAs has been increasing more
rapidly than that of commodity CPUs. Using the Moore's law factors of 2× the
area and 2× the clock rate every two years, one would expect a 4× increase in FPGA
floating-point performance every two years. This is significantly faster than the 4×
increase in CPU performance every three years. Architectural changes to FPGAs
have the potential to accelerate (or decelerate) the improvement in FPGA
floating-point performance. For example, the introduction of 18×18 multipliers into the Spartan
series, as well as DSP48 slices in Virtex, dramatically reduces the area needed to
implement floating point operations.
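The growth rates quoted above can be compared on a per-year basis. The following is a minimal sketch of that arithmetic only; the function name and the six-year horizon are illustrative assumptions, not figures from [6].

```python
# Annualised growth implied by "4x every two years" (FPGA) versus
# "4x every three years" (CPU), the factors quoted in the text.

def annual_growth(factor: float, years: float) -> float:
    """Return the equivalent per-year multiplier for `factor` over `years`."""
    return factor ** (1.0 / years)

fpga_factor = 2 * 2                      # 2x area times 2x clock rate
fpga_per_year = annual_growth(fpga_factor, 2)
cpu_per_year = annual_growth(4, 3)

# Relative advantage accumulated over an assumed six-year span
years = 6
ratio = (fpga_per_year / cpu_per_year) ** years
print(f"FPGA: {fpga_per_year:.3f}x per year, CPU: {cpu_per_year:.3f}x per year")
print(f"Gap after {years} years: {ratio:.1f}x")
```

Under these assumptions the FPGA figure corresponds to a doubling every year, against roughly 1.59× per year for the CPU.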
Several techniques make it easier to implement DSP algorithms
in FPGAs, and software tools are available that simplify the task. These
tools include Handel-C from Celoxica, System Generator and AccelDSP from
Xilinx, Polis, C++ extensions such as SystemC, and Java class libraries such as JHDL [7].
The MAC (multiply-accumulate) unit is a very important part of any processor because
it performs mathematical operations such as addition, multiplication, subtraction and
division. In digital signal processors there is only one MAC unit, so
complex algorithms take longer to execute. This can be overcome
by using several MAC units that work in parallel. In [8] two architectures
for implementing a MAC unit are compared: Distributed Arithmetic (DA) [9]
and the Residue Number System (RNS). The authors
compare the performance of DA, RNS and DA-RNS. In the serial DA architecture
the values of the two operands are supplied bit by bit, i.e. serially, so
the speed is very low. This architecture
is useful where resources are very scarce, but FPGAs have now
become cheaper, so cost is less of an issue. The basic serial architecture requires
N clock cycles to process an N-bit operand. RNS uses modulo adders. For this
the operand must first be converted into log2 m form, and the
result must then be converted back into a normal number. Finally, both architectures
are combined to produce DA-RNS. The total delay obtained is 17.327 ns, i.e. a
maximum supported clock rate of 57.71 MHz, which is very low.
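The convert, accumulate and reconvert flow of an RNS MAC can be illustrated in software. This is a minimal sketch under assumed parameters: the moduli (7, 11, 13) and all function names are hypothetical and are not taken from [8]; the reconversion uses the Chinese Remainder Theorem.

```python
from math import prod

MODULI = (7, 11, 13)      # hypothetical pairwise-coprime moduli
M = prod(MODULI)          # dynamic range of this moduli set: 1001

def to_rns(x: int):
    """Forward conversion: one residue per modulus."""
    return tuple(x % m for m in MODULI)

def rns_mac(acc, a, b):
    """acc + a*b, computed independently in each residue channel."""
    return tuple((r + x * y) % m for r, x, y, m in zip(acc, a, b, MODULI))

def from_rns(residues) -> int:
    """Reverse conversion using the Chinese Remainder Theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        mi = M // m
        total += r * mi * pow(mi, -1, m)   # modular inverse, Python 3.8+
    return total % M

xs = [3, 5, 2, 7]
hs = [4, 1, 6, 2]
acc = to_rns(0)
for x, h in zip(xs, hs):
    acc = rns_mac(acc, to_rns(x), to_rns(h))
result = from_rns(acc)    # equals the ordinary dot product while it stays below M
print(result)
```

Each residue channel works on small numbers with no carries between channels, which is what makes the modulo adders short; the price is the conversion at both ends.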
Another technique is delayed addition [10]. When an arithmetic calculation is
carried out in a RISC microprocessor, each instruction typically has two source
operands and one result. In many computations, however, the result of one arithmetic
instruction is just an intermediate result in a long series of calculations. For
example, dot product and other long summations use a long series of integer or
floating-point operations to compute a final result. While FPGA designs often suffer
from much slower clock rates than custom VLSI, configurable hardware allows
us to make specialized hardware for these cases; with this, we can optimize the
pipelining characteristics for the particular computation. A typical multiplier in a
full-custom integrated circuit has three stages. First, it uses Booth encoding to
generate the partial products. Second, it uses one or more levels of Wallace tree
compression to reduce the number of partial products to two. Third, it uses a final
adder to add these two numbers and get the result. For such a multiplier, the
third stage, performing the final add, generally takes about one-third of the total
multiplication time. If implemented using FPGAs, stage 3 could become an even
greater bottleneck because of the carry propagation problem. It is hard to apply fast
adder techniques to speed up carry propagation within the constraints of current
FPGAs. In this design a Wallace tree is used in place of simple multiplication, because
multiplication consists of a number of additions, which take more time to execute.
To pipeline the addition, the authors use the bit-array approach instead of the bit-serial
approach. The bit-serial approach suits low-resource platforms but results
in low throughput. The bit-array approach gives one product every clock cycle, but
its resource usage is considerably larger than that of the bit-serial approach.
One level of Wallace tree is composed of arrays of 3-2 adders (or compressors).
The logic of a 3-2 adder is the same as a full adder except the carry-out from the
previous bit has now become an external input. For each bit of a 3-2 adder, the
logic is:
$$S[i] = A_1[i] \oplus A_2[i] \oplus A_3[i]$$
$$C[i] = A_1[i]A_2[i] + A_2[i]A_3[i] + A_3[i]A_1[i]$$
For the whole array, $S + 2C = A_1 + A_2 + A_3$.
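The 3-2 adder logic above can be checked numerically. The following is a minimal sketch, assuming the three inputs are non-negative integers whose bits form the array; the function name is illustrative.

```python
def compress_3_2(a1: int, a2: int, a3: int):
    """Bitwise 3-2 (carry-save) compression of three non-negative integers."""
    s = a1 ^ a2 ^ a3                         # per-bit sum (XOR of the three inputs)
    c = (a1 & a2) | (a2 & a3) | (a3 & a1)    # per-bit carry (majority function)
    return s, c

a1, a2, a3 = 0b1011, 0b0110, 0b1101
s, c = compress_3_2(a1, a2, a3)
# The carry word has twice the weight of the sum word.
assert s + 2 * c == a1 + a2 + a3
print(bin(s), bin(c))
```

No carry propagates between bit positions inside the compressor, which is why a full carry-propagate addition is only needed once, at the end.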
S and C are partial results that are referred to in [10] as the pseudo-sum.
They can be combined during a final addition phase to compute a true sum. The
total number of inputs across an entire level of a 3-2 adder array is the same as
the bit-width of the inputs. In some Wallace tree designs, 4-2 adder arrays have also
been used because they reduce the number of compressor levels required [11]. Each
bit of such an array is composed of a 4-2 adder. The typical logic is:
encoding and Wallace tree except that a 4-2 adder array is inserted into the pipeline
before the final addition.
As in the integer MAC, we repeatedly execute pseudosum = pseudosum + incoming
operand. Each incoming operand is an IEEE single-precision floating-point
number, with 1-bit sign, 8-bit exponent (EXP[7-0]) and 23-bit fraction. For simplicity
of discussion, we consider the exponent bits as three subfields: high-order
exponent, a decision bit, and low-order exponent. High-order exponent refers to
EXP[7-6], the decision bit is EXP[5], and low-order exponent refers to EXP[4-0].
We take different actions according to the value of these three fields. Like the
traditional adder, our design first extends the 23-bit fraction into a 24-bit mantissa.
However, unlike the traditional adder, we choose not to align the incoming operand
and the current pseudosum directly because that way the alignment process could
easily become the bottleneck of the whole pipeline. In a traditional adder, the
incoming operand interacts with the accumulated pseudosum throughout the alignment
process, which makes further pipelining impossible. Instead, we keep summary
information about the high-order exponent of the accumulated result and align its
mantissa to a fixed boundary according to its low-order exponent. Even with this
design [10], the speed obtained for the floating point MAC is 40 MHz, which is very low
compared to other processors.
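The subfield split described above can be illustrated on a real single-precision bit pattern. This is a minimal sketch that only extracts the fields; it does not model the alignment pipeline of [10], and the function name is an assumption.

```python
import struct

def exponent_fields(x: float):
    """Split an IEEE single-precision value into the subfields named in the text."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF            # EXP[7-0]
    frac = bits & 0x7FFFFF               # 23-bit fraction
    high = exp >> 6                      # high-order exponent, EXP[7-6]
    decision = (exp >> 5) & 1            # decision bit, EXP[5]
    low = exp & 0x1F                     # low-order exponent, EXP[4-0]
    # Extend the fraction to a 24-bit mantissa (hidden one for normalized values)
    mantissa = (frac | (1 << 23)) if exp != 0 else frac
    return sign, high, decision, low, mantissa

sign, high, decision, low, mantissa = exponent_fields(21.44)
print(sign, high, decision, low, f"{mantissa:024b}")
```

For 21.44 the biased exponent is 131 (binary 10000011), so the high-order field is 2, the decision bit is 0 and the low-order field is 3.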
Intel engineers have presented VLSI designs for MAC units that run at 6.2 GHz
and 5 GHz [11], [12], but these designs are not implemented in FPGAs and are
more back-end than front-end work. They are made in 90 nm CMOS
technology. In [13] an FIR filter is developed using a MAC which is built
from 4 multipliers. The inverted form is well-suited for achieving a high sampling
rate even for higher order filters. This is possible because the throughput does not
depend strongly on the number of taps due to extensive pipelining. The fact that the
multipliers occupy a large area, however, might render the implementation of higher
order filters impractical. It has been shown that a high performance FIR filter with
a substantial number of taps can be implemented on FPGAs by approximating the
filter coefficients to a sum or difference of two power-of-two terms. Implementation
of digital filters may be simplified by using only a limited number of power-of-two
terms so that only a small number of shift and add operations is required. A variety
of techniques have been proposed [13] to minimize the deterioration of the frequency
response due to these constraints. Such coefficient optimization techniques yield
performance sufficient for most practical applications. When the size of the chip is
a constraint, the arithmetic resources need to be shared at the expense of speed.
The authors have also implemented an IIR filter with a simple architecture using a MAC,
and have compared their design with and without pipelining. However, this design is only
for the fixed point number system.
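The idea of approximating a coefficient by a sum or difference of two power-of-two terms can be sketched with a brute-force search. This is an illustration only, not the optimization technique of [13]; the function name, the search range `max_shift` and the sample coefficient are assumptions.

```python
from itertools import product

def two_term_pow2(coeff: float, max_shift: int = 8):
    """Best s1*2**-i + s2*2**-j approximation of `coeff`, found by brute force.

    Returns (error, s1, i, s2, j) with s1, s2 in {+1, -1}.
    """
    best = None
    for i, j in product(range(max_shift + 1), repeat=2):
        for s1, s2 in product((1, -1), repeat=2):
            approx = s1 * 2.0 ** -i + s2 * 2.0 ** -j
            err = abs(coeff - approx)
            if best is None or err < best[0]:
                best = (err, s1, i, s2, j)
    return best

err, s1, i, s2, j = two_term_pow2(0.4375)
print(f"0.4375 ~ {s1:+d}*2^-{i} {s2:+d}*2^-{j}, error {err}")
```

With such a coefficient the multiplication reduces to two shifts and one add or subtract, which is why the multiplier area can be avoided.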
An FIR filter is built simply from a MAC. An FIR filter can even be developed
on a DSP using the MAC that is built into the processor, so all
the techniques used for MACs can be applied to develop FIR or IIR filters. The authors of [14]
used a bit-serial approach to develop an FIR filter. The maximum
frequency they obtain is 33 MHz for fixed point numbers, which is too low. In [15]
the author has implemented an FIR filter with the Distributed Arithmetic algorithm
using a bit-serial approach. Again, for a 16-tap FIR filter the
maximum frequency is 55.9 MHz and 288 bits of memory are used. In fir27 the FIR
filter is designed with parallel MAC and DA FIR specifically for Xilinx FPGAs with
SysGen. With this design the maximum throughput is obtained and compared for
different numbers of taps and for the Virtex and Spartan families. The performance
of this design is satisfactory, but it is limited to the fixed point number
system because the SysGen blocks cannot support floating point. Optimization
of resources is also very important. In [16] the Dempster-Macleod (DM) algorithm is
proposed to reduce the number of adders needed to implement an FIR filter. It works
as follows: 115 can be represented as
$$115 = 2^6 + 2^5 + 2^4 + 2^1 + 2^0$$
This binary representation requires four adders. We can reduce the number of adders
by one using CSD:
$$115 = 2^7 - 2^4 + 2^2 - 2^0$$
This requires two subtractors and one adder, so one adder is saved. Using the
approach of Dempster and Macleod, we can reduce the number of adders by one
more by factoring the number as follows:
The Dempster and Macleod (DM) algorithm uses only two subtractors. The multiply
can be accomplished by cascading the two circuits, so the number of adders used to
design the filter is reduced. This design, however, is not implemented for floating point
numbers. In fixed point it is much easier to do this type of conversion than in floating
point.
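The CSD step of the example can be reproduced with a short routine. This is a minimal sketch of canonic signed-digit recoding only; it does not implement the factoring step of the DM algorithm in [16], and the function name is illustrative.

```python
def csd_digits(n: int):
    """Canonic signed-digit form of n > 0 as (digit, power) pairs, LSB first."""
    digits = []
    power = 0
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)      # +1 if n mod 4 == 1, -1 if n mod 4 == 3
            digits.append((d, power))
            n -= d               # clears the low bit, possibly rippling a carry up
        n >>= 1
        power += 1
    return digits

digits = csd_digits(115)
binary_adders = bin(115).count("1") - 1      # adders for the plain binary form
csd_ops = len(digits) - 1                    # adders plus subtractors for CSD
print(digits, binary_adders, csd_ops)
```

For 115 this gives the digits $+2^7, -2^4, +2^2, -2^0$, i.e. three add/subtract operations against four adders for the plain binary form.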
To date, no one has proposed a digital filter architecture which includes the
ADC and DAC inside the FPGA. Sigma-Delta modulators are useful for implementing
an ADC and DAC inside the FPGA because in a ΣΔ modulator most of the work is
done in digital mode. [17] shows the implementation of a DAC with the help of a
low pass filter outside the FPGA. Xilinx application notes (XAPP154 for the DAC and
XAPP155 for the ADC) are useful for implementing the ADC and DAC inside the FPGA. In [17] the
DAC is implemented in a Spartan II (XC2S300E). Analog Devices also manufactures
some ADCs and DACs based on ΣΔ modulation. AN-283 (application
note) [18] from Analog Devices describes the operation of these ADCs and DACs.
Chapter 3
Conceptual Explanation of
Dissertation Topic
DAC and filter all are inside the FPGA. I tried to build all three components
inside one FPGA, with some analog devices outside the FPGA.
Based on the comparison of FPGAs with DSP processors, general purpose processors
and ASICs, my main objective is to implement a digital filter with ADC and
DAC in an FPGA, optimized in terms of area, power and speed. The sub-objectives
are:
Implement floating point MAC: Arrange a floating point unit and internal
RAM to develop the digital filter.
Implement digital filter: Filter the noise from the input signal.
Implement ADC/DAC: Convert the analog signal into digital form, give it
to the filter, and convert the digital value coming from the filter back into analog form.
Optimization: Optimize the whole design in terms of area, power and speed.
Since the sign of floating point numbers is given by a special leading bit, the
range for negative numbers is given by the negation of the above values.
Special Values:
Zero: Zero is not directly representable in the straight format, due to the
assumption of a leading 1 (we'd need to specify a true zero mantissa to
yield a value of zero). Zero is a special value denoted with an exponent
field of zero and a fraction field of zero. Note that -0 and +0 are distinct
values, though they both compare as equal.
Denormalized: If the exponent is all 0s, but the fraction is non-zero (else
it would be interpreted as zero), then the value is a denormalized number,
which does not have an assumed leading 1 before the binary point. Thus,
this represents a number $(-1)^s \times 0.f \times 2^{-126}$, where $s$ is the sign bit and $f$ is
the fraction. For double precision, denormalized numbers are of the form
$(-1)^s \times 0.f \times 2^{-1022}$. From this you can interpret zero as a special type of
denormalized number.
Infinity: The values +infinity and -infinity are denoted with an exponent
of all 1s and a fraction of all 0s. The sign bit distinguishes between
negative infinity and positive infinity. Being able to denote infinity as
a specific value is useful because it allows operations to continue past
overflow situations. Operations with infinite values are well defined in
IEEE floating point.
Not A Number: The value NaN (Not a Number) is used to represent a
value that does not represent a real number. NaNs are represented by a
bit pattern with an exponent of all 1s and a non-zero fraction. There are
two categories of NaN: QNaN (Quiet NaN) and SNaN (Signalling NaN).
A QNaN is a NaN with the most significant fraction bit set. QNaNs
propagate freely through most arithmetic operations. These values pop
out of an operation when the result is not mathematically defined. An
SNaN is a NaN with the most significant fraction bit clear. It is used to
signal an exception when used in operations. SNaNs can be handy to
assign to uninitialized variables to trap premature usage. Semantically,
QNaNs denote indeterminate operations, while SNaNs denote invalid
operations.
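The special-value encodings listed above can be summarised as a small classifier over single-precision bit patterns. This is a minimal sketch; the function name and the returned labels are illustrative.

```python
def classify_float32(bits: int) -> str:
    """Classify a 32-bit IEEE single-precision pattern by its exponent and fraction."""
    exp = (bits >> 23) & 0xFF        # 8-bit exponent field
    frac = bits & 0x7FFFFF           # 23-bit fraction field
    if exp == 0:
        return "zero" if frac == 0 else "denormalized"
    if exp == 0xFF:
        if frac == 0:
            return "infinity"
        # Most significant fraction bit set means quiet NaN, clear means signalling NaN
        return "QNaN" if frac & 0x400000 else "SNaN"
    return "normalized"

for pattern in (0x00000000, 0x80000000, 0x00000001,
                0x7F800000, 0x7FC00000, 0x7F800001, 0x3F800000):
    print(f"{pattern:#010x} -> {classify_float32(pattern)}")
```

The sign bit is deliberately ignored here, so +0 and -0 (and the two infinities) fall into the same class.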
Floating Point Addition: The steps of floating point addition are shown
in fig. 3.3 and explained here with an example. As fig. 3.3 shows,
floating point addition consists of five steps. Suppose we want to add
two numbers, 21.44 (A) + 7.24 (B).
Figure 3.3: Floating Point Addition Algorithm
111001010111000011010110
Normalize: After adding the two mantissas the MSB must be 1. To
get an MSB of 1, the mantissa is shifted left and the exponent
value is changed accordingly.
Rounding: Rounding modes are used when the exact result of a floating-point
operation (or a conversion to floating-point format) would need
more significant digits than there are digits in the significand. A rounding
method rounds the ideal (infinitely precise) result of an arithmetic
operation to a nearby representable value and gives that representation
as the result. Some of the rounding methods are
Round to nearest (the default; by far the most common mode)
Round up (toward +∞; negative results round toward zero)
Round down (toward -∞; negative results round away from zero)
Round toward zero (sometimes called chop mode; it is similar to
the common behavior of float-to-integer conversions, which convert
-3.9 to -3)
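The four rounding directions listed above can be compared side by side. The sketch below is an analogy only: it rounds decimal values to integers with Python's `decimal` module rather than rounding a binary significand, and the mode labels and function name are assumptions.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_CEILING, ROUND_FLOOR, ROUND_DOWN

MODES = {
    "nearest": ROUND_HALF_EVEN,     # round to nearest, ties to even
    "up": ROUND_CEILING,            # toward +infinity
    "down": ROUND_FLOOR,            # toward -infinity
    "toward zero": ROUND_DOWN,      # chop
}

def round_all(value: str):
    """Round `value` to an integer under each of the four modes."""
    d = Decimal(value)
    return {name: int(d.quantize(Decimal("1"), rounding=mode))
            for name, mode in MODES.items()}

print(round_all("-3.9"))
print(round_all("3.9"))
```

For -3.9, rounding up and rounding toward zero both give -3, while rounding down and rounding to nearest give -4, which matches the parenthetical remarks in the list.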
Floating Point Multiplication: Floating point addition was described
in detail with an example in the previous section. In this section only the algorithm
of floating point multiplication is shown, in fig. 3.4.
Q is the primary data output of the core. A and B are multiplied together and the
product added or subtracted from the current result. A simplified schematic of the
core is shown in Fig 3.5.
Speed can be increased by using multiple MAC units that work in parallel.
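The behaviour of the multiply accumulator described above can be modelled in a few lines. This is a behavioural sketch only, not the core itself; the class name, the `subtract` control and the `clear` method are hypothetical.

```python
class MultiplyAccumulator:
    """Behavioural model: Q <= Q + A*B, or Q <= Q - A*B when `subtract` is set."""

    def __init__(self):
        self.q = 0.0                 # current result, the primary data output

    def clock(self, a: float, b: float, subtract: bool = False) -> float:
        product = a * b
        self.q = self.q - product if subtract else self.q + product
        return self.q

    def clear(self):
        self.q = 0.0

mac = MultiplyAccumulator()
for a, b in zip([1, 2, 3], [4, 5, 6]):
    dot = mac.clock(a, b)            # accumulates 4, then 14, then 32
print(dot)
```

One `clock` call per operand pair is the serial case; running several such units side by side on different operand pairs is the parallel arrangement mentioned in the text.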
Figure 3.5: Multiply Accumulator
3.4 ΣΔ Modulation
Delta-Sigma (ΔΣ) modulation, which is also called Sigma-Delta (ΣΔ) modulation,
is a kind of analog-to-digital or digital-to-analog conversion characterized
by integrating (Σ) differences (Δ). An analog to digital converter (ADC) or DAC
circuit which implements this technique can be easily realized using low-cost CMOS
processes, such as the processes used to produce digital integrated circuits. Delta-Sigma
ADCs and DACs are easy to implement in the digital domain. The benefit
of a Delta-Sigma converter is that it moves most of the conversion process into digital
form. This makes it possible to combine high performance analog with digital
processing. The analog part uses a single comparator, an integrator and a 1-bit DAC.
In this work I use a DAC and an integrator which are inside the FPGA and a comparator
which is an LM339.
3.4.1 ADC
A Sigma-Delta ADC is shown in fig. 3.6; fig. 3.6a is a first order Sigma-Delta ADC. The
order of the converter is determined by the number of integrators in the Sigma-Delta ADC.
First order Sigma-Delta converters (fig. 3.6a) are stable, but for second or higher
order converters stability must be taken into account. On the other hand, the accuracy of second
or higher order converters is higher than that of first order converters, so the selection
must be made on the basis of the application. The basic block diagrams of the first order (3.6a)
and second order (3.6b) ADCs are shown in fig. 3.6.
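The first order loop (integrator, comparator and 1-bit DAC in feedback) can be simulated to show that the density of ones in the output follows the input. This is a behavioural sketch under assumptions: ideal components, a constant input between 0 and `vref`, and hypothetical names; it does not model the LM339 or the FPGA logic.

```python
def sigma_delta_adc(vin: float, n_samples: int = 1024, vref: float = 1.0):
    """First order Sigma-Delta modulator for a constant input in [0, vref]."""
    integrator = 0.0
    feedback = 0.0                            # output of the 1-bit DAC
    bits = []
    for _ in range(n_samples):
        integrator += vin - feedback          # integrate the difference
        bit = 1 if integrator > 0 else 0      # comparator decision
        feedback = vref if bit else 0.0       # 1-bit DAC closes the loop
        bits.append(bit)
    return bits

bits = sigma_delta_adc(0.3)
estimate = sum(bits) / len(bits)              # averaging acts as a crude decimation filter
print(f"input 0.3, recovered {estimate:.3f}")
```

Averaging more samples gives a finer estimate, which is the oversampling trade-off between speed and resolution.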
(a) First Order Sigma-Delta ADC
3.4.2 DAC
Digital to analog converters (DACs) convert a binary number into a voltage directly
proportional to the value of the binary number. A variety of applications use DACs
including waveform generators and programmable voltage sources. A Delta-Sigma
DAC uses digital techniques. Consequently, it is impervious to temperature change,
and may be implemented in programmable logic. Delta-Sigma DACs are actually
high-speed single bit DACs. Using digital feedback, a string of pulses is generated.
The average duty cycle of the pulse string is proportional to the value of the binary
input. The analog signal is created by passing the pulse string through an analog
low-pass filter. Delta-Sigma DACs are used extensively in audio applications. They
are suited for low frequency applications that require relatively high accuracy. As
is standard practice, the DAC binary input in this implementation is an unsigned
number with zero representing the lowest voltage level. The analog voltage output
is also positive only. A zero on the input produces zero volts at the output. All
ones on the input cause the output to nearly reach the reference voltage. The basic block
diagram of the Sigma-Delta DAC is given in fig. 3.7.
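The relation between the binary input and the average duty cycle can be illustrated with an accumulator model of a first order Delta-Sigma DAC. This is a sketch under assumptions: the accumulator-with-carry structure, the 8-bit width and the names are illustrative and are not claimed to match fig. 3.7 or the implemented design.

```python
def delta_sigma_dac(value: int, width: int = 8, n_clocks: int = None):
    """Return the 1-bit pulse string produced for an unsigned `width`-bit input."""
    modulus = 1 << width
    if n_clocks is None:
        n_clocks = modulus               # one full period of the accumulator
    acc = 0
    stream = []
    for _ in range(n_clocks):
        acc += value
        stream.append(acc >> width)      # carry out of the accumulator is the pulse
        acc &= modulus - 1               # keep only the low `width` bits
    return stream

for value in (0, 64, 255):
    pulses = delta_sigma_dac(value)
    print(value, sum(pulses) / len(pulses))   # average duty cycle
```

Over 256 clocks the number of ones equals the input value, so an input of 255 yields a duty cycle of 255/256, consistent with the output nearly reaching the reference voltage after low-pass filtering.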
Figure 3.7: Sigma Delta DAC
In the same way, the basic form of the IIR filter is given by 3.4 in difference
equation form and by 3.5 in general form.
From the equations shown above, the FIR filter can be described in direct form
as shown in fig. 3.8 and the IIR filter in direct form as shown in fig. 3.9.
An FIR filter has a number of useful properties which sometimes make it preferable
to an infinite impulse response filter. FIR filters:
Are inherently stable. This is due to the fact that all the poles are located at
the origin and thus are located within the unit circle.
Figure 3.9: IIR Direct form
Require no feedback. This means that any rounding errors are not compounded
by summed iterations. The same relative error occurs in each calculation.
Can be designed to be linear phase, which means the phase change is proportional
to the frequency. This is usually desired for phase-sensitive applications,
for example crossover filters, and mastering, where transparent filtering
is adequate.
There are a few terms used to describe the behavior and performance of an FIR
filter, including the following:
Filter Coefficients - The set of constants, also called tap weights, used to
multiply against delayed sample values. For an FIR filter, the filter coefficients
are, by definition, the impulse response of the filter.
Impulse Response - A filter's time domain output sequence when the input is
an impulse. An impulse is a single unity-valued sample followed and preceded
by zero-valued samples. For an FIR filter, the impulse response
is the set of filter coefficients.
Tap - The number of FIR taps, typically N, tells us a couple of things about
the filter. Most importantly, it tells us the amount of memory needed, the
number of calculations required, and the amount of filtering that it can
do. Basically, more taps in a filter result in better stopband attenuation
(less of the part we want filtered out), less rippling (fewer variations in the
passband), and steeper rolloff (a shorter transition between the passband and
the stopband).
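The statement that the impulse response of an FIR filter is its set of coefficients can be checked with a small direct-form model. This is a minimal sketch; the coefficient values are arbitrary and the function name is illustrative.

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR: each output is the sum of coefficients times delayed inputs."""
    delay_line = [0.0] * len(coeffs)             # one memory element per tap
    out = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]       # shift the new sample in
        out.append(sum(h * d for h, d in zip(coeffs, delay_line)))
    return out

h = [0.1, 0.2, 0.4, 0.2, 0.1]                    # arbitrary 5-tap example
impulse = [1.0] + [0.0] * 6                      # unity sample followed by zeros
y = fir_filter(h, impulse)
print(y)
```

The output reproduces the five coefficients and then returns to zero, and the model makes the cost per tap visible: one memory element and one multiply-accumulate for each coefficient.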
Many software tools and digital signal processors
are available with which one can easily implement an FIR filter. In MATLAB one can
easily implement an FIR filter with a ready-made function, and in SIMULINK the FDA
tool is available, in which the user only has to define the frequencies of interest and
the type of the filter and the filter is generated automatically. One can generate
this kind of filter in just 10 to 15 minutes, but the user has no control over the
design and it exists only in software form. SIMULINK also has Real-Time Workshop,
with which the filter can be downloaded to a DSP processor kit such as the TMS320C6713,
ready to filter the signal. Implementing this kind of filter is not
as easy on an FPGA, but there are many advantages to implementing a filter in an FPGA
because of its parallelism and reconfigurability.
There are generally five steps in the design of a digital filter:
Some of these steps are grouped together when an automated tool such as the FDA tool is used.
Chapter 4
Hardware - Software
Implementation
Hardware Implementation:
The hardware which I have used is listed below:
JTAG programmer
Devices Supported
Spartan-3E (XC3S500E-4FG320C)
CoolRunner-II (XC2C64A-5VQ44C)
Platform Flash (XCF04S-VO20C)
Memory
64MByte DDR SDRAM
Display
4.2 Spartan 3A Starter Kit
Some of the features of the Spartan 3A starter kit are given here.
Devices Supported
Spartan-3A (XC3S700A-FG484)
Platform Flash (XCF04S-VO20C)
Clocks
Memory
Display
Figure 4.3: Spartan 3 Family FPGA
Spartan-3 platform - For applications where both high logic density and high
I/O count are important. Ideal for highly integrated data-processing applications
4.4 JTAG Programmer
Joint Test Action Group (JTAG) is the usual name used for the IEEE 1149.1
standard entitled Standard Test Access Port and Boundary-Scan Architecture for
test access ports used for testing printed circuit boards using boundary scan. JTAG
was an industry group formed in 1985 to develop a method to test populated circuit
boards after manufacture. A JTAG interface is a special four/five-pin interface
added to a chip, designed so that multiple chips on a board can have their JTAG
lines daisy-chained together, and a test probe need only connect to a single JTAG
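port to have access to all chips on a circuit board. The connector pins are
TDI (Test Data In), TDO (Test Data Out), TCK (Test Clock), TMS (Test Mode Select)
and the optional TRST (Test Reset).
The basic connections for multiple devices in a single chain are shown in fig 4.4.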
Since only one data line is available, the protocol is necessarily serial, like SPI.
The clock input is at the TCK pin. Configuration is performed by manipulating a
state machine one bit at a time through the TMS pin. One bit of data is transferred
in and out per TCK clock pulse at the TDI and TDO pins, respectively. Different
instruction modes can be loaded to read the chip ID, sample input pins, drive
(or float) output pins, manipulate chip functions, or bypass (pipe TDI to TDO
to logically shorten chains of multiple chips). The operating frequency of TCK
varies depending on the chip, but it is typically 10-100 MHz (100-10 ns per bit).
When performing boundary scan on integrated circuits, the signals manipulated
are between different functional blocks of the chip, rather than between different
chips. The TRST pin is an optional active-low reset to the test logic, usually
asynchronous, but sometimes synchronous, depending on the chip. If the pin is not
available, the test logic can be reset by clocking in a reset instruction synchronously.
Data presented to TDI must be valid for some chip-specific setup time before and
hold time after the rising edge of TCK. TDO data is valid for some chip-specific
time after the falling edge of TCK.
JTAG programmers are available from Xilinx in two models: (i) connected through
USB and (ii) connected through the parallel port. They cost $199 and $111
respectively. I made a JTAG programming cable which programs through the parallel
port. A USB-connected JTAG cable is difficult to develop because it needs a USB
controller. I use the iMPACT software for programming the FPGA through the
parallel port JTAG programmer. It supports all FPGA families as well as CPLDs.
The circuit diagram of this programmer is shown in fig 4.5. Its six points connect
to the six pins on the board to be configured, as shown in fig 4.6.
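The TMS-driven state machine mentioned in this section can be sketched as a lookup table. The sketch below follows the 16-state TAP controller defined by IEEE 1149.1; the Python names (`TAP_NEXT`, `step_tap`) are illustrative assumptions and do not come from the programming cable built in this project.

```python
# Next state for each TAP controller state, indexed by the TMS value (0 or 1)
# sampled on the rising edge of TCK.
TAP_NEXT = {
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE": ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_DR": ("CAPTURE_DR", "SELECT_IR"),
    "CAPTURE_DR": ("SHIFT_DR", "EXIT1_DR"),
    "SHIFT_DR": ("SHIFT_DR", "EXIT1_DR"),
    "EXIT1_DR": ("PAUSE_DR", "UPDATE_DR"),
    "PAUSE_DR": ("PAUSE_DR", "EXIT2_DR"),
    "EXIT2_DR": ("SHIFT_DR", "UPDATE_DR"),
    "UPDATE_DR": ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_IR": ("CAPTURE_IR", "TEST_LOGIC_RESET"),
    "CAPTURE_IR": ("SHIFT_IR", "EXIT1_IR"),
    "SHIFT_IR": ("SHIFT_IR", "EXIT1_IR"),
    "EXIT1_IR": ("PAUSE_IR", "UPDATE_IR"),
    "PAUSE_IR": ("PAUSE_IR", "EXIT2_IR"),
    "EXIT2_IR": ("SHIFT_IR", "UPDATE_IR"),
    "UPDATE_IR": ("RUN_TEST_IDLE", "SELECT_DR"),
}


def step_tap(state, tms_bits):
    """Advance the TAP controller by one TCK pulse per TMS bit."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


# From reset, TMS = 0, 1, 0, 0 reaches the state in which data is shifted TDI -> TDO.
print(step_tap("TEST_LOGIC_RESET", [0, 1, 0, 0]))
```

A useful property of this table is that holding TMS high for five TCK pulses returns the controller to the reset state from any starting state, which is how the test logic can be reset when no TRST pin is available.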
Figure 4.7: Programmable Amplifier and ADC connection with FPGA in Spartan
3E starter kit
a new 8-bit command byte is transmitted to it, the byte previously sent is echoed
back to the master. The bit transfer pattern between amplifier and FPGA is shown
in fig 4.8.
The timing diagram is given in fig 4.9. All timings shown in the diagram are in
nanoseconds.
This timing diagram has been drawn approximately to scale, assuming that the
highest speed SCK is being used (minimum of 50ns Low and 50ns High). The
LTC6912-1 captures data (SDI) on the rising edge of SCK, so the data needs to be
valid for at least 30ns before the rising edge. The LTC6912-1 outputs data (AMP-DO)
on the falling edge of SCK. This output may take up to 85ns, so if the AMP-DO
value needs to be read, it is advisable to delay the reading of this signal as long
as possible or to operate at a slower clock speed. As the above diagram indicates, it is
definitely not possible for the master to read AMP-DO using the next rising edge
Figure 4.8: Bit transfer pattern between Amplifier and FPGA
Figure 4.9: Timing Diagram of Amplifier SPI communication
Figure 4.10: Master slave connection of ADC with FPGA in SPI mode
The SDO output of the LTC1407A-1 device changes as a result of the rising
edge of the applied SCK clock. This again is different to most slave devices which
change their output on the falling edge of SCK. Since the device does not have a
conventional chip select control, it is also vital that adequate SCK cycles are applied
to ensure the SDO output is left in tristate (high impedance) so that it cannot
interfere with other devices sharing the SPI bus. It is therefore advisable to use the
typical 34 cycle sequence. The timing diagram for ADC communication is shown in
fig. 4.12.
The maximum sample rate supported by the LTC1407A-1 is 1.5MHz. This is
only possible if the maximum rate of SCK is used for conversion and communication.
Figure 4.11: Bit pattern to be transferred to ADC from FPGA
RJ-45 connector. As shown in fig 4.13, the FPGA uses a Serial Peripheral Interface
(SPI) to communicate digital values to each of the four DAC channels. The SPI
bus is a full-duplex, synchronous, character-oriented channel employing a simple
four-wire interface. A bus master (the FPGA in this example) drives the bus clock
signal (SPI_SCK) and transmits serial data (SPI_MOSI) to the selected bus
slave (the DAC in this example). At the same time, the bus slave provides serial
data (SPI_MISO) back to the bus master. The connection of the DAC to the FPGA in
the Spartan 3E starter kit is shown in fig 4.13.
Looking specifically at the LTC2624 D/A converter, each communication is
formed of 4 bytes or 32-bits. Inside the D/A converter, the SPI interface is formed
by a 32-bit shift register. As a new 32-bit command word formed of command,
address and data fields is transmitted to it, the 32-bit word previously sent is echoed
back to the master. In order to use the D/A converter this response can be ig-
Figure 4.13: Connection of DAC to FPGA in Spartan 3E starter kit
ative to the SCK clock. The system is fully static and any clock rate up to the
maximum of 50MHz supported by the LTC2624 is possible. Remember to check
all timing parameters in the LTC2624 data sheet if you intend working at or close
to the maximum speed. The LTC2624 captures data (SDI) on the rising edge of
SCK, so the data needs to be valid for at least 4ns relative to the rising edge. The
timing diagram of DAC interface is shown in fig 4.15. The LTC2624 changes the
output data (SDO) in response to the falling edge of SCK allowing the master to
read the value at or near the next rising edge. It is important to notice that SDO
must be read on the first clock following the enable being asserted (DAC-CS=0),
otherwise bit 31 will be missed. In theory the SPI interface allows command words
to be transmitted at a rate slightly higher than 1.5 M-words/second. Even if this
is used to set all four channels individually, this rate would exceed the conversion
rate actually supported by the D/A converter and obviously some spacing between
commands would be necessary.
Figure 4.15: Timing diagram of DAC communication with FPGA
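The 32-bit command word and the word rate quoted above can be illustrated with a short sketch. The bit positions used here (8 don't-care bits, 4-bit command, 4-bit address, 12-bit data, 4 don't-care bits, sent MSB first) and the command code 0011 for "write and update" are assumptions taken from my reading of the LTC2624 data sheet and should be checked against it; the helper names are hypothetical.

```python
def ltc2624_word(command, address, value):
    """Pack a 32-bit LTC2624 word (assumed layout: bits 23-20 command,
    bits 19-16 address, bits 15-4 data, remaining bits don't care)."""
    if not 0 <= command <= 0xF:
        raise ValueError("command must fit in 4 bits")
    if not 0 <= address <= 0xF:
        raise ValueError("address must fit in 4 bits")
    if not 0 <= value <= 0xFFF:
        raise ValueError("value must fit in 12 bits")
    return (command << 20) | (address << 16) | (value << 4)


def max_word_rate(sck_hz, bits_per_word=32):
    """Upper bound on command words per second at a given SCK frequency."""
    return sck_hz / bits_per_word


# Assumed command 0b0011 (write and update), channel address 0, full-scale code.
word = ltc2624_word(0b0011, 0b0000, 0xFFF)
print(hex(word))
print(max_word_rate(50e6))  # 50 MHz is the maximum SCK stated in the text
```

At the 50 MHz maximum SCK, 32 bits per word gives 1.5625 M-words/second, which matches the "slightly higher than 1.5 M-words/second" figure in the text.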
Download FPGA designs directly to the Spartan-3E FPGA via JTAG, using
the on-board USB interface. The on-board USB-JTAG logic also provides
in-system programming for the on-board Platform Flash PROM and the Xilinx
XC2C64A CPLD. SPI serial Flash and StrataFlash programming are
performed separately.
Program the on-board 4 Mbit Xilinx XCF04S serial Platform Flash PROM,
then configure the FPGA from the image stored in the Platform Flash PROM
using Master Serial mode.
Program the on-board 128 Mbit Intel StrataFlash parallel NOR Flash PROM,
then configure the FPGA from the image stored in the Flash PROM using
BPI Up or BPI Down configuration modes. Further, an FPGA application
can dynamically load two different FPGA configurations using the
Spartan-3E FPGA's MultiBoot mode.
Every FPGA has mode selection pins, M0, M1 and M2, which select the
configuration mode of the FPGA. These pins work as mode selection pins at
the time of configuration, and in run mode they can be used as general
purpose I/O pins. The Spartan 3E starter kit contains an on-board
Xilinx Platform Flash PROM (XCF04S). I have configured the FPGA through
this memory, so there is no need to manually configure the FPGA through
the JTAG port at every power-up. The bitstream is automatically downloaded
into the FPGA at every power-up.
Figure 4.16: Hardware Co-Simulation
Software Implementation:
The software used for implementing the filter in the FPGA is listed below:
System Generator: This tool is the industry's leading high-level tool for
designing high-performance DSP systems using FPGAs.
Develops highly parallel systems with the industry's most advanced FPGAs
Provides system modeling and automatic code generation from Simulink
and MATLAB (The MathWorks, Inc.)
Integrates the RTL, embedded, IP, MATLAB and hardware components
of a DSP system
Is a key component of the Xilinx XtremeDSP Tools Package and the XtremeDSP
Development and Starter Kits
MATLAB: MATLAB is needed because the System Generator toolbox is
implemented in Simulink.
The timing diagram for this MAC unit with 4 operations is shown in fig 4.18.
Here A and B are each fed in on every clock cycle. For a 7 operation MAC, the time
needed from First Data to Ready, i.e. the end of the MAC operation, is 9 clock
cycles; similarly, a 6 operation MAC needs 8 clock cycles. These waveforms were
checked with ModelSim.
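The cycle counts quoted above suggest that an N operation MAC needs N + 2 clock cycles. The sketch below models that behaviour under this assumption; the 2-cycle overhead is inferred only from the two data points in the text (7 operations in 9 cycles, 6 operations in 8 cycles), and the function name `fixed_point_mac` is hypothetical.

```python
def fixed_point_mac(a_values, b_values, overhead_cycles=2):
    """Multiply-accumulate over paired inputs, one pair fed per clock cycle.

    Returns (result, cycles), where cycles counts from First Data to Ready.
    overhead_cycles=2 is an assumption inferred from the quoted timings.
    """
    if len(a_values) != len(b_values):
        raise ValueError("A and B must have the same number of operands")
    acc = 0
    for a, b in zip(a_values, b_values):
        acc += a * b  # one multiply-accumulate per clock cycle
    return acc, len(a_values) + overhead_cycles


result, cycles = fixed_point_mac([1, 2, 3, 4, 5, 6, 7], [1, 1, 1, 1, 1, 1, 1])
print(result, cycles)
```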
Figure 4.18: Timing Diagram of fixed point MAC
point multiplier and using an external register to store the value of the MAC in
place of the accumulator. I use the CORE Generator floating point operation
blocks for these operations and set the latency to 2 clock cycles. So each block
gives its output 2 clock cycles after the input data is applied and sets a done
bit for handshaking purposes. A control signal controls this handshaking
operation. The results of the individual addition and multiplication blocks
are given in the results.
Single Port RAM: I have used RAM to store the data coming from the fixed
to floating point conversion. I first store this data and then start the
multiplications. When the first multiplication completes, the first addition
starts. While the first addition is working, the second multiplication is also
working. In this way I use the internal RAM of the FPGA for pipelining. Integers
or arrays could be used to store these values, but since RAM is available in
the FPGA, using integers indirectly consumes gates, which is a waste of
gates. So I have used the internal RAM. With a single RAM it is
impossible to acquire the values of A and B simultaneously in a single clock cycle.
So I use two separate RAMs for A and B, so that I can acquire the data for A and B
simultaneously, and this saves a total of 8x8=64 clock cycles. A dual
port RAM, in which all signals are separate for A and B, is also possible.
But the resource usage of two single port RAMs and one dual port
RAM is the same, so I use two single port RAMs instead of one dual port RAM.
I have implemented circular convolution using the floating point MAC. Its Simulink
diagram is given in fig. 4.20, where only the blocks of the major operations are shown.
The sub-blocks inside each block are not shown. In the diagram every operation
block is marked with an arrow. All the blue parts are functional blocks, each
associated with some operation, while the yellow parts are only used
for input/output purposes or for converting the data type. The timing diagram of the
8 operation circular MAC with the above program is shown in fig 4.21. Only the first
operation out of 8 is shown here because all 8 signals cannot be shown in one diagram.
The labels on the left side show the operations. The main signals of this operation
are listed below:
A RAM: Value read from RAM which shows the floating point representation
of A.
B RAM: Value read from RAM which shows the floating point representation
of B.
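As a software reference for the operation performed by the floating point MAC, the following is a minimal sketch of N-point circular convolution built from multiply-accumulate steps. It is not the Simulink design shown in the figures; the function name and the test vectors are assumptions chosen for illustration.

```python
def circular_convolution(a, b):
    """y[i] = sum over k of a[k] * b[(i - k) mod N], using one MAC per (i, k) pair."""
    n = len(a)
    if len(b) != n:
        raise ValueError("both sequences must have the same length")
    result = []
    for i in range(n):
        acc = 0.0
        for k in range(n):
            acc += a[k] * b[(i - k) % n]  # multiply-accumulate step
        result.append(acc)
    return result


# Hypothetical 8-point inputs: convolving with a delayed unit impulse rotates A by one.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(circular_convolution(a, b))
```

For 8 points this takes 8 x 8 = 64 multiply-accumulate steps, which is where reading A and B from two RAMs in the same clock cycle pays off.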
4.10.1 DAC:
The basic connections of the Sigma Delta DAC are given in the figure below.
A low pass filter consisting of one resistor and one capacitor is connected at the
output pin of the FPGA. The range of the output of the DAC is 0 to VCCO. Here,
VCCO is the voltage supplied to the I/O port. It can be 3.3 V, 2.5 V or 1.2 V. Here
I have selected a VCCO of 3.3 V. The connection of the low pass filter to the FPGA
is shown in fig 4.22.
Here one resistor and one capacitor make a low pass filter which filters the PWM-like
signal coming from the FPGA and gives a voltage proportional to DACin. A DAC with
any number of bits can be made without much change in the program, so the resolution
can be chosen as a compromise with resource usage. This kind of flexibility is not
available with an external DAC or ADC. The output from the DAC can be
calculated from the equation given below:
\[ V_{OUT} = \frac{DACin}{2^{(MSBI+1)}} \times V_{CCO} \]
For example, for an 8-bit DAC (MSBI = 7) the lowest VOUT is 0V when DACin is
0. The highest VOUT is 255/256 VCCO volts when DACin is FF. In this type of
DAC a higher clock frequency gives a lower settling time. For this purpose you
can use the internal Delay Locked Loop to supply double or quadruple the external
clock to the DAC block.
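The relation between DACin and the average output can be illustrated with a small behavioural model. This is a sketch of a first-order pulse-density modulator built from an accumulator with carry-out, which is one common way to realise such a DAC; it is an assumption for illustration and not necessarily the architecture implemented in this project. The names `delta_sigma_bitstream` and `ideal_vout` are hypothetical.

```python
def delta_sigma_bitstream(dac_in, bits=8, cycles=None):
    """Return the 1-bit output stream of an accumulator-based modulator.

    The output bit is the carry out of a `bits`-wide accumulator to which
    dac_in is added on every clock cycle.
    """
    modulus = 1 << bits
    if not 0 <= dac_in < modulus:
        raise ValueError("dac_in out of range")
    if cycles is None:
        cycles = modulus
    acc = 0
    stream = []
    for _ in range(cycles):
        acc += dac_in
        if acc >= modulus:       # carry out drives the pin high for this cycle
            acc -= modulus
            stream.append(1)
        else:
            stream.append(0)
    return stream


def ideal_vout(dac_in, bits=8, vcco=3.3):
    """Filtered output voltage: DACin / 2^(MSBI+1) * VCCO, with bits = MSBI + 1."""
    return dac_in / (1 << bits) * vcco


stream = delta_sigma_bitstream(0x80)
print(sum(stream) / len(stream), ideal_vout(0x80))
```

Over 256 clock cycles the number of high pulses equals DACin, so the low pass filter sees an average of DACin/256 of VCCO, in agreement with the equation above.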
R must be 2.5 kΩ or greater to ensure rail-to-rail switching, with an error of
1% or less. Keep the value of R low relative to
the impedance of the load so that the current change through the capacitor due to
loading becomes negligible. The filter time constant (τ = RC) must be high enough
to greatly attenuate the individual pulses in the pulse string. On the other hand,
a high time constant may also attenuate the desired low-frequency output signal.
These potentially conflicting requirements are analyzed separately. The worst-case
peak-to-peak filter noise for an 8-bit DAC can be expressed as follows:
where
PPNFS = peak-to-peak noise expressed as a fraction of step voltage
f = the DAC clock frequency
τ = the filter time constant, RC.
The cutoff frequency of the simple RC filter may be expressed as:
\[ f_c = \frac{1}{2\pi\tau} \]
where:
fc = filter cutoff frequency
τ = the filter time constant, RC.
To resolve each DACin sample to the full precision of a Delta-Sigma DAC, the sample
rate, i.e., the rate at which DACin changes, must be less than or equal to
1/2^(MSBI+1) of the CLK frequency. In some applications, such as a programmable
voltage source, this is not an issue.
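As a worked example of the two relations above, the sketch below evaluates the cutoff frequency for the 3 kΩ and 0.01 µF filter used later in this section and the sample-rate limit for an 8-bit DAC. The 50 MHz clock is an assumption made only for the example, and the function names are hypothetical.

```python
import math


def rc_cutoff_hz(r_ohm, c_farad):
    """Cutoff frequency of a simple RC low pass filter: fc = 1 / (2 * pi * R * C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def max_sample_rate_hz(clk_hz, msbi):
    """Highest DACin update rate that keeps full precision: CLK / 2^(MSBI+1)."""
    return clk_hz / 2 ** (msbi + 1)


tau = 3e3 * 0.01e-6                       # R = 3 kOhm, C = 0.01 uF, so tau = 30 us
fc = rc_cutoff_hz(3e3, 0.01e-6)           # about 5.3 kHz
limit = max_sample_rate_hz(50e6, msbi=7)  # assumed 50 MHz clock, 8-bit DAC
print(tau, fc, limit)
```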
generates a PWM-like waveform corresponding to DACin, which is
shown for different values of DACin in fig 4.23. After adding a low
pass filter with values of 3 kΩ and 0.01 µF, the output of the DAC is as shown
in fig 4.24.
Figure 4.24: Low pass filter output of DAC
4.10.2 ADC
The ADC is made with the help of the DAC. This ADC is a successive approximation
type ADC and it takes the output of the DAC as a reference. A simple diagram of the
ADC is shown in fig 4.25. For converting an analog value into digital through the
Sigma Delta ADC we need one low pass filter, one comparator and one DAC. The DAC is
implemented in the FPGA as described above. The analog level is determined by
performing a serial binary voltage search, starting at the middle of the voltage
range. For each complete sample, only the upper bit of the DAC input is initially
set, which drives the reference voltage to midrange. Depending on the output of the
comparator, the upper bit is then cleared or it remains set, and the next most
significant bit of the DAC input is set. This process continues for each bit of the
DAC input. The DAC is one bit wider than the ADC output. This is required in order
for the lowest numbered bit of the ADC output to be significant. When all of the
bits have been sampled, the upper bits of the register feeding the DAC are
transferred to the ADC output register.
Because of the serial nature of both the DAC and the analog
sampling process, this ADC is useful only on signals that are changing at a fairly
low rate. A typical high-precision application for this ADC would be monitoring a
physical metric, such as ambient air temperature or water pressure. It can also be
used for applications with lower precision at a higher sampling rate, such as utility
voice recording (for example, a telephone answering machine). If the analog input
voltage changes during the sampling process, it effectively causes the sample point
to move randomly. This adds a noise component that becomes larger as the input
frequency increases. For many applications, the strength of the additional noise will
be so low that it will be acceptable. This noise component can be removed with an
external sample and hold circuit for the analog input signal. The ADC sample rate
may be expressed as follows:
Where:
MABI=Number of bits of ADC
Fstm=configure width of the filter settle time
Figure 4.25: Connection for ADC
The block diagram of the ADC is given in fig. 4.26.
Figure 4.26: Block Diagram of ADC implemented
In addition to the DAC, the ADC is comprised of the following major elements:
DAC sample counter: This is a binary up counter that is the same width as
the DAC input. When the DAC input changes, it requires a minimum of one
complete cycle of this counter to resolve the new value at the output.
Mask shifter: This register, which is the same width as the DAC input,
endlessly rotates a single bit right. This is effectively the state machine that
controls the bit sample sequence, implemented so that correct values can
easily be loaded into the Reference shifter.
Reference shifter: This register drives the DAC input. It always starts a sample
with only the upper bit set. When only the upper bit is set, the comparator
output will be true if the analog input is greater than 1/2 VCCO, and it will be
false if the analog input is less than 1/2 VCCO. If the comparator output is
true, the upper bit remains set, and the next lower bit is set. If the comparator
output is false, the upper bit clears, and the next lower bit is set. This process
continues all the way to the LSB, which causes voltage ADCref to home in on
the analog input voltage.
ADCout register: This register snaps the high order bits of the Reference
shifter when the sample is left justified; that is, the comparator output that
was sensed when only the upper bit of the DAC input was set is in the MSB.
The ADCout register is one bit shorter than the Reference shifter, making the
LSB in ADCout accurate to 1/2 LSB.
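The interaction of the mask shifter, reference shifter and comparator described above can be sketched as a behavioural model. This is an illustration under simplifying assumptions: the DAC and low pass filter are treated as ideal and fully settled before each comparison, and the function name `sar_adc` and its parameters are hypothetical.

```python
def sar_adc(vin, adc_bits=8, vcco=3.3):
    """Successive approximation search using a DAC one bit wider than the ADC output."""
    dac_bits = adc_bits + 1
    full_scale = 1 << dac_bits
    reference = 0                      # models the Reference shifter
    mask = 1 << (dac_bits - 1)         # models the Mask shifter, starting at the upper bit
    while mask:
        trial = reference | mask       # set the bit under test
        adc_ref = trial / full_scale * vcco   # ideal, settled DAC output (assumption)
        if vin > adc_ref:              # comparator true: the bit remains set
            reference = trial
        mask >>= 1                     # move on to the next lower bit
    return reference >> 1              # ADCout keeps only the high order bits


print(sar_adc(1.0), sar_adc(3.3), sar_adc(0.0))
```

With VCCO = 3.3 V and an 8-bit output, an input of 1.0 V converts to 77, which corresponds to about 77/256 of full scale.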
A multi-channel ADC is also implemented for 2 channels. The only additional
requirement is one comparator for each channel. The multiplexer and demultiplexer
are built inside the FPGA. The basic diagram is shown in fig 4.27.
for fractions also. There are two inputs to the FIR equation: the sampled data and the
coefficients. In this design the data is sampled as specified by the user and
the coefficients are generated by the FDA Tool. The GUI of the filter is shown in fig.
4.29. The basic blocks of the block diagram are described below. The user has to
enter values in the following fields:
Filter Coefficients: If the user uses the FDA Tool, he has to write
xlfda_numerator('FDA Tool') in this field; if he is not using the FDA Tool, he has
to enter the values of the filter coefficients manually.
Coefficient Binary Point: The binary point needed for the coefficients.
Figure 4.28: Block Diagram of FIR filter
Data Binary Point: The binary point needed for the data.
Sampling Frequency (Hz): The sampling frequency, as per the FDA Tool, in
Hertz. It can differ from the value used in the FDA Tool.
The data width must be smaller than the coefficient width, otherwise the design will
give an error.
Memory Control and Address Generator: This block provides the
address and the read or write command to the memory block. When the WN bit of
the memory is 1, the RAM is in write mode, i.e. the data is stored
in the memory location defined by the address port. When the WN bit is 0,
the RAM is in read mode, i.e. the output of the RAM is the data in
the memory location given by the address port. This bit is controlled by
this block. For this I use two separate counters for data and coefficients. The
coefficient counter starts from the length of the coefficients. If the order
of your filter is 43 then the counter will start from 43 and will stop at
Figure 4.29: GUI for FIR filter
match with the binary points of both data and coefficients. The connection
for upsampling and concatenation is shown in fig. 4.31.
RAM: The RAM is used to store the data and the coefficients of the filter.
Here I use dual port memory to access data and coefficients at the same time.
With a single block RAM, two separate clock cycles would be needed to access data
and coefficients, but here both can be accessed in a single clock cycle. The dual
port RAM is separated into two RAMs with separate control lines and outputs.
I use the RAM as a ROM for the coefficients, because once the coefficients are
written they remain unchanged and need not be written again. So the
data line and the selection line are both tied to 0, i.e. block B of the RAM is always
in read mode. I write the coefficients into RAM B initially, so there is no
need to write anything to RAM B afterwards. Only the address changes with
time. The length of the RAM depends on the FIR order, so the length of the
RAM must also change with the order of the FIR. Here it changes as you select
the taps in the FDA Tool. Initially the locations from 1 to the length of the filter
are zero, and the coefficients are stored from the next location up to
2 x (length of filter).
Register: The output of the accumulator is stored in the register. The register
is reset in the same way as the accumulator. After the register come the downsample
and convert blocks. Upsampling was used earlier to make the system faster,
so downsampling is needed to return to the original sample rate. The output
is therefore downsampled and then passed to the outside of the filter.
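The data flow through the address generator, the two RAM halves, the MAC and the output register can be sketched in software. This is a simplified behavioural model under stated assumptions (integer arithmetic, a circular data buffer, one MAC per fast clock cycle), not the System Generator design itself; the class name `SerialMacFir` and its attributes are hypothetical.

```python
class SerialMacFir:
    """Sequential FIR: one data RAM, one coefficient ROM, one MAC unit."""

    def __init__(self, coefficients):
        self.coeff_rom = list(coefficients)          # written once, then read only
        self.data_ram = [0] * len(self.coeff_rom)    # circular buffer of recent samples
        self.write_addr = 0

    def process(self, sample):
        """Store one input sample and return one output sample."""
        n = len(self.coeff_rom)
        self.data_ram[self.write_addr] = sample      # write mode for the new sample
        acc = 0                                      # accumulator reset for each output
        for k in range(n):                           # one MAC per fast clock cycle
            data = self.data_ram[(self.write_addr - k) % n]   # read mode
            acc += self.coeff_rom[k] * data
        self.write_addr = (self.write_addr + 1) % n
        return acc                                   # value latched by the output register


fir = SerialMacFir([1, 2, 3])
impulse_response = [fir.process(x) for x in [1, 0, 0, 0]]
print(impulse_response)
```

Because the MAC loop runs once per tap for every input sample, the internal clock has to run at least as many times faster than the sample rate as there are taps, which is the role of the upsample and downsample blocks.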
This filter design is compared with the analog filter which is developed with the same
FDA Tool and is readily available in the signal processing toolbox of Simulink.
Chapter 5
Results
Slices: 1057
FFs: 407
LUTs: 1949
MULT18X18: 4
Block RAM: 4
The total number of clock cycles needed to complete this 8 point circular convolution
is 302. The same 8 point circular convolution takes 4075 clock cycles on the
TMS320C6713. So, in terms of clock cycles, my design is 13.49 times faster
(4075/302) than the DSP processor. The speed can be increased further by using
multiple MAC units working in parallel.
5.3 DAC
The PWM-like signals from the DAC are shown in fig 4.23a and 4.23b. The signal
generated is not a proper PWM signal: its output frequency and amplitude also
change with the input. The output from the low pass filter is shown in fig. 4.24.
The selection guideline for the resistor and capacitor is also given. The settling
period for this DAC is 4 µs. The settling time for the SPI DAC is 3 µs for a half
step change. The resource usage of both DAC implementations is shown in
table 5.1. The bar chart of the resource usage is shown in fig. 5.1.
The DAC0808 is also interfaced with the Spartan 3E starter kit. The comparison
of inputs to outputs for all three DACs is shown in table 5.2. Note here that
the SPI DAC is a 12 bit DAC while the DAC0808 is an 8 bit DAC. The DAC which I
have implemented is also an 8 bit DAC. The number of bits of this DAC can easily
be changed.
% input   O/P of SPI DAC   O/P of DAC   O/P of DAC0808   Counted O/P
10        0.34             0.33         0.33             0.33032
20        0.65             0.65         0.66             0.6599
30        0.98             0.99         1.00             0.9963
40        1.30             1.31         1.32             1.3198
50        1.65             1.65         1.66             1.65
60        2.0              1.99         2.0              1.9927
70        2.30             2.30         2.32             2.30
80        2.65             2.64         2.66             2.65
90        2.96             2.97         3.02             2.97
100       3.29             3.3          3.3              3.3
(a) Comparison of filter in FPGA vs. Simulink (b) FIR filter output
Slices: 113
FFs: 181
BRAMs: 1
LUTs: 107
18x18 multipliers: 1
Chapter 6
The Magnetostrictive Level Transmitter is a two wire level transmitter under
development by SBEM Pvt Ltd., Pune. They are a leading manufacturer of all kinds
of level and flow transmitters. The probe is developed by them and my part is to
develop the signal conditioning circuit for that probe. The working of the probe is
given below.
Figure 6.1: Probe and float design of Magnetostrictive level transmitter
6.2 Features, Specifications
6.2.1 Features
Accurate level, interface and temperature measurement
No maintenance
6.2.2 Specifications
For Level Measurement
Accuracy : 1mm
Resolution : 1mm
Hysteresis : 0.5mm
Linearity : 1mm
Pressure : Atmospheric
Sensor : Pt 100
Range : 0 to 100 °C
Accuracy : 0.5 °C
Resolution : 0.1 °C
6.3 Challenges and Solutions
Challenges
As described in the Working Principle, detecting the pulse reflected from the float
is essential for detecting the level of the liquid. This waveform is shown
in fig 6.2.
Figure 6.2: Reflected wave from float when float at different positions
The reflected wave is the positive part of a sine wave and its amplitude depends on
the position of the float. When the float is at the lowest level the amplitude is 2
V, and when the float is at the highest level the amplitude is 5 V. The challenges
in designing the signal conditioning are:
In the worst case the distance between two pulses can be 1.5 µs.
It is a two wire transmitter, so the current available is very small. In
the worst case the available current may be 4 mA. But for signal conditioning
only 2.5 to 3 mA is available, because the rest of the current is used for
other purposes.
Solutions
For detecting the peak of the pulse I am going to use a comparator. One input of the
comparator is the signal coming from the probe and the other is the output
of the DAC. The output of the DAC is controlled by a processor. When the comparator
detects the pulse, the DAC input is held steady and lead and lag pulses
are generated, which convert the sine pulse into a square pulse. The peak of the
pulse is in the middle of the lead and lag pulses. In this way the peak of the sine
pulse is detected. This procedure is shown in fig 6.3.
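The midpoint idea described above can be sketched on sampled data. The model below is an illustration only: it assumes a symmetric pulse sampled once per clock cycle, and the names `peak_from_comparator` and `time_resolution_ns` are hypothetical rather than part of the actual design.

```python
def peak_from_comparator(samples, threshold):
    """Estimate the peak position as the midpoint between the lead edge
    (first sample above threshold) and the lag edge (last sample above it)."""
    lead = None
    for i, value in enumerate(samples):
        above = value > threshold          # comparator output
        if above and lead is None:
            lead = i                       # lead pulse: comparator goes high
        elif lead is not None and not above:
            lag = i - 1                    # lag pulse: last cycle the comparator was high
            return (lead + lag) / 2
    return None                            # no complete pulse found


def time_resolution_ns(clock_hz):
    """Duration of one clock cycle in nanoseconds, i.e. the counting resolution."""
    return 1e9 / clock_hz


pulse = [0, 1, 2, 3, 2, 1, 0]              # hypothetical half-sine pulse, one sample per clock
print(peak_from_comparator(pulse, threshold=1.5))
print(time_resolution_ns(40e6))
```

With a 40 MHz clock one count corresponds to 25 ns, which is finer than the 30 ns target mentioned in this chapter.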
For this purpose the settling time of the DAC must be less than 5 µs. The
DAC on the Spartan 3E starter kit, described in the previous sections,
is slower, with a full step settling time of 7 µs, and its current consumption
is 4.7 mA, which is more than the available current. So there is no way to
use an external DAC for this purpose. The solution that I have found is to use
the FPGA-based DAC which I have described in the previous sections. The connections
of this DAC with the comparator for pulse peak detection are shown in fig 6.4.
Figure 6.4: Pulse Detection setup
The biggest challenge is to detect the peak with 30 ns accuracy. The circuit
for everything other than pulse detection is already available with them; it
includes LCD, HART and Profibus protocols, keypad and other functions, and works on
the I2C protocol. They have implemented all of this in a PIC24F series
microcontroller, which is a 16 bit controller. But no PIC is available with the
required speed: for this accuracy the processor speed must be greater than 40 MHz.
So I proposed to use an FPGA. An FPGA normally does not have a non-volatile PROM
inside, so at every power-up it must be configured from an external flash PROM,
which increases the current requirement and also takes nearly 100 ms
to configure the FPGA, which is very high. So I proposed to use the Spartan 3AN
series FPGA, which includes an on-chip non-volatile flash PROM and has a
configuration time of about 100 µs. The FPGA I selected is the XC3S50AN.
However, when the FPGA is continuously ON, the current consumed by the FPGA
alone is much higher than the current available to us. So I have decided to use the
hibernate mode of the FPGA, in which I can save 40% of the current, which will meet my
specifications. After detecting the pulse peak I will count the number of clock
cycles after which the peak is detected, which is directly proportional to the level,
i.e. the position of the float. After detecting all 4 pulses I will transmit these
counts to the PIC and the PIC will process them according to its algorithm. This
design is developed in ISE and System Generator.
6.4 Board Design
The FPGA I used is the XC3S50AN, which contains 50 kilogates. The specifications of
the XC3S50AN are given below.
Logic cells: 1,584
Slices: 704
Distributed RAM: 11 Kbits
Multipliers: 3
DCMs: 2
This device is available in the TQG144 package, which has 144 pins. The circuit
diagram is shown in the fig. The power regulator used is the LP3906 from National
Semiconductor. The LP3906 is a multi-function, programmable Power Management
Unit, optimized for low power FPGAs, microprocessors and DSPs. This device
integrates two highly efficient 1.5A step-down DC/DC converters with dynamic voltage
management (DVM), two 300mA linear regulators and a 400kHz I2C compatible
interface to allow a host controller access to the internal control registers of the
LP3906. National Power Expert for Xilinx is software available from National
which helps to select the voltage regulator for an application depending on its
voltage and current requirements. WEBENCH is an online design tool which
helps to design the whole power supply. This device is available in a 24-pin LLP
package. The application diagram of the regulator is shown in fig. 6.6.
The input for this regulator is 5V DC. The specifications of the voltage regulator
are given below.
Programmable Vout
Figure 6.6: Application diagram of voltage regulator
3 % output accuracy
Automatic softstart
Chapter 7
Conclusion
For the same word length, the range of floating point numbers is larger than that of
fixed point numbers. Floating point operations are more difficult to implement and
take more resources and time than fixed point operations, but a floating point
number system is still advantageous because of its wide range
and better accuracy. I have implemented multiple MAC units
working in parallel, which makes the system faster. Two or more operations can
be done in parallel in this design. A floating point MAC unit is also implemented
and is used to implement the 8 operation circular convolution. This design is faster
than other architectures such as DA, RNS and Delay Addition. It takes 302 clock cycles
to complete this operation, while the same design takes 4075 clock cycles on the
TMS320C6713.
The FIR filter is implemented with the MAC unit and the dual port RAM
for different sampling frequencies and different cutoff frequencies, and it works
well. One block is developed in which the user can enter all the characteristics of
the filter. This design is also checked with a sampling frequency of 44.1 kHz, which
is used in audio processing, mobile phones etc. Sigma Delta type
ADC and DAC are implemented inside the FPGA with a few analog components outside
the FPGA. A 24 bit ADC and a 25 bit DAC are developed and
their performance is compared with the on-board ADC and DAC available on the
Spartan 3E starter kit. The settling time of the DAC is 4 µs and the maximum
sampling period of the ADC is 40 µs. This can be improved further with the help of
the DCM (Digital Clock Manager) available inside the FPGA. The performance of the
whole design, including the floating point MAC, FIR filter, ADC and DAC, in terms
of speed, resource usage and power can also be improved by using a Virtex family
FPGA.
Chapter 8
Future Scope
References
[1] G. Frantz and R. Simar, Comparing fixed- and floating-point dsps, Texas
Instrumentation.
[2] B. Fagin and C. Renard, Field programmable gate arrays and floating point
arithmetic, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRA-
TION (VLSI) SYSTEM, vol. 2, no. 3, pp. 365367, September 1994.
[5] J. Liang and R. Tessier, Floating point unit generation and evaluation for fp-
gas, IEEE Symposium on Field-Programmable Custom Computing Machines,
2003.
[7] G. Spivey and K. Nakajima, A component architecture for fpga-based, dsp sys-
tem design, IEEE International Conference on Application-Specific Systems,
Architectures, and Processors, 2002.
[9] S. S. Khan and S. Yaqub, Distributed arithmatic for the design of high speed
fir filter using the fpga.
[10] Z. Luo and M. Martonosi, Accelerating pipelined integer and floating-point ac-
cumulations in configurable hardware with delayed addition techniques, IEEE
TRANSACTIONS ON COMPUTERS, vol. 49, no. 3, pp. 208218, March 2000.
72
[11] S. Vangal and Y. Hoskote, A 5ghz floating point multiply-accumulator in 90nm
dual vt cmos, IEEE International Solid-State Circuits Conference, vol. 19, no.
19.1, 2003.
[13] C.-J. Chou and J. B. Evans, Fpga implementation of digital filters, ICSPAT,
1993.
[14] R. J. Andraka and R. Company, Fir filter fits in an fpga using a bit serial
approach.
[15] P. Longa and A. Miri, Area-efficient fir filter design on fpgas using distributed
arithmetic, IEEE International Symposium on Signal Processing and Infor-
mation Technology, 2006.
[17] R. Valluvan and S. Kumar, Digital to analog conversion using delta sigma
conversion technique.
[21] A. L. Walters, A scaleable fir filter implementation using 32-bit floating- point
complex arithmetic on a fpga based custom computing platform, Masters the-
sis, The Bradley Department of Electrical Engineering, Blacksburg, Virginia,
January 1998.
[23] Panisset and Drolet, A floating point convolution system, IEEE, 1991.
73
[24] K. Underwood and S. Hemmert, Closing the gap: Cpu and fpga trends
in sustainable floating-point blas performance, IEEE Symposium on Field-
Programmable Custom Computing Machines, vol. 12, 2004.
[25] C. Logics, Distributed arithmatic for the design of high speed fir filter using
fpga.
[26] G. Comoretto, Design of a fir filter using a fpga, Masters thesis, March 2003.
[29] R. J. Landry and V. Calmettes, High speed iir filter for xilinx fpga, IEEE.
[30] X. HUANG and Y. HAN, The design and fpga verification of a general struc-
ture, area-optimized interpolation filter used in sigma-delta dac, IEEE, 2006.
[31] Y. Zafar and M. M. Ahmad, Adaptive on-chip oscillator for fpga based syn-
chronous designs, IEEE International Conference on Emerging Technologies,
pp. 295300, September 2005.
[32] S. Mirzaei and A. Hosangadi, Fpga implementation of high speed fir filters
using add and shift method, IEEE, 2006.
[33] R. Anderson, Getiing the most out of delta-sigma converters, Texas Instru-
ments Intercorporated.
Acknowledgements
I want to thank all those who directly or indirectly made my project a great
learning experience, showing me the values and imparting the skills and hard work
required for the project. To make any work successful, proper guidance and support
are essential along with hard work and sincere effort. I would like
to express my sincere gratitude to those respected individuals who generously
helped me to complete this dissertation; a simple thanks would not
suffice.
I take this opportunity to thank my project guide, Prof. D. N. Sonawane, who
provided all kinds of support at any time and at every stage of the project. His
valuable suggestions and helping hand will always be remembered in my future
career. I would like to thank Prof. S. D. Agashe, Head, Department of
Instrumentation and Control, and Prof. D. N. Sonawane, who allowed me to do this
project. I would also like to thank Mr. Bedarkar, Managing Director of SBEM Pvt.
Ltd., who found me capable of doing the project and helped me to develop the FPGA board.
I would like to thank all the professors and staff members who directly or indirectly
helped me.