
Institutionen för systemteknik

Department of Electrical Engineering


Examensarbete

High Level Model of IEEE 802.15.3c Standard and Implementation of a Suitable FFT on ASIC

Master's thesis in Electronics Systems at the Institute of Technology, Linköping University, by Tanvir Ahmed
LiTH-ISY-EX--11/4462--SE
Linköping 2011

Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden


Supervisors:

Carl Ingemarsson
ISY, Linköpings universitet

Mario Garrido
ISY, Linköpings universitet

Examiner:

Oscar Gustafsson
ISY, Linköpings universitet

Linköping, 15 May, 2011

Division, Department: Electronics Systems, Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden
Date: 2011-05-15
Language: English
Report category: Examensarbete (Master's thesis)
ISRN: LiTH-ISY-EX--11/4462--SE
URL for electronic version: http://www.es.isy.liu.se http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-68697
Title: High Level Model of IEEE 802.15.3c Standard and Implementation of a Suitable FFT on ASIC
Author: Tanvir Ahmed
Keywords: WPAN, FFT, ASIC, Radix-8

Abstract
A high level model of the HSIPHY mode of the IEEE 802.15.3c standard has been constructed in Matlab to optimize the wordlength needed to achieve a specific bit error rate (BER) for a given application, and an FFT has then been implemented for different wordlengths depending on the application. The hardware cost and power are proportional to the wordlength. The main objective of this thesis has been to implement a low power, low area cost FFT for this standard. To that end, the whole system has been modeled in Matlab, and the signal to noise ratio (SNR) and wordlength of the system have been studied to achieve an acceptable BER. An FFT has then been implemented in a 65 nm ASIC technology for wordlengths of 8, 12 and 16 bits. For the implementation, a radix-8 algorithm with eight parallel samples has been adopted. This reduces the area and the power consumption significantly compared to other algorithms and architectures. Moreover, a simple control has been used for this implementation. Voltage scaling has been applied to reduce the power. The EDA synthesis results show that, for a 16-bit wordlength, the FFT has a throughput of 2.64 GS/s, occupies 1.439 mm² of chip area and consumes 61.51 mW of power.

Acknowledgments
I would like to thank Oscar Gustafsson for giving me the opportunity to do my thesis in Electronics Systems. This gave me access to the resources and facilities I needed for the thesis. It also gave me a new way of thinking, and I believe it will help me in my PhD in Japan. I am heartily thankful to my supervisors Carl Ingemarsson and Mario Garrido for guiding me throughout the thesis and for correcting my various documents with attention and care. They also helped me a great deal in solving the technical issues related to the thesis. Their guidance helped me get a grip on VHDL and on different design tools, such as Matlab, Modelsim and Design Compiler. I offer my regards and blessings to all my friends who shared the lab with me, for their inspiration and for exchanging their culture and ideas. It was a great experience to work with people from different countries in a multicultural environment. It also helped me learn about different areas of electronics, as they were working on different topics. Last but not least, I am grateful to my parents for giving me every kind of support from my birth until now. I believe that without their support it would not have been possible for me to continue my Masters in Sweden.


Contents
1 Introduction
2 Standard review of mm-Wave
  2.1 Single carrier mode in mm wave PHY (SCPHY)
    2.1.1 Bandwidth and carrier frequency
    2.1.2 Forward error correction (FEC)
    2.1.3 Modulation
  2.2 High speed interface mode in mm wave PHY (HSIPHY)
    2.2.1 Bandwidth and carrier frequency
    2.2.2 Forward error correction
    2.2.3 Modulation
    2.2.4 OFDM
  2.3 Audio visual mode in mm wave PHY (AVPHY)
    2.3.1 Bandwidth and carrier frequency
    2.3.2 Forward error correction
    2.3.3 Modulation
    2.3.4 OFDM
3 High Level Model of IEEE 802.15.3c (HSIPHY)
  3.1 System overview
  3.2 High level model
    3.2.1 Transmitter and receiver
    3.2.2 Channel
  3.3 Performance evaluation
    3.3.1 SNR vs BER
    3.3.2 WordLength vs BER
4 Background of FFT
  4.1 Theoretical background
  4.2 Architecture of the FFT
    4.2.1 Feedforward architectures
    4.2.2 Single path delay feedback
  4.3 Building blocks of the FFT
    4.3.1 Complex multiplier
    4.3.2 Butterfly
    4.3.3 ROM
    4.3.4 Buffers
5 Implementation of FFT on ASIC
  5.1 Design issue related to the FFT processor
  5.2 Radix-8
  5.3 Proposed architecture
    5.3.1 Radix-8 butterfly
    5.3.2 Shuffler
  5.4 ROMs for the coefficients
  5.5 Controller
  5.6 Methodology
    5.6.1 Hardware implementation in VHDL
    5.6.2 Functionality testing
    5.6.3 Synthesizing and area calculation
    5.6.4 Power calculation
  5.7 Design for Low Power
  5.8 Comparison to previous approaches
6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future work
Bibliography

List of Figures
2.1 Constellation diagram of π/2 BPSK.
2.2 Constellation diagram of π/2 QPSK.
2.3 Constellation diagram of π/2 8-PSK.
2.4 Constellation diagram of π/2 16-QAM.
2.5 Constellation diagram of DAMI.
2.6 Constellation diagram of OOK.
2.7 FEC data multiplexer.
2.8 Constellation diagram of QPSK modulation.
2.9 Constellation diagram of 16 QAM modulation.
2.10 Constellation diagram of 64 QAM modulation.
2.11 Convolutional encoder.
3.1 IEEE 802.15.3c system.
3.2 BER as a function of SNR.
3.3 BER as a function of wordlength at SNR 35 dB.
4.1 SFG of radix-2.
4.2 SFG of radix-4.
4.3 SFG of radix-16 decimation in frequency.
4.4 SFG of radix-16 decimation in time.
4.5 Radix-2 feedforward architecture.
4.6 Radix-4 feedforward architecture.
4.7 Radix-2 feedback architecture.
4.8 Radix-4 feedback architecture.
4.9 Complex multiplier.
4.10 Radix-2 butterfly.
4.11 ROM for coefficients.
4.12 Memory with pointer.
4.13 Shift registers.
5.1 SFG of radix-8 decimation in time.
5.2 SFG of radix-8 decimation in frequency.
5.3 Data path of the FFT.
5.4 Data path of the FFT.
5.5 Implementation of radix-8 butterfly.
5.6 Shuffling circuit.
5.7 Block diagram of shuffler 1.
5.8 Block diagram of shuffler 2.
5.9 Block diagram of shuffler 3.
5.10 Block diagram of shuffler 4.
5.11 Datapath controller.
5.12 ROM controller.
5.13 Entity of complex multiplier.
5.14 Entity of a radix-2 butterfly.
5.15 Entity of shuffling circuit.
5.16 Area and power consumption of the FFT before and after frequency scaling.
5.17 Power consumption before and after voltage scaling.
5.18 Power and area for different length buffer.
5.19 Power and area of complex multiplier.
5.20 Power and area of radix-8 butterfly.
5.21 Power and area of FFT.

List of Tables
2.1 Bandwidth and center frequency for different channels
2.2 Modulation dependent normalization factor
2.3 Subcarrier frequency allocation
2.4 Timing-related parameters for HSIPHY
2.5 Low data rate channelization
2.6 High data rate OFDM parameter
2.7 Low data rate OFDM parameter
3.1 MCS 6 specifications
3.2 Argument for modem.qammod
3.3 Argument for modem.qamdemod
4.1 Comparison of pipelined architecture for the N point FFT
5.1 Constraint of the ASIC
5.2 Design constraint of the FFT
5.3 Selection signal information
5.4 Memory and shift register performance for different wordlength
5.5 Area and power for different components
5.6 FFT performance for different wordlength
5.7 Comparison of architectures for the computation of a 512-point 8-parallel FFT
5.8 Comparison of various FFT for WPAN application

Chapter 1

Introduction
Applications in communication systems, and the data rates they demand, keep advancing with time. Different task groups have developed different standards, and some of them have been adopted by the IEEE. IEEE 802.15.3c is one of them. Other standards in the IEEE 802.15 family include Bluetooth and Zigbee. These standards can support data rates up to 100 Mb/s for short range (1 m - 10 m) communication. However, they are not suitable for applications such as live HD video streaming at a bit rate of 3 Gbps, replacing the HDMI (2.2 Gbps) connection with wireless connectivity, or large file transfer at very high speed. In 2005, IEEE 802.15 Alternative Task Group 3c developed a standard with the aim of providing wireless communication in a personal area at a data rate high enough to support those applications [1]. This standard uses the 60 GHz band as carrier frequency. Research shows that the band near 60 GHz has high attenuation in air compared to the 5 GHz band. As a result, this band is more suitable for indoor than for outdoor applications. Moreover, the high attenuation limits the problem of channel interference. Later, in 2009, the standard was adopted by the IEEE.

The title of the thesis work is "High Level Model of IEEE 802.15.3c Standard and Implementation of a Suitable FFT on ASIC". There are two components to this title. The first one is the high level model of the IEEE 802.15.3c standard. This includes the exploration of different aspects of the standard, such as a review of the standard and a high level model of one specific mode of the standard. The high level model has been used to optimize different parameters of the physical layer (such as SNR and finite wordlength). The second component is the implementation of a suitable FFT on ASIC. The HSIPHY mode of this standard adopts orthogonal frequency division multiplexing (OFDM) to overcome the multipath fading effect of the wireless channel, and the FFT is the key component of OFDM.

To implement the FFT on ASIC, a 65 nm technology standard cell library has been used. The main focus of the implementation was to reduce the power as well as the area. This document is organized in the following chapters:

Chapter 1: Introduction
Chapter 2: Standard Review of mm-Wave - A review of the IEEE 802.15.3c standard and its different modes of operation.
Chapter 3: High Level Model of IEEE 802.15.3c (HSIPHY) - Modeling of the physical layer for the High Speed Interface (HSIPHY) mode and the effect of finite wordlength and SNR on the bit error rate.
Chapter 4: Background of the FFT - Discussion of the algorithm of the discrete Fourier transform (DFT), different architectures of the FFT and the basic building blocks.
Chapter 5: Implementation of the FFT on ASIC - Details of radix-8 and design issues, hardware implementation and results of the FFT.
Chapter 6: Conclusion and Future Work - Conclusions drawn on the basis of the results and some directions for future research.

The whole design is based on Matlab and VHDL. The Communication Toolbox of Matlab has been used for the high level model of the standard, and VHDL has been used as the hardware description language for the implementation of the FFT. Modelsim has been used for functionality testing and Design Compiler for compiling the design for a specific technology library. Finally, the performance measurement (calculation of the power and area for a specific clock frequency) has been done by means of Design Compiler and Nanosim.

Chapter 2

Standard review of mm-Wave


This chapter reviews the IEEE 802.15.3c standard. The standard is mostly used for high data rate transmission at Gb/s rates, for applications such as video on demand, HDTV and home theater. It uses 60 GHz as the carrier frequency [1]. This band has high attenuation in free space: research shows that the 60 GHz band has an attenuation of 15 dB per kilometer. This makes the band a promising candidate for indoor rather than outdoor applications. It is noted in [1] that the standard can operate in three different modes:

Single Carrier mode in mmWave PHY (SCPHY)
High Speed Interface mode in mmWave PHY (HSIPHY)
Audio/Visual mode in mmWave PHY (AVPHY)

2.1

Single carrier mode in mm wave PHY (SCPHY)

This mode provides three different classes of modulation and coding schemes targeting different wireless connectivity applications. Class 1 is specified for low rate, low cost mobile operation and supports data rates up to 1.5 Gb/s. Class 2 is specified to achieve data rates up to 3 Gb/s, and class 3 is specified for high speed, high performance applications with data rates over 5 Gb/s [1].

2.1.1

Bandwidth and carrier frequency

This mode operates on four different carrier frequencies in the range from 57.24 GHz to 65.88 GHz [1]. The bandwidth is the same for all four channels. The channels are defined in Table 2.1.


Table 2.1: Bandwidth and center frequency for different channels

Channel ID   Start frequency (GHz)   Center frequency (GHz)   Stop frequency (GHz)
1            57.24                   58.32                    59.40
2            59.40                   60.48                    61.56
3            61.56                   62.64                    63.72
4            63.72                   64.80                    65.88
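The values in Table 2.1 follow a regular pattern: every channel is 2.16 GHz wide and the first channel starts at 57.24 GHz. The following minimal Python sketch regenerates the table from those two numbers. The function name and the constants are illustrative choices, and the 2.16 GHz width is inferred from the table rather than quoted from [1].

```python
# Regenerate the channel plan of Table 2.1.
# Assumption: the 2.16 GHz channel width is inferred from the table values.
FIRST_START_GHZ = 57.24
CHANNEL_WIDTH_GHZ = 2.16


def channel_edges(channel_id):
    """Return (start, center, stop) in GHz for channel IDs 1..4."""
    if channel_id not in (1, 2, 3, 4):
        raise ValueError("channel ID must be 1, 2, 3 or 4")
    start = FIRST_START_GHZ + (channel_id - 1) * CHANNEL_WIDTH_GHZ
    stop = start + CHANNEL_WIDTH_GHZ
    center = (start + stop) / 2
    # Round to two decimals to suppress floating-point noise.
    return tuple(round(f, 2) for f in (start, center, stop))


for cid in range(1, 5):
    print(cid, *channel_edges(cid))
```

Calling `channel_edges(2)` and `channel_edges(3)` gives the two channels that the HSIPHY mode uses.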

2.1.2

Forward error correction (FEC)

This mode of operation supports Reed-Solomon (RS) block codes and low density parity check (LDPC) block codes as forward error correction schemes. The RS block code is mandatory and the LDPC block codes are optional. The different coding schemes are described below.

RS(255,239)

The RS(255,239) code uses the generator polynomial in Equation 2.1 [1]. The encoder takes 239 input symbols, generates 16 parity symbols and sends them along with the 239 input symbols, so the total number of output symbols is 255.

\[
g(x) = \prod_{k=1}^{16} \left( x + \alpha^{k} \right) \tag{2.1}
\]

Here, α is a root of the primitive polynomial p(x) = 1 + x^2 + x^3 + x^4 + x^8 and x is the input data.

LDPC(672,588)

The LDPC code is systematic, i.e., it encodes an information block i of size k into a codeword c of size n by adding n − k parity bits. Each parity-check matrix is partitioned into square sub-blocks of size z × z, each based on the z × z identity matrix. The cyclic permutation matrix p^i is obtained by cyclically shifting the identity matrix i times.

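\[
p^{0} = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0 \\
0 & 0 & \cdots & 0 & 1
\end{pmatrix},
\quad
p^{1} = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
1 & 0 & 0 & \cdots & 0
\end{pmatrix},
\quad
p^{2} = \begin{pmatrix}
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0
\end{pmatrix}
\]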

LDPC(672,588) has 588 input bits and 672 output bits with a code rate of 7/8. Here, the number of parity bits is 84. The table is described in [1].
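The cyclic permutation matrices can be generated programmatically. The sketch below is an illustration only: it assumes that the shift moves the ones of the identity matrix one column to the right per step, which matches a first row of the form 0 1 0 ... for p^1, and the helper names are hypothetical. It also shows that multiplying two such matrices adds their shift counts.

```python
def cyclic_permutation(z, i):
    """Return p^i: the z x z identity matrix cyclically shifted i columns to the right."""
    return [[1 if col == (row + i) % z else 0 for col in range(z)]
            for row in range(z)]


def matmul(a, b):
    """Plain integer matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


z = 5  # small illustrative block size, not the block size used by the standard
p1 = cyclic_permutation(z, 1)
p2 = cyclic_permutation(z, 2)

for row in p1:
    print(row)

# Shifting once and then once more equals shifting twice.
print(matmul(p1, p1) == p2)
```

Because the shift is cyclic, `cyclic_permutation(z, z)` returns the identity matrix again.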

LDPC(672,504)

LDPC(672,504) has 504 input bits and 672 output bits, giving a code rate of 3/4. The number of parity bits is 168. It uses the same identity and permutation matrices as discussed in Section 2.1.2. The table is described in [1].

LDPC(672,336)

LDPC(672,336) is used for highly reliable applications, with a code rate of 1/2. It takes 336 bits as input and generates 672 bits. It uses the identity matrix of Section 2.1.2 and the table is described in [1].
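The parity bit counts and code rates quoted for the three LDPC codes follow directly from the (n, k) pair of each code. As a quick check, the following sketch (the function name is an illustrative choice) computes both quantities with exact fractions:

```python
from fractions import Fraction


def ldpc_parameters(n, k):
    """Return the number of parity bits and the code rate k/n of an (n, k) block code."""
    return {"parity_bits": n - k, "code_rate": Fraction(k, n)}


for n, k in [(672, 588), (672, 504), (672, 336)]:
    params = ldpc_parameters(n, k)
    print(f"LDPC({n},{k}): {params['parity_bits']} parity bits, "
          f"rate {params['code_rate']}")
```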

2.1.3

Modulation

This mode supports six different modulation schemes, chosen according to the data rate and the performance requirements of the application. Four of them are mandatory and the other two are optional. The optional schemes are used for low data rate applications.

π/2 BPSK

π/2 BPSK is binary phase modulation with a π/2 counterclockwise phase shift. Figure 2.1 shows the constellation mapping of the π/2 BPSK signal. Here, zl is the input bit. The input bit is mapped to 1 on the constellation diagram when the input is 1; otherwise it is mapped to j. With this modulation, one symbol is generated for every bit.

Figure 2.1: Constellation diagram of π/2 BPSK.

π/2 QPSK

π/2 QPSK encodes 2 bits per symbol, with a counterclockwise rotation of π/2. This modulation technique uses four equally spaced phases on a circle. Figure 2.2 is the constellation mapping diagram for π/2 QPSK. This modulation scheme uses Gray encoding [1].

Figure 2.2: Constellation diagram of π/2 QPSK.

π/2 8-PSK

The constellation diagram of π/2 8-PSK is depicted in Figure 2.3. In this technique, three bits, denoted d1 d2 d3, are mapped to one symbol of the constellation. As in the previous cases, a π/2 rotation is applied. Eight different symbols are used to represent the incoming bits. The bits are Gray encoded here as well.

Figure 2.3: Constellation diagram of π/2 8-PSK.

π/2 16-QAM

The π/2 16-QAM constellation diagram is depicted in Figure 2.4. Here four bits, b1 b2 b3 b4, are mapped to one symbol. Sixteen different symbols with different radii are used to represent the incoming bits.

Figure 2.4: Constellation diagram of π/2 16-QAM.

Dual Alternate Mark Inversion

Dual Alternate Mark Inversion (DAMI) coding is optional and is used for low data rate, low cost applications. The constellation diagram is shown in Figure 2.5. It takes two bits as input and generates one symbol.

On-Off Keying

On-Off Keying (OOK) is also optional and, like DAMI, is used for low data rate, low cost applications. Figure 2.6 shows the constellation diagram. It generates one symbol for every input bit.

Figure 2.5: Constellation diagram of DAMI.

Figure 2.6: Constellation diagram of OOK.

2.2

High speed interface mode in mm wave PHY (HSIPHY)

The HSI PHY is designed for low latency, high speed data and uses orthogonal frequency division multiplexing (OFDM). This mode supports different modulation and coding schemes using different frequency domain spreading factors, modulations and LDPC block codes.

2.2.1

Bandwidth and carrier frequency

This mode uses Channel IDs 2 and 3 of Table 2.1 as a carrier frequency [1]. The band starts from 59.40 GHz and ends at 63.72 GHz. The center frequencies are 60.48GHz and 62.64GHz respectively for Channel IDs 2 and 3.


2.2.2

Forward error correction

This mode uses both equal error protection (EEP) and unequal error protection (UEP), depending on the data rate and performance. The data multiplexer is shown in Figure 2.7. For EEP, both LDPC blocks are the same; for UEP, the two LDPC blocks are different. Four different LDPC codes with different code rates are used in this mode. Three of them are the same as for SCPHY and the fourth is LDPC(672,420), which is discussed below.

Figure 2.7: FEC data multiplexer. (Blocks shown: octet demux 1:2 with MSB and LSB branches, two LDPC encoders, MUX.)

LDPC(672,420)

LDPC(672,420) is used for high reliability applications, with a code rate of 5/8. It takes 420 bits as input and generates 672 bits, of which 252 are parity bits.

2.2.3

Modulation

This mode uses three different modulation techniques, depending on the data rate and the performance. The modulation dependent normalization factor is given in Table 2.2. It is also stated in [1] that the value of d is 1 for the normal constellation and 1.25 for the skewed constellation.

Table 2.2: Modulation dependent normalization factor

Modulation   K_mod
QPSK         1/√(1 + d²)
16-QAM       1/√(5 (1 + d²))
64-QAM       1/√(21 (1 + d²))
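The role of the normalization factor can be checked numerically: it scales each constellation so that the average symbol power is one, for the normal as well as the skewed case. The sketch below is an illustration under stated assumptions. It reads the entries of Table 2.2 as 1/√(c(1 + d²)) with c = 1, 5 and 21, takes the usual odd-integer amplitude levels per axis, and scales one axis by d; which axis carries the factor d does not affect the result. All names are hypothetical.

```python
import math
from itertools import product

# Amplitude levels per axis (assumed square constellations with odd-integer levels).
LEVELS = {"QPSK": [1], "16-QAM": [1, 3], "64-QAM": [1, 3, 5, 7]}
# Constants read from Table 2.2.
TABLE_CONSTANT = {"QPSK": 1, "16-QAM": 5, "64-QAM": 21}


def k_mod(modulation, d=1.0):
    """Normalization factor 1/sqrt(c * (1 + d^2))."""
    return 1.0 / math.sqrt(TABLE_CONSTANT[modulation] * (1.0 + d * d))


def constellation(modulation, d=1.0):
    """All constellation points, with one axis scaled by d (axis choice is an assumption)."""
    axis = [sign * level for level in LEVELS[modulation] for sign in (-1, 1)]
    return [complex(d * i, q) for i, q in product(axis, axis)]


def average_power(modulation, d=1.0):
    """Average power of the constellation after scaling by K_mod."""
    k = k_mod(modulation, d)
    points = constellation(modulation, d)
    return sum(abs(k * p) ** 2 for p in points) / len(points)


for name in LEVELS:
    for d in (1.0, 1.25):
        print(name, d, round(average_power(name, d), 12))
```

For d = 1 the factors reduce to the familiar 1/√2, 1/√10 and 1/√42.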

QPSK

The constellation diagram of QPSK is depicted in Figure 2.8. SCPHY also uses QPSK, but in this mode it is used without the π/2 rotation. Two bits, b1 b2, are taken as input and mapped to one symbol. There are four symbols in the constellation diagram.

Figure 2.8: Constellation diagram of QPSK modulation.

16 QAM

16 QAM takes four bits, d1 d2 d3 d4, as input and generates one symbol. The constellation diagram is shown in Figure 2.9. There are 16 different symbols with different values and radii on the constellation diagram. It can provide a higher data rate than QPSK.

64 QAM

The constellation diagram of 64 QAM is shown in Figure 2.10. Six input bits, b1 b2 b3 b4 b5 b6, are mapped to one symbol. The constellation diagram contains 64 different symbols with different radii and angles.

2.2.4

OFDM

This mode supports OFDM. There are 3 DC sub-carriers, 16 pilot sub-carriers, 16 guard sub-carriers and 336 data sub-carriers [1]. The sub-carriers and their logical indexes are described in Table 2.3. The total number of sub-carriers, including the null sub-carriers, is 512, and the throughput for this mode is 2.64 GS/s. The timing related parameters for the FFT are given in Table 2.4.

Figure 2.9: Constellation diagram of 16 QAM modulation.

Table 2.3: Subcarrier frequency allocation

Subcarrier type     Number of subcarriers   Logical subcarrier indexes
Null subcarriers    141                     [-256 : -186], [186 : 255]
DC subcarriers      3                       -1, 0, 1
Pilot subcarriers   16                      [-166 : 22 : -12], [12 : 22 : 166]
Guard subcarriers   16                      [-185 : -178], [178 : 185]
Data subcarriers    336                     All others
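The allocation in Table 2.3 can be written out as index sets to confirm that the five subcarrier types cover all 512 FFT bins without overlap. The sketch below assumes that the lower half of each range carries negative indexes (the minus signs are not legible in every copy of the table) and that a range written as [a : s : b] means "from a to b in steps of s". The function name is hypothetical.

```python
def hsi_subcarrier_map():
    """Return a dict mapping each subcarrier type to its set of logical indexes."""
    null = set(range(-256, -185)) | set(range(186, 256))
    dc = {-1, 0, 1}
    pilot = set(range(-166, -11, 22)) | set(range(12, 167, 22))
    guard = set(range(-185, -177)) | set(range(178, 186))
    # Data subcarriers are all remaining indexes of the 512-point FFT.
    data = set(range(-256, 256)) - null - dc - pilot - guard
    return {"null": null, "dc": dc, "pilot": pilot, "guard": guard, "data": data}


allocation = hsi_subcarrier_map()
for kind, indexes in allocation.items():
    print(f"{kind:5s} {len(indexes):3d}")
print("total", sum(len(indexes) for indexes in allocation.values()))
```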

2.3

Audio visual mode in mm wave PHY (AVPHY)

This mode of the standard is mainly intended for multimedia applications, such as live HD video streaming and the replacement of wired HDMI connections with wireless connectivity. It operates at two data rates, a low data rate and a high data rate. The modulation and coding schemes vary with the data rate.

2.3.1

Bandwidth and carrier frequency

This mode supports two different data rates, a high data rate and a low data rate, and different channels are used for each. The high data rate uses Channel ID 2 of Table 2.1, whereas the low data rate supports three different channels, which are described in Table 2.5. Here fc(HRP) is the center frequency of the current high data rate channel.

Figure 2.10: Constellation diagram of 64 QAM modulation.

2.3.2

Forward error correction

This mode of the standard uses convolutional encoding. The convolutional encoder for this mode is depicted in Figure 2.11. It encodes with a code rate of 1/3 and uses six delay elements. The generator polynomials, in octal notation, are g0 = 133, g1 = 171 and g2 = 165. The initial value of each memory element is set to 0.
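A rate 1/3 convolutional encoder with these generator polynomials can be sketched in a few lines. The code below is a minimal illustration, not the encoder of [1]: it assumes that the most significant bit of each octal generator taps the current input bit, that the three outputs are emitted in the order g0, g1, g2 for every input bit, and that no tail bits or puncturing are applied. The names are hypothetical.

```python
# Generator polynomials in octal; each has 7 taps (current input + 6 delay elements).
GENERATORS = (0o133, 0o171, 0o165)


def conv_encode(bits):
    """Encode a list of 0/1 bits at rate 1/3; returns three output bits per input bit."""
    state = 0  # six delay elements, all initialised to 0
    out = []
    for b in bits:
        # Bit 6 of the register is the current input, bit 0 the oldest stored bit.
        register = (b << 6) | state
        for g in GENERATORS:
            # Each output is the modulo-2 sum of the tapped register bits.
            out.append(bin(register & g).count("1") % 2)
        # Shift: the current input becomes the newest stored bit.
        state = register >> 1
    return out


# The impulse response reproduces the tap pattern of each generator.
impulse = conv_encode([1, 0, 0, 0, 0, 0, 0])
print(impulse[0::3])  # taps of g0
print(impulse[1::3])  # taps of g1
print(impulse[2::3])  # taps of g2
```

Feeding a single 1 followed by six 0s is a convenient sanity check, because each output stream then equals the binary expansion of its generator.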

2.3.3

Modulation

This mode uses the same QPSK and 16 QAM modulation schemes as shown in Figures 2.8 and 2.9, respectively. It also uses Gray coded input bits.

2.3.4

OFDM

This mode uses two different sets of OFDM parameters, one for the high data rate and one for the low data rate. These are described in Tables 2.6 and 2.7, respectively.


Table 2.4: Timing-related parameters for HSIPHY

Parameter   Description                                 Value
fs          Reference sampling rate                     2640 MHz
TC          Sample duration                             0.38 ns
Nsc         Number of subcarriers                       512
Ndsc        Number of data subcarriers                  336
NP          Number of pilot subcarriers                 16
NG          Number of guard subcarriers                 141
NDC         Number of DC subcarriers                    3
NR          Number of reserved subcarriers              16
NU          Number of used subcarriers                  352
NGI         Guard interval length in samples            64
fsc         Subcarrier frequency spacing                5.15625 MHz
BW          Nominal used bandwidth                      1815 MHz
TFFT        IFFT and FFT period                         193.94 ns
TGI         Guard interval duration                     24.24 ns
TS          OFDM symbol duration                        218.18 ns
FS          OFDM symbol rate                            4.583 MHz
NCPS        Number of samples per OFDM symbol           576
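Most of the timing values in Table 2.4 are derived quantities: they follow from the sampling rate, the FFT size, the guard interval length and the number of used subcarriers. The sketch below recomputes them. The constant and key names are illustrative, and the symbol duration is taken as the sum of the FFT period and the guard interval duration.

```python
FS_HZ = 2640e6   # reference sampling rate
N_SC = 512       # number of subcarriers (FFT size)
N_GI = 64        # guard interval length in samples
N_USED = 352     # used subcarriers (336 data + 16 pilot)


def hsi_timing():
    """Derive the HSIPHY timing parameters from the four basic constants."""
    spacing_hz = FS_HZ / N_SC
    t_fft_s = N_SC / FS_HZ
    t_gi_s = N_GI / FS_HZ
    t_symbol_s = t_fft_s + t_gi_s
    return {
        "sample_duration_ns": 1e9 / FS_HZ,
        "subcarrier_spacing_mhz": spacing_hz / 1e6,
        "used_bandwidth_mhz": N_USED * spacing_hz / 1e6,
        "t_fft_ns": t_fft_s * 1e9,
        "t_gi_ns": t_gi_s * 1e9,
        "t_symbol_ns": t_symbol_s * 1e9,
        "symbol_rate_mhz": 1e-6 / t_symbol_s,
        "samples_per_symbol": N_SC + N_GI,
    }


for name, value in hsi_timing().items():
    print(f"{name:24s} {value:.5g}")
```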

Table 2.5: Low data rate channelization

Channel index   Start frequency           Center frequency          Stop frequency
1               fc(HRP) - 207.625 MHz     fc(HRP) - 158.625 MHz     fc(HRP) - 109.625 MHz
2               fc(HRP) - 49 MHz          fc(HRP)                   fc(HRP) + 49 MHz
3               fc(HRP) + 109.625 MHz     fc(HRP) + 158.625 MHz     fc(HRP) + 207.625 MHz

Table 2.6: High data rate OFDM parameter

Parameter                                   Value
Occupied bandwidth                          1.76 GHz
Reference sampling rate                     2.538 GHz
Number of subcarriers                       512
FFT period, Nsc(HR)/fs(HR)                  202 ns
Subcarrier spacing, 1/TFFT(HR)              4.96 MHz
Guard interval, 64/fs(HR)                   25.2 ns
Symbol duration, TFFT(HR) + TGI(HR)         227 ns
Number of data subcarriers                  336

Figure 2.11: Convolutional encoder. (Input, with X, Y and Z coded data outputs.)

Table 2.7: Low data rate OFDM parameter

Parameter                                   Value
Occupied bandwidth                          92 MHz
Reference sampling rate                     317.25 MHz
Number of subcarriers                       128
FFT period, Nsc(LR)/fs(LR)                  403 ns
Subcarrier spacing, 1/TFFT(LR)              2.48 MHz
Guard interval, 28/fs(LR)                   88.3 ns
Symbol duration, TFFT(LR) + TGI(LR)         492 ns
Number of data subcarriers                  30

Chapter 3

High Level Model of IEEE 802.15.3c (HSIPHY)


This chapter focuses on the overview of the system and on its high level model in Matlab. Chapter 2 discussed the different modes of IEEE 802.15.3c. Among the three modes, HSIPHY has been selected. This mode has 11 different modulation and coding scheme (MCS) identifiers, and MCS 6 has been selected for the high level model. The specifications of MCS 6 are given in Table 3.1.

Table 3.1: MCS 6 specifications

Parameter                   Value
Data rate                   5390 Mb/s
Modulation scheme           16-QAM
Spreading factor            1
Forward error correction    LDPC(672,588)
Coding mode                 EEP

3.1 System overview

The system is depicted in Figure 3.1. It can be divided into two main sections: the transmitter and the receiver. The transmitter gets its data from the MAC or protocol layer, and the receiver sends the recovered data back to the protocol layer. In the transmitter, the data received from the protocol layer are encoded by the LDPC encoder, which adds extra bits to protect the signal against noise on the channel. The coded bits are mapped to discrete samples by the modulator. The OFDM block converts those samples from the discrete frequency domain to a discrete-time signal. The digital-to-analog converter (DAC) then converts the discrete-time signal to a continuous-time signal, which is processed in the RF section. Before transmission by the antenna, the RF section up-converts the baseband signal and amplifies it. At the other end, the RF section of the receiver receives the signal, applies proper filtering and down-converts it.

Figure 3.1: IEEE 802.15.3c system (transmitter and receiver, each with MAC/protocol, a baseband processor containing the LDPC and OFDM blocks, and the transceiver).

The transmitted signal propagates to the receiver through the wireless channel, which introduces noise. The receiver picks up the noisy, continuous-time signal with its antenna. The signal is processed in the RF blocks and passed to the analog-to-digital converter (ADC), which converts it to a discrete-time signal ready for the baseband processing section. The OFDM block, which is essentially an implementation of the FFT, converts the discrete-time signal to the frequency domain. The demodulator block converts the frequency-domain samples into bits. Finally, the LDPC block decodes the encoded bits with the help of the parity bits, and the retrieved bits are sent to the MAC or protocol layer.

3.2 High level model

The high level model has been constructed for the specification in Table 3.1. The modelling setup consists of Matlab and the Communications Toolbox, which provides most of the blocks of the system. The blocks that are not available have been modelled directly in Matlab. The model consists of three main blocks: transmitter, receiver and channel. The transmitter and the receiver consist of forward error correction (FEC) with LDPC(672,588), a 16-QAM modulator and OFDM as subcomponents.

3.2.1 Transmitter and receiver

Forward error correction (FEC)

Forward error correction has been used in both the transmitter and the receiver. The LDPC object of the Communications Toolbox has been used for this purpose. LDPC(672,588) follows the standard [1]. The table and the permuted identity matrices have been generated in Matlab. The table consists of zero matrices and permuted identity matrices.

Modulation and demodulation

Modulation converts bits into samples, and demodulation converts samples back into bits. Modulation is performed in the transmitter and demodulation in the receiver. 16-QAM modulation and demodulation have been used in this model. The Communications Toolbox provides modem.qammod, modem.qamdemod, modulate and demodulate for this purpose. The arguments of modem.qammod and modem.qamdemod are described in Table 3.2 and Table 3.3. The created objects are then passed to the modulate and demodulate functions to perform the modulation and demodulation.

Table 3.2: Arguments for modem.qammod

Argument       Description                    Value
M              Modulation index               16
PhaseOffset    Offset phase of the mapping    π/2
SymbolOrder    Symbol order of the input      gray
InputType      Type of input                  bit

Table 3.3: Arguments for modem.qamdemod

Argument        Description                    Value
M               Modulation index               16
PhaseOffset     Offset phase of the mapping    π/2
SymbolOrder     Symbol order of the input      gray
InputType       Type of input                  bit
DecisionType    Type of decision               LLR
NoiseVariance   Noise variance of the system   1.2
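To illustrate what a Gray-ordered 16-QAM modulator and demodulator pair does, the following Python sketch maps groups of four bits to constellation points and back. It is only a sketch under stated assumptions: the bit-to-level table is one possible Gray mapping and not necessarily the one produced by modem.qammod, the phase offset of Table 3.2 is not modelled, the constellation is not normalized, and the demodulator makes hard decisions instead of the LLR soft decisions used in the model.

```python
# Hypothetical Gray mapping of two bits to one of four amplitude levels;
# neighbouring levels differ in exactly one bit.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
PAM4_GRAY = {level: bits for bits, level in GRAY_PAM4.items()}


def qam16_modulate(bits):
    """Map a bit list (length divisible by 4) to complex 16-QAM symbols."""
    if len(bits) % 4 != 0:
        raise ValueError("number of bits must be a multiple of 4")
    symbols = []
    for i in range(0, len(bits), 4):
        in_phase = GRAY_PAM4[(bits[i], bits[i + 1])]
        quadrature = GRAY_PAM4[(bits[i + 2], bits[i + 3])]
        symbols.append(complex(in_phase, quadrature))
    return symbols


def _nearest_level(value):
    """Hard decision: pick the closest of the four amplitude levels."""
    return min((-3, -1, 1, 3), key=lambda level: abs(value - level))


def qam16_demodulate(symbols):
    """Recover the bit list from (possibly noisy) 16-QAM symbols."""
    bits = []
    for symbol in symbols:
        bits.extend(PAM4_GRAY[_nearest_level(symbol.real)])
        bits.extend(PAM4_GRAY[_nearest_level(symbol.imag)])
    return bits
```

Because of the Gray ordering, a decision error between neighbouring levels flips only one of the four bits, which is why this symbol order is commonly chosen.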

Orthogonal frequency division multiplexing (OFDM)

The OFDM block has been modelled using an IFFT in the transmitter and an FFT in the receiver. In the transmitter, 141 null subcarriers, 3 DC subcarriers, 16 pilot subcarriers and 16 guard subcarriers are added to the 336 data subcarriers before the IFFT, giving 512 subcarriers in total. In the receiver, the data subcarriers are extracted from the 512 subcarriers.

3.2.2 Channel

The processed signal is transmitted through the channel. The channel is wireless and exhibits multipath fading. It can be characterized in two ways: large scale characterization and small scale characterization [2]. Large scale characterization has been applied here, as in Equation 3.1. The path loss $PL(d)$ is defined by the average path loss $\overline{PL}(d)$ and the shadow fading $X_\sigma$:

$$PL(d)\,[\mathrm{dB}] = \overline{PL}(d)\,[\mathrm{dB}] + X_\sigma\,[\mathrm{dB}] \tag{3.1}$$

The average path loss $\overline{PL}(d)$ can be expressed as in Equation 3.2, where $d_0$ and $n$ denote the reference distance and the path loss exponent. The path loss exponent $n$ varies with the environment; this model has been built for a room environment. $X_q$ is the additional attenuation caused by obstruction by specific objects.

$$\overline{PL}(d)\,[\mathrm{dB}] = PL(d_0)\,[\mathrm{dB}] + 10\,n\,\log_{10}\!\left(\frac{d}{d_0}\right) + \sum_{q=1}^{Q} X_q, \quad \text{for } d \ge d_0 \tag{3.2}$$
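Equations 3.1 and 3.2 can be evaluated directly. The Python sketch below does so; the function names are illustrative, and any numerical values passed to it (reference loss, exponent, obstruction losses) are assumptions of the caller, since the parameter values used in the Matlab model are not listed here.

```python
import math


def mean_path_loss_db(d, d0, pl_d0_db, n, obstruction_db=()):
    """Average path loss of Equation 3.2 in dB, valid for d >= d0."""
    if d < d0:
        raise ValueError("the model is only defined for d >= d0")
    return pl_d0_db + 10.0 * n * math.log10(d / d0) + sum(obstruction_db)


def path_loss_db(d, d0, pl_d0_db, n, shadowing_db=0.0, obstruction_db=()):
    """Path loss of Equation 3.1: average path loss plus shadow fading term."""
    return mean_path_loss_db(d, d0, pl_d0_db, n, obstruction_db) + shadowing_db
```

As an example with hypothetical values, a reference loss of 68 dB at 1 m and an exponent of 2 give an average path loss of 88 dB at 10 m, because the distance term contributes 10 x 2 x log10(10) = 20 dB.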

3.3 Performance evaluation

Two different performance measures have been observed in this model: BER as a function of SNR, and BER as a function of the wordlength in the FFT. These are described in the following subsections.

3.3.1 SNR vs BER

The BER improves as the SNR of the system increases. Figure 3.2 shows the results for different wordlengths: the blue line is for a wordlength of 8 bits, the red line for 12 bits and the black line for 16 bits. For a specific wordlength, the SNR needed to achieve a given BER can therefore be read from the graph.

3.3.2 Wordlength vs BER

BER as a function of wordlength is shown in Figure 3.3 for a system SNR of 35 dB. The wordlength needed to achieve a specific BER can be selected from the graph. Since the quantization noise decreases as the wordlength increases, the BER improves with increasing input wordlength.
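The dependence of quantization noise on wordlength can be illustrated in isolation from the rest of the system. The following Python sketch quantizes uniformly distributed samples to a signed fixed-point format and estimates the signal to quantization noise ratio (SQNR). It is not the Matlab model and does not compute a BER; the rounding and saturation scheme, the test signal and the function names are assumptions made for illustration.

```python
import math
import random


def quantize(x, wordlength):
    """Round x in [-1, 1) to signed fixed point with `wordlength` bits."""
    scale = 2 ** (wordlength - 1)
    q = round(x * scale)
    q = max(-scale, min(scale - 1, q))   # saturate to the representable range
    return q / scale


def sqnr_db(wordlength, n_samples=20000, seed=0):
    """Estimate the SQNR in dB for a full-scale uniformly distributed input."""
    rng = random.Random(seed)
    signal_power = 0.0
    noise_power = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        error = quantize(x, wordlength) - x
        signal_power += x * x
        noise_power += error * error
    return 10.0 * math.log10(signal_power / noise_power)
```

For this kind of input the SQNR grows by roughly 6 dB per additional bit, so moving from 8 to 12 to 16 bits lowers the quantization noise floor by about 24 dB at each step.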

Figure 3.2: BER as a function of SNR (curves for 8-, 12- and 16-bit wordlengths; horizontal axis: signal-to-noise ratio in dB, vertical axis: bit error rate).

Figure 3.3: BER as a function of wordlength at SNR 35 dB (horizontal axis: wordlength, vertical axis: bit error rate).

Chapter 4

Background of FFT
This chapter gives a short description of the FFT algorithm, of different architectures and of the basic building blocks of those architectures. Further information about the algorithm and the architectures can be found in [3-7].

4.1 Theoretical background

Some claim that the modern world started in 1965, when J. Cooley and J. Tukey published their efficient method for the numerical computation of the Fourier transform. Others attribute the method to Gauss, since the idea at the heart of the algorithm is clearly present in an unpublished paper of his that appeared posthumously in 1866. Either way, present and future applications process continuous signals by discrete methods, because computers and digital processing systems cannot work with continuous sums. The Fourier transform represents a general function as a sum of trigonometric functions. The FFT computes this transformation of a time-domain signal into a frequency-domain signal according to the DFT:

$$X[k] = \sum_{n=0}^{N-1} x[n]\,W_N^{kn}, \quad k = 0, \ldots, N-1 \tag{4.1}$$

In Equation 4.1, $X[k]$ and $x[n]$ are the complex output and input of the N-point FFT, respectively, where $n$ is the time index and $k$ is the frequency index. $W_N^{kn}$ is the twiddle factor, which is defined in Equation 4.2.

$$W_N^{kn} = e^{-j(2\pi kn/N)} = \cos\!\left(\frac{2\pi kn}{N}\right) - j\,\sin\!\left(\frac{2\pi kn}{N}\right) \tag{4.2}$$
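Equations 4.1 and 4.2 translate directly into code. The Python sketch below evaluates the DFT sum term by term; it is meant as a readable reference with a cost proportional to N squared, not as an FFT, and the function names are illustrative.

```python
import cmath


def twiddle(n_points, exponent):
    """Twiddle factor W_N^exponent = exp(-j*2*pi*exponent/N) of Equation 4.2."""
    return cmath.exp(-2j * cmath.pi * exponent / n_points)


def dft(x):
    """Direct evaluation of Equation 4.1 for a list of complex samples."""
    n_points = len(x)
    result = []
    for k in range(n_points):            # frequency index
        acc = 0j
        for n in range(n_points):        # time index
            acc += x[n] * twiddle(n_points, k * n)
        result.append(acc)
    return result
```

For instance, a unit impulse at n = 0 transforms to a constant spectrum, and a constant input of length 4 transforms to a single non-zero bin X[0] = 4.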

For a better understanding of the operations performed by the FFT, the FFT is represented by its signal flow graph (SFG). Examples of signal flow graphs are shown in Figures 4.1, 4.2, 4.3 and 4.4. The SFGs in the figures consist of butterflies and complex rotations. For example, Figure 4.1 represents a radix-2 butterfly, which computes:

$$X[0] = x[0] + x[1], \qquad X[1] = x[0] - x[1]$$

Figure 4.1: SFG of radix-2.

Figure 4.2 shows a radix-4 butterfly. A radix-4 butterfly includes a complex multiplication by $e^{-j\pi/2} = -j$. This is a trivial operation: from a hardware point of view, it can be done without any hardware cost.

Figure 4.2: SFG of radix-4.

The signal flow graph in Figure 4.3 shows a 16-point radix-2 DIF FFT. The number after every stage, $\phi$, indicates a rotation by $e^{-j 2\pi\phi/N}$. The input sequence is in natural order, whereas the outputs are in bit-reversed order. Figure 4.4, on the other hand, shows the signal flow graph of a 16-point radix-2 DIT FFT. In this case the inputs are in bit-reversed order and the outputs are in natural order. In addition, the multiplications are placed at different positions in the graph.
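The structure of the radix-2 DIF signal flow graph can be followed in software. The Python sketch below applies the butterflies stage by stage, multiplies the lower butterfly output by the rotation with index $\phi$, and leaves the outputs in bit-reversed order. It is a plain software model for illustration and not the hardware architecture of this thesis; the function names are illustrative.

```python
import cmath


def bit_reverse(index, n_bits):
    """Reverse the lowest n_bits bits of index."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft_dif_radix2(x):
    """Radix-2 DIF FFT; position p of the result holds X[bit_reverse(p)]."""
    data = list(x)
    n = len(data)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for i in range(span):
                a = data[start + i]
                b = data[start + i + span]
                phi = i * (n // (2 * span))      # rotation index, relative to N
                data[start + i] = a + b          # upper butterfly output
                data[start + i + span] = (a - b) * cmath.exp(-2j * cmath.pi * phi / n)
        span //= 2
    return data


def fft_natural_order(x):
    """Same transform with the outputs reordered to natural order."""
    y = fft_dif_radix2(x)
    n_bits = len(x).bit_length() - 1
    return [y[bit_reverse(k, n_bits)] for k in range(len(x))]
```

For 16 points the rotation indices produced per stage are 0 to 7, then 0, 2, 4, 6, then 0, 4, and the output order is 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15.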

Figure 4.3: SFG of 16-point radix-2 decimation in frequency FFT (inputs in natural order 0 to 15; outputs in bit-reversed order 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15).

4.2 Architecture of the FFT

The architecture of an FFT can be divided into several parts: butterflies, complex rotators, memories for the twiddle factors, and circuits for data management and control. Butterflies and rotators carry out the mathematical operations of the signal flow graph. The basic pipelined architectures for the FFT are discussed below. The basic components of these architectures are discussed in the next section of this chapter.

4.2.1 Feedforward architectures

Radix-2

A radix-2 feedforward architecture is depicted in Figure 4.5. The input sequence is broken down into two parallel data streams flowing forward, and the reordering circuits schedule the data elements so that they enter the butterflies with the correct distance between them. In this architecture both the butterflies and the multipliers have a utilization of 100%. In Figure 4.5, C2 denotes the switches and BF2 the radix-2 butterflies. The numbers next to the switches are the lengths of the buffers. A detailed description of the architecture can be found in [3].

Figure 4.4: SFG of 16-point radix-2 decimation in time FFT (inputs in bit-reversed order 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15; outputs in natural order 0 to 15).

Figure 4.5: Radix-2 feedforward architecture.

Radix-4

A radix-4 feedforward architecture is depicted in Figure 4.6, where C4 and BF4 are the switches and the radix-4 butterflies. The lengths of the buffers are shown by the numbers in the boxes. Here, the input sequence is broken into four parallel data streams, and the shufflers keep the proper distance between the data elements. In this architecture the multipliers and the butterflies have a utilization of 100%, which makes it well suited to high throughput applications. The architecture is described in detail in [8].

Figure 4.6: Radix-4 feedforward architecture (four stages of C4 switches and BF4 butterflies with multipliers between the stages; buffer lengths 64/128/192, 16/32/48, 4/8/12 and 1/2/3).

4.2.2 Single path delay feedback

Radix-2

A radix-2 feedback architecture is depicted in Figure 4.7. This architecture uses the registers efficiently by storing one butterfly output in the feedback shift registers, while a single data stream goes through the multiplier at every stage. However, the complex multipliers and butterflies are utilized only 50% of the time. This architecture is well suited to area-efficient implementations. It is described in [9].

Figure 4.7: Radix-2 feedback architecture.

Radix-4

A radix-4 single path delay feedback architecture is depicted in Figure 4.8. In this architecture the utilization of the multipliers is increased to 75%. However, the radix-4 butterfly contains at least 8 complex adders, and its utilization drops to only 25%. More detail about the architecture can be found in [10]. The different pipelined architectures are compared in Table 4.1.

4.3 Building blocks of the FFT

These architectures use some basic building blocks: complex multipliers, butterflies, ROM tables, RAMs and shift registers. These blocks are discussed below.

Figure 4.8: Radix-4 feedback architecture.

Table 4.1: Comparison of pipelined architectures for the N-point FFT

Architecture                 Multipliers         Adders        Control
Radix-2 feedforward [11]     2(log4 N - 1)       4 log4 N      Simple
Radix-4 feedforward [8]      3(log4 N - 1)       8 log4 N      Simple
Radix-2 feedback [11]        2(log4 N - 1)       4 log4 N      Simple
Radix-4 feedback [11, 12]    log4 N - 1          8 log4 N      Medium

4.3.1 Complex multiplier

The complex multiplier is shown in Figure 4.9. A complex multiplier computes $(a + jb)(c + jd) = (ac - bd) + j(ad + bc)$, where $a + jb$ is the multiplicand and $c + jd$ is the multiplier; both have real and imaginary parts. The operation can be carried out with four real multipliers, one adder and one subtractor. The subtractor can be implemented by an adder with its carry input set to 1 and the subtrahend bitwise inverted.
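The decomposition into four real multiplications, and the subtraction carried out by an adder with a carry input of 1, can be written out as follows. This Python sketch models the arithmetic only; the function names and the default wordlength of 16 bits are illustrative choices, not details of the implemented multiplier.

```python
def complex_multiply(a, b, c, d):
    """(a + jb)(c + jd) using four real multiplications.

    Returns the pair (real part, imaginary part).
    """
    ac = a * c
    bd = b * d
    ad = a * d
    bc = b * c
    return ac - bd, ad + bc        # one subtraction, one addition


def subtract_via_adder(x, y, wordlength=16):
    """Compute x - y as x + not(y) + 1 in two's complement arithmetic."""
    mask = (1 << wordlength) - 1
    inverted_y = ~y & mask                     # bitwise inversion of the subtrahend
    result = (x + inverted_y + 1) & mask       # the "+ 1" is the carry input
    if result >= 1 << (wordlength - 1):        # reinterpret as a signed value
        result -= 1 << wordlength
    return result
```

As a check, (1 + 2j)(3 + 4j) gives the pair (-5, 10), and `subtract_via_adder(3, 5)` gives -2.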

4.3.2 Butterfly

The butterfly is depicted in Figure 4.10. For two complex inputs $a$ and $b$, the butterfly outputs $a + b$ and $a - b$. The operation requires one complex addition and one complex subtraction. Again, the subtraction can be done with an adder by setting the carry input to 1.

4.3.3 ROM

A ROM is used to store the coefficients of the complex multipliers. Each coefficient is stored at a specific address of the ROM and is accessed through that address. ROMs of different sizes are used depending on the size of the FFT and the input wordlength. A ROM is depicted in Figure 4.11; here the address is 5 bits wide and the wordlength is 8 bits.

4.3.4 Buffers

Buffers are used to store the samples and to arrange them in the proper sequence for the butterflies. The buffers can be implemented by memories or by shift registers. Memories are typically used for long buffers and shift registers for short ones. A memory is depicted in Figure 4.12, where two pointers indicate the read and the write addresses of the memory. In a shift register, on the other hand, the samples are shifted to the next register every clock cycle. A shift register is depicted in Figure 4.13.

Figure 4.9: Complex multiplier.
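The two buffer styles behave identically from the outside: both delay a sample stream by a fixed number of clock cycles. The Python sketch below models them side by side. The class names are illustrative, the initial contents are assumed to be zero, and the memory version assumes that the read and write pointers address the same location, with the read taking place before the write in each cycle.

```python
class ShiftRegisterDelay:
    """Delay line in which every sample moves one register per clock cycle."""

    def __init__(self, length):
        self.registers = [0] * length

    def clock(self, sample):
        output = self.registers[-1]
        self.registers = [sample] + self.registers[:-1]   # shift all samples
        return output


class MemoryDelay:
    """Delay line in which samples stay in place and only the pointers move."""

    def __init__(self, length):
        self.memory = [0] * length
        self.read_pointer = 0
        self.write_pointer = 0

    def clock(self, sample):
        output = self.memory[self.read_pointer]    # read the oldest sample
        self.memory[self.write_pointer] = sample   # overwrite it with the new one
        self.read_pointer = (self.read_pointer + 1) % len(self.memory)
        self.write_pointer = (self.write_pointer + 1) % len(self.memory)
        return output
```

In the shift register every stored sample changes place on each clock cycle, whereas in the memory only one location is read and written. This difference in activity is consistent with memories being preferred for long buffers.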

Figure 4.10: Radix-2 butterfly.

Figure 4.11: ROM for coefficients (5-bit addresses 00000 to 11111, each holding an 8-bit content word).

Figure 4.12: Memory with pointer.

Figure 4.13: Shift registers.

Chapter 5

Implementation of FFT on ASIC


This chapter focuses on the implementation of an FFT for the IEEE 802.15.3c standard (HSIPHY mode). The CORE65LPSVT technology library, which is mainly intended for ultra low power applications, has been used for this implementation, with a 0.8 V supply voltage and a 330 MHz clock. Table 5.1 shows the specification of the ASIC. The specifications for the FFT are given in [1]: the FFT shall have 512 points and a sample rate of 2.64 GS/s. In order to meet the throughput requirement, 8 parallel samples are used at the input of the FFT. Table 5.2 shows the requirements of the FFT. Among the 512 subcarriers, 336 are data subcarriers, 16 are pilot subcarriers, 16 are guard subcarriers, 3 are DC subcarriers and 141 are null subcarriers.

Table 5.1: Constraints of the ASIC

ASIC constraint           Value
Library                   CORE65LPSVT
Process                   65 nm
Global power supply       0.8 V
Global clock frequency    330 MHz

Table 5.2: Design constraints of the FFT

Design parameter       Value
Length of the FFT      512
Sample rate            2.64 GS/s
Samples in parallel    8


5.1 Design issues related to the FFT processor

FFT architectures can be divided into two categories: pipelined architectures (such as feedforward and feedback architectures) and memory-based architectures. These are described in [7, 13-15] and [16-18], respectively. Pipelined architectures have the advantage of high throughput, but their area cost is high for large FFTs. Memory-based architectures have the advantage of low area cost, but their throughput is often limited by the memory access bandwidth and the available number of processing elements.

In order to meet the requirements of the IEEE 802.15.3c standard, a high throughput FFT processor needs to be designed. For high throughput applications, a pipelined FFT architecture is adopted in most cases. Among the pipelined architectures, single path delay feedback architectures need fewer memories and less hardware than multipath feedforward architectures. However, they utilize their processing units only 50% of the time, whereas multipath feedforward architectures utilize them fully. Moreover, multipath feedforward architectures process two or more samples in parallel, whereas single path feedback architectures process only one sample per clock cycle. Feedforward architectures can therefore operate at a slower clock than feedback architectures for the same throughput, and the slower clock allows low power to be achieved. However, they increase the hardware cost significantly, as more complex rotators, butterflies and memories are needed. The advantages and common requirements of these architectures are well described in [19-21].

A radix-8 architecture with 8 parallel data streams has been proposed for this application. As the required throughput of the FFT is quite high, processing 8 samples in parallel reduces the clock frequency, and the direct implementation of the radix-8 butterfly needs 8 parallel samples in any case. In addition, the proposed architecture reduces the number of multipliers and complex adders. Finally, the processing elements of the data path can operate at a clock frequency of at most 500 MHz (2 ns delay). Therefore, a 330 MHz clock has been used for the pipelined architecture, and 8 parallel samples are a good choice for reducing the input clock frequency.
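The relation between sample rate, parallelism and clock frequency is simple arithmetic, sketched below in Python. The function names are illustrative. The last helper links the numbers to the controller: 512 points at 8 samples per cycle take 64 clock cycles per transform, which a six-bit counter can count.

```python
def required_clock_hz(sample_rate_sps, parallel_samples):
    """Clock frequency needed when several samples are processed per cycle."""
    return sample_rate_sps / parallel_samples


def cycles_per_transform(fft_length, parallel_samples):
    """Number of clock cycles needed to feed one FFT block."""
    if fft_length % parallel_samples != 0:
        raise ValueError("FFT length must be a multiple of the parallelism")
    return fft_length // parallel_samples


def counter_bits(n_cycles):
    """Width of a counter that counts from 0 to n_cycles - 1."""
    return (n_cycles - 1).bit_length()


def meets_timing(clock_hz, critical_path_s):
    """True if the clock period is at least as long as the critical path."""
    return 1.0 / clock_hz >= critical_path_s
```

With 2.64 GS/s and 8 parallel samples the required clock is 330 MHz, whose period of about 3.03 ns is longer than the 2 ns delay of the processing elements.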

5.2 Radix-8

Equation 4.1 shows that the direct computation of each value of $k$ needs $N$ complex multiplications and $N - 1$ complex additions, that is, $4N$ real multiplications and $4N - 2$ real additions in total. The signal flow graph of the radix-8 decimation in time decomposition is depicted in Figure 5.1. The $W_8^0$ coefficients in the SFG can be ignored, because they represent a multiplication by 1. Figure 5.1 shows that the samples arrive at the input of the SFG in bit-reversed order, whereas the outputs are in natural order.

Figure 5.1: SFG of radix-8 decimation in time.

The SFG of the radix-8 decimation in frequency decomposition is depicted in Figure 5.2. The input samples arrive in natural order and the outputs are in bit-reversed order. The complex multiplications change position in the SFG; apart from that, the decimation in frequency decomposition uses the same number of complex multiplications and additions.

5.3 Proposed architecture

A 512-point FFT processor has been proposed for this application. The architecture of the FFT and its data path are depicted in Figure 5.3 and Figure 5.4. The architecture consists of three main parts: fourteen ROM tables that hold the coefficients of the multipliers, a data path that computes the FFT, and a controller that selects the ROM coefficients and controls the data path. The controller is implemented simply as a six-bit counter. Figure 5.4 shows that the data path consists of three stages of radix-8 butterflies. The first two stages of the FFT include a total of 14 complex rotators. The third stage has only a radix-8 butterfly. Shuffler 1 and shuffler 4 are used before and after the FFT in order to provide input and output samples in natural order. Shuffler 2 and shuffler 3 are used inside the FFT to maintain the proper order of the data. The different blocks of the FFT are described below.

5.3.1 Radix-8 butterfly

The implementation of the radix-8 butterfly is depicted in Figure 5.5. In this architecture, the radix-8 butterfly is a direct implementation of radix-2 butterflies and constant complex rotations. It contains twelve butterflies, two constant complex rotators and three trivial rotators, arranged in three stages. The first stage of butterflies is followed by two complex rotations and one trivial rotation by $-j$. The second stage is followed by two trivial rotations by $-j$.

Figure 5.2: SFG of radix-8 decimation in frequency.

Figure 5.3: Architecture of the FFT (data path, controller and 14 ROM tables).

Figure 5.5 shows the interconnection network of the radix-8 butterfly. The trivial rotations ($-1$, $j$ and $-j$) are implemented by small modifications of the butterflies at no extra hardware cost. Multiplication by $-1$ is done by interchanging the inputs at the input port. Multiplication by $j$ is done by interchanging the real and imaginary outputs. Multiplication by $-j$ is done by interchanging both the input and the output signals, combining what is done for $-1$ and $j$.
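The composition of the radix-8 butterfly from twelve radix-2 butterflies, two constant rotators and three trivial rotations by $-j$ can be checked with a small software model. The Python sketch below follows a decimation in frequency arrangement, which is an assumption about the exact wiring of Figure 5.5. The rotation by $-j$ is written as a swap of real and imaginary parts with one sign change, and the function names are illustrative.

```python
import cmath

W8_1 = cmath.exp(-1j * cmath.pi / 4)       # constant rotation W_8^1
W8_3 = cmath.exp(-3j * cmath.pi / 4)       # constant rotation W_8^3


def butterfly(a, b):
    """Radix-2 butterfly: sum and difference of two complex values."""
    return a + b, a - b


def rotate_minus_j(z):
    """Multiply by -j without a multiplier: swap parts, negate one of them."""
    return complex(z.imag, -z.real)


def radix8_butterfly(x):
    """8-point DFT of x (natural order in, natural order out)."""
    s1 = [0j] * 8
    for i in range(4):                     # stage 1: four butterflies
        s1[i], s1[i + 4] = butterfly(x[i], x[i + 4])
    s1[5] *= W8_1                          # two constant complex rotations ...
    s1[7] *= W8_3
    s1[6] = rotate_minus_j(s1[6])          # ... and one trivial rotation

    s2 = [0j] * 8
    for base in (0, 4):                    # stage 2: four butterflies
        for i in range(2):
            s2[base + i], s2[base + i + 2] = butterfly(s1[base + i], s1[base + i + 2])
        s2[base + 3] = rotate_minus_j(s2[base + 3])   # two trivial rotations in total

    s3 = [0j] * 8
    for base in (0, 2, 4, 6):              # stage 3: four butterflies
        s3[base], s3[base + 1] = butterfly(s2[base], s2[base + 1])

    output = [0j] * 8
    for position, k in enumerate((0, 4, 2, 6, 1, 5, 3, 7)):
        output[k] = s3[position]           # undo the bit-reversed output order
    return output
```

The result agrees with a direct evaluation of the 8-point DFT, which confirms that only the rotations by $W_8^1$ and $W_8^3$ need real multiplications.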

5.3.2 Shuffler

Figure 5.6 shows the basic block of the shuffler. The shuffler consists of two multiplexers and input and output buffers. The lengths of the input and output buffers vary between the stages of the data path. Both memories and shift registers have been used to implement the buffers. A study of memories and shift registers has shown that memories take less area and consume less power for long buffers, whereas shift registers consume less power and take less area for short buffers. When the control signal is 0, samples are stored in the buffers. When the control signal is 1, the samples of the output buffers are replaced by those of the input buffers.

Figure 5.4: Data path of the FFT.

Figure 5.5: Implementation of radix-8 butterfly.

Shuffler 1 is shown in Figure 5.7. Twelve shuffling circuits are used in three stages, with different buffer sizes in the different stages: the first, second and third stages have input and output buffers of length 32, 16 and 8, respectively. Three different control signals control the shufflers. For the first stage the control signal shall change every 32 clock cycles, as the input and output buffers have length 32. The second and third control signals must change every 16 and 8 clock cycles, respectively. However, the second and third control signals shall first wait for 32 and 48 clock cycles, respectively.

Figure 5.6: Shuffling circuit.

Figure 5.7: Block diagram of shuffler 1 (three stages of four shuffling circuits with buffer lengths 32, 16 and 8).

Shuffler 2 and shuffler 3 also have three stages each. Figures 5.8 and 5.9 show shuffler 2 and shuffler 3, respectively, together with their interconnections. The buffer lengths are 1, 2 and 4 for shuffler 2 and 8, 16 and 32 for shuffler 3. Three control signals control the three stages. Control signals 1, 2 and 3 of shuffler 2 shall change after 1, 2 and 4 clock cycles, respectively, according to the length of the input and output buffers of each stage.

Shuffler 4 is depicted in Figure 5.10. It contains twenty-four shuffling circuits arranged in six stages, which are controlled by six control signals. The lengths of the input and output buffers of the six stages are 32, 4, 16, 2, 8 and 1. The control signals of the six stages must toggle every 32, 4, 16, 2, 8 and 1 clock cycles, respectively.
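The behaviour of a single shuffling circuit can be explored with a small cycle-based model. The Python sketch below assumes a common arrangement for such circuits: the lower input passes through the input buffer, the upper output passes through the output buffer, both buffers have the same length, and the control signal toggles once per buffer length. The exact multiplexer polarity and buffer placement of the implemented circuit are not given in the text, so these details, like the class and function names, are assumptions.

```python
from collections import deque


class ShufflingCircuit:
    """Two multiplexers between an input buffer and an output buffer."""

    def __init__(self, length):
        self.in_buffer = deque([None] * length)    # delays the lower input
        self.out_buffer = deque([None] * length)   # delays the upper output

    def clock(self, in0, in1, sel):
        """Advance one clock cycle and return (out0, out1)."""
        delayed_in1 = self.in_buffer.popleft()
        self.in_buffer.append(in1)
        if sel == 0:
            upper, lower = in0, delayed_in1        # pass straight through
        else:
            upper, lower = delayed_in1, in0        # exchange the two paths
        out0 = self.out_buffer.popleft()
        self.out_buffer.append(upper)
        return out0, lower


def shuffle_streams(upper, lower, length):
    """Feed two equally long streams through the circuit.

    The control signal toggles every `length` cycles, starting at 0.
    """
    circuit = ShufflingCircuit(length)
    outputs = []
    for t in range(len(upper) + length):
        in0 = upper[t] if t < len(upper) else None
        in1 = lower[t] if t < len(lower) else None
        out = circuit.clock(in0, in1, (t // length) % 2)
        if t >= length:                            # skip the start-up latency
            outputs.append(out)
    return outputs
```

For a buffer length of 2, the input streams 0, 1, 2, 3 and 10, 11, 12, 13 leave the circuit as 0, 1, 10, 11 and 2, 3, 12, 13. The circuit therefore exchanges blocks of samples that are one buffer length apart, which is the elementary operation from which the multi-stage shufflers are built.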

5.4 ROMs for the coefficients

Fourteen ROMs in two stages have been used in this architecture: seven memories with 64 addresses for the first stage and seven memories with 8 addresses for the second stage.

The 64 addresses of the first-stage ROMs are represented by 6 bits, and 64 coefficients are stored in each ROM. The content of the ROM at each address is $\cos(2\pi\phi/N) - j\sin(2\pi\phi/N)$. For the 8-bit implementation, $\cos(2\pi\phi/N)$ and $\sin(2\pi\phi/N)$ are each represented with 8 bits. The value of $\phi$ depends on the address and on the ROM: for address $b_5b_4b_3b_2b_1b_0$ of the X-th ROM, $\phi = X \times (b_2b_1b_0b_5b_4b_3)_2$, where $X = 1, 2, \ldots, 7$ is the number of the ROM. As an example, the value of $\phi$ for address 001100 of ROM 4 is $4 \times (100001)_2 = 4 \times 33 = 132$.

Figure 5.8: Block diagram of shuffler 2 (three stages of four shuffling circuits with buffer lengths 1, 2 and 4).

Figure 5.9: Block diagram of shuffler 3 (three stages of four shuffling circuits with buffer lengths 8, 16 and 32).

The seven second-stage ROMs have 8 addresses each, from 0 to 7, which are represented by 3 bits. The same expression, $\cos(2\pi\phi/N) - j\sin(2\pi\phi/N)$, is used to calculate the content of the ROM. The value of $\phi$ for address $b_2b_1b_0$ of ROM X is $X \times (b_2b_1b_0)_2$, where $X = 1, 2, \ldots, 7$. As an example, the value of $\phi$ for address 101 of ROM 5 is $5 \times (101)_2 = 5 \times 5 = 25$.
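The address-to-rotation mapping of the two ROM stages can be written as two small functions. The Python sketch below reproduces the two worked examples from the text. The function names are illustrative, and the value of N in `rom_content` is left as a parameter because the text does not state it per stage. The quantization of the cosine and sine values to the chosen wordlength is not modelled.

```python
import cmath


def phi_first_stage(rom_number, address):
    """phi = X * (b2 b1 b0 b5 b4 b3)_2 for a 6-bit address b5..b0."""
    if not 1 <= rom_number <= 7 or not 0 <= address < 64:
        raise ValueError("expected ROM number 1..7 and a 6-bit address")
    upper = (address >> 3) & 0b111     # b5 b4 b3
    lower = address & 0b111            # b2 b1 b0
    return rom_number * ((lower << 3) | upper)


def phi_second_stage(rom_number, address):
    """phi = X * (b2 b1 b0)_2 for a 3-bit address b2..b0."""
    if not 1 <= rom_number <= 7 or not 0 <= address < 8:
        raise ValueError("expected ROM number 1..7 and a 3-bit address")
    return rom_number * address


def rom_content(phi, n_points):
    """Unquantized ROM word cos(2*pi*phi/N) - j*sin(2*pi*phi/N)."""
    return cmath.exp(-2j * cmath.pi * phi / n_points)
```

`phi_first_stage(4, 0b001100)` returns 132 and `phi_second_stage(5, 0b101)` returns 25, in agreement with the examples above.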

5.5 Controller

The controller of the FFT is implemented as a simple six-bit counter. The bits of the counter drive both the control signals of the data path and the addresses of the ROMs. The control of the data path is depicted in Figure 5.11. The control signals of the shufflers are derived from the counter bits: fifteen control signals are mapped to the different bits of the counter according to the period of each signal. The MSB of the counter is mapped to the control signals that have a period of 64 clock cycles, whereas the LSB is mapped to the control signals that have a period of 2 clock cycles. Each of the control signals 2 to 15 of the data path shall wait for half of the sum of the periods of the preceding signals. An equal number of buffers is used for this purpose. The delays and periods of the signals are listed in Table 5.3.

Figure 5.10: Block diagram of shuffler 4 (six stages of four shuffling circuits with buffer lengths 32, 4, 16, 2, 8 and 1).

Table 5.3: Selection signal information

Control signal   Counter signal   Period   Delays
1                Count(5)         64       0
2                Count(4)         32       32
3                Count(3)         16       48
4                Count(0)         2        56
5                Count(1)         4        57
6                Count(2)         8        59
7                Count(3)         16       63
8                Count(4)         32       71
9                Count(5)         64       87
10               Count(5)         64       119
11               Count(2)         8        151
12               Count(4)         32       155
13               Count(1)         4        171
14               Count(3)         16       173
15               Count(0)         2        181

Figure 5.11: Datapath controller.

The controller for the ROM addresses is depicted in Figure 5.12. The fourteen ROMs are controlled by the same counter. The six bits of the counter are mapped to the address bits of the first 7 ROMs, whose addresses are 6 bits wide. The three LSBs of the counter are used as the address bits of the other 7 ROMs. Equalizing delays are used for the two stages of ROMs: 56 delays for the first-stage ROMs and 63 delays for the second-stage ROMs.

Figure 5.12: ROM controller (the counter drives the seven 64-word ROMs with 6 bits and the seven 8-word ROMs with 3 bits).
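The entries of Table 5.3 follow two simple rules: the counter bit is determined by the period of the control signal, and the delay of each signal is half the sum of the periods of all preceding signals. The Python sketch below regenerates both columns from the list of periods. The names are illustrative, and the list of periods is copied from the table.

```python
# Periods of control signals 1 to 15, in clock cycles (from Table 5.3).
PERIODS = [64, 32, 16, 2, 4, 8, 16, 32, 64, 64, 8, 32, 4, 16, 2]


def counter_bit(period):
    """Index of the counter bit that toggles with the given period.

    A period of 2 maps to the LSB Count(0), a period of 64 to the MSB Count(5).
    """
    if period < 2 or period & (period - 1):
        raise ValueError("period must be a power of two and at least 2")
    return period.bit_length() - 2


def control_delays(periods):
    """Delay of each signal: half the sum of the periods of its predecessors."""
    delays = []
    running_sum = 0
    for period in periods:
        delays.append(running_sum // 2)
        running_sum += period
    return delays
```

`control_delays(PERIODS)` yields 0, 32, 48, 56, 57, 59, 63, 71, 87, 119, 151, 155, 171, 173 and 181, which are the values in the Delays column of the table.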

5.6 Methodology

Different design tools have been used for the implementation: Modelsim for functional testing, Design Compiler for synthesis and Nanosim for power calculation. VHDL has been used as the hardware description language. The basic blocks of the architecture have been written in VHDL. As the FFT has been implemented for different wordlengths, generic and generate statements have been used to parameterize the wordlength of the blocks. The blocks have then been used to build the FFT. Design Compiler and Nanosim have been used to calculate the area and the power consumption of the FFT.


5.6.1 Hardware implementation in VHDL

The entity of the complex multiplier is shown in Figure 5.13. The generics WM1 and WM2 set the wordlengths of the data inputs and of the coefficients, respectively. The basic block of the complex multiplier is a real-valued multiplier, for which a Wallace tree array multiplier has been used. A pipeline of 5 stages has been used in the adder tree to shorten the critical path and thus the delay of each stage. The complex multiplier keeps the output wordlength equal to the input wordlength by discarding LSBs of the result.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.std_logic_unsigned.all;

entity complex_multiplier is
  generic(
    WM1 : integer := 3;   -- wordlength of the data inputs and outputs
    WM2 : integer := 2);  -- wordlength of the coefficients
  port(
    in_real    : in  std_logic_vector(WM1-1 downto 0);
    in_imag    : in  std_logic_vector(WM1-1 downto 0);
    coeff_real : in  std_logic_vector(WM2-1 downto 0);
    coeff_imag : in  std_logic_vector(WM2-1 downto 0);
    clk        : in  std_logic;
    reset      : in  std_logic;
    mult_real  : out std_logic_vector(WM1-1 downto 0);
    mult_imag  : out std_logic_vector(WM1-1 downto 0));
end complex_multiplier;

Figure 5.13: Entity of complex multiplier.

The entity of the butterfly is shown in Figure 5.14. Generics are used to set the wordlength and the truncation. The butterfly keeps the input wordlength when TE equals 0 and increases it by one bit when TE equals 1. This basic radix-2 butterfly is used to build the radix-8 butterfly.

The entity of the shuffler is depicted in Figure 5.15. The generics WL, Lin, Lout and BT set the wordlength, the lengths of the input and output buffers, and the selection between memory and shift registers. The study of memories and shift registers has shown that memories consume less power and take less area for long buffers, and that the opposite holds for shift registers. In this implementation, both options have been taken into consideration to optimize power and area.

These basic components have been used to build the radix-8 butterfly and the shufflers. The twiddle factors for the complex multipliers have been calculated in Matlab, which has also been used to generate the VHDL code for the ROMs. The ROMs, radix-8 butterflies, shufflers and complex multipliers have been used to build the FFT. A simple six-bit counter controls the FFT.


library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.numeric_std.all;

entity butterfly is
  generic(
    WL : integer := 3;   -- input wordlength
    TE : integer := 1);  -- 1: output grows by one bit, 0: wordlength is kept
  port(
    in_1_real  : in  std_logic_vector(WL-1 downto 0);
    in_1_imag  : in  std_logic_vector(WL-1 downto 0);
    in_2_real  : in  std_logic_vector(WL-1 downto 0);
    in_2_imag  : in  std_logic_vector(WL-1 downto 0);
    clk        : in  std_logic;
    out_1_real : out std_logic_vector(WL-1+TE downto 0);
    out_1_imag : out std_logic_vector(WL-1+TE downto 0);
    out_2_real : out std_logic_vector(WL-1+TE downto 0);
    out_2_imag : out std_logic_vector(WL-1+TE downto 0));
end butterfly;

Figure 5.14: Entity of a radix-2 butterfly.
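The effect of the TE generic on the output wordlength can be illustrated with a small behavioural model. The Python sketch below computes the sum and difference of two complex integer samples. For TE = 1 the results are kept in WL+1 bits. For TE = 0 the sketch assumes that the extra bit is removed by dropping the least significant bit; the thesis does not state how the truncation is done, so this choice and the helper names are hypothetical.

```python
def fits(value, bits):
    """True if value is representable as a bits-wide two's complement number."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def butterfly(in_1, in_2, wl, te):
    """Radix-2 butterfly on (real, imag) integer tuples.

    out_1 = in_1 + in_2 and out_2 = in_1 - in_2, each wl + te bits wide.
    Assumption: for te == 0 the results are shifted right by one bit.
    """
    assert all(fits(v, wl) for v in in_1 + in_2)
    out_1 = [in_1[0] + in_2[0], in_1[1] + in_2[1]]
    out_2 = [in_1[0] - in_2[0], in_1[1] - in_2[1]]
    if te == 0:
        out_1 = [v >> 1 for v in out_1]  # arithmetic shift keeps the sign
        out_2 = [v >> 1 for v in out_2]
    assert all(fits(v, wl + te) for v in out_1 + out_2)
    return tuple(out_1), tuple(out_2)
```

With WL = 3 the inputs lie in the range -4 to 3, so a sum or difference always fits in 4 bits, which is why TE = 1 needs exactly one extra bit.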

5.6.2 Functionality testing

The functionality of the FFT and of the individual components has been tested in Modelsim. Test benches have been built for the individual components and their functionality has been verified. Input and output sequences for the FFT have been generated in Matlab, and the same input sequences have been used in the test bench of the FFT. The output sequences of the FFT have then been compared with the output sequences generated by Matlab. In addition, the datapath of the FFT has been tested without the radix-8 butterfly and the complex multiplier in order to check the data management. For this test, input sequences in natural order have been applied to the circuit together with the proper control signals.
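The comparison against a reference model can be sketched as follows. The thesis uses Matlab for the reference sequences and Modelsim for the simulation; the Python code below only illustrates the checking step, with a naive DFT standing in for the Matlab reference and a hypothetical tolerance parameter that absorbs quantization error.

```python
import cmath


def naive_dft(x):
    """Reference DFT computed directly from the definition."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def matches(reference, observed, tolerance):
    """True if every observed sample is within tolerance of the reference."""
    return len(reference) == len(observed) and all(
        abs(r - o) <= tolerance for r, o in zip(reference, observed)
    )
```

In a flow like the one described above, `observed` would be the sequence read back from the simulator output, and the tolerance would be chosen according to the wordlength under test.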

5.6.3 Synthesis and area calculation

The FFT and the individual components have been synthesized using Design Compiler with the CORE65LPSVT library, which targets a 65 nm process technology. Design Compiler has been used to synthesize the design and optimize its area for a specific clock, as well as to generate the netlist of the design. The area of the FFT has been reported by Design Compiler.

5.6.4 Power calculation

The power consumption has been calculated with Nanosim. Random sequences for the FFT and the individual components have been generated using Matlab. The netlist generated by Design Compiler and the random sequences have been used to calculate the power. Voltage scaling has been applied to the design by changing the supply voltage in the SPICE file.


library ieee;
use ieee.std_logic_1164.all;

entity shuffler is
  generic(
    WL   : integer := 10;  -- wordlength
    Lin  : integer := 20;  -- length of the input buffer
    Lout : integer := 10;  -- length of the output buffer
    BT   : integer := 1);  -- buffer type: memory or shift register
  port(
    in0  : in  std_logic_vector(WL-1 downto 0);
    in1  : in  std_logic_vector(WL-1 downto 0);
    clk  : in  std_logic;
    sel  : in  std_logic;
    out0 : out std_logic_vector(WL-1 downto 0);
    out1 : out std_logic_vector(WL-1 downto 0));
end shuffler;

Figure 5.15: Entity of shuffling circuit.
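The entity in Figure 5.15 does not show the internal structure of the shuffler. The Python sketch below models the delay, swap, delay arrangement that is common in feedforward (MDC) pipelines, purely as an illustration: it assumes that the input buffer of length Lin delays in1, that sel exchanges the two paths, and that the output buffer of length Lout delays out0. Which port each buffer sits on is an assumption, and the class name is hypothetical.

```python
from collections import deque


class Shuffler:
    """Behavioural model of a two-input data shuffling stage."""

    def __init__(self, lin, lout):
        self.in_buf = deque([None] * lin)    # delays in1 by lin cycles
        self.out_buf = deque([None] * lout)  # delays out0 by lout cycles

    def step(self, in0, in1, sel):
        """Advance one clock cycle and return (out0, out1)."""
        self.in_buf.append(in1)
        delayed = self.in_buf.popleft()
        # sel = 1 crosses the two paths, sel = 0 passes them straight through.
        upper, lower = (delayed, in0) if sel else (in0, delayed)
        self.out_buf.append(upper)
        return self.out_buf.popleft(), lower
```

With Lin = Lout = 1 and sel toggling every cycle, two streams a0, a1, a2, a3 and b0, b1, b2, b3 entering in parallel leave, after one cycle of latency, as the pairs (a0, a1), (b0, b1), (a2, a3), (b2, b3).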

5.7 Design for Low Power


Dynamic power of any circuit can be described by:

$$P_{dynamic} = \frac{1}{2}\,\alpha f c V_{dd}^{2} \qquad (5.1)$$
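As a worked instance of Equation 5.1, the Python sketch below evaluates the dynamic power for two supply voltages with the switching activity α, the clock frequency f and the capacitance c held fixed. The supply values 1.2 V and 0.8 V are the ones used for voltage scaling in this section; the values of α and c are hypothetical placeholders, and the result is an idealized estimate that ignores leakage and any other effect not captured by the equation.

```python
def dynamic_power(alpha, f_hz, c_farad, vdd):
    """Equation 5.1: P = 1/2 * alpha * f * c * Vdd^2."""
    return 0.5 * alpha * f_hz * c_farad * vdd ** 2


def voltage_scaling_ratio(v_old, v_new):
    """Ratio of dynamic power after and before voltage scaling (same alpha, f, c)."""
    return (v_new / v_old) ** 2


# Hypothetical activity and capacitance, chosen only to exercise the formula.
p_high = dynamic_power(alpha=0.2, f_hz=330e6, c_farad=1e-9, vdd=1.2)
p_low = dynamic_power(alpha=0.2, f_hz=330e6, c_farad=1e-9, vdd=0.8)
print(round(p_low / p_high, 3), round(voltage_scaling_ratio(1.2, 0.8), 3))
```

Because the supply voltage enters quadratically, lowering it from 1.2 V to 0.8 V scales the ideal dynamic power by (0.8/1.2)^2 = 4/9, independent of the placeholder values.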

In the equation, c is the area capacitance, Vdd is the supply voltage, f is the clock frequency and α is the switching activity. The dynamic power can be reduced by lowering the supply voltage Vdd, the area capacitance c and the clock frequency f. The area capacitance is indirectly related to the clock frequency: a design synthesized for a lower clock frequency takes less area and therefore has less capacitance.

To optimize the power of the FFT, both frequency scaling and voltage scaling have been applied. Initially, the FFT was synthesized for 380 MHz so that it could operate at any clock below 380 MHz. Due to the higher clock, the FFT takes more area, which results in higher capacitance and more power consumption. Voltage scaling can reduce the power, but it does not change the area capacitance. Frequency scaling has therefore been done to reduce both the area and the power consumption: a 330 MHz clock has been used to reduce the area as well as the capacitance of the FFT. The bar charts in Figure 5.16 show the difference in power and area between the two clocks. The blue bars show the area and power for 380 MHz and the brown bars for 330 MHz. The results are shown for wordlengths 8, 12 and 16.

Voltage scaling has been done to reduce the power consumption of the FFT further. Initially, the power of the FFT was calculated for 1.2 V, where there was a slack time of 0.5 ns. The voltage has been reduced from 1.2 V to 0.8 V, which reduces the slack time as well. The bar chart in Figure 5.17 shows the change of power after voltage scaling. In the figure the blue bars show the power consumption for 1.2 V and the brown bars show the power consumption for 0.8 V.

Shift registers have been replaced by memories for buffer lengths over 8. On one hand, the switching activity increases with the length of the buffers for shift registers and causes more power consumption. On the other hand, the switching


Figure 5.16: Area and power consumption of the FFT before and after frequency scaling. (Two bar charts: power in mW and area in mm² versus wordlength of 8, 12 and 16 bits, each for 380 MHz and 330 MHz.)

Figure 5.17: Power consumption before and after voltage scaling. (Bar chart: power in mW versus wordlength of 8, 12 and 16 bits, for 1.2 V and 0.8 V.)

activity remains constant for the memories. Therefore, memories have been used for large buffers and shift registers for small buffers. Finally, each buffer with a large wordlength has been replaced by multiple buffers with small wordlengths in parallel. By this technique the number of read and write pointers has been reduced for the memories. The area and the power of the memories and shift registers for different lengths are given in Table 5.4, and the bar charts in Figure 5.18 show the relative comparison of the memories and shift registers for lengths from 2 to 32. In Figure 5.18 the blue bars represent the area and power consumption of the memories and the brown bars represent the area and power consumption of the shift registers. The area as well as the power of the memories and shift registers increase with the length of the buffers. The bar chart for power consumption in Figure 5.18 shows that the power consumption of the shift registers increases with the buffer length. However, the power consumption remains nearly constant for the memories, whereas the switching activity for shift registers increases with the buffer


length. Conversely, the switching activity for memories remains constant for any buffer length.

Table 5.4: Memory and shift register performance for different buffer lengths

          Memory                      Shift Register
Length    Area (µm²)   Power (µW)     Area (µm²)   Power (µW)
2         981.75       80.3306        281.63       38.1587
4         1095.63      83.3872        525.19       60.8420
8         1358.23      86.8200        998.91       122.4963
16        1865.86      91.3967        1943.75      246.1874
32        2771.59      94.3003        3836.04      502.1067
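The choice between the two buffer types can be read directly from Table 5.4. The Python sketch below copies the table values and reports, for each length, which buffer type is cheaper in area or in power. The data structure and function name are illustrative only; the thesis does not describe an automated selection.

```python
# Area (um^2) and power (uW) per buffer length, copied from Table 5.4.
TABLE_5_4 = {
    2:  {"memory": (981.75, 80.3306),  "shift_register": (281.63, 38.1587)},
    4:  {"memory": (1095.63, 83.3872), "shift_register": (525.19, 60.8420)},
    8:  {"memory": (1358.23, 86.8200), "shift_register": (998.91, 122.4963)},
    16: {"memory": (1865.86, 91.3967), "shift_register": (1943.75, 246.1874)},
    32: {"memory": (2771.59, 94.3003), "shift_register": (3836.04, 502.1067)},
}


def cheaper_buffer(length, metric):
    """Return the buffer type with the lower area or power at this length."""
    index = {"area": 0, "power": 1}[metric]
    row = TABLE_5_4[length]
    return min(row, key=lambda kind: row[kind][index])


for length in TABLE_5_4:
    print(length, cheaper_buffer(length, "area"), cheaper_buffer(length, "power"))
```

According to these numbers, shift registers win on both metrics up to length 4, memories win on both metrics from length 16, and length 8 is the crossover where the shift register is still smaller but already consumes more power.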

(Two bar charts: area in µm² and power consumption in µW of the memory and the shift register versus buffer length 2, 4, 8, 16 and 32.)

Figure 5.18: Power and area for different buffer lengths.

The performance of the complex multipliers and the radix-8 butterfly has been evaluated in terms of power and area. The area and the power of the complex multiplier and the radix-8 butterfly are given in Table 5.5 for wordlengths 8, 12 and 16. The bar charts in Figures 5.19 and 5.20 show the power consumption and area of the complex multiplier and the radix-8 butterfly, respectively. Table 5.5 shows the trade-off between performance and wordlength: power consumption and area increase with the wordlength.

5.8 Comparison to previous approaches

Table 5.6 and the bar charts in Figure 5.21 show the performance of the proposed architecture in terms of power consumption and area. Figure 5.21 shows the power and area for the FFT and for the input and output reorder. The blue bars show the power and area of the FFT and the brown bars show the area and power consumption of the input and output reorder. The FFT consumes


Table 5.5: Area and power for different components

Component            Word length   Area (mm²)   Power (mW)
Complex Multiplier   8 bit         0.01434      0.9215
                     12 bit        0.03222      1.6938
                     16 bit        0.05748      2.6595
Radix 8              8 bit         0.04826      6.3843
                     12 bit        0.10402      9.2194
                     16 bit        0.18221      12.2570

(Two bar charts: power in mW and area in mm² of the complex multiplier versus wordlength 8, 12 and 16.)

Figure 5.19: Power and area of complex multiplier.

more power than the input and output reorder, because the computations are done in the FFT and cause more switching activity. The numbers of complex rotators, complex adders and memories of the proposed architecture are compared with previous approaches in Table 5.7. As the table shows, this architecture requires as few or fewer complex rotators, complex adders and memories than each previous approach, so the area has been reduced. Therefore, the area capacitance has been reduced, as well as the power consumption of the FFT. Table 5.8 shows the comparison of the proposed architecture with the previous approaches. For the proposed approach the results are shown for wordlengths 8, 12 and 16. As a different technology has been used for the proposed design, the power consumption and area need to be normalized. Power consumption and area have been normalized by Equations 5.3 and 5.2 according to [25, 26]:

$$\text{Normalized Area} = \frac{\text{Area}}{(\text{Tech.}/65\,\text{nm})^{2}} \qquad (5.2)$$

$$\text{Normalized Power} = \frac{\text{Power Consumption}}{(\text{Tech.}/65\,\text{nm})\,(V_{dd}/0.8)^{2}} \qquad (5.3)$$

Table 5.8 shows that the proposed architecture achieves higher throughput and


Figure 5.20: Power and area of radix-8 butterfly. (Two bar charts: power in mW and area in mm² of the radix-8 butterfly versus wordlength 8, 12 and 16.)

Table 5.6: FFT performance for different wordlength

                  Word length   Area (mm²)   Power (mW)
Complete system   8 bit         0.683        46.82
                  12 bit        1.252        54.81
                  16 bit        1.873        74.46
FFT               8 bit         0.391        38.49
                  12 bit        0.881        42.04
                  16 bit        1.439        61.51

better efficiency in terms of power consumption and area. For a wordlength of 12, the proposed architecture has reduced the power consumption by 10% and the area by 31% with respect to the previous approach with the same wordlength and FFT size [24].


Figure 5.21: Power and area of FFT. (Two bar charts: power consumption in mW and area in mm² versus wordlength of 8, 12 and 16 bits, each split into FFT and reorder.)

Table 5.7: Comparison of architectures for the computation of a 512-point 8-parallel FFT.

PIPELINED ARCHITECTURE               AREA
Type        Radix                    Complex Rotators   Complex Adders   Complex Sample Memory
FF (MDC)    Radix-8, [22]            14(6)              72               1170
FF (MDC)    Radix-2, [23]            28                 72               504
FB (MDF)    Radix-2, [9]             28                 144              504
Iterative   Radix-16 + 2, [24]       32                 256              1024
FF (MDC)    Proposed, radix-8        14(6)              72               504

Table 5.8: Comparison of various FFTs for WPAN application

                       PREVIOUS APPROACHES                PROPOSED APPROACH
                       Iterative   FB (MDF)   FB (MDF)    FF (MDC)
PARAMETERS             [24]        [27]       [28]        8-bit    12-bit   16-bit
Point (N)              512         2048       2048        512      512      512
Radix (r)              16 + 2      Mixed      2           8        8        8
Parallel samples (P)   8           4          8           8        8        8
Wordlength (bit)       12          9          9           8        12       16
Process (nm)           90          90         90          65       65       65
Voltage (V)            1           1          1           0.8      0.8      0.8
Clock (MHz)            324         300        300         330      330      330
Throughput (GS/s)      2.59        1.2        2.4         2.64     2.64     2.64
Area (mm²)             2.46        0.97       1.16        0.391    0.881    1.439
Normalized Area        1.28        0.5        0.6         0.391    0.881    1.439
Power (mW)             103.5       117        159         38.49    42.04    61.51
Normalized Power       47.84       54.08      73.49       38.49    42.04    61.51
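The normalized rows of Table 5.8 can be reproduced from the raw area, power, process and voltage entries. The Python sketch below applies the normalization of Equations 5.2 and 5.3, with 65 nm and 0.8 V as the reference point; the function and variable names are chosen for this example only.

```python
def normalized_area(area_mm2, tech_nm, ref_nm=65.0):
    """Equation 5.2: area divided by (Tech./65nm)^2."""
    return area_mm2 / (tech_nm / ref_nm) ** 2


def normalized_power(power_mw, tech_nm, vdd, ref_nm=65.0, ref_vdd=0.8):
    """Equation 5.3: power divided by (Tech./65nm) * (Vdd/0.8)^2."""
    return power_mw / ((tech_nm / ref_nm) * (vdd / ref_vdd) ** 2)


# Raw entries of the previous approaches in Table 5.8:
# (area in mm^2, power in mW, process in nm, supply voltage in V).
previous = {
    "[24]": (2.46, 103.5, 90, 1.0),
    "[27]": (0.97, 117.0, 90, 1.0),
    "[28]": (1.16, 159.0, 90, 1.0),
}

for ref, (area, power, tech, vdd) in previous.items():
    print(ref, round(normalized_area(area, tech), 2),
          round(normalized_power(power, tech, vdd), 2))
```

Comparing the 12-bit proposed design (0.881 mm², 42.04 mW) with the normalized values of [24] in this way gives an area reduction of about 31%. The same calculation gives a power reduction of about 12%, slightly more than the 10% quoted in the text.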

Chapter 6

Conclusion and Future Work


This chapter presents the conclusions that can be drawn from the previous chapters and gives some directions for future research on this topic.

6.1 Conclusion

Based on the results, the following conclusions can be drawn:
• A high level model of the standard has been built, and the BER has been calculated for different levels of SNR and wordlength.
• The FFT is parameterizable, which allows the wordlength to be chosen.
• The FFT has been optimized in order to reduce the area and the power consumption.
• Better results than previous approaches have been obtained.
• Radix-8 and 8 parallel samples reduce the number of hardware elements (20 complex multipliers are used).
• Only simple control is needed for this architecture.

6.2 Future work

Future work on this topic can be done to improve the results, specifically:
• The high level model can be improved by using the ASIC toolbox. In that case the model will be more realistic from the hardware point of view.
• The channel model can be made more realistic by using small scale fading.
• The ASIC can be fabricated to measure the performance on hardware.
• Constant multiplications in the radix-8 butterfly can be simplified.
• A reconfigurable FFT that supports all the modes of this standard can be implemented.
• Other blocks such as forward error correction and modulation can be implemented on ASIC.

Bibliography

[1] IEEE 802.15.3c, Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for High Rate Wireless Personal Area Networks (WPANs). 2009.
[2] S.-K. S. Yong, P. Xia, and A. Valdes-Garcia, 60 GHz Technology for Gbps WLAN and WPAN. Wiley-IEEE Press, 2010.
[3] L. R. Rabiner and B. Gold, Theory and application of digital signal processing. Prentice Hall, 1975.
[4] A. Oppenheim and R. Schafer, Discrete-time signal processing. Prentice Hall, 1989.
[5] W. W. Smith and J. M. Smith, Handbook of Real-Time Fast Fourier Transforms. Wiley-IEEE Press, 1995.
[6] W. Cochran, J. Cooley, D. Favin, H. Helms, R. Kaenel, W. Lang, G. C. Maling Jr., D. Nelson, C. Rader, and P. Welch, "What is the fast Fourier transform?," Proceedings of the IEEE, vol. 55, pp. 1664-1674, Oct. 1967.
[7] M. Garrido, Efficient hardware architectures for the computation of the FFT and other related signal processing algorithms in real time. PhD thesis, Universidad Politécnica de Madrid, 2009.
[8] E. Swartzlander, W. Young, and S. Joseph, "A radix 4 delay commutator for fast Fourier transform processor implementation," IEEE Journal of Solid-State Circuits, vol. 19, pp. 702-709, Oct. 1984.
[9] E. Wold and A. Despain, "Pipeline and Parallel-Pipeline FFT Processors for VLSI Implementations," IEEE Transactions on Computers, vol. C-33, pp. 414-426, May 1984.
[10] A. Despain, "Fourier Transform Computers Using CORDIC Iterations," IEEE Transactions on Computers, vol. C-23, pp. 993-1001, Oct. 1974.
[11] S. He and M. Torkelson, "Design and implementation of a 1024-point pipeline FFT processor," in Custom Integrated Circuits Conference, 1998. Proceedings of the IEEE 1998, pp. 131-134, May 1998.
[12] M. Sánchez, M. Garrido, M. López Vallejo, J. Grajal, and C. Lopez-Barrio, "Digital channelised receivers on FPGAs platforms," in Radar Conference, 2005 IEEE International, pp. 816-821, May 2005.
[13] S. He and M. Torkelson, "Designing pipeline FFT processor for OFDM (de)modulation," in 1998 URSI International Symposium on Signals, Systems, and Electronics, Sep. 1998.
[14] Y. Chang and K. Parhi, "An efficient pipelined FFT architecture," in IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, June 2003.
[15] S. Johansson, S. He, and P. Nilsson, "Wordlength optimization of a pipelined FFT processor," in 42nd Midwest Symposium, Circuits and Systems, vol. 1, Aug.
[16] L. Johnson, "Conflict free memory addressing for dedicated FFT hardware," in IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, May 1992.
[17] Y. Ma, "An effective memory addressing scheme for FFT processors," in IEEE Transactions on Signal Processing, March 1999.
[18] C. Wang and C. Chang, "A new memory-based FFT processor for VDSL transceivers," in IEEE International Symposium on Circuits and Systems, May 2001.
[19] A. Batra, J. Balakrishnan, G. R. Aiello, J. R. Foerster, and A. Dabak, "Design of a multiband OFDM system for realistic UWB channel environment," in IEEE Transactions on Microwave Theory and Techniques, Sept. 2004.
[20] J. Lee, H. Lee, S.-I. Cho, and S.-S. Choi, "A high-speed, low-complexity radix-2^4 FFT processor for MB-OFDM UWB systems," in IEEE International Symposium on Circuits and Systems, 2006, May 2006.
[21] S.-M. Kim, J.-G. Chung, and K. Parhi, "Low error fixed-width CSD multiplier with efficient sign extension," in IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Dec. 2003.
[22] M. Sánchez, M. Garrido, M. López, and J. Grajal, "Implementing FFT-based digital channelized receivers on FPGA platforms," IEEE Transactions on Aerospace and Electronic Systems, vol. 44, pp. 1567-1585, Oct. 2008.
[23] J. Johnston, "Parallel pipeline fast Fourier transformer," in IEE Proc. F Comm. Radar Signal Process., vol. 130, pp. 564-572, Oct. 1983.
[24] S.-J. Huang and S.-G. Chen, "A Green FFT Processor with 2.5-GS/s for IEEE 802.15.3c (WPANs)," in International Conference on Green Circuits and Systems (ICGCS), pp. 9-13, June 2010.
[25] Y. Chen, Y.-W. Lin, Y.-C. Tsao, and C.-Y. Lee, "A 2.4-Gsample/s DVFS FFT Processor for MIMO OFDM Communication Systems," IEEE Journal of Solid-State Circuits, vol. 43, pp. 1260-1273, May 2008.
[26] B. Baas, "A low-power, high-performance, 1024-point FFT processor," IEEE Journal of Solid-State Circuits, vol. 34, pp. 380-387, Mar. 1999.
[27] Y. Chen, Y.-C. Tsao, Y.-W. Lin, C.-H. Lin, and C.-Y. Lee, "An Indexed-Scaling Pipelined FFT Processor for OFDM-Based WPAN Applications," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 55, pp. 146-150, Feb. 2008.
[28] S.-N. Tang, J.-W. Tsai, and T.-Y. Chang, "A 2.4-GS/s FFT Processor for OFDM-Based WPAN Applications," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 57, pp. 451-455, June 2010.
