
A low power ROM-less direct digital frequency synthesizer with preset value pipelined accumulator

Jun CHEN, Rong LUO, Huazhong YANG and Hui WANG Department of Electronic Engineering, Tsinghua University, Beijing, 100084 China

Abstract*
In this paper, a low-power ROM-less direct digital frequency synthesizer (DDFS) is presented. A preset value pipelined accumulator (PVPA) is proposed that achieves update rates in excess of 500 MHz through careful choice of a 12-7-7-6 four-stage pipelined architecture. Power dissipation is reduced by removing redundant registers, and no phase latency is introduced when switching frequency. The phase to sine amplitude converter is built entirely from combinational logic without ROM; a modified Sunderland approximation and a power-gating technique are used to reduce its area and power, respectively. Moreover, a 2MSB truncated phase is introduced to the one-quadrant phase to sine amplitude converter to improve the spurious-free dynamic range (SFDR) by 10 dB. The design was implemented in a 0.18 µm CMOS technology. It occupies a core area of 0.04 mm² and dissipates 17.2 mW at a 1.8 V supply voltage and a 500 MHz clock.

Index Terms - Direct digital frequency synthesizer (DDFS), preset value phase accumulator (PVPA), ROM-less look up table, low power

1. Introduction

Direct digital frequency synthesizers (DDFS) generate sine (cosine) output with the advantages of fast settling time, sub-Hertz frequency resolution, large bandwidth, continuous-phase switching response, and low phase noise [1]. These advantages have made the technology popular in spread-spectrum communication, radar systems, test instrumentation, and electronic warfare. In this paper, a new low power ROM-less DDFS using a preset value pipelined accumulator (PVPA) is proposed. The PVPA reduces the power consumption and increases the throughput of the accumulator without introducing phase latency. Also, power gating is used in the ROM-less phase-to-amplitude converter to reduce the dynamic power. In section 2, a conventional DDFS and a pipelined accumulator are introduced. In section 3, we present the PVPA and the DDFS design. In section 4, simulation results are given. The conclusion is given in section 5.

2. Background
2.1 Direct digital frequency synthesizer
The digital frequency synthesizer was originally presented by Tierney, Rader, and Gold in 1971 [2]. In general, a DDFS consists of a phase accumulator, a phase to sine-amplitude converter (PSAC), a digital-to-analog converter (DAC), and a low-pass filter (LPF), as shown in Fig. 1. The synthesizer has two inputs: a clock reference $f_{clk}$ and a frequency control word FCW. The phase accumulator integrates the value of the FCW on every clock cycle, producing a ramp whose slope is directly proportional to FCW. This gives the frequency of the output sine wave as

$$f_{out} = \frac{f_{clk} \cdot FCW}{2^{M}} \tag{1}$$

where M is the width of the phase accumulator. An approximation to the sinusoid amplitude of the equivalent angles is produced by the PSAC.

Figure 1. Conventional DDFS architecture

In order to improve the frequency resolution, a wide phase accumulator is usually used and the output of the phase accumulator is truncated. The PSAC is usually implemented as a ROM lookup table (LUT). The LUT is a power-hungry circuit, and the power consumption is dominated by the ROM required in the phase-to-amplitude conversion. A large ROM LUT is also the bottleneck for the highest clock frequency of the DDFS. Therefore, many ROM-less architectures [8]-[10] and ROM compression algorithms [5]-[7] have been proposed to lower power consumption and to improve clock frequency.
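Equation (1) and the phase ramp it describes can be checked numerically. The following is a minimal sketch, not the authors' implementation; the function names are hypothetical, and the 32-bit accumulator, 12-bit truncated phase, and 500 MHz clock are the values quoted elsewhere in this paper.

```python
def ddfs_output_frequency(f_clk_hz, fcw, acc_bits):
    """Output frequency from Eq. (1): f_out = f_clk * FCW / 2**M."""
    return f_clk_hz * fcw / (1 << acc_bits)


def fcw_for_frequency(f_clk_hz, f_out_hz, acc_bits):
    """Nearest integer FCW for a requested output frequency (inverse of Eq. (1))."""
    return round(f_out_hz * (1 << acc_bits) / f_clk_hz)


def phase_ramp(fcw, acc_bits, out_bits, cycles):
    """Truncated phase-accumulator output for a constant FCW, starting from zero."""
    acc = 0
    mask = (1 << acc_bits) - 1
    ramp = []
    for _ in range(cycles):
        acc = (acc + fcw) & mask                  # wrap-around integration of FCW
        ramp.append(acc >> (acc_bits - out_bits))  # keep only the MSBs
    return ramp


if __name__ == "__main__":
    F_CLK = 500e6   # clock frequency used in this paper
    M = 32          # accumulator width used in this paper
    print("frequency resolution (Hz):", ddfs_output_frequency(F_CLK, 1, M))
    fcw = fcw_for_frequency(F_CLK, 10e6, M)
    print("FCW for 10 MHz:", fcw, "-> f_out (Hz):", ddfs_output_frequency(F_CLK, fcw, M))
    print("first phase samples:", phase_ramp(fcw, M, 12, 5))
```

With M = 32 and a 500 MHz clock, the smallest frequency step is about 0.116 Hz, which matches the sub-Hertz resolution mentioned in the introduction.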

2.2 Architecture of pipelined accumulator


A wide phase accumulator is often used in a DDFS to obtain fine frequency resolution at a high clock frequency, but a wide accumulator cannot finish one addition within a short clock period because of the delay caused by the carry bits propagating through the adder. A conventional solution is to pipeline the phase accumulator as m stages of L bits each, such that m × L = M, as shown in Fig. 2. Each adder generates an (L+1)-bit output: L sum bits and one carry output bit. The carry output is latched between successive adders. Every new frequency input word is moved into the pipeline circuits, which consist of D-flip-flops (DFF) and delay elements. The speed of an accumulator based on this architecture can be increased up to m times.

* This project was sponsored in part by NSFC under grant #90207001.

Proceedings of the 19th International Conference on VLSI Design (VLSID'06)

1063-9667/06 $20.00 © 2006 IEEE
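The carry-pipelining scheme of Section 2.2 can be sketched as a small behavioral model. This is an illustrative sketch under stated assumptions, not the circuit of Fig. 2: the function names are hypothetical, all registers are assumed to reset to zero, and the input and output DFF chains are modeled as simple delay queues. It shows that m stages of L bits with registered carries reproduce the full-width accumulator, only delayed by m - 1 clock cycles.

```python
from collections import deque


def flat_accumulator(fcw, width, cycles):
    """Reference model: one full-width addition per clock."""
    acc, mask, out = 0, (1 << width) - 1, []
    for _ in range(cycles):
        acc = (acc + fcw) & mask
        out.append(acc)
    return out


def pipelined_accumulator(fcw, stage_bits, stages, cycles):
    """m stages of L bits each; the carry is registered between stages.

    Stage k (k = 0 is least significant) receives its FCW slice k cycles late,
    and its sum is delayed by (stages - 1 - k) cycles at the output.
    """
    mask = (1 << stage_bits) - 1
    slices = [(fcw >> (k * stage_bits)) & mask for k in range(stages)]
    in_skew = [deque([0] * k) for k in range(stages)]
    out_skew = [deque([0] * (stages - 1 - k)) for k in range(stages)]
    sums = [0] * stages      # L-bit sum register of each stage
    carries = [0] * stages   # registered carry output of each stage
    out = []
    for _ in range(cycles):
        next_sums, next_carries = [], []
        for k in range(stages):
            in_skew[k].append(slices[k])
            operand = in_skew[k].popleft()
            carry_in = carries[k - 1] if k > 0 else 0  # carry latched last cycle
            total = sums[k] + operand + carry_in
            next_sums.append(total & mask)
            next_carries.append(total >> stage_bits)
        sums, carries = next_sums, next_carries        # synchronous register update
        word = 0
        for k in range(stages):
            out_skew[k].append(sums[k])
            word |= out_skew[k].popleft() << (k * stage_bits)
        out.append(word)
    return out


if __name__ == "__main__":
    ref = flat_accumulator(0x12345678, 32, 40)
    pipe = pipelined_accumulator(0x12345678, 8, 4, 40)
    print("matches after 3-cycle latency:", pipe[3:] == ref[:-3])
```

Each stage only performs an L-bit addition per clock, which is why the clock period can shrink by up to a factor of m, while the delay queues account for the extra registers and latency that the next section sets out to reduce.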
Figure 2. A conventional pipelined accumulator

However, the pipeline circuit requires considerable area and power and introduces additional frequency switching latency. At the same time, increasing the number of pipelined blocks increases the loading of the clock network.

3. Proposed DDFS Design

3.1 Proposed preset value pipelined accumulator
Considering the pipeline circuits used in the pipelined accumulator, we find that the values stored in the DFFs of the same row are equal when the frequency is not being switched, so the last column of DFFs is sufficient for the pipeline circuits. To reduce the PSAC complexity, the output of the phase accumulator is usually truncated and only some MSBs are used as the input to the PSAC. An accumulator can be split into two parts: an MSBs phase accumulator and an LSBs carry generator, as shown in Fig. 3. We assume that the values of the sum and carry have been initialized at the time of frequency switching, so that only the last column of DFFs is needed to store the frequency control word FCW. If we make the MSBs accumulator one stage of the pipelined accumulator, no clock latency is introduced in the truncated phase output. Based on this idea, we propose a new 12-7-7-6 four-stage pipelined accumulator, shown in Fig. 4. At the time of frequency switching, suppose the FCW is changed so that stage i receives the slice $s_i$; then $cin_i$ and $\phi_i$ can be found from $\phi_{i0}$, $s_i$, and $cin_{i+1}$, as shown in (2):

$$cin_i = f_1(\phi_{i0}, s_i, cin_{i+1}), \qquad \phi_i = f_2(\phi_{i0}, s_i, cin_{i+1}) = \phi_{i0} + i \cdot s_i + cin_{i+1} \tag{2}$$

From the above, we obtain the values of $\phi_i$ and $cin_i$ in (3) and (4), which are used to initialize the pipelined accumulator when the frequency is switched.

$$\begin{aligned} \phi_{out} &= \phi_0 + cin_1 \\ \phi_1 &= \phi_{10} + s_1 + cin_2 \\ \phi_2 &= \phi_{20} + 2 s_2 + cin_3 \\ \phi_3 &= \phi_{30} + 3 s_3 \end{aligned} \tag{3}$$

$$\begin{aligned} cin_1 &= f_1(\phi_{10}, s_1, cin_2) \\ cin_2 &= f_1(\phi_{20}, s_2, cin_3) \\ cin_3 &= f_1(\phi_{30}, s_3) \end{aligned} \tag{4}$$

As it is difficult for the preset logic to produce the exact values of $\phi_i$ and $cin_i$ within a short clock cycle, and Fig. 3 shows that $\phi_{10}$, $\phi_{20}$, and $\phi_{30}$ have little impact on the truncated phase output, we simply set $\phi_{10}$, $\phi_{20}$, and $\phi_{30}$ to zero. A constant error is then introduced, which is smaller than one LSB of the truncated phase output. At the same time, we find that $cin_3$ equals the MSB of $s_3$ and $cin_2$ equals zero. Since the constant error has no influence on the output frequency, we can approximate $\phi_i$ and $cin_i$ as follows:

$$\begin{aligned} cin_1 &= f_1(s_1) \\ cin_2 &= f_1(s_2, s_3[5]) \\ cin_3 &= f_1(s_3) \end{aligned} \tag{5}$$

$$\begin{aligned} \phi_{out} &= \phi_0 \\ \phi_1 &= s_1 \\ \phi_2 &= 2 s_2 + s_3[5] \\ \phi_3 &= 3 s_3 = 2 s_3 + s_3 \end{aligned} \tag{6}$$

A 32-bit PVPA is implemented as shown in Fig. 4. The PVPA is made up of a 12-bit MSB accumulator and three preset value accumulators. The three preset value accumulators operate as the LSBs carry generator shown in Fig. 3. The detailed structure of a preset value accumulator is given for preset value accumulator 3 in Fig. 4. The preset logic block in the preset value accumulator realizes formulas (5) and (6). When the FCW is changed, the sum and carry DFFs are initialized by the preset logic blocks, except for stage one, which is controlled by Fce. To lower the complexity of the preset logic block, the 32-bit pipelined accumulator is divided into four stages of 12, 7, 7, and 6 bits. The speed of the proposed PVPA is limited by the first stage, and a carry select adder is used to achieve high speed. In our design, the length of the first stage is equal to the length of the truncated output phase to reduce the phase output latency. The maximum width of the PVPA is 4 times the width of the first stage.
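The statement that clearing the low-order state adds only a constant phase offset smaller than one LSB of the truncated output can be checked with a simple model. This is a sketch under loud assumptions: it uses a flat 32-bit accumulator rather than the pipelined PVPA, it does not model the preset logic of (5) and (6), and the function names are hypothetical. It only compares a run that starts from an arbitrary state with a run whose 20 low-order bits (7 + 7 + 6) start at zero.

```python
ACC_BITS = 32                        # accumulator width used in this paper
PHASE_BITS = 12                      # truncated phase width used in this paper
LSB_BITS = ACC_BITS - PHASE_BITS     # 20 bits held by the three LSB stages
MASK = (1 << ACC_BITS) - 1


def truncated_phase(start, fcw, cycles):
    """12-bit truncated phase of a flat 32-bit accumulator started at `start`."""
    acc, out = start & MASK, []
    for _ in range(cycles):
        acc = (acc + fcw) & MASK
        out.append(acc >> LSB_BITS)
    return out


def truncated_phase_differences(start, fcw, cycles=1000):
    """Set of differences (mod 2**12) between the exact run and a run
    whose low-order 20 bits were cleared at the switching instant."""
    exact = truncated_phase(start, fcw, cycles)
    cleared_start = start & ~((1 << LSB_BITS) - 1)
    approx = truncated_phase(cleared_start, fcw, cycles)
    return {(e - a) % (1 << PHASE_BITS) for e, a in zip(exact, approx)}


if __name__ == "__main__":
    print(truncated_phase_differences(0x89ABCDEF, 0x01234567))
```

Because both runs add the same FCW, the full-width states always differ by the same amount, which is below 2^20. The truncated outputs therefore differ by at most one LSB at every cycle, and the output frequency is unchanged.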


Figure 3. Two block phase accumulator architecture

Figure 4. Preset value pipelined accumulator

The phase error introduced by the preset logic can be reduced as

$$cin_1 = f_1(s_1, m), \qquad \phi_1 = s_1 + m \tag{7}$$

where m in (7) can be some MSBs of $\phi_{10}$ or equal to $\phi_{10}$.

3.2 ROM-less phase to amplitude converter

Sunderland et al. [3] split the phase word into three bit slices, A, B, and C, as shown in Fig. 5. When C is small enough, the sine of the sum of the three angles can be approximated as in (8).

$$\sin(\theta) = \sin(A + B + C) \approx \sin(A + B) + \cos A \sin C \tag{8}$$

Figure 5. Sunderland architecture

We propose a new ROM-less PSAC based on equation (8), as shown in Fig. 6. The two most significant phase bits are used to decode the quadrant, while the remaining 10 bits are used for the one-quadrant phase to sine amplitude converter. The remaining 10 phase bits are divided into three bit slices, A, B, and C, having 3, 4, and 3 bits, respectively. Equation (8) can now be rewritten as follows:

$$Amp_c = \sin(a + b) = s_i(a)\, c_i(b), \quad i = (1, 2, \ldots, 8) \tag{9}$$

$$Amp_f = \cos a \sin c \approx s_i(a)\, f_i(c), \quad i = (1, 2, \ldots, 8) \tag{10}$$

A is used to generate the signals $s_i(a)$ that control two blocks: Input Latch and MUX. The Input Latch block is used as a power gating block to hold or pass the data of B, C, and 2MSB to one of the eight phase to amplitude converter logic blocks. In every clock cycle, B, C, and 2MSB are applied to only one phase to amplitude converter logic block; the inputs of the other phase to amplitude converter logic blocks are unchanged, so they consume no dynamic power. The MUX block is used to select the coarse amplitude and the fine amplitude as the inputs of a 9-bit adder.

Figure 6. Detailed block diagram of proposed phase to sine amplitude converter

Nicholas et al. [4] proposed that minimizing the mean-square error provides the lowest total spur energy and that minimizing the maximum absolute error tends to reduce the value of the greatest spurs. To improve the SFDR performance, 2MSB is introduced to the one-quadrant phase to sine amplitude converter to reduce the error resulting from the 1's complement approximation, which is used to determine whether the sine amplitude is increasing or decreasing. Equation (10) is rewritten as (11).

$$Amp_f = \cos a \sin c \approx s_i(a)\, f_i(c, 2MSB), \quad i = (1, 2, \ldots, 8) \tag{11}$$

A phase to amplitude converter logic is made up of a coarse logic block and a fine logic block. 2MSB and C together generate the fine amplitude, $f_i(c, 2MSB)$ in (11). The coarse logic block is used to realize $c_i(b)$ in (9). Both the coarse logic block and the fine logic block use only logic circuits.
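The accuracy of the approximation in (8) with the 3/4/3-bit slicing of the 10-bit quadrant phase can be estimated numerically. This is a minimal floating-point sketch of equation (8) only; it does not model the authors' combinational logic, amplitude quantization, the 1's complementor, or the 2MSB modification of (11), and the function names are hypothetical.

```python
import math

A_BITS, B_BITS, C_BITS = 3, 4, 3           # slice widths quoted in Section 3.2
PHASE_BITS = A_BITS + B_BITS + C_BITS      # 10-bit one-quadrant phase


def split_phase(p):
    """Split a 10-bit quadrant phase word into its A, B, and C slices."""
    c = p & ((1 << C_BITS) - 1)
    b = (p >> C_BITS) & ((1 << B_BITS) - 1)
    a = p >> (B_BITS + C_BITS)
    return a, b, c


def to_angle(value, weight_bits):
    """Angle in radians of a slice whose LSB is 2**-weight_bits of a quadrant."""
    return (math.pi / 2) * value / (1 << weight_bits)


def sunderland_sine(p):
    """Eq. (8): sin(A + B + C) ~ sin(A + B) + cos(A) * sin(C)."""
    a, b, c = split_phase(p)
    ang_a = to_angle(a, A_BITS)
    ang_b = to_angle(b, A_BITS + B_BITS)
    ang_c = to_angle(c, PHASE_BITS)
    return math.sin(ang_a + ang_b) + math.cos(ang_a) * math.sin(ang_c)


def max_abs_error():
    """Largest deviation of Eq. (8) from the exact sine over one quadrant."""
    worst = 0.0
    for p in range(1 << PHASE_BITS):
        exact = math.sin(to_angle(p, PHASE_BITS))
        worst = max(worst, abs(exact - sunderland_sine(p)))
    return worst


if __name__ == "__main__":
    print("max |error| of Eq. (8) with 3/4/3 slicing:", max_abs_error())
```

The coarse term depends only on A and B and the fine term only on A and C, which is what lets each term be realized by a much smaller table or logic block than a direct 10-bit lookup.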



Figure 7. Phase to amplitude converter logic
[Diagram labels: B (4 bit), C (3 bit), 2MSB, coarse logic, fine logic, coarse amplitude (9 bit), fine amplitude (3 bit)]

To improve the highest clock frequency, the PSAC block uses a 2-stage pipelined topology.

4. Simulation and Comparison

Two DDFS architectures were simulated in Matlab. Both have a 32-bit accumulator with a 12-bit truncated phase output and a 10-bit output to the DAC. The architecture that introduces 2MSB to the one-quadrant phase to sine amplitude converter results in a 10 dB improvement in SFDR, as shown in Fig. 8.

Figure 8. (a) Output spectrum based on Sunderland approximation. (b) Output spectrum based on modified Sunderland approximation.

A DDFS design based on the proposed architecture in the previous section was designed in Verilog HDL and synthesized using a SMIC 0.18 µm CMOS library. Post-synthesis simulation results, using worst-case library parameters, indicate that the design can be operated at a clock frequency of 500 MHz. The core cells occupy 0.04 mm² and the total power dissipation is approximately 17.2 mW. Table 1 shows a performance comparison with recent publications. It shows that the design based on the proposed architecture has the lowest power and achieves high speed with only 2 pipeline levels.

Table 1 Performance comparisons

                              Ours            [7]               [6]               [10]
FCW                           32 bit          24 bit            32 bit            11
Truncated phase               12 bit          14 bit            13 bit            11
Amplitude output              10 bit          12 bit            12 bit            11
SFDR (dBc)                    70              80                84                58
Area (mm²)                    0.040           0.075             0.090             0.150
Max. clock (MHz)              500             480               150               30
Power dissipation (µW/MHz)    34.4            72                500               290
Process (µm)                  0.18            0.25              0.18              0.35
Supply voltage (V)            1.8             2.5               1.8               3.3
Pipeline levels (cycle)       2               6                 4                 1
Note                          Single output   Quadrat. output   Quadrat. output   Quadrat. output

5. Conclusion
A new ROM-less low power DDFS has been proposed. A preset value method is used to improve the operating speed of the 32-bit, four-stage pipelined phase accumulator up to 500 MHz, and no clock latency is introduced. The design uses a new mapping technique for the sine function based on a ROM-less lookup table, resulting in a small area. A power gating method is used to lower the power consumption of the PSAC. Simulation results show that the average power is 17.2 mW at 500 MHz with a 70 dBc SFDR in SMIC 0.18 µm CMOS technology. These results show that the proposed DDFS is suitable as an IP core for low-power applications.

Reference
[1] J. Vankka and K. Halonen, Direct Digital Synthesizers: Theory, Design and Applications. Boston, MA: Kluwer, 2001.
[2] J. Tierney, C. Rader, and B. Gold, "A Digital Frequency Synthesizer," IEEE Trans. Audio Electroacoust., vol. AU-19, Mar. 1971, pp. 48-57.
[3] D. Sunderland, R. Strauch, S. Wharfield, H. Peterson, and C. Cole, "CMOS/SOS frequency synthesizer LSI circuit for spread spectrum communications," IEEE J. Solid-State Circuits, vol. SC-19, Aug. 1984, pp. 497-505.
[4] H. T. Nicholas, H. Samueli, and B. Kim, "The Optimization of Direct Digital Frequency Synthesizer Performance in the Presence of Finite Word Length Effects," in Proc. 42nd Annu. Frequency Contr. Symp., June 1988, pp. 357-363.
[5] A. Bellaouar, M. S. Obrecht, A. M. Fahim, and M. I. Elmasry, "Low power direct digital frequency synthesis for wireless communications," IEEE J. Solid-State Circuits, vol. 35, Mar. 2000, pp. 385-390.
[6] J. M. P. Langlois and D. Al-Khalili, "A low power direct digital frequency synthesizers in 0.18-µm CMOS," IEEE CICC'03, pp. 283-286.
[7] D. De Caro, E. Napoli, and A. G. M. Strollo, "High speed Direct Digital Frequency Synthesizers in 0.25-µm CMOS," IEEE CICC'04, pp. 163-166.
[8] Li Jincheng and Yang Huazhong, "A New Architecture of Twiddle Factor Generator for Radix-2 1024-Point FFT," Chinese Journal of Semiconductors, vol. 25, Apr. 2004, pp. 377-382.
[9] Jian Dong Jiang and Edward K. F. Lee, "A low-power segmented nonlinear DAC-based direct digital frequency synthesizer," IEEE J. Solid-State Circuits, vol. 37, Oct. 2002, pp. 1326-1330.
[10] K. I. Palomaki and J. Niittylahti, "A low-power, memoryless direct digital frequency synthesizer architecture," ISCAS'03, pp. II-77 to II-80.
