You are on page 1of 17

信号与系统

课程设计报告

设计题目:快速傅立叶变换(FFT)的计算机实现

完 成 人:王 璠

班号:电气 0612 班

姓名:王 璠

学号:012006019801

二 OO 八年 十一 月
Fast Fourier Transform Computation Using a C8051 MCU

Fan Wang
(Huazhong University of Science & Technology, Wuhan, 430074)

Abstract: A method of calculating FFT using a C8051 MCU is introduced in this


paper. The software is aimed at computational efficiency and accuracy. By three
key techniques: (1)avoidance of multiplication whenever possible; (2)16-bit integer
storage; (3)on-chip PLL, a real-time audio spectrum is achieved. The results are
displayed on a LCD screen and carefully analyzed. Basic theories of FFT and
common problems in practice are discussed in detail.

Ⅰ INTRODUCTION

The Fast Fourier Transform(FFT) is an efficient algorithm for digital N-point DFT
computation. The algorithm was first used by Gauss in 1805, and rediscovered in the
1960s by Cooley and Tukey. The most widely used method of Cooley-Tukey
Algorithm is called Radix-2 Algorithm, which is applied in this paper.
FFT is of great importance in current digital processing systems, and there are
specially designed processors for this kind of use, for example, DSPs. Unfortunately,
the processor introduced in this system---C8051 MCU---is mostly used in simple
control systems and is not specially designed for multiplications and additions.
Therefore, the purpose of this paper is not a practical industrial solution, but a deep
understanding of the basic theories of signal processing systems and the way of
putting theories into practice.

Ⅱ BACKGROUNDS

A. Discrete Fourier Transform and Fast Fourier Transform

For a length N complex sequence x(n), n=0,1,2,…N-1, the discrete Fourier


transform(DFT) is defined by
N −1
X (k ) = ∑ x(n)WNnk , k = 0,1,2,...N − 1 where WN = e − j 2π / N .
n =0

According to this equation, an N point DFT calculation requires N 2 times of


multiplication and N(N-1) times of addition. As N increases, the computational

-1-
complexity becomes larger.
Fast Fourier Transform is a fast structure to compute the DFT. Note that WNnk has

two important properties: periodicity and symmetry, which are the fundamentals of
FFT algorithm. Using the symbol ((nk )) N to represent the remainder from a division

of nk by N, it is easy to prove that W nk = W (( nk )) N . For example, when N=4, we have

W 6 = W 2 ,W 9 = W 1 . Another property can be explained by the equation

W ( nk + N / 2) = −W nk . Again taking N=4 as an example, we have W 3 = −W 1 , W 2 = −W 0 .


Hence, the number of multiplications required can be reduced to some extent.

When N is a power of 2 , say N = 2 M ,where M is a positive integer, the DFT


summation can be split into sums over the odd and even indexes of the input signal:
N −1
X (k ) = ∑ x(n)WNnk
n =0

= ∑ x(n)W
n ~ even
nk
N + ∑ x(n)W
n ~ odd
nk
N

N −1 N −1
2 2
= ∑ x(2n)W
n =0
kn
N /2 +W k
N ∑ x(2n + 1)W
n =0
nk
N /2

= G (k ) + WNk H (k )
N −1 N −1
2 2
where G (k ) = ∑ x(2n)W
n =0
kn
N /2 , H (k ) = ∑ x(2n + 1)W
n =0
nk
N /2 . Now that the previous N-

point DFT becomes two N/2-point DFTs, the complexity is reduced. According to the
N
symmetry property, WNk + N / 2 = −WNk , so we have X (k + ) = G (k ) − WNk H (k ) . In
2
conclusion,
X (k ) = G (k ) + WNk H (k ) k = 0,1,2,...N / 2 − 1

X (k + N / 2) = G (k ) − WNk H (k ) k = 0,1,2,...N / 2 − 1

Suppose that N = 4 , we have the FFT process as the following:


X (0) = G (0) + W40 H (0)
X (1) = G (1) + W41 H (1)
X (2) = G (0) − W40 H (0)
X (3) = G (1) − W41 H (1)

-2-
In the decimation in time FFT, the computation flow graph of the time domain
sequence can be briefly illustrated by a butterfly diagram. A diagram of 8-point
sequences is shown in Figure 1.
The diagram immediately suggests that three loops should be included in the software:
the stage loop, the group loop, and the butterfly loop. As the diagram shows, there are
v = log 2 N stages, each stage m, has 2 v −m groups and each group has 2 m−1 butterflies.

Fig.1. Butterfly diagram of a 8-point FFT algorithm.

B. FFT Bit-Reversal Algorithm

Since the transformed complex numbers are not in normal order, more storage space
is needed by the intermediate number, in order to obtain the right results. To minimize
the storage space required for intermediate data, either the input data is required to be
scrambled or the output data needs to be unscrambled. The butterfly diagram
illustrates that if the input data is in bit-reversed order, the results will be in the
normal order; if the input data is in normal order, the output data can be unscrambled
to normal by a process of bit-reversal.

-3-
C. Anti-aliasing Filtering and Windowing

If the input signal has an high frequency component which is higher than half of the
sampling frequency, it may be mis-interpreted to have lower frequencies which are
not exist. This effect is known as frequency aliasing. To avoid this, the input signal
should be low-pass filtered.
When dealing with practical sytems, the sampling signal is time limited. In time
domain, the signal is multiplied by a square wave function (a rectangular window) of
the time. But the software makes the assumption that the data is periodic and repeated
infinitely. Taking DFT transform of a sampled sinusoidal signal, we obtain a spectrum
which has the shape of a sinc function. In this way, the energy of the sinusoidal signal
leaks to a higher spectrum. In practice, the main lobe of the sinc function is desired to
be narrow and high enough, while the side lobe level should be low. Several sampling
windows has been invented to make a practical trade-off of the two. Triangular,
Hanning, Hamming and Blackman windows are all common windows in practical,
which will be discussed in advance in Section Ⅲ.

D. Function Overview of the C8051 MCU

Fig.2. C8051F120 architecture overview.


The processor of the computational system is a high performance C8051 MCU by
Silicon Laboratories™—C8051F120. The chip is specially designed for high speed
mixed signal processing, meeting the requirements of low level customers. It’s rich
analog on-chip sources make it a ideal tool for many mixed signal systems. For
example, a 12-bit, 8-channel, 100kHz sample rate ADC, two 12-bit DACs, two analog
comparators, a voltage reference, etc. Its pipelined instruction architecture 8051 core

-4-
enables it to execute most of instruction set in only 1 or 2 system clocks. Under the
help of the on-chip PLL, the peak speed of the chip can reach as high as 100MIPS. At
the same time, a 2-cycle 16×16 MAC engine accelerates the computation efficiently.

Ⅲ SOFTWARE DESIGN

The computation flow of N point FFT is in the following five steps:


1. Data collection.
2. Windowing the input data.
3. Changing the data into bit-reversed order.
4. FFT computation.
5. Magnitude modification and LCD display.
There are five important functions correspond to the five steps. The explanations are
as follows:

A. ADC0_ISR( )

The 12-bit on-chip ADC0 collects data at a designed a sample rate. When a Timer3
overflow happens, a data-collecting conversation is initiated. The data is then stored
in the ADC0H :ADC0L registers in a left-justified form, that is, the higher 12 bits are
the data bits, while the last 4 bits are not used. So the range of the input data is from
0x0000 to 0xFFF0. The overflow interval of Timer3 is determined by a configurable
macro variable SAMPLE_RATE, so that the sample rate of the system is determined.
After the N-point data samples have been collected, ADC0 interrupt is disabled. Then
comes the next function.

B. WindowCalc( )

Four types of windows are available in this software. Each of them has its own
coefficients, which will be multiplied with the time domain data. The coefficient
equations are listed below.
Type Equation
0: None(Rectangular) W(n)=1
1: Triangular W(n)=n/(N/2) (0≤n≤N/2)
W(n)=2-n/(N/2) (N/2<n<N)
2: Hanning W(n)=0.5-0.5cos(2πn/N)
3: Hamming W(n)=0.54-0.46cos(2πn/N)
4: Blackman W(n) = 0.42 - 0.5cos(2πn / N)
+0.08cos(4πn / N)

-5-
The window coefficients are pre-calculated and stored in the header file
FFT_Code_Tables.h. The type define variable WINDOW_TYPE can be selected from
0 to 4.
The function also converts the single-ended data collected by ADC0 into differential,
in order to remove the DC component of the input signal. The process is carried out
by XOR the input data with 0x8000, which has the same effect with subtracting
32768 from the input data, so that the data ranges from 32752 to -32768.

C. Bit_Reverse( )

From the butterfly diagram, we know that either the input data or the output FFT array
requires a bit reverse operation. In this software, the input data is bit-reversed. Take a
8-point FFT as an example:
Previous Data Previous Index New Index Desired Data
x(0) 000 000 x(0)
x(1) 001 100 x(4)
x(2) 010 010 x(2)
x(3) 011 110 x(6)
x(4) 100 001 x(1)
x(5) 101 101 x(5)
x(6) 110 011 x(3)
x(7) 111 111 x(7)
To save data storage space, only half of the bit-reverse array is stored, the other half is
useless for the data will be exchanged by the first half of operation. For example, the
8-point FFT listed above may use the following bit-reverse array:
BRTable[ ]={0,2,1,3};
And the sentence NewIndex=BRTable[PreviousIndex]*2 will help us location the nth
reversed index.

D. Int_FFT( )

In this algorithm, complex numbers are divided into real parts and imaginary parts.
The modified input data has only real parts and is stored in the array Real[]. The later
stages will produce imaginary parts as well, which will be stored in Imag[]. Again
check the butterfly equation, when WN is divided into real part and imaginary part,

the single set of butterfly equation should be:

-6-
Re1 = Re1 + (cos(x) × Re2 + sin(x) × Im2)
Re2 = Re1 - (cos(x) × Re2 + sin(x) × Im2)
Im1 = Im1 + (cos(x) × Im2 - sin(x) × Re2)
Im2 = Im1 - (cos(x) × Im2 - sin(x) × Re2)

where Re1 = ReArray[indexA], Re2 = ReArray[indexB]


Im1 = ImArray[indexA], Im2 = ImArray[indexB]
x = the angle: 2×pi×(sin_index/NUM_FFT), in radians.
According to Figure.1, the flow chart of the FFT is drawn as the following:

Fig.3. Software flow pattern of Int_FFT().

-7-
To improve the speed of the FFT computation, integer numbers are used instead of
float numbers. All of them are set to be 16-bit int. Multiplications are avoided
whenever possible, and all divisions are implemented using right shifts. But this
method may cause some asymmetrical effects when the number is negative and the
deleted bits are non-zero. For example, -3/2=-1, but -3>>1=-2. A one should be added
to the result. In general, if m is a signed int or long int, to obtain m / 2 k , the following

sentences will be helpful:


if((m<0)&&(the lower k bits of m is non-zero))
result=m>>k+1;
else result=m>>k;
To minimize the calculation error and minimize the data storage requirement at the
same time, the data is multiplied in a 32-bit form and stored in a 16-bit form. A union
type of variable is defined to meet the needs.
typedef union IBALONG // Integer or Byte-addressable LONG
{
long l; // long: var.l
unsigned int i[2]; // u int: var.i[0]:var.i[1]
unsigned char b[4]; // u char: var.b[0]:var.b[1]:var.b[2]:var.b[3]
} IBALONG;
The computation is carried out by the following sentences:
TempL.l = (long)TempImA - TempReB; //Convert to long.
if ((TempL.l < 0)&&(0x01 & TempL.b[3])) //Divided by 2.
TempImA = (TempL.l >> 1) + 1;
else
TempImA = TempL.l >> 1;
Another example is:
TempL.i[1] = 0;
TempL.i[0] = TempReA;
TempL.l = TempL.l >> 1;
ReTwid.l += TempL.l;
if ((ReTwid.l < 0)&&(ReTwid.i[1])) //Divided by 65536
TempReA2 = ReTwid.i[0] + 1;
else
TempReA2 = ReTwid.i[0];
Note that TempL and ReTwid in the above two examples both have the type
IBALONG.
Why does this software have to calculate these divisions? There are two reasons:
SinTable[], in which the needed sinusoidal values are stored, has the type of int; the
FFT algorithm has an inherent processing gain, which will result in the overflow of
data if not handled. Therefore, when the SinTable[] is used, it has to be divided by
65536, and at the end of each butterfly equation, the intermediate variable has to be
divided by 2. More details are shown in the appendix.

-8-
E. Spectrum_Display( )

The FFT result is in complex form, of which the real part is stored in Real[], and the
imaginary part is stored in Imag[]. Based on the two arrays, the magnitude spectrum
and phase spectrum can be easily obtained. This software only calculates power
spectrum of the input signal. The spectrum is displayed on a 128×64 point LCD
screen.

Ⅳ EXPERIMENT RESULTS AND DISCUSSION

An EE1643C Function Signal Generator serves as the system’s testing signal source.
The signal is first filtered by an anti-aliasing filter with 32kHz cut-off frequency (1
order passive low-pass), and then sampled by ADC0 at a certain sampling rate. Four
configurable macro variables determine the system’s performance. The chart shows
what they are.
Name of variable Function
NUM_FFT Number of points of FFT algorithm
SAMPLE_RATE Sampling rate of ADC0 in Hz
WINDOW_TYPE Type of windows
RUN_ONCE Program runs once when the value is 0,many times
when the value is non-zero.
Further discussions are started by changing the values of the above 4 variables or the
wave form of the function generator.

A. Function Change

(i) Magnitude and Frequency Change of a Sinusoidal Signal


The power spectrum displays the energy distribution on every frequency. When the
magnitude of a sinusoidal signal changes, the height of the “energy bars” will
change.

Fig.TR.1. A 2.5kHz 1.3V sine. Fig.TR.2 A 2.5kHz 2.0V sine.

-9-
When the frequency of the sinusoidal changes, the “energy bar” which has the same
height will move to another place.

Fig.TR.3 A 2.5kHz 2.0V sine. Fig.TR.4 A 5kHz 2.0V sine.


(ii) Wave Form Change
Sinusoidal signals have only one frequency component, just as the above four pictures
illustrate. But triangular and rectangular have abundant frequency components.

Fig.TR.5. Triangular signal. Fig.TR.6. Rectangular signal.

B. Software Configuration Change

(i) NUM_FFT Change


More points of FFT transform will result in a more accurate spectrum with lower
distortion and energy leakage.

Fig.TR.7. A 256-point square wave Fig.TR.8. A 128-point square wave


(ii) WINDOW_TYPE Change
As mentioned in the background section, window type has some effects on the
spectrum leakage. The leaked energy becomes smaller after the process of windowing
the input data. Only Blackman window is checked in this experiment, for other
windows have similar effects which are hardly distinguished from each other.

- 10 -
Fig.TR.9. Rectangular without windowing. Fig.TR.10.Rectangular with Blackman window.
(iii) SAMPLE_RATE Change
All the above examples are signal samples of a sampling rate of 40kHz. According to
the Nyquist sampling theory, frequencies lower than 20kHz can be easily located. So
the full range of the screen is set to be 20kHz. But when the sampling rate is divided
by 2, a range of 10kHz spectrum is obtained.

Fig.TR.11. 5k sine at 40kHz sampling rate. Fig.TR.12. 5k sine at 20kHz sampling rate.
(iv) RUN_ONCE Change
To check the speed of the system, RUN_ONCE is set to zero. For all audio signals at a
sampling rate of 40kHz, the LCD screen never blinks and a steady frequency
spectrum of the input time domain signals can be read without any obstacles. This
proves the real-time property of the system.

C. Property of Symmetry

The real part of FFT output exhibits even symmetry and imaginary part of FFT output
exhibits odd symmetry. This results in the symmetry of frequency spectrum. That is,
for N-point FFT products, point N/2+1 is the same as point N/2-1, point N/2+2 is the
same as point N/2-2, etc. To illustrate this property, set the full range of the screen to
40kHz. Here are the results.

Fig.TR.13. 2kHz square wave at full range. Fig.TR.14. 13kHz square wave at full range.

- 11 -
Ⅴ CONCLUSION

In this paper, a method of Fast Fourier Transform computation based on C8051 MCU
is introduced. Computational efficiency and accuracy are improved by the use of three
techniques: (1) avoidance of multiplication whenever possible; (2) 16-bit integer
storage; (3) on-chip PLL. Basic concepts of DFT and FFT are introduced in detail.
Practical problems such as bit-reversal, frequency aliasing and spectrum leakage are
carefully discussed and well solved by software or hardware solutions. Some typical
wave forms’ spectrums and properties of the FFT spectrums are checked by a LCD
screen.

Ⅵ REFERENCES

[1] Edward W. Kanmen and Bonnie S. Heck, Fundamentals of Signals and Systems:
Using the Web and MATLAB, 2nd ed. Pearson Education, 2002.
[2] Julius O. Smith Ⅲ, Mathematics of the Discrete Fourier Transform (DFT) with
Audio Applications, 2nd ed. http://ccrma.stanford.edu/~jos/mdft/
[3] Junli Zheng, Signals and systems, 2nd ed. Higher Education Press, 2000.
[4] AN142 “FFT Routines for the C8051F12X Family”, Rev.1.1. Inc. ©Silicon
Laboratories, 2003.
[5] “FFT Library-Module user’s Guide C28x Foundation Software”. Inc. ©Texas
Instruments, 2002.

- 12 -
APPENDIX

A. Important Codes
//-----------------------------------------------------------------------------
// Int_FFT
//-----------------------------------------------------------------------------
void Int_FFT(int ReArray[], int ImArray[])
{
#if (NUM_FFT >= 512)
unsigned int sin_index, g_cnt, s_cnt; // Keeps track of the proper index
unsigned int indexA, indexB; // locations for each calculation
#endif
#if (NUM_FFT <= 256)
unsigned char sin_index, g_cnt, s_cnt; // Keeps track of the proper index
unsigned char indexA, indexB; // locations for each calculation
#endif

unsigned int group = NUM_FFT/4, stage = 2;


long CosVal, SinVal;
long TempImA, TempImB, TempReA, TempReB, TempReA2, TempReB2;
IBALONG ReTwid, ImTwid, TempL;

// FIRST STAGE
indexA = 0;
for (g_cnt = 0; g_cnt < NUM_FFT/2; g_cnt++)
{
indexB = indexA + 1;
TempReA = ReArray[indexA];
TempReB = ReArray[indexB];

// Calculate new value for ReArray[indexA]


TempL.l = (long)TempReA + TempReB;
if ((TempL.l < 0)&&(0x01 & TempL.b[3]))
TempReA2 = (TempL.l >> 1) + 1;
else
TempReA2 = TempL.l >> 1;

// Calculate new value for ReArray[indexB]


TempL.l = (long)TempReA - TempReB;
if ((TempL.l < 0)&&(0x01 & TempL.b[3]))
ReArray[indexB] = (TempL.l >> 1) + 1;
else
ReArray[indexB] = TempL.l >> 1;

ReArray[indexA] = TempReA2;
ImArray[indexA] = 0; // set Imaginary locations to '0'
ImArray[indexB] = 0;
indexA = indexB + 1;
}// END OF FIRST STAGE

while (stage <= NUM_FFT/2)


{
indexA = 0;
sin_index = 0;
for (g_cnt = 0; g_cnt < group; g_cnt++)
{
for (s_cnt = 0; s_cnt < stage; s_cnt++)
{
indexB = indexA + stage;
TempReA = ReArray[indexA];
TempReB = ReArray[indexB];

- 13 -
TempImA = ImArray[indexA];
TempImB = ImArray[indexB];
// The following first checks for the special cases when the angle "x" is
// equal to either 0 or pi/2 radians. In these cases, unnecessary
// multiplications have been removed to improve the processing speed.
if (sin_index == 0) // corresponds to "x" = 0 radians
{
TempL.l = (long)TempReA + TempReB;
if ((TempL.l < 0)&&(0x01 & TempL.b[3]))
TempReA2 = (TempL.l >> 1) + 1;
else
TempReA2 = TempL.l >> 1;

// Calculate new value for ReArray[indexB]


TempL.l = (long)TempReA - TempReB;
if ((TempL.l < 0)&&(0x01 & TempL.b[3]))
TempReB2 = (TempL.l >> 1) + 1;
else
TempReB2 = TempL.l >> 1;

// Calculate new value for ImArray[indexB]


TempL.l = (long)TempImA - TempImB;
if ((TempL.l < 0)&&(0x01 & TempL.b[3]))
TempImB = (TempL.l >> 1) + 1;
else
TempImB = TempL.l >> 1;

// Calculate new value for ImArray[indexA]


TempL.l = (long)TempImA + TempImB;
if ((TempL.l < 0)&&(0x01 & TempL.b[3]))
TempImA = (TempL.l >> 1) + 1;
else
TempImA = TempL.l >> 1;
}
else if (sin_index == NUM_FFT/4) // corresponds to "x" = pi/2 radians
{
// Calculate new value for ReArray[indexB]
TempL.l = (long)TempReA - TempImB;
if ((TempL.l < 0)&&(0x01 & TempL.b[3]))
TempReB2 = (TempL.l >> 1) + 1;
else
TempReB2 = TempL.l >> 1;

// Calculate new value for ReArray[indexA]


TempL.l = (long)TempReA + TempImB;
if ((TempL.l < 0)&&(0x01 & TempL.b[3]))
TempReA2 = (TempL.l >> 1) + 1;
else
TempReA2 = TempL.l >> 1;

// Calculate new value for ImArray[indexB]


TempL.l = (long)TempImA + TempReB;
if ((TempL.l < 0)&&(0x01 & TempL.b[3]))
TempImB = (TempL.l >> 1) + 1;
else
TempImB = TempL.l >> 1;

// Calculate new value for ImArray[indexA]


TempL.l = (long)TempImA - TempReB;
if ((TempL.l < 0)&&(0x01 & TempL.b[3]))
TempImA = (TempL.l >> 1) + 1;
else
TempImA = TempL.l >> 1;
}
else
{
if (sin_index > NUM_FFT/4)

- 14 -
{
SinVal = SinTable[(NUM_FFT/2) - sin_index];
CosVal = -SinTable[sin_index - (NUM_FFT/4)];
}
else
{
SinVal = SinTable[sin_index];
CosVal = SinTable[(NUM_FFT/4) - sin_index];
}

// The SIN and COS values are used here to calculate part of the
// Butterfly equation
ReTwid.l = ((long)TempReB * CosVal) +
((long)TempImB * SinVal);
ImTwid.l = ((long)TempImB * CosVal) -
((long)TempReB * SinVal);

TempL.i[1] = 0;
TempL.i[0] = TempReA;
TempL.l = TempL.l >> 1;
ReTwid.l += TempL.l;
if ((ReTwid.l < 0)&&(ReTwid.i[1]))
TempReA2 = ReTwid.i[0] + 1;
else
TempReA2 = ReTwid.i[0];
// Calculate new value for ReArray[indexB]
TempL.l = TempL.l << 1;
TempL.l -= ReTwid.l;
if ((TempL.l < 0)&&(TempL.i[1]))
TempReB2 = TempL.i[0] + 1;
else
TempReB2 = TempL.i[0];
// Calculate new value for ImArray[indexA]
TempL.i[1] = 0;
TempL.i[0] = TempImA;
TempL.l = TempL.l >> 1;
ImTwid.l += TempL.l;
if ((ImTwid.l < 0)&&(ImTwid.i[1]))
TempImA = ImTwid.i[0] + 1;
else
TempImA = ImTwid.i[0];
// Calculate new value for ImArray[indexB]
TempL.l = TempL.l << 1;
TempL.l -= ImTwid.l;
if ((TempL.l < 0)&&(TempL.i[1]))
TempImB = TempL.i[0] + 1;
else
TempImB = TempL.i[0];
}

ReArray[indexA] = TempReA2;
ReArray[indexB] = TempReB2;
ImArray[indexA] = TempImA;
ImArray[indexB] = TempImB;
indexA++;
sin_index += group;
} // END of stage FOR loop (s_cnt)

indexA = indexB + 1;
sin_index = 0;
} // END of group FOR loop (g_cnt)

group /= 2;
stage *= 2;
} // END of While loop
} // END Int_FFT

- 15 -
B. Photo of the System

- 16 -

You might also like