Abstract: We present the FPGA/NIOS® implementation of an adaptive finite impulse response (FIR) filter based on linear prediction to suppress radio frequency interference (RFI). This technique will be used for experiments that observe coherent radio emission from extensive air showers induced by ultrahigh-energy cosmic rays. These experiments are designed to make a detailed study of the development of the electromagnetic part of air showers. Therefore, these radio signals provide information that is complementary to that obtained by water-Cherenkov detectors, which are predominantly sensitive to the particle content of an air shower at ground. The radio signals from air showers are caused by the coherent emission due to geomagnetic and charge-excess processes. These emissions can be observed in the frequency band between 10 and 100 MHz. However, this frequency range is significantly contaminated by narrow-band RFI and other human-made distortions. A FIR filter implemented in the FPGA logic segment of the front-end electronics of a radio sensor significantly improves the signal-to-noise ratio. In this paper we discuss an adaptive filter, which is based on linear prediction. The coefficients for the linear predictor are dynamically refreshed and calculated in the virtual NIOS® processor, which is implemented in the same FPGA chip. The Levinson recursion, used to obtain the filter coefficients, is also implemented in the NIOS® and is partially supported by direct multiplication in the DSP blocks of the logic FPGA segment. Tests confirm that the linear predictor can be an alternative to other methods involving multiple time-to-frequency domain conversions using an FFT procedure. These multiple conversions draw heavily on the power consumption of the FPGA and are avoided by the linear prediction approach. The FIR filter has been successfully tested in the Altera® development kits with the EP4CE115F29C7 from the Altera® Cyclone® IV family and the EP3C120F780C7 from the Cyclone® III family at a 170 MHz sampling rate, a 12-bit I/O resolution, and an internal 30-bit dynamic range. Most of the slow floating-point NIOS® calculations have been moved to the FPGA logic segments as extended fixed-point operations, which significantly reduced the refreshing time of the coefficients used in the linear prediction.
$$\ldots \sum_{i=1}^{p} a_i\, x(n - D - i) \tag{1}$$
where $p$ is the number of coefficients and $D$ is the delay line. The delay line implies that there is a gap between the samples that are used for the prediction and the sample that is to be predicted (Fig. 1). This delay line is necessary to allow transient signals to pass through the filter unaltered.
The method described here first makes a prediction $\hat{x}(n)$ of the samples $x(n)$ with
$$\hat{x}(n) = \sum_{i=1}^{p} a_i\, x(n - D - i) \tag{2}$$
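A minimal sketch of the delayed prediction and subtraction described above, assuming the prediction is the weighted sum of `p` samples that lie `delay + 1` to `delay + p` samples in the past. The function name `lp_residual`, the carrier frequency, the gap of 8 samples and the two closed-form coefficients are illustrative assumptions, not values from the paper. For a single pure sinusoid two coefficients are enough, so the carrier cancels while a short transient passes through at its original position.

```python
import math

def lp_residual(x, a, delay):
    """Subtract the delayed linear prediction from x.

    x_hat(n) = sum_{i=1..p} a[i-1] * x[n - delay - i]; the first delay + p
    outputs stay zero because no earlier samples exist to predict from.
    """
    p = len(a)
    e = [0.0] * len(x)
    for n in range(delay + p, len(x)):
        x_hat = sum(a[i - 1] * x[n - delay - i] for i in range(1, p + 1))
        e[n] = x[n] - x_hat
    return e

# Illustrative values (assumptions): one carrier and a gap of 8 samples.
omega = 2.0 * math.pi * 0.11
delay = 8
# For a pure sinusoid, x(n) is an exact linear combination of
# x(n - delay - 1) and x(n - delay - 2) (a trigonometric identity).
a = [math.sin((delay + 2) * omega) / math.sin(omega),
     -math.sin((delay + 1) * omega) / math.sin(omega)]

x = [math.sin(omega * n + 0.3) for n in range(200)]
pulse_at = 100
x[pulse_at] += 1.0  # short transient on top of the carrier

e = lp_residual(x, a, delay)
print(abs(e[50]), e[pulse_at])
```

In this sketch the residual away from the transient is at the level of floating-point rounding, while the sample at `pulse_at` keeps the added amplitude because the samples used for its prediction lie before the gap.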
I. INTRODUCTION
Linear prediction [1] is a method widely used in real-time
audio processing such as the CELP algorithm [2], [3] in mobile
phones. With the advent of faster signal processing techniques
in FPGAs it is now possible to apply similar techniques to the
Manuscript received May 31, 2012; revised June 30, 2012.
Z. Szadkowski is with the University of Łódź, Department of Physics and Applied Informatics, Faculty of High-Energy Astrophysics, 90-236 Łódź, Poland (e-mail: zszadkow@kfd2.phys.uni.lodz.pl, phone: +48 42 635 56 59).
E. D. Fraenkel and Ad M. van den Berg are with the Kernfysisch Versneller Instituut of the University of Groningen, Groningen, The Netherlands.
This work was funded by the National Center of Researches and Development under NCBiR Grant No. ERA/NET/ASPERA/02/11 and by the Stichting voor Fundamenteel Onderzoek (FOM) in The Netherlands.
We are now left with the task of finding the optimal solution for the predictor coefficients $a_i$. An effective way of obtaining the best solution is to assume Gaussianity and minimize the estimated mean square error,
$$E = \frac{1}{N}\sum_{n=0}^{N-1} e^2(n) = \frac{1}{N}\sum_{n=0}^{N-1} \{x(n) - \hat{x}(n)\}^2 \tag{3}$$
Fig. 2.
Fig. 1. An illustration of the method. The sine wave represents the signal that is fitted (although in actuality no sinusoidal fit is performed like this), where sample number $n$ is predicted by using the samples $n - D - p$ to $n - D - 1$. The predicted values are then subtracted from the original values, as illustrated by the green dots below the horizontal bar with the minus sign, reducing the variance of the signal.
The error is minimal when its derivative with respect to each coefficient vanishes,
$$\frac{\partial E}{\partial a_j} = 0 \tag{4}$$
for $j = 1, \ldots, p$, which gives
$$\sum_{n=0}^{N-1} x(n - D - j)\, x(n) = \sum_{n=0}^{N-1} \sum_{i=1}^{p} a_i\, x(n - D - i)\, x(n - D - j) \tag{5}$$
With the covariances
$$r(j) = \sum_{n=0}^{N-1} x(n - D - j)\, x(n) \tag{6}$$
$$R(i, j) \approx R(|i - j|) = \sum_{n=0}^{N-1} x(n - D - i)\, x(n - D - j) \tag{7}$$
this can be written in matrix form as
$$\mathbf{r} = \mathbf{R}\, \mathbf{a} \tag{8}$$
where the vector $\mathbf{r}$ and the symmetric Toeplitz (diagonal-constant) matrix $\mathbf{R}$ are known from Eqs. (6) and (7). The special properties of the matrix $\mathbf{R}$ allow it to be represented by a $p$-dimensional vector. Solving for $\mathbf{a}$ yields the coefficients of the filter from equation (1). The method is illustrated in Fig. 1.
In order to find a numerical solution to Eq. (8) some optimizations are possible: because of the diagonal-constant form of the matrix $\mathbf{R}$ one can replace the conventional algorithm using Gauss elimination by Levinson recursion [5], reducing the time complexity from $O(p^3)$ to $O(p^2)$.
The desired signal from a cosmic-ray-induced air shower is a short pulse that is contained well within 600 ns. Thus at a sampling frequency of 200 MHz, where 600 ns corresponds to 120 samples, we choose $D = 128$ for the delay line.
We have hitherto assumed that the characteristics of the background noise are constant. Although this is a good assumption for short time intervals, the environment may change over a larger period of time. Thus, in order to make the filter adaptive, we require that the coefficients are recalculated after a predefined time interval. Initial tests have shown that a sub-second recalculation of the coefficients can be easily
B. Fudge Factor
2 00
(9)
=+
=1
= R
(10)
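Returning to the Toeplitz system of Eq. (8), the following is a minimal sketch of a Levinson-type recursion for a symmetric Toeplitz matrix with a general right-hand side. It is not the NIOS® implementation. The names `levinson_solve`, `autocov` and `design_predictor`, the test signal (one carrier plus white noise), and the choices of 4 coefficients and a delay of 8 are assumptions for illustration. The covariances are estimated as plain lagged sums, so the matrix entries depend only on the lag difference.

```python
import math
import random

def levinson_solve(c, y):
    """Solve T a = y for a symmetric Toeplitz T with T[i][j] = c[|i - j|]."""
    f = [1.0 / c[0]]            # forward vector: T_k f = (1, 0, ..., 0)
    a = [y[0] / c[0]]           # solution of the leading 1 x 1 system
    for k in range(1, len(y)):
        # Error made by zero-extending the forward vector to size k + 1.
        eps_f = sum(c[k - i] * f[i] for i in range(k))
        denom = 1.0 - eps_f * eps_f
        f_ext = f + [0.0]
        b_ext = [0.0] + f[::-1]  # backward vector = reversed forward vector
        f = [(fe - eps_f * be) / denom for fe, be in zip(f_ext, b_ext)]
        b = f[::-1]
        # Correct the zero-extended solution with the new backward vector.
        eps_a = sum(c[k - i] * a[i] for i in range(k))
        a = [ai + (y[k] - eps_a) * bi for ai, bi in zip(a + [0.0], b)]
    return a

def autocov(x, lag):
    """Unnormalized lagged sum of products."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

def design_predictor(x, p, delay):
    """Coefficients from r (lags delay+1 .. delay+p) and R (lags 0 .. p-1)."""
    c = [autocov(x, k) for k in range(p)]
    r = [autocov(x, delay + j) for j in range(1, p + 1)]
    return levinson_solve(c, r)

# Small exact check: T = [[4,2,1],[2,4,2],[1,2,4]], solution close to [1,-2,3].
exact = levinson_solve([4.0, 2.0, 1.0], [3.0, 0.0, 9.0])

# Assumed test signal: narrow-band carrier plus white noise.
rng = random.Random(0)
omega = 2.0 * math.pi * 0.11
x = [math.sin(omega * n) + rng.gauss(0.0, 0.1) for n in range(2000)]
p, delay = 4, 8
a = design_predictor(x, p, delay)
start = delay + p
resid = [x[n] - sum(a[i - 1] * x[n - delay - i] for i in range(1, p + 1))
         for n in range(start, len(x))]
power_in = sum(v * v for v in x[start:]) / len(resid)
power_out = sum(v * v for v in resid) / len(resid)
print(exact, power_out / power_in)
```

Each step of the recursion costs a number of operations proportional to the current size, which is where the quadratic total cost comes from.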
II. RESULTS
We present two analyses to show the viability of this
method. The first analysis is based on a realistic offline
situation. The second set of results is based on a simulation
akin to the final implementation into the FPGA.
A. A Realistic Example
We here discuss an example based on a realistic situation
such as possible measurements performed with a radio array.
These pulses are measured in the north-south (N/S) and east-west (E/W) polarizations of the antennas. Panel a) of Fig. 4
shows such a pulse together with its corresponding frequency
spectrum, in which the contaminating RFI lines are clearly
visible. We now investigate two methods for removing the
RFI from these radio traces.
Fig. 4. In panel a) we see the original radio trace for two polarizations (N/S and E/W) that is contaminated with RFI. In panel b) we see that the RFI is reduced by applying a Hann window in the time domain and then setting the following frequency bands to zero: 46.2-46.9, 55.1-55.4, 58.80-59.1, 66.8-67.5 and 71.19-71.3 MHz. In panel c) we see the results for the method with the linear predictor. The delay line for this case is $D = 120$ samples and the number of predictor coefficients is $p = 50$. The leading zeros occur because no data is available to be predicted at the beginning of the trace.
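The frequency-domain method of panel b) can be sketched as follows: window the trace, transform, zero the contaminated bins, and transform back. This is an illustrative sketch, not the analysis code used for Fig. 4. The naive DFT stands in for an FFT to keep the example self-contained, and the function names, the 200 MHz sampling rate, the 256-sample trace and the single 48.5-51.5 MHz band are assumptions.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * ((k * m) % n) / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def notch_filter(x, fs_mhz, bands_mhz):
    """Apply a Hann window, zero all bins inside the given bands, go back."""
    n = len(x)
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * m / n) for m in range(n)]
    spectrum = dft([xi * wi for xi, wi in zip(x, window)])
    for k in range(n):
        f = min(k, n - k) * fs_mhz / n  # frequency of bin k and its mirror
        if any(lo <= f <= hi for lo, hi in bands_mhz):
            spectrum[k] = 0.0
    return [v.real for v in dft(spectrum, inverse=True)]

# Assumed example: 50 MHz carrier sampled at 200 MHz plus a unit pulse.
fs_mhz, n = 200.0, 256
x = [math.sin(2.0 * math.pi * 50.0 / fs_mhz * m) for m in range(n)]
pulse_at = 128
x[pulse_at] += 1.0

cleaned = notch_filter(x, fs_mhz, [(48.5, 51.5)])
print(cleaned[pulse_at], max(abs(v) for m, v in enumerate(cleaned) if m != pulse_at))
```

Note that this approach needs a forward and an inverse transform for every trace, which is the cost the linear predictor avoids.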
Fig. 5. In this figure we see spectrograms for the simulation. Panel a) contains
the spectrogram of the original noise. Panel b) contains the spectrogram of the
predicted signal and panel c) represents the cleaned signal which is essentially
panel a) minus panel b). The color scale is logarithmic and is clipped for
values lower than 30 dB.
( + + )
()
=
=
=
()
( + )
(12)
=0
( + ) =
(13)
1
()( + + )
=0
1
()( + )
(14)
=0
=0
()
=1 =0
( ) (15)
=0
[Schematic labels: Ext_ADC[11..0], 2048-word dual-port RAM (dpram), NIOS® ports SMPL[11..0], SMPL_ADDR[10..0], LP_COEFF_ADDR[6..0], LP_COEFF_DATA[17..0], UART, clocks ADC_Clk and NIOS_Clk.]
Fig. 6. A schematic connection of input data to the NIOS® via dual-port RAM. Calculated coefficients from the output port COEFF_DATA are next written to the set of registers addressed by the port COEFF_ADDR.
DK-DSP-3C120N (Altera), with a Cyclone® III FPGA EP3C120F780C7 (576 multipliers of 9 × 9 bits), and DE2-115 (Terasic), with a Cyclone® IV FPGA EP4CE115F29C7 (532 multipliers of 9 × 9 bits)
Fig. 7.
[Schematic labels for Figs. 7 and 8: reg12, reg18 and reg30 registers, signed multipliers (mult: dataa[11..0], datab[17..0], result[29..0]), adders, altshift_taps delay_line, Addr_decoder[0..3], COEFF_DATA, ADD_128 (result[36..0], LATENCY=7), sub, FIR output.]
Fig. 8. A schematic of the FIR filter. The linear predictor coefficients calculated in the NIOS® are loaded sequentially into 18-bit registers (reg18) selected by the output of the address decoder. Next, these coefficients are simultaneously reloaded to the 2nd level of registers connected to the multipliers. A cascade connection is necessary to refresh the new set of coefficients in a single clock cycle. Successively, the delayed ADC data in the 12-bit register chain (reg12) are multiplied by the coefficients in the embedded DSP multipliers and summed in the 128-input 30-bit routine. Finally, the processed signal is subtracted from the original one from the ADC. The memory-based delay line (Altera routine altshift_taps) allows the synchronization of signals. The graph shows only 4 stages; the next 124 stages are similar. The outputs of the multipliers are connected to the inputs of the ADD_128 routine.
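The datapath of Fig. 8 can be mimicked in integer arithmetic: 12-bit samples times 18-bit coefficients give products of at most 30 bits, the products are summed, and the rescaled sum is subtracted from the ADC sample. The sketch below assumes 16 fractional bits in the coefficients (`FRAC_BITS`), truncation by an arithmetic right shift, and saturation of the output. None of these details is stated in the text, and the function names and test signal are illustrative.

```python
import math

COEFF_BITS = 18   # coefficient register width named in the caption
SAMPLE_BITS = 12  # ADC sample width named in the caption
FRAC_BITS = 16    # assumed number of fractional coefficient bits

def saturate(value, bits):
    """Clamp an integer to the signed range of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))

def quantize_coeffs(a):
    """Convert real coefficients to signed fixed-point integers."""
    return [saturate(round(c * (1 << FRAC_BITS)), COEFF_BITS) for c in a]

def fir_fixed(samples, coeffs_q, delay):
    """Integer model: multiply, accumulate, rescale, subtract."""
    p = len(coeffs_q)
    out = [0] * len(samples)
    for n in range(delay + p, len(samples)):
        acc = sum(coeffs_q[i - 1] * samples[n - delay - i]
                  for i in range(1, p + 1))
        prediction = acc >> FRAC_BITS  # arithmetic shift rounds toward -inf
        out[n] = saturate(samples[n] - prediction, SAMPLE_BITS)
    return out

# Assumed test signal: carrier of amplitude 1000 ADC counts plus a pulse.
omega = 2.0 * math.pi * 0.11
delay = 8
a = [math.sin((delay + 2) * omega) / math.sin(omega),
     -math.sin((delay + 1) * omega) / math.sin(omega)]
samples = [round(1000.0 * math.sin(omega * n + 0.3)) for n in range(300)]
pulse_at = 150
samples[pulse_at] += 500

out = fir_fixed(samples, quantize_coeffs(a), delay)
carrier_residual = max(abs(out[n]) for n in range(delay + 2, pulse_at))
print(carrier_residual, out[pulse_at])
```

In this model the carrier is reduced to a few ADC counts of rounding error, while the pulse sample keeps its amplitude to within the same error.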
logic block (Fig. 9). In this proposed implementation the samples are written to the 1st port. The addresses of the 2nd and 3rd ports are supplied by 11-bit counters, preloaded from the NIOS® with the starting indices. The samples with synchronously shifted indices read from the triple-port RAM are multiplied with the original ones and summed in the ACCU procedure (ALTMULT_ACCUM Altera library routine). After the accumulation cycles, the covariance for the selected index, available on the ACCU data output, is transferred to the NIOS® processor via a 32-bit port. These cycles are repeated $p$ times (once per coefficient) for the calculation of the covariance vector $\mathbf{r}$ and again $p$ times for the calculation of the Toeplitz matrix $\mathbf{R}$.
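A software model of the counter-driven multiply-accumulate loop described above, given as a sketch under stated assumptions: the RAM is a Python list of 2048 words, both read addresses advance by one per clock and wrap like 11-bit counters, and the accumulator is an unbounded Python integer, so the overflow handling of the real accumulator is not modelled. The function names and the test data are hypothetical.

```python
ADDR_BITS = 11                # counter width given in the text
RAM_WORDS = 1 << ADDR_BITS    # 2048 words

def accumulate(ram, start_a, start_b, cycles):
    """One accumulation run: sum of products of two synchronously read ports."""
    addr_a, addr_b, acc = start_a, start_b, 0
    for _ in range(cycles):
        acc += ram[addr_a] * ram[addr_b]
        addr_a = (addr_a + 1) % RAM_WORDS  # 11-bit counters wrap around
        addr_b = (addr_b + 1) % RAM_WORDS
    return acc

def covariance_vector(ram, first_lag, count, cycles):
    """Repeat the run once per lag, preloading the second counter each time."""
    return [accumulate(ram, 0, first_lag + k, cycles) for k in range(count)]

# Assumed test data: small signed integers that fit into 12 bits.
ram = [((37 * n) % 101) - 50 for n in range(RAM_WORDS)]
r_vec = covariance_vector(ram, first_lag=9, count=4, cycles=2000)
direct = [sum(ram[n] * ram[n + 9 + k] for n in range(2000)) for k in range(4)]
print(r_vec == direct)
```

Calling `covariance_vector` with `first_lag=0` gives the lags needed for the Toeplitz matrix, and a first lag just beyond the delay line gives those needed for the right-hand side.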
A multiplication inside the FPGA logic block by the use of the embedded DSP multipliers takes only a single clock cycle. A sequential multiplication with an accumulation of temporary products takes simply $N$ clock cycles for $N$ samples. Thus, the calculation of both covariances requires approximately $2pN$ clock cycles. For a 200 MHz clock, theoretically it takes only ca. 1.3 ms. However, in reality the initialization of processes of the NIOS® slows the calculation down a bit to ca. 4 ms. This would still be 2.5 orders of magnitude faster than when performed by the soft-core processor only. We would get a real speed optimization and a refreshment time reduction in the order of 1.5 s, effectively reducing the refreshment time to the time needed to perform the Levinson procedure.
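The timing figures above can be checked with a short calculation. The values of 64 coefficients and 2048 samples are assumptions (a 64-stage filter and the 2048-word sample RAM); they are chosen here because they reproduce the quoted 1.3 ms, not because the text states them.

```python
# Assumed parameters: 64 coefficients, 2048 stored samples.
p = 64
n_samples = 2048
f_clk_hz = 200e6

# Two covariance sets, p accumulation runs each, n_samples cycles per run.
cycles = 2 * p * n_samples
t_ideal_ms = cycles / f_clk_hz * 1e3

# Quoted measured time and the quoted factor of 2.5 orders of magnitude.
t_measured_ms = 4.0
speedup = 10 ** 2.5
t_soft_core_s = t_measured_ms * 1e-3 * speedup

print(cycles, t_ideal_ms, t_soft_core_s)
```

Under these assumptions the ideal time is about 1.31 ms, and 4 ms scaled by 2.5 orders of magnitude is about 1.3 s, the same order as the quoted saving of 1.5 s.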
B. Power Consumption
The power consumption for the FIR filters with 64 and 128 stages is given in Table I. The filters were fed with real data from the AERA experiment (for 170 MHz, extracted from the internal ROM) for the EP3C120F780C7 FPGA (the heart
TABLE I
POWER CONSUMPTION FOR THE CYCLONE III FPGA - EP3C120F780C7 FOR 64/128-STAGE FIR FILTERS AND 170 MHZ GLOBAL CLOCK

FIR stages  Channels  LE     LE (%)  DSP  DSP (%)  Core power (sim.)  Core power (meas.)
64          1         4749   4%      128  22%      757 mW             684 mW
128         1         9329   8%      256  44%      1343 mW            1160 mW
64          2         9399   8%      256  44%      1447 mW            1263 mW
128         2         18724  16%     512  89%      2625 mW            2255 mW
[Schematic labels: Ext_ADC[11..0], 12-bit 2048-word 3portram, two 11-bit up counters (cnt11), multiplier-accumulator (accu, result[31..0], overflow), NIOS® ports COVARIANCES[31..0], COEFF_ADDR[6..0], COEFF_DATA[17..0], SMPL_ADDR[10..0], A_CTRL[7..0], TEST[7..0], UART.]
Fig. 9. A schematic connection of the input data to the NIOS® via triple-port RAM. The samples are written to the 1st port with an ADC speed of 170/200 MHz. The addresses of the 2nd and 3rd ports are supplied by 11-bit counters, preloaded from the NIOS® with the starting indices. The samples with synchronously changed indices read from the triple-port RAM are multiplied together and summed in the ACCU procedure (ALTMULT_ACCUM, Altera's library routine). After the accumulation cycles, the covariance for the selected index, available on the ACCU data output, is transferred to the NIOS® processor via a 32-bit port. Calculated coefficients from the output port COEFF_DATA are next written to the set of registers addressed by the port COEFF_ADDR.
Development Kit equipped with the EP3C120F780C7 Cyclone®
C. Design Improvement
higher calculation power in comparison to the NIOS® soft-core processor implemented in Cyclone® III/IV FPGAs working with a 75 MHz clock.
The prototype will be built on the 5CSEA6 chip with 110 kLE, 5 Mbit of internal memory and 224 18 × 19-bit multipliers. These FPGAs should appear on the market at the end of 2012.
The current design allows tests of algorithms and optimization of details as well as an estimation of crucial parameters
for the global system.
[Photo labels: arbitrary waveform generator; development kit DK-DSP-3C120N with the FPGA EP3C120F780C7.]
Fig. 10. A view of the measurement system. For the 170 MHz sampling, AERA data stored in the internal ROM are used for the estimation and measurements of the power consumption. The 14-bit ADC (clocked at 150 MHz) on the Terasic daughter board (left connection via HSMC) is supplied from the arbitrary waveform generator (Agilent 33250A) and provides digitized test signals for design verification: agreement with earlier PC-calculated patterns, an estimation of quantization errors, etc. The NIOS® provides additional diagnostics, sending supporting data via the UART port and the RS232 interface located on the Industrial Communication Board (right connection via HSMC) to the external host.