
FPGA/NIOS Implementation of an Adaptive FIR Filter Using Linear Prediction to Reduce Narrow Band RFI for Radio Detection of Cosmic Rays

Zbigniew Szadkowski, Member, IEEE, E. D. Fraenkel, Ad M. van den Berg
Abstract: We present the FPGA/NIOS implementation of an adaptive finite impulse response (FIR) filter based on linear prediction to suppress radio frequency interference (RFI). This technique will be used in experiments that observe coherent radio emission from extensive air showers induced by ultra-high-energy cosmic rays. These experiments are designed to study in detail the development of the electromagnetic part of air showers. The radio signals therefore provide information that is complementary to that obtained by water-Cherenkov detectors, which are predominantly sensitive to the particle content of an air shower at ground level. The radio signals from air showers are caused by coherent emission due to geomagnetic and charge-excess processes. These emissions can be observed in the frequency band between 10 and 100 MHz. However, this frequency range is significantly contaminated by narrow-band RFI and other human-made distortions. A FIR filter implemented in the FPGA logic segment of the front-end electronics of a radio sensor significantly improves the signal-to-noise ratio. In this paper we discuss an adaptive filter based on linear prediction. The coefficients of the linear predictor are dynamically refreshed and calculated in the virtual NIOS processor, which is implemented in the same FPGA chip. The Levinson recursion, used to obtain the filter coefficients, is also implemented in the NIOS and is partially supported by direct multiplication in the DSP blocks of the FPGA logic segment. Tests confirm that the linear predictor can be an alternative to other methods involving multiple time-to-frequency domain conversions using an FFT procedure. These multiple conversions add considerably to the power consumption of the FPGA and are avoided by the linear prediction approach. The FIR filter has been successfully tested in Altera development kits with the EP4CE115F29C7 from the Altera Cyclone IV family and the EP3C120F780C7 from the Cyclone III family at a 170 MHz sampling rate, a 12-bit I/O resolution, and an internal 30-bit dynamic range. Most of the slow floating-point NIOS calculations have been moved to the FPGA logic segments as extended fixed-point operations, which significantly reduced the refresh time of the coefficients used in the linear prediction.

Manuscript received May 31, 2012; revised June 30, 2012.
Z. Szadkowski is with the University of Łódź, Department of Physics and Applied Informatics, Faculty of High-Energy Astrophysics, 90-236 Łódź, Poland (e-mail: zszadkow@kfd2.phys.uni.lodz.pl, phone: +48 42 635 56 59).
E. D. Fraenkel and Ad M. van den Berg are with the Kernfysisch Versneller Instituut of the University of Groningen, Groningen, The Netherlands.
This work was funded by the National Center of Researches and Development under NCBiR Grant No. ERA/NET/ASPERA/02/11 and by the Stichting voor Fundamenteel Onderzoek (FOM) in The Netherlands.
978-1-4673-1084-0/12/$31.00 ©2012 IEEE

I. INTRODUCTION
Linear prediction [1] is a method widely used in real-time audio processing, such as the CELP algorithm [2], [3] in mobile phones. With the advent of faster signal processing techniques in FPGAs it is now possible to apply similar techniques to the real-time processing of radio signals in the 10-100 MHz region [4].
In this document we describe a method to remove the RFI lines from a signal contaminated by narrow-band emitters, and we show how this can be done without reducing the amplitude of transient signals, which is essential for the detection of cosmic-ray-induced radio pulses. In addition, we compare this method with other techniques that require a treatment of the signal in the frequency domain. We show that the method is both adaptive and efficient in terms of energy consumption, and that it can be used as an alternative to methods in the frequency domain.

A. Mathematical Background
A continuous stream of data is assumed to be contaminated with narrow-band RFI and is represented by the samples $x(n)$. Our goal is to design a FIR filter with coefficients $a_k$ such that the narrow-band RFI in the resulting signal $y(n)$ is reduced as much as possible. The filter can be described as
$$y(n) = x(n) - \sum_{k=1}^{N} a_k\, x(n-\tau-k) \tag{1}$$
where $N$ is the number of coefficients and $\tau$ is the delay line. The delay line implies that there is a gap between the samples that are used for the prediction and the sample that is to be predicted (Fig. 1). This delay line is necessary to allow transient signals to pass through the filter unaltered.
The method described here first makes a prediction $\hat{x}(n)$ of the samples $x(n)$ with
$$\hat{x}(n) = \sum_{k=1}^{N} a_k\, x(n-\tau-k) \tag{2}$$
Subsequently the prediction is subtracted from the original signal $x(n)$ such that
$$y(n) = x(n) - \hat{x}(n) \tag{1}$$
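The filter of Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal floating-point sketch, not the FPGA implementation: it assumes the coefficients $a_k$ are already known, stores $a_1 \dots a_N$ in a 0-based list, and sets the output to zero wherever fewer than $N+\tau$ earlier samples exist (compare the leading zeros in Fig. 4c). The function name `lp_filter` is our own.

```python
import math
from typing import List, Sequence


def lp_filter(x: Sequence[float], a: Sequence[float], tau: int) -> List[float]:
    """Return y(n) = x(n) - sum_{k=1}^{N} a_k x(n - tau - k), Eqs. (1)-(2).

    a[k - 1] holds the coefficient a_k, so N = len(a).
    Output samples without N + tau earlier samples are set to zero.
    """
    n_coeff = len(a)
    y: List[float] = []
    for n in range(len(x)):
        if n - tau - n_coeff < 0:
            y.append(0.0)  # not enough history to form a prediction
            continue
        # Prediction x_hat(n) of Eq. (2) from x(n-tau-1) ... x(n-tau-N)
        x_hat = sum(a[k - 1] * x[n - tau - k] for k in range(1, n_coeff + 1))
        y.append(x[n] - x_hat)  # Eq. (1): subtract the prediction
    return y


if __name__ == "__main__":
    # A single narrow-band "RFI line": a pure sinusoid is predicted exactly
    # by N = 2 coefficients, even across a delay line of tau samples.
    w, tau = 2 * math.pi * 0.05, 5
    x = [math.sin(w * n + 0.3) for n in range(200)]
    a = [math.sin(w * (tau + 2)) / math.sin(w), -math.sin(w * (tau + 1)) / math.sin(w)]
    y = lp_filter(x, a, tau)
    print(max(abs(v) for v in y[tau + 2:]))  # residual at floating-point rounding level
```

The two demo coefficients follow from the identity $\sin\omega\, s(p+d) = \sin((d+1)\omega)\, s(p) - \sin(d\omega)\, s(p-1)$ for $s(m) = \sin(\omega m + \phi)$ with $d = \tau + 1$. They illustrate why a stationary narrow-band line is cancelled, whereas a short transient cannot be predicted from samples lying more than $\tau$ samples in the past.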

We are now left with the task of finding the optimal predictor coefficients $a_k$. An effective way of obtaining the best solution is to assume Gaussianity and minimize the estimated mean square error,
$$\varepsilon = \frac{1}{M}\sum_{n=0}^{M-1} y^2(n) = \frac{1}{M}\sum_{n=0}^{M-1} \{x(n) - \hat{x}(n)\}^2 \tag{3}$$
where $M$ is a number large enough to obtain convergence of this estimate. In our present case $M$ should be at least a thousand samples or more. In order to obtain the best values of the coefficients $a_k$, the mean square error is minimized by the requirement that
$$\frac{\partial \varepsilon}{\partial a_j} = 0, \qquad j = 1, \dots, N \tag{4}$$
which yields the following system of equations,
$$\sum_{n=0}^{M-1} x(n-\tau-j)\,x(n) = \sum_{k=1}^{N} a_k \sum_{n=0}^{M-1} x(n-\tau-j)\,x(n-\tau-k), \qquad j = 1, \dots, N \tag{5}$$
We can now define the covariances
$$r(j) = \sum_{n=0}^{M-1} x(n-\tau-j)\,x(n) \tag{6}$$
$$R(j,k) \equiv R(|j-k|) = \sum_{n=0}^{M-1} x(n-\tau-j)\,x(n-\tau-k) \tag{7}$$
and rewrite equation (5) in vector form as
$$\mathbf{r} = \mathbf{R}\,\mathbf{a} \tag{8}$$
where the vector $\mathbf{r}$ and the symmetric Toeplitz (diagonal-constant) matrix $\mathbf{R}$ are known from Eqs. (6) and (7). The special properties of the matrix $\mathbf{R}$ allow it to be represented by the $N$-dimensional vector $(R(0), \dots, R(N-1))$. Solving for $\mathbf{a}$ yields the coefficients of the filter from equation (1). The method is illustrated in Fig. 1.
In order to find a numerical solution to Eq. (8) some optimizations are possible: because of the diagonal-constant form of the matrix $\mathbf{R}$, one can replace the conventional algorithm using Gauss elimination by the Levinson recursion [5], reducing the time complexity from $O(N^3)$ to $O(N^2)$.
The desired signal from a cosmic-ray-induced air shower is a short pulse that is contained well within 600 ns. Thus, at a sampling frequency of 200 MHz, where 600 ns corresponds to 120 samples, we choose $\tau = 128$ for the delay line.
We have hitherto assumed that the characteristics of the background noise are constant. Although this is a good assumption for short time intervals, the environment may change over a longer period of time. Thus, in order to make the filter adaptive, we require that the coefficients are recalculated after a predefined time interval. Initial tests have shown that a sub-second recalculation of the coefficients can easily be obtained. This recalculation rate is more than sufficient for our experimental conditions.
Fig. 2 shows the structure of the FIR filter based on linear prediction according to Eq. (1). The covariances of Eqs. (6) and (7) can be calculated either in the virtual NIOS processor (Fig. 6) or much faster in the FPGA fast logic block (Fig. 9). The NIOS processor solves the matrix Eq. (8) and provides the coefficients needed for the FIR filter. The calculated coefficients are transferred from the NIOS to the fast logic block, uploaded into the appropriate registers and used as the FIR coefficients in the ADC data filtering (see Fig. 8).

Fig. 1. An illustration of the method. The sine wave represents the signal that is fitted (although in actuality no sinusoidal fit is performed like this), where sample number $n$ is predicted by using the samples $n-\tau-N$ to $n-\tau-1$. The predicted values are then subtracted from the original values, as illustrated by the green dots below the horizontal bar with the minus sign, reducing the variance of the signal.

Fig. 2. The data flow of the FIR filter based on linear prediction.

B. Fudge Factor
The solution of Eq. (8) can be found using Gauss elimination or by exploiting the diagonal-constant (Toeplitz) symmetry of the matrix $\mathbf{R}$ with the Levinson recursion. The use of double-precision (64-bit) floating-point values ensures that neither of these methods incurs significant numerical rounding errors. Nevertheless, an uncertainty is introduced due to the finite amount of background signal that is available to estimate the coefficients. Because the signal is bandwidth-limited, this uncertainty can cause the eigenvalues of $\mathbf{R}$ to fluctuate dangerously close to zero, which in turn causes the amplitudes of $a_k$ to become very large and results in an unstable filter, as can be seen in Fig. 3a). To prevent the coefficients from becoming very large, we introduce a fudge factor $\lambda$ into the expression for the mean square error, which stabilizes the result:
$$\tilde{\varepsilon} = \varepsilon + \frac{\lambda}{M}\, R(0) \sum_{k=1}^{N} a_k^2 \tag{9}$$
This extra term adds to $\tilde{\varepsilon}$ the requirement that the amplitudes of the coefficients remain low as $\tilde{\varepsilon}$ is minimized. The solution to $\partial\tilde{\varepsilon}/\partial a_j = 0$ can then be written as
$$\mathbf{r} = \tilde{\mathbf{R}}\,\mathbf{a} \tag{10}$$
similar to the previous equation (8), but now with
$$\tilde{R}_{jk} = R_{jk}\,(1+\lambda) \ \text{for}\ j = k \quad\text{and}\quad \tilde{R}_{jk} = R_{jk} \ \text{for}\ j \neq k \tag{11}$$
In Fig. 3b) we have used a fudge factor of $\lambda = 0.1$, and in Fig. 3c) the implementation of the fudge factor with a value of $\lambda = 1$ is shown (the value used in this paper).
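Because the fudge factor of Eq. (11) only scales the diagonal, $\tilde{\mathbf{R}}$ is still a symmetric Toeplitz matrix, defined by the lags $R(0)(1+\lambda), R(1), \dots, R(N-1)$, so the Levinson recursion still applies. The Python sketch below solves Eq. (10) for a general right-hand side $\mathbf{r}$ (since $r(j)$ uses lag $\tau + j$, this is not the Yule-Walker special case). It is an illustrative double-precision implementation, not the authors' NIOS C code; the function names and the input convention (`R` holds $R(0) \dots R(N-1)$, `r` holds $r(1) \dots r(N)$) are our own assumptions.

```python
from typing import List, Sequence


def levinson_solve(t: Sequence[float], b: Sequence[float]) -> List[float]:
    """Solve T x = b in O(n^2) for symmetric Toeplitz T with T[j][k] = t[|j-k|].

    Assumes all leading principal minors of T are non-singular.
    """
    n = len(b)
    if len(t) < n:
        raise ValueError("t must provide lags 0 .. n-1")
    f = [1.0 / t[0]]   # forward vector: T_m f = e_1 (first unit vector)
    x = [b[0] / t[0]]  # partial solution: T_m x = b[0:m]
    for m in range(1, n):
        # Error produced in the new last row when f is extended with a zero
        eps = sum(t[m - i] * f[i] for i in range(m))
        denom = 1.0 - eps * eps
        if denom == 0.0:
            raise ZeroDivisionError("singular leading minor")
        back = f[::-1]  # by symmetry, reversed f solves T_m back = e_m
        f = [(fi - eps * bi) / denom for fi, bi in zip(f + [0.0], [0.0] + back)]
        # Correct the zero-extended solution along the new backward vector
        eps_x = sum(t[m - i] * x[i] for i in range(m))
        back = f[::-1]
        x = [xi + (b[m] - eps_x) * bi for xi, bi in zip(x + [0.0], back)]
    return x


def lp_coefficients(R: Sequence[float], r: Sequence[float], lam: float = 1.0) -> List[float]:
    """Coefficients a_1..a_N of Eqs. (10)-(11): the diagonal R(0) is scaled by (1 + lam)."""
    n = len(r)
    t = [R[0] * (1.0 + lam)] + list(R[1:n])
    return levinson_solve(t, r)
```

With `lam=0.0` the same function solves the unregularized Eq. (8); with `lam=1.0`, the value used in this paper, the diagonal is doubled, which pulls the smallest eigenvalue of $\tilde{\mathbf{R}}$ away from zero and shrinks the coefficient amplitudes.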

Fig. 3. Panel a) shows an example of the calculation of 64 coefficients for various starting points when no fudge factor is applied. Panel b) shows the coefficients for a fudge factor of $\lambda = 0.1$. Panel c) shows how the instability is removed when choosing $\lambda = 1$, which is used in this paper. However, as can be seen in panel b), even for $\lambda = 0.1$ the stability improves considerably.

II. RESULTS
We present two analyses to show the viability of this method. The first analysis is based on a realistic offline situation. The second set of results is based on a simulation akin to the final implementation in the FPGA.

A. A Realistic Example
We discuss here an example based on a realistic situation, such as measurements performed with a radio array. The pulses are measured in the north-south (N/S) and east-west (E/W) polarizations of the antennas. Panel a) of Fig. 4 shows such a pulse together with its corresponding frequency spectrum, in which the contaminating RFI lines are clearly visible. We now investigate two methods for removing the RFI from these radio traces.
The first method is a procedure in the frequency domain based on a digital notch filter. A Hann window is applied to the trace in the time domain, an FFT is taken and the offending frequencies are set to zero. Subsequently the inverse FFT is calculated to obtain the cleaned trace. The results of this method are shown in panel b) of Fig. 4. Similar results can also be obtained in the time domain with a chain of IIR filters [6]. However, both methods require that the removed frequencies are determined manually.
The second method that we illustrate here is based on linear prediction. The covariances $\mathbf{r}$ and $\mathbf{R}$ are obtained from the 1500 samples in the last part of the trace, which contains only background. The full trace is then cleaned using the filter from Eq. (1). The results of this method are shown in panel c) of Fig. 4.
Both methods significantly improve the signal-to-noise ratio. In order to compare both methods in more detail, we also compared the signal-to-noise ratios of several such radio pulses. We concluded that the signal-to-noise ratios based on linear prediction are equivalent to those obtained from the method in the frequency domain.

Fig. 4. In panel a) we see the original radio trace for two polarizations (N/S and E/W) that is contaminated with RFI. In panel b) we see that the RFI is reduced by applying a Hann window in the time domain and then setting the following frequency bands to zero: 46.2-46.9, 55.1-55.4, 58.80-59.1, 66.8-67.5 and 71.19-71.3 MHz. In panel c) we see the results for the method with the linear predictor. The delay line for this case is $\tau = 120$ samples and the number of predictor coefficients is $N = 50$. The leading zeros occur because no earlier data are available for a prediction at the beginning of the trace.

B. Simulation
In order to ascertain the effectiveness of the method in a changing noise environment, we have performed another type of analysis on a PC with a relatively large simulated radio trace consisting of 20 480 000 samples. This is equivalent to about 0.1 s of data taking at a hypothetical sampling frequency of 200 MHz. The recalculation of the coefficients is done every 1024 samples. Thus, this recalculation is done at a much higher refresh rate than will be obtained in the real implementation in the FPGA. However, in order to keep a simulation like this feasible, a down-scaling of approximately two or three orders of magnitude with respect to a real implementation is necessary. Nevertheless, this does not affect the general principle or validity of this test.
The simulated trace is created by applying a digital rectangular band-pass filter (of 30 to 80 MHz) to white noise obtained from a Gaussian random number generator. RFI lines including amplitude and frequency modulation are added with the use of sine functions. In addition, one frequency is added that turns abruptly on and off. Finally, the values are digitized by converting the floating-point numbers to integers in a range of 2048 ADC counts.
Panel a) of Fig. 5 shows a spectrogram of this simulated noisy environment. Panel b) of the same figure shows the predicted signal and panel c) shows the cleaned signal.

Fig. 5. In this figure we see spectrograms for the simulation. Panel a) contains the spectrogram of the original noise. Panel b) contains the spectrogram of the predicted signal and panel c) represents the cleaned signal, which is essentially panel a) minus panel b). The color scale is logarithmic and is clipped for values lower than 30 dB.

III. FPGA IMPLEMENTATION

In order to implement the method as an algorithm suitable for numeric computation, and in particular in the FPGA, it is necessary to specify the sums and multiplications such that the appropriate boundary conditions for the available data are satisfied. With the requirement that the indices of $x(n)$ lie within the range $0 \dots M-1$, the formulas (2), (6) and (7) can be rewritten by shifting and re-ordering the indices,
$$\hat{x}(n+N+\tau) = \sum_{k=0}^{N-1} a_{N-k}\, x(n+k) \tag{12}$$
$$r(j) = \sum_{n=0}^{M-\tau-j-1} x(n)\, x(n+\tau+j) \tag{13}$$
$$R(k) = \sum_{n=0}^{M-k-1} x(n)\, x(n+k) \tag{14}$$
and the required equation to be solved by the Levinson recursion becomes
$$\sum_{n=0}^{M-\tau-j-1} x(n)\, x(n+\tau+j) = \sum_{k=1}^{N} a_k \sum_{n=0}^{M-|j-k|-1} x(n)\, x(n+|j-k|), \qquad j = 1, \dots, N \tag{15}$$
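The index bounds of Eqs. (13) and (14) translate directly into loops over a buffer of $M$ samples. The sketch below (Python; the function name and return convention are our own) uses Python integers, so for 12-bit ADC samples the sums are exact, loosely mirroring an integer accumulator. It also makes visible that each covariance costs about $M$ multiply-accumulate operations, i.e. roughly $2NM$ in total for $\mathbf{r}$ and $\mathbf{R}$.

```python
from typing import List, Sequence, Tuple


def covariances(x: Sequence[int], n_coeff: int, tau: int) -> Tuple[List[int], List[int]]:
    """Return (r, R) with r[j-1] = r(j), j = 1..N (Eq. 13), and R[k] = R(k), k = 0..N-1 (Eq. 14)."""
    m = len(x)
    if m <= tau + n_coeff:
        raise ValueError("buffer too short for the requested tau and N")
    # Eq. (13): r(j) = sum_{n=0}^{M-tau-j-1} x(n) x(n + tau + j)
    r = [sum(x[n] * x[n + tau + j] for n in range(m - tau - j))
         for j in range(1, n_coeff + 1)]
    # Eq. (14): R(k) = sum_{n=0}^{M-k-1} x(n) x(n + k)
    R = [sum(x[n] * x[n + k] for n in range(m - k)) for k in range(n_coeff)]
    return r, R


if __name__ == "__main__":
    r, R = covariances([1, 2, 3, 4], n_coeff=1, tau=1)
    print(r, R)  # [11] [30]
```

The outputs can be passed directly as `R` and `r` to a Toeplitz solver such as the Levinson recursion.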

A solution to equation (15) for high values of $N$ requires significant calculation. The coefficients needed for the FIR filter can be calculated either by an external host or by a micro-controller located in the local system. The transmission of data to an external host takes several seconds due to the relatively narrow transmission bandwidth. Thus, the time gained by a fast calculation in an external computer is lost to the slow data transmission. Additionally, relying on a permanent link for the calculation is not a reliable approach. A much more reliable calculation can be performed locally, even if the calculation itself takes more time.
The modern FPGA chips proposed for data processing allow, in addition to parallel calculations, the implementation of a local micro-controller section (the virtual NIOS processor). The analysis showed that a NIOS approach gives very satisfactory results, and this solution can be treated as the final one. Computational support by an external host is not needed.

Fig. 6. A schematic connection of input data to the NIOS via dual-port RAM (2048 words of 12 bits). Calculated coefficients from the output port COEFF_DATA are next written to the set of registers addressed by the port COEFF_ADDR.
The samples $x(n)$ needed for the calculation of the filter coefficients have to be frozen before a transfer to the NIOS is performed. The dual-port RAM library routine allows samples to be written via the left port at full speed (the same as the ADC) and subsequently, when the memory is filled, the data can be transferred from the memory to the NIOS via the right port at a speed corresponding to the NIOS processing. An array of $M = 2048$ words is sufficient. The transfer of the necessary data takes approximately 3 ms. After the data transfer, the covariances $\mathbf{r}$ and $\mathbf{R}$ according to Eqs. (13) and (14) are calculated inside the NIOS. The calculation time is 1.5 s for a 64-bit double representation of the variables. Theoretically, the calculation time could be reduced by using a representation based on 48-bit float variables; however, the accuracy then falls dramatically from $10^{-11}$ to only $10^{-2}$. We conclude that the float representation is too short for the required accuracy, and for all calculations in this paper only the double representation is used.
The standard Gauss procedure for 64 and 128 coefficients takes 1.964 s and 14.78 s, respectively. The Levinson procedure is much more efficient and takes only 0.191 s and 0.758 s, respectively. The approximation errors are at the same level, $10^{-11}$.
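As a quick consistency check of these timings (our own arithmetic, using only the numbers quoted above), doubling $N$ from 64 to 128 should multiply the run time by about $2^3 = 8$ for an $O(N^3)$ method and by about $2^2 = 4$ for an $O(N^2)$ method:

```latex
\frac{t_{\mathrm{Gauss}}(128)}{t_{\mathrm{Gauss}}(64)} = \frac{14.78\,\mathrm{s}}{1.964\,\mathrm{s}} \approx 7.5 \approx 2^3,
\qquad
\frac{t_{\mathrm{Levinson}}(128)}{t_{\mathrm{Levinson}}(64)} = \frac{0.758\,\mathrm{s}}{0.191\,\mathrm{s}} \approx 3.97 \approx 2^2
```

The measured ratios are therefore in line with the complexities stated for Gauss elimination and the Levinson recursion.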
The FIR filter from equation (1) is built from a chain of delay registers, multipliers and adders (Fig. 8). According to equation (2), a prediction is calculated as the sum of the products of the coefficients $a_k$ and the delayed samples $x(n-\tau-k)$. The samples are given by 12-bit ADCs. The coefficients can be implemented in a fixed-point 18-bit representation. This 12 × 18-bit multiplication (with a 30-bit result) requires only 2 DSP blocks.
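The word widths used here follow the usual rule that a signed $p$-bit by $q$-bit product needs $p + q$ bits and that summing $2^s$ such products adds $s$ bits, giving $12 + 18 = 30$ bits per product and $30 + 7 = 37$ bits for 128 taps. The Python sketch below checks this on the worst case. The number of fractional bits used to quantize the coefficients (`frac_bits = 15`) and the saturation on overflow are our own assumptions; the text specifies only the 18-bit word width.

```python
import math
from typing import List, Sequence, Tuple


def signed_bounds(bits: int) -> Tuple[int, int]:
    """Smallest and largest value of a two's-complement word of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits_signed(value: int, bits: int) -> bool:
    lo, hi = signed_bounds(bits)
    return lo <= value <= hi


def quantize(coeffs: Sequence[float], bits: int = 18, frac_bits: int = 15) -> List[int]:
    """Round coefficients to fixed point and saturate to the word width (assumed scaling)."""
    lo, hi = signed_bounds(bits)
    return [max(lo, min(hi, round(c * (1 << frac_bits)))) for c in coeffs]


def mac_widths(sample_bits: int, coeff_bits: int, n_taps: int) -> Tuple[int, int]:
    """Bits needed for one product and for the sum of n_taps products."""
    product_bits = sample_bits + coeff_bits
    return product_bits, product_bits + math.ceil(math.log2(n_taps))


if __name__ == "__main__":
    prod_bits, sum_bits = mac_widths(12, 18, 128)
    s_min, _ = signed_bounds(12)
    c_min, _ = signed_bounds(18)
    worst = 128 * s_min * c_min  # largest possible magnitude of the 128-tap sum
    print(prod_bits, sum_bits, fits_signed(worst, sum_bits), fits_signed(worst, sum_bits - 1))
    # prints: 30 37 True False
```

The worst case is the product of the two most negative values, $(-2^{11})(-2^{17}) = 2^{28}$, which needs 30 bits; 128 of them sum to $2^{35}$, which needs 37 bits.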
For the tests two types of development kits have been used:
• DK-DSP-3C120N (Altera), with a Cyclone III FPGA EP3C120F780C7 (576 multipliers of 9 × 9 bits), and
• DE2-115 (Terasic), with a Cyclone IV FPGA EP4CE115F29C7 (532 multipliers of 9 × 9 bits).
Both FPGAs contain enough DSP multipliers to implement even 2 channels (2 FIRs for 2 polarizations) with 128 coefficients each.
Adding 128 temporary sums extends the internal data width to 37 bits. Altera provides the PARALLEL_ADD routine, which allows a sum of 128 inputs with 30-bit width. However, the routine is relatively slow, even with 7 clock cycles of latency. A standard cascade chain of lpm_add_sub routines supported by lpm_ff registers (Fig. 7) dramatically increases the speed.
Finally, the processed signal is subtracted from the original one from the ADC. Altera provides useful library procedures: multiplication of signed variables and parallel addition of multiple variables (even 128 in a single routine). Various configurations (FIR length, coefficient width and internal approximations in the pipeline stages) will be optimized as well to minimize the power consumption. A relative accuracy of $10^{-11}$ for double variables is easily achieved.
The development kit DE2-115 with Cyclone IV has been used for the development of the NIOS code in the C language. This kit is equipped with an RS232 port used for the transmission of large amounts of data to the external computer for analysis. Unfortunately, it is not equipped with any system that supports a measurement of the power consumption.
The power consumption has been measured in the development kit DK-DSP-3C120N with a Cyclone III FPGA, which is equipped with an excellent power measurement system.

Fig. 7. A part of a cascade chain calculating the sum of 128 inputs (30-bit adders followed by 30-bit DFF registers clocked by ADC_Clk).

Fig. 8. A schematic of the FIR filter. The linear predictor coefficients calculated in the NIOS are loaded sequentially into 18-bit registers (reg18) selected by the output of the address decoder. Next, these coefficients are simultaneously reloaded into the second level of registers connected to the multipliers. This cascade connection is necessary to refresh the new set of coefficients in a single clock cycle. Subsequently, the delayed ADC data in the 12-bit register chain (reg12) are multiplied by the coefficients in the embedded DSP multipliers and summed in the 128-input 30-bit routine. Finally, the processed signal is subtracted from the original one from the ADC. The memory-based delay line (Altera routine altshift_taps) allows the synchronization of signals. The graph shows only 4 stages; the next 124 stages are similar. The outputs of the multipliers are connected to the inputs of the ADD_128 routine.

A. Possible Speed Optimization
The calculation of the covariances $\mathbf{r}$ and $\mathbf{R}$ according to Eqs. (13) and (14) inside the NIOS takes ca. 1.5 s and could be significantly reduced if it were moved from the NIOS to the logic block (Fig. 9). In this proposed implementation the samples are written to the 1st port. The addresses of the 2nd and 3rd ports are supplied by 11-bit counters, preloaded from the NIOS with the starting indices. The samples with synchronously shifted indices read from the triple-port RAM are multiplied with each other and summed in the ACCU procedure (ALTMULT_ACCUM Altera library routine). After $(M-k)$ cycles the covariance for the selected index, available on the output of ACCU, is transferred to the NIOS processor via a 32-bit port. The $(M-k)$ cycles are repeated $N$ times for the calculation of $\mathbf{r}$ and again $N$ times for the calculation of $\mathbf{R}$.
A multiplication inside the FPGA logic block using the embedded DSP multipliers takes only a single clock cycle. A sequential multiplication with an accumulation of temporary products therefore takes simply $M$ clock cycles. Thus, the calculation of both covariances requires approximately $2NM$ clock cycles. For a 200 MHz clock this theoretically takes only ca. 1.3 ms (for example $2 \cdot 64 \cdot 2048 = 262\,144$ cycles for $N = 64$ and $M = 2048$). However, in reality the initialization processes of the NIOS slow the calculation down to ca. 4 ms. This would still be 2.5 orders of magnitude faster than the calculation performed by the soft-core processor alone. We would obtain a real speed optimization and a reduction of the refresh time of the order of 1.5 s, effectively reducing the refresh time to the time needed to perform the Levinson procedure.
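The cycle count can be illustrated with a behavioural model of the proposed logic-block engine. The Python sketch below mimics two read counters into the sample RAM that advance together, one multiply-accumulate per clock as in ALTMULT_ACCUM, and counts the clock cycles. It ignores pipelining, RAM latency and the NIOS handshake, and all names are hypothetical; it is not the HDL design. Its results equal Eqs. (13) and (14), and the cycle total stays just below $2NM$ because each pass runs only $M - \text{lag}$ cycles.

```python
from typing import List, Sequence, Tuple


def mac_pass(ram: Sequence[int], start_a: int, start_b: int, n_cycles: int) -> int:
    """One accumulator pass: both address counters advance by one every clock."""
    acc = 0
    addr_a, addr_b = start_a, start_b
    for _ in range(n_cycles):
        acc += ram[addr_a] * ram[addr_b]  # one multiply-accumulate per clock cycle
        addr_a += 1
        addr_b += 1
    return acc


def covariance_engine(ram: Sequence[int], n_coeff: int, tau: int) -> Tuple[List[int], List[int], int]:
    """Return (r, R, clock_cycles) for Eqs. (13) and (14)."""
    m = len(ram)
    cycles = 0
    r: List[int] = []
    R: List[int] = []
    for j in range(1, n_coeff + 1):  # N passes for r(j), lag tau + j
        lag = tau + j
        r.append(mac_pass(ram, 0, lag, m - lag))
        cycles += m - lag
    for k in range(n_coeff):  # N passes for R(k), lag k
        R.append(mac_pass(ram, 0, k, m - k))
        cycles += m - k
    return r, R, cycles


if __name__ == "__main__":
    # Cycle count only (zero data): N = 64 coefficients, M = 2048 samples, tau = 128
    _, _, cycles = covariance_engine([0] * 2048, n_coeff=64, tau=128)
    print(cycles, cycles / 200e6)  # clock cycles and seconds at a 200 MHz clock
```

For $N = 64$, $M = 2048$ and $\tau = 128$ the model counts 249 856 cycles, about 1.25 ms at 200 MHz, slightly below the $2NM$ estimate because of the shortened passes.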
B. Power Consumption
The power consumption for the FIR filters with 64 and 128 stages is given in Table I. For the EP3C120F780C7 FPGA (the heart of the development kit DK-DSP-3C120N), the filters were supplied with real data from the AERA experiment (for 170 MHz, read from the internal ROM), both for the simulation and for the real measurements. The measured core power is ca. 15% lower than the simulated one.

TABLE I
Power Consumption for the Cyclone III FPGA EP3C120F780C7 for 64/128-Stage FIR Filters and 170 MHz Global Clock

FIR stages | Channels | LE    | LE usage | DSP (9×9) | DSP usage | Core power (sim.) | Core power (meas.)
64         | 1        | 4749  | 4%       | 128       | 22%       | 757 mW            | 684 mW
128        | 1        | 9329  | 8%       | 256       | 44%       | 1343 mW           | 1160 mW
64         | 2        | 9399  | 8%       | 256       | 44%       | 1447 mW           | 1263 mW
128        | 2        | 18724 | 16%      | 512       | 89%       | 2625 mW           | 2255 mW
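The resource columns of Table I can be reproduced from the design description: each 12 × 18-bit product uses two 9 × 9 multiplier elements, so a filter needs 2 × stages × channels elements out of 576. The Python sketch below checks the DSP and LE percentages; the total of 119 088 logic elements for the EP3C120 is taken from Altera's Cyclone III device data, not from this paper, and is an assumption of the sketch.

```python
from typing import Tuple

# Rows of Table I: (FIR stages, channels, logic elements used)
TABLE_I = [(64, 1, 4749), (128, 1, 9329), (64, 2, 9399), (128, 2, 18724)]

MULT_9X9_TOTAL = 576  # EP3C120F780C7, as quoted in the text
LE_TOTAL = 119_088    # EP3C120 logic elements (Altera device data, assumed here)


def utilization(stages: int, channels: int, le_used: int) -> Tuple[int, float, float]:
    """Return (9x9 elements used, DSP usage in %, LE usage in %) for one configuration."""
    dsp_used = 2 * stages * channels  # two 9x9 elements per 12 x 18-bit product
    return dsp_used, 100 * dsp_used / MULT_9X9_TOTAL, 100 * le_used / LE_TOTAL


if __name__ == "__main__":
    for stages, channels, le in TABLE_I:
        dsp, dsp_pct, le_pct = utilization(stages, channels, le)
        print(f"{stages:4d} stages x {channels} ch: {dsp:4d} DSP ({dsp_pct:.0f}%), LE {le_pct:.0f}%")
```

The rounded percentages match the table, which also confirms that the 4-16% column refers to logic elements and the 22-89% column to the 9 × 9 multipliers.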
The power consumption values given in Table I were measured without the NIOS processor (pure FIR filters). Adding the NIOS caused an increase in the power consumption of ca. 170 mW for the Levinson procedure and 128 stages. For two channels the linear predictor coefficients are calculated sequentially; thus the NIOS power consumption remains at the same level as for a single channel, while the refresh time doubles.
The EP3C120F780C7 FPGA is the middle speed grade version. The highest clock frequency giving positive slack for all paths is only 170 MHz. The required speed of 200 MHz has been achieved with the fastest speed grade FPGA, i.e. EP3C80F780C6. The simulated power consumption increased by only 3% with respect to the 170 MHz version.
The measurement system is shown in Fig. 10. The arbitrary waveform generator supplies the ADC daughter card, which converts the analog signal into a digital 14-bit representation. The ADC card is connected via HSMC to the DSP Development Kit equipped with the EP3C120F780C7 Cyclone III FPGA. The NIOS provides additional diagnostics, sending supporting data via the UART port and the RS232 interface located on the Industrial Communication Board (right connection via HSMC) to the external host.
The ADC daughter card provides data quantization with a maximum sampling rate of 150 MHz. Tests at higher sampling rates (e.g. 170 MHz) used data stored in the FPGA's ROM and read out at that rate.

Fig. 9. A schematic connection of the input data to the NIOS via triple-port RAM (2048 words of 12 bits). The samples are written to the 1st port at the ADC speed of 170/200 MHz. The addresses of the 2nd and 3rd ports are supplied by 11-bit counters, preloaded from the NIOS with the starting indices. The samples with synchronously changed indices read from the triple-port RAM are multiplied with each other and summed in the ACCU procedure (ALTMULT_ACCUM, an Altera library routine). After $(M-k)$ cycles the covariance for the selected index, available on the output of ACCU, is transferred to the NIOS processor via a 32-bit port. Calculated coefficients from the output port COEFF_DATA are next written to the set of registers addressed by the port COEFF_ADDR.

C. Design Improvement
The new Altera low-cost FPGA family, Cyclone V, offers for high-volume applications the lowest system cost and power consumption in comparison to the midrange (Arria series) or high-end (Stratix series) FPGAs. The total power consumption is reduced by up to 40% compared with the previous generation (Cyclone IV). The Cyclone V FPGA also includes an integrated hard processor system (HPS) consisting of processors, peripherals and a memory controller, connected to the FPGA fabric via high-bandwidth interconnects.
According to Altera's documentation, the Cyclone IV provides a 30% power reduction in comparison to the Cyclone III family, and the Cyclone V a 40% reduction in comparison to the Cyclone IV. The resulting total power reduction of 58% from Cyclone III to Cyclone V ($1 - 0.7 \times 0.6 = 0.58$) is a significant factor because the autonomous system is supplied from solar panels.
The hard processor system and the FPGA fast logic block are interfaced via a high-bandwidth system interconnect built from high-performance ARM AMBA AXI bus bridges. Bus masters in the FPGA have access to HPS bus slaves via the FPGA-to-HPS interconnect and, vice versa, HPS bus masters have access to bus slaves in the FPGA via the HPS-to-FPGA bridge. Both bridges support simultaneous read and write transactions. Data exchange is therefore much more flexible in comparison to the NIOS.
The HPS with an 800-MHz dual-core processor supporting symmetric and asymmetric multiprocessing provides much higher calculation power in comparison to the NIOS soft-core processor implemented in Cyclone III/IV FPGAs working with a 75 MHz clock.
The prototype will be built on the 5CSEA6 chip with 110 kLE, 5 Mbit of internal memory and 224 18 × 19-bit multipliers. These FPGAs should appear on the market at the end of 2012.
The current design allows tests of algorithms and optimization of details as well as an estimation of crucial parameters for the global system.

IV. CONCLUSIONS AND DISCUSSION
The analysis shows that the linear prediction method is an adaptive method that is as efficient as other noise-elimination techniques. Preliminary tests show that the NIOS processor is a powerful tool allowing a calculation of the linear predictor coefficients with a very high precision. We conclude that the linear predictor is a viable alternative to other methods, such as non-adaptive methods involving digital notch filters or multiple time-to-frequency domain conversions using an FFT procedure.
REFERENCES
[1] J. Makhoul, "Linear prediction: A tutorial review," Proceedings of the IEEE, vol. 63, no. 4, pp. 561-580, Apr. 1975.
[2] M. R. Schroeder and B. S. Atal, "Stochastic coding of speech signals at very low bit rates: The importance of speech perception," Speech Communication, vol. 4, no. 1-3, pp. 155-162, 1985, http://www.sciencedirect.com/science/article/pii/0167639385900433
[3] I. M. Trancoso, J. S. Marques, and C. M. Ribeiro, "CELP and sinusoidal coders: Two solutions for speech coding at 4.8-9.6 kbps," Speech Communication, vol. 9, no. 5-6, pp. 389-400, 1990, http://www.sciencedirect.com/science/article/pii/0167639390900163
[4] P. Abreu et al. [Pierre Auger Collaboration], "The Pierre Auger Observatory V: Enhancements," Proceedings of the 32nd ICRC, Beijing, Aug. 2011, arXiv:1107.4807v1.
[5] N. Levinson, "The Wiener RMS (root mean square) error criterion in filter design and prediction," J. Math. Phys., vol. 25, no. 4, pp. 261-278, 1947.
[6] J. Kelley for the Pierre Auger Collaboration, "Data Acquisition, Triggering, and Filtering at the Auger Engineering Radio Array," submitted to Nucl. Instr. Meth., 2012.

Fig. 10. A view of the measurement system: the arbitrary waveform generator; the ADDA daughter board with two 14-bit ADCs (150 MHz) and two 14-bit DACs (250 MHz); the development kit DK-DSP-3C120N with the FPGA EP3C120F780C7; and the ICB HSMC daughter board with RS232, RS485 and CANBUS interfaces. For the 170 MHz sampling, AERA data stored in the internal ROM are used for the estimation and measurement of the power consumption. The 14-bit ADC (clocked at 150 MHz) on the Terasic daughter board (left connection via HSMC) is supplied by the arbitrary waveform generator (Agilent 33250A) and provides digitized test signals for design verification (agreement with earlier PC-calculated patterns, estimation of quantization errors, etc.). The NIOS provides additional diagnostics, sending supporting data via the UART port and the RS232 interface located on the Industrial Communication Board (right connection via HSMC) to the external host.
