Direct Neural-Network Hardware-Implementation Algorithm

Andrei Dinu, Marcian N. Cirstea, and Silvia E. Cirstea

Abstract—An algorithm for compact neural-network hardware implementation is presented, which exploits the special properties of the Boolean functions describing the operation of artificial neurons with a step activation function. The algorithm contains three steps: artificial-neural-network (ANN) mathematical-model digitization, conversion of the digitized model into a logic-gate structure, and hardware optimization by elimination of redundant logic gates. A set of C++ programs automates algorithm implementation, generating optimized very high speed integrated circuit hardware description language (VHDL) code. This strategy bridges the gap between ANN design software and hardware design packages (Xilinx). Although the method is directly applicable only to neurons with step activation functions, it can be extended to sigmoidal functions.

Index Terms—Field-programmable gate array (FPGA), hardware implementation, neural networks.

Manuscript received June 15, 2008; revised September 14, 2009. First published October 6, 2009; current version published April 14, 2010.
A. Dinu is with Goodrich Corporation, Birmingham, B28 8LN, U.K. (e-mail: andrei.dinu@goodrich.com).
M. N. Cirstea and S. E. Cirstea are with Anglia Ruskin University, Cambridge, CB1 1PT, U.K. (e-mail: marcian@ieee.org; silvia.cirstea@anglia.ac.uk).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIE.2009.2033097

I. INTRODUCTION

According to a European Network of Excellence report [1], the future implementation of hardware neural networks is shaped in three ways: 1) by developing advanced techniques for mapping neural networks onto field-programmable gate arrays (FPGAs); 2) by developing innovative learning algorithms which are hardware realizable [2]; and 3) by defining high-level descriptions of the neural algorithms in an industry standard to allow full simulation and fabrication and to produce demonstrators of the technology for industry. Such new designs will be of use to industry if the cost of adopting them is sufficiently low. Hardware-based neural networks are important to industry as they offer low power consumption and small size compared with PC software, and they can be embedded in a wide range of systems. Software libraries exist for traditional artificial-neural-network (ANN) models (Matlab). However, the industry-standard form is very high speed integrated circuit hardware description language (VHDL) or C++ parameterized modular code, allowing customization.
A range of research papers on ANN-based controllers has been published over the last decade [3], [4]. Some recent publications [5]–[8] consider the FPGA an effective implementation platform for control algorithms in industrial applications. Hardware-implemented ANNs have an important advantage over computer-simulated ones: by fully exploiting the parallel operation of the neurons, they achieve a high speed of information processing [9]. Some very large scale integration algorithms achieve efficient implementation by using a combination of AND gates, OR gates, and threshold gates (TGs) [10]. This method leads to compact hardware structures, but it cannot be used for FPGA implementation because TGs are not available in the FPGA's configurable logic blocks.

The algorithm presented in this letter is applicable to both application-specific-integrated-circuit and FPGA implementations of ANNs composed of neurons with step activation functions [10]. Each neuron is treated as a Boolean function, and it is implemented separately, thus minimizing implementation complexity. The most useful property of such a Boolean function is that, if its truth table is constructed as a matrix with as many dimensions as the neuron has inputs, then the truth table contains only one large group of "1s" and one large group of "0s." This solid group of 1s is not visible when Gray codification is used, and thus, classical Quine–McCluskey algorithms or Karnaugh maps cannot be used efficiently. Our algorithm uses a different approach and generates a multilayer pyramidal hardware structure in which layers of AND gates alternate with layers of OR gates, with an incomplete layer of NOT gates at the bottom; the structure is optimized afterwards by eliminating redundant logic-gate groups. However, the method is effective only when the numbers of inputs and of bits on each input are low; otherwise, a classical circuit may be more efficient.

II. IMPLEMENTATION ALGORITHM

Each neuron of the ANN is first converted, in a two-step process, into a binary equivalent neuron whose inputs are only 1 and 0. Subsequently, the binary neuron model is iteratively transformed into a logic-gate structure.

A. Digitization of One Neuron Mathematical Model

The binary codification used for the neuron inputs is two's complement, which is generally used to represent integers but can be adapted to real values in the interval [−1, +1). Thus, considering an n-bit representation $b_{n-1}, b_{n-2}, b_{n-3}, \ldots, b_1, b_0$, the corresponding integer value $I_n$ is given by

$$I_n = -2^{n-1} \cdot b_{n-1} + \sum_{i=0}^{n-2} 2^i \cdot b_i. \quad (1)$$

The largest positive number that can be represented on $n$ bits is $2^{n-1} - 1$, and $-2^{n-1}$ is the smallest. Real values between −1.0 and +1.0 can be represented by dividing the corresponding integer value $I_n$ by $2^{n-1}$. Thus, (2) gives the complementary code for real numbers

$$R_n = \frac{I_n}{2^{n-1}} = -b_{n-1} + \sum_{i=0}^{n-2} 2^{-n+1+i} \cdot b_i. \quad (2)$$
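To make the codification concrete, the following C++ sketch (our own illustration; the function names and the rounding/clamping choices are not from the letter) encodes a real value from [−1, +1) into an n-bit two's-complement word and decodes it back according to (2).

```cpp
#include <cmath>
#include <vector>

// Encode a real value x in [-1.0, +1.0) as an n-bit two's-complement word:
// the stored integer is I_n = round(x * 2^(n-1)), so that R_n = I_n / 2^(n-1) ~= x.
std::vector<int> encodeReal(double x, int n) {
    long in = std::lround(x * std::pow(2.0, n - 1));   // corresponding integer I_n
    long maxPos = (1L << (n - 1)) - 1;                  // largest representable integer 2^(n-1) - 1
    if (in > maxPos) in = maxPos;                       // clamp values just below +1.0
    unsigned long u = static_cast<unsigned long>(in);   // two's-complement bit pattern
    std::vector<int> bits(n);
    for (int i = 0; i < n; ++i)                         // b_0 ... b_(n-1), LSB first
        bits[i] = static_cast<int>((u >> i) & 1UL);
    return bits;
}

// Decode with (2): R_n = -b_(n-1) + sum_{i=0}^{n-2} 2^(-n+1+i) * b_i.
double decodeReal(const std::vector<int>& bits) {
    const int n = static_cast<int>(bits.size());
    double r = -static_cast<double>(bits[n - 1]);       // sign bit carries weight -1
    for (int i = 0; i <= n - 2; ++i)
        r += std::pow(2.0, -n + 1 + i) * bits[i];
    return r;
}
```

With the 5-b coding used later for the case study in Section II-D, the resolution of this representation is $2^{-4} = 0.0625$.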

The analog neuron model is transformed, in two steps, into an appropriate digital model. At each stage, the input weights and the threshold levels of the initial NN are altered carefully, keeping the neuron functionality unchanged. This can be achieved by keeping constant the sign of the argument of the activation function

$$\operatorname{sign}\left(\sum_{i=1}^{m} w_i \cdot x_i - t\right) = \operatorname{sign}(net - t) = \text{constant}. \quad (3)$$
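For reference, the underlying analog model is a weighted sum followed by a step activation; both conversion stages described next must leave the sign of the argument net − t, and therefore this output, unchanged. A minimal sketch (our notation; the "fires when net − t ≥ 0" convention is an assumption):

```cpp
#include <vector>

// Analog neuron with step activation: fires (returns 1) when the weighted
// sum of its inputs reaches the threshold t, i.e. when sign(net - t) >= 0.
int stepNeuron(const std::vector<double>& w,   // weights w_1 ... w_m
               const std::vector<double>& x,   // analog inputs x_1 ... x_m
               double t) {                     // threshold level
    double net = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i)
        net += w[i] * x[i];                    // net = sum of w_i * x_i
    return (net - t >= 0.0) ? 1 : 0;           // step activation function
}
```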


However, for mathematical simplicity, a more restrictive condition is used instead: the argument "net − t" of the activation function is kept constant itself, rather than only its sign

$$\sum_{i=1}^{m} w_i \cdot x_i - t = net - t = \text{constant}. \quad (4)$$

Conversion Stage One: The first step transforms the analog inputs of the neurons into digital inputs expressed as groups of $n_b$ bits. This process is associated with transforming each analog neuron input into an equivalent group of $n_b$ binary inputs. The task is achieved by splitting each input defined by its initial weight $w_{ij}$ into $n_b$ subinputs, whose weights $w_{ijp}^{(1)}$ $(p = 0, 1, \ldots, n_b - 1)$ are calculated as follows [11]:

$$w_{ijp}^{(1)} = \frac{2^{p+1}}{2^{n_b}} \cdot w_{ij} \quad \forall\, p < n_b - 1, \qquad w_{ij(n_b-1)}^{(1)} = -w_{ij}, \qquad t_i^{(1)} = t_i. \quad (5)$$

Fig. 1. Neuron model before/after stage-one conversion.

The superscripts "(1)" and "(2)" refer to the respective conversion stage. The initial $m$ inputs are turned into $m$ input clusters, each containing $n_b$ subinputs (Fig. 1). The symbol $w_{ij}$ stands for weight number $j$ of neuron $i$ in the network, while $w_{ijp}^{(1)}$ represents the weight of subinput $p$ in cluster $j$ pertaining to neuron $i$. According to the previous considerations, only those neuron parameter changes that maintain the argument "$net_i - t_i$" of the activation function constant are allowed. The argument after the first conversion stage is calculated as

$$net_i^{(1)} - t_i^{(1)} = \sum_{j=1}^{m}\sum_{p=0}^{n_b-1} w_{ijp}^{(1)} \cdot x_{jp}^{(1)} - t_i^{(1)} = \sum_{j=1}^{m}\left(-w_{ij} \cdot x_{j(n_b-1)}^{(1)} + w_{ij} \cdot \sum_{p=0}^{n_b-2} \frac{2^{p+1}}{2^{n_b}} \cdot x_{jp}^{(1)}\right) - t_i \quad (6)$$

where $x_{jp}^{(1)}$ $(p = 0, 1, 2, \ldots, n_b - 1)$ are the bits of the complementary code received by each new neuron input. The equation can be rewritten as

$$net_i^{(1)} - t_i^{(1)} = \sum_{j=1}^{m} w_{ij} \cdot \left(-x_{j(n_b-1)}^{(1)} + \sum_{p=0}^{n_b-2} 2^{-n_b+p+1} \cdot x_{jp}^{(1)}\right) - t_i. \quad (7)$$

The expression in brackets matches the complementary-code definition given in (2). Then, (6) becomes

$$net_i^{(1)} - t_i^{(1)} = \sum_{j=1}^{m} w_{ij} \cdot x_j - t_i^{(1)} = \sum_{j=1}^{m} w_{ij} \cdot x_j - t_i = net_i - t_i \quad (8)$$

where $x_j$ is an analog input value of the initial neuron. This meets the condition expressed by (4). Thus, the codification style based on the complementary code has been introduced, and the required parameter modifications have been performed, without changing the neuron's behavior.

Conversion Stage Two: The second conversion stage aims to replace the neurons with negative weights resulting from the first stage with equivalent ones having only positive weights, by using the absolute values of the stage-one weights: $w_{ijp}^{(2)} = |w_{ijp}^{(1)}|$. This means that supplementary parameter alterations are required in order to counteract the change in neuron behavior caused by changing the sign of some input weights. A simple solution is to reverse the value of the affected input bits; the modification can be implemented in hardware with NOT logic gates. The relationship between the input bits $x_{ijp}^{(2)}$ and those at stage one $(x_{ijp}^{(1)})$ is

$$x_{ijp}^{(2)} = \begin{cases} x_{ijp}^{(1)}, & \text{if } w_{ijp}^{(1)} > 0 \\ 1 - x_{ijp}^{(1)}, & \text{if } w_{ijp}^{(1)} < 0. \end{cases} \quad (9)$$

These two alternatives can be compressed into

$$x_{ijp}^{(2)} = \frac{1 - \operatorname{sign}\left(w_{ijp}^{(1)}\right)}{2} + \operatorname{sign}\left(w_{ijp}^{(1)}\right) \cdot x_{ijp}^{(1)}. \quad (10)$$

The transfer-function argument is calculated as [11]

$$net_i^{(2)} - t_i^{(2)} = \sum_{j=1}^{m}\sum_{p=0}^{n_b-1} w_{ijp}^{(1)} \cdot x_{ijp}^{(1)} + \sum_{j=1}^{m}\sum_{p=0}^{n_b-1} \frac{\left|w_{ijp}^{(1)}\right| - w_{ijp}^{(1)}}{2} - t_i^{(2)}. \quad (11)$$

The arguments of the activation function before and after the second conversion stage have to be equal

$$\sum_{j=1}^{m}\sum_{p=0}^{n_b-1} w_{ijp}^{(1)} \cdot x_{ijp}^{(1)} + \sum_{j=1}^{m}\sum_{p=0}^{n_b-1} \frac{\left|w_{ijp}^{(1)}\right| - w_{ijp}^{(1)}}{2} - t_i^{(2)} = \sum_{j=1}^{m}\sum_{p=0}^{n_b-1} w_{ijp}^{(1)} \cdot x_{ijp}^{(1)} - t_i^{(1)}. \quad (12)$$

Therefore, the threshold level of the stage-two neurons is

$$t_i^{(2)} = t_i^{(1)} + \sum_{j=1}^{m}\sum_{p=0}^{n_b-1} \frac{\left|w_{ijp}^{(1)}\right| - w_{ijp}^{(1)}}{2}. \quad (13)$$

The stage-one neuron parameters in (13) depend on the initial parameters of the analog neuron as described by (5). Consequently, substituting (5) in (13) gives

$$t_i^{(2)} = t_i + \sum_{j=1}^{m} \frac{|w_{ij}| + w_{ij}}{2} + \sum_{j=1}^{m}\sum_{p=0}^{n_b-2} \frac{\frac{2^{p+1}}{2^{n_b}} \cdot |w_{ij}| - \frac{2^{p+1}}{2^{n_b}} \cdot w_{ij}}{2}. \quad (14)$$

This expression can be successively transformed [11] into

$$t_i^{(2)} = t_i + (1 - 2^{-n_b}) \cdot \sum_{j=1}^{m} |w_{ij}| + 2^{-n_b} \cdot \sum_{j=1}^{m} w_{ij}. \quad (15)$$
Therefore, the neuron parameters after stage two can be calculated as a function of the initial analog neuron parameters

$$w_{ijp}^{(2)} = \frac{2^{p+1}}{2^{n_b}} \cdot |w_{ij}|, \quad p = 0, 1, 2, \ldots, n_b - 1, \qquad t_i^{(2)} = t_i + (1 - 2^{-n_b}) \cdot \sum_{j=1}^{m} |w_{ij}| + 2^{-n_b} \cdot \sum_{j=1}^{m} w_{ij}. \quad (16)$$
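The two conversion stages can therefore be collapsed into a single digitization routine: the stage-two weights and threshold follow directly from (16), and, per (9), the subinputs whose stage-one weights are negative are marked for inversion (the NOT gates added at the end of the gate-level implementation). A hedged C++ sketch of this step (data layout and names are ours, not those of the authors' programs):

```cpp
#include <cmath>
#include <vector>

// Result of digitizing one analog neuron for nb-bit two's-complement inputs.
struct DigitalNeuron {
    std::vector<double> weights;   // stage-two subinput weights w_ijp^(2) (all positive), m*nb entries
    std::vector<bool>   invert;    // true where the stage-one subweight is negative: NOT gate on that bit, per (9)
    double threshold;              // stage-two threshold t_i^(2), per (16)
};

// Digitize one neuron with analog weights w_ij and threshold t_i, using (16) and (9).
DigitalNeuron digitizeNeuron(const std::vector<double>& wij, double ti, int nb) {
    DigitalNeuron d;
    double sumAbs = 0.0, sum = 0.0;
    for (double w : wij) { sumAbs += std::fabs(w); sum += w; }
    // t_i^(2) = t_i + (1 - 2^-nb) * sum_j |w_ij| + 2^-nb * sum_j w_ij
    d.threshold = ti + (1.0 - std::pow(2.0, -nb)) * sumAbs + std::pow(2.0, -nb) * sum;
    for (double w : wij) {                        // one cluster of nb subinputs per analog input
        for (int p = 0; p < nb; ++p) {
            d.weights.push_back(std::pow(2.0, p + 1 - nb) * std::fabs(w));  // w_ijp^(2) = 2^(p+1)/2^nb * |w_ij|
            // Stage-one subweight sign: proportional to w_ij for p < nb-1, but -w_ij for the sign bit p = nb-1.
            double w1 = (p == nb - 1) ? -w : w;
            d.invert.push_back(w1 < 0.0);         // (9): invert bits whose stage-one weight is negative
        }
    }
    return d;
}
```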
B. Binary Neuron Implementation and Optimization
The ANN implementation into a hardware structure is done separately for each neuron and requires, first, that the input weights $w_{ijp}^{(2)}$ be sorted in descending order in an array with $A = m \cdot n_b$ elements, $w_1^s, w_2^s, w_3^s, \ldots, w_A^s$, where $m$ is the number of initial analog neuron inputs and $n_b$ is the number of bits of each input binary code. The weights correspond to the input binary signals $x_1^s, x_2^s, \ldots, x_A^s$. An iterative conversion procedure is used to analyze the input weights and to generate the logic-gate implementation as a netlist description. At each step, a larger neuron is split into subneurons. Some of them can be implemented with only a few AND and OR logic gates, while the rest are further decomposed into simpler subneurons until all have been implemented. Several important concepts and definitions are presented in [12], along with the step-by-step iterative implementation procedure, which ends by adding inverters to those inputs corresponding to the initially negative weights at stage one of the neural-model digitization.
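The authors' step-by-step procedure is detailed in [12] and is not reproduced here; as a simplified stand-in, the sketch below decomposes a positive-weight threshold function into AND/OR logic by a Shannon-style expansion on the largest remaining weight: a subneuron either fires through its largest input together with a smaller subneuron, or fires without it. Inputs whose weight alone reaches the threshold degenerate into direct OR-gate inputs, as in the example of Fig. 2.

```cpp
#include <string>
#include <vector>

// Build a logic expression for a binary neuron with positive weights w (sorted in
// descending order) and threshold t: output = 1 iff the sum of w[i] over inputs at 1 >= t.
// Simplified recursive decomposition (Shannon expansion on the largest remaining weight).
std::string synthesize(const std::vector<double>& w,
                       const std::vector<std::string>& x,  // signal names of the inputs
                       std::size_t k,                      // index of the first weight not yet handled
                       double t) {
    if (t <= 0.0) return "'1'";                            // threshold already met: constant 1
    double rest = 0.0;
    for (std::size_t i = k; i < w.size(); ++i) rest += w[i];
    if (rest < t) return "'0'";                            // threshold unreachable: constant 0
    // Either input k is 1 and the remaining inputs must reach t - w[k],
    // or input k is 0 and the remaining inputs must reach t on their own.
    std::string with    = synthesize(w, x, k + 1, t - w[k]);
    std::string without = synthesize(w, x, k + 1, t);
    std::string hi = (with == "'1'") ? x[k] : "(" + x[k] + " AND " + with + ")";
    return (without == "'0'") ? hi : "(" + hi + " OR " + without + ")";
}
```

Calling synthesize(w, x, 0, t) on the sorted stage-two weights yields a nested AND/OR expression. Without sharing, such an expression grows exponentially with the number of subinputs, which is one reason the method is reserved for low input/bit counts and why the redundancy elimination described next matters: identical subexpressions produced by different branches are exactly the gates that can be merged.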
The hardware-implementation netlist obtained in this way contains redundancies both inside each neuron and across different neurons. Most of them are eliminated using a simple procedure: the netlist file is analyzed repeatedly, and whenever logic gates of the same type driven by the same input signals are found, all but one are removed and the interconnections are updated accordingly; the cycle ends when no more gates can be removed.
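A minimal sketch of that clean-up pass (the Gate record and naming are ours): gates are keyed by their type and sorted input list, duplicates are merged, and fan-in references are redirected; the loop repeats until a full pass removes nothing.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

struct Gate {
    std::string type;                 // "AND", "OR", "NOT"
    std::vector<std::string> inputs;  // input signal names
    std::string output;               // output signal name
};

// Remove gates of the same type driven by the same inputs, keeping one of each,
// and redirect every reference to a removed output; repeat until nothing changes.
// (A full implementation would also rename primary-output references.)
void removeRedundantGates(std::vector<Gate>& netlist) {
    bool changed = true;
    while (changed) {
        changed = false;
        std::map<std::string, std::string> canonical;   // (type + inputs) -> surviving output
        std::map<std::string, std::string> rename;      // removed output -> surviving output
        std::vector<Gate> kept;
        for (Gate g : netlist) {
            std::sort(g.inputs.begin(), g.inputs.end());
            std::string key = g.type;
            for (const auto& in : g.inputs) key += "|" + in;
            auto it = canonical.find(key);
            if (it == canonical.end()) { canonical[key] = g.output; kept.push_back(g); }
            else { rename[g.output] = it->second; changed = true; }
        }
        for (Gate& g : kept)                            // update the interconnections
            for (auto& in : g.inputs) {
                auto r = rename.find(in);
                if (r != rename.end()) in = r->second;
            }
        netlist = kept;
    }
}
```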
C. Neuron Implementation Example

Fig. 2. Digital mathematical model to gate conversion.

The sample in Fig. 2 shows a neuron with 12 input weights and a positive threshold level. The weights are sorted in descending order, and a recursive implementation starts. The first three weights are larger than the threshold; therefore, inputs 4, 7, and 1 will drive an OR gate along with the subneurons built using the other subgroups [11].

D. Automated Implementation Method

The algorithm was automated using C++ programs that generate a netlist description of the circuit, optimize it, and then generate the VHDL code. In terms of the software, there is no limitation on the ANN size. The characteristics of the ANN are introduced into the C++ program as a matrix text file (.csv format). A feedforward ANN with three subnetworks generating the pulsewidth-modulation (PWM) switching pattern for an inverter was designed [9] using this method (Fig. 3). The three subnetworks are as follows.

1) Angle analyzes the argument of the current difference vector.
2) Position analyzes the argument and the value of the voltage.
3) Control Signals generates the three PWM binary outputs.

In contrast with training algorithms, constructive ones determine both the network architecture and the neuron weights and are guaranteed to converge in finite time. The numerical values of all neuron weights and thresholds were calculated [11] using a geometric constructive solution known as Voronoi diagrams [7]. For this paper, the complex plane is divided into triangular Voronoi cells. The master program allows user control over the main parameters: 1) the number of Voronoi cells; 2) the number of sectors dividing the 360° interval for argument analysis; 3) the number of bits used to code the components of the two complex inputs; and 4) the maximum fan-in for the VHDL logic-gate model. The desired performance/complexity ratio is thus adopted. In this case, 5 b to code each component of the two complex inputs gives enough precision (delays less than 100 ns), resulting in a total of 1329 logic gates on 14 + 6 = 20 layers [10], which fits a Xilinx XC4010XL FPGA.

When the numbers of inputs and of bits on each input are low (precision appropriate for drives), this method is more effective than a classical digital circuit design implemented in an FPGA. For a high number of bits/controller inputs, the NN approach can be less effective than a classical circuit. The explanation is that, in the NN approach, the complexity of the resulting circuit rises exponentially with these numbers, whereas in a traditional approach, the complexity increases quadratically. The case study presented in this paper was implemented as part of an induction-motor controller in a 10 000-gate-equivalent FPGA; a classical digital vector control circuit for the same motor, commissioned in our research group, used 99% of a 40 000-gate-equivalent FPGA [12].
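As an indication of how the optimized netlist maps to the generated VHDL, each gate record can be emitted as one concurrent signal assignment. The sketch below (reusing the Gate record from the previous sketch) shows only a plausible shape of the architecture body; the entity/port and signal declarations, and the exact style produced by the authors' C++ programs, are not documented in the letter.

```cpp
#include <iostream>
#include <string>
#include <vector>

struct Gate { std::string type; std::vector<std::string> inputs; std::string output; };

// Emit one VHDL concurrent assignment per gate, e.g. "n7 <= a AND b AND c;".
void emitVhdl(const std::vector<Gate>& netlist, std::ostream& os) {
    for (const Gate& g : netlist) {
        os << "  " << g.output << " <= ";
        if (g.type == "NOT") {
            os << "NOT " << g.inputs.front();
        } else {
            for (std::size_t i = 0; i < g.inputs.size(); ++i) {
                if (i) os << " " << g.type << " ";   // AND / OR between operands
                os << g.inputs[i];
            }
        }
        os << ";\n";
    }
}
```

The maximum fan-in selected in the master program would additionally require splitting wide gates into trees before emission.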
Fig. 3. ANN structure and test bench for operation speed testing.

III. SIMULATION AND VERIFICATION

The ANN operation speed was tested by designing a VHDL test bench (Fig. 3). Input patterns are generated by a 20-b counter and a pseudorandom-sequence block. A simulation waveform is shown in Fig. 4, illustrating delay readings of 39.5 and 80.5 ns.

Fig. 4. Timing simulation using Xilinx software.

Generally, oscilloscope measurements taken on the XS40 board, which contains a Xilinx XC4010XL FPGA, indicate delays not exceeding 100 ns. Thus, the propagation time is less than 1.5 clock cycles, which demonstrates the advantage of higher operating speed compared with other digital circuits [13].

IV. CONCLUSION

A new digital hardware-implementation strategy for feedforward ANNs with step activation functions has been reported. The novel algorithm treats each neuron as a Boolean function with special properties that can be exploited to achieve a compact implementation. This is accomplished by means of reusable VHDL code that can be easily translated into an FPGA implementation using suitable electronic-design-automation software.

The VHDL programs bridge the gap between the facilities offered by ANN simulation software and the software packages specialized in hardware design. The method is most efficient for a low number of inputs and a low number of bits on each input; otherwise, a classical circuit may be preferred.

REFERENCES

[1] NEuroNet Roadmap, "Future prospects for neural networks," European Network of Excellence in Neural Networks, 2001. [Online]. Available: http://cordis.europa.eu/ist/ka3/iaf/links.htm
[2] B. M. Wilamowski, N. J. Cotton, O. Kaynak, and G. Dundar, "Computing gradient vector and Jacobian matrix in arbitrarily connected neural networks," IEEE Trans. Ind. Electron., vol. 55, no. 10, pp. 3784–3790, Oct. 2008.
[3] "Special section: 'Neural networks for robotics'," IEEE Trans. Ind. Electron., vol. 44, no. 6, Dec. 1997.
[4] "Special section: 'Fusion of neural nets, fuzzy systems and genetic algorithms in industrial applications'," IEEE Trans. Ind. Electron., vol. 46, no. 6, 1999.
[5] J. J. Rodriguez-Andina, M. J. Moure, and M. D. Valdes, "Features, design tools and application domains of FPGAs," IEEE Trans. Ind. Electron., vol. 54, no. 4, pp. 1810–1823, Aug. 2007.
[6] E. Monmasson and M. N. Cirstea, "FPGA design methodology for industrial control systems—A review," IEEE Trans. Ind. Electron., vol. 54, no. 4, pp. 1824–1842, Aug. 2007.
[7] L. Vachhani and K. Sridharan, "Hardware-efficient prediction-correction-based generalized Voronoi diagram construction and FPGA implementation," IEEE Trans. Ind. Electron., vol. 55, no. 4, pp. 1558–1569, Apr. 2008.
[8] D. Zhang and H. Li, "A stochastic-based FPGA controller for an induction motor drive with integrated neural network algorithms," IEEE Trans. Ind. Electron., vol. 55, no. 2, pp. 551–561, Feb. 2008.
[9] M. N. Cirstea, A. Dinu, J. Khor, and M. McCormick, Neural and Fuzzy Logic Control of Drives and Power Systems. Oxford, U.K.: Elsevier, 2002.
[10] A. Dinu and M. N. Cirstea, "A digital neural network FPGA direct hardware implementation algorithm," in Proc. ISIE, Vigo, Spain, pp. 2307–2312.
[11] A. Dinu, "FPGA neural controller for three phase sensorless induction motor drive systems," Ph.D. dissertation, De Montfort Univ., Leicester, U.K., 2000.
[12] A. Aounis, M. McCormick, and M. N. Cirstea, "A novel approach to induction motor controller design and implementation," in Proc. IEEE PCC, Osaka, Japan, Apr. 2002, pp. 993–998.
[13] B. K. Bose, "Neural network applications in power electronics and motor drives—An introduction and perspective," IEEE Trans. Ind. Electron., vol. 54, no. 1, pp. 14–33, Feb. 2007.
