IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 39, NO. 5, MAY 1992
Express Letters
Complex Domain Backpropagation
George M. Georgiou and Cris Koutsougeras
Abstract-The well-known backpropagation algorithm is extended to complex domain backpropagation (CDBP), which can be used to train neural networks for which the inputs, weights, activation functions, and outputs are complex-valued. Previous derivations of CDBP necessarily admitted activation functions that have singularities, which is highly undesirable. Here CDBP is derived so that it accommodates classes of suitable activation functions. One such function is found, and the circuit implementation of the corresponding neuron is given. CDBP hardware circuits can be used to process sinusoidal signals all at the same frequency (phasors).
I. INTRODUCTION
The complex domain backpropagation (CDBP) algorithm provides a means for training feed-forward networks for which the
inputs, weights, activation functions, and outputs are complex-valued. CDBP, being nonlinear in nature, is more powerful than the
complex LMS algorithm [1] in the same way that usual (real)
backpropagation [2] is more powerful than the (real) LMS algorithm
[3]. Thus, CDBP can replace the complex LMS algorithm in
applications such as filtering in the frequency domain [4].
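For reference, the complex LMS algorithm of [1] adapts a complex weight vector using the conjugate of the input, w ← w + 2μ e x̄. A minimal system-identification sketch (the step size, filter length, target coefficients, and data are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown complex system to identify (coefficients are illustrative).
w_true = np.array([0.5 + 0.2j, -0.3 + 0.7j])

w = np.zeros(2, dtype=complex)   # adaptive weights
mu = 0.05                        # step size (assumed small enough for stability)

for _ in range(2000):
    x = rng.standard_normal(2) + 1j * rng.standard_normal(2)  # complex input
    d = w_true @ x               # desired (noiseless) output
    e = d - w @ x                # complex error
    w = w + 2 * mu * e * np.conj(x)   # complex LMS update of [1]
```

In this noiseless setting the weights converge to the true system; CDBP generalizes this linear update to multilayer networks with nonlinear complex activations.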
In addition, CDBP can be used to design networks in hardware
that process sinusoidal inputs all having the same frequency
(phasors). Such signals are commonly represented by complex
values. The complex weights of the neural network represent
impedance as opposed to resistance in real backpropagation networks. The desired outputs are either sinusoids at the same frequency as the inputs or, after further processing, binary.
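As a reminder of the phasor convention assumed above, a sinusoid at a fixed angular frequency is carried entirely by one complex number; a small sketch (the amplitude, phase, and frequency values are arbitrary):

```python
import cmath, math

# A sinusoid A*cos(w*t + phi) at fixed angular frequency w is represented
# by the single complex number (phasor) A*exp(1j*phi).
A, phi, w = 2.0, math.pi / 3, 100.0
phasor = A * cmath.exp(1j * phi)

def signal(t):
    # Recover the time-domain waveform from the phasor.
    return (phasor * cmath.exp(1j * w * t)).real

t = 0.0123
direct = A * math.cos(w * t + phi)   # same value, computed directly
```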
By decomposing the complex numbers that represent the inputs
and the desired outputs into real and imaginary parts, the usual
(real) backpropagation algorithm [2] can be used to train a conventional neural network which performs, in theory, the same
input-output mapping without involving complex numbers. However, such networks, when implemented in hardware, cannot accept
the sinusoidal inputs (phasors) and give the desired outputs.
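The decomposition argument rests on the fact that multiplication by a complex weight is the same as applying a 2x2 real matrix to the (real, imaginary) pair; a quick check (values arbitrary):

```python
import numpy as np

# A complex weight w acting on a complex input x ...
w = 0.8 - 0.5j
x = 1.2 + 2.0j
y = w * x

# ... equals the 2x2 real matrix [[wR, -wI], [wI, wR]] acting on (xR, xI),
# which is why a real-valued network of twice the width can represent the
# same input-output mapping.
M = np.array([[w.real, -w.imag],
              [w.imag,  w.real]])
y_real = M @ np.array([x.real, x.imag])
```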
In [5] and [6] complex domain backpropagation algorithms were independently derived. The derivation was for a single hidden layer feed-forward network in [5] and for a multilayer network in [6]. Both derivations are based on the assumption that the (complex) derivative f'(z) = df(z)/dz of the activation function f(z) exists. Here we will show that if the derivative f'(z) exists for all z ∈ ℂ, the complex plane, then f(z) is not an appropriate activation function. We will derive CDBP for multilayer feed-forward networks imposing less stringent conditions on f(z) so that more appropriate activation functions can be used. One such function is found and its circuit implementation is given.
II. COMPLEX DOMAIN BACKPROPAGATION

The error function E is defined as

$$E = \frac{1}{2}\sum_k e_k \bar{e}_k, \qquad e_k = d_k - o_k \tag{1}$$

where d_k is the desired output and o_k the actual output of the kth output neuron. The overbar signifies complex conjugate. It should be noted that the error E is a real scalar function. The output o_j of neuron j in the network is

$$o_j = f(z_j) = u_j + i v_j \tag{2}$$

where

$$z_j = x_j + i y_j = \sum_l W_{jl} X_{jl} \tag{3}$$

and W_{jl} = W_{jlR} + i W_{jlI} is the complex weight connecting neuron j to source l, and X_{jl} = X_{jlR} + i X_{jlI} is the complex input to neuron j from source l.
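Equations (1)-(3) amount to the following forward pass for a single complex neuron. A minimal sketch: the activation shown is the one proposed later in Section IV, and the sizes, seeds, and target value are illustrative assumptions.

```python
import numpy as np

def f(z, c=1.0, r=1.0):
    # Activation of Section IV: f(z) = z / (c + |z|/r), maps C into |f| < r.
    return z / (c + np.abs(z) / r)

rng = np.random.default_rng(1)
W = rng.standard_normal(3) + 1j * rng.standard_normal(3)  # weights W_jl
X = rng.standard_normal(3) + 1j * rng.standard_normal(3)  # inputs X_jl

z = W @ X                          # net input, eqn (3)
o = f(z)                           # neuron output, eqn (2)

d = 0.3 + 0.4j                     # desired output (illustrative)
e = d - o                          # complex error
E = 0.5 * (e * np.conj(e)).real    # error function, eqn (1): real and >= 0
```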
In order to use the chain rule to find the gradient of the error function E with respect to the real part of W_{jl}, we have to observe the variable dependencies: the real function E is a function of both u(x_j, y_j) and v(x_j, y_j), and x_j and y_j are both functions of W_{jlR} (and W_{jlI}). Thus, the gradient of the error function with respect to the real part of W_{jl} can be written as
$$\frac{\partial E}{\partial W_{jlR}} = \frac{\partial E}{\partial u_j}\left(\frac{\partial u_j}{\partial x_j}\frac{\partial x_j}{\partial W_{jlR}} + \frac{\partial u_j}{\partial y_j}\frac{\partial y_j}{\partial W_{jlR}}\right) + \frac{\partial E}{\partial v_j}\left(\frac{\partial v_j}{\partial x_j}\frac{\partial x_j}{\partial W_{jlR}} + \frac{\partial v_j}{\partial y_j}\frac{\partial y_j}{\partial W_{jlR}}\right) = -\delta_{jR}\left(u_x X_{jlR} + u_y X_{jlI}\right) - \delta_{jI}\left(v_x X_{jlR} + v_y X_{jlI}\right) \tag{4}$$

where $\delta_{jR} \stackrel{\text{def}}{=} -\partial E/\partial u_j$, $\delta_{jI} \stackrel{\text{def}}{=} -\partial E/\partial v_j$, $u_x \stackrel{\text{def}}{=} \partial u_j/\partial x_j$, and similarly for $u_y$, $v_x$, and $v_y$. From (3), $\partial x_j/\partial W_{jlR} = X_{jlR}$, $\partial y_j/\partial W_{jlR} = X_{jlI}$, $\partial x_j/\partial W_{jlI} = -X_{jlI}$, and $\partial y_j/\partial W_{jlI} = X_{jlR}$. In the same manner,

$$\frac{\partial E}{\partial W_{jlI}} = -\delta_{jR}\left(u_x(-X_{jlI}) + u_y X_{jlR}\right) - \delta_{jI}\left(v_x(-X_{jlI}) + v_y X_{jlR}\right). \tag{5}$$

Combining (4) and (5), we can write the gradient of the error function E with respect to the complex weight W_{jl} as

$$\nabla_{W_{jl}} E \stackrel{\text{def}}{=} \frac{\partial E}{\partial W_{jlR}} + i\,\frac{\partial E}{\partial W_{jlI}} = -\bar{X}_{jl}\big((u_x + i u_y)\delta_{jR} + (v_x + i v_y)\delta_{jI}\big). \tag{6}$$

The weight update rule is

$$\Delta W_{jl} = -\alpha \nabla_{W_{jl}} E \tag{7}$$

where α is the learning rate, a real positive constant. A momentum term [2] may be added in the above learning equation. When the weight W_{jl} belongs to an output neuron, then δ_{jR} and δ_{jI} in (6) have the values

$$\delta_{jR} = d_{jR} - u_j \tag{8}$$

$$\delta_{jI} = d_{jI} - v_j \tag{9}$$

or more compactly:

$$\delta_j \stackrel{\text{def}}{=} \delta_{jR} + i\,\delta_{jI} = e_j. \tag{10}$$

When weight W_{jl} belongs to a hidden neuron, i.e., when the output of the neuron is fed to other neurons in subsequent layers, in order to compute δ_j, or equivalently δ_{jR} and δ_{jI}, we have to use the chain rule. Let the index k indicate a neuron that receives input from neuron j. Then the net input z_k to neuron k is

$$z_k = \sum_l W_{kl} X_{kl} \tag{11}$$

where the index l runs through the neurons from which neuron k receives input. Thus, we have

$$\delta_{jR} = \sum_k \delta_{kR}\big(u_x^k W_{kjR} + u_y^k W_{kjI}\big) + \sum_k \delta_{kI}\big(v_x^k W_{kjR} + v_y^k W_{kjI}\big) \tag{12}$$

where the index k runs through the neurons that receive input from neuron j, and $u_x^k$ denotes $\partial u_k/\partial x_k$, etc. In a similar manner we can compute δ_{jI}:

$$\delta_{jI} = \sum_k \delta_{kR}\big(u_x^k(-W_{kjI}) + u_y^k W_{kjR}\big) + \sum_k \delta_{kI}\big(v_x^k(-W_{kjI}) + v_y^k W_{kjR}\big) \tag{13}$$

or, more compactly,

$$\delta_j = \delta_{jR} + i\,\delta_{jI} = \sum_k \bar{W}_{kj}\big((u_x^k + i u_y^k)\delta_{kR} + (v_x^k + i v_y^k)\delta_{kI}\big). \tag{14}$$
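The gradient expression (6), together with the output-neuron deltas (8)-(10), can be validated against finite differences of E. This sketch uses the activation of Section IV as an illustrative choice, numerical partial derivatives, and arbitrary sample values:

```python
c, r = 1.0, 1.0

def f(z):
    # Activation of Section IV (illustrative choice): z / (c + |z|/r).
    return z / (c + abs(z) / r)

def partials(z, h=1e-6):
    # u_x, u_y, v_x, v_y at z (z != 0) via central differences, so the
    # check does not presuppose the closed forms given in Section IV.
    fx = (f(z + h) - f(z - h)) / (2 * h)            # u_x + i v_x
    fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # u_y + i v_y
    return fx.real, fy.real, fx.imag, fy.imag

def E(X, W, d):
    # Error (1) for a single output neuron with one weight.
    e = d - f(W * X)
    return 0.5 * (e * e.conjugate()).real

def grad_W(X, W, d):
    # Gradient (6) with the output-neuron delta (10): delta = d - o.
    z = W * X
    delta = d - f(z)
    ux, uy, vx, vy = partials(z)
    return -X.conjugate() * ((ux + 1j * uy) * delta.real
                             + (vx + 1j * vy) * delta.imag)

X, W, d = 0.7 - 0.2j, 0.4 + 0.9j, 0.3 + 0.5j
g = grad_W(X, W, d)

# Finite-difference gradients with respect to W_R and W_I.
h = 1e-6
gR = (E(X, W + h, d) - E(X, W - h, d)) / (2 * h)
gI = (E(X, W + 1j * h, d) - E(X, W - 1j * h, d)) / (2 * h)
```

The real and imaginary parts of g agree with the numerical derivatives with respect to W_{jlR} and W_{jlI}, as (6) requires.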
III. THE ACTIVATION FUNCTION

In the next section the important properties that a suitable activation function f(z) must possess will be discussed. For the purposes of this section, we identify two properties that a suitable f(z) should possess: i) f(z) is bounded (see next section), and ii) f(z) is such that δ_j ≠ 0 and X_{jl} ≠ 0 imply ∇_{W_{jl}}E ≠ 0. Noncompliance with the latter condition is undesirable, since it would imply that even in the presence of both nonzero input (X_{jl} ≠ 0) and nonzero error (δ_j ≠ 0) it is still possible that ΔW_{jl} = 0, i.e., no learning takes place.

The derivations of CDBP in [5] and [6] are based on the assumption that the (complex) derivative f'(z) of the activation function exists, without giving a specific domain. If the domain is taken to be ℂ, then such functions are called entire.
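The difficulty with entire activation functions can be seen concretely: by Liouville's theorem a bounded entire function is constant, and conversely the complex extensions of familiar bounded real activations are unbounded. For instance, tanh z has poles at z = i(π/2 + kπ):

```python
import cmath, math

# tanh is bounded on the real line, but its complex extension has poles
# at z = 1j*(pi/2 + k*pi); |tanh z| grows without bound near a pole.
for eps in (1e-2, 1e-4, 1e-6):
    z = 1j * (math.pi / 2 - eps)
    print(eps, abs(cmath.tanh(z)))   # magnitude grows roughly like 1/eps
```

So an everywhere-differentiable f must trade boundedness for singularities, which motivates the weaker conditions imposed here.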
Definition 1: A function f(z) is called analytic at a point z_0 if its derivative f'(z) exists at every point z in some neighborhood of z_0; a function that is analytic at every point of ℂ is called entire.

By Liouville's theorem, a function that is both bounded and entire is necessarily a constant. Consequently, requiring f'(z) to exist for all z ∈ ℂ excludes every bounded nonconstant activation function: any entire activation must be unbounded, violating property i). When f(z) is analytic, the Cauchy-Riemann equations u_x = v_y and u_y = -v_x hold, and (6) and (14) collapse to the compact forms

$$\nabla_{W_{jl}} E = -\bar{X}_{jl}\,\delta_j\,\overline{f'(z_j)} \tag{15}$$

$$\delta_j = \sum_k \bar{W}_{kj}\,\delta_k\,\overline{f'(z_k)} \tag{16}$$

which are the update equations obtained in [5] and [6].
IV. A SUITABLE ACTIVATION FUNCTION

We propose the activation function

$$f(z) = \frac{z}{c + \frac{1}{r}|z|}$$

where c and r are real positive constants. This function has the property of mapping a point z = x + iy = (x, y) on the complex plane to a unique point f(z) = (x/(c + (1/r)|z|), y/(c + (1/r)|z|)) on the open disc {z : |z| < r}. The phase angle of z is the same as that of f(z). The magnitude |z| is monotonically squashed to |f(z)|, a point in the interval [0, r); in an analogous way, the real sigmoid and hyperbolic tangent functions map a point x on the real line to a unique point on a finite interval. Parameter c controls the steepness of |f(z)|. The partial derivatives u_x, u_y, v_x, and v_y are, writing D = c + (1/r)|z|,

$$u_x = \frac{1}{D^2}\left(D - \frac{x^2}{r|z|}\right), \quad u_y = v_x = -\frac{xy}{r|z|\,D^2}, \quad v_y = \frac{1}{D^2}\left(D - \frac{y^2}{r|z|}\right)$$

if |z| ≠ 0, and u_x = v_y = 1/c, u_y = v_x = 0 if |z| = 0.

[Table I: Input patterns and desired outputs.]

[Figure: Sum-squared error versus training epochs (0 to 500).]

[Figure: Circuit implementation of the complex neuron, which produces f(z_j) = z_j/(c + (1/r)|z_j|), where z_j = Σ_l X_{jl} W_{jl}.]
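The partial derivatives of f(z) = z/(c + (1/r)|z|) listed for |z| ≠ 0 can be checked against central finite differences; a minimal sketch with arbitrary c, r, and z:

```python
# (c, r, and z are arbitrary test values with z != 0.)
c, r = 1.0, 2.0

def f(z):
    return z / (c + abs(z) / r)

def partials(z):
    # Closed-form u_x, u_y, v_x, v_y for |z| != 0, with D = c + |z|/r.
    x, y, m = z.real, z.imag, abs(z)
    D = c + m / r
    ux = (D - x * x / (r * m)) / D**2
    uy = -x * y / (r * m * D**2)
    vx = uy                       # the mixed partials coincide for this f
    vy = (D - y * y / (r * m)) / D**2
    return ux, uy, vx, vy

z = 0.6 - 1.1j
h = 1e-6
fx = (f(z + h) - f(z - h)) / (2 * h)            # should equal u_x + i v_x
fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # should equal u_y + i v_y
```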
V. CONCLUSION
CDBP is a generalization of the usual (real) backpropagation to
handle complex-valued inputs, weights, activation functions, and
outputs. Unlike previous attempts, the CDBP algorithm was derived
so that it can accommodate suitable activation functions, which were
discussed. A simple suitable activation function was found and the
circuit implementation of the corresponding complex neuron was
given. The circuit can be used to process sinusoids at the same
frequency (phasors).
REFERENCES
[1] B. Widrow, J. McCool, and M. Ball, "The complex LMS algorithm," Proc. IEEE, vol. 63, pp. 719-720, 1975.
[2] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing, Volume 1: Foundations, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA: MIT Press, 1986, pp. 318-362.
[3] B. Widrow and M. Hoff, "Adaptive switching circuits," in 1960 IRE WESCON Convention Record, part 4, pp. 96-104, 1960.