
ADAPTABLE NONLINEARITY FOR COMPLEX MAXIMIZATION OF NONGAUSSIANITY AND A FIXED-POINT ALGORITHM

Mike Novey and Tülay Adali

University of Maryland Baltimore County, Baltimore, MD 21250, {mnovey1, adali}@umbc.edu


ABSTRACT

Complex maximization of nongaussianity (CMN) has been shown to provide reliable separation of both circular and noncircular sources using a class of complex functions in the nonlinearity. In this paper, we derive a fixed-point algorithm for blind separation of noncircular sources using CMN. We also introduce the adaptive CMN (A-CMN) algorithm that provides significant performance improvement by adapting the nonlinearity to the source distribution. The ability of A-CMN to adapt to a wide range of source statistics is demonstrated by simulation results.
1. INTRODUCTION
Independent component analysis (ICA) for separating complex-valued signals is needed in many applications such as magnetic resonance imaging, radar, and wireless communications (see [1] for references). Depending on the application, the sources may come from both subgaussian and supergaussian distributions and, specifically in the complex domain, can have circular and noncircular distributions. For complex-valued ICA algorithms that use nonlinearities to implicitly generate higher-order statistics, such as complex infomax, nonlinear decorrelation [2], maximum likelihood (ML) [2, 3], and negentropy [4, 5], the optimal choice of nonlinearity is based on the source distribution, either through the likelihood function or the entropy. In the case of complex-valued sources this distribution is the joint distribution of the real and imaginary parts. Therefore, if the source distribution is unknown, one must perform some on-line density estimation to attain optimal performance. In the case of complex FastICA (c-FastICA) [4], the user picks a nonlinearity based on a hypothetical distribution, which is then applied to all sources. An ML approach using adaptive score functions is given in [3] for the complex case, whereby the nonlinearity is modeled as the sum of basis functions with the mixing coefficients estimated from the data. A complex-valued fixed-point algorithm using kurtosis is developed in [6], where the fourth-moment properties of the real and imaginary parts are used.

Our goal in this study is twofold: first, to provide an algorithm for complex ICA that retains the convergence properties of c-FastICA but, unlike c-FastICA, which assumes circular sources, is developed specifically for noncircularity; second, to adapt the nonlinearity, on-line, to the source distributions in an efficient manner. To this end, we first derive a fixed-point algorithm in the CMN framework based on the approximate Newton method proposed in [4] for c-FastICA. Our second goal is to adapt the nonlinearity to the joint distribution of each source; more specifically, we want the nonlinearity used for source i to be the differential entropy E{-log p(s_i)}, where p(s_i) is the joint distribution of the real and imaginary parts of the ith source. To maintain efficient adaptability, we assume that the distribution of each source can be modeled by the exponential power family of distributions [7], which can take on both sub- and supergaussian densities by means of its shape parameter. We show that the log of this distribution (entropy) fits naturally into the CMN framework. Adaptability is implemented by estimating the shape parameter by means of a novel ML-type estimator and applying it to the nonlinearity on-line.

The paper is organized as follows. First, relevant background concerning complex random variables, complex ICA, and the CMN algorithm is reviewed. Then the fixed-point algorithm is derived, followed by a description of the adaptive algorithm. The performance of the algorithm is demonstrated with simulations, followed by concluding remarks.

2. COMPLEX ICA BY MAXIMIZATION OF NONGAUSSIANITY

2.1. Complex ICA


A complex variable z is defined in terms of two real variables z^R and z^I as z = z^R + jz^I. The statistics of a complex random vector x = x^R + jx^I are defined by the joint probability density function (pdf) p_x(x^R, x^I), provided that it exists. The expectation of a complex random vector x is then given with respect to this pdf and is written as E{x} = E{x^R} + jE{x^I}. The covariance matrix is written as cov(x) = E{(x - E{x})(x - E{x})^H}, where H denotes the Hermitian transpose, and the pseudo-covariance matrix is defined as pcov(x) = E{(x - E{x})(x - E{x})^T}, where T denotes the transpose. These two quantities together define a complex random vector, and the random vector is second-order circular if pcov(x) = 0. A stronger definition of circularity is based on the pdf of the complex random variable such that, for any α, the pdfs of z and e^{jα}z are the same [8]. In ICA, the observed data z are typically expressed as a linear combination of latent variables such that z = As, where s = [s_1, ..., s_N]^T is the column vector of latent sources, z = [z_1, ..., z_N]^T is the column vector of observed mixtures, and matrix A is the N x N mixing matrix. ICA then identifies the statistically independent sources given the observed mixtures, typically by estimating a matrix W so that the source estimates become ŝ = Wz. We will assume without loss of generality that the sources have zero mean and unit variance, i.e., E{s_k s_k*} = 1.
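These definitions are easy to check numerically. The following is a minimal NumPy sketch of the scalar case; the particular variance choices are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Circular source: i.i.d. real and imaginary parts with equal variance.
z_circ = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

# Noncircular source: unequal real and imaginary variances.
z_nonc = 2.0 * rng.standard_normal(n) + 1j * 0.5 * rng.standard_normal(n)

for name, z in (("circular", z_circ), ("noncircular", z_nonc)):
    zc = z - z.mean()
    cov = np.mean(zc * zc.conj())   # scalar cov(z)  = E{(z-E{z})(z-E{z})*}
    pcov = np.mean(zc * zc)         # scalar pcov(z) = E{(z-E{z})(z-E{z})}
    print(name, np.round(cov, 3), np.round(pcov, 3))

# The circular source gives pcov ~ 0 (second-order circular); the
# noncircular one gives pcov ~ var(real) - var(imag) = 3.75.
```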

2.2. CMN Algorithm

The CMN contrast function is written as

J(w) = E{|G(w^H x)|^2}    (1)

where G: C -> C. The CMN algorithm requires a preliminary sphering or whitening transform V, resulting in

x = Vz = VAs = Âs

where E{xx^H} = I. We estimate each source, k, separately by finding a vector w such that

ŝ_k = w_k^H x = w_k^H Â s = q_k^H s    (2)

where q_k = [0, ..., q_k, 0, ...]^T. Constraining the source estimates such that E{ŝ_k ŝ_k*} = 1 also constrains the weights to ||w||^2 = 1 due to the whitening transform. The optimal weight is then

w_opt = arg max E{|G(w^H x)|^2}    (3)

where we maximize the cost under the unit-norm constraint.
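To make (1)-(3) concrete, the following NumPy sketch whitens a random complex mixture and evaluates the contrast for one candidate weight vector; the source model and the choice G(z) = z^3 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 50_000

# Unit-variance sources with an unequal real/imaginary balance (illustrative).
s = rng.uniform(-1, 1, (N, T)) + 0.3j * rng.uniform(-1, 1, (N, T))
s /= np.sqrt(np.mean(np.abs(s) ** 2, axis=1, keepdims=True))

A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
z = A @ s                                   # observed mixtures

# Whitening transform V from the sample covariance of the mixtures.
C = (z @ z.conj().T) / T
d, E = np.linalg.eigh(C)
V = E @ np.diag(d ** -0.5) @ E.conj().T
x = V @ z
print(np.abs((x @ x.conj().T) / T - np.eye(N)).max())   # ~0, i.e. E{xx^H} = I

# CMN contrast J(w) = E{|G(w^H x)|^2} for a unit-norm w, with G(z) = z^3.
w = rng.standard_normal(N) + 1j * rng.standard_normal(N)
w /= np.linalg.norm(w)
y = w.conj() @ x
print(np.mean(np.abs(y ** 3) ** 2))
```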

3. FIXED-POINT ALGORITHM

We now derive a fixed-point algorithm for the constrained optimization problem outlined in (3). Our starting point is Newton's method based on the Lagrangian function

L(w, λ) = J(w) + λ(w^H w - 1)    (4)

where λ is the real-valued Lagrange multiplier and J, the cost, is defined in (1). We make use of the complex gradient and Hessian derived in [9], where the Newton update is defined as

Δw̃ = -H̃^{-1} ∇̃*L,  with  H̃ = ∂²L/(∂w̃* ∂w̃^T),  ∇̃*L = ∂L/∂w̃*    (5)

where H̃ is the complex Hessian, ∇̃* is the conjugate gradient, and complex vectors, denoted with a tilde, are of the form w̃ ∈ C^{2N} = [w_1, w_1*, ..., w_N, w_N*]^T. The Newton update to the Lagrangian (4) is written as

Δw = -(H_J + λI)^{-1}(∇*J + λw^n)

and upon expanding we obtain

(H_J + λI)w^{n+1} = -∇*J + H_J w^n    (6)

where Δw = w^{n+1} - w^n. The conjugate gradient, ∇̃*J, was derived in [10] to be

∇̃*J = E{[G*(y)g(y)x_1, G(y)g*(y)x_1*, ..., G*(y)g(y)x_N, G(y)g*(y)x_N*]^T}    (7)

where G is defined in (1), g is the derivative of G, and y = w^H x. The complex Hessian, derived in [10], is the 2N x 2N matrix whose entries pair the terms x_i x_j* g(y)g*(y) with the terms x_i x_j G*(y)g'(y) and their conjugates, where g' is the derivative of g. We write H_J as the sum of two matrices, H_J = H^a_J + H^b_J, where H^a_J collects the x_i x_j* g(y)g*(y) entries and H^b_J collects the x_i x_j G*(y)g'(y) entries, to rewrite the product H_J w̃ as

H_J w̃ = H^a_J w̃ + H^b_J w̃.    (8)

Expanding (8) and simplifying by retaining the non-conjugated values, or odd-numbered rows, results in

H_J w = E{xx^H g(y)g*(y)}w + E{xx^T G*(y)g'(y)}w*    (9)

with H_J w ∈ C^N. Substituting (9) into (6) and making use of the approximation E{xx^T f(x)} ≈ E{xx^T}E{f(x)} and the whiteness of x, we obtain

Kw^{n+1} = -E{G*(y)g(y)x} + E{g(y)g*(y)}w^n + E{xx^T}E{G*(y)g'(y)}(w^n)*    (10)

where K = (H_J + λI). At the convergence point K becomes real-valued, as shown later, and can be removed from (10) due to the subsequent normalization of w to unit norm. Our fixed-point update now becomes

w^{n+1} = -E{G*(y)g(y)x} + E{g(y)g*(y)}w^n + E{xx^T}E{G*(y)g'(y)}(w^n)*    (11)
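A direct implementation of the update (11) is straightforward. The sketch below is our own illustration (the paper provides no reference code), with the nonlinearity and its derivatives passed in as functions.

```python
import numpy as np

def cmn_fixed_point_step(w, x, G, g, gp):
    """One iteration of the fixed-point update (11).

    w  : current weight vector, shape (N,)
    x  : whitened mixtures, shape (N, T)
    G  : analytic nonlinearity; g its derivative; gp the derivative of g
    """
    y = w.conj() @ x                              # y = w^H x, shape (T,)
    T = x.shape[1]
    term1 = -(x * (np.conj(G(y)) * g(y))).mean(axis=1)          # -E{G*(y)g(y)x}
    term2 = np.mean(g(y) * np.conj(g(y))) * w                   # E{g(y)g*(y)} w
    pcov = (x @ x.T) / T                                        # E{xx^T}
    term3 = np.mean(np.conj(G(y)) * gp(y)) * (pcov @ w.conj())  # E{xx^T}E{G*(y)g'(y)} w*
    w_new = term1 + term2 + term3
    return w_new / np.linalg.norm(w_new)          # unit-norm normalization

# Polynomial family G(z) = z^K, here with K = 3:
# w = cmn_fixed_point_step(w, x, lambda z: z**3, lambda z: 3*z**2, lambda z: 6*z)
```

Note that the third term carries the pseudo-covariance information E{xx^T}, which is exactly what distinguishes this update from a circular fixed-point rule.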

For (11) to be a fixed-point iteration, however, we need to show that at the optimal solution

Kw = αw

where α is a real-valued scalar that will not change the fixed points because of the normalization of w. We use the orthogonal change of coordinates q = Â^H w and without loss of generality assume a solution at q_0 = [e^{jθ}, 0, ...]^T. Using the same results from (9), we expand the first element of the vector Kq_0 as

{Kq_0}_1 = e^{jθ}(E{|g|^2} + λ) + e^{-jθ}E{s_1^2 G* g'}
         = e^{jθ}{E{|g|^2} + λ + e^{-2jθ}E{s_1^2 G* g'}}
         = e^{jθ}{A + e^{-2jθ}B}    (12)

where the argument to the functions g and g' is (e^{-jθ}s_1). Noting that λ is real-valued, we concentrate on the last term e^{-2jθ}B. For this study, we focus on polynomial functions of the form G(z) = z^K, where K is real-valued. Substituting z^K into the above equation, we obtain

e^{-2jθ}B = e^{-2jθ}E{s_1^2 (e^{jθ}s_1*)^K (K^2 - K)(e^{-jθ}s_1)^{K-2}} = (K^2 - K)E{|s_1|^{2K}}

which is real-valued, showing that at the optimal solution the phase of q_0 is not changed by K; thus (11) is a fixed-point iteration.

The fixed-point derivation is novel because it is valid for any analytic G and takes into account the noncircular nature of the sources through E{xx^T} when optimizing the Lagrangian. To see this, let us rewrite the fixed-point algorithm (6) using (8) as

(H_J + λI)w^{n+1} = Qw^n - {∇*J - H^b_J w^n}    (13)

where Q = E{g(y)g*(y)}. Noting that Q, a real-valued scalar, and (H_J + λI) do not change the fixed points, we simplify (13) to

w^{n+1} = w^n - μ{∇*J - H^b_J w^n}    (14)

We can interpret (14) as a gradient descent algorithm with a learning rate of μ and a modified nonlinearity

J_M(w) = J(w) - (1/2) w^H H^b_J w    (15)

found as the antiderivative of the second term in (14). Note that the modified nonlinearity, J_M, contains a quadratic term that carries the information in the pseudo-covariance matrix through the indefinite matrix H^b_J. How this modified nonlinearity affects the convergence properties can be seen by evaluating the complex Hessian of (15) at the optimal solution, q_0 = [e^{jθ}, 0, ...]^T, as

H_{J_M} = H^a_J + H^b_J - H^b_J = E{g(e^{-jθ}s_1)g*(e^{-jθ}s_1)}I

which is diagonal due to the independence, zero mean, and unit variance properties of the sources. A diagonal Hessian ensures that the optimization landscape is positive definite at the optimal solution, with very desirable convergence properties due to a unity condition number. This result shows that the fixed-point update we present in this paper implicitly modifies the nonlinearity to match the source distribution through the quadratic term seen in (15), which also explains why we observe improved performance with noncircular sources using a circularly symmetric cost function.

4. ADAPTABLE CMN ALGORITHM (A-CMN)

The CMN algorithm defined in Section 2.2 employs complex analytic functions as the nonlinearity to measure negentropy or nongaussianity. As shown in [1], for whitened sources, maximizing negentropy is equivalent to minimizing the differential entropy defined as -E{log p(s)}. One hopes that the choice of nonlinearity provides a reasonable approximation to the differential entropy of the source that is being estimated. Hence the optimal nonlinearity for complex sources, in an information-theoretic framework, is the log of the joint probability density function, p(s^R, s^I), of the source being estimated. We introduce the simple function G(z) = z^K/√(2K), K ∈ R, to derive the adaptable CMN algorithm, which after substituting into equation (1) results in a nonlinearity of the form

J(z) = z^K z*^K / 2K = e^{K log(z)} e^{K log(z*)} / 2K = e^{K(log|z| + j arg(z))} e^{K(log|z| - j arg(z))} / 2K = |z|^{2K} / 2K    (16)

The motivation for this class of functions is clearly seen, as the moduli of the sources are well fitted by the symmetric exponential power family (epf) of distributions introduced by Lunetta [11] as

p_Y(y) = (1 / (2 σ_p p^{1/p} Γ(1 + 1/p))) exp(-|y|^p / (p σ_p^p))    (17)

which is parameterized by the scale parameter σ_p > 0 and shape parameter p > 0, with y assumed zero mean. By changing the shape parameter, the epf describes supergaussian distributions with p < 2, subgaussian distributions with p > 2, and Gaussian with p = 2.
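The density (17) can be checked numerically against the closed-form variance expression (19) used below; this sketch assumes SciPy is available, and the parameter values are arbitrary.

```python
import numpy as np
from scipy.special import gamma
from scipy.integrate import quad

def epf_pdf(y, p, sigma_p):
    """Exponential power density (17) with zero mean."""
    c = 2.0 * sigma_p * p ** (1.0 / p) * gamma(1.0 + 1.0 / p)
    return np.exp(-np.abs(y) ** p / (p * sigma_p ** p)) / c

p, sigma_p = 1.5, 0.8            # a supergaussian example (p < 2)
m2_numeric, _ = quad(lambda y: y ** 2 * epf_pdf(y, p, sigma_p), -np.inf, np.inf)
m2_closed = p ** (2.0 / p) * gamma(3.0 / p) * sigma_p ** 2 / gamma(1.0 / p)
print(m2_numeric, m2_closed)     # the two values agree, verifying (19)
```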


The distribution of the source vector becomes

p_Y(y) = (1 / (2 σ_p p^{1/p} Γ(1 + 1/p)))^N exp(-Σ_{i=1}^N |y_i|^p / (p σ_p^p))    (18)

where we assume y is independent and identically distributed and of size N. The nonlinearity in (16) therefore describes the differential entropy of this distribution if we let p = 2K and assume the sources are circular.

To adapt the nonlinearity in (16) to the source distribution, we must estimate the shape parameter, p, from the data. Closed-form estimates for the shape parameter do not exist, so various numerical methods have been proposed, namely maximum likelihood and the method of moments (see [12] for references). Our method of estimating the shape parameter p, and thus K, is based on the fact that the source estimates in each step of the CMN algorithm have unit variance from the whitening step. The variance of (17) is found to be

E{y^2} = ∫_{-∞}^{∞} y^2 p_Y(y) dy = p^{2/p} Γ(3/p) σ_p^2 / Γ(1/p)    (19)

To use equation (19) we must use an estimate for σ_p, since it is not known. The maximum likelihood estimate for σ_p is

σ̂_p = ((1/N) Σ_{i=1}^N |y_i|^p)^{1/p}    (20)

which is derived by setting the derivative of (18) equal to zero and solving for σ_p. The estimate of K is found by substituting (20) into (19), setting the result equal to one due to the unit variance constraint, taking the log of both sides, and finding the roots by Newton's method. The results of these operations provide the update rule for estimating p as

p^{n+1} = p^n - (log E{y^2}) / (∂ log E{y^2} / ∂p),    K^{n+1} = p^{n+1} / 2    (21)

where E{y^2} is defined in (19). Taking advantage of the unit variance constraint provides an estimator that is less susceptible to outliers than an estimate based on higher-order moments and is easier to compute than the numerical MLE, since only the first derivative is required. Figure 1 depicts the performance of equation (21) by showing the true p value, a straight line with circles, superimposed with the p estimate for ten runs with different realizations of epf variates. Each estimate was based on a sample size of 2000. As seen in the figure, accurate estimates from p = 0.5 to p = 2 are possible, although with a small bias.

Fig. 1. Performance of the p estimate using equation (21). The straight line with circles is the true p value with ten results of estimates of p superimposed from different realizations of epf variates.

Our proposed algorithm, A-CMN, consists of the following two steps using the nonlinearity defined in (16): first, estimate K using (21), and second, use the nonlinearity in the fixed-point algorithm (11). Table 1 enumerates the steps we use in our adaptive fixed-point algorithm for estimating one source; a code sketch of the shape-parameter update is given at the end of this section.

1. Whiten the data to give x = Vz
2. Initialize w and K
3. Calculate the current source estimate y = w^H x
4. Update w^{n+1} using equation (11)
5. Estimate K using equation (21)
6. Normalize w <- w/||w||
7. If not converged, go to step 3

Table 1. On-line adaptive fixed-point algorithm (A-CMN) for estimating one source.

Estimating more than one source requires orthogonalization after each source estimate [1], stemming from the initial whitening of the data. Our implementation uses a Gram-Schmidt procedure after estimating each source, resulting in a deflationary orthogonalization scheme.
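The following sketch implements the shape-parameter update (21) used in step 5 of Table 1. As a simplification of ours, the derivative of log E{y^2} is taken by a numerical central difference rather than the analytic expression.

```python
import numpy as np
from scipy.special import gamma

def log_Ey2(p, y):
    """log E{y^2} from (19), with sigma_p replaced by its ML estimate (20)."""
    sigma_p = np.mean(np.abs(y) ** p) ** (1.0 / p)
    return np.log(p ** (2.0 / p) * gamma(3.0 / p) * sigma_p ** 2 / gamma(1.0 / p))

def update_p(p, y, eps=1e-4):
    """One Newton step of (21): drive log E{y^2} to zero (unit variance)."""
    f = log_Ey2(p, y)
    fp = (log_Ey2(p + eps, y) - log_Ey2(p - eps, y)) / (2.0 * eps)
    return p - f / fp

# y is the modulus of the current unit-variance source estimate, e.g.
#   y = np.abs(w.conj() @ x)
#   p = update_p(p, y);  K = p / 2
```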
5. SIMULATIONS

We verify the performance of the adaptive fixed-point algorithm presented in this paper, A-CMN, using an aggregate of both complex-valued subgaussian and supergaussian sources with both circular and noncircular distributions. We compare the performance of A-CMN with a fixed-point nonadaptive CMN algorithm using J(z) = |z|^4 as the cost (kurtosis), JADE [13], and c-FastICA [4] using the cost function G(y) = log(0.1 + y). The complex-valued mixing matrix A was generated using normally distributed real and imaginary parts with variance equal to one.

Eight sources were used in this study, where the real and imaginary parts of each source use the same symmetric distribution as outlined in Table 2. Noncircular sources are generated with different values of real-to-imaginary asymmetry, defined by the ratio

E{(s^R)^2} / E{(s^I)^2}

and then by providing a random phase shift to produce real-to-imaginary correlation. All simulations are averaged over 50 runs with the A matrix and sources realized on each run. The performance of the algorithms is measured using the signal-to-interference ratio (SIR) index, quantifying the distance of the obtained permutation matrix P = WA from the optimum, defined as

SIR(P) = (1/N) Σ_{i=1}^N ( (Σ_{k=1}^N |P_ik|^2) / (max_k |P_ik|^2) - 1 )

where the lower the SIR value, the better the separation, with zero defining perfect separation.

Source   Distribution      Kurtosis
1        Binomial          -0.93
2        Gamma              3.34
3        Poisson           -0.53
4        Hypergeometric    -0.2
5        Lognormal        448.98
6        Exponential        3.34
7        Uniform           -0.267
8        Geometric         14.93

Table 2. Source types used in simulations.

Figure 2 shows the average SIR for the four algorithms as a function of sample size. The sources are outlined in Table 2, where half are made noncircular by setting the asymmetry randomly between 1 and 1000. As seen in the figure, A-CMN outperforms the other algorithms by more than 3 dB, indicating that adapting the nonlinearity to the modulus does improve performance, especially when confronted with sources having different distributions. This is also directly seen by comparing A-CMN with CMN, which is the same algorithm but with a non-adaptive nonlinearity.

Figure 3 shows the average SIR as a function of the asymmetry, using the sources in Table 2. The number of samples used in this test is 500. This figure indicates how the algorithms perform as the sources become more noncircular. Again, A-CMN outperforms the others by a substantial margin, but one can also conclude that the fixed-point algorithm is correctly compensating for the noncircular nature of the sources, as both A-CMN and CMN have flat responses.

We now test the algorithms using subgaussian sources only. Here eight sources are used, consisting of complex sinusoids and chirp signals, simulating a narrow-band application. Figure 4 shows the average SIR for the four algorithms as a function of sample size. Here the results show A-CMN and CMN performing the same, indicating that the adaptive algorithm is using the same nonlinearity as CMN, J(z) = |z|^4, which is kurtosis.

Our last simulation uses supergaussian sources with very high kurtosis values generated from the 2-dimensional fast Fourier transform (FFT) data of images. The complex-valued data, besides being supergaussian with kurtosis values > 10,000, also exhibits noncircularity with asymmetry > 100. Four 128 x 128 black and white images were used, with the FFT data put into a vector of size 16384 and applied to the ICA model. The results of the simulation are outlined in Table 3, where the SIR index is shown for each algorithm. Again, A-CMN was able to adapt to the supergaussian and noncircular nature of the sources.

Algorithm    10 log(SIR)
A-CMN        -4.1
CMN          -3.34
JADE         -3.2
c-FastICA    -3.7

Table 3. Results of using sources containing supergaussian image data.
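A short sketch of the SIR index as defined above; the test matrix is an illustrative perfect separation, not data from the paper.

```python
import numpy as np

def sir_index(P):
    """SIR index: mean over rows of (sum_k |P_ik|^2 / max_k |P_ik|^2 - 1)."""
    P2 = np.abs(P) ** 2
    return np.mean(P2.sum(axis=1) / P2.max(axis=1) - 1.0)

# A perfect separation (one dominant entry per row, up to scale/phase) gives 0.
P = np.array([[0.0, 2.0], [1j, 0.0]])
print(sir_index(P))          # 0.0
print(sir_index(P + 0.1))    # > 0 once interference is present
```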

6. CONCLUSIONS
The CMN algorithm using gradient descent updates has been shown to provide reliable separation of both circular and noncircular sources. In this paper, we enhanced the performance of the CMN algorithm by (i) deriving a fixed-point algorithm for general contrasts that does not assume circularity, and (ii) adapting the nonlinearity, on-line, to the distribution of the source's modulus. We found using simulations that the fixed-point algorithm alone is capable of separating noncircular sources and that the performance is invariant to the amount of asymmetry in the source distribution. This result follows from the fixed-point algorithm implicitly modifying the nonlinearity by adding a quadratic term containing the information in the pseudo-covariance matrix. We then added adaptability to the algorithm by adjusting the nonlinearity to match the distribution of the modulus, where we used the exponential power family of distributions as a model. We found that adding adaptability to the fixed-point algorithm improved performance over the non-adaptive algorithm when confronted with a wide range of distributions. We thus show that the combination of adaptability based on the modulus and a fixed-point algorithm provides a powerful approach to the complex ICA problem.


Fig. 2. Average SIR as a function of data length for a mixture of eight sources, four subgaussian and four supergaussian, with half of each noncircular, as outlined in Table 2.

Fig. 3. Average SIR as a function of the asymmetry of the real to imaginary parts of the eight sources outlined in Table 2.

Fig. 4. Average SIR as a function of data length for a mixture of eight subgaussian sources consisting of complex sinusoids.

7. REFERENCES

[1] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, Wiley, 2001.

[2] T. Adali, T. Kim, and V. Calhoun, "Independent component analysis by complex nonlinearities," in Proc. ICASSP, Montreal, Canada, May 2004, vol. 5, pp. 525-528.

[3] J.-F. Cardoso and T. Adali, "The maximum likelihood approach to complex ICA," in Proc. ICASSP, Toulouse, France, May 2006.

[4] E. Bingham and A. Hyvärinen, "A fast fixed-point algorithm for independent component analysis of complex valued signals," Int. J. Neural Systems, vol. 10, pp. 1-8, 2000.

[5] M. Novey and T. Adali, "ICA by maximization of nongaussianity using complex functions," in Proc. MLSP, Mystic, CT, 2005.

[6] S. Douglas, J. Eriksson, and V. Koivunen, "Fixed-point complex ICA algorithms for the blind separation of sources using their real or imaginary components," in Proc. 6th Int. Conf. on ICA and BSS, Charleston, SC, 2006.

[7] A. Mineo and M. Ruggieri, "A software tool for the exponential power distribution: The normalp package," Journal of Statistical Software, vol. 12, no. 4, pp. 1-24, 2005.

[8] B. Picinbono, "On circularity," IEEE Trans. Signal Processing, vol. 42, no. 12, pp. 3473-3482, Dec. 1994.

[9] A. van den Bos, "Complex gradient and Hessian," IEE Proc. Vision, Image and Signal Processing, vol. 141, pp. 380-382, Dec. 1994.

[10] M. Novey and T. Adali, "Stability analysis of complex-valued nonlinearities for maximization of nongaussianity," in Proc. ICASSP, Toulouse, France, May 2006.

[11] G. Lunetta, "Di una generalizzazione dello schema della curva normale," Annali della Facoltà di Economia e Commercio dell'Università di Palermo, vol. 17, pp. 237-244, 1963.

[12] M. Bethge, "Factorial coding of natural images: How effective are linear models in removing higher-order dependencies?," J. Opt. Soc. Am. A, vol. 23, no. 6, pp. 1253-1268, 2006.

[13] J.-F. Cardoso and A. Souloumiac, "Blind beamforming for non-Gaussian signals," IEE Proc. F - Radar and Signal Processing, vol. 140, no. 6, pp. 362-370, 1993.