
2012 IEEE Wireless Communications and Networking Conference: PHY and Fundamentals

A Parallel Processing Algorithm for Schnorr-Euchner Sphere Decoder

Han-Wen Liang1,2, Wei-Ho Chung2,*, Hongke Zhang3, Sy-Yen Kuo1,3
1 Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan
2 Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
3 College of Electronics and Information Engineering, Beijing Jiaotong University, Beijing, China
E-mail: *whc@citi.sinica.edu.tw
Abstract: This paper presents a category of detection schemes for Multiple-Input Multiple-Output (MIMO) systems called the Parallel Sphere Decoder (PSD). Compared to the conventional depth-first sphere decoder with Schnorr-Euchner enumeration (SESD), the proposed PSD algorithms use parallel computations and reduce the searching time by approximately 50% for the same amount of computation. That is, in a hardware implementation, the proposed work provides a trade-off between computational time and the number of computing units. Simulations of the proposed algorithms in 4 × 4 16-QAM and 3 × 3 64-QAM MIMO systems show the searching time reductions of the proposed algorithms while maintaining maximum-likelihood (ML) performance.

I. INTRODUCTION
The Multiple-Input Multiple-Output (MIMO) system has attracted tremendous research interest since it was proposed [1]. The MIMO system exploits spatial diversity to increase channel capacity without demanding more spectrum. The receiver of a MIMO system observes a linear superposition of the symbols transmitted from each transmitting antenna. How to detect these coupled symbols has been widely investigated [2]. The optimal detection of the transmitted symbols can be viewed as a discrete closest point search problem [3]. The complexity of exhaustive optimal detection grows exponentially with the number of transmitting antennas. The sphere decoding algorithm achieves the same optimal solution with reduced complexity.
The sphere decoding algorithm applies a preprocessing step, QR decomposition, to convert MIMO detection into a tree search problem. According to the search rules in the decision tree, sphere decoding algorithms can be categorized into three types.

Depth-First: The typical depth-first algorithm first searches the nodes in the tree vertically (from the top layer to the bottom layer) for the best solution. A top-to-bottom path is searched in one search cycle; the depth-first algorithm then moves horizontally and searches the other top-to-bottom paths until the best path is obtained [4]. The typical depth-first algorithm yields optimal detection. However, it suffers from a long and uncertain termination time.

Breadth-First: In each layer of the decision tree, the best K nodes are kept as the candidates for the best path, and the algorithm searches for the best path among the kept nodes. When K is as large as the number of nodes in a layer, this algorithm achieves optimal detection. However, K is usually chosen to be small to reduce complexity [5]. This algorithm is preferable for hardware implementation since it has fixed complexity and memory usage.

Best-First: The tree search rule of the best-first algorithm is to search the neighboring nodes around the current path [6]. Thus, this algorithm terminates earlier than the depth-first algorithm with high probability while keeping the optimal performance. However, its essential disadvantage is the need for large memories.

Acknowledgment: This research was supported by National Science Council, Taiwan under Grant NSC 99-2221-E-002-106-MY3 and NSC 100-2221-E001-004, National Natural Science Foundation of China (NSFC) under Grant 60833002, and 111 Project under Grant B08002.

978-1-4673-0437-5/12/$31.00 ©2012 IEEE

In this paper, we propose three algorithms that attain optimal detection with reduced searching time compared to the depth-first algorithm. The proposed algorithms are based on a better complex-to-real conversion, which allows parallel computations in the tree search for sphere decoding. The main contribution of this paper is to combine this complex-to-real conversion with the Schnorr-Euchner (SE) algorithm. This paper applies the proposed technique to the depth-first sphere decoder, which is called the sphere decoder (SD) for simplicity in the succeeding paragraphs, to demonstrate the reduction in time. However, the proposed technique can also be applied to other tree search rules, such as the breadth-first or hybrid tree search rules [5]. In general, other tree search rules sacrifice performance to reduce complexity. We apply the proposed technique to the SD to show that optimality is not sacrificed when the proposed technique is applied.
The rest of this paper is organized as follows. Sec. II models the MIMO system, defines the notation, and describes the related work. The preprocessing of the proposed parallel sphere decoders (PSD) is introduced in Sec. III. Subsections III-A, III-B and III-C give the details of the three proposed algorithms, respectively. Sec. IV shows the simulation results of the proposed work compared to the conventional SD. Finally, the conclusion of this paper is given in Sec. V.

II. SYSTEM MODEL AND PREVIOUS WORK

We consider $N_t$ transmitting antennas and $N_r$ receiving antennas in the MIMO system

$$y = Hx + n, \qquad (1)$$

where $x = [x_1\ x_2\ \ldots\ x_{N_t}]^T$ is the transmitted symbol vector. $H$ is an $N_r \times N_t$ channel matrix with elements $h_{ji}$, which are assumed to be independent identically distributed (i.i.d.) zero-mean complex Gaussian random variables with unit variance. $n = [n_1\ n_2\ \ldots\ n_{N_r}]^T$ is the additive white Gaussian noise (AWGN), with $n \in \mathbb{C}^{N_r}$. The signal-to-noise ratio (SNR) is $N_t E_s / N_0$, where $E_s$ denotes the energy per transmitted symbol and $N_0$ denotes the noise power. $y = [y_1\ y_2\ \ldots\ y_{N_r}]^T$ is the received symbol vector. Each transmitted symbol is assumed to be an $M^2$-ary symbol.
The optimal detection metric for the received symbol vector $y$ is

$$\hat{x} = \arg\min_{x \in \Omega^{N_t}} \|y - Hx\|^2, \qquad (2)$$

where $\Omega$ denotes the set of $M^2$-ary constellation points and $|\Omega^{N_t}| = M^{2N_t}$. $\hat{x}$ represents the detected symbol vector.
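The exhaustive search in (2) can be sketched in a few lines. The following is a minimal illustration rather than the authors' implementation: the function name `ml_detect`, the 2 × 2 channel values, and the noiseless 4-QAM setup (M = 2) are all assumptions chosen for the example.

```python
import itertools

def ml_detect(H, y, constellation):
    """Exhaustive search of (2): return the x minimizing ||y - Hx||^2."""
    n_t = len(H[0])
    best_x, best_metric = None, float("inf")
    # Enumerate all |constellation|^n_t candidate vectors.
    for x in itertools.product(constellation, repeat=n_t):
        metric = 0.0
        for row, y_j in zip(H, y):
            residual = y_j - sum(h * s for h, s in zip(row, x))
            metric += abs(residual) ** 2
        if metric < best_metric:
            best_x, best_metric = x, metric
    return best_x, best_metric

# Hypothetical example: 4-QAM (M = 2), 2 x 2 channel, no noise.
qam4 = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]
H = [[1 + 0.5j, 0.3 - 0.2j],
     [0.1 + 0.4j, 0.9 - 0.7j]]
x_true = (qam4[0], qam4[3])
y = [sum(h * s for h, s in zip(row, x_true)) for row in H]
x_hat, metric = ml_detect(H, y, qam4)
num_candidates = len(qam4) ** len(x_true)   # M^(2 Nt) = 16 here
```

The loop visits every one of the $M^{2N_t}$ candidates, which is the exponential growth that motivates the tree search described next.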
The complexity of searching by (2) grows exponentially with the number of transmitting antennas. To achieve optimal detection with reduced complexity, the SD is adopted. The SD includes two steps:
1) Preprocessing: Apply QR decomposition to $H$, i.e., $H = QR$, where $Q$ is an orthonormal matrix ($Q^H Q = I$), $(\cdot)^H$ denotes the conjugate transpose, and $R$ is an upper triangular matrix.
2) Tree Search: The problem (2) is transformed to

$$\hat{x} = \arg\min_{x \in \Omega^{N_t}} \|\tilde{y} - Rx\|^2, \qquad (3)$$

where $\tilde{y} = Q^H y$. The triangular structure of $R$ converts the exhaustive search in (2) into a tree search as in Fig. 1. In the decision tree, each node in layer $k$ represents an element of $\Omega$, i.e., a candidate value of $x_k$. The branch metric in the decision tree is defined as

$$T_b(x_{N_t}, x_{N_t-1}, \ldots, x_k) \triangleq \left( \tilde{y}_k - \sum_{l=k}^{N_t} r_{k,l}\, x_l \right)^2, \qquad (4)$$

where $\tilde{y}_k$ is the $k$-th element of $\tilde{y}$ and $r_{k,l}$ is the $(k,l)$-th element of $R$. The SD algorithm searches from the root of the decision tree to the leaf nodes. Before searching layer $k$, the SD algorithm computes $T_b(x_{N_t}, x_{N_t-1}, \ldots, x_k)$ for all nodes in layer $k$ and sorts these nodes by their $T_b$. This order gives the searching priority of the nodes in layer $k$. The cumulative metric in the decision tree is defined as

$$T_c(x_{N_t}, x_{N_t-1}, \ldots, x_k) \triangleq T_b(x_{N_t}) + T_b(x_{N_t}, x_{N_t-1}) + \cdots + T_b(x_{N_t}, x_{N_t-1}, \ldots, x_k). \qquad (5)$$

Fig. 1. Decision tree while SD is used to achieve the ML solution (root at the top, layer $N_t$ down to layer 1, leaf nodes at the bottom).

The SD algorithm first makes an initial decision on $x_{N_t}$ in layer $N_t$, then moves node by node until it arrives at a leaf node. The $T_c$ of this leaf node is kept as the initial radius. If the SD algorithm visits another leaf node whose $T_c$ is smaller than the current radius, the radius is replaced by the new $T_c$. When the SD algorithm visits a node whose $T_c$ exceeds the radius, it does not visit the nodes extended from this node. More details of the SD can be found in [3].
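The search just described can be sketched as a short recursive routine. This is a minimal sketch under stated assumptions, not the authors' code: the function name `sphere_decode`, the 3 × 3 real upper triangular `R`, the 4-PAM alphabet, and `y_t` (standing in for $\tilde{y}$) are hypothetical. The squared magnitude is used for the branch metric, which coincides with (4) for real values.

```python
def sphere_decode(R, y_t, alphabet):
    """Depth-first search of (3) with sorted enumeration and radius pruning."""
    n = len(R)
    best = {"radius": float("inf"), "x": None, "examined": 0}
    x = [0] * n

    def search(k, t_c):
        # Residual of layer k given the symbols already fixed in layers above.
        r = y_t[k] - sum(R[k][l] * x[l] for l in range(k + 1, n))
        # Children sorted by branch metric (4), smallest first.
        children = sorted(alphabet, key=lambda s: abs(r - R[k][k] * s) ** 2)
        for s in children:
            best["examined"] += 1
            t_new = t_c + abs(r - R[k][k] * s) ** 2   # cumulative metric (5)
            if t_new >= best["radius"]:
                break   # the remaining siblings have even larger metrics
            x[k] = s
            if k == 0:
                # Leaf node inside the sphere: shrink the radius.
                best["radius"], best["x"] = t_new, list(x)
            else:
                search(k - 1, t_new)

    search(n - 1, 0.0)
    return best["x"], best["radius"], best["examined"]

# Hypothetical real-valued example with a 4-PAM alphabet.
R = [[2.0, 0.5, -0.3],
     [0.0, 1.5, 0.4],
     [0.0, 0.0, 1.0]]
pam4 = [-3, -1, 1, 3]
y_t = [1.2, -2.9, 0.8]
x_hat, radius, examined = sphere_decode(R, y_t, pam4)
```

Because the children are visited in ascending order of $T_b$, the first sibling that violates the radius allows all later siblings to be skipped, which is the effect of the sorted enumeration.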
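The received complex-value symbol vector is often converted to real values to make hardware implementation of the SD feasible [7], [5]. One usual conversion, used in [7], [5] and many other works, is

$$\begin{bmatrix} \Re(y) \\ \Im(y) \end{bmatrix} = \begin{bmatrix} \Re(H) & -\Im(H) \\ \Im(H) & \Re(H) \end{bmatrix} \begin{bmatrix} \Re(x) \\ \Im(x) \end{bmatrix} + \begin{bmatrix} \Re(n) \\ \Im(n) \end{bmatrix}, \qquad (6)$$

where $\Re(\cdot)$ and $\Im(\cdot)$ represent the real and imaginary parts of $(\cdot)$, respectively. However, another form of complex-to-real conversion is proposed in [8] as

$$\begin{bmatrix} \Re(y_1) \\ \Im(y_1) \\ \vdots \end{bmatrix} = \begin{bmatrix} \Re(h_{11}) & -\Im(h_{11}) & \cdots \\ \Im(h_{11}) & \Re(h_{11}) & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} \Re(x_1) \\ \Im(x_1) \\ \vdots \end{bmatrix} + \begin{bmatrix} \Re(n_1) \\ \Im(n_1) \\ \vdots \end{bmatrix}. \qquad (7)$$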

The conversion using (7) leads to an upper triangular matrix with regularly placed zero elements, which will be detailed later. For simplicity, the real-value system after applying conversion (7) is denoted as

$$y_r = H_r x_r + n_r. \qquad (8)$$
The authors of [8] adopted a preprocessing other than QR decomposition to generate an upper triangular matrix. That preprocessing decomposes $H_r$ into the product of an upper triangular matrix and a non-unitary matrix. However, the non-unitary matrix colors the noise, which complicates the detection problem.
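To make the difference between the two conversions concrete, the sketch below builds the real-value channel matrix in the stacked form (6) and in the interleaved form (7), then computes the triangular factor by classical Gram-Schmidt. It is an illustration only: the function names and the 2 × 2 channel values are assumptions, and plain Python lists are used instead of a linear algebra library.

```python
def real_form_stacked(H):
    """Real-value channel matrix of (6): [[Re H, -Im H], [Im H, Re H]]."""
    re = [[h.real for h in row] for row in H]
    im = [[h.imag for h in row] for row in H]
    top = [r + [-v for v in i] for r, i in zip(re, im)]
    bottom = [i + r for r, i in zip(re, im)]
    return top + bottom

def real_form_interleaved(H):
    """Real-value channel matrix of (7): one 2 x 2 block per complex entry."""
    rows = []
    for row in H:
        rows.append([v for h in row for v in (h.real, -h.imag)])
        rows.append([v for h in row for v in (h.imag, h.real)])
    return rows

def gram_schmidt_r(A):
    """Upper triangular factor with entries <e_k, h_l> (classical Gram-Schmidt).
    Assumes the columns of A are linearly independent."""
    cols = [list(c) for c in zip(*A)]
    n = len(cols)
    basis = []
    R = [[0.0] * n for _ in range(n)]
    for k, h in enumerate(cols):
        u = list(h)
        for e in basis:
            proj = sum(a * b for a, b in zip(e, h))
            u = [a - proj * b for a, b in zip(u, e)]
        norm = sum(a * a for a in u) ** 0.5
        basis.append([a / norm for a in u])
        for l in range(k, n):
            R[k][l] = sum(a * b for a, b in zip(basis[k], cols[l]))
    return R

# Hypothetical 2 x 2 complex channel.
H = [[1 + 0.5j, 0.3 - 0.2j],
     [0.1 + 0.4j, 0.9 - 0.7j]]
R_interleaved = gram_schmidt_r(real_form_interleaved(H))
R_stacked = gram_schmidt_r(real_form_stacked(H))
# Entries just right of the odd diagonal positions (1-based): (1,2) and (3,4).
zeros_interleaved = [abs(R_interleaved[0][1]), abs(R_interleaved[2][3])]
```

For this example the entries collected in `zeros_interleaved` vanish up to rounding, while the corresponding (1,2) entry of `R_stacked` does not; this is the regular zero pattern that the conversion (7) provides.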


III. PROPOSED ALGORITHM FOR PSD

The preprocessing for the proposed PSD is divided into two steps:
1) Complex-to-real Conversion: The PSD converts the received complex-value symbol vector and channel matrix to the real-value $y_r$ and $H_r$ by using (7).
2) QR Decomposition: Apply QR decomposition to $H_r$ such that $Q_r R_p = H_r$, where $Q_r$ is a real-value orthonormal matrix and $R_p$ is a real-value upper triangular matrix. Let $H_r = [h_1, h_2, \ldots, h_{2N_t}]$. The QR decomposition can be expressed as

$$u_1 = h_1, \qquad e_1 = \frac{u_1}{\|u_1\|},$$
$$u_k = h_k - \sum_{l=1}^{k-1} \langle e_l, h_k \rangle\, e_l, \qquad e_k = \frac{u_k}{\|u_k\|}, \qquad \text{for } k = 2, \ldots, 2N_t, \qquad (9)$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product. As a result, the real-value upper triangular matrix is

$$R_p = \begin{bmatrix}
\langle e_1, h_1 \rangle & 0 & \langle e_1, h_3 \rangle & \langle e_1, h_4 \rangle & \cdots \\
0 & \langle e_2, h_2 \rangle & \langle e_2, h_3 \rangle & \langle e_2, h_4 \rangle & \cdots \\
0 & 0 & \langle e_3, h_3 \rangle & 0 & \cdots \\
0 & 0 & 0 & \langle e_4, h_4 \rangle & \cdots \\
\vdots & & & & \ddots
\end{bmatrix}, \qquad (10)$$

where zeros regularly appear above the even diagonal elements (proved in [9]). This property is the same as the preprocessing result of [8], although QR decomposition is not adopted in [8].
This preprocessing applies to all versions of the proposed PSD. According to different tree search rules, three versions of the PSD are proposed.

A. The first PSD
If (10) is used for the SD, we observe that the $T_b$ in the $2k$-th layer is independent of the $T_b$ in the $(2k-1)$-th layer for $k = 1, 2, \ldots, N_t$. In the decision tree, we call layer $2k$ and layer $2k-1$ layer pair $k$. Layer $2k$, layer $2k-1$ and layer pair $k$ are denoted as $L_e^k$, $L_o^k$ and $L^k$, respectively. Since the $T_b$ for $L_e^k$ and for $L_o^k$ are independent, we can compute the two $T_b$'s simultaneously. This arrangement enables parallel processing in the SD. With parallel processing, the computational complexity is maintained at the same level, but the computational time is reduced to approximately half that of the original SD. Simulations of the increase in computational complexity and the reduction in time are given in Sec. IV.
The tree search of the first proposed PSD is given below:
1) The PSD searches from the root to the leaf nodes. It visits two nodes simultaneously: one node is in $L_o^k$ and the other is in $L_e^k$. The two nodes are defined as a node pair. The PSD moves from node pair to node pair. If the visited node pair includes a leaf node, the PSD compares the new $T_c$ with the radius and replaces the radius with the smaller one. If the PSD visits a node pair whose $T_c$ exceeds the radius, the PSD does not visit the node pairs extended from this node pair; it visits the node pair with the next priority, or goes back to the previous layer pair to visit the next priority there. If the PSD visits an unvisited layer pair $L^k$, the visiting priority of the node pairs in $L^k$ is determined by the following steps. The PSD algorithm terminates when the minimum $T_c$ in the decision tree has been found.
2) When the PSD moves to an unvisited layer pair $L^k$, it computes the $T_b$'s for $L_e^k$ and for $L_o^k$ simultaneously.
3) Sort the nodes in $L_e^k$ and in $L_o^k$ by their $T_b$'s in ascending order, respectively; denote the sorted nodes in $L_e^k$ as $n_1, \ldots, n_M$ and the sorted nodes in $L_o^k$ as $m_1, \ldots, m_M$.
4) The visiting priority of the node pairs in $L^k$ is given in Fig. 2(a), i.e., $(n_1, m_1), (n_1, m_2), \ldots, (n_1, m_M), (n_2, m_1), \ldots, (n_2, m_M), \ldots, (n_M, m_M)$. If the $T_c$ of a node pair $(n_p, m_k)$ exceeds the sphere radius, the node pairs $(n_i, m_j : i \ge p, j \ge k)$ are removed from the visiting list.

Fig. 2. Visiting priorities. (a) The visiting priority of $(n_i, m_j)$ in the first PSD. (b) The visiting priority of $(n_i, m_j)$ in the second PSD. (c) The priority of additional node pairs $(n_u, m_v)$ for comparison.
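The four steps of the first PSD can be sketched as follows. This is one possible reading of the steps above and not the authors' implementation: the function name `psd_decode`, the example matrix `R` (chosen to have the zero pattern of (10)), the vector `y_t`, and the 4-PAM alphabet are hypothetical, and the two branch-metric computations are run sequentially here although they share no data.

```python
def psd_decode(R, y_t, pam):
    """First-PSD sketch: search layer pairs, assuming R[2j][2j+1] == 0."""
    n = len(R)                       # n = 2 * Nt real layers (0-based rows)
    best = {"radius": float("inf"), "x": None, "pairs": 0}
    x = [0.0] * n

    def branch_metrics(k):
        # T_b of row k for every symbol; only layers above the pair are used.
        top = k | 1                  # odd row index of the pair containing k
        r = y_t[k] - sum(R[k][l] * x[l] for l in range(top + 1, n))
        return [(r - R[k][k] * s) ** 2 for s in pam]

    def search(j, t_c):
        le, lo = 2 * j + 1, 2 * j    # the text's layers 2k and 2k-1 (k = j + 1)
        tb_e = branch_metrics(le)    # independent of tb_o, so the two
        tb_o = branch_metrics(lo)    # computations could run on two units
        n_sorted = sorted(range(len(pam)), key=tb_e.__getitem__)
        m_sorted = sorted(range(len(pam)), key=tb_o.__getitem__)
        limit = len(pam)             # columns >= limit were removed
        for a in n_sorted:           # row-major order of Fig. 2(a)
            for col in range(limit):
                b = m_sorted[col]
                best["pairs"] += 1
                t_new = t_c + tb_e[a] + tb_o[b]
                if t_new >= best["radius"]:
                    limit = col      # drop (n_i, m_j) with i >= p, j >= k
                    break
                x[le], x[lo] = pam[a], pam[b]
                if j == 0:
                    best["radius"], best["x"] = t_new, list(x)
                else:
                    search(j - 1, t_new)
            if limit == 0:
                break

    search(n // 2 - 1, 0.0)
    return best["x"], best["radius"], best["pairs"]

# Hypothetical example with the zero pattern of (10) and a 4-PAM alphabet.
R = [[2.0, 0.0, 0.5, -0.3],
     [0.0, 2.0, 0.3, 0.5],
     [0.0, 0.0, 1.5, 0.0],
     [0.0, 0.0, 0.0, 1.5]]
pam4 = [-3, -1, 1, 3]
y_t = [1.1, -2.7, 4.4, -1.6]
x_hat, radius, pairs = psd_decode(R, y_t, pam4)
```

Because both sorted lists are ascending and the radius never grows, a pair that violates the radius also rules out every pair below and to the right of it in the grid, which is what the `limit` variable tracks.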

B. The second PSD

The SE algorithm implies that giving higher visiting priority to the node with smaller $T_b$ can reduce the searching time of the SD. The first PSD adopts this algorithm. The second PSD extends this algorithm to the layer pair: it gives higher visiting priority to the node pair with the smaller sum of the corresponding $T_b$'s. The sum of the two $T_b$'s does not need to be computed; instead, the statistical order of the sum of the $T_b$'s is used.
A simple example of the second PSD is shown in Fig. 3, where the nodes in $L_e^k$ and in $L_o^k$ are sorted and labeled for $k = N_t$ and $M = 4$. The node pairs with the first two visiting priorities are $(n_1, m_1)$ and $(n_1, m_2)$; in contrast to the first PSD, which gives the next visiting priority to $(n_1, m_3)$, the second PSD gives the next visiting priority to $(n_2, m_1)$, whose sum of $T_b$'s is statistically smaller.

Fig. 3. The node pairs with different visiting priorities in PSD and PSD2.

The tree search of the second PSD is given below:
1) The first three steps are kept the same as in the first PSD.
2) The visiting priority of the node pairs in $L^k$ is given in Fig. 2(b), i.e., $(n_1, m_1), (n_1, m_2), (n_2, m_1), \ldots$. If the $T_c$ of a node pair $(n_p, m_k)$ exceeds the sphere radius, the node pairs $(n_i, m_j : i \ge p, j \ge k)$ are removed from the visiting list.

C. The third PSD
In the decision tree, there are two searching directions for the PSD: vertical search and horizontal search. When the PSD moves from an upper layer pair to an unvisited layer pair, it is called vertical search. The PSD has to compute the $T_b$'s for all nodes in the unvisited layer pair in vertical search, so the computational complexity of vertical search is high. When the PSD visits a node pair whose $T_c$ exceeds the sphere radius, the PSD visits the node pair in the same layer pair with the next visiting priority; this is called horizontal search. The PSD only has to compare the $T_c$ of the visited node pair with the sphere radius to decide the next searching direction. The computational complexity of horizontal search is low since the $T_b$'s of the visited node pair are already computed; there are only additions and comparisons in horizontal search.
Thus, the third PSD is proposed to add extra processing to the horizontal search, with the aim of reducing the termination time of the PSD. The original processing in the horizontal search is to visit a node pair according to the visiting priority and compare the $T_c$ of this node pair with the sphere radius; in the additional processing, we compare the $T_c$ of one more node pair with the sphere radius. If this $T_c$ exceeds the sphere radius, the PSD will not visit the node pairs extended from this node pair; if this $T_c$ is less than the sphere radius, the PSD keeps this node pair and will visit it according to the visiting priority described in the second PSD. The tree search of the third PSD is given below:
1) The first two steps are kept the same as in the first PSD.
2) The unvisited layer pair is denoted as $L^k$, and the previous layer pair is denoted as $L^{k+1}$. Compare the sum of the $T_c$ of $L^{k+1}$ and the $T_b$ of $L_o^k$ with the sphere radius, and denote the node whose corresponding sum exceeds the sphere radius as $n_p$. Likewise, compare the $T_c$ of $L_e^k$ with the sphere radius, and denote the node whose $T_c$ exceeds the sphere radius as $m_k$. The node pairs $\{(n_i, m_j) : i \ge p\}$ and $\{(n_i, m_j) : j \ge k\}$ and their corresponding extended node pairs are removed from the visiting list.
3) This step is the same as the third step in the first PSD, except that the nodes outside the visiting list do not need to be sorted.
4) This step is the same as the second step in the second PSD.
5) When the third PSD moves horizontally, it compares the $T_c$ of an additional node pair with the sphere radius. Denote the sorted node pair that is to be compared additionally as $(n_u, m_v)$. The comparing priority of $(n_u, m_v)$ is determined by a two-phase rule:
a) Denote $(n_a, m_a)$ as the node pair in the visiting list such that the $T_b$ of $n_a$ is the largest among the $T_b$'s of the $n_i$ and the $T_b$ of $m_a$ is the largest among the $T_b$'s of the $m_j$. Further, denote $(n_b, m_b)$ as the node pair in the visiting list such that the $T_b$ of $n_b$ is the smallest among the $T_b$'s of the $n_i$ and the $T_b$ of $m_b$ is the smallest among the $T_b$'s of the $m_j$. The values $(u, v)$ of $(n_u, m_v)$ in this phase are determined by

$$u = v = \left\lceil \frac{a + b}{2} \right\rceil, \qquad (11)$$

where $\lceil \cdot \rceil$ is the ceiling function. If the $T_c$ of $(n_u, m_v)$ exceeds the sphere radius, remove the node pairs $(n_i, m_j : i \ge u, j \ge v)$ from the visiting list and replace $(n_a, m_a)$ with $(n_{u-1}, m_{v-1})$ in the next additional comparison; if the $T_c$ of $(n_u, m_v)$ does not exceed the sphere radius, replace $(n_b, m_b)$ with $(n_u, m_v)$ in the next additional comparison. Repeat this process until $a = b$; then this phase ends and the next phase begins.
b) Let $a = b = t$. The comparison priority of the next node pair $(n_u, m_v)$ in this phase is determined by the rule shown in Fig. 2(c). If the $T_c$ of the node pair $(n_p, m_k)$ exceeds the sphere radius, remove the node pairs $(n_i, m_j : i \ge p, j \ge k)$ from the visiting list.
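Phase a) of the two-phase rule in the third PSD is a bisection along the diagonal of the sorted node-pair grid, driven by (11). The sketch below is one reading of that rule under stated assumptions: the function name `diagonal_bisection`, the argument layout, and the numeric values are hypothetical, and the lists hold the ascending $T_b$ values of $n_1, \ldots, n_M$ and $m_1, \ldots, m_M$.

```python
import math

def diagonal_bisection(tc_parent, tb_n, tb_m, radius):
    """Phase a): bisect over the diagonal pairs (n_u, m_u) using (11).

    tb_n and tb_m are ascending branch metrics; ranks are 1-based as in
    the text. Returns the rank t reached when a == b.
    """
    b, a = 1, len(tb_n)          # smallest and largest ranks in the list
    while a != b:
        u = math.ceil((a + b) / 2)                  # u = v in (11)
        t_c = tc_parent + tb_n[u - 1] + tb_m[u - 1]
        if t_c > radius:
            a = u - 1            # pairs with i >= u and j >= u are removed
        else:
            b = u                # (n_u, m_u) is kept for later visiting
    return a

# Hypothetical sorted branch metrics for M = 4.
tb_n = [0.1, 0.5, 2.0, 6.0]
tb_m = [0.2, 0.4, 3.0, 5.0]
t = diagonal_bisection(1.0, tb_n, tb_m, 3.0)
```

Each iteration costs one addition and one comparison on already computed $T_b$'s, which is why the extra work stays within the low-cost horizontal search. Note that rank 1 is never compared in this loop; the sketch assumes it is handled by the ordinary visiting order.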

IV. SIMULATION RESULT

To compare the termination time of the SDs, the average number of visited node pairs is adopted as the measure. A visited node pair consists of the nodes whose $T_b$'s can be computed simultaneously. In the conventional SD, we can only compute the $T_b$ of one node at a time. Thus, a node pair represents merely a single node in the conventional SD. In our simulations, memory is used to record the computed $T_b$'s and the computed priority to avoid repeated computation. The conventional SD is converted to a real-value system by (6), where the parallel algorithm cannot be applied.
The average visited node pairs at different SNRs of the proposed PSD and the conventional SD are shown in Fig. 4 and Fig. 5 for 4 × 4 16-QAM MIMO and 3 × 3 64-QAM MIMO, respectively. The simulation results show that the termination time is greatly reduced in the proposed PSD as compared to the conventional SD.
To measure the computational complexity by the average number of real-value multiplications, we assume that multiplications dominate the computational complexity and ignore additions and sorting. The simulated computational complexity is shown in Fig. 6 for 4 × 4 16-QAM MIMO and in Fig. 7 for 3 × 3 64-QAM MIMO.
These results imply that the computational complexity of the proposed PSDs is slightly higher than that of the conventional SD; however, the difference is barely noticeable. It is noticeable that the simulated computational complexity of the second PSD is the same as that of the third PSD. In fact, the third PSD has more additions and comparisons than the second PSD, but both additions and comparisons are ignored in the computational complexity.

Fig. 4. The average visited node pairs in the 4 × 4, 16-QAM system (average visited node pairs versus SNR in dB; curves: SESD, PSD, PSD2, PSD3).
Fig. 5. The average visited node pairs in the 3 × 3, 64-QAM system (average visited node pairs versus SNR in dB; curves: SESD, PSD, PSD2, PSD3).
Fig. 6. The computational complexity in the 4 × 4, 16-QAM system (average number of multiplications versus SNR in dB; curves: SESD, PSD, PSD2, PSD3).
Fig. 7. The computational complexity in the 3 × 3, 64-QAM system (average number of multiplications versus SNR in dB; curves: SESD, PSD, PSD2, PSD3).

V. CONCLUSION
In this paper, we proposed three PSDs, which have about the same computational complexity as the conventional SD but only about half its termination time. The simulation results show that the reduction in termination time is significant, particularly at low SNR. Under the same computational complexity, the proposed PSD uses two computing units simultaneously to reduce the termination time of the SD. This property is not achievable in the conventional SD since it is not suitable for parallel processing. In the second and third proposed PSDs, the priority of visited node pairs may be further optimized, which is a possible future direction.

REFERENCES

[1] G. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas," Bell Labs Technical Journal, vol. 1, no. 2, pp. 41-59, 1996.
[2] E. G. Larsson, "MIMO detection methods: How they work [lecture notes]," IEEE Signal Process. Mag., vol. 26, no. 3, pp. 91-95, May 2009.
[3] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, "Closest point search in lattices," IEEE Trans. Information Theory, vol. 48, no. 8, pp. 2201-2214, Aug. 2002.
[4] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, and H. Bolcskei, "VLSI implementation of MIMO detection using the sphere decoding algorithm," IEEE J. Solid-State Circuits, vol. 40, no. 7, pp. 1566-1577, July 2005.
[5] Z. Guo and P. Nilsson, "Algorithm and implementation of the K-best sphere decoding for MIMO detection," IEEE J. Sel. Areas Commun., vol. 24, no. 3, pp. 491-503, Mar. 2006.
[6] C.-H. Liao, T.-P. Wang, and T.-D. Chiueh, "A 74.8 mW soft-output detector IC for 8 × 8 spatial-multiplexing MIMO communications," IEEE J. Solid-State Circuits, vol. 45, no. 2, pp. 411-421, Feb. 2010.
[7] G. Zhan and P. Nilsson, "Reduced complexity Schnorr-Euchner decoding algorithms for MIMO systems," IEEE Commun. Lett., vol. 8, no. 5, pp. 286-288, May 2004.
[8] M. Siti and M. P. Fitz, "A novel soft-output layered orthogonal lattice detector for multiple antenna communications," in Proc. IEEE Int. Conf. Commun. 2006 (ICC'06), vol. 4, 2006, pp. 1686-1691.
[9] L. Azzam and E. Ayanoglu, "Reduced complexity sphere decoding via a reordered lattice representation," IEEE Trans. Commun., vol. 57, no. 9, pp. 2564-2569, Sept. 2009.

