This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LCOMM.2016.2594776, IEEE Communications Letters

IEEE COMMUNICATIONS LETTERS, VOL. 0, NO. 0, MONTH 2016

Machine Learning-based Antenna Selection in Wireless Communications


Jingon Joung

Abstract—This study is a first attempt to combine a machine learning technique with wireless communications. Through interpreting antenna selection (AS) in wireless communications (i.e. an optimization-driven decision (ODD)) as multiclass-classification learning (i.e. data-driven prediction (DDP)), and through comparing learning-based AS using k-nearest neighbors (k-NN) and support vector machine (SVM) algorithms with conventional optimization-driven AS methods in terms of communications performance, computational complexity, and feedback overhead, we provide insight into the potential of the fusion of machine learning and wireless communications.

Index Terms—Machine learning, multiclass classification, k-NN, SVM, data-driven prediction (DDP), optimization-driven decision (ODD), antenna selection, MIMO.

I. INTRODUCTION

Over the past few decades, advanced analytics techniques, such as data analytics, data mining, and machine learning, have attracted the attention of analysts, data scientists, researchers, and engineers in various fields in order to exploit very large and diverse data sets. For example, recently in information communications, a network utilizing big data was demonstrated [1], and networks and wireless communications that embrace big data have been studied [2], [3].

This study focuses particularly on the potential of machine learning techniques in wireless communications. Specifically, we apply multiclass classification, a primary task in machine learning, to a multiple-input multiple-output (MIMO) system with transmit antenna selection (AS) [4]. We employ multiclass classification algorithms, i.e. multiclass k-nearest neighbors (k-NN) and a support vector machine (SVM) [5], to classify the training channel state information (CSI) into one of the classes that represent the antenna set providing the best communication performance. From training with sufficient channels (data), we obtain a classification model and use it to classify a new channel and thereby obtain the best antenna set, i.e. data-driven prediction (DDP).

We compare the learning-based AS systems with conventional AS systems that maximize the minimum of either the eigenvalue or the norm of the channels, i.e. an optimization-driven decision (ODD). From the comparison, we discuss the systems' advantages and disadvantages in terms of bit-error rate (BER), selection complexity, and feedback overhead. This study provides insight into the fusion of machine learning and wireless communications.

II. ODD: OPTIMIZATION-DRIVEN ANTENNA SELECTION

We consider a simple point-to-point communication with a transmitter (Tx) and a receiver (Rx) employing nt and nr antennas, respectively. The Tx transfers nd independent data streams, x ∈ C^{nd×1}, through ns selected antenna(s) to the Rx, where nd ≤ min{nr, ns} for reliable performance with linear processing. Then, the Rx estimates x̃ ∈ C^{nd×1} as follows¹: x̃ = Wr H Wt x + Wr n, where Wr ∈ C^{nd×nr} and Wt ∈ C^{ns×nd} are the post- and pre-coding matrices, respectively; H ∈ C^{nr×ns} is a partial MIMO channel matrix whose ns columns are selected (i.e. AS) from the nt columns of an original full MIMO channel matrix Hfull ∈ C^{nr×nt}; and n is the additive white Gaussian noise (AWGN) at the Rx.

Next, a set of selected-antenna index vectors, sn ∈ R^{ns×1}, is defined as S = {s1, ..., sS}, where the elements of sn represent the indices of the selected antennas and S is the number of selection candidates. The optimal AS s^o in terms of BER is obtained through solving the optimization problem below [4]:

    s^o = arg max_{sn ∈ S} sv_{nd}([Hfull]_{sn}),   (1)

where sv_{nd}(A) gives the nd-th largest singular value of matrix A, and [A]_{sn} gives a partial matrix that consists of the columns of A selected according to the indices in sn. Then, the post- and pre-coding matrices are obtained as Wr = U^H and Wt = V, respectively, where U ∈ C^{nr×nd} and V^H ∈ C^{nd×ns} are the left and right singular matrices corresponding to the largest nd singular values of H.

Because s^o in (1) with the corresponding pre- and post-coding maximizes the minimum of the effective channel gains, it yields the optimal BER and is included as a performance bound in the BER comparison.

III. DDP: DATA-DRIVEN ANTENNA SELECTION

In order to select the optimal antenna indices among |S| candidates, where |A| is the cardinality of set A, instead of solving the combinatorial optimization problem in (1), we consider supervised machine-learning algorithms. In particular, we employ a multiclass classification algorithm in order to classify CSI into |S| classes, each of which represents the antenna set that provides the best communication performance. With a sufficient number of CSI samples, i.e. training data, we can obtain a classification model and predict the class of new CSI, i.e. the best antenna set for a new channel. Here, we interpret a communications system as a learning system as follows: (i) CSI as a training sample (i.e. an instance or observation); (ii) the index n of sn ∈ S as a class label ℓ (i.e. membership); and (iii) the set of indices n as a set of classes, L (i.e. categories or groups).

We refer to this method as a DDP method; henceforth, we use the terms of communications and machine learning interchangeably and introduce two popular multiclass classification algorithms, k-NN and SVM, to select the antennas.

Manuscript received April 0, 2016. The associate editor coordinating the review of this letter and approving it for publication was D. B. da Costa.
The author is with the School of Electrical and Electronics Engineering, Chung-Ang University, Seoul, Korea (e-mail: jgjoung@cau.ac.kr).
Digital Object Identifier S 00.0000/LCOMM.2016.00.000000
¹ In this study, for the feasibility test, we assumed that the Tx and Rx know the CSI. Note that the CSI is not typically required for a practical AS system because the selection can be performed at the Rx.
1089-7798/10$25.00 © 2016 IEEE

1089-7798 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
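For concreteness, the exhaustive ODD rule in (1) can be sketched as follows. This is a minimal illustration, not code from the letter: it assumes Python with numpy, 0-based column indices for the antenna set, and it simply enumerates all |S| = C(nt, ns) column subsets of Hfull and keeps the one whose nd-th largest singular value is maximal.

```python
from itertools import combinations
import numpy as np

def odd_select(H_full, n_s, n_d):
    """Exhaustive AS per (1): return the column subset of H_full whose
    n_d-th largest singular value is maximal, together with that value."""
    n_r, n_t = H_full.shape
    best_cols, best_sv = None, -np.inf
    for cols in combinations(range(n_t), n_s):      # all C(n_t, n_s) candidates
        # numpy returns singular values in descending order,
        # so sv[n_d - 1] is the n_d-th largest singular value.
        sv = np.linalg.svd(H_full[:, cols], compute_uv=False)
        if sv[n_d - 1] > best_sv:
            best_sv, best_cols = sv[n_d - 1], cols
    return best_cols, best_sv
```

Note that for ns = nd = 1 the single selected column's largest singular value equals its 2-norm, so (1) reduces to picking the strongest transmit antenna.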

A. Training Set Manipulation from Channels

We perform three procedures (not necessarily in sequence): (i) design the training samples from the channels, (ii) design the key performance indicator (KPI), and (iii) declare the corresponding labels based on the KPI, i.e. labeling.

1) Training Set Generation: The training samples are the input of a learning system and are known as input variables, features, predictors, or attributes. In communications, M nr-by-nt complex channel matrices, Hm, (or vectors) are used for the training. Because the training samples must be real-value vectors, each channel is manipulated into N real-value features, such as the angle, magnitude, and real and imaginary parts of hij, where hij is the (i, j)th complex-value element of Hm. Also, the training samples must be normalized, i.e. feature normalization, in order to avoid significant bias in the training. For example, the procedures used in this study are as follows:

S.1 Generate the 1-by-N real-value feature vector dm from the training CSI matrix Hm.
S.2 Repeat S.1 for all training CSI sets m ∈ {1, ..., M}.
S.3 Generate a training data matrix D ∈ R^{M×N} by stacking the dm's as D = [d1^T ··· dM^T]^T.
S.4 Normalize/scale D and generate a normalized feature matrix T, such that the (i, j)th element of T is a normalized value of the (i, j)th element of D, i.e. tij = (dij − E_i[dij])/(max_i{dij} − min_i{dij}), where the mean, maximum, and minimum are taken over the M training samples (index i).

2) KPI Design: A KPI is designed to label the training samples. In general, a KPI can be any metric used in communications, such as spectral efficiency, energy efficiency, BER, the norm of an effective channel, the effective received signal-to-noise ratio (SNR), the received signal power, the maximum of the minimum eigenvalue of the effective channel, communications latency, or any combination thereof. In this work, we use BER as the KPI.

3) Labeling: From the interpretation of AS as multiclass classification, it is clear that designing L is equivalent to designing S. Suppose that all combinations of ns antennas from a set of nt antennas are considered, i.e. |S| = C(nt, ns). From the communication-domain knowledge that less correlated antennas are more likely to be selected for better communication performance, we design S with the sets of less correlated antennas. By doing this, the training samples are devoted as equally as possible to each class label, which results in better training performance. Without this knowledge (or if it is known that the channels are uncorrelated), we design S with the full combinations of the antenna set. A mapping example between the labels ℓ ∈ L and sn ∈ S is presented in Table I for nt = 6, nr = 2, and ns = 2. Here, we demonstrate the full sn set and highlight the less correlated antenna sets in bold typeface for the correlation factors ρt = 0.3 and ρr = 0.1 at the Tx and Rx, respectively (refer to the precise correlation model in [6]). Then, we identify the label of each training sample from L. This procedure is called labeling and is summarized as follows:

S.5 Evaluate the KPI for the mth training CSI Hm with a particular antenna set sn corresponding to label ℓ ∈ L.
S.6 Generate a class label vector c ∈ R^{M×1} by setting the mth element cm of c to ℓ*, the label that gives the best KPI.
S.7 Repeat S.5 and S.6 for the CSI samples Hm, for all m.

TABLE I
EXAMPLE OF A MAPPING TABLE BETWEEN LABEL ℓ ∈ L AND ANTENNA INDICES sn ∈ S WHEN nt = 6, nr = 2, AND ns = 2.

ℓ = 1: s1 = [1,2]   ℓ = 6:  s6  = [2,3]   ℓ = 11: s11 = [3,5]
ℓ = 2: s2 = [1,3]   ℓ = 7:  s7  = [2,4]   ℓ = 12: s12 = [3,6]
ℓ = 3: s3 = [1,4]   ℓ = 8:  s8  = [2,5]   ℓ = 13: s13 = [4,5]
ℓ = 4: s4 = [1,5]   ℓ = 9:  s9  = [2,6]   ℓ = 14: s14 = [4,6]
ℓ = 5: s5 = [1,6]   ℓ = 10: s10 = [3,4]   ℓ = 15: s15 = [5,6]

B. Build a Learning System

We now have a manipulated real-value matrix T ∈ R^{M×N} and a corresponding class-label vector c = [c1 ··· cM]^T. Using this labeled training data set, i.e. T and c, we build a learning system, i.e. a trained multiclass classifier, whose input is CSI and whose output is the index of the selected antenna set. As |L| > 2 in AS systems, we employ |L|-class (i.e. multiclass) classification algorithms, namely the k-NN and SVM algorithms. For a simple description of the multiclass classification algorithms, we denote the mth row vector of T by tr[m].

1) Multiclass k-NN Classification: Among the M training samples, a k-NN classifier finds the k nearest training samples {tr[m]} to a new observation tr, which is a query sample. Then, the k-NN classifier declares the class/label ℓ* of tr based on a majority (the highest representation or a weighted average) among the labels ℓm that belong to the k nearest training samples to tr. Here, 'nearest' is defined based on a distance (or dissimilarity) measure between the samples. In this study, we confine ourselves to the most popular distance measure, i.e. the Euclidean distance, defined as d(tr[m], tr) = ‖tr[m] − tr‖2. We restrict k to an odd number in order to avoid an ambiguous majority, and we determine it so as to minimize the misclassification error through an exhaustive search from 1 to approximately M/3.

2) Multiclass SVM Classification: In order to classify multiple classes using an SVM, we employ |L| binary classifiers, each of which identifies one category versus the remaining categories, i.e. a one-vs.-rest (or one-vs.-all) binary classification approach. The detailed procedure is as follows.

S.8 Define a sub-training data set Tℓ, such that tr[m] is located at a row of Tℓ if cm = ℓ, for all m ∈ {1, ..., M}. Similarly, define its complement T̄ℓ for all ℓ ∈ L, where T̄ℓ is a shrunk matrix obtained by eliminating the row vectors of Tℓ from T. Then, an SVM is performed to classify the two training groups Tℓ and T̄ℓ.
S.9 Generate a binary label vector bℓ = [bℓ[1] ··· bℓ[M]]^T for the ℓth binary classification, such that bℓ[m] = 1 if cm = ℓ, and bℓ[m] = 0 otherwise.
S.10 One(ℓ)-vs.-rest(ℓ̄) method: Solve an alternative logistic regression problem with the two training groups {Tℓ, T̄ℓ} and the corresponding binary labels bℓ, as follows:

    θℓ = arg min_{θℓ} C Σ_{m=1}^{M} [ bℓ[m] g1(θℓ^T f(tr[m])) + (1 − bℓ[m]) g0(θℓ^T f(tr[m])) ] + ‖θℓ‖₂²/2.   (2)


In (2), C is a penalty (or cost) parameter that balances the bias and overfitting (similar to a regularization factor in regression); the cost function gk(·) is defined as gk(z) = (−1)^k z + 1 if (−1)^k z ≥ −1, and gk(z) = 0 otherwise; θℓ ∈ R^{M×1} is a learning parameter vector; and f(tr[m]) ∈ R^{M×1} is a Gaussian radial-basis kernel function vector that improves the choice of features, whose qth element fq(tr[m]) gives a similarity score between tr[q] and tr[m] as fq(tr[m]) = exp(−‖tr[q] − tr[m]‖²/(2σ²)). Here, the penalty parameter C and the variance σ² of the Gaussian radial-basis kernel are design parameters. The Gaussian kernel is commonly used in SVMs when the number of features N is small and the number of training samples M is intermediate [5]; this case is typical of the AS system considered.

S.11 Repeat S.10 for all ℓ ∈ {1, ..., |L|}.

C. Antenna Selection based on a Multiclass Classifier

Once all θℓ values are obtained, we can build an AS system using the learning function in (2). Upon arrival of a new channel matrix, we manipulate it into an input for the learning machine, i.e. tr, and provide it to the classifier built with {θℓ} in order to predict the label of its class, i.e. the selected antenna index. Here, as the SVM is implemented using |L| binary classifiers, it might predict multiple labels. In this case, we determine the optimal label ℓ* as the one that provides the lowest cost, and the antenna set corresponding to ℓ* is selected for communications.

D. Parameter Optimization

The classification performance depends on the design of the parameters. For example, the performance of k-NN depends on the number of neighbors k, the distance function d(·), and the weighting of the distance, while the performance of the SVM primarily depends on the kernel function. Furthermore, in communications, a learning system is built offline, and it is difficult to directly include the numerous practical parameters of the communication environment in the design. Moreover, the parameters also depend on the system configuration, such as nt, nr, ns, the SNR, the channel uncertainty level, and the spatial correlation factors. Hence, the design of the optimal parameters remains a subject for future study. In this study, we follow a heuristic approach that is typically used to identify better parameters: for k-NN, we perform an exhaustive search for a better k; for the SVM, we test multiple random initial parameters σ² and C using iterative cross-validation with M/10 (10%) of the training samples.

IV. PERFORMANCE EVALUATION WITH TEST CHANNELS

In this section, we evaluate the BER performance of the AS systems invoking machine learning-based AS and optimization-based AS, and compare their performance (Table II). Here, for comparison, we consider three benchmarking systems. The AS criteria/methods compared in the simulation are listed and denoted as follows:

• ODD: MaxMinEV. Maximize the minimum eigenvalue of the selected channels, (1).
• ODD: MaxMinNorm. Maximize the minimum column norm of the selected channels, i.e. si = arg max_{si ∈ S} min_j ‖[[Hfull]si]j‖2, where [A]j denotes the jth column of A.
• DDP: SVM. Multiclass SVM-based selection.
• DDP: k-NN. Multiclass k-NN-based selection.
• Random. Random selection.

We generated 2 × 10³ training CSI samples (i.e. M = 2 × 10³) following the Rayleigh distribution and considered the spatial correlation factors ρt and ρr for the transmit and receive antennas, respectively. For each channel realization, 200 16-quadrature-amplitude-modulation (16-QAM) symbols were transmitted. We set the spatial correlation factors of the channels to ρt = 0.3 and ρr = 0.1. For uncertain CSI, the uncertain channel power was set to 2% of the average channel power. In S.1, |hij|² was used to construct dm because, from our observations, it is effective in learning the cost in (1). The other communications and training-and-learning parameters tested in the evaluation are as follows:

• Fig. 1(a): {nt, nr, ns, nd} = {8, 1, 1, 1}; N = 8; |L| = 8 with si ∈ {1, ..., 8}; and 15 dB SNR per Rx antenna.
• Fig. 1(b): {nt, nr, ns, nd} = {6, 2, 2, 2}; N = 12; |L| = 6 with the highlighted ℓ in Table I; and 20 dB SNR per Rx antenna.
• Note: Other configurations with possible ns for {8(6), 1(2), ns, 1} and {6, 2, ns, 2} exhibit the same trends as the results in Figs. 1(a) and 1(b), respectively; thus, those results are omitted in this letter.

A. Classification Performance

In this study, in order to visualize the multiclass classification performance, we illustrate the misclassification rate (error rate) using a web representation [7] in Figs. 1(a) and 1(b). Each vertex of a regular polygon represents the misclassification denoted by ℓ → ℓ̄, where ℓ̄ ∈ L and ℓ̄ ≠ ℓ, and the corresponding misclassification rate is noted on the spoke.

When the Tx selects one antenna and transmits a single stream, the multiclass classification performs better than in the two-antenna selection for two streams. This is because the dimension of the features, N, increases from 8 to 12 with the limited training set. Furthermore, the magnitude of the elements of a vector channel, i.e. dm, is sufficient information to capture the KPI, yet it is not sufficient for a matrix channel.

B. Communications Performance

In Fig. 1, we illustrate the empirical cumulative distribution function (CDF) of the BER. Because we have finite training samples of channels in a large dimension, which is proportional to N, and the samples of random channels would be outliers with high probability, the k-NN classifier often suffers from high variation of classification at the decision boundary, as reported in [5], while an SVM performs well with outliers and can be used in a linear or non-linear manner through the use of a kernel. As seen in the results, therefore, an SVM classifier is preferable for communications over a k-NN classifier. Moreover, better classification yields better communications performance.

When the Tx selects one antenna to send a single stream, all methods, with the exception of Random, achieve near-optimal performance, as illustrated in Fig. 1(a).
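The multiclass k-NN rule of Section III-B, whose decision-boundary sensitivity underlies the behavior just discussed, can be sketched as follows. This is a minimal illustration, not the letter's implementation: it assumes Python with numpy, an odd k, plain majority voting (no distance weighting), and the Euclidean distance d(tr[m], tr) = ‖tr[m] − tr‖2.

```python
from collections import Counter
import numpy as np

def knn_select(T, c, t_new, k=3):
    """Multiclass k-NN: return the majority label among the k training
    rows of T (with labels c) nearest to the query feature vector t_new."""
    dists = np.linalg.norm(T - t_new, axis=1)  # Euclidean distance to each sample
    nearest = np.argsort(dists)[:k]            # indices of the k nearest samples
    votes = Counter(c[m] for m in nearest)     # majority vote over their labels
    return votes.most_common(1)[0][0]          # predicted label (antenna-set index)
```

Training here amounts to storing (T, c); the exhaustive search over odd k described in Section III-B would simply call this function with different k values on held-out samples.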


[Fig. 1 plots omitted in this text version: two panels of estimated CDF versus BER (10⁻⁴ to 10⁰) with curves for MaxMinEV, SVM, k-NN, MaxMinNorm, and Random, each panel carrying an inset web plot of the SVM and k-NN misclassification rates ℓ → ℓ̄.]

Fig. 1. Empirical CDF of BER. (a) Single-antenna selection for single-stream transmission: {nt, nr, ns, nd} = {8, 1, 1, 1}, N = 8, ρt = 0.3, ρr = 0.1, and SNR is 15 dB. (b) Two-antenna selection for double-stream transmission: {nt, nr, ns, nd} = {6, 2, 2, 2}, N = 12, ρt = 0.3, ρr = 0.1, and SNR is 20 dB.

TABLE II
COMPARISON OF AS METHODS. |L| ≤ C(nt, ns) = nt!/(ns!(nt − ns)!)

                        | Optimization-driven decision (ODD)                               | Data-driven prediction (DDP)
Selection method        | MaxMinEV                     | MaxMinNorm                           | Random | SVM                        | k-NN
BER performance         | Best                         | Good                                 | Worst  | Second best                | Moderate
Feedback (AS at Tx)     | H: nr·nt complex values      | H: nr·nt complex values              | None   | t: N (= nr·nt) real values | t: N (= nr·nt) real values
Feedback (AS at Rx)     | log2 |L| bits                | log2 |L| bits                        | None   | log2 |L| bits              | log2 |L| bits
Selection complexity    | O(|L|·nr·ns² + |L| log |L|)  | O(nr·nt + |L|(ns−1)nr + |L| log |L|) | O(1)   | O(N²) = O(nr²·nt²)         | O(N) = O(nr·nt)
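The label-set and feedback figures in Table II follow directly from the size of the label set; a quick check (assuming Python, not part of the letter) for the two evaluated configurations:

```python
from math import comb, log2, ceil

def label_set_size(n_t, n_s):
    """Full-combination label set: |L| = C(n_t, n_s) antenna subsets."""
    return comb(n_t, n_s)

def feedback_bits(n_labels):
    """Bits needed to feed back one selected label: ceil(log2 |L|)."""
    return ceil(log2(n_labels))
```

For the Table I configuration (nt = 6, ns = 2) the full set has C(6, 2) = 15 labels, i.e. 4 feedback bits; the Fig. 1(a) configuration (nt = 8, ns = 1) gives 8 labels and 3 bits.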

In contrast, there is performance loss when multiple antennas are selected for a multi-stream transmission. Here, we note that the learning-based AS, i.e. the SVM, outperforms the others.

C. Feedback Amount and Computational Complexity

Depending on whether the AS decision is made at the Tx or the Rx, the required feedback amount varies. If the Rx determines the selection, all schemes except Random require log2 |L|-bit feedback to inform the Tx of the label of si. In contrast, if the Tx determines the selection, H, i.e. nr·nt complex values, must be fed back for MaxMinEV and MaxMinNorm; however, the SVM and k-NN schemes only need t, i.e. N real values. By designing dm in S.1, the learning-based methods can reduce the feedback amount.

The selection complexity is compared in Table II. It is defined as the prediction complexity excluding the training complexity, as the training is performed offline before the communications. It can be seen that the selection complexity of the learning-based AS (i.e. DDP) is polynomial, which is clearly lower than that of MaxMinEV and MaxMinNorm (i.e. ODD), which use a combinatorial search across |L| ≤ C(nt, ns) candidates.

V. CONCLUSION

In this letter, we applied multiclass classification to an antenna selection system. The results, from a communications perspective and leaving the training cost aside, verify the feasible complexity and performance of learning-based antenna selection. This study provides a reference for the use of various machine-learning algorithms in wireless communications (i.e. DDP for ODD), and it will accelerate their implementation in real-life communications. Interesting areas for further study include the validation of a learning system with realistic channels and an online learning algorithm to track channels with time-varying statistics.

REFERENCES

[1] M. Do and H. Son, "SK Telecom's fast data platform: T-PANI and APOLLO," Korea Communication Review, vol. Q2, pp. 38–42, Apr. 2015.
[2] T. T. T. Nguyen and G. Armitage, "A survey of techniques for internet traffic classification using machine learning," IEEE Commun. Surveys & Tutorials, vol. 10, no. 4, pp. 56–76, 4th Quarter 2008.
[3] S. Bi, R. Zhang, Z. Ding, and S. Cui, "Wireless communications in the era of big data," IEEE Commun. Mag., vol. 53, pp. 190–199, Oct. 2015.
[4] R. W. Heath, S. Sandhu, and A. Paulraj, "Antenna selection for spatial multiplexing systems with linear receivers," IEEE Commun. Lett., vol. 5, no. 4, pp. 142–144, Apr. 2001.
[5] H. Zhang, A. C. Berg, M. Maire, and J. Malik, "SVM-KNN: Discriminative nearest neighbor classification for visual category recognition," in Proc. IEEE Comput. Soc. Conf. Computer Vision and Pattern Recognition (CVPR), New York, USA, Jun. 2006, pp. 2126–2136.
[6] J. P. Kermoal, L. Schumacher, K. I. Pedersen, P. E. Mogensen, and F. Frederiksen, "A stochastic MIMO radio channel model with experimental validation," IEEE J. Sel. Areas Commun., vol. 20, pp. 1211–1226, Aug. 2002.
[7] B. Diri and S. Albayrak, "Visualization and analysis of classifiers performance in multi-class medical data," Expert Systems with Applications, vol. 34, no. 1, pp. 628–634, Jan. 2008.
