This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LCOMM.2016.2594776, IEEE Communications Letters
Abstract—This study is the first attempt to conflate a machine learning technique with wireless communications. Through interpreting the antenna selection (AS) in wireless communications (i.e. an optimization-driven decision (ODD)) as multiclass-classification learning (i.e. data-driven prediction (DDP)), and through comparing the learning-based AS using k-nearest neighbors (k-NN) and support vector machine (SVM) algorithms with conventional optimization-driven AS methods in terms of communications performance, computational complexity, and feedback overhead, we provide insight into the potential of the fusion of machine learning and wireless communications.

Index Terms—Machine learning, multiclass classification, k-NN, SVM, data-driven prediction (DDP), optimization-driven decision (ODD), antenna selection, MIMO.

I. INTRODUCTION

Over the past few decades, advanced analytics techniques, such as data analytics, data mining, and machine learning, have attracted the attention of analysts, data scientists, researchers, and engineers in various fields seeking to exploit very large and diverse data sets. For example, recently in information communications, a network utilizing big data was demonstrated [1], and networks and wireless communications that embrace big data have been studied [2], [3].

This study focuses particularly on the potential of machine learning techniques in wireless communications. Specifically, we apply multiclass classification, a primary task in machine learning, to a multiple-input multiple-output (MIMO) system with transmit antenna selection (AS) [4]. We employ multiclass classification algorithms, namely multiclass k-nearest neighbors (k-NN) and a support vector machine (SVM) [5], to classify the training channel state information (CSI) into one of the classes that represent the antenna set providing the best communication performance. By training with a sufficient number of channels (data), we obtain a classification model and use it to classify a new channel and thereby obtain the best antenna set, i.e. data-driven prediction (DDP).

We compare the learning-based antenna selection (AS) systems with a conventional AS system that maximizes the minimum of either the eigenvalue or the norm of the channels, i.e. an optimization-driven decision (ODD). From the comparison, we discuss the systems' advantages and disadvantages in terms of bit-error rate (BER), selection complexity, and feedback overhead. This study provides insight into the fusion of machine learning and wireless communications.

II. ODD: OPTIMIZATION-DRIVEN ANTENNA SELECTION

We consider a simple point-to-point communication with a transmitter (Tx) and a receiver (Rx) employing nt and nr antennas, respectively. The Tx transfers nd independent data streams, x ∈ C^{nd×1}, through ns selected antenna(s) to the Rx, where nd ≤ min{nr, ns} for reliable performance with linear processing. Then, the Rx estimates x̃ ∈ C^{nd×1} as follows¹: x̃ = Wr H Wt x + Wr n, where Wr ∈ C^{nd×nr} and Wt ∈ C^{ns×nd} are the post- and pre-coding matrices, respectively; H ∈ C^{nr×ns} is a partial MIMO channel matrix whose ns columns are selected (i.e. AS) from the nt columns of an original full MIMO channel matrix Hfull ∈ C^{nr×nt}; and n is the additive white Gaussian noise (AWGN) at the Rx.

Next, a set of selected-antenna index vectors sn ∈ R^{ns×1} is defined as S = {s1, . . . , sS}, where the elements of sn represent the indices of the selected antennas and S is the number of selection candidates. The optimal AS so in terms of BER is obtained by solving the optimization problem below [4]:

    so = arg max_{sn ∈ S} sv_{nd}([Hfull]_{sn}),    (1)

where sv_{nd}(A) gives the nd-th largest singular value of matrix A, and [A]_{sn} gives the partial matrix consisting of the columns selected from A according to the selection indices in sn. Then, the post- and pre-coding matrices are obtained as Wr = U^H and Wt = V, respectively, where U ∈ C^{nr×nd} and V^H ∈ C^{nd×ns} are the left and right singular matrices corresponding to the largest nd singular values of H.

Because so in (1) with the corresponding pre- and post-coding maximizes the minimum of the effective channel gains, it yields the optimal BER and is included as a performance bound in the BER comparison.

III. DDP: DATA-DRIVEN ANTENNA SELECTION

In order to select the optimal antenna indices among |S| candidates, where |A| is the cardinality of set A, without solving the combinatorial optimization problem in (1), we consider supervised machine-learning algorithms. In particular, we employ a multiclass classification algorithm to classify CSI into |S| classes, each of which represents the antenna set that provides the best communication performance. With a sufficient number of CSI samples, i.e. training data, we can obtain a classification model and predict the class of new CSI, i.e. the best antenna set for a new channel. Here, we interpret a communications system as a learning system as follows: (i) CSI as a training sample (i.e. an instance or observation); (ii) the index n of sn ∈ S as a class label ℓ (i.e. membership); and (iii) the set of indices n as a set of classes, L (i.e. categories or groups).

We refer to this method as the DDP method, and henceforth use the terms in communications and machine learning interchangeably and introduce two popular multiclass classification algorithms.

Manuscript received April 0, 2016. The associate editor coordinating the review of this letter and approving it for publication was D. B. da Costa.
The author is with the School of Electrical and Electronics Engineering, Chung-Ang University, Seoul, Korea (e-mail: jgjoung@cau.ac.kr).
Digital Object Identifier 10.1109/LCOMM.2016.2594776

¹ In this study, for the feasibility test, we assumed that the Tx and Rx know the CSI. Note that the CSI is not typically required for a practical AS system because the selection can be performed at the Rx.
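The exhaustive ODD search in (1) and the pre-/post-coder construction can be sketched with NumPy; the function name `odd_select` and the toy dimensions below are illustrative, not part of the letter:

```python
import itertools

import numpy as np


def odd_select(h_full: np.ndarray, ns: int, nd: int):
    """Exhaustive ODD search of (1): pick the ns columns of the full
    channel matrix whose nd-th largest singular value is maximal."""
    nt = h_full.shape[1]
    best_idx, best_sv = None, -np.inf
    for cand in itertools.combinations(range(nt), ns):
        # sv_{nd}(A): numpy returns singular values in descending order
        sv = np.linalg.svd(h_full[:, list(cand)], compute_uv=False)[nd - 1]
        if sv > best_sv:
            best_idx, best_sv = cand, sv
    # Post-/pre-coders from the SVD of the selected partial channel
    u, _, vh = np.linalg.svd(h_full[:, list(best_idx)])
    w_r = u[:, :nd].conj().T   # post-coder W_r = U^H  (nd x nr)
    w_t = vh[:nd, :].conj().T  # pre-coder  W_t = V    (ns x nd)
    return best_idx, w_r, w_t


rng = np.random.default_rng(0)
# Toy full channel: nr = 2, nt = 6, i.i.d. complex Gaussian (Rayleigh) entries
h = (rng.standard_normal((2, 6)) + 1j * rng.standard_normal((2, 6))) / np.sqrt(2)
idx, w_r, w_t = odd_select(h, ns=2, nd=2)
```

The loop visits every one of the C(nt, ns) candidate subsets; this combinatorial cost is exactly what the DDP classifier of Section III is meant to avoid at selection time.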
1089-7798/10$25.00 © 2016 IEEE
1089-7798 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
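The DDP interpretation of Section III — CSI as a training sample, the selected-antenna index as its class label — can be sketched end to end for the simplest case ns = nd = nr = 1, where (1) reduces to picking the strongest antenna. The per-antenna-gain features and the plain majority-vote k-NN below are illustrative stand-ins, not the letter's exact training procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
nt, M = 4, 500  # transmit antennas, training CSI samples


def features(h):
    # d_i: per-antenna channel gains |h_i|^2 used as classifier features
    return np.abs(h) ** 2


def label(h):
    # For ns = nd = nr = 1, (1) reduces to selecting the strongest antenna
    return int(np.argmax(np.abs(h)))


# Training data: Rayleigh-fading CSI samples with their optimal-AS labels
H = (rng.standard_normal((M, nt)) + 1j * rng.standard_normal((M, nt))) / np.sqrt(2)
X = features(H)
y = np.array([label(h) for h in H])


def knn_predict(x, k=5):
    # Majority vote among the k nearest training samples (Euclidean distance)
    nearest = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    return int(np.bincount(y[nearest], minlength=nt).argmax())


# New CSI -> predicted antenna index (DDP): no optimization at run time
H_test = (rng.standard_normal((100, nt)) + 1j * rng.standard_normal((100, nt))) / np.sqrt(2)
acc = np.mean([knn_predict(features(h)) == label(h) for h in H_test])
```

Prediction here is a lookup in feature space; the combinatorial search of (1) is only ever run offline, to label the training set.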
In (2), C is a penalty (or cost) parameter that balances the bias and overfitting (similar to a regularization factor in regression); the cost function gk(·) is defined as gk(z) = (−1)^k z + 1 if (−1)^k z ≥ −1, and 0 otherwise; θℓ ∈ R^{M×1} is a learning parameter vector; and f(tr[m]) ∈ R^{M×1} is a Gaussian radial-basis kernel function vector that improves the choice of features, whose qth element function fq(tr[m]) gives a similarity score between tr[q] and tr[m] as fq(tr[m]) = exp(−‖tr[q] − tr[m]‖²/(2σ²)). Here, the penalty parameter C and the variance σ of the Gaussian radial-basis kernel function are design parameters. The Gaussian kernel is a commonly used kernel function in SVMs when the number of features N is small and the number of training samples M is intermediate [5]; this case is typical of the AS system considered.

S.11 Repeat S.10 for all ℓ ∈ {1, . . . , |L|}.

C. Antenna Selection based on a Multiclass Classifier

Once all θℓ values are obtained, we can build an AS system using the learning function in (2). Upon the arrival of a new channel matrix, we transform it into an input of the learning machine, i.e. tr, and provide it to a classifier built with {θℓ} in order to predict the label of the class, i.e. the selected antenna index. Here, as the SVM is implemented using |L| binary classifiers, the SVM might predict multiple labels. In this case, we determine the optimal label ℓ* that provides the lowest cost, and the antenna pair corresponding to ℓ* is selected for communications.

D. Parameter Optimization

The classification performance depends on the design of the parameters. For example, the performance of k-NN depends on the number of neighbors k, the distance function d(·), and the weighting of the distance, while the performance of the SVM primarily depends on the kernel functions. Furthermore, in communications, a learning system is built offline, and it is difficult to directly include the numerous practical parameters of the communication environment in the design. Moreover, the parameters also depend on the system configuration, such as nt, nr, ns, the SNR, the channel uncertainty level, and the spatial correlation factors. Hence, the design of the optimal parameters remains a subject for future study. In this study, we follow a heuristic approach, which is typically used to identify better parameters: for k-NN, we perform an exhaustive search for a better k; for the SVM, we test multiple random initial values of σ² and C using iterative cross-validation with M/10 (10%) of the training samples.

IV. PERFORMANCE EVALUATION WITH TEST CHANNELS

In this section, we evaluate the BER performance of the AS systems invoking machine learning-based AS and optimization-based AS, and compare their performance (Table II). Here, for comparison, we consider three benchmarking systems. The AS criteria/methods compared in the simulation are listed and denoted as follows:

• ODD: MaxMinEV. Maximize the minimum eigenvalue of the selected channels, as in (1).
• ODD: MaxMinNorm. Maximize the minimum norm of the channels such that so = arg max_{si ∈ S} min_j ‖hj‖², where hj is the jth column of [Hfull]_{si}.
• DDP: SVM. Multiclass SVM-based selection.
• DDP: k-NN. Multiclass k-NN-based selection.
• Random selection: Random.

We generated 2 × 10³ training CSI samples (i.e. M = 2 × 10³) that follow the Rayleigh distribution and considered the spatial correlation factors ρt and ρr for the transmit and receive antennas, respectively. For each channel realization, 200 16-quadrature-amplitude-modulation (16-QAM) symbols were transmitted. We set the spatial correlation factors of the channels as ρt = 0.3 and ρr = 0.1. For uncertain CSI, the uncertain channel power was set to 2% of the average channel power. In S.1, |hij|² was used to construct di because, from our observations, it is effective in learning the cost in (1). The other communications and training-and-learning parameters tested in the evaluation are as follows:

• Fig. 1(a): {nt, nr, ns, nd} = {8, 1, 1, 1}; N = 8; |L| = 8 with si ∈ {1, . . . , 8}; and 15 dB SNR per Rx antenna.
• Fig. 1(b): {nt, nr, ns, nd} = {6, 2, 2, 2}; N = 12; |L| = 6 with the highlighted ℓ in Table I; and 20 dB SNR per Rx antenna.
• Note: Other configurations with possible ns for {8(6), 1(2), ns, 1} and {6, 2, ns, 2} show the same trends as the results in Figs. 1(a) and 1(b), respectively; thus, those results are omitted in this letter.

A. Classification Performance

In this study, in order to visualize the multiclass classification performance, we illustrate the misclassification rate (error rate) using a web representation [7] in Figs. 1(a) and 1(b). Each vertex of a regular polygon represents the misclassification denoted by ℓ → ℓ̄, where ℓ̄ ∈ L and ℓ̄ ≠ ℓ, and the corresponding misclassification rate is noted on the spoke.

When a Tx selects one antenna and transmits a single stream, the multiclass classification performs better than in the two-antenna selection for two streams. This is because the dimension of the feature, N, increases from 8 to 12 with the limited training set. Furthermore, the magnitudes of the elements of a vector channel, i.e. di, are sufficient information to capture the KPI, yet they are not sufficient for a matrix channel.

B. Communications Performance

In Fig. 1, we illustrate the empirical cumulative distribution function (CDF) of the BER. Because we have finite training samples of channels in a large dimension, which is proportional to N, and samples of random channels are outliers with high probability, the k-NN classifier often suffers from high variation in classification at the decision boundary, as reported in [5], whereas an SVM handles outliers well and can be used in a linear or non-linear manner through the use of a kernel. As seen in the results, therefore, an SVM classifier is preferable to a k-NN classifier for communications. Moreover, better classification yields better communications performance.

When a Tx selects one antenna to send a single stream, all methods, with the exception of Random, achieve near-optimal
Fig. 1. Empirical CDF of BER. (a) Single-antenna selection for single-stream transmission: {nt , nr , ns , nd } = {8, 1, 1, 1}, N = 8, ρt = 0.3, ρr = 0.1, and
SNR is 15 dB. (b) Two-antenna selection for double-stream transmission: {nt , nr , ns , nd } = {6, 2, 2, 2}, N = 12, ρt = 0.3, ρr = 0.1, and SNR is 20 dB.
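The Gaussian radial-basis kernel feature map used by the SVM in Section III-B, fq(t) = exp(−‖tr[q] − t‖²/(2σ²)), can be sketched as follows; the array shapes and the choice σ = 1 are illustrative:

```python
import numpy as np


def rbf_features(t, t_train, sigma=1.0):
    """Kernel feature vector of Section III-B: one similarity score
    f_q(t) = exp(-||t_train[q] - t||^2 / (2 sigma^2)) per training sample."""
    d2 = np.sum((t_train - t) ** 2, axis=1)  # squared distances to all t_q
    return np.exp(-d2 / (2.0 * sigma**2))


rng = np.random.default_rng(2)
t_train = rng.standard_normal((8, 3))  # M = 8 training vectors with N = 3 features
f = rbf_features(t_train[0], t_train)
```

Each input is thus re-expressed as M similarity scores against the training set, which is what makes the kernel attractive when N is small and M is intermediate.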
TABLE II
COMPARISON OF AS METHODS. |L| ≤ C(nt, ns) = nt!/(ns!(nt − ns)!)
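The counts behind Table II and the feedback comparison of Section IV-C can be made concrete: at most C(nt, ns) candidate antenna sets, about log2 |L| bits of label feedback from the Rx, versus nr·nt complex values to feed back Hfull to the Tx (the function name is illustrative):

```python
from math import ceil, comb, log2


def as_counts(nt: int, nr: int, ns: int):
    n_classes = comb(nt, ns)  # |L| <= C(nt, ns) candidate antenna sets
    bits = ceil(log2(n_classes))  # Rx -> Tx label feedback in bits
    csi_values = nr * nt  # full-CSI feedback: complex entries of H_full
    return n_classes, bits, csi_values


# Configuration of Fig. 1(b): {nt, nr, ns} = {6, 2, 2}
n_classes, bits, csi = as_counts(6, 2, 2)
```

For the Fig. 1(b) setup this gives 15 candidate sets (the letter restricts |L| to 6 of them), a 4-bit label versus 12 complex channel values, illustrating why the DDP schemes reduce the feedback amount.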
performance, as illustrated in Fig. 1(a). In contrast, there is a performance loss when multiple antennas are selected for a multi-stream transmission. Here, we note that the learning-based AS, i.e. the SVM, outperforms the others.

C. Feedback Amount and Computational Complexity

Depending on whether the AS decision is made at the Tx or the Rx, the required feedback amount varies. If the Rx determines the selection, all schemes except Random require log2 |L|-bit feedback to inform the Tx of the label of si. In contrast, if the Tx determines the selection, H, i.e. nr × nt complex values, must be fed back for MaxMinEV and MaxMinNorm; however, the SVM and k-NN schemes only need t, i.e. N real values. By designing di in S.2, the learning-based methods can reduce the feedback amount.

The selection complexity is compared in Table II. The selection complexity is defined as the prediction complexity excluding the training complexity, as the training is performed offline before the communications. Here, it can be seen that the selection complexity of the learning-based AS (i.e. DDP) is polynomial, which is clearly lower than that of MaxMinEV and MaxMinNorm (i.e. ODD), which use a combinatorial search across |L| ≤ C(nt, ns) candidates.

V. CONCLUSION

In this letter, we applied multiclass classification to an antenna selection system. The results, from a communications perspective and leaving the training cost aside, verify the feasible complexity and performance of learning-based antenna selection. This study provides a reference for the use of various machine-learning algorithms in wireless communications (i.e. DDP for ODD), and it will accelerate their implementation in real-life communications. Interesting areas for further study include the validation of a learning system with realistic channels and an online learning algorithm to track channels with time-varying statistics.

REFERENCES

[1] M. Do and H. Son, "SK Telecom's fast data platform: T-PANI and APOLLO," Korea Communication Review, vol. Q2, pp. 38–42, Apr. 2015.
[2] T. T. T. Nguyen and G. Armitage, "A survey of techniques for internet traffic classification using machine learning," IEEE Commun. Surveys & Tutorials, vol. 10, no. 4, pp. 56–76, 4th Quarter 2008.
[3] S. Bi, R. Zhang, Z. Ding, and S. Cui, "Wireless communications in the era of big data," IEEE Commun. Mag., vol. 53, pp. 190–199, Oct. 2015.
[4] R. W. Heath, S. Sandhu, and A. Paulraj, "Antenna selection for spatial multiplexing systems with linear receivers," IEEE Commun. Lett., vol. 5, no. 4, pp. 142–144, Apr. 2001.
[5] H. Zhang, A. C. Berg, M. Maire, and J. Malik, "SVM-KNN: Discriminative nearest neighbor classification for visual category recognition," in Proc. IEEE Comput. Soc. Conf. Computer Vision and Pattern Recognition (CVPR), New York, NY, USA, Jun. 2006, pp. 2126–2136.
[6] J. P. Kermoal, L. Schumacher, K. I. Pedersen, P. E. Mogensen, and F. Frederiksen, "A stochastic MIMO radio channel model with experimental validation," IEEE J. Sel. Areas Commun., vol. 20, pp. 1211–1226, Aug. 2002.
[7] B. Diri and S. Albayrak, "Visualization and analysis of classifiers performance in multi-class medical data," Expert Systems with Applications, vol. 34, no. 1, pp. 628–634, Jan. 2008.