IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 11, NOVEMBER 2007
I. INTRODUCTION
Fig. 1. Symbol block with conventional pilot insertion, overlay pilot, and superimposed pilot. The channel coherence time T is split between the two subblocks, and the total block energy equals the sum of the energies spent in the two subblocks.
TABLE I
PARAMETER SETTINGS FOR THE DIFFERENT TRAINING SCHEMES
Hence, we will not consider schemes with constant average power, such as the overlay pilot scheme, since it is then too inferior to the other two schemes.
A. Superimposed Pilot
1) SIP Mode: During the SIP mode, the training symbols that are used for channel estimation are transmitted in a block of length symbols. The SIP part of the received signal (1) can be written as
(2)
where is the received matrix, and and denote the transmitted complex-valued known pilot symbol and random data symbol matrices, respectively. Further, is the noise matrix, and and are the average transmit powers allocated for training and data symbols in the SIP block, respectively.
We also define the training covariance matrix
(3)
and the data covariance matrix
(4)
which are normalized to
(5)
(6)
(7)
where
and is normalized to
(9)
3) Time and Energy Constraints: The total block length of
symbols is split into two subblocks: the SIP block and the
data block. The length of the SIP block is , the length of the
data block is , and the total transmitted energy is , where is the mean transmit power over the total block. This
energy is shared between the SIP block and the data block. Thus,
we have the following constraints:
(10)
(11)
where , and .
Now, since is complex Gaussian distributed with i.i.d. entries, and hence rotationally invariant, has the same distribution as . Similarly, since the noise is i.i.d. and complex Gaussian, we also realize that the noise matrix has the same distribution (i.i.d. and complex Gaussian) as . By using this model, we see that the model discussed previously is the special case when the time slots are chosen as the coordinate base, but here we are not restricted to time multiplexing of the symbols and can choose any orthogonal scheme that we like.
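The rotational-invariance argument can be checked numerically. The following sketch verifies that right-rotating i.i.d. complex Gaussian channel samples by a unitary matrix leaves their first- and second-order statistics unchanged; the antenna counts and the QR-based construction of the unitary matrix are our own illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, trials = 4, 4, 20000  # assumed antenna counts and number of channel draws

# A fixed unitary rotation Q, obtained from the QR factorization of a Gaussian matrix
Q, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))

# i.i.d. CN(0,1) channel samples
H = (rng.standard_normal((trials, M, N))
     + 1j * rng.standard_normal((trials, M, N))) / np.sqrt(2)
HQ = H @ Q  # right-rotated samples

# Statistics of HQ match those of H: zero mean and unit variance per entry,
# consistent with right rotational invariance of the i.i.d. Gaussian distribution.
mean_err = np.abs(HQ.mean(axis=0)).max()
var_err = np.abs((np.abs(HQ) ** 2).mean(axis=0) - 1.0).max()
```

Both deviations shrink as the number of draws grows, as expected for a distribution that is invariant under unitary rotations.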
Now, we can switch back to our original notation, and consider , which occupies the entire training subspace, and , where , which occupies the data subspace. Then, we can write
(20)
(21)
where is the diagonal rows of , i.e.,
(30)
, and where
(31)
(23)
(24)
(25)
(26)
where and denotes the expected SNR at each receive antenna. Thus, the different power levels are accounted for in the variables , and .
The results of this section are summarized in the following theorems.
Theorem 1: Consider the data model given by (12) and (13),
and let the channel matrix
, data matrix
, and the noise matrix
be composed of
i.i.d. complex Gaussian random variables. Let the deterministic
training matrix, normalized according to (22), be given by
(27)
where and has full-column rank . Then, in the MI sense, it holds that for any such that , there exists an equivalent data model where the training matrix is a real-valued and diagonal matrix.
Proof: The theorem follows directly from inserting the
zero-padded training matrix (27) into the derivations of the diagonalized model (20).
The above theorem states that there is no loss or gain in MI between the transmitted and received signals by varying the number of training vectors , given that .
Theorem 2: Given and the same conditions as in Theorem 1, the projection-based signaling scheme (16) will always have better performance (meaning larger MI) than CP for . They have identical performance for , since the two schemes then are equivalent in the MI sense.
Proof: Consider the zero-padded training matrix (27) and let in (14), i.e.,
(28)
(32)
where
(33)
(34)
The diagonal matrix is introduced here for notational convenience, since this matrix will appear frequently in the forthcoming derivations, and the elements of are given by
(35)
To arrive at the second equality in (32), we have assumed that has zero mean and is uncorrelated with , and that . It is later argued that choosing the data to be spatiotemporally uncorrelated maximizes the mutual information between the transmitted and the received signal matrices, which is also a reasonable signaling choice since the channel is unknown at the transmitter.
After removing the known pilot symbols and using the channel estimate as if it were the true channel, the following signal model is used in the SIP mode:
(36)
where is the channel estimation error, i.e., , and is zero-mean and uncorrelated with the LMMSE channel estimate . The average noise power is given by the following result.
Theorem 3: Let be the vectorized version of the noise matrix defined in (36), i.e., . Then, the elements of the noise matrix are uncorrelated and the average noise power is given by
(37)
Proof: See Appendix A.
When forming the LMMSE channel estimate, the SIP scheme ignores the fact that the total noise in (20), besides data, also contains channel information. Hence, in our case the MMSE estimate is not linear, and therefore the LMMSE estimate is not the MMSE estimate. Nevertheless, this is a conscious choice of estimator made by the SIP scheme, and the data is simply incorporated in the noise term, essentially rendering a low-complexity channel estimator at the expense of a lower SNR.
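A minimal numerical sketch of this low-complexity estimator follows: the superimposed data plus receiver noise are lumped into one effective white noise, and the LMMSE channel estimate is formed from the known pilots. All sizes, powers, and the orthogonal pilot matrix S are illustrative assumptions of our own, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
Mt, Mr, T = 4, 4, 16        # assumed antenna counts and SIP-block length
rho_d, sig2 = 0.1, 0.1      # assumed superimposed-data power and noise variance
trials = 4000

# Orthogonal pilot matrix S (Mt x T) with S S^H = eps * I
S = np.hstack([np.eye(Mt)] * (T // Mt))
eps = T / Mt                     # pilot energy per transmit antenna
sig_eff2 = Mt * rho_d + sig2     # data-plus-noise lumped into one effective noise

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = crandn(trials, Mr, Mt)                  # i.i.d. CN(0,1) channel
D = np.sqrt(rho_d) * crandn(trials, Mt, T)  # superimposed data, treated as noise
V = np.sqrt(sig2) * crandn(trials, Mr, T)   # receiver noise
Y = H @ (S + D) + V                         # SIP-mode received block

# LMMSE estimate that lumps H D + V into the effective noise
H_hat = Y @ S.conj().T / (eps + sig_eff2)

mse = np.mean(np.abs(H - H_hat) ** 2)
mse_theory = sig_eff2 / (eps + sig_eff2)    # per-entry LMMSE error variance
mse_ls = sig_eff2 / eps                     # least-squares error, for comparison
```

The Monte Carlo error variance matches the second-order LMMSE prediction even though the effective noise is not Gaussian, and it is smaller than the least-squares error, illustrating the complexity/SNR trade made by the SIP estimator.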
For the CP case analyzed in [6], the LMMSE estimator is also the MMSE estimator, which makes its effective noise (the equivalent to our ) uncorrelated with the data. This, together with a Gaussian assumption, is the worst case noise in the sense that it minimizes the capacity [6]. Here, the LMMSE estimator is not the MMSE estimator and, hence, the effective noise becomes correlated with the data. This means that if we replace our effective noise matrix with another noise matrix that is uncorrelated with the data and has the same variance, we obtain only an approximate result, although numerical evaluations indicate that the capacity is quite insensitive to doing so. In fact, the aforementioned correlation becomes quite small, since the data power during training is most often small relative to the training power. Further, the actual SNR in the data differs for different realizations of the channel matrix estimate, but we approximate the capacity by instead using the average noise power . To allow some informality, one might even apply Jensen's inequality [35] to argue that using the average noise power gives an even lower bound on the capacity, due to the convexity (with respect to the noise) of the capacity formula [2], [6]. Nevertheless, we replace the noise with a Gaussian noise matrix that has the same variance and the property of being uncorrelated (conditioned on and ) with the data. This choice of noise matrix is not necessarily the worst case noise but serves as a good enough approximation of the actual noise.
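The Jensen argument admits a one-line numerical check: since the rate is convex in the noise power, evaluating it at the average noise power can only underestimate the average rate. The gamma model for the fluctuating noise power and the signal power are arbitrary assumptions made only for this demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4.0                                       # assumed signal power
# Noise power fluctuating across channel-estimate realizations (assumed gamma model)
n = rng.gamma(shape=3.0, scale=0.5, size=200000)

rate_true = np.log2(1.0 + p / n).mean()       # average of per-realization rates
rate_avg_noise = np.log2(1.0 + p / n.mean())  # rate computed from E[n] only

# log2(1 + p/n) is convex in n, so by Jensen's inequality the
# average-noise-power rate is a lower bound on the true average rate.
```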
Using the channel estimate as if it were the true channel, we have the following signal model in the data mode:
(38)
(39)
where is defined as the variance of the channel estimation error. Again, it is assumed that the data is white, i.e., . As for the SIP part, the effective noise is replaced by a Gaussian noise matrix with the same variance and uncorrelated (conditioned on and ) with . We want to point out that there is a fundamental difference
between the models given by (20) and (21), and their counterparts (36) and (38). In (20) and (21), the channel is unknown,
while in (36) and (38) the channel is known. In the next section,
we will find expressions for the capacity for the known channel
case.
D. Capacity Bounds and Optimization
1) Mutual Information: Let be the channel estimate, formed by treating the data as noise. Using the data processing inequality and the chain rule [35], the MI between what is known and observed in the receiver and the unknown transmitted signals can be lower bounded and written as
(40)
(41)
(42)
(51)
2) Bounds on the Capacity: The worst case noise and the optimal signal distribution in an MI sense are stated in the following theorem.
Theorem 4: Consider the following multiple-antenna channel with transmit and receive antennas
(52)
where is the received signal vector, is the known channel matrix, is the transmitted signal vector, and is the additive noise. Let the signal and noise satisfy the following power constraints
(53)
(54)
and let and be uncorrelated.
Further, let and denote the respective correlation matrices. Then the worst case noise (in the sense that it minimizes the MI between and ) has a zero-mean complex Gaussian distribution, i.e., , where is the minimizing noise covariance matrix. When the distribution of the channel matrix is right rotationally invariant, i.e., the probability density function (PDF) satisfies for all unitary matrices , then
(55)
The MI-maximizing signal is also zero-mean and complex Gaussian distributed, i.e., , where is the maximizing signal covariance matrix. When the distribution of is left rotationally invariant, i.e., , then
(56)
Hence, a zero-mean uncorrelated complex Gaussian signal
maximizes the lower bound (which is given by a zero-mean uncorrelated complex Gaussian noise vector) on the MI between
the input and output.
Proof: See [6].
This signaling choice is also shown to be optimal in [2]. If
the distribution of the channel matrix (in our case ) can be
shown to be both left and right rotationally invariant, we can
apply Theorem 4.
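A quick Monte Carlo sketch of Theorem 4's conclusion, using an i.i.d. Gaussian channel (which is both left and right rotationally invariant): among trace-constrained signal covariance matrices, the white choice yields the largest ergodic MI. The antenna counts, total power, and the skewed comparison covariance are illustrative assumptions of our own.

```python
import numpy as np

rng = np.random.default_rng(5)
M, N, P, trials = 3, 3, 6.0, 2000  # assumed antenna counts and total signal power

# Common set of i.i.d. CN(0,1) channel draws, reused for a paired comparison
Hs = (rng.standard_normal((trials, N, M))
      + 1j * rng.standard_normal((trials, N, M))) / np.sqrt(2)

def ergodic_mi(Q):
    # Monte Carlo estimate of E[log2 det(I + H Q H^H)] over the common draws
    total = 0.0
    for H in Hs:
        total += np.linalg.slogdet(np.eye(N) + H @ Q @ H.conj().T)[1] / np.log(2)
    return total / trials

Q_white = (P / M) * np.eye(M)                            # uncorrelated, equal power
Q_skew = np.diag(P * np.array([4.0, 1.5, 0.5]) / 6.0)    # same trace, unequal powers

mi_white = ergodic_mi(Q_white)
mi_skew = ergodic_mi(Q_skew)
```

Using the same channel draws for both covariances makes the comparison a paired one, so the ordering is resolved well within the Monte Carlo noise.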
The following theorem shows that the channel estimate in
(32) is rotationally invariant.
(60)
bits/channel use (57)
where the elements of the normalized channel estimate will be uncorrelated with zero mean and unit variance, and have a distribution that is approximately Gaussian. The normalization constant is given by
(58)
and the effective SNRs are given by
(61)
Finally, the channel estimate can be identified as
(62)
(59)
Equality holds in (57) if the effective noise terms in (36) and (38) are made up of uncorrelated Gaussian noises. In our case, the effective noises are very Gaussian-like, since they are made up of sums of Gaussian matrices and products between Gaussian matrices. If that is not the case, then the above rates represent the lower bound given by the worst case noise. A comment on the tightness of the above bound is also in order: one can apply the same argument used in [6] to argue that the bound is tight at low and high SNRs, since at low and high SNRs the SIP scheme converges asymptotically to the CP scheme that was analyzed in [6] and shown to render a tight bound in those regimes. To find the capacity, (57) has to be optimized with respect to the parameters , and .
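As a sketch of how a lower bound of this form can be evaluated, the following Monte Carlo average of log2 det(I + (rho_eff/M) Hn Hn^H) models the normalized channel estimate as i.i.d. CN(0,1), in line with the approximately-Gaussian argument above; the antenna counts and the effective SNR value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, trials = 4, 4, 2000  # assumed transmit/receive antenna counts
rho_eff = 10.0             # assumed effective SNR (linear scale)

rate = 0.0
for _ in range(trials):
    # Normalized channel estimate modeled as i.i.d. CN(0,1)
    Hn = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    G = np.eye(N) + (rho_eff / M) * Hn @ Hn.conj().T
    rate += np.linalg.slogdet(G)[1] / np.log(2)  # log2 det, numerically stable
rate /= trials
```

The resulting ergodic rate exceeds the single-antenna value log2(1 + rho_eff), reflecting the spatial multiplexing gain of the multiple-antenna bound.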
Since
(63)
it follows that is given by
(64)
When solving for , we see that all s equal the same constant. Since , the solution follows from the constraints and yields . This choice of training matrix not only maximizes the effective SNR, it also makes left rotationally invariant, which is in agreement with the choice made in [6]. By inserting this training matrix into (64), the variance evaluates to
(69)
where .
We want to choose the training matrix such that it maximizes the effective SNRs and in (58) and (59). Since it is difficult to analytically evaluate the effective SNR during the SIP mode, we will resort to maximizing only the SNR during the data mode. Nevertheless, it seems natural that the same choice of training matrix, which turns out to be a scaled identity matrix, should also maximize the SNR during the SIP mode, since the channel, data, and noise are all white.
To show that minimizing the channel estimation error also maximizes the effective SNR, we use that , and start by rewriting (59) as
(65)
From (65), we conclude that the effective SNR is maximized by minimizing the variance of the channel estimation error , which is done next.
We need to choose such that is minimized. The problem can be stated as
(66)
where is a real-valued constant. This is a standard convex optimization problem and may be solved by using, e.g., Lagrange multipliers. The Lagrangian is given by
(67)
Differentiating with respect to and setting the result to zero yields
(68)
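The solution can be checked numerically. A small sketch, using the standard LMMSE error expression trace((I + P P^H / sigma^2)^(-1)) for an i.i.d. Gaussian channel, with sizes and energies of our own choosing, confirms that no random trace-constrained training matrix beats the scaled identity:

```python
import numpy as np

rng = np.random.default_rng(4)
M, T, sig2, E = 4, 8, 0.5, 8.0  # assumed sizes, noise variance, training energy

def lmmse_error(P):
    # LMMSE channel-error trace for Y = H P + V with H i.i.d. CN(0,1),
    # V i.i.d. CN(0, sig2): trace((I + P P^H / sig2)^(-1))
    G = np.eye(M) + P @ P.conj().T / sig2
    return np.trace(np.linalg.inv(G)).real

# Scaled-identity candidate: P P^H = (E/M) I, with trace(P P^H) = E
P_id = np.sqrt(E / T) * np.hstack([np.eye(M)] * (T // M))
err_id = lmmse_error(P_id)

# Random training matrices with the same energy never do better
worst_gap = 0.0
for _ in range(200):
    P = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
    P *= np.sqrt(E / np.trace(P @ P.conj().T).real)  # enforce trace(P P^H) = E
    worst_gap = max(worst_gap, err_id - lmmse_error(P))
```

The error is a Schur-convex function of the eigenvalues of P P^H under the energy constraint, so equal eigenvalues, i.e., a scaled identity (up to a unitary factor), attain the minimum; with these numbers err_id = M / (1 + E/(M sig2)) = 0.8.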
(70)
(71)
Fig. 2. Capacity versus SNR when N = 20, T = 10, and with optimal M and optimal power allocation.
(73)
The covariance matrix can therefore be decomposed as
(74)
where
(75)
is the received
Hence
(77)
(78)
is diagonal, and,
APPENDIX B
PROOF OF THEOREM 5: ROTATIONAL INVARIANCE OF
To this end, the LMMSE estimate (32) is given by
REFERENCES
[1] G. J. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas," Bell Labs Tech. J., pp. 41-59, Autumn 1996.
[2] I. E. Telatar, "Capacity of multi-antenna Gaussian channels," Eur. Trans. Telecommun., vol. 10, pp. 585-595, Nov. 1999.
[3] L. Tong, B. M. Sadler, and M. Dong, "Pilot-assisted wireless transmissions," IEEE Signal Process. Mag., vol. 21, no. 6, pp. 12-25, Nov. 2004.
[27] X. Ma, G. B. Giannakis, and S. Ohno, "Optimal training for block transmissions over doubly selective wireless fading channels," IEEE Trans. Signal Process., vol. 51, no. 5, pp. 1351-1366, May 2003.
[28] H. Vikalo, B. Hassibi, B. Hochwald, and T. Kailath, "On the capacity of frequency-selective channels in training-based transmission schemes," IEEE Trans. Signal Process., vol. 52, no. 9, pp. 2572-2583, Sep. 2004.
[29] S. Ohno and G. B. Giannakis, "Capacity maximizing MMSE-optimal pilots for wireless OFDM over frequency-selective block Rayleigh-fading channels," IEEE Trans. Inf. Theory, vol. 50, no. 9, pp. 2138-2145, Sep. 2004.
[30] X. Ma, G. B. Giannakis, and S. Ohno, "Optimal training for MIMO frequency-selective fading channels," IEEE Trans. Wireless Commun., vol. 4, no. 2, pp. 453-466, Mar. 2005.
[31] M. Coldrey (Tapio) and P. Bohlin, "Training-based MIMO systems: Part II: Improvements using detected symbol information," IEEE Trans. Signal Process., Nov. 2005, submitted for publication.
[32] E. Biglieri, J. Proakis, and S. Shamai, "Fading channels: Information-theoretic and communications aspects," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2619-2692, Oct. 1998.
[33] V. Jungnickel, T. Haustein, E. Jorswieck, V. Pohl, and C. von Helmolt, "Performance of a MIMO system with overlay pilots," in Proc. IEEE GLOBECOM, Nov. 2001, vol. 1, pp. 594-598.
[34] S. M. Kay, Fundamentals of Statistical Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[35] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[36] M. Biguesh and A. B. Gershman, "Training-based MIMO channel estimation: A study of estimator tradeoffs and optimal training signals," IEEE Trans. Signal Process., vol. 54, no. 3, pp. 884-893, Mar. 2006.
[37] H. Lütkepohl, Handbook of Matrices. West Sussex, U.K.: Wiley, 1996.
[38] L. Zheng and D. N. C. Tse, "Communication on the Grassmann manifold: A geometric approach to the noncoherent multiple-antenna channel," IEEE Trans. Inf. Theory, vol. 48, no. 2, pp. 359-383, Feb. 2002.
[39] P. H. Janssen and P. Stoica, "On the expectation of the product of four matrix-valued Gaussian random variables," IEEE Trans. Autom. Control, vol. 33, no. 9, pp. 867-870, Sep. 1988.
Patrik Bohlin was born in Borås, Sweden. He received the M.S. degree in applied physics and the Lic. Eng. and Ph.D. degrees in electrical engineering, all from Chalmers University of Technology, Göteborg, Sweden, in 1998, 2001, and 2005, respectively.
Since 2001, he has also been CTO of Qamcom Technology AB, Göteborg, Sweden. His research interests include statistical signal processing and information theory and their applications in adaptive antenna and MIMO systems.