IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 11, NOVEMBER 2007
I. INTRODUCTION
Fig. 1. Symbol block with conventional pilot insertion, overlay pilot, and superimposed pilot. The channel coherence time T is split between the two subblocks, and the total block energy equals the sum of the energies spent in the two subblocks.
TABLE I
PARAMETER SETTINGS FOR THE DIFFERENT TRAINING SCHEMES
Hence, we will not consider schemes with constant average power, such as the overlay pilot scheme, since it is then too inferior to the other two schemes.
A. Superimposed Pilot
1) SIP Mode: During the SIP mode, the training symbols that are used for channel estimation are transmitted in a block of length symbols. The SIP part of the received signal (1) can be written as
(2)
where is the received matrix, and and denote the transmitted complex-valued known pilot symbol and random data symbol matrices, respectively. Further, is the noise matrix, and and are the average transmit powers allocated for training and data symbols in the SIP block, respectively.
We also define the training covariance matrix
(3)
and the data covariance matrix
(4)
which are normalized to
(5)
(6)
(7)
where
and is normalized to
(9)
3) Time and Energy Constraints: The total block length of
symbols is split into two subblocks: the SIP block and the
data block. The length of the SIP block is , the length of the
data block is , and the total transmitted energy is , where is the mean transmit power over the total block. This
energy is shared between the SIP block and the data block. Thus,
we have the following constraints:
(10)
(11)
where , and .
Now, since is complex Gaussian distributed with i.i.d. entries, and hence rotationally invariant, has the same distribution as . Similarly, since the noise is i.i.d. and complex Gaussian, we also realize that the noise matrix has the same distribution (i.i.d. and complex Gaussian) as . By using this model, we see that the model discussed previously is the special case when the time slots are chosen as the coordinate base, but here we are not restricted to time multiplexing of the symbols and can choose any orthogonal scheme that we like.
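The rotational-invariance argument can be checked numerically. The following sketch verifies that right-rotating i.i.d. complex Gaussian channel samples by a unitary matrix leaves their first- and second-order statistics unchanged; the antenna counts and the QR-based construction of the unitary matrix are our own illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, trials = 4, 4, 20000  # assumed antenna counts and number of channel draws

# A fixed unitary rotation Q, obtained from the QR factorization of a Gaussian matrix
Q, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))

# i.i.d. CN(0,1) channel samples
H = (rng.standard_normal((trials, M, N))
     + 1j * rng.standard_normal((trials, M, N))) / np.sqrt(2)
HQ = H @ Q  # right-rotated samples

# Statistics of HQ match those of H: zero mean and unit variance per entry,
# consistent with right rotational invariance of the i.i.d. Gaussian distribution.
mean_err = np.abs(HQ.mean(axis=0)).max()
var_err = np.abs((np.abs(HQ) ** 2).mean(axis=0) - 1.0).max()
```

Both deviations shrink as the number of draws grows, as expected for a distribution that is invariant under unitary rotations.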
Now, we can switch back to our original notation, and consider , which occupies the entire training subspace, and , where , which occupies the data subspace. Then, we can write
(20)
(21)
where is the diagonal rows of , i.e.,
(30)
, and where
(31)
(23)
(24)
(25)
(26)
where and denotes the expected SNR at each receive antenna. Thus, the different power levels are accounted for in the variables , and .
The results of this section are summarized in the following theorems.
Theorem 1: Consider the data model given by (12) and (13),
and let the channel matrix
, data matrix
, and the noise matrix
be composed of
i.i.d. complex Gaussian random variables. Let the deterministic
training matrix, normalized according to (22), be given by
(27)
where and has full-column rank . Then, in the MI sense, it holds that for any such that , there exists an equivalent data model where the training matrix is a real-valued and diagonal matrix.
Proof: The theorem follows directly from inserting the
zero-padded training matrix (27) into the derivations of the diagonalized model (20).
The above theorem states that there is no loss or gain in MI between the transmitted and received signals by varying the number of training vectors , given that .
Theorem 2: Given and the same conditions as in Theorem 1, the projection-based signaling scheme (16) will always have better performance (meaning larger MI) than CP for . They have identical performance for , since the two schemes then are equivalent in the MI sense.
Proof: Consider the zero-padded training matrix (27) and let in (14), i.e.,
(28)
(32)
where
(33)
(34)
The diagonal matrix is introduced here for notational convenience, since this matrix will appear frequently in the forthcoming derivations, and the elements of are given by
(35)
To arrive at the second equality in (32), we have assumed that has zero mean and is uncorrelated with , and that . It is later argued that choosing the data to be spatiotemporally uncorrelated maximizes the mutual information between the transmitted and the received signal matrices, which is also a reasonable signaling choice since the channel is unknown at the transmitter.
After removing the known pilot symbols and using the channel estimate as if it were the true channel, the following signal model is used in the SIP mode:
(36)
where is the channel estimation error, i.e., , and is zero-mean and uncorrelated with the LMMSE channel estimate . The average noise power is given by the following result.
Theorem 3: Let be the vectorized version of the noise matrix defined in (36), i.e., . Then, the elements of the noise matrix are uncorrelated and the average noise power is given by
(37)
Proof: See Appendix A.
When forming the LMMSE channel estimate, the SIP scheme ignores the fact that the total noise in (20), besides data, also contains channel information. Hence, in our case the MMSE estimate is not linear, and therefore the LMMSE estimate is not the MMSE estimate. Nevertheless, this is a conscious choice of estimator made by the SIP scheme, and the data is simply incorporated in the noise term, essentially rendering a low-complexity channel estimator at the expense of a lower SNR.
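A minimal numerical sketch of this low-complexity estimator follows: the superimposed data plus receiver noise are lumped into one effective white noise, and the LMMSE channel estimate is formed from the known pilots. All sizes, powers, and the orthogonal pilot matrix S are illustrative assumptions of our own, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
Mt, Mr, T = 4, 4, 16        # assumed antenna counts and SIP-block length
rho_d, sig2 = 0.1, 0.1      # assumed superimposed-data power and noise variance
trials = 4000

# Orthogonal pilot matrix S (Mt x T) with S S^H = eps * I
S = np.hstack([np.eye(Mt)] * (T // Mt))
eps = T / Mt                     # pilot energy per transmit antenna
sig_eff2 = Mt * rho_d + sig2     # data-plus-noise lumped into one effective noise

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = crandn(trials, Mr, Mt)                  # i.i.d. CN(0,1) channel
D = np.sqrt(rho_d) * crandn(trials, Mt, T)  # superimposed data, treated as noise
V = np.sqrt(sig2) * crandn(trials, Mr, T)   # receiver noise
Y = H @ (S + D) + V                         # SIP-mode received block

# LMMSE estimate that lumps H D + V into the effective noise
H_hat = Y @ S.conj().T / (eps + sig_eff2)

mse = np.mean(np.abs(H - H_hat) ** 2)
mse_theory = sig_eff2 / (eps + sig_eff2)    # per-entry LMMSE error variance
mse_ls = sig_eff2 / eps                     # least-squares error, for comparison
```

The Monte Carlo error variance matches the second-order LMMSE prediction even though the effective noise is not Gaussian, and it is smaller than the least-squares error, illustrating the complexity/SNR trade made by the SIP estimator.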
For the CP case analyzed in [6], the LMMSE estimator is also the MMSE estimator, which makes its effective noise (the equivalent to our ) uncorrelated with the data. This, together with a Gaussian assumption, is the worst case noise in the sense that it minimizes the capacity [6]. Here, the LMMSE estimator is not the MMSE estimator and, hence, the effective noise becomes correlated with the data. This means that if we replace our effective noise matrix with another noise matrix that is uncorrelated with the data and has the same variance, we obtain only an approximate result, although numerical evaluations indicate that the capacity is quite insensitive to doing so. In fact, the aforementioned correlation becomes quite small, since the data power during training is most often small relative to the training power. Further, the actual SNR in the data differs for different realizations of the channel matrix estimate, but we approximate the capacity by instead using the average noise power . To allow some informality, one might even apply Jensen's inequality [35] to argue that using the average noise power gives an even lower bound on the capacity, due to the convexity (with respect to the noise) of the capacity formula [2], [6]. Nevertheless, we replace the noise with a Gaussian noise matrix that has the same variance and the property of being uncorrelated (conditioned on and ) with the data. This choice of noise matrix is not necessarily the worst case noise but serves as a good enough approximation of the actual noise.
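The Jensen argument admits a one-line numerical check: since the rate is convex in the noise power, evaluating it at the average noise power can only underestimate the average rate. The gamma model for the fluctuating noise power and the signal power are arbitrary assumptions made only for this demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4.0                                       # assumed signal power
# Noise power fluctuating across channel-estimate realizations (assumed gamma model)
n = rng.gamma(shape=3.0, scale=0.5, size=200000)

rate_true = np.log2(1.0 + p / n).mean()       # average of per-realization rates
rate_avg_noise = np.log2(1.0 + p / n.mean())  # rate computed from E[n] only

# log2(1 + p/n) is convex in n, so by Jensen's inequality the
# average-noise-power rate is a lower bound on the true average rate.
```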
Using the channel estimate as if it were the true channel, we have the following signal model in the data mode:
(38)
(39)
where is defined as the variance of the channel estimation error. Again, it is assumed that the data is white, i.e., . As for the SIP part, the effective noise is replaced by a Gaussian noise matrix with the same variance and uncorrelated (conditioned on and ) with . We want to point out that there is a fundamental difference
between the models given by (20) and (21), and their counterparts (36) and (38). In (20) and (21), the channel is unknown,
while in (36) and (38) the channel is known. In the next section,
we will find expressions for the capacity for the known channel
case.
D. Capacity Bounds and Optimization
1) Mutual Information: Let be the channel estimate, formed by treating the data as noise. Using the data processing inequality and the chain rule [35], the MI between what is known and observed in the receiver and the unknown transmitted signals can be lower bounded and written as
(40)
(41)
(42)
(51)
2) Bounds on the Capacity: The worst case noise and the optimal signal distribution in an MI sense are stated in the following theorem.
Theorem 4: Consider the following multiple-antenna channel with transmit and receive antennas
(52)
where is the received signal vector, is the known channel matrix, is the transmitted signal vector, and is the additive noise. Let the signal and noise satisfy the following power constraints
(53)
(54)
and let and be uncorrelated.
Further, let and denote the respective correlation matrices. Then the worst case noise (in the sense that it minimizes the MI between and ) has a zero-mean complex Gaussian distribution, i.e., , where is the minimizing noise covariance matrix. When the distribution of the channel matrix is right rotationally invariant, i.e., the probability density function (PDF) satisfies for all unitary matrices , then
(55)
The MI-maximizing signal is also zero-mean and complex Gaussian distributed, i.e., , where is the maximizing signal covariance matrix. When the distribution of is left rotationally invariant, i.e., , then
(56)
Hence, a zero-mean uncorrelated complex Gaussian signal
maximizes the lower bound (which is given by a zero-mean uncorrelated complex Gaussian noise vector) on the MI between
the input and output.
Proof: See [6].
This signaling choice is also shown to be optimal in [2]. If
the distribution of the channel matrix (in our case ) can be
shown to be both left and right rotationally invariant, we can
apply Theorem 4.
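A quick Monte Carlo sketch of Theorem 4's conclusion, using an i.i.d. Gaussian channel (which is both left and right rotationally invariant): among trace-constrained signal covariance matrices, the white choice yields the largest ergodic MI. The antenna counts, total power, and the skewed comparison covariance are illustrative assumptions of our own.

```python
import numpy as np

rng = np.random.default_rng(5)
M, N, P, trials = 3, 3, 6.0, 2000  # assumed antenna counts and total signal power

# Common set of i.i.d. CN(0,1) channel draws, reused for a paired comparison
Hs = (rng.standard_normal((trials, N, M))
      + 1j * rng.standard_normal((trials, N, M))) / np.sqrt(2)

def ergodic_mi(Q):
    # Monte Carlo estimate of E[log2 det(I + H Q H^H)] over the common draws
    total = 0.0
    for H in Hs:
        total += np.linalg.slogdet(np.eye(N) + H @ Q @ H.conj().T)[1] / np.log(2)
    return total / trials

Q_white = (P / M) * np.eye(M)                            # uncorrelated, equal power
Q_skew = np.diag(P * np.array([4.0, 1.5, 0.5]) / 6.0)    # same trace, unequal powers

mi_white = ergodic_mi(Q_white)
mi_skew = ergodic_mi(Q_skew)
```

Using the same channel draws for both covariances makes the comparison a paired one, so the ordering is resolved well within the Monte Carlo noise.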
The following theorem shows that the channel estimate in
(32) is rotationally invariant.
(60)
bits/channel use (57)
where the elements of the normalized channel estimate will be uncorrelated with zero mean and unit variance, and have a distribution that is approximately Gaussian. The normalization constant is given by
(58)
and the effective SNRs are given by
(61)
Finally, the channel estimate can be identified as
(62)
(59)
Equality holds in (57) if the effective noise terms in (36) and (38) are made up of uncorrelated Gaussian noises. In our case, the effective noises are very Gaussian-like, since they are made up of sums of Gaussian matrices and products between Gaussian matrices. If that is not the case, then the above rates represent the lower bound given by the worst case noise. A comment on the tightness of the above bound is also in order: one can apply the same argument used in [6] to argue that the bound is tight at low and high SNRs, since at low and high SNRs the SIP scheme converges asymptotically to the CP scheme that was analyzed in [6] and shown to render a tight bound in those regimes. To find the capacity, (57) has to be optimized with respect to the parameters , and .
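As a sketch of how a lower bound of this form can be evaluated, the following Monte Carlo average of log2 det(I + (rho_eff/M) Hn Hn^H) models the normalized channel estimate as i.i.d. CN(0,1), in line with the approximately-Gaussian argument above; the antenna counts and the effective SNR value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, trials = 4, 4, 2000  # assumed transmit/receive antenna counts
rho_eff = 10.0             # assumed effective SNR (linear scale)

rate = 0.0
for _ in range(trials):
    # Normalized channel estimate modeled as i.i.d. CN(0,1)
    Hn = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    G = np.eye(N) + (rho_eff / M) * Hn @ Hn.conj().T
    rate += np.linalg.slogdet(G)[1] / np.log(2)  # log2 det, numerically stable
rate /= trials
```

The resulting ergodic rate exceeds the single-antenna value log2(1 + rho_eff), reflecting the spatial multiplexing gain of the multiple-antenna bound.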
Since
(63)
it follows that is given by
(64)
When solving for , we see that all s equal the same constant. Since , the solution follows from the constraints and yields . This choice of training matrix not only maximizes the effective SNR, it also makes left rotationally invariant, which is in agreement with the choice made in [6]. By inserting this training matrix into (64), the variance evaluates to
(69)
where .
We want to choose the training matrix such that it maximizes the effective SNRs and in (58) and (59). Since it is difficult to analytically evaluate the effective SNR during the SIP mode, we will resort to maximizing only the SNR during the data mode. Nevertheless, it seems natural that the same choice of training matrix, which turns out to be a scaled identity matrix, should also maximize the SNR during the SIP mode, since the channel, data, and noise are all white.
To show that minimizing the channel estimation error also maximizes the effective SNR, we use that , and start by rewriting (59) as
(65)
From (65), we conclude that the effective SNR is maximized by minimizing the variance of the channel estimation error , which is done next.
We need to choose such that is minimized. The problem can be stated as
(66)
where is a real-valued constant. This is a standard convex optimization problem and may be solved by using, e.g., Lagrange multipliers. The Lagrangian is given by
(67)
Differentiating with respect to and setting the result to zero yields
(68)
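The solution can be checked numerically. A small sketch, using the standard LMMSE error expression trace((I + P P^H / sigma^2)^(-1)) for an i.i.d. Gaussian channel, with sizes and energies of our own choosing, confirms that no random trace-constrained training matrix beats the scaled identity:

```python
import numpy as np

rng = np.random.default_rng(4)
M, T, sig2, E = 4, 8, 0.5, 8.0  # assumed sizes, noise variance, training energy

def lmmse_error(P):
    # LMMSE channel-error trace for Y = H P + V with H i.i.d. CN(0,1),
    # V i.i.d. CN(0, sig2): trace((I + P P^H / sig2)^(-1))
    G = np.eye(M) + P @ P.conj().T / sig2
    return np.trace(np.linalg.inv(G)).real

# Scaled-identity candidate: P P^H = (E/M) I, with trace(P P^H) = E
P_id = np.sqrt(E / T) * np.hstack([np.eye(M)] * (T // M))
err_id = lmmse_error(P_id)

# Random training matrices with the same energy never do better
worst_gap = 0.0
for _ in range(200):
    P = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
    P *= np.sqrt(E / np.trace(P @ P.conj().T).real)  # enforce trace(P P^H) = E
    worst_gap = max(worst_gap, err_id - lmmse_error(P))
```

The error is a Schur-convex function of the eigenvalues of P P^H under the energy constraint, so equal eigenvalues, i.e., a scaled identity (up to a unitary factor), attain the minimum; with these numbers err_id = M / (1 + E/(M sig2)) = 0.8.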
(70)
(71)
Fig. 2. Capacity versus SNR when N = 20, T = 10, and with optimal M and optimal power allocation.
(73)
The covariance matrix can therefore be decomposed as
(74)
where
(75)
is the received
Hence
(77)
(78)
is diagonal, and,
APPENDIX B
PROOF OF THEOREM 5: ROTATIONAL INVARIANCE OF
To this end, the LMMSE estimate (32) is given by
REFERENCES
[1] G. J. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas," Bell Labs Tech. J., pp. 41-59, Autumn 1996.
[2] I. E. Telatar, "Capacity of multi-antenna Gaussian channels," Eur. Trans. Telecommun., vol. 10, pp. 585-595, Nov. 1999.
[3] L. Tong, B. M. Sadler, and M. Dong, "Pilot-assisted wireless transmissions," IEEE Signal Process. Mag., vol. 21, no. 6, pp. 12-25, Nov. 2004.
[27] X. Ma, G. B. Giannakis, and S. Ohno, "Optimal training for block transmissions over doubly selective wireless fading channels," IEEE Trans. Signal Process., vol. 51, no. 5, pp. 1351-1366, May 2003.
[28] H. Vikalo, B. Hassibi, B. Hochwald, and T. Kailath, "On the capacity of frequency-selective channels in training-based transmission schemes," IEEE Trans. Signal Process., vol. 52, no. 9, pp. 2572-2583, Sep. 2004.
[29] S. Ohno and G. B. Giannakis, "Capacity maximizing MMSE-optimal pilots for wireless OFDM over frequency-selective block Rayleigh-fading channels," IEEE Trans. Inf. Theory, vol. 50, no. 9, pp. 2138-2145, Sep. 2004.
[30] X. Ma, G. B. Giannakis, and S. Ohno, "Optimal training for MIMO frequency-selective fading channels," IEEE Trans. Wireless Commun., vol. 4, no. 2, pp. 453-466, Mar. 2005.
[31] M. Coldrey (Tapio) and P. Bohlin, "Training-based MIMO systems: Part II: Improvements using detected symbol information," IEEE Trans. Signal Process., Nov. 2005, submitted for publication.
[32] E. Biglieri, J. Proakis, and S. Shamai, "Fading channels: Information-theoretic and communications aspects," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2619-2692, Oct. 1998.
[33] V. Jungnickel, T. Haustein, E. Jorswieck, V. Pohl, and C. von Helmolt, "Performance of a MIMO system with overlay pilots," in Proc. IEEE GLOBECOM, Nov. 2001, vol. 1, pp. 594-598.
[34] S. M. Kay, Fundamentals of Statistical Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[35] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[36] M. Biguesh and A. B. Gershman, "Training-based MIMO channel estimation: A study of estimator tradeoffs and optimal training signals," IEEE Trans. Signal Process., vol. 54, no. 3, pp. 884-893, Mar. 2006.
[37] H. Lütkepohl, Handbook of Matrices. West Sussex, U.K.: Wiley, 1996.
[38] L. Zheng and D. N. C. Tse, "Communication on the Grassmann manifold: A geometric approach to the noncoherent multiple-antenna channel," IEEE Trans. Inf. Theory, vol. 48, no. 2, pp. 359-383, Feb. 2002.
[39] P. H. Janssen and P. Stoica, "On the expectation of the product of four matrix-valued Gaussian random variables," IEEE Trans. Autom. Control, vol. 33, no. 9, pp. 867-870, Sep. 1988.
Patrik Bohlin was born in Borås, Sweden. He received the M.S. degree in applied physics and the Lic. Eng. and Ph.D. degrees in electrical engineering, all from Chalmers University of Technology, Göteborg, Sweden, in 1998, 2001, and 2005, respectively.
Since 2001, he has also been CTO of Qamcom Technology AB, Göteborg, Sweden. His research interests include statistical signal processing and information theory and their applications in adaptive antenna and MIMO systems.