Re SH Mi

Low Power Synchronization for Wireless Communication
by
Marcy Josephine Ammer
B.S. (Massachusetts Institute of Technology) 1997

M. Eng. (Massachusetts Institute of Technology) 1999
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy
in
Engineering – Electrical Engineering and Computer Sciences
in the
GRADUATE DIVISION
Of the
UNIVERSITY OF CALIFORNIA, BERKELEY
Committee in charge:
Professor Jan Rabaey, Chair

Professor Heinrich Meyr
Professor Borivoje Nikolic
Professor Paul Wright
Fall 2004
The dissertation of Marcy Josephine Ammer is approved:
Chair Date
Date
Date
Date
University of California, Berkeley
Fall 2004
Copyright 2004
by

Abstract
by
Doctor of Philosophy in Engineering – Electrical Engineering and Computer Sciences
University of California, Berkeley
Professor Jan Rabaey, Chair
Synchronization is increasingly important in wireless communication devices.
Synchronization performance is critical to system performance and, it is where a large amount of
design time and receiver area and power is spent.
Not only is synchronization important, but the relevance is increasing due to four factors:
1. Decreased transmit distances use lower transmit power and, therefore, receiver power
begins to dominate.
2. The wireless channel is more frequency selective at higher transmission speeds which
require increased synchronization functionality.
3. Trends toward higher bandwidth efficiency moves modulation to higher order
constellations where synchronization specifications are tighter.
4. The push for integration moves RF functionality to digital CMOS processes with low
supply voltages forcing the synchronization system to contend with more front-end
nonidealities.
There are few places where the whole topic of synchronization is covered and fewer still
where the power consumption is considered. This research shows that significant system power
savings can be realized through systematic exploration of synchronization power consumption.
1
This dissertation sets up a framework for the systematic exploration of power consumption in
synchronization systems, applies this framework to a few representative problems, and uses some
system examples to show the impact of this type of exploration.
At the component level, frequency estimation and interpolation are investigated. It is shown
that frequency estimation power reductions of up to 4x are possible while simultaneously
decreasing convergence time by up to 4x. For interpolation, it is shown that proper parameter
selection can result in a 10x reduction in power consumption.
At the system level, two non standards-based communication systems are considered. PNII is
a 1.6 Mbps personal area network system for wireless intercom type applications over short
distances (10-30 m). The original system’s frequency and phase estimation blocks are redesigned
using the framework developed here. Simultaneous reductions of 66% in synchronization energy
consumption and 72% in convergence time are achieved. PN3 is a 50 Kbps system designed for
use in wireless sensor network applications. A 300uW synchronization system was designed for
PN3. This is low enough so that further reduction has very little impact on system energy
consumption.
2
Acknowledgements
I would like to thank my advisor, Jan Rabaey, for his grand vision and subtle guidance (except
when otherwise required). If I am half as successful in my career as he has been in his, I will be
fulfilled. I would also like to thank him, in conjunction with Bob Brodersen, for creating the
Berkeley Wireless Research Center. It has been a true gift to be able to earn my Ph.D. in such a
rich environment.
Thanks to Bora Nikoloic for setting such high standards in his EE225C class. My final project
in that class was the genesis of this work. Thanks also for being the much needed harsh critic and
for the constant encouragement to do good work.
To Heinrich Meyr for his encouragement and for teaching his seminar on Digital
Communication Receivers where I first got hooked on synchronization. Much of my work grows
out of the fundamentals in his two volumes on communication receivers. Vielen Dank.
Thanks to Paul Wright for adding a different perspective with flair.
Thanks to Tom Knight, my original advisor at MIT, for supporting my move to Berkeley and
for his constant wisdom, both technical and non-technical.
To my lab-mates: Mike Sheets, Ian O’Donnell, Dave Sobel, and Johan Vanderhaegen.
Thanks, Mike, for being such a good friend as well as constantly saving me from the tools.
Thanks, Ian, for your endless supply of interesting conversations, your sense of humor, and your
cynical opinions. Thanks, Dave, for your clarity of thought, for always reminding me to be
methodical, and for helping me locate those pesky factors-of-two I always seem to be missing.
i
Thanks, Johan, for being so incredibly smart and precise. When we no longer work together, I
will sorely miss the safety net your knowledge provides.
To the old Bob-and-Jan group core: Varghese George, Marlene Wan, Vandana Prahbu, and
Jeff Gilbert. George showed me the ropes of Berkeley and all the good coffee shops and Indian
restaurants in town. Marlene is a never ending source of fun times and the inspiration that it is
possible to “do it all”. Vandana never fails to make you smile and remind you to be carefree.
And, Jeff is the sage. Thanks also to all of you for showing me what life looks like on the other
side of graduation.
A special thanks to Rhett Davis for all his untiring work on the first chip.
To my housemates: Sunny, Carol, Marina, and Sandon. Thanks for letting me be a part of
your lives, and teaching me to squeeze every ounce out of every experience.
To Tony Gray and Olin Shivers. Two great men, whose advice I always recall when things
look bad.
To my family for all the years of love and support. Especially my sisters. I will always
remember finishing my thesis as the time when Erin stopped being my little sister and became my
friend. Marissa, for having the courage to be yourself. I am so proud of your accomplishments.
And Candy, for being the consummate enduring friend. My Mom deserves a special
acknowledgement. Her constant support and pride, even in my smallest accomplishments, has
motivated me throughout.
Finally, I would like to thank Misha. I do not have the words to describe to what extent this is
all not possible without you. ‘Here’s to being speechless and those who make us so.’
Bali, Indonesia
August 28, 2004
ii
Table of Contents
Acknowledgements ..................................................................................................................... i
Table of Contents ...................................................................................................................... iii
List of Figures .......................................................................................................................... vii
List of Tables.............................................................................................................................. x
1 Introduction ............................................................................................................................. 1
2 Background ............................................................................................................................. 7
2.1 Introduction ...................................................................................................................... 7
2.2 Synchronization................................................................................................................ 8
2.3 Metrics for Comparing Algorithms................................................................................ 13
2.4 Wireless Channel Models............................................................................................... 16
3 Evaluation and Exploration Environment ............................................................................. 22
3.1 Introduction .................................................................................................................... 22
3.2 Simulation and HDL Description of Algorithms ........................................................... 24
3.3 Gate-level Power Estimation.......................................................................................... 26
3.4 Analog to Digital Converter Power Estimation ............................................................. 32
3.5 System Power Estimation Tool ...................................................................................... 33
3.6 Conclusion...................................................................................................................... 34
4 PNII System .......................................................................................................................... 35
4.1 Introduction .................................................................................................................... 35
iii
4.2 System Details................................................................................................................ 36
4.3 Synchronization System................................................................................................. 37
4.3.1 Timing Recovery..................................................................................................... 38
4.3.2 Course Timing Estimation ...................................................................................... 39
4.3.3 Fine Timing and Frequency Estimation .................................................................. 40
4.3.4 Frequency Correction and Timing Tracking ........................................................... 41
4.3.5 Phase Estimation and Correction ............................................................................ 42
4.3.6 Synchronization System Performance..................................................................... 43
4.4 Results and Conclusion .................................................................................................. 44
5 Frequency Estimation............................................................................................................ 46
5.1 Introduction .................................................................................................................... 46
5.2 Frequency Estimation Algorithms ................................................................................. 47
5.3 Power Estimation Methodology..................................................................................... 52
5.4 Algorithm Comparison and Results ............................................................................... 53
5.5 Conclusion...................................................................................................................... 57
5.6 Postscript: Application to DSSS Systems ...................................................................... 57
6 PNII System Refinement....................................................................................................... 60
6.1 Introduction .................................................................................................................... 60
6.2 Frequency Estimation Refinement ................................................................................. 61
6.3 Frequency and Phase Estimation Redesign.................................................................... 63
6.3.1 Differential Modulation Penalty.............................................................................. 64
6.3.2 Phase Error vs. SNR Degradation ........................................................................... 65
6.3.3 Feed-Forward Phase Estimation.............................................................................. 69
6.3.4 Frequency and Phase Estimation Redesign............................................................. 70
6.3.5 System Results ........................................................................................................ 74
iv
7 Interpolation .......................................................................................................................... 77
7.1 Introduction .................................................................................................................... 77
7.2 Interpolation Background............................................................................................... 78
7.3 Farrow Interpolator Exploration..................................................................................... 81
7.4 Achieving the Timing Resolution Specification ............................................................ 86
7.5 Achieving the Output SNR Specification ...................................................................... 88
7.6 Conclusion...................................................................................................................... 90
7.7 Postscript: Interpolator Hardware Implementation Specifics......................................... 91
8 PN3 System Design............................................................................................................... 92
8.1 Simplification of Synchronization ................................................................................. 93
8.2 Analog vs. Digital Implementation ................................................................................ 96
8.3 Matched Filtering ........................................................................................................... 97
8.4 Amplitude Estimation .................................................................................................... 98
8.5 Timing Estimation.......................................................................................................... 99
8.6 Digital Synchronization Scheme Summary ................................................................. 102
8.7 Analog Synchronization Scheme Summary................................................................. 103
8.8 Comparison of Synchronization Schemes.................................................................... 104
8.9 Conclusion and Future Work ....................................................................................... 106
8.10 Postscript: Simulation Environment........................................................................... 107
9 Conclusion and Future Work .............................................................................................. 109
A Power Estimation Scripts ................................................................................................... 112
A.1 Makefile ...................................................................................................................... 112
A.2 Netlist Script................................................................................................................ 115
A.3 Reporting Script .......................................................................................................... 117
A.4 Testbench .................................................................................................................... 118
v
A.5 Simulate Script ............................................................................................................ 120
A.6 Synthesis Script ........................................................................................................... 121
vi
List of Figures
Figure 1-1: Area and power of digital synchronization functions as a portion of PHY layer. a,b)
Bluetooth c) PNII d) 802.11a .................................................................................................... 2
Figure 2-1: Illustration of a typical communication system............................................................ 8
Figure 2-2: Realistic communication system includes synchronization system ............................. 9
Figure 2-3: Illustration of timing error ............................................................................................ 9
Figure 2-4: Illustration of frequency error..................................................................................... 10
Figure 2-5: Feed-forward vs. feedback estimation........................................................................ 11
Figure 2-6: Synchronization algorithm classification. Highlighted blocks are those addressed in
this thesis.................................................................................................................................. 13
Figure 3-1: Flow diagram of tools used in this thesis.................................................................... 23
Figure 3-2: Example synchronization system in Simulink............................................................ 24
Figure 3-3: Estimation accuracy requirements.............................................................................. 28
Figure 3-4: Accurate gate-level power estimation flow ................................................................ 31
Figure 3-5: Proposed power estimation method comes within 15% of the EP method for a wide
range of block sizes. ................................................................................................................. 32
Figure 4-1: PNII system block diagram. ....................................................................................... 37
Figure 4-2: Flow diagram of the PNII synchronization system .................................................... 38
Figure 4-3: Coarse timing block diagram...................................................................................... 40
Figure 4-4: Joint frequency and fine timing estimation ................................................................ 40
vii
Figure 4-5: Power loss in correlator with frequency offset ........................................................... 41
Figure 5-1: Meyr and Kay weighted and unweighted performance. ............................................. 49
Figure 5-2: Meyr weighted D = {1, 2} performance..................................................................... 51
Figure 5-3: Block diagram of the weighted Kay estimator ........................................................... 51
Figure 5-4: Block diagram of the weighted Meyr estimator ......................................................... 52
Figure 5-5: Meyr weighted vs. unweighted comparison ............................................................... 55
Figure 5-6: Kay weighted vs. unweighted comparison ................................................................. 55
Figure 5-7: Meyr vs. Kay weighted comparison ........................................................................... 56
Figure 5-8: Meyr weighted D = 1 vs. D = 2 comparison............................................................... 56
Figure 5-9: Variance of frequency estimation applied to chips versus symbols for 802.11b-like
symbols .................................................................................................................................... 58
Figure 6-1: Convergence time of different frequency estimators.................................................. 62
Figure 6-2: BER of coherent and differential QPSK, BPSK......................................................... 65
Figure 6-3: QPSK BER with Gaussian and fixed phase errors ..................................................... 67
Figure 6-4: QPSK BER with Gaussian phase error....................................................................... 68
Figure 6-5: BER vs. SNR with uniform phase error in the range of [0..lim] ................................ 69
Figure 6-6: Phase estimation variance vs. L for different SNRs ................................................... 70
Figure 6-7: System power consumption for different schemes ..................................................... 76
Figure 7-1: Block diagram of the Farrow interpolator. ................................................................. 80
Figure 7-2: Tap error (dB) vs. (N, M) and WT ............................................................................... 84
Figure 7-3: Interpolation performance for timing resolution of 1/16 ............................................ 86
Figure 7-4: Interpolation performance for timing resolution of 1/64 ............................................ 87
Figure 7-5: Interpolation performance for timing resolution of 1/1024 ........................................ 87
Figure 7-6: Interpolator performance for Wµ = 2.......................................................................... 89
viii
Figure 8-1: Digital (a) and analog (b) synchronization header structure..................................... 102
Figure 8-2: Performance breakdown of the digital synchronization scheme .............................. 103
Figure 8-3: Energy-per-useful-bit vs. packet length of analog and digital schemes ................... 104
Figure 8-4: Energy savings of 0-bit and 9-bit headers vs. 18-bit headers ................................... 106
Figure 8-5: Digital algorithm high level simulation and digital synchronization block.............. 107
Figure 8-6: Simulation results of the digital synchronization system timing correlator ............. 108
ix
List of Tables
Table 2-1: SNR degradation due to carrier phase and timing errors for PSK and QAM modulation
.................................................................................................................................................. 16
Table 2-2: Average path loss parameters for an indoor office environment at 2 GHz [ITU] ....... 18
Table 2-3: R.m.s. delay spread for 2 GHz indoor office environment [ITU]................................ 19
Table 2-4: JTC indoor office environment channel models [JTC]................................................ 20
Table 4-1: Implementation losses in PNII synchronization and detection .................................... 43
Table 4-2: BBP statistics ............................................................................................................... 44
Table 4-3: Physical layer receiver power consumption................................................................. 44
Table 6-1: New and old frequency estimation methods ................................................................ 63
Table 6-2: Frequency/phase estimation methods to be considered ............................................... 72
Table 6-3: Comparison of different Frequency/Phase Estimation Schemes ................................. 74
Table 6-4: Parameters used in system exploration ........................................................................ 74
Table 7-1: MMSE coefficients for λ = 4, (N, M) = (2, 2).............................................................. 81
Table 7-2: Vo coefficients for λ = 4, (N, M) = (2, 2) .................................................................... 81
Table 8-1: Summary of synchronization requirements for the PN3 system.................................. 96
Table 8-2: Target synchronization implementation losses ............................................................ 97
x
Introduction
1
“Assuming perfect synchronization…”
- Arbitrary Communication Text
Synchronization is an increasingly important component in a wireless communication device.
Synchronization performance is critical to system performance and it is where a large amount of
design time and receiver area and power is spent. There are few places where the whole topic of
synchronization is covered. In fact, in most texts on digital communication, the topic of
synchronization is examined very briefly if at all. Further, very few sources examine the
implementation costs of synchronization, especially the power consumption. This research shows
that significant system power savings can be realized through systematic exploration of
synchronization power consumption.
Synchronization is a significant component of wireless communication devices. Figure 1-1
highlights the area and power attributed to digital synchronization functions in three commercial
radio chips ((a) [KOK], (b) [CHA], and (d) [THO]) and one academic radio from this work (c). It
is shown that synchronization can consume up to 45% of the physical layer area.
1
Figure 1-1: Area and power of digital synchronization functions as a portion of PHY layer.
a,b) Bluetooth c) PNII d) 802.11a
The synchronization system typically has the highest clock rates and duty cycles of all digital
blocks. This, coupled with the large area, indicates that power consumption of synchronization
blocks is a significant component of physical layer power. Indeed, it will be shown in Chapter 4,
despite efforts to reduce power, the synchronization system still consumed 18% of the physical
layer power.
Not only is synchronization important, but the relevance is increasing due to four factors:
1) Decreased transmit distances use lower transmit power and, therefore, receiver power
begins to dominate.
2) The wireless channel is more frequency selective at higher transmission speeds which
require increased synchronization functionality.
3) Trends toward higher bandwidth efficiency moves modulation to higher order
constellations where synchronization specifications are tighter.
2
4) The push for integration moves RF functionality to digital CMOS processes with low
supply voltages forcing the synchronization system to contend with more front-end
nonidealities.
There are few authoritative sources where the whole topic of synchronization is addressed as a
cohesive unit. The seminal volumes by Meyr [MEY] are a noted exception. Further, very little
work on synchronization systematically considers implementation issues (again, the Meyr
volumes are an exception). Often existing research stops at complexity bound approximations.
This is to be expected. Development of synchronization algorithms requires a deep
understanding of communication and estimation theory. It is rare that someone with these skills
also has a deep understanding of circuit implementation technologies.
Most important of all, nowhere to the author’s knowledge is the power consumption of
different algorithms systematically compared. However, power consumption is one of the most
critical factors in the design of un-tethered wireless devices. Most notably, in the emerging field
of wireless sensor networks, power consumption is the most important factor [RAB2].
The topic of synchronization power consumption is too large to be solved in one dissertation.
Rather, this dissertation sets up a framework for the systematic exploration of power consumption
in synchronization systems, applies this framework to a few representative problems, and uses
some system examples to show the impact of this type of exploration. The two wireless
communication systems considered here are non-standards based systems called PNII and PN3.
PNII is a 1.6 Mbps personal area network system designed to carry voice over short distances
(10-30 m) for wireless intercom type applications [AMM]. PN3 is a 50 Kbps system designed for
use in wireless sensor network applications [SHE].
While the main focus of this work is on power consumption, it is not the only significant
metric. Circuit area, convergence time, and component cost are also important. Indeed, these
metrics are not orthogonal; often they are intricately linked. Therefore, it would be simplistic to
consider power consumption in isolation from the other criteria. Certainly, the framework
3
developed in this thesis is applicable to these other metrics. Wherever power consumption is
considered in this work, the effect on other metrics is noted. Although sometimes gains in one
metric must be traded for losses in another; sometimes both can be simultaneously improved.
Specific contributions of this thesis are:
• Definition of a framework for the systematic exploration of power consumption
in synchronization systems.
• Development of a fast and accurate method for power estimation that is within
15% accurate of the best available method and over 50 times faster. This is an
enabling step in being able to systematically characterize synchronization power
consumption over a meaningfully sized parameter space.
• A systematic exploration of feed-forward data-aided frequency estimation
algorithms that resulted in the development of straight-forward rules for which
algorithm to choose for a given system specification. Simultaneous reductions in
energy consumption and convergence time of more than a factor of 4 are possible
in some scenarios.
• A systematic exploration of the Farrow-style interpolating filter which is critical
to the future systematic exploration of most timing recovery algorithms. Joint
optimization of interpolation and ADC power consumption illustrate the ability
of this framework to lower system power consumption, not just block power
consumption.
• Application of the frequency estimation exploration results to reduce the energy
consumption of the frequency estimation unit of the PNII system by 84% and the
convergence time by 50%.
• Systematic comparison of 4 different phase and frequency synchronization
methods for the PNII system including considering differential versus coherent
4
modulation schemes. Synchronization energy consumption was reduced by 66%
resulting in a system energy consumption reduction of 7% for coherent schemes,
and it was determined at what packet lengths it makes sense to move to
differential modulation. This is the first instance, to the author’s knowledge, in
which differential versus coherent modulation was systematically evaluated from
a system power consumption standpoint.
• Using the framework developed here, a synchronization system was designed for
a wireless sensor network application that consumes 300uW (including ADC
power). This is low enough so that further reduction has very little impact on
total system energy consumption.
While full characterization of the synchronization space is not completed in this work,
following it through to completion is a worthwhile goal. Results of this work show that this type
of exploration has meaningful impact on system performance. Completion of this research will
have a few fundamental ramifications. First, it will instruct proper algorithm selection for given
synchronization parameters. Second, it will illustrate which synchronization parameters are the
most difficult to estimate in terms of power consumption or convergence time. Third, it will
highlight areas where existing algorithms are inefficient. These answers most likely change in
different channel environments and over different modulation schemes and data rates. This
information can highlight promising areas for new algorithm development. It can assist in
producing the most efficient implementation for existing wireless communication standards. And
finally, it can assist in the creation of new wireless communication standards to meet the quality
of service goals with the lowest power or lowest synchronization overhead.
The remainder of this dissertation is as follows: Chapter 2 details the background information
necessary to understand this work including defining a classification of synchronization
algorithms, and the metrics on which synchronization algorithms are evaluated. Chapter 3
describes the tools used for system simulation and analysis, implementation, and power
5
consumption estimation. Chapter 4 describes the PNII system, shown in Figure 1-1c to motivate
the necessity of this research and to provide a system example for illustrating the improvements
possible with this research. Chapters 5 and 7 delve into systematic exploration of the power
consumption of specific classes of synchronization algorithms (frequency estimation, and timing
recovery respectively). These serve as examples for how algorithm exploration should be
conducted, and what information is necessary for these results to be used in a system design.
Chapters 6 and 8 move back up the system level to apply the techniques developed here to show
the significance of the results at the system level. First, in Chapter 6, the results of Chapter 5 are
used to improve the PNII system described in Chapter 4. Second, in Chapter 8, the framework is
applied to a wireless sensor network system (where power is the primary concern) to reduce the
power consumption of the synchronization system and show the impact on the system power
consumption.
6
Background
2
2.1 Introduction
This chapter details all the background information required to understand this thesis. For
some topics the reader is referred to canonical references. Topics are more fully described here
when canonical sources don’t exist, the information is used in a unique way for this work, or the
information is deemed too fundamental to this work to be omitted.
It is assumed that the reader is familiar with basic digital communication theory to the extent
described in the text by Proakis [PRO]. In particular, familiarity with the standard modulation
schemes such as OOK, M-PSK, M-QAM, and DSSS are required. The concept of theoretical
bounds on BER versus SNR for different modulation schemes is assumed known; however
specific bounds are reiterated when used. The reader is expected to be familiar with the use of
transmit filters such as the root-raised-cosine (RRC). Basic channel concepts such as multi-path,
frequency selective vs. frequency flat fading and the basic techniques used to combat these
effects such as AGC and equalizers are assumed. Familiarity with the basic network protocol
stack (especially physical, data-link, network, and application layers) including basics of media
access control (MAC) is also helpful [ISO].
7
It is assumed that the reader is familiar with basic low power digital design principles to the
extent described in [RAB]. While no esoteric low power circuit techniques are used in this thesis,
these techniques can be applied orthogonally to these algorithms for further power reduction. It is
assumed that designers make use of the standard low power techniques available as built-in
functionality to industry standard tools, specifically, parallelization, optimizing out fixed
parameters from logic, gated clocks, low-leakage standard cell libraries, and using the lowest
supply voltage required for correct circuit operation.
The remainder of this chapter sets out to describe three other pieces of background
information. First, synchronization is described within the context used in this thesis. Second,
the metrics for comparing different synchronization algorithms are discussed. Last, the indoor
wireless channel model used for many examples throughout the thesis is defined.
2.2 Synchronization
A canonical communication system (Figure 2-1) typically considers the source and channel
coders (classified as outer receiver functionality), and some channel that perturbs symbols.
However, a realistic communication system (Figure 2-2) also considers what is called the inner
receiver consisting of the modulator, a waveform channel (one that perturbs transmitted
waveforms, not the more simplistic one that just perturbs symbols), and the synchronization
system in the receiver.
Figure 2-1: Illustration of a typical communication system
8
Figure 2-2: Realistic communication system includes synchronization system
The four salient synchronization parameters are timing (θε), phase (θφ), frequency (θΩ), and
amplitude (θΑ) (some of which may include multipath effects). Timing errors occur because of
the small mismatches in the transmitter and receiver oscillators and from the unknown time of
flight between transmitter and receiver (Figure 2-3).
Figure 2-3: Illustration of timing error
Phase errors occur because of mismatches in the transmitter and receiver carrier references
and from the unknown time of flight between the transmitter and receiver. In multipath channels,
each multipath arrival has a different time of flight, and therefore a different phase error to be
estimated.
Amplitude errors arise mostly because of attenuation in the channel, but also are contributed to
by mismatches in the transmitter and receiver front-end gain stages. As with phase errors, in
multipath channels, each multipath arrival takes a different path through the channel and therefore
has a different attenuation.
Frequency errors, more correctly termed carrier frequency errors, are cause by a frequency
mismatch in the transmitter and receiver carrier references (Figure 2-4). Frequency errors show
9
up as a rotating phase error in the received signal. While, it is possible to lump frequency errors
in with phase errors, most systems correct for frequency separately from phase, and therefore, it is
classified as a separate synchronization parameter.
Figure 2-4: Illustration of frequency error
With all four parameters, there is a notion of the rate of change being either slowly-varying or
static. What is important is whether the parameter varies enough to matter over the observation
interval. If not, it can be treated as static for the purposes of synchronization. While static
parameters can be estimated once and that estimate used for the interval over which the parameter
is deemed to be static, varying parameters need to be either continually re-estimated or
continuously tracked. Of course, re-estimating or tracking parameters costs more power and area
(and potentially more synchronization preamble bits) than estimating static parameters once.
Sometimes system design can be used to reduce the number of varying parameters, and therefore
the power consumption of the synchronization system. One instance of this is using clock
references with tighter specifications so the variation over, say, one packet is negligible. Here is
where the channel model (including the variation of clock references, and front-end components)
is critical for specifying the required functionality of the synchronization system.
Estimation algorithms can be classified according to their type along two axes: the
configuration of the estimator and parameter adjustment blocks, and what additional information
is used to achieve the estimation. There are two configurations for the estimation and parameter
adjustment blocks: feed-forward (FF) and feed-back (FB) (Figure 2-5). In FF systems, the
estimator receives the input signal and computes the parameter estimate which is fed to the
10
parameter adjustment block. In FB systems, the estimator receives the output of the parameter
adjustment bloc and computes an error which is fed back to the parameter adjustment block.
Figure 2-5: Feed-forward vs. feedback estimation
There are three categories of what additional information is used to achieve the estimation:
non-data-aided (NDA), data-aided (DA), and decision-directed (DD). When no additional
information other than the input signal is used, the estimation is termed non-data-aided. When
known data symbols are sent (such as within a synchronization header, or pilot symbols
interspersed with the data), and these known data symbols are used to help the estimation, it is
called data-aided estimation. When no known data is sent, but detected symbols are used in the
place of known-data symbols, the estimation is called decision-directed. Non-data-aided and
data-aided estimation can be performed in a feedback or feed-forward configuration, however,
since detected symbols can only be know after parameter adjustment has been made, decision-
directed estimation can only be performed in a feedback configuration.
All together, there are 20 different algorithm classifications (4 parameters x 5 estimation
types). Each classification can contain tens of algorithms that have been proposed in the
literature in addition to any new algorithms that are developed in the future. This thesis addresses
8 of these classifications in varying degrees (Figure 2-6). First, Chapter 5 performs a complete
exploration of 4 different feed-forward data-aided frequency estimation algorithms. The results
of this exploration are twofold. First, it is determined which among these four algorithms
achieves the lowest power for a given input SNR and variance requirement. Second, absolute
11
numbers for power consumption and convergence time are determined which allow these
algorithms to be evaluated in a system-level framework. Chapter 5 serves as a model for how
these comparisons should be conducted and the results that are needed to allow a system level
designer to make use of the information.
The component exploration in Chapter 5 is continued in Chapter 7. A major component of
most timing recovery algorithms of any type is a timing interpolator to perform the parameter
adjustment. Therefore, a study of timing recovery algorithms relies on accurate power
consumption estimates of interpolators of various sizes and performance. Chapter 7 performs a
thorough study of the commonly used Farrow type of interpolator over a wide range of
parameters. The results of this work can be used to conduct the study of timing recovery
algorithms of all types.
The other three chapters explore entire synchronization systems rather than just a single block.
Within these chapters, several types of synchronization algorithms are used. In Chapter 4, timing
is performed in two steps. The coarse estimation is done with a feed-forward data-aided
algorithm. The fine timing estimation is done jointly with the frequency estimation and uses a
different feed-forward data-aided algorithm. Timing tracking is done with a non-data-aided feed-
forward algorithm. Phase acquisition is performed using a data-aided feed-back algorithm, and
phase tracking is done with a feed-back decision-directed algorithm. In Chapter 6, different
frequency and phase estimation methods are compared. In addition to the method used in
Chapter 4, a feed-forward non-data-aided phase estimation method is explored for initial
estimation and tracking. In Chapter 8 a feed-forward data-aided timing recovery method is
compared to a feed-back data-aided method. In addition, a feed-forward non-data-aided
algorithm is used for amplitude estimation.
12
Figure 2-6: Synchronization algorithm classification. Highlighted blocks are those
addressed in this thesis.
2.3 Metrics for Comparing Algorithms
Systems in this thesis compared on a cost vs. performance basis. For synchronization
algorithms, cost is a multi-faceted metric. Three interrelated components usually are considered:
power consumption, area, and component cost. Area and component cost are usually inextricably
tied because each square millimeter of silicon area costs more money. However, area also
determines how small the package and potentially how small the ultimate system. Component
cost also includes the cost of external components, such as off-chip filters and crystal oscillators
(whose cost scales with required accuracy). Power consumption affects size and component cost
through the size of the battery, or in cooling mechanisms to dissipate the generated heat. Power
consumption also affects quality of service, in that the device may need the batteries recharged
more often.
Variance and convergence time are the main metrics used to measure the performance of a
synchronization algorithm. Specifically, the variance measured is that of the parameter estimate
produced (assuming the estimation is unbiased). If the estimation is biased, MMSE may be a
more appropriate metric. Convergence time is the number of symbols required to achieve that
variance. Bounds, called the Cramer-Rao bounds (CRB), are available to determine what
13
variance is theoretically possible for different synchronization parameters given the input SNR
and convergence time. The actual Cramer-Rao bounds, especially for timing estimation, depend
on the actual received waveform, so are dependent on modulation rate among other things, and
can be difficult to calculate exactly. Approximations are available, called modified Cramer-Rao
bounds (MCRB), given in [DAN] for phase and frequency.
1
MCRB (φ ) = (2-1)
2 N ( Es N 0 )
6
MCRB(Ω) =
( )
N N − 1 ( Es N 0 )
2
(2-2)
where N is the estimation length or convergence time, and Es

N0 is the signal to noise ratio for
symbols.
Tighter bounds are given in [TAV] for M-PSK signals, but are more difficult to compute.
There are algorithms for phase and frequency estimation that are known to achieve the CRB at
high SNR. The CRB for timing is given in Meyr [MEY] under some realistic simplifying
assumptions: 1) independent noise samples, 2) signal pulse shape, g(t), is real, and 3) random
data.
+∞
∫−∞ G(ω ) dω
2
1
CRB (ε ) = (2-3)
2 N ( Es N 0 ) T 2 + ∞ω 2 G (ω ) 2 dω
∫
−∞
Of course, the SNR gives a direct measure of the amplitude variance for one symbol. Therefore,
CRB ( A) = 1
N ( Es N 0 )
.
Next, the block-level metrics of variance and convergence time are translated to system-level
metrics. Convergence time is the easiest, since the convergence time for all synchronization
blocks can be summed to get the total convergence time (assuming no synchronization blocks
operate in parallel). However, translating the different variances for each synchronization
parameter into a global system specification is more difficult.
14
The official goal of the inner receiver system, as defined by Meyr [MEY], is to produce output
Y such that the outer receiver performance is as close as possible to the case where the estimated
values are equal to the actual values, i.e.
{θˆε ,θˆφ , θˆA ,θˆΩ } = {θε , θφ ,θ A ,θ Ω } . (2-4)
This combined effect can not be evaluated until the entire system is designed and simulated
together because it includes interactions between the synchronization parameters and the coding
used in the outer receiver. For this reason, it is impossible to partition separately amongst the
different synchronization blocks. Instead, the SNR margin metric is used in practice. Typically,
a communication system will specify a data rate and uncoded BER requirement. The input SNR
to the inner receiver will contain some margin over the theoretical SNR required to achieve this
BER. This SNR margin is typically how synchronization systems are specified and evaluated.
The total SNR margin is usually divided amongst the synchronization blocks using designer
experience to get an initial partitioning, and iterating once preliminary design of the different
synchronization blocks is completed. This ad-hoc process is not guaranteed to achieve the
optimal system design, but it is the method available using current information. The results of a
complete exploration of the synchronization space would allow this process to be deterministic
and achieve the optimal design. However, this process is currently prohibitive for any practical
system.
Formulas that compute the SNR degradation versus the variance of different synchronization
algorithms are used to convert between the metric of the algorithm: variance; and the metric for
the system: SNR loss. The BER degradation due to amplitude is easy to calculate since it can be
directly tied to SNR. The BER degradation for timing and phase errors is more difficult and is
treated in [MEY] for M-PSK, M-PAM, and M2-QAM modulation. [MEY] gives approximations
to the degradation, D (measured in dB) defined as the increase in Es/N0 required to maintain the
same BER as the receiver without synchronization errors. The approximation,
15
10
D= (A + 2B ⋅ Es N0 ) var[ψ ] (dB) . (2-5)
ln 10
is officially valid for BER degradations < 0.2 dB (but is pretty accurate in most scenarios for D<1
dB). Table 2-1 gives the parameters A and B for degradation due to carrier phase and timing
errors for M-PSK, M-PAM, and M2-QAM.
Table 2-1: SNR degradation due to carrier phase and timing errors for PSK and QAM
modulation
Carrier Phase Errors Timing Errors

A B A B
M-PSK 1 ( M)
cos 2 π − h′′(0)T 2 ∑ (h′(mT )T )
m
2
M =2
∑ (h′(mT )T )
2
1
2 m
M >2
M- 1 0 − h′′(0)T 2 ∑ (h′(mT )T )
2
PAM m
M2 - 1 ½ − h′′(0)T 2 ∑ (h′(mT )T )
1 2
2
QAM m
The quantity, A, accounts for a reduction in the useful signal. The quantity, B, accounts for an
increase in the variance at the input of the decision device. Observe that the degradation due to
timing errors is dependent on the transmit pulse shape, h(t).
2.4 Wireless Channel Models
The simplest channel is the additive white Gaussian noise (AWGN) channel where noise with
a Gaussian distribution of zero-mean and variance σ2 is added to symbols in the channel. This
channel is often used when exploring outer receiver functionality. To explore inner receiver
functionality, a more complicated channel must be considered. This channel model must include
the effects of the transmitter and receiver front-ends in addition to the effects of the channel (over
the air).
Effects of the transmitter and receiver local oscillators and carrier references can be modeled
in a straight-forward manner using just one offset that is the sum of the errors in both the
16
transmitter and receiver. To model timing offset in simulation, an interpolation filter can be used.
To model carrier frequency offset, the modulated waveform is multiplied by a rotating phasor.
The clock accuracy (specified in parts per million or ppm) is an important parameter because
it determines how often the timing needs to be re-estimated. If the required timing estimation
resolution is εT, where T is the symbol period, clocks can drift up to ½ εT over the course of the
packet before needing to be re-estimated. If the crystal accuracy ppm is lower than 1e6*ε/2N,
then no timing tracking is needed. Where N is the maximum number of symbols in a packet, and
ε is the fractional timing resolution requirement. A similar equation can be used for determining
whether the frequency estimation is essentially static over the course of the packet.
Amplitude and phase models are more complicated since they depend on a combination of
factors in the wireless environment. This work uses the approach given in [ITU], reproducing
here the general channel modeling equations. However, in the interest of brevity, only the actual
coefficients for a 2GHz indoor office channel are given because that is the one used in this thesis
wherever a channel model is required.
Path loss effects are divided into two effects: average path loss, and associated shadow fading
statistics. Average path loss is that loss that is common to all multipath arrivals and is given by
Ltotal = 20 log10 f + P log10 d + L f (n) − 28 dB (2-6)
where P is the distance power loss coefficient, f is the frequency (in MHz), d is the separation
distance in meters between the two terminals (d > 1 m), Lf is the floor penetration loss factor in
decibels and n is the number of floors in a multi-story building between the two terminals (only
included when n ≥ 1 ). Table 2-2 outlines the parameter values used for the indoor office
environment at 2GHz.
17
Table 2-2: Average path loss parameters for an indoor office environment at 2 GHz [ITU]
Parameter Value
P 30
f 2,000 MHz
d 1-100 m
Lf 15+4(n-1)
Paths with a line of sight (LOS) component are dominated by free-space loss and have P=20.
The indoor shadow fading statistics are log-normal with a standard deviation of 10dB for our
channel.
The radio propagation channel varies in time and with spatial displacement. Even in the static
case where the transmitter and receiver locations are fixed, the channel can be dynamic since
scatters and reflectors are likely to be in motion. The term multipath arises from the fact that,
through reflection, diffraction, and scattering, radio waves can travel from a transmitter to a
receiver by multiple paths. There is a time delay associated with each of these paths that is
proportional to the path length. Each delayed signal has an associated amplitude (with real and
imaginary parts) and together they form a linear filter with time-varying characteristics. Since the
radio channel is linear; it is fully described by its impulse response. The impulse response is
usually represented as a power density that is a function of excess delay, relative to the first
detectable signal.
Although the r.m.s. delay spread is very widely used, it is not always a sufficient
characterization of the delay profile. However, if an exponentially decaying profile can be
assumed, it is sufficient to express the r.m.s. delay spread instead of the power delay profile. In
this case, the impulse response can be reconstructed approximately as:
⎧e − t /τ r . m . s . for 0 ≤ t ≤ t max
h(t ) = ⎨ (2-7)
⎩ 0 otherwise
18
where τ r .m.s . is the r.m.s delay spread and tmax is the maximum delay ( t max >> τ r .m.s . ). Table 2-3
outlines the r.m.s. delay spreads used for the example channel. Within a given building, the delay
spread tends to increase as the distance between antennas increases.
Table 2-3: R.m.s. delay spread for 2 GHz indoor office environment [ITU]
Low value Median value High value

appearing frequently appearing frequently appearing rarely
τ r . m. s . 35 ns 100 ns 460 ns
One way to model the statistical nature of the channel is to replace the many scattered paths
that may exist in a real channel with only a few N multipath components in the model. With this
method, a complex Gaussian time variant process gn(t) models the superposition of unresolved
multipath components arriving from different angles with different delays close to the delay τn of
the n-th multipath component. Then, the impulse response h(t) is given by:
h(t ) = ∑n =1 pn g n (t )δ (t − τ n ) ,
N
(2-8)
where pn is the received power of the n-th model multipath component.
The JTC channel models [JTC] give three different instantiations of the channel for
simulations of indoor office environments. Channel A is the least severe, Channel B is
intermediate, and Channel C is extremely severe. The coefficients for the model are given in
Table 2-4. Note the indoor channel models use a flat Doppler spectrum, whereas models for an
outdoor channel usually use the Jakes Doppler Spectrum [DEN] to determine the correlation in
channel coefficients over time.
19
Table 2-4: JTC indoor office environment channel models [JTC]
Channel A Channel B Channel C

Relative Average Relative Average Relative Average Doppler
Delay Power Delay Power Delay Power Spectrum
Tap (ns) (dB) (ns) (dB) (ns) (dB)
1 0 0 0 0 0 0 Flat
2 50 -3.6 50 -1.6 100 -0.9 Flat
3 100 -7.2 150 -4.7 150 -1.4 Flat
4 325 -10.1 500 -2.6 Flat
5 550 -17.1 550 -5.0 Flat
6 700 -21.7 1125 -1.2 Flat
7 1650 -10.0 Flat
8 2375 -21.7 Flat
τ r . m. s . 43 116 598
The Doppler spectrum (whether Jakes or flat) is defined by the maximum Doppler frequency
shift in the channel given by:
f D max = v ⋅ f c / c (2-9)
where fc is the carrier frequency, c is the speed of light, and v is the maximum speed of objects in
the channel (whether the transmitter, receiver, or scattering or reflecting elements in the channel).
For the 2 GHz indoor channel, 10 Hz is a common value for fDmax (translating to a speed of
around 6km/hr).
The maximum Doppler frequency is an important parameter because it dictates how quickly
the channel is changing and therefore whether the phase and amplitude synchronization
parameters are static or slowly varying. Specifically, 1 / f D max is the coherence time of the
channel, or the time at which channel estimates become uncorrelated with each other. Therefore,
if the estimate made at the start of the packet is to be, say, x% correlated with the last symbol in
the packet, the packet length (tpacket) must be shorter than
t packet < (1 − x%) / f D max . (2-10)
The r.m.s. delay spread is also an important parameter because it determines whether the
channel is frequency selective or frequency non-selective (flat). A synchronization system for a
20
frequency selective channel must combat multipath effects (for instance with the use of an
equalizer or RAKE receiver), but no such complexity is required for the flat channel.
Specifically, 1 / τ r .m.s is the coherence bandwidth of the channel, or the frequency difference over
which channel estimates become uncorrelated with each other. Therefore, if the bandwidth of
the signal is less than 10% of the coherence bandwidth, we say the channel is flat and multipath
effects need not be considered. However, if the signal bandwidth is greater than 10% of the
coherence bandwidth, the channel is frequency-selective, and multipath effects must be taken into
account:
⎧< 0.1(1 / τ r .m.s. ) flat

BW ⎨ . (2-11)
⎩> 0.1(1 / τ r .m.s. ) frequency selective
21
Evaluation and Exploration Environment
3
3.1 Introduction
Simulation and implementation tools are an important component of this research. First, a
rich simulation environment for communication algorithms is required. Second, the ability to
move quickly from simulated algorithm to implementation is also desired. Lastly, two levels of
power estimation are needed. The first is to accurately estimate the absolute power consumption
of an algorithm. The second is to compare different synchronization systems in framework that
considers total system power consumption. A flow diagram of the different tools used in this
research is show in Figure 3-1. To compare two algorithms, only relatively accurate estimations
are required. However, an absolutely accurate power estimation, though more difficult to
achieve, is necessary to be able to use in the system framework where power consumption of
other components is included.
22
Figure 3-1: Flow diagram of tools used in this thesis
Packet-based communication systems often require re-synchronization with every packet.
Especially in ad-hoc networks where transmitters communicate with different receivers at
different times, the synchronization parameters are different every time and therefore can not be
stored between packets. In this case, the synchronization convergence time can be a significant
portion of the packet length. The energy expended in the synchronization along with the energy
transmitting and receiving the synchronization header must be calculated into a system-level
metric. Different synchronization algorithms may take different amounts of time to converge to
the required accuracy. In this case, the algorithms must be compared in a system framework.
Higher power algorithms with shorter convergence time may be favored over lower power
algorithms with longer convergence times. In order for the designer to make the appropriate
trade-off, the power estimates must be absolutely accurate, and the power consumption of other
subsystems, such as the transmitter and receiver front-end power, must be known.
23
3.2 Simulation and HDL Description of Algorithms
Synchronization algorithm implementation costs (area and power) are often dominated by
datapath operations such as multipliers and adders, with relatively simple control requirements.
Mathworks Simulink [MAT] was chosen for algorithm simulation. It is a graphical data flow tool
with many provided library functions which make it easy to simulate and analyze communication
systems. An additional program, Stateflow [MAT], is integrated into Simulink to allow graphical
entry of state machines for programming control functions. An example of a Simulink
synchronization system simulation is shown in Figure 3-2.
Figure 3-2: Example synchronization system in Simulink
For hardware coding, Synopsys Module Compiler [SYN] was chosen as the entry point for the
datapath portions of the algorithms. Its high-level HDL language allows an algorithm to be
parameterized and later synthesized in different configurations. (For instance, it’s possible to
synthesize a frequency estimation algorithm for different input SNRs and estimation lengths.) It
is built to optimize datapath operations with features such as allowing adder implementations to
24
be easily customized between carry-save and ripple-carry. It is known to achieve better area for
datapath blocks than standard HDL synthesizers [HAI].
The use of Module Compiler enables re-use of many smaller modules within larger designs.
Some built-in functions in module compiler have facilitated easy implementation of
communication algorithms in this thesis:
• Various adder types (carry-save, ripple-carry, etc.)
• Various multiplier types (booth, signed/unsigned, +/- A*B, A*(B+C) where C is one
bit, pipelined/unpipelined, etc.)
• Square (special multiplier for two identical inputs)
• Scalar Multiply ACcumulate (MAC)
• Comparators/Muxes/Selectors
• Shift registers
A small library of the following parameterized blocks built on the basic blocks has served to
implement most blocks in this thesis:
• Filters (fixed and adaptive coefficients are automatically detected by Module
Compiler)
• CORDIC (a single parameterized CORDIC slice is arrayed in several configurations
to implement iterative/pipelined angle-finder/rotator blocks)
• Complex MAC
By creating a Simulink library of corresponding parameterized blocks, larger designs can be
implemented and simulated in Simulink with good assurance that they can be quickly translated
to the equivalent behavior in hardware. Verification test-benches ensure that the Simulink and
hardware versions are equivalent through simulation.
For control flow, an automated tool, called SF2VHD [CAM], automatically converts Matlab
Stateflow diagrams into VHDL for synthesis. Since the control is usually a small part of the
25
synchronization algorithm, no effort was spent optimizing these state machine implementations
beyond compilation in standard synthesis tools.
For power comparison, each algorithm is coded as a parameterized module in Module
Compiler. Each module is synthesized as a gate-level VHDL netlist in Module Compiler for a
range of parameters, such as input SNR and estimation length. Realistic input vectors for each
block are synthesized in MATLAB by simulating the block inside a realistic system and capturing
the inputs. Each synthesized VHDL netlist from Module Compiler is sent through the gate-level
power estimation tool using the input vectors from MATLAB.
For the examples in this work, power estimation is done assuming a 0.13um technology. In
the component exploration sections of this thesis (Chapter 5 and 6), the impact of changing
technology on the presented results are discussed. In all cases, the highly automated flow allows
automatic re-characterization in a new process once new libraries are available.
3.3 Gate-level Power Estimation
The most accurate power estimation method available with current tools is to extract parasitics
from a post-placed-and-routed design and simulate in a switch-level simulator like Power-Mill or
Nano-Sim (called Extracted Physical or EP estimation method). Our own experience and reports
from our foundry show this method of power estimation to be within 15% accurate compared to
power consumption of actual chips. However, placing and routing a design can take considerable
time, and switch-level simulation is very slow. It can take up to two days to complete the
placement, routing, extraction, and simulation of a moderately sized block with today’s
computers and tools. Since this research relies on the accurate power estimation of several
algorithms across many different parameter sets (for instance over 100 frequency estimation
blocks), this research would be impossible with power estimation this slow. Therefore, a faster
power estimation method was required. The method should automatically characterize the same
algorithm over a set of parameters, and make as much use as possible of existing tools. In this
26
way, this power estimation flow benefits from the constant improvements made in the existing
tools.
Faster methods of power consumption than the EP method are available, but typically incur
errors in proportion to their estimation speed. Therefore, to get the fastest estimation feasible for
this research, it is necessary to examine the required power estimation accuracy. To reach the
correct conclusion when comparing two items, the accuracy of the estimate must be better than
the difference between the two items being compared. As stated in Chapter 1 synchronization
systems can consume around 15% of the physical layer power. In order to make an impact on
system power consumption (say greater than 5%), synchronization power consumption has to
improve by at least 30%. Estimation accuracy should be on the order of (or better than) this
desired improvement. Figure 3-3 shows an estimation accuracy of 30% and the desired
improvement of the original versus the revised system of 30%. In order to guarantee that the
actual revised system is at least 30% better than the original system, the estimates have to show
an improvement of almost a factor of two (y=50%). Since test chips are not available for
comparison, the proposed power estimation method will be compared to the EP estimation
method. Therefore, a method which is within 15% accurate of the EP method is required.
27
Figure 3-3: Estimation accuracy requirements
The fastest, but least accurate power estimation methods are statistical gate-level methods
(called PG for probabilistic gate-level). Here, the gate-level netlist is analyzed assuming
statistical activity factors on the inputs, which are propagated throughout the design to produce a
statistical activity factor for each net. Statistical activity factors are multiplied by statistical wire
load models, and statistical switching probabilities of the gates to produce a power estimate.
Because communication data is often highly correlated, these statistical methods, which assume
randomness, are not accurate enough for our purposes. For instance, in an illustrative experiment
a complex MAC with 8-bit inputs and 23-bit outputs, consumes 22 uW with random inputs, but
only 10uW with realistic inputs as would occur in a frequency estimator of a communication
system.
To capture the power savings from correlations in the data stream, the design must be
simulated to determine the actual activity factors on each net and within each gate. Gate level
simulation is around 50 times faster than switch level simulation not including time to place and
route, and requires fewer tools. Existing synthesis tools, such as Synopsys Design/Power
28
Compiler have the built-in capability to use gate-level simulation information to produce a power
estimate. However, typical gate-level power estimation with simulation (called SG for simulated
gate-level) is still not accurate enough because there are some critical components missing from
all gate-level estimation methods.
A typical flow for taking a gate-level netlist to a placed-and-routed netlist is:
• Place the gates in standard cell rows
• Insert a clock tree and route the clock net
• Insert hold time buffers to eliminate race conditions between register stages
• Route the signal nets
In comparison to the EP method, the SG power estimation is missing 3 pieces of information
which make the estimations less accurate. First and most important is the clock tree, which often
accounts for 30-50% of the block power. Second, the power of the hold time buffers can be
significant especially where there is little combinational logic between registers (such as in
communication systems components like filters and delay chains. Third, the exact wire loads are
not known until the design is placed and routed.
An accurate gate-level power estimation method (called AG for accurate gate-level) has to
address these three issues. The easiest issue to address is the wire loads. Although the exact
length of each wire is unknown before placement and routing, current tools do a good job of
estimating the average load of a wire in the system. These estimates are based on the technology
and the number of gates in the block. Since placement tools don’t use information about the
activity factor on the nets, they are just as likely to force long routes on high-activity wires as
low-activity wires. Therefore, the statistical wire load models are used. The second issue to
address is the hold time buffers. Hold time buffers are averted if there is enough combinational
circuit delay or wire delay between registers. Placing hold time buffers are placed assuming
statistical wire load models. Insertion of hold-time buffers is achieved in Synopsys Design
29
Compiler with a built-in function that fixes hold times on specified nodes. The last issue to
address is the clock tree insertion. It turns out that the exact clock tree is not necessary for power
estimation purposes. It is possible to force the tools to insert a “good enough” clock tree into the
gate level netlist. This is achieved by tagging the clock as a high-fanout node in Synopsys Design
Compiler. By placing constraints on the rise and fall times of the clock net, the tool inserts a
“good enough” clock tree into the design. By addressing these three issues, gate-level power
estimation accuracy can be within 15% of the EP method as will be shown below. Of course, the
accuracy of the estimation relies on the accuracy of the standard cell library characterization. To
achieve these results, no extra characterization was required. The foundry-supplied libraries were
characterized well enough for to meet the power estimation accuracy goals.
As process technology scales, leakage power is becoming a significant source of power
consumption both when blocks are in use and when they are in standby mode. Because leakage
power can be significant, it is included in the power consumption estimates produced by the AG
method. In standby mode, aggressive low power designs have block-level gated clocks and
power rails. By gating both the clocks and power rails, standby power is reduced to near zero and
need not be considered in the system power framework.
The new AG estimation flow is shown in Figure 3-4. Each VHDL netlist is incrementally
compiled in Synopsys Design Compiler to insert a clock tree and to add buffer delays to fix hold
time violations. The block is then simulated at the gate level in ModelSim using realistic input
vectors to verify functionality and to determine the switching activity on each node. Synopsys
Power Compiler is used to estimate the power consumption of the block using the back annotated
switching activity and statistical wire load models.
30
Figure 3-4: Accurate gate-level power estimation flow
Five frequency estimation blocks with a wide range of parameters were compared using the
AG method versus the EP method. The results are shown in Figure 3-5 along with the SG power
estimation method. Over a wide range of block sizes, the AG estimation is within 15% of the EP
estimation (see error bars) however, the SG method had errors of 30-50%. The Makefile and
scripts for running the AG power estimation for a range of frequency estimation blocks are given
in Appendix A.
31
EP vs. AG Power Estimation Method Group 1
160.00 (EP)
140.00
Group 2
(EP)
120.00
100.00 Group 1
Power (uW)
(AG)
80.00
60.00
Group 2
(AG)
40.00
Group 1
20.00 (SG)
0.00
Group 2
1 2 3
(SG)
L
Figure 3-5: Proposed power estimation method comes within 15% of the EP method for a
wide range of block sizes.
The AG power estimation method is over 50 times faster than EP method (not including the
time required to place-and-route the block and thereby extract accurate parasitics). The total time
to characterize 21 different chosen instantiations of a frequency estimation algorithm is less than
3 hours using the AG method. Execution time will vary with the size of the block, the duration of
the simulation interval, and different server processor and memory configurations.
3.4 Analog to Digital Converter Power Estimation
The analog to digital converter (ADC) is often a significant power-consuming component of a
communication system. Because different synchronization systems place different requirements
on the ADC, a method to estimate the power consumption of ADCs with different specifications
is required. In a survey of over 100 ADCs published in the literature from 1978 to 1999 [WAL],
a simple but accurate architecture-independent figure of merit (FOM) is determined for
comparing them:
32
2 SNRbits ⋅ f samp
FOM = . (3-1)
Pdiss
Here fsamp is the sampling rate, Pdiss is the power dissipation, and SNRbits is the equivalent
number of bits given by:
SNRbits = ( SNR (dB) − 1.76) / 6.02 . (3-2)
FOMs from the surveyed ADCs range between 1x1010 and 1.2x1012 with a mean around
1x1011. Given that the designer does the best design possible with the given process technology,
the FOM is dependent on how extreme are the given ADC specs relative to the fundamental
process capabilities. For instance, an fsamp that is closer to the maximum frequency of a process is
likely to achieve a lower FOM than one that has a much lower fsamp. Therefore, to predict the
power consumption of an ADC with arbitrary specifications, one needs to find an appropriate
FOM. This can be achieved by finding a similar ADC in the literature and using the same FOM,
or by extrapolating an FOM by determining how extreme the required specs are relative to the
fundamental process capabilities.
3.5 System Power Estimation Tool
To compare algorithms with different convergence times, a system-level power estimation
tool is required. For the purposes of this work, two communication system variables are
generally considered: the length of the header, and the transmit power. Other variables, such as
the number of bits per packet, and the required BER are typically fixed for a given scenario.
Because the synchronization system is well within the physical layer, a sensible metric is energy-
per-useful-bit (EPUB), taking energy over the physical layer components. EPUB may not be the
right metric for upper levels of the protocol stack, like the network or MAC layer (where network
uptime or latency may also be considered). For instance, the number of packet collisions
increases with increasing packet length. Therefore, packets with more header bits will incur more
packet collisions, and therefore, more energy per useful bit. However, for comparisons where the
33
difference in packet lengths is within 10%, the increased power consumption due to increased
packet collisions can be safely ignored. Therefore, EPUB is used because it is simple and
adequate for the purposes of this research.
The energy consumed by the system per packet includes the power in the transmitter and
receiver and is equal to:
EP = ( BS + BD )( PDiss ,TX + PDiss , RX ) + BS PS + BD PD (3-3)
Where BS is the number of synchronization bits, BD is the number of data bits, PDiss,TX is the
transmitter power dissipation including radiated power, PDiss,RX is the receiver front-end power,
PS is the baseband power when synchronizing, and PD is the baseband power when receiving
data. Energy per useful bit is computed by dividing EP by BD.
EPUB = EP (3-4)
BD
3.6 Conclusion
The MATLAB Simulink and Stateflow tools are used for simulation and analysis of
communication algorithms. SF2VHD and the developed libraries in Synopsys Module Compiler
allow quick translation into implementation. An accurate and fast power estimation method has
been developed. The key steps to getting accurate power estimation at the gate-level are to add a
clock tree, hold time buffers, and to simulate with realistic input vectors. These steps are
achieved using Synopsys Designs Compiler, Power Compiler, and ModelSim. This method has
proven to be within 15% accurate versus the EP method, and believed to be 30% accurate versus
real chip measurements.
Use of parameterized modules in Simulink and Module Compiler allow one hardware
description to be automatically synthesized, verified, and characterized over a wide parameter
space. Because the block-level estimation is absolutely accurate, it is possible to compare
algorithms in a system-level framework using the EPUB metric.
34
PNII System
4
4.1 Introduction
The PNII system is a 1.6 Mbps personal area network system designed to carry voice over
short distances (10-30 m) for wireless intercom type applications [AMM]. PNII was the impetus
for this research on low power synchronization. Much effort was expended to make PNII a low
power synchronization system. Despite these efforts, the synchronization system still consumed
18% of the physical layer power. Most of the power reduction effort was centered on circuit
implementation, such as choosing the right adder types and complex multiply structures, using
the lowest possible supply voltage, and gating clocks on unused blocks. Therefore, it was
determined that to further reduce synchronization power consumption it was necessary to move to
higher levels of design, such as up to algorithm selection or system design.
The preliminary design of the synchronization system was documented in [HUS]. Much of
the structure of the physical layer, from the data rate, modulation scheme, and ADC oversampling
rate was dictated by the front-end and system designers [YEE]. This is not an uncommon
scenario in radio design. Often the synchronization system is designed to accommodate
35
constraints dictated by other radio subsystems rather than the other way around. One goal of this
thesis is to show that this is not always an advantageous design methodology from a system
energy perspective.
This chapter is devoted to describing the original PNII synchronization system and some of
the power saving implementation methods employed. This is not to say that this system is in any
way optimal. In fact, parts of the system are provably suboptimal (as will be shown in Chapter
7). Rather, the goals here are threefold: 1) To provide an example of the design and
implementation a complete synchronization system, 2) to provide a design example for future
refinement gains to be illustrated, and 3) as motivation for the necessity of this research.
4.2 System Details
The protocol used in PNII, called Intercom, allows for ad-hoc peer-to-peer communication of
64kbps uplink/downlink between 20 sensor/communicator nodes [AMM]. A data rate of
1.6Mbps and a BER of 1e-5 is required to support this functionality.
The physical layer is made compatible with a commercially available RF front-end
(performing carrier up/down conversion), ADC, and DAC. Although the commercial
components have high power consumption resulting from their tight design specs, the PHY
accommodates significantly relaxed specs for integration with a custom, low-power front end
[YEE] (e.g. by only requiring a free-running clock with 50 ppm accuracy). The chip integrates
all other PHY receiver and transmitter functions, such as carrier detect, synchronization, and
detection.
The air-interface is direct sequence spread spectrum (DSSS) with a length 31 spreading code
at 25 Mcps (Million Chips per Second) and QPSK modulation resulting in a raw data rate of 1.6
Mbps. The primary receiver specifications are ± 100 KHz maximum carrier frequency offset
(+/-50 ppm from a 2GHz carrier reference), and a 50ppm ADC sample clock. The transmit filter
is root-raised cosine with alpha=0.3. The minimum input SNR (per chip) at the input the ADC is
36
-2.9 dB for a SNR per symbol of 12dB1. Ideal detection of QPSK symbols requires 9.6dB to
achieve a BER of 1e-5. Therefore, the 12dB input SNR gives a realistic (if overly generous)
2.4dB implementation target. The PNII supports a typical indoor frequency-selective wireless
channel with mobile units traveling at foot speeds as described in Chapter 2.
4.3 Synchronization System
Figure 4-1: PNII system block diagram.
A block diagram is shown in Figure 4-1. The RX/TX Controller interfaces with the protocol
processor and controls the data flow from one data path block to another. During receive, the
baseband signal is sampled by dual off-chip 8-bit ADCs at 100 Msps (4 samples per chip) using a
free-running clock. These 100 MHz streams are each split into 4 parallel streams of 25 MHz each
so that the BBP could operate off the slower 25 MHz chip clock reducing power by allowing a
1
The original synchronization design required an input SNR per chip of 5dB for a SNR per symbol of 19.9dB. However, as the
system specs were dictated to require 1e-5 BER, it was determined that the original SNR spec was grossly wasteful, and a lower input
SNR could achieve the design goals.
37
lower operating voltage. Parallel filter techniques interpolate the streams to increase the receiver
timing resolution to 8 samples per chip. Performing on-chip interpolation of the signal is lower
power than running the ADC at twice the rate. However, further reduction of the ADC sampling
rate is prohibited by the front-end filter specs.
4.3.1 Timing Recovery
Figure 4-2: Flow diagram of the PNII synchronization system
A flow diagram of the synchronization system is shown in Figure 4-2. The overall goal of the
timing recovery unit is to select the best of 8 timing instances per chip. This is completed in two
steps: a coarse timing estimation which estimates the timing to within 3/8 chip and a fine timing
estimation which estimates timing to within 1/8 chip. The timing variance due to quantization is
(
var(ε ) Q = 1
2OSR
)
2
(4-1)
where OSR is the relative symbol oversampling ratio. In the final timing estimation step, the
OSR is 248 (8 samples per chip * 31 of chips per symbol). Therefore, the variance for the final
timing estimate is 4.1e-6. Whereas, in the initial timing estimation step, the OSR is 8/3 * 31 =
83, for a variance of 3.7e-5. The variance of the selection process must be lower than the
variance caused by the quantization in order for the final result to be quantization-limited. If the
system is not quantization-limited, energy has been wasted in the ADC, interpolation filter, and
synchronization hardware to accommodate the unnecessarily high oversampling ratio. The SNR
38
degradation due to timing recovery for this DSSS signal with root-raised cosine data with
alpha=0.3 is determined by simulation to be 0.3 dB.
4.3.2 Course Timing Estimation
Before coarse timing estimation, the system performs carrier detect using an algorithm that
compares the code-matched filter output to an adaptive threshold, set using the RSSI
measurement. If the code-matched filter output exceeds the threshold twice with a delay of one
symbol between threshold crossings, it is assumed that the correct code is being sent and carrier
detect status is declared. The coarse timing block then estimates timing to within 3/8 chip by
selecting the best of streams 2, 4, and 7 using a data-aided feed-forward algorithm (Figure 4-3).
The variance of this algorithm is treated in [MEY], and for root-raised cosine data with α=0.3 is
given by:
1 1 0.3 1
var(ε )T = 2
( * + 2 * 0.8) (4-2)
L C 2SNR C
where C is the number of chips used in the estimate, L is the number of chips per symbol, and
SNR is given per chip. The L2 factor is due to the estimate being produced in fractions of chips,
whereas we are interested in fractions of symbols. A variance of 1.1x10-5 is achieved with
estimation performed over one symbol (C=31 chips). This is lower than the quantization error of
3.7x10-5 for this stage, so the performance is sufficiently quantization-limited.
39
Figure 4-3: Coarse timing block diagram
4.3.3 Fine Timing and Frequency Estimation
The fine timing block estimates timing to within 1/8 chip and the carrier frequency offset to
within 2.5 Hz using the unweighted Meyr algorithm (Figure 4-4). While this algorithm is
typically used solely for frequency estimation, Meyr suggests its use as a joint frequency and
timing estimator [MEY]. The timing variance of this method is not computed analytically by
Meyr, but simulation shows it to be lower than 1e-6 under worst case frequency offset conditions.
This is sufficiently smaller than the required variance of 4.1e-6. The variance of the frequency
estimation is 4.5e-5 with 35 symbol estimation, giving a 3-sigma residual offset of less than the
2.5 KHz required by the pull-in range of the PLL.
Figure 4-4: Joint frequency and fine timing estimation
40
4.3.4 Frequency Correction and Timing Tracking
The rotate and correlate block corrects the frequency offset, correlates the incoming signal
with the spreading code, and performs early/late detection to track the optimal timing instant
(using a FF NDA algorithm to choose the best of the chosen stream or one if its direct neighbors).
Since 50ppm clocks are used, the system should switch streams no more frequently than once
every 40 symbols.
The coarse frequency offset needs to be corrected before entering the PLL for two reasons.
First, the pull-in range of the PLL is limited. Second, the frequency offset must be corrected
before entering the code correlator to avoid the power loss associated with correlation in the
presence of a large frequency offset. Figure 4-5 shows post-correlation power loss as a function
of frequency offset. The power loss with a 200HKz offset is 0.45dB, while the power loss with a
2.5KHz offset is less than 0.01dB.
Figure 4-5: Power loss in correlator with frequency offset
The early-late detection circuit was a late addition so that the system could work with an off
the shelf radio. The original custom radio used the same clock reference to derive the carrier
reference and the sample clock. Therefore, once the carrier frequency offset was resolved, the
41
timing offset was known and separate timing tracking was unnecessary. This is one example of
how system-level design can greatly reduce the power consumption of the synchronization
system.
4.3.5 Phase Estimation and Correction
A digital phase locked loop (PLL) corrects the phase error of the correlated symbols using
feedback and the QPSK symbols are demodulated. During acquisition, the PLL operates in data-
aided mode using known header bits to lock to the correct phase. During data reception, the PLL
operates in decision-directed mode where sliced symbol phase is compared to the received
symbol phase to produce an error signal. As with all decision-directed algorithms, there is the
possibility of error-propagation when an incorrect decision is made. And, with all PLLs, there is
the chance of cycle-slip. Both of these occurrences typically have the catastrophic effect of
ruining the remainder of the packet. Whereas recovery from decision errors is dependent on the
loop filter coefficients, recovery from cycle-slip is highly unlikely. Where possible, coefficients
were restricted to factors of two, so that shift-and-add operations could be used instead of the
more power hungry multiplication operations.
While the complete details of optimal PLL design are beyond the scope of this thesis, the
integral and direct coefficients chosen for the second-order PLL are 1
8 and 1
2 respectively. This
results in a normalized natural frequency (ωnT) of 0.35 and a dampening factor of 0.5. The loop
bandwidth is 140 KHz. The convergence time is 19 symbols. This PLL design achieves a 1 dB
SNR degradation including cycle slipping and error propagation effects (assuming 512-bits per
packet). As expected, this performance degrades with packet length. For in-depth analysis of
PLLs, the reader is referred to [STE].
42
4.3.6 Synchronization System Performance
A breakdown in the implementation losses in the synchronization and detection system are
shown in Table 4-1. The loss from phase estimation process is worse than timing or frequency.
However, total losses are much better than the allotted 2.4dB.
Table 4-1: Implementation losses in PNII synchronization and detection
Subcomponent Loss (dB)

Timing Variance 0.3 dB
Correlation Loss w/ Frequency 0.01 dB
Offset
PLL (incl. phase noise, cycle 1 dB
slip, and error propagation)
Total Loss 1.31 dB
Total convergence time (and therefore header length) is 57 symbols. This breaks down as 2
symbols for carrier detect and coarse timing, 35 for joint frequency estimation and fine timing,
one symbol of overhead in switching from frequency estimation to phase estimation, and 19 for
the PLL convergence. This constitutes an 11% overhead on the typical packets of 512 data bits.
In the transmit mode, data bits are mapped into QPSK symbols, spread by a dual-channel
spreader, raised-cosine filtered (25-taps, alpha = 0.30), and passed to dual off-chip DACs.
In addition to the power savings already mentioned, the PNII uses other methods to save
power. The PNII incorporates 5 gated clock domains that are adaptively switched on by the
RX/TX Controller for maximal energy efficiency. Adder types were chosen between ripple-
carry, carry-save, and carry-look ahead for lowest power operation. Several structures for
complex multiply accumulate were explored and one chosen to minimize power. For code
correlation, several structures were explored for selective negation and accumulation of the chips
to form symbols including choosing the adder type in the accumulator.
43
4.4 Results and Conclusion
Average power consumption is measured from actual chips while sending a short packet
consisting of the header and 40 data bits. Longer packets have lower average power consumption
because the system consumes more power during synchronization than during data reception. A
chip plot of the PNII system was show in Figure 1-1c. The PNII chip statistics are detailed in
Table 4-2.
Table 4-2: BBP statistics
Process Tripple-well, 0.18 u digital

CMOS, with 6 metal layers
# Transistors 600K
Area Core: 2.2mm2
Die: 14.5mm2
Package 208-pin PGA
Core Power Supply 1V
I/O Voltage 1.8 V
Clock Frequency 25MHz
Avg. power/node (3 3mW (15% duty cycle)
node network)
Physical layer receiver on-state power is shown in Table 4-3. Even after a concerted effort to
reduce power, synchronization accounts for 18% of the physical layer receiver power. The fact
that most of the low power techniques were focused on the circuit level is suggestive that it is
necessary to move to higher levels of design, such as algorithm selection or system design, to
further reduce power.
Table 4-3: Physical layer receiver power consumption.
Analog RF 70 mW
Synchronization 15 mW
Percentage Synchronization 18%
This design example illustrates the need for new synchronization design methodologies
especially for types of applications like wireless sensor networks, where power is primarily
44
important. To highlight the drastic improvements that are necessary, it is illustrative to explore
scaling this system down to data rates used in a sensor network. It is estimated that after scaling
this system down to 100Kbps, it would consume 2mW, which is twice the entire wireless sensor
network node power budget [RAB2]. In addition, the header length of 57 symbols would impose
almost a 400% overhead on the frequently-used control packets of 30 bits (15 QPSK symbols). It
will be shown in Chapter 8 how the use of the tools and methodologies developed in this thesis
can produce a synchronization system for a wireless sensor network node which meets the
stringent low power goals.
If systematic exploration of the synchronization space is to be conducted, it makes sense to
start where there has the potential to be the biggest impact. In the PNII design, the frequency
estimation has the largest cost, requiring 35 symbols of the 57 symbol convergence time (over
60%). Therefore, frequency estimation was chosen as the first synchronization parameter to
explore (see Chapter 5). The combined frequency and phase estimation of PNII takes 55 of the
57 symbols. Overhaul of this part of the design is conducted in Chapter 6 which uses the results
of Chapter 5 to significantly reduce the power consumption and convergence time of the PNII
system.
45
Frequency Estimation
5
5.1 Introduction
The impetus for conducting this experiment on frequency estimation was that in the PNII
system (Chapter 4), frequency estimation required the longest convergence time of all
synchronization parameters (35 out of 57 total symbols). Therefore, potentially the largest
system improvement could be achieved by reducing the frequency estimation convergence time
and power consumption. Feed-forward data-aided estimation was chosen for initial exploration
because it is most common type of frequency estimation used in actual systems (probably because
of its relatively fast convergence time compared to FB estimation), so the results of this study
could be used by a wide variety of systems as a drop-in replacement for an existing feed-forward
data-aided frequency estimation block.
This work examines four feed-forward data-aided frequency offset estimation algorithms and
systematically compares the estimation performance and power consumption of each over
estimation length and input SNR. A modification of these algorithms is also presented that
simultaneously achieves lower power and faster convergence time.
46
In this study, it is determined which among these four algorithms achieves the lowest power
for a given input SNR and variance requirement. In addition, absolute numbers for power
consumption and convergence time are determined which allow these algorithms to be evaluated
in a system-level framework. This chapter serves as a model for how these algorithm
explorations should be conducted and the results that are needed to allow a system level designer
to make use of the information.
5.2 Frequency Estimation Algorithms
In a typical wireless communication system, imperfect up- and down-conversion caused by
nonidealities in the transmitter and receiver local oscillators (LO) result in a carrier offset at the
receiver. This offset causes a continuous rotation of the signal constellation, and must be
corrected for reliable demodulation of the received signal.
As described in [MEY] and [TAV], in the absence of ISI and with moderate frequency offset
(less than ~15% of the symbol rate, because larger offsets incur large power loss in the matched
filter), the sampled output of the matched filter (at one sample per symbol), assuming perfect
symbol synchronization, is given by
rn = an e j (φ + n∆ωT ) + wn , (5-1)
where an is the n-th (complex) data symbol, φ is the carrier phase, ∆ω is the carrier frequency
offset, T is the symbol duration, and wn is a complex Gaussian white noise process with
independent, zero-mean real and imaginary parts each with variance σ 2 = N 0 /(2 E s ) where E s is
the symbol energy and N 0 , the one-sided spectral density of the noise. Also, as in [MEY], the
convenient notation of normalized frequency offset is used, defined as Ω = ∆ωT .
Since this study is examining the case of data-aided estimation, the known data symbols are
canceled before frequency estimation. This reduces to the problem of frequency estimation with
47
an unmodulated carrier [TAV]. This is the most common use of frequency estimation in systems
where a synchronization header is used, such as 802.11b.
There are two well-known algorithms for frequency estimation operating with timing
information derived from the maximum likelihood (ML) equations. The difference between the
two depends on whether the angle of rn is taken inside or outside the averaging function. If the
angle is taken inside, the result is the Kay estimator [TAV],
{ }
L −1
ˆ = ∑ b arg r r *
Ω n n n −1
n =1 , (5-2)
if the angle is taken outside, the result is the Meyr estimator [MEY],
ˆ = arg ⎧⎨∑ b (r r * )⎫⎬

L −1
Ω n n n −1
⎩ n =1 ⎭, (5-3)
where
6 n (L − n )
bn =
L( L2 − 1) . (5-4)
Neither algorithm requires phase unwrapping (to disambiguate phases that may have “wrapped”
around π), and both are limited to frequency offsets that obey
Ω <π
(5-5)
It should be noted that, while different weighting functions, bn, can be used, the one given in
(5-4) is optimal. A simplification, suggested in [MEY], that is often used in practice, substitutes
an integrate-and-dump filter (bn=1/L) that computes an unweighted average, for the filter function
in (5-4) that computes a weighted average. This simplification is applied to both the Meyr and
Kay estimators to expand the number of estimators considered here to be four. The variance of
the weighted Ω̂ Mw and unweighted Ω̂ Mu versions of the Meyr estimator are given in [Meyr] as,
48
[ ] L(L12− 1) 2E 1/ N
ˆ
Var Ω Mw = 2
+
12 1 L2 + 1 1
5 L L2 − 1 (2 E s / N 0 )2
s 0
(5-6)
and
ˆ
Var Ω [ ] 1
Mu = 2
2
+
1 2
L 2 E s / N 0 L (2 E s / N 0 )2
. (5-7)
The simulated performance of the four estimators (Meyr, weighted and unweighted, and Kay,
weighted and unweighted) is shown in Figure 5-1.
1.E-01
Meyr Meyr Kay Kay
1.E-02 Weighted Unweighted Weighted Unweighted
1.E-03
SNR=12dB SNR=12dB
1.E-04
Variance
1.E-05
1.E-06 SNR=24dB
1.E-07
1.E-08
SNR=48dB SNR=24dB
1.E-09
1.E-10
1.E-11
0 50 L 100 150 0 50 L 100 150
Figure 5-1: Meyr and Kay weighted and unweighted performance.
The simulations match the performance predicted by Meyr very closely. As expected, at high
SNR, the performance of the two weighted estimators approach the modified Cramer-Rao bound
given in (2-2) as
6
MCRB (Ω) =
L( L − 1)( E s / N 0 ) .
2
(5-8)
While these algorithms have been derived for the flat fading channel, in practice, they also work
for the frequency selective fading channel.
49
The improved convergence time is achieved by exploiting an underutilized modification to
these algorithms described in [MEY]. In the estimator equations, the product (rn rn*−1 ) is replaced
with (rn rn*−D ) , so instead of using the current and previous symbol, the current and D-th previous is
used. While this is not a new result, it is often ignored in the literature. The variance of the
estimator is improved roughly as D. For instance, the performance of the unweighted Meyr
estimator is given in [MEY] by
[ ]
ˆ
Var Ω Mu =
1 ⎛⎜ D
2 ⎜ 2
2
+
1 2
D ⎝ L 2 E s / N 0 L (2 E s / N 0 )2
⎞
⎟
⎟
⎠. (5-9)
The algorithms are now limited to frequency offsets that obey
ΩD < π
 (5-10)
In practice, many systems can tolerate D > 1. If following the rule of thumb that frequency
offset should be less than 15% of the symbol rate, then D ≤ 3 is possible. In 802.11b, with a
25ppm carrier offset from a 2.4GHz reference, the maximum frequency offset is ±120 KHz,
allowing D = 4 to be used. Figure 5-2 shows that even D = 2 yields a huge improvement
(decrease) in L for a given variance.
50
1.E-01
D=1 D=2
1.E-02
1.E-03
1.E-04 SNR=12dB
Variance
1.E-05
1.E-06 SNR=24dB
1.E-07
1.E-08
SNR=48dB
1.E-09
1.E-10
0 50 L 100 150
Figure 5-2: Meyr weighted D = {1, 2} performance
The block diagrams for the Kay and Meyr weighted estimators are shown in Figure 5-3 and
Figure 5-4. To implement the unweighted estimators, one or two scalar multipliers are removed
from the Kay or Meyr estimators respectively.
I
Q Z-D complex Rect
X to
Polar
X SUM
Ω̂ Kw
bn clear
Figure 5-3: Block diagram of the weighted Kay estimator
51
I
Z-D complex
XX S
Q Rect Ω̂ Mw
X SUM to
Polar
bn clear
Figure 5-4: Block diagram of the weighted Meyr estimator
The goal is to choose the frequency estimation algorithm that gives the lowest system power
for the required variance. The Mery and Kay algorithms seem to have similar hardware
complexity upon first glance because they consist of the same operations but in a different order.
However, the ordering of operations in hardware can have a large impact on the power
consumption. The simplification suggested in [MEY] of bn=1/L, reduces the hardware, but incurs
a performance penalty. It is to be investigated at what point, if any, this hardware simplification
will actually decrease energy consumption. Increasing D requires marginally more hardware but
gives a significant improvement in performance. It is expected that this will be a good trade-off
because the hardware cost is so small. It is the goal of this study to provide the information,
beyond convergence time and variance, that is required to choose the best algorithm for a low
power system.
5.3 Power Estimation Methodology
Each frequency estimation algorithm was coded as a parameterized module in Synopsys
Module Compiler. Each module was synthesized in Module Compiler for a range of parameters,
such as input SNR and estimation length. It was then placed through the block estimation tool
described in Chapter 3.
Energy, rather than power, is used as the cost metric for each block. This is because the
frequency estimation takes a different number of cycles depending on the input SNR, required
estimation variance, and which algorithm is selected. Aggressive low power designs will gate the
clock and power rails to the frequency estimation block when not in use. Therefore, the way to
52
fairly compare different blocks is the energy consumption, which is the power consumed when
the block is on times the amount of time the block needs to be on to achieve the desired variance.
The energy consumption reported here is for a 0.13um CMOS process. While the actual
energy consumption will change for different processes, the comparison of one algorithm vs.
another is valid for most contemporary processes. Obviously, the ratio of leakage power to
switching power and power consumed in the wires will vary between process and this will alter
the crossover points of the curves; however the general results will remain true.
5.4 Algorithm Comparison and Results
For each implementation, it is assumed that the number of bits at the input to the estimator is
scaled depending on the input SNR. This is a reasonable assumption because most systems
would not pay the cost penalty of implementing an ADC that converted more bits than necessary
nor a frequency estimator that achieved better precision than was needed. The bit widths are
scaled up in subsequent blocks to accommodate the growing precision. The accumulators are
pre-scaled to accommodate the summation of L samples, and the precision of the weighting taps,
bn, is increased with L. The bn coefficients are hard-wired before synthesis for the lowest power
operation. The rectangular-to-polar conversion is performed by a CORDIC [TUR] and the
number of CORDIC stages is increased depending on the required precision. These adjustments
ensure that the hardware is not significantly limiting the expected variance.
The resulting energy consumption of each estimator is shown versus variance for a range of
input SNR and L. Since lower variance and lower energy consumption are desired, data points to
the bottom and left are better. While this is the right presentation of the data for optimizing the
energy of the frequency estimator in isolation, L must be considered if a system-wide reduction in
power consumption is to be achieved because the RF and analog front-end are on for different
amounts of time. For instance, in the case where the front-end power dominates that of the
53
frequency estimation, choosing an algorithm with smaller L may optimize system energy even if
it has higher frequency estimation energy. Since the absolute energy consumption and the L for
each data point is given in the graphs, the designer can make the appropriate trade-off.
Obviously, cases where both the power consumption and convergence time (L) are decreased for
the same variance are hands-down winners.
Figure 5-5 compares the energy consumption vs. variance of the weighted and unweighted
versions of the Meyr algorithm. At low SNR and at high required variance, it is more energy
efficient to use the non-weighted version. Here, there is a small difference in variance between
the two algorithms, so the hardware simplification of unweighted combining pays off. However,
at high SNR or low variance, it is more energy efficient to use the weighting function. Here the
energy savings from the unweighted averaging are outweighed by the longer correlation times
required to overcome the degradation in variance. For instance, at 24db SNR, and a required
variance of 3x10-7, the unweighted estimator converges in 128 samples, whereas the weighted
estimator takes only 64 samples and as a result, consumes marginally less energy.
Figure 5-6 compares the weighted and unweighted versions of the Kay algorithm. For the
Kay estimator, it is almost always better to use the weighted version of the algorithm. This is due
to the variance of the unweighted version of the Kay algorithm severely under performing the
weighted version. In this case, the hardware simplification of an unweighted average is not worth
the degradation in variance. For instance, at 24db SNR, and a required variance of 2x10-6, the
unweighted estimator converges in 128 samples, whereas the weighted estimator takes only 32
samples and as a result, consumes 1/3 as much energy.
54
1.E+05
SNR=48dB SNR=24dB SNR=12dB
L=
1.E+04
128
64
32
1.E+03
16
Energy (pJ)
8
1.E+02
4
2
1.E+01
Unweighted
Weighted
1.E+00
1.E- 11 1.E-09 1.E- 07 1.E-05 1.E-08 1.E-06 1.E-04 1.E-02
1.E-06 1.E- 04 1.E- 02 1.E+00
Variance
Figure 5-5: Meyr weighted vs. unweighted comparison
1.E+04
SNR=24dB SNR=12dB
L=
1.E+03
128
64
32
Energy (pJ)
1.E+02 16
8
4
1.E+01
Unweighted 2
W eighted
1.E+00
1.E- 09 1.E-07 1.E- 05 1.E- 03 1.E- 01 1.E- 08 1.E- 06 1.E- 04 1.E- 02 1.E+00
Variance
Figure 5-6: Kay weighted vs. unweighted comparison
55
1.E+05
SNR=48dB SNR=24dB SNR=12dB

L=
1.E+04 128
64
32
1.E+03
16
Energy (pJ)
8
1.E+02
4
2
1.E+01
Meyr
Kay
1.E+00
1.E- 08 1.E-06 1.E- 04 1.E- 02 1.E- 07 1.E- 05 1.E- 03 1.E- 01
1.E- 11 1.E- 09 1.E- 07 1.E- 05
Variance
Figure 5-7: Meyr vs. Kay weighted comparison
1.E+05
SNR=24dB SNR=12dB
L=
1.E+04
128
64
32
1.E+03
16
Energy (pJ)
8
1.E+02
4
2
1.E+01
D=1
D=2
1.E+00
1.E- 08 1.E- 06 1.E- 04 1.E- 02 1.E- 06 1.E- 04 1.E- 02 1.E+00
Variance
Figure 5-8: Meyr weighted D = 1 vs. D = 2 comparison
Figure 5-7 compares the weighted versions of the Meyr and Kay algorithms. The weighted
Kay algorithm is almost always better than or equal to the weighted Meyr algorithm. At low
SNR, the marked advantage of the Kay algorithm is due to the combination of achieving better
variance and requiring considerably less hardware to implement than the Meyr algorithm. At
high SNR where the algorithms have similar variance performance and similar hardware
56
requirements, the minor differences mostly result from the correlation of the data as it flows
though the hardware. At high variance there is little difference between the two, while at low
variance the Kay algorithm wins out.
Figure 5-8 compares the weighted version of the Meyr algorithm for D=1,2. Increasing D is
usually the right choice, especially for low variance. The power penalty is very small (only one
extra register) and the convergence time can be markedly better. For example, for an input SNR
of 12dB and required estimation variance of 2x10-5, the convergence time is decreased by a factor
of 4 while simultaneously decreasing the energy consumption by a factor of 4.3.
5.5 Conclusion
Four feed-forward frequency estimators were characterized for energy consumption and
variance for a given input SNR and correlation length. It was found that the weighted Kay
estimator is a safe bet for all regions of operation, especially for high SNR and low required
variance. The unweighted Meyr estimator may be used for low SNR and high required variance.
Exploiting D is the most powerful way to simultaneously decrease convergence time and energy
consumption especially for low required variance. It is surprising to find that certain hardware
simplifications, such as using the smallest D (D=1) and unweighted averaging does not usually
result in lower energy consumption. The degradation in variance due to these simplifications
requires longer convergence times and more energy consumption.
5.6 Postscript: Application to DSSS Systems
For DSSS, it is sometimes suggested in the literature to apply these frequency estimation
algorithms to chips rather than to post-correlated symbols to maximize D; this is not usually
advantageous. Whereas the normalized frequency offset, Ω = ∆ωT, has been used thus far, when
comparing the variance between chips and symbols, the non-normalized variance,
57
Var[∆ω]=Var[Ω]/T2, must be used. For the same convergence time, and assuming minimal
power loss in the code correlator due to frequency offset, the performance when operating on
chips is significantly worse than when operating on symbols. Figure 5-9 shows the difference for
802.11b-like symbols. The difference is more pronounced for long convergence times and low
SNR. Therefore, estimation should be conducted on post-correlated symbols.
1.00E-02 Chips 12dB Chips 24dB

Symbols 12dB Symbols 24dB
1.00E-03
Symbol Variance
1.00E-04
1.00E-05
1.00E-06
1.00E-07
0 50 100 150
Convergence Time (Symbols)
Figure 5-9: Variance of frequency estimation applied to chips versus symbols for 802.11b-
like symbols
The only caveat is with a large frequency offset. If, the constraint in (5-5) is not satisfied for
symbol operation, one could operate on chips using the algorithms described here without having
to resort to more complex FFT-based algorithms. Even if (5-5) is satisfied, there is an SNR
degradation in the code correlator due to the large frequency offset. For 802.11b-like symbols
(11-bit barker sequence spreading, root-raised cosine transmit and receive filters w/ 50% excess
bandwidth), the power loss for correlation prior to frequency-offset correction is approximately
3db with a 600 KHz offset. Therefore, it may be advantageous to operate on chips because
correlation to symbols causes an SNR loss. A coarse/fine estimation may be employed where
58
coarse estimation is performed on chips, and then fine estimation performed on coarsely-
corrected symbols.
In all cases, because of the SNR degradation due to correlation in the presence of frequency
offset, even if frequency-offset estimation is performed on symbols, the frequency-offset
correction should be applied to chips.
59
PNII System Refinement
6
6.1 Introduction
The PNII design (Chapter 4) consisted of phase and frequency estimation algorithms that
required a total of 55 symbols (out of a total 57) to converge. Here, an exploration is conducted
within a system framework to improve these two systems. Two levels of refinement are applied
to the PNII system. First, keeping the existing architecture, the results of Chapter 5 are applied to
see if the frequency estimation block power consumption and convergence time can be improved
by selection a different FF algorithm. Second, more radical changes are considered including FF
phase estimation rather than the FB DPLL currently in use. Also, the use of a differential
modulation scheme is explored in place of the coherent QPSK currently in use. All exploration is
done within the context of lowering system power consumption, so the system power
consumption estimation tool is used to compare the various alternatives.
The original PNII system was designed in a 0.18 µm CMOS process. Since that process is no
longer available, the refinement of the system is conducted for a 0.13 µm process. Therefore, the
numbers are not directly comparable to the measurements in Chapter 4. However, the original
system power consumption is re-estimated for the new process and all refinements compared to
60
that estimation. A few modifications were made to the original system to correct problems
causing unnecessary power consumption in order to make a more fair comparison. First,
although clocks to the sub-blocks were gated when not in use, the input signals continued to
switch causing a non-negligible amount of power to be consumed. In this study, the input signals
are also gated (using the clock gating signal). Second, the original system included an early-late
correlator so that it could operate with an off the shelf radio front-end. However, in this study,
we are assuming the use of the custom radio front end which doesn’t require the timing tracking
unit. Therefore, the rotate/correlate block power is reduced by a factor of 3. The original system
now consumes 375 nJ over the 57 symbol synchronization header, an average of 5.4 mW during
synchronization, and 6.2 mW during data reception.
6.2 Frequency Estimation Refinement
The frequency estimation component of the PNII system described in Chapter 4 is re-
examined using Chapter 5 as a guide for reducing power consumption and convergence time.
The complete specs for the frequency offset estimation block for the PNII system is described
in Chapter 4. The relevant parameters for the following discussion are: The maximum input
frequency offset is 210 KHz. The specification for frequency estimation is a variance of ΩT =
4.5e-5 so that the 3-σ variation is within +/- 2.5 KHz frequency offset. The minimum input SNR
is 12 dB.
The original design uses the unweighted Meyr estimator with D = 1 and L = 35.
Figure 6-1 details the performance of the different Meyr and Kay algorithms achieving 4.5e-5
variance. The unweighted Meyr algorithm used in the original design performs the worst in terms
of convergence time at 35 symbols. The weighted Kay algorithm performs the best with
convergence time of 17 symbols for D = {1, 2, 3} and convergence time of 18 symbols for D = 4.
When the convergence time is small, it is not uncommon for there to be convergence time
61
increases for a larger D. In essence, the extra delay of D symbols in waiting for the first
estimation is larger than the corresponding reduction in convergence time.
Figure 6-1: Convergence time of different frequency estimators
The requirement of both the Meyr and Kay algorithms is that ΩD < π . The maximum
frequency offset of 210 KHz implies that D = 1 is the largest D possible without using a more
complicated coarse/fine estimation method. Fortunately, in this instance, the weighted Kay
algorithm with D = 1 gets the best performance and there is no reason to resort to the more
complicated coarse/fine estimation schemes.
Table 6-1 details the new frequency estimation method for PNII vs. the original one. The
convergence time for frequency estimation is reduced from 35 symbols to 17 for a savings of
50%. Over the entire original synchronization header of 57 symbols, this results in 30% savings.
Due to the shorter convergence time and lower power consumption, the new algorithm consumes
16% of the original algorithm’s energy consumption.
62
Table 6-1: New and old frequency estimation methods
Algorithm D Convergence Time Total Energy (pJ)

(symbols)
Unweighted Meyr 1 35 3407
Weighted Kay 1 17 535
Reduction 50% 84%
This shows that the work of this thesis and especially in Chapter 5 can result in significant
system energy and convergence time savings as applied to actual systems.
6.3 Frequency and Phase Estimation Redesign
Further refinement of the PNII system is explored further by deviating from the original
design. The phase and frequency estimation algorithms required a total of 55 symbols (out of a
total 57) to converge. Therefore, a joint optimization of these two systems is conducted to see if
an improvement can be made. First, alternative phase estimation schemes are explored where a
FF algorithm is used in lieu of the FB DPLL in the original design. Lastly, a change in the
modulation scheme itself is considered. Differential PSK can reduce the synchronization
overhead, but incurs a BER penalty versus the coherent QPSK modulation in the original design.
These trade-offs are explored in a system power consumption framework.
In many standards-based wireless communication systems (i.e. 802.11b), the modulation
specified is differential PSK, or DPSK. This choice is made to alleviate the synchronization
requirements since in using differential modulation schemes, a coherent phase does not need to be
estimated and tracked. As will be shown, this also relieves the frequency estimation
requirements. However, differential modulation incurs a BER penalty as compared to coherent
modulation. The PNII system was analyzed to see if system energy consumption is reduced or
increased if differential modulation is used.
63
6.3.1 Differential Modulation Penalty
To fairly evaluate the systems, it is necessary to examine the performance degradation of
differential BPSK and QPSK versus the coherent versions. Proakis [PRO] gives the bit error rate
for Differential PSK (DBPSK and DQPSK) as
1 −γ b
Pb 2 = e (7-1)
2
1
1 − ( a 2 +b 2 )
Pb 4 = Q1 (a, b ) − I 0 (ab )e 2 (7-2)
2
(
a = 2γ b 1 − 1 2 ) (7-3)
(
b = 2γ b 1 + 1 2 ) (7-4)
where Q1(a,b) is the Markum Q function and I0(x) is the modified Bessel function of the first
kind of order zero [PRO]. The results are plotted in Figure 6-2 as well as Pb for coherent BPSK
and QPSK:
⎛ 2 Eb ⎞
Pb = Q⎜⎜ ⎟.
⎟ (7-5)
⎝ N0 ⎠
The SNR degradation for DBPSK is less than 1dB for Pb < 1e-5. However, the degradation
for DQPSK is 2.3 dB for moderate to high SNR. Higher order modulation schemes, with M > 4,
typically incur a 3 dB penalty.
64
Figure 6-2: BER of coherent and differential QPSK, BPSK
6.3.2 Phase Error vs. SNR Degradation
To compare the different schemes, the designs are normalized to a common design target. The
instantiation of each scheme is designed to meet the design target, and then the system power
consumption of all the designs meeting the same design target are compared against each other.
The design target is specified as an SNR degradation versus the ideal detection scheme. Different
kinds of phase errors will affect the BER in different ways. So, a discussion of phase error effects
on BER is required.
The probability of symbol error for PSK modulation in AWGN [PRO] is
π
PM = 1 − ∫ πM pΘ r (Θ r )dΘ r (7-6)
−
M
where,
1 − 2γ s sin 2 Θ r ∞ −(V −
pΘ r (Θ r ) = ∫0 Ve
4γ s cos Θ r ) 2 / 2
e dV . (7-7)
2π
Assuming gray-coded symbols, the BER, Pb can be computed as,
65
1
Pb = PM . (7-8)
log 2 M
For BPSK and QPSK, (7-6) exactly matches the analytical formulas
⎛ 2 Eb ⎞
P2 = Q⎜⎜ ⎟
⎟ (7-9)
⎝ N0 ⎠
2
⎛ 2 Eb ⎞ ⎛ ⎞
P4 = 2Q⎜⎜ ⎟ − Q⎜ 2 Eb ⎟ (7-10)
⎟ ⎜ N ⎟
⎝ N0 ⎠ ⎝ 0 ⎠
[PRO]. Therefore, the BER for both BPSK and QPSK is
⎛ 2 Eb ⎞
Pb = Q⎜⎜ ⎟,
⎟ (7-11)
⎝ N0 ⎠
which is exact for BPSK and slightly pessimistic, but fairly accurate for QPSK. There is no
closed form solution for the integral for M>4, so it is computed numerically.
The BER given a fixed phase offset, ε, can be computed by changing the integration limits on
the integral in (7-6) to [− π M + ε K π M + ε ]. The results for QPSK and errors of 1 ⋅ 10 −1 ,
1 ⋅ 10 −2 , 1 ⋅ 10 −3 radians is shown in Figure 6-3 (labeled “fixed”). It can be seen that errors
below 1 ⋅ 10 −3 have negligible BER degradation (<0.1dB) versus the optimal detection case
also shown.
66
Figure 6-3: QPSK BER with Gaussian and fixed phase errors
The BER for a random phase error can be computed by evaluating the stochastic integral
∞ π +ε
⎡ ⎤
E [PM ] = ∫ ⎢⎣ ∫−π M +ε pΘ r (Θ r )dΘ r ⎥⎦ p (ε )dε
1 − M
(7-12)
ε = −∞
where pΘ r (Θ r ) is defined as in (7-7). The results for QPSK and zero-mean Gaussian phase
noise with variance 1e-1, 1e-2, 1e-3, and 1e-4 are also shown in Figure 6-3. Simulations match
these calculations very well. It can be seen that variances lower than 1e-3 have negligible BER
degradation (< 0.1 dB) versus the optimal detection case. Note that stochastic phase errors with
variance σ2 have slightly worse performance than constant phase errors of σ especially at high
variances.
67
Figure 6-4: QPSK BER with Gaussian phase error
Meyr [MEY] computes an approximation to the BER degradation (in dB) for stochastic phase
noise as
10 ⎛ ⎞
⎜1 + 2 log 2 ( M ) cos ( pi / M ) b N ⎟ var(φ )
E
D= 2
(7-13)
ln(10) ⎝ o⎠
where, M is the constellation size, and Eb

N0 is the signal to noise ratio per bit. These are also
plotted along with the calculated values in Figure 6-4. The Meyr equation is designed to be valid
for small SNR degradations (< 0.2 dB). In practice, the Meyr equation is a good approximation
except for very large phase variances, such as 1e-1. However for variances of 1e-2 or smaller, the
Meyr approximation fits very well.
Phase errors are not always Gaussian distributed. Sometimes, as in symbols with a residual
frequency offset, the phase is uniformly distributed. The BER can be calculated using the same
technique for the Gaussian distributed errors with the Gaussian PDF replaced with a uniform
PDF. Figure 6-5 shows the BER of QPSK symbols with phase errors uniformly distributed with
different bounds.
68
Figure 6-5: BER vs. SNR with uniform phase error in the range of [0..lim]
6.3.3 Feed-Forward Phase Estimation
A FB algorithm for phase estimation, the DPLL, was used in the original PNII design. FF
algorithms are known to have faster convergence time than FB algorithms. Therefore, a FF phase
estimation scheme is considered in this exploration. The most common FF phase estimator is the
Viterbi&Viterbi (V&V) estimator [TAV],
1
arg ⎧⎨∑n =0 rn e jM arg{rn } ⎫⎬
N −1 L
φˆ = (7-14)
M ⎩ ⎭
where L = 2 has been shown to be nearly optimal for QPSK symbols. The estimator is NDA, and
therefore has a π/M phase ambiguity. This ambiguity can be resolved with the use of known
synchronization header bits.
A plot of V&V variance versus estimation lengths for different SNRs is shown in Figure 6-6.
The modified Cramer-Rao bound,
69
1
MCRB(φ ) = , (7-15)
2 N ( Es / N 0 )
for each SNR is also shown. It can be seen that SNRs of 12 dB and above essentially achieve the
MCRB.
Figure 6-6: Phase estimation variance vs. L for different SNRs
It is beyond the scope of this thesis to compare different FF phase estimation schemes or to
estimate the power consumption for a wide range of phase estimator parameters. However,
whenever a FF phase estimator is required for the following discussion, the V&V estimator will
be used, and the power consumption will be estimated for that specific instantiation of the
estimator.
6.3.4 Frequency and Phase Estimation Redesign
Finally, enough background information has been given to describe and compare the different
schemes considered in this exploration. A maximum packet length of 1024 bits is assumed along
with the typical channel model in Chapter 2. 50 ppm crystals are assumed as in the original
70
design. The design goal is an implementation margin of 1dB at 12dB input SNR for coherent
modulation and a margin of 1 dB at 14.3 dB input SNR for differential modulation.
Using (2-10), the 95% correlation time of the channel is 4032 symbols. Therefore, phase and
amplitude of the channel are static throughout the 1024-bit packet. However, the frequency offset
can be up to 210 KHz, which, unless corrected, causes a fast-changing phase offset. If the
frequency offset can be corrected to a tolerance that makes the phase seem static over the course
of the packet, phase can be estimated once and not tracked throughout the packet.
Using Figure 6-3 and Figure 6-5, it can be seen that a constant phase offset of 1e-1 (referred to
as “Bound 1”), a Gaussian phase offset with variance 7e-3 (referred to as “Bound 2”), or a
Uniform phase offset with bound 1.5e-1 (referred to as “Bound 3”) can be tolerated without
exceeding the 1dB implementation margin.
The four frequency/phase estimation methods to be considered are shown in Table 6-2. The
first scheme is the original scheme described in Chapter 5 with the improvements described in
§6.2. The second scheme uses an estimate-once method where the frequency offset and phase are
estimated only once at the start of the packet. The severe frequency estimation requirements for
this system may render this scheme impractical. The third method uses FF frequency and phase
estimation, but re-estimates the phase every symbol. This method gives the most relaxed
frequency estimation requirements, and therefore the shortest packet header length, of any of the
coherent schemes. Lastly, the differential QPSK modulation scheme is considered. The DQPSK
scheme has the most relaxed frequency estimation requirements, and no need for phase
estimation, so it achieves the shortest packet header length, but as was shown above, incurs a
2.3dB SNR penalty.
71
Table 6-2: Frequency/phase estimation methods to be considered
Method Modulation Frequency Estimation Phase Estimation

Original: FB Coherent FF, var=4.5e-5 PLL, pull-in range
Coherent 2.5KHz
Tracking
Estimate Once Coherent FF, var=9.5e-9 FF, estimate once,
var=7e-3
FF Coherent Coherent FF, var=2.8e-4 FF, re-estimate every
Tracking symbol,
Var=3.5e-3
Differential Differential FF, var=1.1e-3 none
The revised original method has been described §6.2 and takes 17 symbols for the frequency
estimation and 19 symbols for the PLL for a total of 38 (including the additional 2 for coarse
timing). Energy consumption is 271 nJ during synchronization. Power consumption averages
5.74 mW during synchronization and 6.18 mW during data reception. It was shown in Chapter 4
that this method achieves better than the required 1 dB implementation margin for frequency and
phase estimation.
The estimate once method requires a small residual frequency offset so that the maximum
phase error over the length of the packet is less than 1.5e-1 radians. Then, since the frequency
offset shows up as a constant phase error ramp over the lenght of the packet, the BER will follow
a uniform distribution of Bound 3. The required frequency estimation variance to achieve this
error is 9.5e-9. However, the convergence time for this variance is greater than 100,000 symbols.
Therefore, an estimate once scheme for phase and frequency is not practical for systems with a
large frequency offset and will not be considered further.
A FF coherent tracking method estimates the frequency once at the start of the packet, and re-
estimates the phase every N symbols. Therefore, the frequency estimation needs only be accurate
enough so that the phase error remains small over N symbols. The phase error allowance is split
evenly between the phase estimator and the frequency error over N symbols. Therefore, every N
symbols, the phase is re-estimated to within a variance of 3.5e-3 (½ Bound 2) using an estimation
72
length of L. For 3.5e-3 variance at 12 dB input SNR, the V&V estimator takes L = 5 symbols to
converge. The energy consumption of the V&V estimator for L = 5, and a symbol rate of 806
KHz is 647 uW. The initial frequency offset variance requirement is 2.78e-4/N2 (to achieve a 3-σ
variation of ½ Bound 1, the frequency offset variance is (½ Bound 1/3/N)2).
With increasing N, the power consumption during reception decreases, however, a tighter
frequency estimation variance is required thereby increasing the convergence time of the
frequency estimator. The smallest packet header is achieved when N = 1 and the frequency
estimation variance is the most relaxed. For N = 1, the required frequency estimation variance is
2.8e-4 which can be achieved using the weighted Kay estimator with an estimation length of 9.
The header length must include the time for the frequency estimation and for the first phase
estimation to be produced, for a total of 14 symbols for phase and frequency estimation.
It may seem strange to have N < L, but it is possible if the V&V phase estimator is pipelined
producing one result every symbol that is the average of the previous L symbols.
Energy consumption for the frequency/phase estimator is 126 nJ over the 16 total symbols of
the packet header. Total power consumption is 5.97 mW during synchronization and 6.68 mW
during data reception. The 1 dB implementation margin is met.
The differential method requires only frequency estimation so that the phase error per symbol
is less than 1e-1 (Bound 1). The required frequency estimation variance for 3 sigma operation is
1.1e-3. The convergence time to achieve this variance is 6 using the weighted Kay algorithm.
Energy consumption for the frequency estimation is 63 nJ over the 8 symbols in the synch header.
Total power consumption is 6.36 mW during synchronization and 5.78 mW during data
reception. The 1dB implementation margin is met. Table 6-3 details the results of the 4
algorithms.
73
Table 6-3: Comparison of different Frequency/Phase Estimation Schemes
Method Convergence Time Modulation Energy during Power during data

(frequency/phase Penalty (dB) synchronization reception (mW)
component) (nJ)
Original: FB 38 0 271 6.18
Coherent
Tracking
Estimate Once >100,000 0 N/A N/A
FF Coherent 16 0 126 6.68
Tracking
Differential 8 2.3 63 5.78
6.3.5 System Results
Comparing the convergence time and base power consumption in isolation does not take into
account the system impact of the 2.3 dB of the differential scheme, the different synchronization
power consumption during data reception, or the front-end power that is consumed during the
longer packet headers. The system power consumption is based on a set of parameters, such as
transmitter efficiency, required BER and input SNR, receiver power consumption, baseband
power consumption during synchronization and data reception, and packet header length. The
values for the PNII system are outlined below in Table 6-4.
Table 6-4: Parameters used in system exploration
Txdiss (for 0dBm 30 mW

transmit power)
Tx efficiency 20%
Rxdiss 70 mW
Figure 6-7 shows the system energy consumption comparing the original, FF coherent
tracking, and differential methods along with the original PNII system. It is shown that for 1024-
bit packets, the FF coherent tracking scheme is the lowest power consumer by a 1% margin vs.
the differential method. The coherent PLL scheme is 2% worse than the FF coherent tracking
scheme. For short packets where header length is the dominant factor in system power
74
consumption, differential schemes can pay off. Especially considering the extra design time and
extra risk associated with implementing the coherent scheme, the differential scheme looks
attractive. However, for longer packets, the Differential scheme never wins. This is because the
longer header length of the coherent scheme is amortized over more data bits, and therefore the
2.3dB penalty of the differential scheme matters more. However, in this instance, the margin for
infinitely long packets is only 2%. This is due to the specific power consumption of the different
components of the system. In this system, the extra power required to transmit the additional 2.3
dB (170%) is only 2.7 mW. This is due to the low original transmit power of 0 dBm (1 mW) and
the moderately good PA efficiency of 20%. This additional power is relatively small in
comparison to the front-end power of 100 mW. The differential scheme will be even less
favorable in systems where the additional transmitted power is small compared to the front-end
power. Low transmitter efficiencies raise the additional power required to transmit the additional
2.3dB. Therefore, low transmitter efficiencies favor the coherent scheme. Compared to the
original PNII system, the FF Coherent Tracking method achieves a reduction of 66% in
synchronization energy consumption, resulting in a 7% lower system energy for packets lengths
of 512 used in the original system.
75
Figure 6-7: System power consumption for different schemes
The rapid power estimation tool described in Chapter 3 allows four frequency and phase
estimation schemes to be analyzed. The system power estimation tool then allows the schemes
with different convergence times and power consumption to be compared. This exploration
resulted in a reduction of 66% in synchronization energy consumption, 75% in synchronization
header length, and 7% in system energy from the original PNII system design. It also determined
packet lengths at which differential modulation is advantageous. Theoretically efficient
modulation schemes, which result in more synchronization overhead, are useful only for long
packets and long transmit distances. Otherwise, less efficient modulation schemes (such as
differential PSK) result in lower system power because of the reduced synchronization overhead.
76
Interpolation
7
7.1 Introduction
A major component of most timing recovery algorithms of any type is a timing interpolator to
perform the parameter adjustment. Therefore, a study of timing recovery algorithms relies on
accurate power consumption estimates of interpolators of various sizes and performance.
Two styles of interpolating filters exist: ones that use a lookup-table (LUT) for the
coefficients, and the others that compute the coefficients on the fly, called the Farrow
interpolators [MEY]. LUT-based interpolators are straightforward to characterize. The output
SNR requirement is achieved by the precision and number of taps, and interpolation granularity is
achieved through the number of coefficient sets that are stored. However, the Farrow
interpolators are more complex and less straight-forward to specify. A study of timing recovery
would not be complete if it didn’t consider both types of interpolation. Therefore, before a study
of timing recovery can be conducted, the Farrow interpolator must be characterized.
This chapter performs a thorough study of the Farrow type of interpolator over a wide range of
parameters. The results of this work can be used to conduct the study of timing recovery
algorithms of all types. Specification of the interpolator is inherently linked to the specifications
77
of the ADC preceding it. Therefore, a joint optimization of interpolation and ADC power
consumption is conducted in this study. The results of this study show the necessity in using a
system-level framework to evaluate power consumption, rather than just examining the power
consumption of individual blocks.
7.2 Interpolation Background
The main objective of an interpolation/resampling filter is to reproduce the samples of a
digitized sampled analog signal at the desired instant, µ, with no or minimum distortion from a
given sampled version. The primary performance metrics on which interpolators are judged are
the SNR degradation through the interpolator, and the granularity of the achievable timing
offsets.
This background on interpolators follows closely the description in [MEY]. The ideal linear
interpolator has a frequency response
( ) ⎛ 2π ⎞
∞
1
H I e jωTs , µTs = ∑H I
⎜ω −
⎜ n, µTs ⎟⎟ (6-1)
Ts n = −∞ ⎝ Ts ⎠
with
⎧⎪T exp( jωµT ) ω < 1

H I (ω , µTs ) = ⎨ s s 2π 2Ts . (6-2)
⎪⎩ 0 elsewhere
The corresponding impulse response is the sampled sinc(x) function. Conceptually, the filter
can be thought of as an FIR filter with an infinite number of taps. The taps are a function of µ.
For a practical receiver, the interpolator must be approximated by a finite-order FIR filter.
( ) ∑ h (µ )e
N −1
H e jωTs , µ = n
− jωTs n
. (6-3)
n=− N
The 2N coefficients of the FIR filters must be pre-computed and stored in a memory for a
number L of possible values. This represents the LUT style interpolator. As a consequence, the
78
timing resolution suffers from a maximum discretization of L/2. Assuming the word length of
each tap is W, implementation complexity depends on the structure parameters, L, 2N, and W.
An alternative to storing a set of coefficients is to compute them on the fly using polynomial
interpolation. Each coefficient hn(µ) is approximated by a (possibly different) polynomial in µ of
degree M(n):
M (n)
hn (µ ) = ∑ c (n )µm
m
. (6-4)
m=0
For a 2N-th order FIR filter, the
N −1
2N + ∑ M (n)
n=− N
(6-5)
coefficients are obtained by minimizing the quadratic frequency-domain error averaged over all
µ:
2
N −1 M ( n )
σ 2 1 2πB ⎡ ⎤
σ = x ∫ ∫ e jwT µ − ∑ ⎢ ∑ cm (n )µ m ⎥e − jnωT dω dµ .
2 s s
(6-6)
4πB 0 − 2πB
e
n= − N ⎣ m=0 ⎦
where σx2 is the input signal power. The optimization is performed within the passband of the
signal x(t). No attempt is made to constrain the frequency response outside B. The errors are
reported as a ratio vs. input signal power in dB.
Since the function is restricted to be of polynomial type, the error of the polynomial
interpolator will be larger than that for the LUT-based MMSE interpolator described by (6-3),
although it can be made arbitrarily small by increasing the degree of the polynomial. Though the
polynomial interpolator performs worse, it is often chosen because it can be implemented very
efficiently in hardware.
For simplicity, it is assumed that all polynomials have the degree M(n) = M. Inserting for
Hn(µ) the polynomial expression of the FIR transfer function and interchanging summation,
results in
79
∑ [c (n)z ].
M N −1
H (z, µ ) = ∑µm m
−n
(6-7)
m=0 n=− N
The inner sum describes a time-invariant FIR filter that is independent of µ, and there is one of
these for each degree m of the polynomial. The polynomial interpolator can thus be realized as a
bank of M parallel FIR filters where the output of the m-th branch is first multiplied by µm and
then summed. This structure was devised by Farrow [MEY]. A block diagram of this structure is
shown in Figure 7-1.
HM(z) x
cm(-N)
x
HM(z) HM-1(z) HM-2(z) ... H0(z) cm(-N-1) z-1

+
...
...
X + X + X + y
cm(N-1) z-1
? +
rm
Figure 7-1: Block diagram of the Farrow interpolator.
The MMSE criterion (6-6) chosen here is a common one, but other metrics have been
proposed in the literature. For instance, [VO] suggests using a weighted average of time- and
frequency-domain error and timing detection error due to the imperfect interpolator. Other
metrics may include side-lobe magnitude when near band interferer rejection is a required feature
of the interpolation filter. Regardless of the optimization function, the general structure of the
taps is very similar (in terms of number of ‘0’ and ‘1’ in the taps). For instance, the Vo taps for
λ = 4 (N, M) = (2, 2) are shown in Table 7-2 along with the MMSE taps in Table 7-1. Therefore,
the power consumption results obtained here can be applied to both. However, the final metric by
which one wants to compare power consumption (i.e. power consumption vs. final SNR or power
consumption vs. timing recovery error) will change which parameters optimize the criteria. The
80
rapid power estimation tool described in Chapter 3 could aid the designer in re-evaluating the
power consumption of the interpolators with different tap values and optimality criteria. The rest
of this discussion assumes the use of the MMSE criteria (6-6).
m=0 m=1 m=2

n = -2 0.0000 -0.3688 0.3688
n = -1 1.0000 -0.6570 -0.3430
n=0 0.0000 1.3430 -0.3430
n=1 0.0000 -0.3688 0.3688
Table 7-1: MMSE coefficients for λ = 4, (N, M) = (2, 2)
m=0 m=1 m=2

n = -2 0.0000 -0.2867 0.2867
n = -1 1.0000 -0.7133 -0.2867
n=0 0.0000 1.2867 -0.2867
n=1 0.0000 -0.2867 0.2867
Table 7-2: Vo coefficients for λ = 4, (N, M) = (2, 2)
7.3 Farrow Interpolator Exploration
The Farrow structure has several parameters:
• Coefficients themselves
• Coefficient bit widths (WT)
• Bit-widths of input data (SNR of input data) (WI)
• Oversampling rate of input data (λ)
• Bit-widths/resolution of timing offset (Wµ)
• Filter lengths (N)
• Polynomial order (M)
The oversampling ratio (OSR) of the input data to the interpolator is dictated by the OSR of
the ADC preceding it. Although an oversampling ratio of 2 is required to meet the Nyquist
sampling requirements, larger OSRs are often used in practice. As seen in Chapter 4, there are
system considerations other than from the synchronization system that may govern this
81
parameter. For instance, a bandwidth-limiting filter must precede an ADC to avoid aliasing of
the signal. The specs of this filter are relaxed if a higher oversampling rate is used in the ADC.
Each of the parameters above affects the two metrics (output SNR, and timing resolution) of
the output as follows:
Timing resolution = λ ⋅ 2
Wµ
•
• Output SNR = F (coefficients, WT , WI , λ , N , M ) , where F is some unknown function.
There are several factors we can use to constrain the choice of (N, M). For M=1, the error is
invariant of N. Therefore, only (1, 1) is considered. For the error required by typical
applications, N ≤ 4 suffices to produce small enough errors (below –60 dB for λ = 2, and –90 dB
for λ = 4). With N ≤ 4, there is almost no difference between M = 3 and M = 4. For M = {2,3},
the errors for N = 1 are indistinguishable. Therefore, only (1, 2), (2, 2), (3, 2), and (4, 2) are
considered for M = 2, and (2, 3), (3, 3), and (4, 3) are considered for M = 3.
The exploration is limited to λ = {2, 4} because 2 is the minimum λ which meets the sampling
theorem requirement, and λ > 4 are rarely used [MEY]. Also, the input SNR is limited to {2, 4,
8} bits which gives a range most commonly used in low data rate networks that typically use low
order modulation constellations BER lower than 10-4. Bit resolutions on Wµ are limited to {2, 4,
8} which gives a minimum resolution of 1/512 for λ = 2 and 1/1024 for λ = 4. This is well
beyond the range that most system would ever require (typical systems require 1/8 or 1/16
precision). WT is limited to {2, 4, 8} bits. For high SNR systems, 16 bits should be included for
M = 3 to reach errors smaller than –50 dB for λ = 2 and –60 dB for λ = 4 (see Figure 7-2). Even
with these restrictions on the parameter space, the power consumption of over 500 interpolators is
to be estimated. This would not be possible without the fast gate-level power estimation method
developed in Chapter 3.
Output SNR is calculated assuming the incoming data is AWGN, and that the interpolator
error is white as well. In that case, the variances of the noises add to obtain the output SNR.
82
Since the interpolator noise is not quite white, this is an optimistic, but reasonable approximation
similar to the ones made in ADC analysis.
Hardware complexity (i.e. the number of equivalent full adders) is approximately equal to:
( M + 1) ⋅ (2 N ) ⋅ WI ⋅WT
+ ( M + 1) ⋅ (2 N − 1) ⋅ (WI + WT )
2 N −1
+ ( M + 1) ⋅ (2 N − 1) ∑ log(i + 1)
i =1
(6-8)
+ M ⋅ (WI + WT ) ⋅ (Wµ + 1)
+ M ⋅ log 2 (2 N ) ⋅ (Wµ + 1)
M
+ Wµ ∑ log(i )
i =1
If all the circuitry in the system were switching every cycle, the power consumption would be
proportional to this hardware complexity times the clock frequency. However, there are several
factors that influence how much of the circuitry is switching each clock cycle:
• Density of taps (number of 1’s)
• Correlation in µ
• Correlation in input data
Correlation in µ increases with an increase in λ because it is assumed the µ value is updated at
symbol rate (as in most timing recovery systems). Therefore, µ is changed only every λ samples.
For similar reasons, correlation in input data increases with increasing λ because the symbol rate
remains the same. Therefore, there are more samples of the same symbol, which are highly
correlated. The average input data correlation per bit and µ increase with fewer bits. This is
because the bits used are the MSB’s and in correlated data, the MSB’s exhibit more correlation
than LSB’s. Essentially, the more bits there are, the more LSB’s there are and there is less
correlation between bits of adjacent samples.
83
Figure 7-2: Tap error (dB) vs. (N, M) and WT
For implementation exploration, each set of taps is truncated to WT bits. The error metric is
re-computed for each set of taps at each different resolution and plotted in Figure 7-2. The higher
order and longer filters perform worse than the lower order and shorter filters in the case of small
number of bits. The conclusion is that the higher order and longer filters take more tap resolution
to implement with desired results. WT =16 should be included if the low errors of λ = 4, M = 3
are to be achieved. Looking at the graphs, there are clearly some sets of taps that would never be
used because they perform worse than or equal to a set of taps with less complexity (and hence
less power consumption). Below are the sets of taps (N, M), λ, and WT that would be used:
WT = 2:
λ = 2: (1, 1), (2, 2)
λ = 4: (1, 1)
WT = 4:
λ = 2: (1, 2), (2, 2), (2, 3)
λ = 4: (2, 2), (3, 2)
84
WT = 8:
λ = 2: (2, 2), (3, 2), (2, 3), (3, 3)
λ = 4: (1, 2), (2, 2), (3, 2), (2, 3)
The current method is to optimize the infinite precision taps, and then truncate those taps to
the requisite number of bits. Future work is to optimize the taps for each WT. In this case, maybe
some of the higher order and larger filter lengths could achieve better results at the lower tap
resolutions.
Using these sets of taps listed above, the power consumption is estimated for different WI and
Wµ. The clock speed is increased proportional to λ to keep the symbol rate constant over all
simulations. A symbol rate of 1 MHz is assumed. When comparing interpolation filters with
different WI or λ, the power in the ADC preceding the interpolation filter must be included if a
fair system trade-off is to be made. ADC power is estimated using (3-1) and (3-2) with an FOM
of 1.2e12.
The results show that choosing the wrong parameters could have a detrimental impact (up to
factors of 10) on power consumption. It was expected that the results would show succinct rules
for dictating which interpolator parameters to use for the lowest power implementation of a given
spec. However, nice trends, like those that emerged in Chapter 5 for frequency estimation, are
not observed. Rather, the results highlight the importance of the rapid power estimation
framework for examining a large parameter space. The designer of a Farrow interpolator must
examine many instantiations to determine which one is lowest power for the given specification.
The rest of this chapter describes the results of the exploration. Some basic guidelines are
given for designing the interpolator to meet the final specifications for timing resolution and
output SNR with the lowest power. Interspersed with these guidelines are some examples of how
much power can be wasted if a systematic exploration of the space is not conducted.
85
7.4 Achieving the Timing Resolution Specification
First, the timing resolution specification is addressed. λ and Wµ are the only parameters that
influence timing resolution. Since increasing λ incurs more ADC power, it is expected that the
most power efficient way to achieve the required timing resolution is to use the minimum λ
and Wµ = log 2 (resolution / λ ) .
However, there are a few rare cases where it is advantageous to move to a higher λ to achieve
a required timing resolution and SNR with lower total power. Figure 7-3 through 7-5 show
graphs of equal timing resolutions achieved with λ = 2 versus λ = 4 including ADC power. In
one example, to achieve 22 dB final SNR and 1/1024 timing resolution for 1 MHZ data, the best
solution with λ = 2, Wµ = 9 achieves 22.3 dB SNR for 106 uW (79 uW in the interpolator and 27
uW in the ADC). However, the best solution with λ = 4, Wµ = 8 achieves 23.4 dB SNR for only
68 uW (15 uW in the interpolator and 53 uW in the ADC). So, there is a power savings of 50%
with higher oversampling even when ADC power is taken into account. While such a case is rare
in the data examined in this study, the power penalty for choosing the wrong configuration is
high.
Figure 7-3: Interpolation performance for timing resolution of 1/16
86
The relative power consumption of the ADC versus the interpolator is a result of the data rate
and the process technology parameters. As the data rate increases but the process stays the same,
the FOM of the ADC decreases, making the ADC more expensive that just linear scaling with
frequency. As process technology scales, the digital circuitry (interpolator) consumes less power,
but analog circuitry power (ADC) doesn’t scale as quickly. Therefore, unlike the frequency
87
estimation results which were essentially technology-independent, the results for the interpolators
are specific to the data rate and technology chosen. Therefore, the designer is encouraged to
conduct a systematic exploration of the design space for the given process and data rate to
achieve the lowest power implementation.
Another method to increase timing resolution that doesn’t incur the power penalty of the
higher speed ADC would be to run the ADC at the minimum oversampling rate of 2x, have a 2x
fixed interpolator after the ADC to upsample the data to 4x oversampling, and then run through a
λ = 4 variable interpolator. The fixed interpolator would add some noise to the signal, and some
extra power consumption, but may be worthwhile versus the corresponding increase in Wµ. This
analysis is beyond the scope of this thesis and is left for future work.
7.5 Achieving the Output SNR Specification
From a systems perspective, the lowest power solution for a given SNR spec usually has the
output SNR limited by the input SNR and not the errors in the interpolator itself (internal SNR).
Typically, achieving the input SNR has a high marginal cost because circuit power in the
transmitter, receiver front-end, and ADC has to be expended to achieve this SNR. If the
interpolator output SNR is not input SNR limited, these subsystems are wasting power achieving
an unnecessarily high input SNR.
To achieve an input SNR-limited implementation, the coefficients, N, M and WT must be
chosen so that the noise produced in the interpolator is much lower than the noise of the input.
However, when there are several configurations of N, M, and WT that achieve this goal, there are
no hard and fast rules for how to select them to achieve the lowest power result. Figure 7-6
through 7-9 show the exploration results for the interpolators examined in this study.
There are cases where increasing the performance actually requires less power. For any two
set of points (SNR, power): (x1, y1), (x2, y2) if x1 > x2 and y1 < y2, then (x1, y1) would always
be favored over (x2, y2) even if it exceeds the SNR requirement. For instance, for WI = 4, λ = 2,
88
and Wµ = 8, the configuration with N = 1, M = 2, and WT = 8 achieves (20.5 dB, 167 uW),
however the configuration with N = 2, M = 2, and WT = 2 achieves (22.3 dB, 102 uW). There is
an increase in performance by 1.8 dB for a decrease in power of almost 40%.
Figure 7-6: Interpolator performance for Wµ = 2
89
There can also be a large penalty in power for a marginal increase in performance for a given
WI, λ, and Wµ. For instance, for WI=2, λ=2, and Wµ=8, the configuration with N=2, M=2 and
WΤ=2 achieves (11.9dB, 32uW) however, the configuration with N=3, M=3, and WΤ =8 achieves
(12.0dB, 371uW). There is an increase of a factor of 11 in power, but only a 0.1dB increase in
performance. The designer with an output SNR specification of 12dB might make a system-level
decision to reduce it to 11.9dB to save over 300uW in the interpolator.
Because the effect on power consumption in these examples is significant, the designer is
encouraged to conduct a systematic exploration of the design space to achieve the lowest power
implementation using the framework and analysis implemented in this thesis.
7.6 Conclusion
Characterization of the Farrow interpolator is required for exploration of the timing estimation
space. Over 500 instantiations of the Farrow interpolator were examined covering the parameter
space used in typical systems. It was shown that a designer who follows heuristic guidelines to
implement these interpolators could incur huge power penalties. Interpolator design should not
be done in isolation. Joint optimization of the ADC and interpolator is necessary to achieve
optimal system performance. Designers are encouraged to conduct a systematic exploration of
90
the design space to achieve the lowest power implementation. The tools developed in this thesis
allow that exploration to be easily conducted.
It is expected that the Farrow-style interpolator will outperform the LUT-based interpolator
when the storage space required for the taps is large (i.e. when N ⋅ WT ⋅ 2
Wµ
is large). However,
the exact crossover point has not been evaluated.
7.7 Postscript: Interpolator Hardware Implementation Specifics
The implementation chosen here follows Figure 7-1 very closely. Signed integers are used for
the input data and intermediate results throughout the interpolator. The µ data is unsigned
fractional representation to represent values in [0..1) with the appropriate precision (Wµ).
Intermediate results are scaled to accommodate growing bit precision without truncation errors.
There are several hardware improvements that can be made to improve the area and power of
the interpolators. The use of carry save arithmetic may improve the performance and/or reduce
the power consumption of the interpolator. This would be especially effective for large N and
large input bit widths. An automated tool has been developed that scales intermediate results of
FIR filters while keeping resulting overall quantization noise within a certain bound [SHI]. This
would result in area and power savings if applied to these interpolators. For specific
configurations, there are some hardware simplifications that will reduce the power consumption.
For instance, the symmetry in the filter taps for M=2 can be exploited to reduce hardware as
described in [VO2]. Incorporating these changes into the interpolation implementation is beyond
the scope of this thesis.
91
PN3 System Design
8
The field of Wireless Sensor Networks (WSNs) has many applications, including closed-loop
environmental control of smart buildings, ecological monitoring, and structural monitoring. The
embedded nature of these applications and the large number of nodes makes changing batteries
impractical. Therefore, these applications can only be mass deployed when power consumption
is reduced to levels below what can be scavenged from the environment. It has been calculated
that nodes consuming less than 1mW can achieve reasonable duty cycles by harvesting light or
vibration energy typical in office environments [RAB2].
In Chapter 6, it was shown that differential modulation schemes, while theoretically
inefficient, can consume less power in practice than coherent schemes if short packets are used.
One design example where short packets are indeed common is sensor networks.
Differential modulation schemes are one example of where the system specs are designed to
relax the synchronization requirements therefore reducing the length of the packet header. Going
further, fully non-coherent modulation schemes might reduce the header length even more. The
system power estimation tool can be used to evaluate whether these changes actually improve the
system power consumption.
92
PN3 is a 50 Kbps system designed for use in wireless sensor network applications [SHE]. The
PN3 system design is an exercise in extreme simplicity to study the effects on system power
consumption. A non-coherent modulation scheme is used, no AGC is employed, and other
system specs are selected, such as packet length and crystal accuracy, to ensure very relaxed
synchronization requirements. In addition to designing the synchronization scheme with a short
header length, both analog and digital implementations are explored.
The modem front-end in use is a MEMS-based RF transceiver designed for low power and
fast turn on times for WSNs. The carrier frequency is 2 GHz. The radio employs a self-mixing
signal down conversion using an envelope detector [OTI]. The front-end consumes 3 mW in
receive mode, and around 6mW in transmit mode. The maximum data rate is 50 Kbps, and, since
a self-mixing scheme is employed, on off keying (OOK) is the only modulation scheme available
within one channel. (If several channels are available, FSK modulation is possible.) The 1 µs
oscillator startup time in the transmitter dictates the transmitted signal envelope.
MAC layer schemes for WSNs are described in [LIN]. WSNs use packet-based
communication where the synchronization parameters need to be re-estimated for every packet.
Packet lengths for typical applications are 30 bits for control packets and 200 bits for typical data
packets with a maximum of 500 bits. The small packet lengths dictate the need for short
synchronization headers. In contrast, the 128-bit preamble for 802.11b would impose overheads
over 300% for control packets. MAC layer simulations show that higher data rates reduce overall
system power consumption by reducing the duration of packets and consequently the collision
probability.
8.1 Simplification of Synchronization
The low power design strategy of this work is to reduce the complexity of the synchronization
system through careful system design. A trade-off of packet length, data rate, clock accuracy,
93
and modulation scheme achieves simplified synchronization requirements able to be implemented
with low power consumption.
First, the choice of RF radio architecture (itself a product of a simplification strategy to reduce
power) has already greatly simplified the synchronization requirements. First, interferers do not
need to be considered in the synchronization system. The high-Q MEMS channel-select filter
sufficiently suppresses out-of-band interference. In-band interferers are dealt with at the MAC
layer with carrier sense mechanisms. Second, the self-mixing architecture removes frequency
and phase information from the incoming signal. Therefore, OOK is the only available
modulation scheme for which frequency and phase synchronization are not required. For any
OOK radio, the only synchronization elements required are timing and amplitude. It is desirable
to estimate them only once rather than continuously track them throughout the packet for low
power operation.
Amplitude estimation requirements are dictated by packet length, data rate, and channel
Doppler and multipath. It is desired to keep the symbols slow enough so that multipath effects do
not need to be estimated and corrected. This scheme renders individual links vulnerable to deep
fades which may not be acceptable in all systems. However, WSNs are robust to single link
failures, relying, instead, on the ability to route packets to a group of nearby nodes, one of which
will not be suffering from a fade due to the spatial diversity.
Delay spreads of up to 460ns for indoor wireless channels have been reported in the literature.
Multipath effects are insignificant if the delay spread is much less than the symbol time. This
ensures that the channel looks flat with respect to the signal bandwidth. For all multipath arrivals
to fall within the first 10% of the symbol time, symbol times longer than 4.6 µs or data rates
slower than 200 Ksps are required. Having removed multipath effects, only the signal amplitude
needs to be estimated to achieve optimal decisions. More complicated and power hungry
circuitry, such as equalization, are avoided. Therefore, for any WSN with robustness to single-
link deep fades, no equalizer is required if symbol rates are slower than 200 Ksps.
94
To estimate the amplitude at the start of the packet only, rather than track it throughout the
packet, the entire packet must be transmitted before the channel changes substantially. The rate
of change of the channel is dictated by the Doppler rates, typically 10Hz for indoor wireless
channels. For 90% correlation between the header and the end of the packet for any indoor
wireless link, the packet length must be less than 10ms. Therefore, there is a trade-off between
the maximum packet length and the data rate. Faster data rates allow longer packets to be sent. If
a maximum packet length of 1024 is required, then data rates faster than 100Kbps are needed.
In this particular case, the maximum data rate that can be achieved by the RF front-end is 50
Kbps. This will allow the 500-bit packets required for normal operation to be sent, but
unfortunately is slower than the 100 Kbps required to send 1024-bit packets without having to re-
estimate the signal amplitude. The 1024-bit packets used for development and debugging efforts
can either be fragmented into smaller pieces, or sent whole knowing the packet error rate (PER)
will be impacted.
Timing offset is affected by packet length, clock accuracy, and required synchronization
timing resolution. Clocks can slip up to ½ the required timing resolution before needing to be re-
estimated. For any communication system, if the crystal accuracy in ppm is better than
1e6*ε/2N, then no timing tracking is needed. Where N is the maximum number of symbols in a
packet, and ε is the fractional timing resolution requirement. In our case, with a required timing
resolution of one-tenth the symbol period, and 1024-bit packets, 50 ppm clocks are required to
not have to re-estimate the timing instant during the packet. Shorter packets and coarser timing
resolution requirements will ease the clock accuracy requirements.
Finally, carrier sense is a required feature of our MAC protocol [LIN]. While there are many
possible ways of implementing carrier sense, a simple method is chosen for this system. The
chosen carrier sense method integrates the channel energy over 10 symbols and compares the
energy to a fixed threshold. Coding in the data-link-layer ensures that the data streams have at
95
least 5 ones in any string of 10 symbols. A coding scheme, with an extremely simple
implementation, uses one extra symbol every 9 symbols to achieve this requirement. The CS
threshold is left programmable so that the MAC layer can tradeoff the probability of miss detect
vs. the probability of false detect.
Table 8-1 details the synchronization requirements for the PN3 system. Only timing and
amplitude are required. Further, since these parameters are static over the length of the packet, it
is only necessary to estimate them once, rather than continuously track them.
Table 8-1: Summary of synchronization requirements for the PN3 system
Synchronization Synchronization
Parameter Requirement
Timing Estimate Once
Frequency Not estimated
Phase Not estimated
Amplitude Estimate Once
While the minimum SNR from the envelope detector for correct operation is 13dB, the
received signal often has higher SNR. Since no automatic gain control (AGC) is employed, the
signal amplitude can be almost 30 times larger than that of a 13 dB SNR signal. To digitize this
range of signals while keeping quantization noise from affecting the low-SNR operation, an 8-bit
analog-to-digital converter (ADC) is required. A discussion of the implications of this AGC-less
design on system power consumption is found in §8.8.
8.2 Analog vs. Digital Implementation
Two implementation methods for the synchronization system are investigated. First, an all-
digital scheme is used that takes the signal from the envelope detector directly to an ADC after
which all synchronization processing is done in the digital domain. The second scheme is
designed to eliminate the largest single power consumer in the synchronization system, the ADC,
by performing the synchronization with analog circuits that are controlled by a digital
96
synchronization loop. For each scheme, the synchronization header length and content are
chosen to optimize the performance of the selected algorithms.
Ideal OOK detection achieves 1e-4 BER with 11.4 dB SNR [PRO]. Therefore, the total
implementation loss in the baseband must be less than 1.6 dB to achieve the 13 dB requirement.
There are several places where loss will occur: non-ideal matched filtering, threshold variance,
and timing variance. In addition, for the digital algorithm, there will be clipping and quantization
noise in the ADC. In the analog algorithm, there will be circuit noise and non-idealities such as
offset and non-linearity. As was described in Chapter 2, the total SNR loss is initially divided
amongst the synchronization blocks using designer experience and initial estimates. From initial
calculations, it was determined that matched filtering and threshold estimation would dominate
the implementation losses. Therefore, it was decided to split 1dB loss evenly between them, and
split the remaining 0.6dB evenly between timing estimation and other losses. The design targets
for the implementation losses are detailed in Table 8-2.
Table 8-2: Target synchronization implementation losses
Matched filter losses 0.5 dB

Threshold variance losses 0.5 dB
Timing variance losses 0.3 dB
Other (ADC noise, circuit noise, etc.) 0.3 dB
Total 1.6 dB
In the following, the two implementation schemes are described and analyzed.
8.3 Matched Filtering
Since the startup time of the oscillator is small compared to the symbol time (less than 1us),
the incoming signal can be matched filtered to a square wave without much loss of SNR. Square
wave matched filtering is performed by simple integration of the signal energy. This is
performed by an adder in the digital domain and an integrator in the analog domain.
97
It is possible to calculate the SNR loss of non-ideal matched filtering if the exact transmit and
receive waveforms are known. However, for our system, an analytical expression for the
waveforms is difficult to determine. First, the startup time of the oscillator dictates the shape of
the transmitted waveform. Second, the waveform is altered by the non-linearity in the envelope
detector of the receiver. The SNR loss can be bounded by estimating that the received signal
deviates from a square wave in the first and last 10% of the symbol. Therefore, correlation of the
received waveform with a square matched filter results in less than 0.5dB loss for both the analog
and digital schemes.
8.4 Amplitude Estimation
The threshold, for both the digital and analog schemes, is estimated by averaging the energy in
N symbols of alternating 0’s and 1’s. With N even, this threshold estimator automatically
accounts for any offset present in the signal that could result from the RF and analog circuitry and
the ADC. In addition, because of the alternating 0’s and 1’s in the header, the threshold can be
estimated without prior timing information. The variance of the estimated threshold equals the
symbol variance divided by the estimation length, N.
Threshold variance directly degrades the SNR of the symbols. This can be seen upon
examination of the decision equation for noisy OOK symbols with a noisy threshold:
1
>
Es + N o Th + N t (8-1)
<
0
Where Es is the symbol energy, No is the symbol noise, Th is the ideal threshold, and Nt is the
threshold estimate noise. Moving Nt to the other side of the equation gives the standard OOK
decision equation with increased noise.
With N = 10 symbols, this threshold estimator achieves 1/10 the variance of the symbols.
Therefore, the SNR degradation due to the amplitude estimation is approximately 0.5 dB.
98
With ideal circuitry, the analog threshold estimation scheme achieves the same variance as the
digital scheme. In actuality, the integrator used to average the 10 symbols suffers form non-ideal
gain (ideal gain is 1/10 to achieve the appropriate average), non-linearity in the integration
function, offset, and noise. It is assumed that circuits can be implemented so that these effects are
within the margin allotted for circuit impairments.
8.5 Timing Estimation
Timing estimation is the main area where the digital and analog schemes diverge. The digital
scheme can take advantage of the fact that parallelism is easily implemented with digital circuits
and use a parallel feed-forward search for the correct timing instant. Approximate estimates
show that implementing a similar algorithm in the analog domain would consume more power
than the ADC (replicating the code correlator of length 7 with 10x oversampling would require at
least 70 amplifiers, and the peak detection circuit would require several comparators and precise
gain stages). Therefore, a feedback DLL-like algorithm is used in the analog scheme.
The chosen digital synchronization scheme uses an ADC that samples at 10 times the symbol
rate. The ADC oversampling ratio (OSR) of 10 achieves the timing variance of 2.5e-3 required to
keep implementation losses below the 0.5 dB target. Theoretically, oversampling ratios as small
as 2 could be used. However, the data would then have to be digitally interpolated to achieve the
required timing resolution. Given the very low data rates of 50 Kbps used here, 10x
oversampling yields an ADC sampling rate of 500 KHz, which is achievable for under 200 uW.
The optimal timing instant is estimated using a maximum-likelihood feed-forward data-aided
estimator,
N −1
arg max
ε
∑a z
n=0
n n (ε ) , (8-2)
where an are the known data bits (mapped from OOK to antipodal representation) and zn(ε) is the
received signal at the fractional timing offset ε. The input stream is correlated against the known
99
7-bit sequence in the header, and the maximum search is performed with a peak-detection
algorithm. In addition to performing timing estimation, this peak detection method also yields
packet synchronization because a sequence is used that has high autocorrelation and low cross
correlation with other sequences in the preamble.
The variance of the timing estimate is a combination of the variance from the estimator and
the variance due to quantization of the timing resolution. The variance of the estimator is given
in [MEY] for root-raised cosine data. Since the transmit shape is unconventional and has a high
excess bandwidth, the expected variance is coarsely estimated to be approximately the variance
for root-raised cosine shape with maximum excess bandwidth (α = 1) to be:
1 0.25 1
var(ε ) T = * + 2 * 0.004 , (8-3)
N 2 SNR N
or 1.0e-3 (σε ~= 3% of the symbol time) at 13 dB input SNR with an estimation length of 7.
The variance due to quantization of the timing estimator is largest when the optimal timing
instant is exactly halfway between two of the samples. In this case, the variance is:
(
var(ε ) Q = 1
2OSR
)
2
(8-4)
The chosen timing quantization of 10 samples per symbol yields a variance of 2.5e-3.
Typically increasing the sampling rate of the ADC to improve the timing variance is
expensive, so the optimal solution achieves var(ε)T much less than var(ε)Q. In this
implementation, the variance due to the timing estimator is one half that due to the timing
quantization.
The chosen analog synchronization scheme uses a data-aided feedback algorithm to determine
the optimal timing instant. The loop filter and voltage/numerically controlled oscillator
(VCO/NCO) could be implemented in either the digital or the analog domain. However, the
symbol integration would be implemented with analog integrators. Analysis in [MEY] shows
that feedback algorithms take approximately two times the amount of time to converge to the
100
same variance as feed-forward algorithms. Therefore, a header sequence of length 14 would be
needed for timing estimation. The header sequence would most likely be alternating 0’s and 1’s,
which carries the most timing information.
Since alternating 0’s and 1’s are used for timing estimation, packet synchronization needs to
be performed separately in the digital domain once timing synchronization is achieved. A 7-bit
PN sequence is required to get the same detection probability of the digital algorithm.
While impact of timing variance on SNR is treated in [MEY], but requires the exact pulse
shape to be known. Since the exact pulse shape is unknown in this case, it is difficult to calculate
the exact SNR degradation due to timing variance. From the analysis in [MEY], we do know that
errors in timing yield a reduction in the useful component of the signal and introduce ISI.
For the digital scheme, rough analysis is possible with the reasonable assumption that most
timing errors have a magnitude less than 1/OSR. In the worst case, when the correct timing
instant is halfway between two samples, the timing error is 1/2OSR. When a fractional timing
error of x occurs, then x% of the previous (or next) symbol’s energy is integrated into the current
symbol and x% of the current symbol is integrated into the next (or previous) symbol. With
uniformly distributed uncorrelated data, there is a 50% chance that the previous (or next) symbol
is the same as the current one. In that case, the current symbol integrates the correct amount of
signal energy. The other 50% of the time, the previous (or next) symbol is different from the
current symbol and the energy integrated by the current symbol is 1-x% (without loss of
generality, assuming the current symbol is a ‘1’). Therefore, the probability of error in the worst
case can be estimated as
1
Pe = 0.5 * Q( SNR ) + 0.5 * Q( SNR (1 − ))
2OSR (8-5)
The inverse Q function can be used to determine the SNR loss.
101
Using (8-5), the digital scheme timing estimation variance of 2.5e-3 yields approximately 0.3
dB loss, which was the implementation target. It is assumed that the analog scheme could
achieve the same variance and implementation loss.
8.6 Digital Synchronization Scheme Summary
The synchronization header structure for the digital scheme is shown in Figure 8-1a. The
chosen digital synchronization scheme uses an ADC that oversamples the signal from the
envelope detector by a factor of 10. The threshold is estimated by averaging the energy in 10
symbols of alternating 0’s and 1’s. Next, the optimal timing instant and packet synchronization is
estimated using a maximum-likelihood feed-forward data-aided estimator that correlates the data
with a 7-bit sequence and performs peak detection to find the maximum over the 10 timing
instants provided by the ADC. Lastly, during data reception, symbols are matched filtered, and
sliced using the estimated threshold. The total synchronization header length is 18 (10 for
threshold estimation, 7 for timing estimation, and 1 for overhead because the peak detection
algorithm latency renders the first data symbol unusable).
Figure 8-1: Digital (a) and analog (b) synchronization header structure
Figure 8-2 details the simulated implementation loss of the various cumulative impairments in
the digital scheme. The first simulation includes only losses from non-ideal matched filtering
(‘MF’). The second simulation additionally includes quantization and clipping in the ADC
(‘quant’). The third simulation additionally includes the threshold variance (‘thresh var’). And,
the fourth simulation includes all the implementation impairments by finally adding in the timing
102
variance (‘timing var’). The total implementation loss is 1.3dB, thereby achieving better than 1e-
4 BER at 13dB input SNR. Documentation of the simulation environment is found in §8.10.
1.0E-03
1.0E-04
BER
Ideal OOK
1.0E-05 MF
M F, quant
M F, quant, thresh var
M F, quant, thresh var, timing var
Implementaito n Target
1.0E-06
10 11 12 13 14
SNR (dB)
Figure 8-2: Performance breakdown of the digital synchronization scheme
8.7 Analog Synchronization Scheme Summary
The synchronization header structure for the analog scheme is shown in Figure 8-1b. In the
chosen analog synchronization scheme, an integrator that averages the energy in 10 symbols of
alternating 0’s and 1’s performs the threshold estimation. The estimated threshold value is
sampled and held for use throughout the data portion of the packet. Next, timing estimation is
performed using a data-aided feedback algorithm. Lastly, during data reception, symbols are
matched filtered (integrated), and sliced against the estimated threshold in the analog domain
using a comparator. Unlike the digital scheme, where the timing estimator also yields packet
synchronization, the analog scheme must perform this function separately. This is done in the
digital domain by correlating the received data with a 7-bit sequence. The total synchronization
header length is 31 (10 for threshold estimation, 14 for timing estimation, and 7 for packet
synchronization).
103
8.8 Comparison of Synchronization Schemes
Because the two schemes have different header lengths, a comparison must be done within the
system power framework described in Chapter 3. The energy per useful bit metric (3-3), includes
the number of synchronization bits (BS), the number of data bits (BD), the transmitter power
dissipation including radiated power (PDiss,TX), the receiver front-end power (PDiss,TX), the
baseband power when synchronizing (PS), and the baseband power when receiving data (PD).
It is estimated that the digital scheme will consume 300 µW. The ADC power estimation is
200 µW using an FOM of 7e11 although FOMs as high as 8.6e12 have been reported for specs in
this range [SCO]. The digital circuitry power estimate is less than 100 µW as determined by the
aforementioned power estimation tool. Since the ADC dominates the power consumption and the
digital circuitry power is reduced only slightly from synchronization mode to data reception
mode, the power during synchronization and data reception are approximately equal.
Even if the analog synchronization circuits consume no power, the analog scheme consumes
more total energy for packets shorter than 400 bits because of the increased synchronization
header length. Figure 8-3 shows the energy per useful bit for the system assuming PTX = 6 mW,
PRX = 3 mW, PS = PD = 300 µW and BS = 18 for the digital scheme, and PS = PD = 0 and BS = 31
for the analog scheme.
4.4.E-07
Analog Scheme
Energy per Useful Bit
3.9.E-07 Digital Scheme
3.4.E-07
2.9.E-07
2.4.E-07
1.9.E-07
10 100 1000
Packet Length
Figure 8-3: Energy-per-useful-bit vs. packet length of analog and digital schemes
104
If an analog scheme could be devised in which the synchronization circuitry consumes ½ as
much power as the digital scheme, it could use only two extra header bits (20 total header bits)
before the system power consumption analysis would favor the digital scheme.
The conclusion is that header length is more important for system power consumption than
baseband circuit power once baseband circuit power is reduced to a small fraction of the receiver
and transmitter front-end power.
One drawback of the digital algorithm is that the ADC remains on during data reception.
Once the correct timing instant is known, the ADC could be turned off and an analog integrator
could be used to integrate the energy and slice it in the analog domain. The integrator and
comparator would consume less power than the ADC. Therefore a hybrid scheme where
synchronization is done in the digital domain and reception is done in the analog domain would
consume less power than the all-digital scheme. However system power analysis shows that this
hybrid digital/analog scheme has the potential to decrease energy per useful bit by only 2-3%
even assuming the analog components could be implemented for no power. For the next
generation transceiver, which has reduced PTX and PRX by a factor of two, the hybrid scheme
could reduce system energy consumption by 3-6%. It is unknown whether these small gains
would be worth the design time and increased area incurred by this scheme.
Finally, the dynamic range requirement of the ADC is large because no AGC is implemented.
A perfect AGC could reduce the ADC to just over 2 bits (13dB SNR). This would yield a 40x
decrease in the ADC power. The system power consumption would be improved as long as the
AGC power was less than the decrease in ADC power. However, even if the AGC could be
implemented with zero power and requiring no extra header bits, this would only yield a 3%
change in the system energy. Further, if inclusion of the AGC required more header bits, it would
adversely affect the system performance. Therefore, the current system design, which does not
use an AGC, has better system performance.
105
50%
0-bit header
40% 9-bit header
Savings
30%
20%
10%
0%
10 100 1000
Packet Length
Figure 8-4: Energy savings of 0-bit and 9-bit headers vs. 18-bit headers
The conclusion is that the digital synchronization and detection scheme power consumption is
low enough so that further reduction has very little impact on system power performance. The
synchronization parameter that has the potential to substantially impact system performance is the
header length. As shown in Figure 8-4, a half-length header has the potential to decrease system
energy by 19% for 30-bit control packets, and 4% for typical data packets of 200 bits, while a
zero-length header has the potential to decrease the system energy by 38% for control packets and
8% for typical data packets.
8.9 Conclusion and Future Work
Synchronization performance requirements were reduced through careful selection of
modulation, data rate, packet length, and clock accuracy. Of the four channel parameters
(frequency, timing, phase, and amplitude), only two parameters, timing and amplitude, need to be
estimated for our radio in a WSN environment. Further, these two parameters need to be
estimated only once in a packet rather than continuously tracked. Implementation schemes across
the digital/analog boundary were explored. It was determined that the digital algorithm results in
lower system power because it requires a shorter header. Because great care has been taken to
reduce synchronization requirements at a system level, and to reduce the power consumption of
the synchronization circuits at the block level, the digital algorithm power consumption is low
106
enough so that further reduction has very little impact on system performance. One way to
substantially reduce system energy consumption would be to further reduce the header length.
8.10 Postscript: Simulation Environment
The digital synchronization system was simulated in MATLAB Simulink and Stateflow
[MAT]. The top level simulation and main synchronization blocks are shown in Figure 8-5. The
main synchronization block consists of a few correlators and a state machine for control.
Simulation results of the timing correlator peak are shown in Figure 8-6.
Figure 8-5: Digital algorithm high level simulation and digital synchronization block.
107
Figure 8-6: Simulation results of the digital synchronization system timing correlator
108
Conclusion and Future Work
9
This thesis has shown that through a systematic exploration of synchronization power
consumption, significant system energy savings can be realized. First, a framework for exploring
power consumption in synchronization systems was defined. Then power consumption was
explored in more detail for frequency estimation and timing recovery. Lastly, these results were
used in system examples to show that the framework developed here had significant impact on
the system energy consumption.
An enabling step in this work was the development of a fast and accurate method for power
estimation that is within 15% accurate of the best available method and over 50 times faster.
This tool was used to conduct a systematic exploration of feed-forward data-aided frequency
estimation algorithms. The results of this study were straight-forward rules for which algorithm
to choose for a given system specification. Simultaneous reductions in energy consumption and
convergence times of more than a factor of 4 are possible in some scenarios.
The results of the frequency estimation study were used to improve the power consumption of
the frequency estimator block in the PNII system by 84% while simultaneously improving the
convergence time of this block by 50%. Further exploration of both the phase and frequency
estimator blocks in this system resulted in total convergence time reduction of 75%,
109
synchronization system energy reduction by 66%, and system energy reduction by 7%. In
addition, differential versus coherent modulation schemes were compared in a system power
consumption framework. While differential modulation schemes require more transmit power to
achieve the same BER, they can relax the constraints on the synchronization system. It was
determined at what packet length it makes sense to move to differential modulation schemes.
Following the framework developed here, a synchronization system was developed for a
wireless sensor network device that consumes 300uW (including ADC power). This is low
enough so that further reduction has very little impact on system energy consumption.
Because synchronization requirements are heavily dependent on system parameters like
modulation type, data rate, and analog front-end performance, no two synchronization systems
are the same (even within the same wireless communication standard). Therefore,
synchronization design has long been the domain of experts who use their experience to guide
their selection of the appropriate algorithms. While this method usually produces a system that
meets the performance requirements (variance or SNR margin), it is not necessarily optimal from
a power, area, or convergence time perspective. This work has shown that in some domains (i.e.
frequency estimation), systematic exploration of the space can result in straight forward rules for
achieving the performance (variance) with the best convergence time and power consumption.
For other spaces where straight-forward rules are not available (i.e. interpolators), heuristic
design can result in substantially suboptimal designs. Tools have been developed in this thesis to
rapidly characterize the large design space and allow the correct algorithm parameters to be
selected.
There are many fundamental benefits to completing the exploration of the synchronization
space. It will highlight areas for future algorithm development where existing algorithms are
inefficient. It will assist in producing the most efficient implementation for existing wireless
communication standards by determining which algorithms are most efficient for a given
110
specification. Finally, it can assist in the creation of new wireless communication standards to
meet the quality of service goals with the lowest power or lowest synchronization overhead.
The importance of synchronization within the wireless system is becoming more critical as
transmit distances are decreased, higher transmission speeds and higher order modulation
schemes are used, and integration concerns move the RF circuitry into digital CMOS processes.
Therefore, the results of this work will have more impact as time goes on.
111
Power Estimation Scripts
A
A.1 Makefile
#######################################################################
PROJ_NAME = FREQ_EST_7 #directory name consistent in all trees
SRC_ROOT = /tools/designs/tcir/synch/sim
MC_ROOT = /tools/designs/tcir/synch/mc/${PROJ_NAME}
NETLIST_SCRIPT = scripts/create_netlist_vlg.scr
REPORTING_SCRIPT = scripts/report_power.scr
TESTBENCH_SCRIPT = ${PROJ_NAME}/TB.vhd
SIMULATE_SCRIPT = scripts/simulate.do
SYNTHESIS_SCRIPT = scripts/syn_script.dc
MTI_ANALYZER = vcom
MTI_OPTS = -93 -source
MTI_WORK = -work work
VSIM_OPTS = -c -do
DC_SHELL = dc_shell
112
#------------------------------------------------
# report the power consumption
%.base_pwr %.node_pwr: %.pwr_scr %.mapd_vhd %.saif
dc_shell -f $(*).pwr_scr
# simulate to get the switching activity file
# depends on: .mapd_vhd, .fwd_saif .tb_vhd .do
%.saif: %.mapd_vhd %.fwd_saif %.tb_vhd %.do
-rm -r work
-vlib work
vmap work work
vmap CORE9GPLL /tools/picoradio/PN3/hw/lib/CORE9GPLL_VHDL_VITAL
vmap CORX9GPLL /tools/picoradio/PN3/hw/lib/CORX9GPLL_VHDL_VITAL
(${MTI_ANALYZER} ${MTI_OPTS} ${MTI_WORK}
/tools/picoradio/PN3/hw/lib/pulls.vhd)
${SRC_ROOT}/$(*).mapd_vhd)
${SRC_ROOT}/$(*).tb_vhd)
vsim ${VSIM_OPTS} $(*).do
-rm $(*).old_saif
mv $@ $(*).old_saif
sed -e 's/\\\[[0-9]*\\\]/~&~/g;s/~\\\[/\\$/g;s/\\\]~/\\$/g'
$(*).old_saif > $@
# analyze the vhdl file to create the database, mapped vhdl file, and
the forward saif file
%.mapd_vhd %.fwd_saif: %.phys_v %.net_scr
dc_shell -f $(*).net_scr
113
# make the netlisting script
%.net_scr: $(NETLIST_SCRIPT)
sed -e 's/BASENAME/$(*)/g' $(NETLIST_SCRIPT) > $@
# make the reporting script
%.pwr_scr: $(REPORTING_SCRIPT)
sed -e 's/BASENAME/$*/g' $(REPORTING_SCRIPT) > $@
# make the simulation do file
%.do: $(SIMULATE_SCRIPT)
sed -e 's/BASENAME/$*/g' $(SIMULATE_SCRIPT) > $@
# compile vhd source with constraints to create final physical verilog
%.phys_v: %.syn_scr %.fixed_vhd
-rm -r work/*
-vlib work
dc_shell -f $(*).syn_scr | tee $(*).compile_log
# fix the vhd from module compiler
%.fixed_vhd:
sed -e "s/INVERTER/INV/" ${MC_ROOT}/$(*).vhd > ./$(*).fixed_vhd
%.syn_scr: $(SYNTHESIS_SCRIPT)
sed -e 's/BASENAME/$*/g;' $(SYNTHESIS_SCRIPT) > $@
# can't make the testbench sufficiently parameterizable with this
version of Make
114
# need to set shell variables TEMPi and TEMPo to be the inwidth and
outwidth respectively
# before making the testbench
%.tb_vhd: $(TESTBENCH_SCRIPT)
# TEMPi := $(shell echo $* | awk '{ FS = "_" } ; {print $3}')
# TEMPo := $(shell echo $* | awk '{ FS = "_" } ; {print $4}')
sed -e
's/BASENAME/$*/g;s/INWIDTH/$(TEMPi)/g;s/OUTWIDTH/$(TEMPo)/g'
$(TESTBENCH_SCRIPT) > $@
# type: make foo.clean to clean all associated files
%.clean: FORCE
- rm $(*).tb_vhd $(*).do $(*).pwr_scr $(*).net_scr $(*).db
$(*).mapd_vhd $(*).mapd_vlg $(*).fwd_saif $(*).saif $(*).old_saif
$(*).syn_scr $(*).fixed_vhd $(*).compile_log $(*).mr $(*).st $(*).syn
$(*)__* $(*)*.pvl temp.v post*.db $(*).dc $(*).sdc
FORCE:
A.2 Netlist Script
/* dc_shell Command Log */
/* sets useful naming rules mostly for the backend */
/* unicad_setup_file =
"/tools/unicad2.4/HandOffKit_1.8.1.1/products/ptKit/etc/.synopsys_unica
d_dc.setup" */
vhdlout_write_components = true;
vhdlout_use_packages = {"IEEE.std_logic_1164", "CORE9GPLL.all" };
power_preserve_rtl_hier_names = true;
115
bus_naming_style = "%s[%d]" ;
bus_dimension_separator_style = "][";
bus_extraction_style = "%s[%d:%d]";
analyze -format vhdl BASENAME.vhd
elaborate BASENAME
set_fix_multiple_port_nets -all
/* 500 KHz clock -- denomiated in ps */
create_clock CLK -name clk -period 2000000
link
set_max_area 0
/* write the results */
define_name_rules vhdl -type port -allowed "A-Z a-z _ 0-9 () []" \
-first_restricted "0-9 _ ()
[]"
define_name_rules vhdl -type cell -allowed "A-Z a-z _ 0-9 ()" \
-first_restricted "0-9 _ ()"
define_name_rules vhdl -type net -allowed "A-Z a-z _ 0-9 () []" \
-first_restricted "0-9 _ ()
[]"
define_name_rules vhdl -special vhdl93
change_names -hier -rules vhdl
rtl2saif -output BASENAME.fwd_saif
116
write -hierarchy -format db -output BASENAME.db BASENAME
write -hierarchy -format vhdl -output BASENAME.mapd_vhd BASENAME
change_names -hier -rules verilog
write -hierarchy -format verilog -output BASENAME.mapd_vlg BASENAME
exit
A.3 Reporting Script
/* dc_shell Command Log */
bus_naming_style = "%s[%d]" ;
bus_dimension_separator_style = "][";
bus_extraction_style = "%s[%d:%d]";
vhdlout_use_packages = {"IEEE.std_logic_1164", "CORE9GPLL.all" }
power_preserve_rtl_hier_names = TRUE
read BASENAME.db
find_ignore_case = true
/* instance name must be lower case below */
read_saif -input BASENAME.saif -instance tb/dut -unit ns -scale 1 -
verbose
find_ignore_case = false
/* this one to report overall power */
report_power > BASENAME.base_pwr
/* this one to report power and switching activity per net */
report_power -flat -net -nosplit > BASENAME.node_pwr
exit
117
A.4 Testbench
-- Test Bench for BASENAME.vhd
use STD.textio.all;
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_textio.all;
use IEEE.std_logic_unsigned.all;
entity TB is
generic( ow : integer := OUTWIDTH;
iw : integer := INWIDTH);
end TB;
architecture stimulus of TB is
component BASENAME is
port( out_a : out std_logic_vector( ow-1 downto 0 );
in_i1 : in std_logic_vector( iw-1 downto 0 );
in_q1 : in std_logic_vector( iw-1 downto 0 );
load : in std_logic;
calc : in std_logic;
clear : in std_logic;
CLK : in std_logic );
end component;
signal oa : std_logic_vector(ow-1 downto 0);
signal oa_dum: std_logic_vector(ow-1 downto 0);
signal ii1: std_logic_vector(iw-1 downto 0);
signal iq1: std_logic_vector(iw-1 downto 0);
118
signal ld: std_logic;
signal ca: std_logic;
signal cl: std_logic;
signal clk: std_ulogic := '0';
file in_vec: text open read_mode is "./FREQ_EST_7/BASENAME.vec";
signal start : boolean := false;
begin
-- Clock process
clock: process
begin
-- define 1 MHz clock
wait for 500 ns;
clk <= not clk;
end process;
-- Device Under Test
dut : BASENAME
port map (oa, ii1, iq1, ld, ca, cl, clk);
-- Handle vector files
read_in : process (clk)
variable buf : line;
variable int : integer;
begin
if (clk' event and clk = '0') then
-- Read input vectors
readline(in_vec, buf);
read(buf, int);
ii1 <= conv_std_logic_vector(int, iw);
read(buf, int);
iq1 <= conv_std_logic_vector(int, iw);
119
read(buf, int);
ld <= conv_std_logic_vector(int, 1)(0);
read(buf, int);
ca <= conv_std_logic_vector(int, 1)(0);
read(buf, int);
cl <= conv_std_logic_vector(int, 1)(0);
assert not endfile(in_vec)
report "simulation done!"
severity FAILURE;
end if;
end process read_in;
end stimulus;
configuration cfg of TB is
for stimulus
end for;
end cfg;
A.5 Simulate Script
# execute with vsim -c -do simulate.do
vlib work
vsim +notimingchecks -t ps -c -foreign "dpfli_init
/tools/synopsys/2000.11/auxx/syn/power/dpfli/lib-sparcOS5/dpfli.so" TB
read_rtl_saif BASENAME.fwd_saif TB/DUT
set_net_monitoring_policy on TB/DUT
set_toggle_region TB/DUT
# times in ps
run 70000000
120
toggle_start
run 130000000
toggle_stop
toggle_report BASENAME.saif 1e-9 TB/DUT
exit
A.6 Synthesis Script
/****************************************/
/* Run all the synthesisys steps */
/****************************************/
/* a combination of all the scripts steps used as of 12/5/03 */
/********************/
/* setup.dc */
/********************/
sh date
/* Turn off the following error messages
(EQN-10) Warning: Defining new variable
(LINT-30) Warning: ##stuff that's fixed by
set_fix_multiple_net_ports###
(OPT-170) Information: Changed wire load model...
(OPT-171) Information: Changed minimum wire load model...
(OPT-200) Resolving this conflict by ignoring the
user_function_class attribute for this library cell.
*/
suppress_errors = suppress_errors + { EQN-10 LINT-30 OPT-170 OPT-171
OPT-200}
designer = "Josie Ammer"
company = "Berkeley Wireless Research Center"
view_background = "black";
121
common_lib_dir = "/tools/picoradio/PN3/hw/v1/syn/lib/"
common_script_dir = "/tools/picoradio/PN3/hw/v1/syn/lib/SCRIPTS/"
search_path = ". src SCRIPTS " + common_script_dir;
include common_script_dir + "synopsys_unicad_dc.setup"
include common_script_dir + "lib_load.dc"
define_design_lib work -path work
mydesign = BASENAME;
mc_designs = { "BASENAME" };
verilog_dest = mydesign + ".phys_v";
db_dest = mydesign + ".db";
flatten_design = 0;
opt_map_effort = "medium";
opt_area_effort = "medium";
opt_verify_effort = "none"
hdlin_enable_rtldrc_info = true;
hdlin_auto_save_templates = true;
timing_self_loops_no_skew = true;
vhdlout_use_packages = {"IEEE.std_logic_1164",
"CORE9GPLL.CORE9GPLL_COMPONENTS", "CORX9GPLL.CORX9GPLL_COMPONENTS"};
/************************/
/* readfiles.dc */
/************************/
analyze -format vhdl common_lib_dir + "pulls.vhd"
foreach ( mc_file, mc_designs ) {
read -format vhdl -library work mc_file + ".rst_vhd"
uniquify
elaborate -library work mydesign
current_design mydesign
122
link
write -hierarchy -output "postread.db"
/************************/
/* constrain.dc */
/************************/
include common_script_dir + "lib_minmax.dc"
auto_wire_load_selection = true
high_fanout_net_threshold = 0
compile_auto_ungroup_hierarchy = 1;
set_fix_multiple_port_nets -all -buffer_constants
set_max_fanout 20 all_inputs() - test_se
set_max_area 0.0
clk_name = {CLK}
create_clock -name clk_name -period 2000 -waveform {0 1000} {clk_name}
set_clock_skew -uncertainty 0.1 all_clocks()
set_drive 0 clk_name
set_auto_disable_drc_nets -none
/********************************/
/* boundary_initial.con */
/********************************/
max_transition_time_io = 0.1;
max_transition_time_internal = 0.2;
input_drive_cell = wc_lib_path + ":" + lib_name + "/IVLL/Z";
output_load_cell = wc_lib_path + ":" + lib_name + "/IVLL/A";
output_load_fanout = 4;
set_driving_cell -lib_cell IVLL -library wc_lib_path + ":" + lib_name -
pin Z all_inputs() - CLK - test_se
set_load {load_of(output_load_cell) * output_load_fanout} all_outputs()
123
set_max_transition max_transition_time_internal mydesign
set_max_transition max_transition_time_io all_inputs()
set_max_transition max_transition_time_io all_outputs()
/***********************/
/* optimize.dc */
/***********************/
uniquify
if( flatten_design ) {
ungroup -all -flatten
if( opt_verify_effort != "none" ) {
compile -map_effort opt_map_effort -area_effort opt_area_effort -
verify -verify_effort opt_verify_effort -auto_ungroup area
} else {
compile -map_effort opt_map_effort -area_effort opt_area_effort -
auto_ungroup area
write -hierarchy -output "postcompile1.db"
/********************************************/
/* Boundary optimization */
/********************************************/
simplify_constants -boundary
compile_delete_unloaded_sequential_cells = "true"
compile -inc -boundary
remove_unconnected_ports -blast_bus find(cell, "*", -hier)
write -hierarchy -output "postcompile2.db"
/********************************************/
/* Fix contamination delay (min path) */
124
/********************************************/
remove_attribute find( lib_cell, {wc_libx_path + ":" + libx_name +
"/DLY*"}) dont_use
set_dont_use find( lib_cell, {wc_libx_path + ":" + libx_name +
"/*X05"})
set_fix_hold all_clocks()
compile -incremental
write -hierarchy -output "postfixhold.db"
/********************************************/
/* Check the design */
/********************************************/
check_design
check_timing
/*************************/
/* writefiles.dc */
/*************************/
include common_script_dir + "change_names.dc"
write current_design -hier -format db -output postchangename.db
write current_design -hier -format verilog -output "temp.v"
write_sdc mydesign + ".sdc"
write_script -hier -format dcsh -output mydesign + ".dc"
/* Fixes extraneous "assign" statements and also renames records */
remove_design -design
read -format verilog temp.v
/* DON'T DO A COMPILE HERE WITHOUT PROPER CONSTRAINTS!! */
write current_design -hier -format db -output db_dest
write -format verilog -hier -o verilog_dest
/********************************************/
125
/* Write reports */
/********************************************/
report_area >report.final.area;
report_timing -delay max >report.final.timing;
report_timing -delay min >>report.final.timing;
report_test -configuration >report.final.test_cfg
check_test > report.final.test
report_test -scan_path -register > report.final.scanchains
report_constraint -all_violators >report.final.violations
report_design >report.final.design
exit;
126
References
[AMM] M. Josie Ammer, Michael Sheets, Tufan C. Karalar, Mika Kuulusa, Jan Rabaey, "A
Low-Energy Chip-Set for Wireless Intercom," Proceedings of the Design Automation
Conference (DAC), Anaheim, CA, June 2-6, 2003.
[CAM] Kevin Camera, "SF2VHD: A Stateflow to VHDL Translator," Masters Thesis,
Department of Electrical Engineering, University of California, Berkeley. May 2001
[CHA] Glenn Chang, et. al., “A Direct-Conversion Single-Chip Radio-Modem for Bluetooth”,
Proceedings of the International Solid-State Circuits Conference, San Francisco, CA,
USA. February, 2002.
[DEN] P. Dent, G. E. Bottomley and T. Croft, “Jakes Fading Model Revisited”, IEEE
Electronics Letters, 24th June 1993 Vol. 29 No. 13.
[HAI] Saleem Haider, “Datapath Synthesis and Optimization for High-Performance ASICs”
http://www.synopsys.com/products/datapath/datapath_bgr.html. October, 1997.
Referenced 10/4/04.
[HUS] Paul James Husted, "Design and Implementation of Digital Timing Recovery and
Carrier Synchronization for High Speed Wireless Communication," Masters Thesis,
Department of Electrical Engineering, University of California, Berkeley. May 2000.
[ISO] “Information Technology – Open Systems Interconnection – Basic Reference Model,”
International Organization for Standardization (ISO). Standard number ISO/IEC 7498-
1:1994, 1994.
127
[ITU] International Telecommunication Union. Recommendation ITU-R P.1238-2.
“Propagation data and prediction methods for the planning of indoor radio
communication systems and radio local area networks in the frequency range 900 MHz
to 100 GHz.”
[JTC] Joint Technical Committee of Committee T1 R1P14 and TIA TR46.3.3 / TR45.4.4 on
Wireless Access, Final Report on RF Channel Characterization, Paper No.
JTC(AIR)/94.0.1.17-238R4, Jan. 17, 1994.
[KOK] Masaru Kokubo, et. al., “A 2.4GHz RF Transceiver with Digital Channel Selection
Filter for Bluetooth,” Proceedings of the International Solid-State Circuits Conference,
San Francisco, CA, USA. February, 2002.
[LIN] En-Yi A. Lin, Jan M. Rabaey, and Adam Wolisz, ‘Power-Efficient Rendez-vous
Schemes for Dense Wireless Sensor Networks’, Proceedings of the International
Conference on Communications, Paris, France, 2004.
[MAT] Simulink and Stateflow, from the MathWorks, Inc., see http://www.mathworks.com
[MEY] H. Meyr, M. Moeneclaey, and S. A. Fechtel, Digital Communication Receivers:
Synchronization, Channel Estimation and Signal Processing, Wiley Press, 1998.
[OTI] B.P. Otis, Y.H. Chee, R. Lu, N.M. Pletcher, and J.M. Rabaey, ‘An Ultra-Low Power
MEMS-Based Two-Channel Transceiver for Wireless Sensor Networks’, Proceedings
of the Symposium on VLSI Circuits, Honolulu, Hawaii, 2004.
[PRO] John G. Proakis, Digital Communications, McGraw Hill Press, 1995.
[RAB] Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic, Digital Integrated Circuits, a
Design Perspective, Second Edition. Pearson Education, Inc. 2003.
[RAB2] J. Rabaey, J. Ammer, T. Karalar, S. Li, B. Otis, M. Sheets, T. Tuan, "PicoRadios for
Wireless Sensor Networks: The Next Challenge in Ultra-Low-Power Design"
February 3-7, 2002.
128
[SCO] Michael D. Scott, Bernhard E. Boser, and Kristofer S. J. Pister, “An Ultra-Low Power
ADC for Distributed Sensor Networks,” Proceedings of the European Solid-State
Circuits Conference. Florence, Italy. September 2002.
[SHE] M. Sheets, B. Otis, F. Burghardt, J. Ammer, T. Karalar, P. Monat, and J. Rabaey, “A
5.8x3.3 cm^2 Self-contained Energy-scavenging Wireless Sensor Network Node,” The
Proceedings of the Wireless Personal Multimedia Communications Conference, Abano
Terme, Italy. Sept. 12-15, 2004.
[SHI] Changchun Shi, "Floating-point to Fixed-point Conversion," Ph.D. dissertation, UC
Berkeley, Department of EECS, Berkeley, CA 2004.
[STE] Stephens, Phase-Locked Loops for Wireless Communications, Second Edition.
[SYN] Module Compiler, from Synopsys, Inc., see http://www.synopsys.com
[TAV] G. Tavares, L. Tavares, and M. Piedade, “Improved Cramer-Rao Lower Bounds for
Phase and Frequency Estimation With M-PSK Signals”, IEEE Transactions on
Communications, Vol. 49, No. 12, December 2001.
[THO] John Thomson, et. al., “An Integrated 802.11a Baseband and MAC Processor,”
USA. February, 2002.
[TUR] K. Turkowski, “Fixed-Point Trigonometry with CORDIC Iterations”. Apple Computer
White Paper, January 17, 1990.
[VO] Nguyen Doan Vo and Tho Le-Ngoc, “Optimal Interpolators for Flexible Digital
Receivers”, Proceedings of the IEEE Vehicular Technology Conference, Orlando, FL,
USA, October 2003.
[VO2] Nguyen Doan Vo and Tho Le-Ngoc, “Low-Complexity Optimal Symmetric
Interpolation Filters for SDR Receivers”, Proceedings of the IEEE Canadian
Conference on Electrical and Computer Engineering. Montreal, Quebec, Canada, May
2003.
129
[WAL] Robert H. Walden, “Analog to Digital Converter Survey and Analysis”, IEEE Journal
on Selected Areas in Communications, Vol. 17, No. 4, April 1999.
[YEE] D.G.-W. Yee, “A design methodology for highly-integrated low-power receivers for
wireless communications,” Ph.D. dissertation, UC Berkeley, Department of EECS,
Berkeley, CA 2001.
130

Re SH Mi

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Re SH Mi

Uploaded by

Copyright:

Available Formats

Low Power Synchronization for Wireless Communication

Marcy Josephine Ammer

B.S. (Massachusetts Institute of Technology) 1997

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Engineering – Electrical Engineering and Computer Sciences

UNIVERSITY OF CALIFORNIA, BERKELEY

Professor Jan Rabaey, Chair

University of California, Berkeley

Marcy Josephine Ammer

Low Power Synchronization for Wireless Communication

Marcy Josephine Ammer

Doctor of Philosophy in Engineering – Electrical Engineering and Computer Sciences

University of California, Berkeley

Professor Jan Rabaey, Chair

Synchronization is increasingly important in wireless communication devices.

Synchronization performance is critical to system performance and, it is where a large amount of

design time and receiver area and power is spent.

require increased synchronization functionality.

3. Trends toward higher bandwidth efficiency moves modulation to higher order

constellations where synchronization specifications are tighter.

savings can be realized through systematic exploration of synchronization power consumption.

system examples to show the impact of this type of exploration.

that frequency estimation power reductions of up to 4x are possible while simultaneously

selection can result in a 10x reduction in power consumption.

for the constant encouragement to do good work.

Thanks to Paul Wright for adding a different perspective with flair.

for his constant wisdom, both technical and non-technical.

will sorely miss the safety net your knowledge provides.

August 28, 2004

Table of Contents ...................................................................................................................... iii

List of Figures .......................................................................................................................... vii

2.1 Introduction ...................................................................................................................... 7

2.3 Metrics for Comparing Algorithms................................................................................ 13

2.4 Wireless Channel Models............................................................................................... 16

3 Evaluation and Exploration Environment ............................................................................. 22

3.1 Introduction .................................................................................................................... 22

3.2 Simulation and HDL Description of Algorithms ........................................................... 24

3.3 Gate-level Power Estimation.......................................................................................... 26

3.4 Analog to Digital Converter Power Estimation ............................................................. 32

3.5 System Power Estimation Tool ...................................................................................... 33

4 PNII System .......................................................................................................................... 35

4.1 Introduction .................................................................................................................... 35

4.3 Synchronization System................................................................................................. 37

4.3.1 Timing Recovery..................................................................................................... 38

4.3.2 Course Timing Estimation ...................................................................................... 39

4.3.3 Fine Timing and Frequency Estimation .................................................................. 40

4.3.4 Frequency Correction and Timing Tracking ........................................................... 41

4.3.5 Phase Estimation and Correction ............................................................................ 42

4.3.6 Synchronization System Performance..................................................................... 43

4.4 Results and Conclusion .................................................................................................. 44

5.1 Introduction .................................................................................................................... 46

5.2 Frequency Estimation Algorithms ................................................................................. 47

5.3 Power Estimation Methodology..................................................................................... 52

5.4 Algorithm Comparison and Results ............................................................................... 53

5.6 Postscript: Application to DSSS Systems ...................................................................... 57

6 PNII System Refinement....................................................................................................... 60

6.1 Introduction .................................................................................................................... 60

6.2 Frequency Estimation Refinement ................................................................................. 61

6.3 Frequency and Phase Estimation Redesign.................................................................... 63

6.3.1 Differential Modulation Penalty.............................................................................. 64

6.3.2 Phase Error vs. SNR Degradation ........................................................................... 65