Abstract: Design techniques for a complete 60 Gb/s receiver frontend with equalization, output slicing/demultiplexing, and clocking capabilities are described. Current integration combined with a cascode gate-voltage bias gain-control technique enables energy-efficient implementation of CTLE, FFE, and DFE circuits while operating near the speed limits of the technology. Although the error samplers follow the DFE, which has in principle already sliced the data, adaptive error sampling requires high gain to resolve small residual error signals; this challenge is addressed by the addition of interleaved, offset-canceled deserializing samplers. Clock generation and distribution circuits are implemented to complete the receiver frontend. The proposed 65 nm CMOS receiver operates at 60 Gb/s, consuming 173 mW from 1.2 V and 1.0 V supplies.

Index Terms: Chip-to-chip communication, current integration, decision feedback equalizer (DFE), feedforward equalizer (FFE), high-speed links.

Manuscript received September 10, 2015; revised January 10, 2016; accepted January 05, 2016. This paper was approved by Guest Editor Andreia Cathelin.
J. Han, N. Sutardja, K. Jung, and E. Alon are with the Berkeley Wireless Research Center, University of California, Berkeley, CA 94704 USA (e-mail: jdhan@eecs.berkeley.edu).
Y. Lu is with Qualcomm Atheros Inc., San Jose, CA 95110 USA (e-mail: yue.lu.phd@ieee.org).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JSSC.2016.2519389
0018-9200 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

The continuing expansion of Internet connectivity has substantially raised data traffic, increasing demands on high-bandwidth wireline communication systems. Extrapolation from recent technical trends [1] indicates the need for 60+ Gb/s chip-to-chip transceiver systems in the near future. In contrast to this rapid bandwidth increase, the allowable power consumption of high-speed transceivers remains relatively constant. Specifically, assuming the same total power budget as current designs, receivers operating in the range of 50–60 Gb/s must achieve 2–4 pJ/bit efficiency to remain within the current total power window.

Several 56–80 Gb/s receiver circuits have been reported in recent years [2]–[4] in response to the aforementioned trends. These designs primarily focused on implementing high-speed continuous-time linear equalizer (CTLE) and decision-feedback equalizer (DFE) circuits that demonstrate the ability to cancel intersymbol interference (ISI), typically while operating close to the intrinsic speed limits of the underlying technology. Although these designs established the feasibility of 60+ Gb/s equalization, further power reduction is critical to enable integration of complete receiver functionality while meeting the severe power constraint. Furthermore, in order to cancel precursor as well as postcursor ISI, and hence to cover a broader range of channel characteristics, a receive feed-forward equalizer (FFE) is a desirable addition to the CTLE and DFE. Along with the improvements to the equalizers themselves, a complete frontend must include additional circuitry to support equalizer adaptation and clock and data recovery (CDR).

To demonstrate that all of these requirements can be fulfilled while meeting the overall power-budget target, this work presents design techniques used to realize a 65 nm 60 Gb/s receiver frontend (Fig. 1) including CTLE, FFE, DFE, output slicers, and clock generation and distribution circuits [5]. Despite a substantial expansion in equalization capabilities (CTLE + FFE), the power consumption of the complete equalizer path remains essentially identical to that of our earlier 46 mW DFE-only design [2]. This paper therefore begins by introducing the two key techniques that enabled this advancement: current integration combined with the dynamic-latch-based DFE architecture from [2], and cascode gate-voltage bias as an efficient means of analog gain control for the FFE. Section III then describes the components necessary to complete the receiver frontend design: interleaved and offset-canceled deserializing slicers to enable adaptive error sampling, as well as LC-based clock generation/buffering. As shown by the measurement results in Section IV, these techniques combined result in a receiver frontend supporting 60 Gb/s while consuming 173 mW from 1.2 V and 1.0 V supplies.

II. 60 Gb/s EQUALIZER DESIGN

As mentioned in Section I, multiple equalizers are necessary to cancel the various types of ISI present in band-limited channel pulse responses. While the most significant postcursor ISI taps are typically handled by a receive DFE [2]–[9], a CTLE is commonly placed in front of the DFE to cancel long-tail postcursor ISI [3], [7]–[9], which can span 10–30 taps and is thus too costly to remove with the DFE (due to the large number of delay latches required to store past decisions). However, precursor ISI taps cannot be canceled by DFEs (since they rely on a causal feedback loop), and typical CTLE transfer functions do not directly correct for precursor ISI either.

A feed-forward equalizer (FFE) [10]–[13] is therefore often used to handle these precursor ISI taps. FFEs are often realized on the transmitter side due to their simplicity of implementation [10], [11]. However, on the receive side, the underlying circuits are not required to drive a 50 Ω termination, opening
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
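The division of labor described above (FFE for precursor ISI, DFE for the dominant postcursors) can be made concrete with a toy peak-distortion calculation. This is a minimal sketch, assuming a hypothetical sampled pulse response (values invented for illustration), a two-tap receive FFE with a single precursor tap, and an ideal DFE that removes the first n postcursors using correct past decisions. The long tail that the CTLE would address is not modeled, and the function names and tap values are hypothetical, not the paper's settings.

```python
from typing import Dict

# Sampled pulse response: cursor index -> amplitude (k = 0 is the main cursor,
# k < 0 are precursors, k > 0 are postcursors). Values used below are hypothetical.
PulseResponse = Dict[int, float]


def apply_precursor_ffe(h: PulseResponse, c_pre: float) -> PulseResponse:
    """Two-tap receive FFE, y[n] = x[n] + c_pre * x[n + 1].

    Using x[n + 1] (one UI ahead) is realized in hardware by delaying the main
    path; for a pulse response it gives g[k] = h[k] + c_pre * h[k + 1].
    """
    indices = set(h) | {k - 1 for k in h}
    return {k: h.get(k, 0.0) + c_pre * h.get(k + 1, 0.0) for k in sorted(indices)}


def worst_case_eye(h: PulseResponse, n_dfe_taps: int = 0) -> float:
    """Peak-distortion eye: main cursor minus all ISI an ideal DFE cannot remove.

    An ideal DFE with correct past decisions cancels postcursors 1..n_dfe_taps.
    """
    residual = sum(
        abs(v) for k, v in h.items() if k != 0 and not (1 <= k <= n_dfe_taps)
    )
    return h[0] - residual


h = {-1: 0.2, 0: 1.0, 1: 0.5, 2: 0.2}            # hypothetical channel
print(f"{worst_case_eye(h):.2f}")                 # 0.10 (no equalization)
print(f"{worst_case_eye(h, n_dfe_taps=2):.2f}")   # 0.80 (2-tap DFE, precursor remains)
g = apply_precursor_ffe(h, c_pre=-h[-1] / h[0])   # zero-force the first precursor
print(f"{worst_case_eye(g, n_dfe_taps=2):.2f}")   # 0.86 (FFE + 2-tap DFE)
```

In this toy case the DFE alone cannot touch the 0.2 precursor, while the FFE removes it at the cost of a slightly smaller main cursor (1.0 to 0.9) and a small new second precursor (-0.04), which is still a net gain in eye opening.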
A. Current-Integrating Equalizers

Current-integrating equalizers were originally proposed in [13] for a low-power DFE and have since been applied in a number of designs [13]–[16], [18]. An extensive analysis of the current-integrating DFE/FFE [14] reveals that the current-integration technique typically provides a 3× power saving compared to resistively loaded stages. Hence, we have applied this scheme throughout the entire equalizer chain except for the final 1-tap dynamic-latch-based DFE, which generates the NRZ outputs needed for DFE feedback.

The receiver-equalizer architecture is shown in Fig. 2. The transfer function of the source-degenerated current-integrating CTLE is tailored to mitigate long-tail ISI, and the circuit also performs demultiplexing so that the following stages can operate at half rate with wider integration time windows. Because the front-end CTLE integrates over an entire bit period, as described in [15], it can introduce 3.9 dB of pattern-dependent high-frequency loss (typically appearing as one pre- and/or one postcursor tap), but these taps are readily handled by the power-efficient current-integrating FFE and DFE stages.

Fig. 3 shows the CTLE integrator structure. The input differential pair (M0, M1) is capacitively degenerated for linear equalization, while switches (M2–M5) are inserted between the input pair and the output nodes to support the demultiplexing operation. Inspired by the cascoded sampling technique proposed in [17], the M2–M5 switches are alternately turned on (by VCKC and /VCKC) to steer the differential input. For each subbranch, after the integration phase, PMOS precharge devices reset the output nodes to VDD. Within the overall multistage equalizer, which contains multiple such current-integration stages, this resetting (return-to-zero, or RZ) behavior must be carefully managed, since the output signal of one stage would not naturally be available to the next stage during its integration time. Dynamic latches (L0–L3) are therefore inserted after the CTLE to convert the RZ waveforms

Fig. 3. Schematics of CTLE + DMUX and following FFE latches. VCKC, VCKP, and VCKN are in the same phase.

Fig. 4. FFE + DFE integrator schematics.
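The 3.9 dB loss quoted for full-bit integration is consistent with treating the integrator as an ideal boxcar (moving-average) filter, whose magnitude response is a sinc that falls to 2/π at the Nyquist frequency (30 GHz for a 60 Gb/s signal). This is a minimal sketch of that arithmetic only, assuming an ideal rectangular integration window and ignoring reset, output-pole, and switch effects; it is not a model of the actual CTLE, and the function name is hypothetical.

```python
import math


def integrator_gain_db(f_hz: float, t_int_s: float) -> float:
    """Magnitude of an ideal boxcar integrator with window t_int_s, in dB relative to DC.

    |H(f)| = |sin(pi f T) / (pi f T)|; reset and finite output impedance are ignored.
    """
    x = math.pi * f_hz * t_int_s
    mag = 1.0 if x == 0 else abs(math.sin(x) / x)
    return 20 * math.log10(mag)


UI = 1 / 60e9          # unit interval at 60 Gb/s (about 16.7 ps)
F_NYQ = 1 / (2 * UI)   # Nyquist frequency, 30 GHz

print(f"{integrator_gain_db(F_NYQ, UI):.2f} dB")      # -3.92 dB: full-UI window
print(f"{integrator_gain_db(F_NYQ, UI / 2):.2f} dB")  # -0.91 dB: half-UI window, for comparison
```

The comparison line shows why the loss is tied to the window length: halving the window cuts the Nyquist droop to under 1 dB, but also halves the integration time available to build up signal charge.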
Fig. 6. Variable cascode gate-voltage bias: (a) schematics and (b) linearity comparison with a variable tail-current control scheme. In (b), Vin1dB is the input voltage at which the gain drops by 1 dB from its target value, while Vinmax is the maximum input swing expected to be fed into the FFE stage.
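The Vin1dB metric defined in the caption can be extracted from any gain-versus-amplitude curve. This is a minimal sketch, assuming a hypothetical tanh-shaped stage, v_out = A0·V_L·tanh(v_in/V_L) (a common first-order model of a differential pair, not the FFE's actual characteristic), and taking "gain" to mean the large-signal ratio v_out/v_in. Bisection finds the input amplitude at which that ratio has dropped 1 dB below A0. The parameter V_L and both function names are hypothetical.

```python
import math


def large_signal_gain(v_in: float, a0: float, v_l: float) -> float:
    """Hypothetical tanh stage: v_out = a0 * v_l * tanh(v_in / v_l); returns v_out / v_in."""
    return a0 * v_l * math.tanh(v_in / v_l) / v_in


def vin_1db(a0: float, v_l: float, tol: float = 1e-12) -> float:
    """Bisect for the input amplitude where the gain is 1 dB below its small-signal value a0."""
    target = a0 * 10 ** (-1 / 20)
    lo, hi = 1e-9 * v_l, 10 * v_l  # tanh(x)/x decreases monotonically on this interval
    while hi - lo > tol * v_l:
        mid = 0.5 * (lo + hi)
        if large_signal_gain(mid, a0, v_l) > target:
            lo = mid  # still less than 1 dB of compression
        else:
            hi = mid
    return 0.5 * (lo + hi)


for v_l in (0.1, 0.2):
    v = vin_1db(a0=2.0, v_l=v_l)
    print(f"V_L = {v_l:.1f} V -> Vin1dB = {v * 1e3:.1f} mV")
```

In this model Vin1dB comes out at roughly 0.61·V_L, independent of A0, so any gain-control knob that shrinks the stage's linear range also lowers Vin1dB relative to a fixed Vinmax.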
It is also important to note that the tap strength has to be increased by $e_{N,\mathrm{reset}}$ to cancel the translated ISI from finite

¹The finite resetting also generates ISI in later postcursor taps, but these are assumed to be small due to the accumulated suppression from multiple resets.
Fig. 14. (a) Resonant clock buffer with clock distribution network. (b) 30 GHz clock divider.
Fig. 16. Measurement setup and measured waveforms. (All clock sources are synchronized).
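The headline measurements reported for this receiver (173 mW total at 60 Gb/s, a 48 mW equalizer core, more than 0.2 UI of timing margin, and error-free operation over 10¹² bits) can be restated in per-bit and absolute-time terms with simple arithmetic. The snippet uses only those reported numbers. The confidence bound is the standard zero-error estimate, BER < -ln(1 - CL)/N (about 3/N at 95% confidence). It is a generic statistical calculation, not a figure reported by the authors, and the function names are hypothetical.

```python
import math

BIT_RATE_BPS = 60e9   # reported data rate
P_RX_W = 173e-3       # reported total receiver-frontend power
P_EQ_W = 48e-3        # reported equalizer-core power


def energy_per_bit_pj(power_w: float, rate_bps: float) -> float:
    """Energy efficiency in pJ/bit = power / bit rate."""
    return power_w / rate_bps * 1e12


def ber_upper_bound(n_bits: float, confidence: float = 0.95) -> float:
    """BER bound after n_bits with zero errors: (1 - BER)^N ~ exp(-N * BER) = 1 - CL."""
    return -math.log(1.0 - confidence) / n_bits


ui_ps = 1e12 / BIT_RATE_BPS
print(f"{energy_per_bit_pj(P_RX_W, BIT_RATE_BPS):.2f} pJ/bit")  # 2.88 pJ/bit (full frontend)
print(f"{energy_per_bit_pj(P_EQ_W, BIT_RATE_BPS):.2f} pJ/bit")  # 0.80 pJ/bit (equalizer core)
print(f"{0.2 * ui_ps:.2f} ps")                                  # 3.33 ps (0.2 UI margin)
print(f"{ber_upper_bound(1e12):.1e}")                           # 3.0e-12 (95% confidence)
```

The per-bit figure can be compared directly with the pJ/bit efficiency target stated in the Introduction.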
TABLE I
PERFORMANCE SUMMARY

Subrate clocks also have to be generated for the interleaved samplers and for back-end functions such as deserialization, adaptation, and CDR. The same dynamic-latch design used in the DFE feedback paths is reused to implement the 30 GHz clock divider [Fig. 14(b)]. Because of placement considerations, and to better distribute the capacitive loading between the oscillator and the output buffer, the divider's input clock is driven directly by the oscillator rather than by the clock buffer. Following the 30 GHz divider, a phase interpolator is inserted to enable optimal placement of the deserializing clock phases.
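The paper does not detail the phase interpolator's circuit, so the following is only a generic sketch of the interpolation principle. It assumes two ideal sinusoidal clock phases 90° apart and complementary linear weights w_i = 1 - w_q. The weighted sum w_i·cos(ωt) + w_q·sin(ωt) equals A·cos(ωt - φ) with φ = atan2(w_q, w_i), so sweeping the digital code moves the output phase between the two inputs. The function name and the code range are hypothetical.

```python
import math


def pi_output_phase_deg(code: int, n_steps: int) -> float:
    """Output phase of w_i*cos(wt) + w_q*sin(wt), with w_q = code/n_steps and w_i = 1 - w_q.

    sin(wt) is the 90-degree-lagging input phase; the sum equals A*cos(wt - phi)
    with phi = atan2(w_q, w_i).
    """
    if not 0 <= code <= n_steps:
        raise ValueError("code must lie in [0, n_steps]")
    w_q = code / n_steps
    w_i = 1.0 - w_q
    return math.degrees(math.atan2(w_q, w_i))


phases = [pi_output_phase_deg(c, 4) for c in range(5)]
print([f"{p:.1f}" for p in phases])  # ['0.0', '18.4', '45.0', '71.6', '90.0']
```

With linear weights, the steps are non-uniform: 18.4° at the ends versus 26.6° in the middle, against an ideal 22.5°. This is a well-known limitation of simple linear-weight interpolation.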
achieves low power consumption in a mature 65 nm technology. Specifically, despite the addition of the CTLE and FFE, the equalizer core achieves nearly the same power consumption (48 mW) as the previous three-tap DFE design (46 mW) [2]. The receiver frontend further supports 30 GHz clock generation and distribution, as well as the high-speed hardware necessary to support adaptive equalization and a baud-rate CDR. Despite being implemented in a substantially more mature process technology (65 nm vs. 20 nm), it does so while dissipating slightly lower power and operating at a slightly higher data rate than [3].

V. CONCLUSION

This paper reports a 60 Gb/s receiver frontend in 65 nm CMOS technology. The receiver incorporates CTLE, FFE, and DFE equalizers, output slicing, and clocking blocks to demonstrate serial-link receiver operation near the frequency limits of 65 nm CMOS technology. The energy-efficient equalization functionality is achieved by a current-integration technique combined with cascode gate-voltage gain control for an efficient RX FFE, passive clock delay between the current-integrating and dynamic-latch stages, and optimized reset-device settling time/sizing. In addition to implementing the 60 Gb/s equalizer, we developed all high-frequency data and clock paths needed for complete receiver frontend operation, including interleaved deserializing slicers, an oscillator, and clock dividers. The design achieves 60 Gb/s operation with > 0.2 UI timing margin at 10⁻⁹ BER (and error-free operation over 10¹² bits at the center of the eye), while consuming 173 mW from 1.2 V and 1.0 V supplies in 65 nm CMOS.

ACKNOWLEDGMENT

The authors would like to thank Systems on Nanoscale Information fabriCs (SONIC), BWRC, Berkeley Design Automation, Integrand EMX, Lorentz PeakView, the TSMC University Shuttle Program, and B. Casper of Intel, K. Chang of Xilinx, P. Y. Chiang of OSU, C.-K. K. Yang of UCLA, P. K. Hanumolu of UIUC, and V. Stojanovic of UC Berkeley.

REFERENCES

[1] S. Narendra, L. Fujino, and K. Smith, "Through the looking glass - the 2015 edition: Trends in solid-state circuits from ISSCC," IEEE Solid-State Circuits Mag., vol. 7, no. 1, pp. 14–24, Feb. 2015.
[2] Y. Lu and E. Alon, "Design techniques for a 66 Gb/s 46 mW 3-tap decision feedback equalizer in 65 nm CMOS," IEEE J. Solid-State Circuits, vol. 48, no. 12, pp. 3243–3257, Dec. 2013.
[3] T. Shibasaki et al., "A 56-Gb/s receiver front-end with a CTLE and 1-tap DFE in 20-nm CMOS," in Symp. VLSI Circuits Dig., 2014, pp. 1–2.
[4] A. Awny, L. Moeller, J. Junio, J. C. Scheytt, and A. Thiede, "Design and measurement techniques for an 80 Gb/s 1-tap decision feedback equalizer," IEEE J. Solid-State Circuits, vol. 49, no. 2, pp. 452–470, Feb. 2014.
[5] J. Han, Y. Lu, N. Sutardja, K. Jung, and E. Alon, "A 60 Gb/s receiver frontend in 65 nm CMOS technology," in Symp. VLSI Circuits Dig., 2015, pp. 230–231.
[6] T. Toifl et al., "A 2.6 mW/Gbps 12.5 Gbps RX with 8-tap switched capacitor DFE in 32 nm CMOS," IEEE J. Solid-State Circuits, vol. 47, no. 4, pp. 897–910, Apr. 2012.
[7] J. E. Proesel and T. O. Dickson, "A 20-Gb/s, 0.66-pJ/bit serial receiver with 2-stage continuous-time linear equalizer and 1-tap decision feedback equalizer in 45 nm SOI CMOS," in Symp. VLSI Circuits (VLSIC'11) Dig., 2011, pp. 206–207.
[8] P. Upadhyaya et al., "3.3 A 0.5-to-32.75 Gb/s flexible-reach wireline transceiver in 20 nm CMOS," in IEEE Int. Solid-State Circuits Conf. (ISSCC'13) Dig. Tech. Papers, 2013, pp. 42–43.
[9] S. Parikh et al., "A 32 Gb/s wireline receiver with a low-frequency equalizer, CTLE and 2-tap DFE in 28 nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC'13), 2013, pp. 28–29.
[10] A. A. Hafez, M.-S. Chen, and C.-K. K. Yang, "A 32-to-48Gb/s serializing transmitter using multiphase sampling in 65 nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC'13), 2013, pp. 38–39.
[11] M.-S. Chen and C.-K. K. Yang, "A 50–64 Gb/s serializing transmitter with a 4-tap, LC-ladder-filter-based FFE in 65 nm CMOS technology," IEEE J. Solid-State Circuits, vol. 50, no. 8, pp. 1903–1916, Aug. 2015.
[12] J. E. Jaussi et al., "8-Gb/s source-synchronous I/O link with adaptive receiver equalization, offset cancellation, and clock de-skew," IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 80–88, Jan. 2005.
[13] M. Park, J. Bulzacchelli, M. Beakes, and D. Friedman, "A 7 Gb/s 9.3 mW 2-tap current-integrating DFE receiver," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC'07), 2007, pp. 230–599.
[14] C. Thakkar, N. Narevsky, C. D. Hull, and E. Alon, "Design techniques for a mixed-signal I/Q 32-coefficient Rx-feedforward equalizer, 100-coefficient decision feedback equalizer in an 8 Gb/s 60 GHz 65 nm LP CMOS receiver," IEEE J. Solid-State Circuits, vol. 49, no. 11, pp. 2588–2607, Nov. 2014.
[15] T. O. Dickson, J. F. Bulzacchelli, and D. J. Friedman, "A 12-Gb/s 11-mW half-rate sampled 5-tap decision feedback equalizer with current-integrating summers in 45-nm SOI CMOS technology," IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1298–1305, Apr. 2009.
[16] B. Kim, Y. Liu, T. O. Dickson, J. F. Bulzacchelli, and D. J. Friedman, "A 10-Gb/s compact low-power serial I/O with DFE-IIR equalization in 65-nm CMOS," IEEE J. Solid-State Circuits, vol. 44, no. 12, pp. 3526–3538, Dec. 2009.
[17] Y. Duan and E. Alon, "A 12.8 GS/s time-interleaved ADC with 25 GHz effective resolution bandwidth and 4.6 ENOB," IEEE J. Solid-State Circuits, vol. 49, no. 8, pp. 1725–1738, Sep. 2014.
[18] R. Bai, S. Palermo, and P. Y. Chiang, "A 0.25 pJ/b 0.7V 16 Gb/s 3-tap decision-feedback equalizer in 65 nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC'14), 2014, pp. 46–47.
[19] V. Stojanovic et al., "Autonomous dual-mode (PAM2/4) serial link transceiver with adaptive equalization and data recovery," IEEE J. Solid-State Circuits, vol. 40, no. 4, pp. 1012–1026, Apr. 2005.
[20] M.-J. E. Lee, W. J. Dally, and P. Chiang, "Low-power area-efficient high-speed I/O circuit techniques," IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 1591–1599, Nov. 2000.
[21] L. Kong, Y. Lu, and E. Alon, "A multi-GHz area-efficient comparator with dynamic offset cancellation," in Proc. IEEE Custom Integr. Circuits Conf. (CICC'11), 2011, pp. 1–4.
[22] M. Jeeradit et al., "Characterizing sampling aperture of clocked comparators," in IEEE Symp. VLSI Circuits Dig., 2008, pp. 68–69.
[23] J. Kim, B. S. Leibowitz, R. Jihong, and C. J. Madden, "Simulation and analysis of random decision errors in clocked comparators," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 8, pp. 1844–1857, Aug. 2009.
[24] J. Crossley et al., "BAG: A designer-oriented integrated framework for the development of AMS circuit generators," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des. (ICCAD), 2013, pp. 74–81.

Jaeduk Han (S'15) received the B.S. and M.S. degrees in electrical engineering from Seoul National University, Seoul, Korea, in 2007 and 2009, respectively. He is currently working toward the Ph.D. degree in electrical engineering at the University of California at Berkeley.
He was a Circuit Design Engineer at TLI from 2009 to 2012, and has held engineering intern positions at Altera, Intel, and Xilinx in 2012, 2014, and 2015, respectively, where he worked on high-speed wireline communication circuits and power management circuits. His research interests include high-speed wireline communication circuit design and analog circuit design automation.
Yue Lu (S'08–M'14) received the B.E. degree in electronic science and technology from Shanghai Jiao Tong University, Shanghai, China, in 2008, and the Ph.D. degree in electrical engineering from the University of California, Berkeley, CA, USA, in 2014. He also studied at Carnegie Mellon University, Pittsburgh, PA, USA, in 2007, as an Undergraduate Exchange Student.
He is currently with Qualcomm Atheros Inc., San Jose, CA, USA.
Dr. Lu was the recipient of the 2013–2014 IEEE Solid-State Circuits Society Predoctoral Achievement Award, the 2013 James H. Eaton Memorial Scholarship from UC Berkeley, the 2013 ADI Outstanding Student Designer Award, and the 2012 Custom Integrated Circuits Conference Best Student Paper Award.

Nicholas Sutardja (S'12) received the B.S. degree in electrical engineering and computer science and the B.A. degree in applied mathematics from the University of California, Berkeley, CA, USA, in 2012. He is currently working toward the Ph.D. degree in electrical engineering at the University of California, Berkeley.
Additionally, he worked on high-speed wireline receivers at Altera and sensors for pulse oximetry at ADI in the summer of 2011 and 2014, respectively. His research interests include mixed signal ICs, energy efficient high-speed link systems, analog design methodologies, biomedical devices, and sensors.

Elad Alon (M'06–SM'12) received the B.S., M.S., and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 2001, 2002, and 2006, respectively.
In January 2007, he joined the University of California, Berkeley, CA, USA, where he is currently an Associate Professor of Electrical Engineering and Computer Sciences as well as a Codirector of the Berkeley Wireless Research Center (BWRC), Berkeley, CA, USA. He has held consulting, visiting, or advisory positions at Lion Semiconductor, Wilocity, Cadence, Xilinx, Oracle, Intel, AMD, Rambus, Hewlett Packard, and IBM Research, where he worked on digital, analog, and mixed-signal integrated circuits for computing, test and measurement, and high-speed communications. His research interests include energy-efficient integrated systems, such as the circuit, device, communications, and optimization techniques used to design them.
Dr. Alon was the recipient of the IBM Faculty Award in 2008, the 2009 Hellman Family Faculty Fund Award as well as the 2010 UC Berkeley Electrical Engineering Outstanding Teaching Award, and has co-authored papers that received the 2010 ISSCC Jack Raper Award for Outstanding Technology Directions Paper, the 2011 Symposium on VLSI Circuits Best Student Paper Award, and the 2012 and 2013 Custom Integrated Circuits Conference Best Student Paper Award.