
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE JOURNAL OF SOLID-STATE CIRCUITS 1

Design Techniques for a 60 Gb/s 173 mW Wireline Receiver Frontend in 65 nm CMOS Technology
Jaeduk Han, Student Member, IEEE, Yue Lu, Member, IEEE, Nicholas Sutardja, Student Member, IEEE,
Kwangmo Jung, Student Member, IEEE, and Elad Alon, Senior Member, IEEE

Abstract: Design techniques for a complete 60 Gb/s receiver frontend with equalization, output slicing/demultiplexing, and clocking capabilities are described. Current integration combined with a cascode gate-voltage bias gain-control technique enables energy-efficient implementation of CTLE, FFE, and DFE circuits while operating near the speed limits of the technology. Although the error samplers follow a DFE that has, in principle, already sliced the data, adaptive error-sampling requires high gain to resolve small residual error signals; this challenge is addressed by the addition of interleaved, offset-canceled deserializing samplers. Clock generation as well as distribution circuits are implemented to complete the receiver frontend. The proposed 65 nm CMOS receiver operates at 60 Gb/s, consuming 173 mW from 1.2 V and 1.0 V supplies.

Index Terms: Chip-to-chip communication, current integration, decision feedback equalizer (DFE), feedforward equalizer (FFE), high-speed links.

Manuscript received September 10, 2015; revised January 10, 2016; accepted January 05, 2016. This paper was approved by Guest Editor Andreia Cathelin.
J. Han, N. Sutardja, K. Jung, and E. Alon are with the Berkeley Wireless Research Center, University of California, Berkeley, CA 94704 USA (e-mail: jdhan@eecs.berkeley.edu).
Y. Lu is with Qualcomm Atheros Inc., San Jose, CA 95110 USA (e-mail: yue.lu.phd@ieee.org).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JSSC.2016.2519389
0018-9200 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

The everlasting expansion of Internet connectivity has raised data traffic substantially, increasing demands on high-bandwidth wireline communication systems. Extrapolation from recent technical trends [1] indicates the need for 60+ Gb/s chip-to-chip transceiver systems in the near future. In contrast to this rapid bandwidth increase, the allowable power consumption of high-speed transceivers remains relatively constant. Specifically, assuming the same total power budget as current designs, receivers operating in the range of 50–60 Gb/s must achieve 2–4 pJ/bit efficiency to remain within the current total power window.

Several 56–80 Gb/s receiver circuits have been reported in recent years [2]–[4] as responses to the aforementioned trends. These designs primarily focused on implementing high-speed continuous-time linear equalizer (CTLE) and decision-feedback equalizer (DFE) circuits demonstrating the ability to cancel intersymbol interference (ISI), while typically operating close to the intrinsic speed limits of the underlying technology. As these designs highlighted the feasibility of 60+ Gb/s equalization, further power reduction is critical to enable integration of complete receiver functionality while meeting the severe power constraint. Furthermore, in order to cancel precursor ISI as well as postcursor ISI and hence to cover a broader range of channel characteristics, a receive feed-forward equalizer (FFE) is a desirable addition to the CTLE and DFE. Along with the improvements to the equalizers themselves, a complete frontend must include additional circuitry to support equalizer adaptation and clock data recovery (CDR).

To demonstrate that all of these requirements can be fulfilled while meeting the overall power-budget target, this work presents design techniques used to realize a 65 nm 60 Gb/s receiver frontend (Fig. 1) including CTLE, FFE, DFE, output slicers, and clock generation as well as distribution circuits [5]. Despite a substantial expansion in equalization capabilities (CTLE + FFE), the power consumption of the complete equalizer path remains essentially identical to our earlier 46 mW DFE-only design [2]. This paper therefore begins by introducing the two key techniques that enabled this advancement: current integration combined with the dynamic-latch-based DFE architecture from [2], and cascode gate-voltage bias as an efficient means of analog gain control for the FFE. Section III then describes the components necessary to complete the receiver front-end design: interleaved and offset-canceled deserializing slicers to enable adaptive error-sampling as well as LC-based clock generation/buffering. As shown in the measurement results in Section IV, all of these techniques combined result in a receiver frontend supporting 60 Gb/s while consuming 173 mW from 1.2 V and 1.0 V supplies.

Fig. 1. Receiver architecture.

II. 60 Gb/s EQUALIZER DESIGN

As mentioned in Section I, multiple equalizers are necessary to cancel various types of ISI in band-limited channel pulse responses. While the most significant postcursor ISI taps are typically handled by a receive DFE [2]–[9], a CTLE is commonly placed in front of the DFE to cancel long-tail postcursor ISI [3], [7]–[9] that can span over 10–30 taps and is, thus, too costly to be removed by the DFE (due to the large number of delay latches required to store the data). However, the precursor ISI taps cannot be canceled by DFEs (since they rely on a causal feedback loop), and typical CTLE transfer functions do not directly correct for precursor ISI either.

A feed-forward equalizer (FFE) [10]–[13] is therefore often used to handle these precursor ISI taps. FFEs are often realized at the transmitter side due to the simplicity of implementation [10], [11]. However, at the receive side, the underlying circuits are not required to drive a 50 Ω termination, opening up the opportunity to save power (via techniques like current integration) over implementing the equivalent function in the transmitter. Consequently, in this work, we set out to demonstrate an efficient receive FFE implementation integrated along with DFE and CTLE to cancel all three common forms of ISI (precursor: FFE; large nearby postcursor: DFE; long-tail: CTLE).

Fig. 2. Block diagram of the 60 Gb/s equalizer with CTLE, 2-tap FFE, and 3-tap DFE.

While the DFE architecture inherited from [2] served as the basis for this design, as described above, substantial additional capabilities were added to enable robust operation over a broader variety of channels. In order to remain within our targets for overall power consumption, particular attention was therefore paid to further improving the energy efficiency of the equalizer circuits, so that the additional capabilities would not come at the cost of increased power. We therefore next describe how current-integration techniques were integrated into the overall equalizer design to meet this goal.
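The division of labor between the FFE and the DFE can be illustrated on a UI-spaced pulse response. The sketch below is a behavioral model only, not the circuit described in this paper: the pulse-response values are hypothetical, the FFE is an ideal 2-tap FIR filter, the DFE is assumed to make correct decisions, and the CTLE (which would address the remaining long tail) is left out.

```python
def convolve(a, b):
    """Full discrete convolution of two UI-spaced tap lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def total_isi(pulse, cursor_index):
    """Sum of |tap| over every sample except the cursor."""
    return sum(abs(v) for i, v in enumerate(pulse) if i != cursor_index)


# Hypothetical pulse response: [precursor, cursor, post1, post2, post3, post4]
pulse = [0.20, 1.00, 0.45, 0.25, 0.12, 0.05]
cursor = 1

# 2-tap FFE: a delayed main tap (weight 1) minus a scaled copy of the
# undelayed sample. Choosing c = precursor / cursor nulls the first precursor.
c = pulse[0] / pulse[cursor]
after_ffe = convolve(pulse, [-c, 1.0])
ffe_cursor = cursor + 1  # the main tap adds one UI of delay

# 3-tap DFE: remove the first three postcursors (assumes correct decisions).
after_dfe = list(after_ffe)
for k in range(1, 4):
    after_dfe[ffe_cursor + k] = 0.0

isi_before = total_isi(pulse, cursor)
isi_after_ffe = total_isi(after_ffe, ffe_cursor)
isi_after_dfe = total_isi(after_dfe, ffe_cursor)
print(f"ISI: channel {isi_before:.3f}, after FFE {isi_after_ffe:.3f}, "
      f"after DFE {isi_after_dfe:.3f}")
```

In this toy case the FFE removes the adjacent precursor at the cost of a small new precursor two UIs ahead, and the residual after the DFE is dominated by the fourth postcursor, which is the kind of long-tail term the CTLE is meant to handle.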

A. Current-Integrating Equalizers

Current-integrating equalizers were originally proposed in [13] for a low-power DFE and have been applied in a number of designs since then [13]–[16], [18]. An extensive analysis of the current-integrating DFE/FFE [14] reveals that the current-integration technique typically provides a 3× power saving compared to resistively loaded stages. Hence, we have applied this scheme throughout the entire equalizer chain except for the final 1-tap dynamic-latch-based DFE, which generates NRZ outputs for DFE feedback.

The receiver-equalizer architecture is shown in Fig. 2. The source-degenerated current-integrating CTLE's transfer function is tailored to mitigate long-tail ISI, and the circuit performs demultiplexing so that the following stages can operate at half-rate with wider integration time windows. Since the front-end CTLE integrates over an entire bit period, as described in [15], it can introduce 3.9 dB of pattern-dependent high-frequency loss (typically one pre- and/or one postcursor tap), but these taps are readily handled by the power-efficient current-integrating FFE and DFE stages.

Fig. 3 shows the CTLE integrator structure. The input differential pair (M0, M1) is capacitively degenerated for linear equalization, while switches (M2–M5) are inserted between the input pair and output nodes to support the demultiplexing operation. Inspired by the cascoded sampling technique proposed in [17], the M2–M5 switches are alternately turned on (by VCKC and /VCKC) to steer the differential input. For each subbranch, after the integration phase, PMOS precharge devices reset the output nodes to VDD. Within the context of the overall multistage equalizer that contains multiple such current-integration stages, this resetting (return-to-zero or RZ) behavior must be carefully managed since the output signal of one stage would not naturally be available to the next stage during its integration time. Dynamic latches (L0–L3) are therefore inserted after the CTLE to convert the RZ waveforms to nonreturn-to-zero (NRZ) ones, extending the available signal window to be latched properly by the following stages. These latches also generate UI-delayed signals for the two-tap FFE operation. The basic structure of the FFE latches is inherited from the DFE feedback latches in [2], but with increased input-transistor overdrive voltages to ensure that the latches retain linear signal-transfer characteristics.

Fig. 3. Schematics of CTLE + DMUX and following FFE latches. VCKC, VCKP, and VCKN are in the same phase.

It is worth noting that the resetting nature of current-integration stages leads to several design challenges further within the overall equalizer, particularly within the timing-critical final DFE stage. After examining the proposed approach for implementing FFE variable gain control, these issues and our solutions to them will be described next.

B. Variable Cascode Gate-Voltage Bias for FFE

After the CTLE and delay latches, two half-rate current-integrating two-tap FFE and three-tap DFE stages receive the UI-spaced signals. These circuits were evolved in several respects from the architecture proposed in [2]. First, the power-hungry continuous-time summers for the second and third DFE taps were replaced with current-integration summers, and an FFE tap was added to each integrator to support the FFE functionality without introducing additional summers, as shown in Fig. 4.

Fig. 4. FFE + DFE integrator schematics.

Fig. 5. Potential implementations for variable transconductance stages.

Fig. 6. Variable cascode gate-voltage bias: (a) schematics and (b) linearity comparison with a variable tail current-control scheme. In (b), Vin1dB is the input voltage at which the gain drops by 1 dB from its target value, while Vinmax is the maximum input swing expected to be fed into the FFE stage.

Fig. 7. Illustration of incomplete reset settling resulting in postcursor ISI.

Fig. 8. (a) Simplified current integrating DFE circuit diagram. (b) Integrator current consumption vs. target reset accuracy.

The transconductances of both the FFE- and DFE-integration branches must be programmable to realize the variable equalizer coefficients needed to support varying channel characteristics. Fig. 5 shows three potential methods to realize variable transconductance (i.e., tap weight). The conventional variable-current-bias control [Fig. 5(a)] works well for the DFE-integration paths [2] and is applied to the second and third DFE tap branches (gm2 and gm3 in Fig. 4). However, this approach (bias current control) is unfortunately infeasible for the FFE because of the negative impact of reducing tail current on the linearity of the FFE stage's differential input pair. Specifically, as the gain of this stage is reduced (e.g., to compensate for channels with relatively small precursor ISI), the overdrive of the input pair will drop. The FFE tap input pair will therefore clip when presented with relatively large signals (which will almost certainly occur since the signal has not yet been fully equalized at this point in the chain), leading to nonlinear signal distortion that cannot be directly compensated by the equalizers.
One potential method to realize variable gain while avoiding this linearity issue is to modulate the device sizing as the bias current scales, as shown in Fig. 5(b). Specifically, multiple differential-pair unit cells are configured in parallel, and the number of cells turned on determines the variable gain. This scalable differential-pair approach preserves the operating conditions of the input devices regardless of the gain setting, maintaining the linear characteristic. Alternatively, the transconductance can also be controlled by modulating the source-degeneration resistance [Fig. 5(c)]. Unfortunately, both of these approaches suffer in power efficiency because of dynamic-range and minimum device-size limitations [14]. To illustrate the issue with an example, let us assume that the tap requires 6 bits of gain-control range. This would require the input pair (or degeneration resistor) to be broken up into 64 unit cells; considering that each unit cell cannot be made smaller than a certain minimum size, the total device width (and routing parasitics) associated with that many units typically substantially exceeds the width that would have been needed just to meet the required signal swing (gain)/bandwidth. For the variable-width input pair, this directly results in substantial power-consumption overhead (since current and device width are directly related), whereas for the variable source degeneration, the extra parasitics at the source node will force increased bias current to ensure that the resistive degeneration remains effective at the frequencies of interest. As a variant of Fig. 5(c), the degeneration resistor array may be replaced with one transistor with variable gate-voltage bias to modulate its on-resistance. However, it should be noted that the transistor on-resistance (and the overall transfer function) becomes highly nonlinear at low gate bias, causing nonlinear distortion, which is not desirable for FFE operation as indicated in the previous paragraph.

Fig. 9. One-tap DFE structure and the associated clock-delay technique.

Fig. 10. (a) DFE behavior without clock delay (dashed) and with clock delay (solid). (b) Clock delay versus DFE output-swing.

Fig. 11. Sampling rate versus minimum sampler input-swing.

In order to simultaneously support a wide range of variable gain without suffering in terms of signal linearity or a direct increase in power consumption as the dynamic range in gain is increased, we propose a control scheme based on adjusting the gate bias of cascode devices embedded within the FFE stage [Fig. 6(a)]. NMOS devices (M2, M3) are inserted between the input differential pair and output nodes, and the gate voltage of the cascode devices (which is set by a 6 bit voltage DAC) controls the gain of the cascoded differential pair. This cascode gate bias sets the drain voltage and output impedance of the input differential pair devices, thus setting the overall gain, but since the overdrive voltage of the input devices is largely preserved, the signal linearity (in terms of differential input voltage to differential output current) remains largely unaffected by the gain setting. Fig. 6(b) shows a linearity comparison between the variable current source and the variable cascode gate-voltage bias schemes, highlighting the ability of the cascode control scheme to maintain linearity over the entire gain-control range.
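The unit-cell overhead argument made for the segmented approaches is simple arithmetic, sketched below. The minimum unit width and the width actually required for gain/bandwidth are hypothetical placeholders, since the text gives no device dimensions; only the 6 bit (64 cell) example comes from the discussion above.

```python
def unit_cell_overhead(bits, w_unit_min_um, w_required_um):
    """Return (cell count, total width in um, width overhead factor)."""
    cells = 2 ** bits
    total_width_um = cells * w_unit_min_um
    return cells, total_width_um, total_width_um / w_required_um


# 6 bits of gain control; 0.5 um minimum unit width and 8 um required width
# are assumed values for illustration only.
cells, total_width_um, overhead = unit_cell_overhead(6, 0.5, 8.0)
print(f"{cells} cells, {total_width_um:.1f} um total, {overhead:.1f}x the required width")
```

With these assumed numbers the segmented input pair carries four times the width the signal path needs, whereas a cascode-bias control adds resolution through a DAC voltage and leaves the device width unchanged.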

C. Three-Tap Hybrid DFE Design

As mentioned previously, since DFE taps do not need to maintain linearity in terms of their input- to output-signal characteristics (their inputs are sliced digital values in any case), current-integration branches for the second and third DFE taps use variable current sources for coefficient control. As also mentioned earlier, the FFE and DFE taps are summed in a single stage to reduce the number of stages, and the main integration branch is a clocked differential pair whose tail current is shut off during the reset phase to reduce static power consumption (Fig. 4).

Fig. 12. Interleaved deserializing slicers.

Fig. 13. Clock-path architecture.

Fig. 14. (a) Resonant clock buffer with clock distribution network. (b) 30 GHz clock divider.

One of the key implementation issues to be addressed regarding the current-integrating DFE is resetting the integrator. Among the various resetting schemes in [13]–[16], we chose precharging PMOS loads to minimize the number of devices connected to the output nodes. Despite the strong gate drive of the PMOS loads, in the 65 nm process, it was very challenging to achieve precise reset settling within the approximately 16 ps of reset time available for the half-rate 60 Gb/s equalizer. Therefore, in this work, we used relatively small reset switches that settle to only 85% of the final value (i.e., $2\tau$ of settling) and relied on the DFE to compensate for the resulting circuit-induced ISI. Specifically, this incomplete reset translates into second and third tap ISI¹ (Fig. 7), which are readily handled by overdriving the current-integrating DFE coefficients. This greatly relaxes the reset-bandwidth constraints and regulates the power consumption by reducing the device sizes of the 30 GHz reset network and their parasitic capacitances.

¹ The finite resetting also generates ISI in later post taps, but these are assumed to be small due to the accumulated suppression from multiple resets.

To further quantify the efficacy of this incomplete reset approach, we next analytically examine the trade-offs between reset device size and the required current of the current-integration stage. In [14], the power consumption of a current-integrating DFE ($P_{int}$) shown in Fig. 8(a) at data rate $f_s$, gain $G$, and load capacitance $C_L$ was shown to be

$$P_{int} \propto \frac{I_{nom} \cdot 2(1 + k_{reset})}{1 - \gamma \dfrac{G f_s}{\omega_{T,in}} \left(1 + \dfrac{k_{tap0}\, v_c}{\alpha\, \Delta V_i}\right) \cdot 2(1 + k_{reset})} \tag{1}$$

In the above equation, $I_{nom} = C_L \, \Delta V_i \, G \, f_s$ is the current consumption of a class-A amplifier without self-loading, $\omega_{T,in}$ is the transit frequency of the input transistor ($M_{in}$), $\Delta V_i$ is defined by $2 I_{D,in}/g_{m,in}$ of the input transistor, $v_c$ is the input cursor amplitude, $k_{tap0}$ is the ratio between the canceled ISI and the cursor, which is originally denoted by $k N_{coef}$ in [14], $\gamma$ is the drain-to-gate-capacitance ratio, $\alpha = \omega_{T,tap}/\omega_{T,in}$ is the ratio between the transit frequency of the tap transistors and the input transistor, and $k_{reset}$ is the factor by which the capacitance of the summing node needs to be increased to include reset capability. Instead of deriving $k_{reset}$ in terms of digital fanout-of-4 delay, as in [14], the effects of the incomplete reset are captured by an equivalent RC network composed of the PMOS reset device ($M_p$) and load capacitances at the output. We can then express the time constant of the reset network as

$$\tau_{reset} = \frac{1 + k_{reset}}{k_{reset}} \, \tau_p = \frac{r}{f_s N_{\tau,reset}} \tag{2}$$

where $\tau_p = C_{Dp} r_{op}$ is the time constant of the PMOS resetting devices without additional loads, $r$ is the fraction of each UI spent for resetting, and $N_{\tau,reset}$ is the number of time constants of exponential RC settling for the desired accuracy. $k_{reset}$ is therefore given by

$$k_{reset} = \frac{\tau_p}{\dfrac{r}{f_s N_{\tau,reset}} - \tau_p} \tag{3}$$

It is also important to note that the tap strength has to be increased by $e^{-N_{\tau,reset}}$ to cancel the translated ISI from finite resetting, which limits the gain from this reset bandwidth-relaxation technique. By combining (1) and (3), the normalized summer-current requirement with different target accuracies can be derived, as shown in Fig. 8(b). From the plot, 85% settling accuracy is chosen for optimal power dissipation.
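The reset trade-off can be checked numerically with a first-order RC model. The sketch below assumes single-pole settling, takes the roughly 16 ps reset window quoted in the text as the available reset time (standing in for $r/f_s$), and uses a hypothetical intrinsic PMOS time constant of 3 ps. It evaluates the settled fraction after a given number of time constants and the capacitance overhead factor that follows from equating the loaded reset time constant to the allowed one.

```python
import math


def settled_fraction(n_tau):
    """Fraction of the final value reached after n_tau RC time constants."""
    return 1.0 - math.exp(-n_tau)


def k_reset(tau_p, t_reset, n_tau):
    """Summing-node capacitance overhead needed to settle n_tau time constants.

    Solves (1 + k) / k * tau_p = t_reset / n_tau for k.
    """
    tau_target = t_reset / n_tau
    if tau_target <= tau_p:
        raise ValueError("reset window too short for this device time constant")
    return tau_p / (tau_target - tau_p)


T_RESET = 16e-12  # approximate reset window from the text, s
TAU_P = 3e-12     # hypothetical intrinsic PMOS time constant, s

k_2tau = k_reset(TAU_P, T_RESET, 2.0)   # relaxed target used in this work
k_4tau = k_reset(TAU_P, T_RESET, 4.0)   # a tighter target, for comparison
residual_2tau = math.exp(-2.0)          # unsettled fraction left as extra ISI
print(f"2 tau settles to {settled_fraction(2.0):.1%}; "
      f"k_reset {k_2tau:.2f} (2 tau) vs {k_4tau:.2f} (4 tau)")
```

Two time constants correspond to about 86% settling, consistent with the 85% figure above, and with these assumed numbers the relaxed target needs a several times smaller reset-capacitance overhead than a four-time-constant target, at the price of a residual that the DFE taps must absorb.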
Another design issue regarding the PMOS precharging reset is that the output nodes experience significant common-mode (CM) fluctuations during the integration and reset phases. This causes several problems. First, during the integration phase, the substantial drop in CM can reduce the output resistance of the input devices (M0, M1 in Fig. 4) of that stage, degrading gain and linearity [14]. In addition, especially for multistage equalizer implementations, excessive CM fluctuations can impact the operation of the following stages. These effects led to the need for CM control schemes to reduce these fluctuations. In this design, a resistor DAC (RRST) along with a bypass capacitor is inserted between VDD and the reset voltage node to create a virtual supply that regulates the reset voltage level, and pull-up current sources [18] connected to the output nodes maintain the CM level during integration. The combination of these two techniques regulates the output CM fluctuation and ensures proper operation of the following stage.

As in the original architecture from [2], the stringent latency requirements in the first postcursor tap DFE feedback path justify the use of a dedicated summer for the first tap DFE. After the FFE + DFE integration summer, a separate dynamic-latch-based stage [2] provides the one-tap postcursor cancelation, as shown in Fig. 9. Noting that the integrator output is a signal that ramps up from differential zero, it is desirable to fine-tune the timing of the dynamic-latch transparent window to best match the peak overall output of the integrator. This can be achieved by delaying the clock of the dynamic latch relative to that of the integrator. Fig. 10(a) shows simulated first tap DFE output waveforms with and without the latch-clock delay. In this figure, the DFE output swing increases by 40% when the clock is delayed by 3 ps. Since this delay is only 10% of the clock period, an RC passive network is chosen as the most efficient means of generating this delay, which can be readily utilized without suffering significant clock-amplitude attenuation. One potential design issue involving the RC delay is the variability of the passive elements. Fig. 10(b) shows the output amplitude with various delay settings (varying the resistance value): up to 20% delay variations result in only 5% variation in output amplitude.

Fig. 15. Die photo. (a) 56 μm × 56 μm inductor for oscillator. (b) 53 μm × 53 μm inductor for clock driver.

III. COMPLETING THE FRONTEND

A complete receiver frontend must provide CMOS-level outputs to be compatible with the final digital backend that processes/consumes the data. In this particular architecture, the final DFE stage already offers a relatively large output swing, since at least 250 mV of swing is required to force the DFE feedback tap currents to be fully steered [2]. Thus, further slicing of the DFE output to resolve the received symbol (data) might be easily done with a sampler stage with moderate gain (and likely without requiring offset-cancelation). However, in addition to the data itself, the frontend must provide additional digitized signals; in particular, adaptive equalization requires digital signals representing the signs of the differences between the signal amplitude and the estimated cursor amplitude (error) [12], [14], [19], and the gain and bandwidth of the data-/error-sampling paths should be matched to avoid incorrect equalizer settings for the actual signal path.

Since the adaptive loop will intentionally be adjusting the equalizer's (and its own) settings to minimize the error amplitude, the error-slicing path inherently has to sample very small signals. Thus, significant additional gain is required to support the error-sampling operation, no matter how wide an output-swing range is secured from the dynamic DFE-latch stage. Therefore, clocked regenerative samplers [14], [19]–[23] with offset-cancelation capability are essential to meet this high-gain requirement.
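Why the error sampler ends up resolving very small signals can be seen from a toy adaptation loop. The sketch below assumes a sign-based update of the estimated cursor amplitude, which is a common choice for such loops but is not stated in this paper; the cursor amplitude, residual spread, and step size are all hypothetical.

```python
import random


def adapt_level(samples, mu, start=0.0):
    """Track the cursor amplitude using only the sign of (sample - level)."""
    level = start
    for y in samples:
        error_sign = 1 if y > level else -1  # the only information the error sampler supplies
        level += mu * error_sign
    return level


rng = random.Random(0)
CURSOR = 0.20  # hypothetical equalized amplitude of a '1' bit, V
samples = [CURSOR + rng.uniform(-0.02, 0.02) for _ in range(2000)]

level = adapt_level(samples, mu=0.001)
residual = max(abs(y - level) for y in samples[-200:])
print(f"estimated cursor {level:.3f} V, largest recent error {residual * 1e3:.0f} mV")
```

Once the loop has converged, the quantity being sliced is the difference between each sample and the tracked level, which here is tens of millivolts against a 200 mV cursor; the better the equalization, the smaller that difference becomes.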
Fig. 16. Measurement setup and measured waveforms. (All clock sources are synchronized.)

Fig. 17. LC oscillator characterization curve.

For this design, a StrongArm latch with offset calibration in the preamplifier (preamp) stage [14] was selected due to its narrow aperture window and CMOS output levels. Furthermore, instead of pushing individual StrongArm latches to operate at an extremely high frequency, multiple latches are interleaved to achieve the overall 30 GHz sampling rate. This interleaving technique is possible since these samplers are outside of the DFE loop and, hence, do not have tight latency requirements. The relationship between the minimum detectable signal level and the sampling frequency (Fig. 11) reveals that 4× interleaving provides sufficient sensitivity without adding too many samplers. Individual samplers are clocked by one of the 7.5 GHz quad-phase CMOS-level clocks, provided by clock dividers and a phase interpolator.

As hinted earlier, offset-cancelation is required within the deserializing samplers to ensure sufficient sensitivity as well as to support an adaptive threshold for error-slicing. While adding offset-canceling branches to the preamp output nodes works for the offset-cancelation of individual samplers, it is not effective for error-sampling, which requires a large-scale decision-threshold shift that poses a linearity requirement and adds substantial side load to the preamp stage. In order to inject the cursor-amplitude offset intentionally for error-sampling, AC coupling capacitors are inserted between the DFE output and preamp input ports, and the DC operating points of the coupler outputs are controlled to set the decision thresholds. Fig. 12 shows the resulting implementation of the sampling array with 4× interleaving, offset calibration, and AC coupling for the error sampling.

The last high-frequency circuits enabling a complete receiver frontend are the clock generation and distribution circuits (Fig. 13). An LC oscillator is utilized to generate a differential pair of 30 GHz clocks, with frequency control knobs for band selection, proportional and integral control to support CDR, and an auxiliary differential pair for 3× injection-locking for bathtub curve measurement.

The differential 30 GHz clocks need to be buffered before driving the clock ports of the equalizer frontend, which can consume substantial power at these high frequencies [3]. A resonant clock buffer [Fig. 14(a)] is, therefore, utilized to provide clock signals to the equalizer frontend through an AC-coupled clock-distribution network with proper bias settings, as proposed in [2]. The clock buffer is a differential stage with a differential on-chip balun that resonates with its capacitive load from the rest of the network. A tunable capacitor array is inserted to control the resonance frequency, as in the oscillator circuitry. Although the LC oscillator and clock buffer require inductors (baluns), the area occupied by these inductors is mitigated by the high operating frequency. Specifically, in this design, the inductors for the LC oscillator and clock buffer occupy only 56 μm × 56 μm and 53 μm × 53 μm, respectively.
Fig. 18. (a) Eye diagram at the channel output. (b) Estimated pulse response.

Fig. 19. Bathtub curve after equalization.

TABLE I. PERFORMANCE SUMMARY

Subrate clocks have to be generated as well for the interleaved sampler and back-end functions, such as deserialization, adaptation, and CDR. The same dynamic-latch design used in the DFE feedback paths is reused to implement the 30 GHz clock divider [Fig. 14(b)]. Due to placement considerations and improved distribution of the capacitive loading between the oscillator and the output buffer, the divider's input clock is driven directly by the oscillator rather than by the clock buffer. Following the 30 GHz divider, a phase interpolator is inserted to enable optimal placement of the deserializing clock phases.
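How a phase interpolator places a clock edge can be sketched with ideal sinusoids: a weighted sum of an in-phase and a quadrature clock is a single sinusoid whose phase is set by the two weights. This is a generic model and an assumption on our part; the interpolator circuit used in this design is not described at this level, and real interpolators are not perfectly linear. The 7.5 GHz subrate frequency is the one quoted for the deserializing samplers.

```python
import math


def interpolate(weight_i, weight_q):
    """a*cos(wt) + b*sin(wt) = A*cos(wt - phi), A = hypot(a, b), phi = atan2(b, a)."""
    amplitude = math.hypot(weight_i, weight_q)
    phase_deg = math.degrees(math.atan2(weight_q, weight_i))
    return amplitude, phase_deg


def phase_to_delay_ps(phase_deg, f_clk_hz):
    """Convert a phase shift at f_clk_hz into a time shift in picoseconds."""
    return phase_deg / 360.0 / f_clk_hz * 1e12


F_SUBRATE = 7.5e9  # subrate sampler clock frequency, Hz

amplitude, phase_deg = interpolate(1.0, 1.0)  # equal I and Q weights
delay_ps = phase_to_delay_ps(phase_deg, F_SUBRATE)
quad_spacing_ps = phase_to_delay_ps(90.0, F_SUBRATE)
print(f"phase {phase_deg:.1f} deg, delay {delay_ps:.2f} ps, "
      f"quadrature spacing {quad_spacing_ps:.2f} ps")
```

At 7.5 GHz, adjacent quadrature phases are one 30 GHz clock period apart, and equal weights place the edge halfway between them, which corresponds to one UI at 60 Gb/s.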

Table I notes: Includes equalizer, 4:16 DES, clock distribution. Includes output buffer. LC oscillator + divider + PI.

IV. MEASUREMENT RESULTS

The receiver front-end test chip was designed and fabricated in a 65 nm CMOS process (Fig. 15). The receiver frontend occupies 0.16 mm², excluding the pad, ESD, and t-coil area, and the equalizer core circuit occupies 0.012 mm²; both the specific device sizes and the layout of all equalizer circuits (CTLE, FFE, and DFE) in Fig. 2 were generated using the analog-circuit-generator framework described in [24].

The measurement setup used to characterize the test chip is shown in Fig. 16. The fabricated receiver chip was directly soldered to a PCB made of Nelco 4000-13 material via flip-chip bumps to minimize parasitic loading from bonding and packaging structures. Since there was no 60 Gb/s signal source available for the measurement, we reused the pattern-generator/channel-emulator circuit from the chip described in [2]. This band-limited transmitter with built-in programmable ISI generation provides differential PRBS7 signals with emulated channel profiles.

As the first step in receiver front-end testing, the performance of the clock generator was characterized to determine whether the oscillator supports the desired operating frequencies. Fig. 17 shows the oscillation frequency with various band-select settings. The LC oscillator covers a 25–30 GHz frequency range, and the receiver front-end operation is verified at the maximum frequency setting, which corresponds to a 60 Gb/s data rate.

Following the clock-path characterization, a BER bathtub-curve measurement was performed to test the equalization capability of the receiver front-end data path. Fig. 18(a) and (b) show the eye diagram and the estimated pulse response, respectively, of the 60 Gb/s signal measured at the input of the receiver front-end evaluation board. The total amount of ISI in the pulse response is calculated to be 1.54 times the cursor amplitude. With the oscillator injection-locking enabled, a 10 GHz clock generator (Keysight E8267D) synchronized with a 30 GHz transmitter clock source (Keysight E8257D) provides the injection clock with different phases for bathtub characterization. A Keysight 86130A BERT measures the BER of the reconstructed PRBS7 pattern from 1/128× subsamplers, clocked by an external source (Keysight E8267D). Under these conditions, the receiver frontend recovers the transmitted PRBS7 pattern and operates error-free over 10^12 bits in the center region (Fig. 19). The total power consumption of the design under these evaluation conditions is 173 mW: 138 mW from the 1.2 V supply and 35 mW from the 1.0 V supply.

Table I compares the design with prior 56–80 Gb/s equalizer designs. While integrating the highest level of functionality/capability as compared to previous works, it
achieves excellent power consumption in a mature 65 nm technology.
Specifically, despite the addition of CTLE and FFE, the equalizer core
achieves nearly the same power consumption (48 mW) as the previous three-tap
DFE design (46 mW) [2]. The receiver frontend further supports 30 GHz clock
generation and distribution as well as the high-speed hardware necessary to
support adaptive equalization and a baud-rate CDR, and despite being
implemented in a substantially more mature process technology (65 nm vs.
20 nm), does so while dissipating slightly lower power and operating at a
slightly higher data rate than [3].

V. CONCLUSION

This paper reports a 60 Gb/s receiver frontend in 65 nm CMOS technology. The
receiver incorporates CTLE, FFE, and DFE equalizers, output slicing, and
clocking blocks to demonstrate the possibility of serial-link receiver
operation near the frequency limits of 65 nm CMOS technology. The
energy-efficient equalization functionality is achieved by a
current-integration technique combined with cascode gate-voltage gain control
for efficient RX FFE, passive-clock delay between the current-integrating and
dynamic-latch stages, and optimized reset device settling time/sizing. In
addition to implementing the 60 Gb/s equalizer, we focused on developing all
high-frequency data and clock paths for the complete receiver frontend
operation, including interleaved deserializing slicers, an oscillator, and
clock dividers. The design achieves 60 Gb/s operation with > 0.2 UI timing
margin at 1e-9 BER (and error-free operation over 1e12 bits at the center of
the eye), while consuming 173 mW from 1.2 V and 1.0 V supplies in 65 nm CMOS.

ACKNOWLEDGMENT

The authors would like to thank Systems on Nanoscale Information fabriCs
(SONIC), BWRC, Berkeley Design Automation, Integrand EMX, Lorentz PeakView,
the TSMC University Shuttle Program, and B. Casper of Intel, K. Chang of
Xilinx, P. Y. Chiang of OSU, C.-K. K. Yang of UCLA, P. K. Hanumolu of UIUC,
and V. Stojanovic of UC Berkeley.

REFERENCES

[1] S. Narendra, L. Fujino, and K. Smith, "Through the looking glass - the 2015 edition: Trends in solid-state circuits from ISSCC," IEEE Solid-State Circuits Mag., vol. 7, no. 1, pp. 14–24, Feb. 2015.
[2] Y. Lu and E. Alon, "Design techniques for a 66 Gb/s 46 mW 3-tap decision feedback equalizer in 65 nm CMOS," IEEE J. Solid-State Circuits, vol. 48, no. 12, pp. 3243–3257, Dec. 2013.
[3] T. Shibasaki et al., "A 56-Gb/s receiver front-end with a CTLE and 1-tap DFE in 20-nm CMOS," in Symp. VLSI Circuits Dig., 2014, pp. 1–2.
[4] A. Awny, L. Moeller, J. Junio, J. C. Scheytt, and A. Thiede, "Design and measurement techniques for an 80 Gb/s 1-tap decision feedback equalizer," IEEE J. Solid-State Circuits, vol. 49, no. 2, pp. 452–470, Feb. 2014.
[5] J. Han, Y. Lu, N. Sutardja, K. Jung, and E. Alon, "A 60 Gb/s receiver frontend in 65 nm CMOS technology," in Symp. VLSI Circuits Dig., 2015, pp. 230–231.
[6] T. Toifl et al., "A 2.6 mW/Gbps 12.5 Gbps RX with 8-tap switched capacitor DFE in 32 nm CMOS," IEEE J. Solid-State Circuits, vol. 47, no. 4, pp. 897–910, Apr. 2012.
[7] J. E. Proesel and T. O. Dickson, "A 20-Gb/s, 0.66-pJ/bit serial receiver with 2-stage continuous-time linear equalizer and 1-tap decision feedback equalizer in 45 nm SOI CMOS," in Symp. VLSI Circuits (VLSIC'11) Dig., 2011, pp. 206–207.
[8] P. Upadhyaya et al., "3.3 A 0.5-to-32.75 Gb/s flexible-reach wireline transceiver in 20 nm CMOS," in IEEE Int. Solid-State Circuits Conf. (ISSCC'13) Dig. Tech. Papers, 2013, pp. 42–43.
[9] S. Parikh et al., "A 32 Gb/s wireline receiver with a low-frequency equalizer, CTLE and 2-tap DFE in 28 nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC'13), 2013, pp. 28–29.
[10] A. A. Hafez, M.-S. Chen, and C.-K. K. Yang, "A 32-to-48Gb/s serializing transmitter using multiphase sampling in 65 nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC'13), 2013, pp. 38–39.
[11] M.-S. Chen and C.-K. K. Yang, "A 50–64 Gb/s serializing transmitter with a 4-Tap, LC-ladder-filter-based FFE in 65 nm CMOS technology," IEEE J. Solid-State Circuits, vol. 50, no. 8, pp. 1903–1916, Aug. 2015.
[12] J. E. Jaussi et al., "8-Gb/s source-synchronous I/O link with adaptive receiver equalization, offset cancellation, and clock de-skew," IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 80–88, Jan. 2005.
[13] M. Park, J. Bulzacchelli, M. Beakes, and D. Friedman, "A 7 Gb/s 9.3 mW 2-tap current-integrating DFE receiver," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC'07), 2007, pp. 230–599.
[14] C. Thakkar, N. Narevsky, C. D. Hull, and E. Alon, "Design techniques for a mixed-signal I/Q 32-coefficient Rx-feedforward equalizer, 100-coefficient decision feedback equalizer in an 8 Gb/s 60 GHz 65 nm LP CMOS receiver," IEEE J. Solid-State Circuits, vol. 49, no. 11, pp. 2588–2607, Nov. 2014.
[15] T. O. Dickson, J. F. Bulzacchelli, and D. J. Friedman, "A 12-Gb/s 11-mW half-rate sampled 5-tap decision feedback equalizer with current-integrating summers in 45-nm SOI CMOS technology," IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1298–1305, Apr. 2009.
[16] B. Kim, Y. Liu, T. O. Dickson, J. F. Bulzacchelli, and D. J. Friedman, "A 10-Gb/s compact low-power serial I/O with DFE-IIR equalization in 65-nm CMOS," IEEE J. Solid-State Circuits, vol. 44, no. 12, pp. 3526–3538, Dec. 2009.
[17] Y. Duan and E. Alon, "A 12.8 GS/s time-interleaved ADC with 25 GHz effective resolution bandwidth and 4.6 ENOB," IEEE J. Solid-State Circuits, vol. 49, no. 8, pp. 1725–1738, Sep. 2014.
[18] R. Bai, S. Palermo, and P. Y. Chiang, "A 0.25 pJ/b 0.7V 16 Gb/s 3-tap decision-feedback equalizer in 65 nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC'14), 2014, pp. 46–47.
[19] V. Stojanovic et al., "Autonomous dual-mode (PAM2/4) serial link transceiver with adaptive equalization and data recovery," IEEE J. Solid-State Circuits, vol. 40, no. 4, pp. 1012–1026, Apr. 2005.
[20] M.-J. E. Lee, W. J. Dally, and P. Chiang, "Low-power area-efficient high-speed I/O circuit techniques," IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 1591–1599, Nov. 2000.
[21] L. Kong, Y. Lu, and E. Alon, "A multi-GHz area-efficient comparator with dynamic offset cancellation," in Proc. IEEE Custom Integr. Circuits Conf. (CICC'11), 2011, pp. 1–4.
[22] M. Jeeradit et al., "Characterizing sampling aperture of clocked comparators," in IEEE Symp. VLSI Circuits Dig., 2008, pp. 68–69.
[23] J. Kim, B. S. Leibowitz, R. Jihong, and C. J. Madden, "Simulation and analysis of random decision errors in clocked comparators," IEEE Trans. Circuits Syst. I Reg. Papers, vol. 56, no. 8, pp. 1844–1857, Aug. 2009.
[24] J. Crossley et al., "BAG: A designer-oriented integrated framework for the development of AMS circuit generators," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des. (ICCAD), 2013, pp. 74–81.

Jaeduk Han (S'15) received the B.S. and M.S. degrees in electrical engineering
from Seoul National University, Seoul, Korea, in 2007 and 2009, respectively.
He is currently working toward the Ph.D. degree in electrical engineering at
the University of California at Berkeley.
He was a Circuit Design Engineer at TLI from 2009 to 2012, and has held
engineering intern positions at Altera, Intel, and Xilinx, in 2012, 2014, and
2015, respectively, where he worked on high-speed wireline communication
circuits and power management circuits. His research interests include
high-speed wireline communication circuit design and analog circuit design
automation.

Yue Lu (S'08–M'14) received the B.E. degree in electronic science and
technology from Shanghai Jiao Tong University, Shanghai, China, in 2008, and
the Ph.D. degree in electrical engineering from the University of California,
Berkeley, CA, USA, in 2014. He also studied at Carnegie Mellon University,
Pittsburgh, PA, USA, in 2007, as an Undergraduate Exchange Student.
He is currently with Qualcomm Atheros Inc., San Jose, CA, USA.
Dr. Lu was the recipient of the 2013–2014 IEEE Solid-State Circuits Society
Predoctoral Achievement Award, the 2013 James H. Eaton Memorial Scholarship
from UC Berkeley, the 2013 ADI Outstanding Student Designer Award, and the
2012 Custom Integrated Circuits Conference Best Student Paper Award.

Nicholas Sutardja (S'12) received the B.S. degree in electrical engineering
and computer science and the B.A. degree in applied mathematics from the
University of California, Berkeley, CA, USA, in 2012. He is currently working
toward the Ph.D. degree in electrical engineering at the University of
California, Berkeley.
Additionally, he worked on high-speed wireline receivers at Altera and sensors
for pulse oximetry at ADI in the summers of 2011 and 2014, respectively. His
research interests include mixed-signal ICs, energy-efficient high-speed link
systems, analog design methodologies, biomedical devices, and sensors.

Kwangmo Jung (S'09) was born in 1980, in Seoul, Korea. He received the B.S.
and M.Sc. degrees in electrical engineering from Seoul National University,
Seoul, Korea, in 2003 and 2005, respectively. He is currently working toward
the Ph.D. degree in electrical engineering at the University of California,
Berkeley, CA, USA. His research interests include high-speed wireline I/O
transceivers and the design automation of analog/mixed-signal circuits.
Dr. Jung was the recipient of the Analog Devices Outstanding Student Designer
Award in 2012.

Elad Alon (M'06–SM'12) received the B.S., M.S., and Ph.D. degrees in
electrical engineering from Stanford University, Stanford, CA, USA, in 2001,
2002, and 2006, respectively.
In January 2007, he joined the University of California, Berkeley, CA, USA,
where he is currently an Associate Professor of Electrical Engineering and
Computer Sciences as well as a Codirector of the Berkeley Wireless Research
Center (BWRC), Berkeley, CA, USA. He has held consulting, visiting, or
advisory positions at Lion Semiconductor, Wilocity, Cadence, Xilinx, Oracle,
Intel, AMD, Rambus, Hewlett Packard, and IBM Research, where he worked on
digital, analog, and mixed-signal integrated circuits for computing, test and
measurement, and high-speed communications. His research interests include
energy-efficient integrated systems, including the circuit, device,
communications, and optimization techniques used to design them.
Dr. Alon was the recipient of the IBM Faculty Award in 2008, the 2009 Hellman
Family Faculty Fund Award as well as the 2010 UC Berkeley Electrical
Engineering Outstanding Teaching Award, and has co-authored papers that
received the 2010 ISSCC Jack Raper Award for Outstanding Technology Directions
Paper, the 2011 Symposium on VLSI Circuits Best Student Paper Award, and the
2012 and 2013 Custom Integrated Circuits Conference Best Student Paper Award.
