
ARTICLE IN PRESS

Nuclear Instruments and Methods in Physics Research A 554 (2005) 444–457


www.elsevier.com/locate/nima

A 96-channel FPGA-based Time-to-Digital Converter (TDC) and fast trigger processor module with
multi-hit capability and pipeline
Mircea Bogdan (a), Henry Frisch (a), Mary Heintz (a), Alexander Paramonov (a,*),
Harold Sanders (a), Steve Chappa (b), Robert DeMaat (b), Rod Klein (b), Ting Miao (b),
Peter Wilson (b), Thomas J. Phillips (c)

(a) Enrico Fermi Institute, University of Chicago, USA
(b) Fermilab National Accelerator Laboratory, USA
(c) Duke University, USA

Received 2 July 2005; received in revised form 15 August 2005; accepted 18 August 2005
Available online 7 September 2005

Abstract
We describe a 96-channel Time-to-Digital Converter (TDC) and trigger logic board, based on
field-programmable gate arrays (FPGAs), intended for use with the Central Outer Tracker (COT)
[T. Affolder et al., Nucl. Instr. and Meth. A 526 (2004) 249] in the CDF Experiment [The CDF-II
detector is described in the CDF Technical Design Report (TDR), FERMILAB-Pub-96/390-E. The TDC
described here is intended as a further upgrade beyond that described in the TDR] at the Fermilab
Tevatron. The COT system is digitized and read out by 315 TDC cards, each serving 96 wires of the
chamber. The TDC is physically configured as a 9U VME card. The functionality is almost entirely
programmed in firmware in two Altera Stratix FPGAs. The special capabilities of this device are the
availability of 840 MHz LVDS inputs, multiple phase-locked clock modules, and abundant memory. The
TDC system operates with an input resolution of 1.2 ns, a minimum input pulse width of 4.8 ns and a
minimum separation of 4.8 ns between pulses. Each input can accept up to 7 hits per collision. The
time-to-digital conversion is done by first sampling each of the 96 inputs in 1.2-ns bins and filling
a circular memory; the memory addresses of logical transitions (edges) in the input data are then
translated into the time of arrival and width of the COT pulses. Memory pipelines with a depth of
5.5 μs allow deadtime-less operation in the first-level trigger; the data are multiple-buffered to
diminish deadtime in the second-level trigger. The complete process of edge-detection and filling of
buffers for readout takes 12 μs. The TDC VME interface allows a 64-bit Chain Block Transfer of
multiple boards in a crate with transfer rates up to 47 Mbytes/s. The TDC module also produces
prompt trigger data every Tevatron crossing via a deadtimeless fast logic path that can be easily
reprogrammed. The trigger bits are clocked onto the P3 VME backplane connector with a 22-ns clock
for transmission to the trigger. The full TDC design and multi-card test results are described.
There is no measurable cross-talk between channels; linearity is limited by the least-count time
bin. The physical simplicity ensures low maintenance; the functionality being in firmware allows
reprogramming for other applications.
© 2005 Elsevier B.V. All rights reserved.
PACS: 29.40.Gx; 07.50.E; 29.40.+r
Keywords: TDC; FPGA; Pipelined; Multi-hit; Fast trigger processor

* Corresponding author. Tel.: +1 773 7027479; fax: +1 773 8345959.
E-mail address: paramon@hep.uchicago.edu (A. Paramonov).

0168-9002/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.nima.2005.08.071

1. Introduction
The Collider Detector at Fermilab (CDF) is a
large (5000-ton) detector of particles produced in
proton–antiproton collisions at 1.96 TeV at the
Fermilab Tevatron [2]. The detector consists of a
solenoidal magnetic spectrometer surrounded by
systems of segmented calorimeters and muon
chambers. Inside the solenoid, precision tracking
systems measure the trajectories of particles; the
particle momenta are measured from the curvature
in the magnetic field and the particle energies from
the energy deposited in the calorimeters. The
tracking systems consist of a silicon-strip system
with >750,000 channels around the beam-pipe,
followed by the Central Outer Tracker (COT), a
large cylindrical drift chamber with 30,240 sense
wires arranged in 96 layers divided into 8
superlayers of 12 wires each [1]. Four of the
superlayers have the wires parallel to the beam
axis; the remaining four are tilted by 2° to provide
small-angle stereo for 3D reconstruction of tracks.
The maximum drift time of the COT is 200 ns; the
maximum drift length is 0.88 cm.
During the present Run II, which started in
2001, the peak luminosity of the Tevatron has
grown to over 10³² cm⁻² s⁻¹, a factor of more than
five higher than in Run I. The Tevatron operates
with a time between beam crossings of 396 ns, with
the result that the occupancy (hits/channel) in the
COT increases with luminosity as the average
number of proton–antiproton collisions per bunch
crossing is now substantially greater than one.
A broad range of efforts are underway to upgrade
the readout bandwidth to allow operation at
luminosities up to 3 × 10³² cm⁻² s⁻¹.
The COT is used to provide a precise measurement in the magnetic spectrometer of the
trajectories of the particles produced in the high-energy
proton–antiproton collisions. These measurements
are made by recording the time of arrival of
electrical charge at sense wires in the COT.
A particle traversing the COT perpendicular to
the beams traverses 96 layers of sense wires; there
are a total of 30,240 sense wires in the COT. The
measurement of the time of arrival at each wire is
made with a multi-channel Time-to-Digital converter (TDC) [3].
The conversion of time to a digital signal is a
well-developed and sophisticated field [4–7]. In this
note we describe the design of a new 96-channel
TDC and trigger logic module designed for the
COT, implemented in two 48-channel field-programmable gate arrays (FPGAs) [8]. A third,
smaller, FPGA serves as the VME interface. The
other chips on the board are limited to delay lines,
buffers on the input and output signals to the
connectors, and DC-to-DC converters to supply
voltages not available in the existing VME [9,10]
crates.
Using FPGAs for the combined TDC and
trigger logic functionalities has the benefits of
negligible cross-talk, linearity limited only by the
least count, integrated flexible signal processing for
trigger logic, and high reliability due to very low
chip count. In addition, because the implementation is in firmware, the chip can easily be
reconfigured for enhanced capability; the ability
to reconfigure also shortens the design cycle as the
board does not need to be rebuilt (in general)
during prototyping.
Thirty working prototypes have been built and
tested. A comprehensive suite of test routines,
including some that exploit the capabilities of large
FPGAs to implement sources of test data, has
been implemented and documented. We present
results on performance and readout bandwidth.
Fig. 1 shows one of the 30 preproduction boards.

Fig. 1. The CDF-II TDC board. The two large chips with silver
heat sinks are the TDC FPGAs; the large black objects are DC-to-DC
converters (the layout allows additional DC-to-DC
converters, not needed and hence not stuffed on this board).
The FPGA for the VME interface can be seen in the upper left-hand corner.
Connector headers and dip switches near the
center of the board allow debugging with a logic analyzer.

2. TDC Specifications
A summary of the TDC physical and operational characteristics is given in Table 1. The
schematics of the board are available at [11].
Details of how the TDC operates are given in the
text below.

3. Principle of operation of the TDC


Two functions are implemented in the firmware
of the FPGAs: a conventional TDC, digitizing the
time of arrival of signals, and the processing of
COT data for the trigger system. In this section we
describe the TDC implementation.
Secondary particles from antiproton–proton
(p̄p) collisions traverse the Tevatron beam pipe,
the silicon-strip vertex detector, and then the COT
drift chamber volume. The charged particles ionize
the gas in the drift chamber volume; the tracks are
measured from the time of arrival of the ionization
on the sense wires of the COT [1]. These electrical
pulses ("hits") are amplified and shaped by the
Amplifier Shaper Discriminator (ASDQ) cards [1]
directly on the end-plates of the COT, and
transmitted to Repeater cards that drive the
cables to the VME crates on the outside of
the magnet yoke that contain the TDC boards.
The TDC is used to digitize the time of arrival and,
as a measure of pulse height, the width of the
pulses from the Repeater cards.
The time-to-digital conversion is implemented
with two Altera Stratix FPGAs, each handling 48
sense wires. This device has an LVDS differential
I/O interface that consists of one dedicated serializer/deserializer circuit [13] for each of the 48
differential I/O pairs. Serial data are received
along with a low-frequency clock. An internal
Phase-Locked Loop (PLL) multiplies the incoming
clock by a factor of 1 to 10. Each input signal is
sampled at the resulting high-frequency clock
rate, converting it to a (1-bit) serial stream, which
is then shifted serially through a shift register.
The shift register is read out as a parallel word
at the low-frequency clock rate, thus converting the serial data stream into a parallel data
stream that contains the input data sampled at the
higher clock rate. In this application the low-frequency
clock is internally generated with a 12 ns
period and the multiplier factor is set to 10, for a
resulting 1.2 ns sampling period of the incoming
LVDS signal and a 10-bit wide parallel output
data stream.
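The serial-to-parallel step can be pictured with a small software model. The following is only an illustrative sketch, not the firmware: the function name `deserialize` and the choice to list the earliest sample first in each word are assumptions made here, and times are kept in integer picoseconds to avoid rounding at bin boundaries.

```python
# Toy model of the serial-to-parallel conversion; all names are hypothetical.
BIN_PS = 1200         # 1.2 ns sampling bin, in integer picoseconds
BITS_PER_WORD = 10    # ten samples per 12 ns clock period

def deserialize(pulse_start_ps, pulse_end_ps, n_words):
    """Sample an ideal pulse in 1.2 ns bins and pack it into 10-bit words.

    Each word is returned as a string with the earliest sample first
    (an assumed ordering, chosen for readability).
    """
    words = []
    for w in range(n_words):
        bits = []
        for b in range(BITS_PER_WORD):
            t = (w * BITS_PER_WORD + b) * BIN_PS      # start time of this bin
            bits.append("1" if pulse_start_ps <= t < pulse_end_ps else "0")
        words.append("".join(bits))
    return words

# A pulse that is high from 8.4 ns to 28.8 ns after the start of word 0.
words = deserialize(8400, 28800, n_words=3)
print(" ".join(words))
```

With these inputs the model yields three successive words, one per 12 ns clock, in which the pulse occupies 17 consecutive 1.2 ns bins.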
Fig. 2 illustrates the serial-to-parallel conversion,
as seen in the Altera Quartus II [14] simulation
window. The input pulse is converted into a 10-bit
parallel data stream, clocked out as successive
words at a 12 ns period. The leading edge of a hit
in the tracking chamber is determined by inspecting
this stream, two consecutive words at a time, and
counting the number of '0' bits before the first '1'
bit of a string of at least four '1' bits. The width of
a hit is calculated by counting the number of
successive bits (either '0' or '1') until the start of a
string of at least four consecutive '0' bits occurs.
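The edge-finding rule just described can be sketched in a few lines. This is a simplified illustration under stated assumptions: the whole sampled stream is treated as one string with the earliest bin first, whereas the firmware works on two 10-bit words at a time; the function name `find_hit` is hypothetical.

```python
def find_hit(bits, min_run=4):
    """Return (leading_edge_bin, width_bins) of the first hit, or None.

    bits: string of '0'/'1' samples, earliest first (assumed ordering).
    A hit starts at the first '1' of a run of at least `min_run` ones and
    ends where a run of at least `min_run` zeros begins.
    """
    start = bits.find("1" * min_run)
    if start < 0:
        return None                      # no run of four ones: no hit
    end = bits.find("0" * min_run, start)
    if end < 0:
        end = len(bits)                  # pulse never returns low for 4 bins
    return start, end - start

stream = "0000000111" + "1111111111" + "1111000000"
print(find_hit(stream))
```

For this three-word stream the leading edge falls in bin 7 and the width is 17 bins, i.e. 17 × 1.2 ns. Runs shorter than four bins are ignored, which is how the 4.8 ns minimum width and minimum separation arise in this model.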

4. TDC Board: Block diagram


The physical layout and the data flow on the
board are presented in Fig. 3. We step through
each element in turn below.

Table 1
The physical and operational characteristics of the CDF-II TDC

Characteristic                Values                   Comment
----------------------------  -----------------------  ----------------------------
TDC digitization performance
Channels                      96                       48/FPGA
Time bin size                 1.2 ns
Hits/channel                  ≤7 hits                  Configurable via VME
Min time between hits         4.8 ns
Full scale range              304.8 ns                 254 × 1.2 ns
Min width/hit                 4.8 ns
Max width/hit                 304.8 ns
Differential non-linearity    <200 ps                  See Section 8.1
Accuracy (jitter)             ≤1 count
Channel-to-channel skew       <100 ps                  See Section 8.1
L1 pipe-line size             512 words = 6.144 μs     480-bit words
Test data RAM size            512 words = 6.144 μs     480-bit words
L2 buffer length              ≤64 words/768 ns         480-bit words; Configurable
Processing time               12 μs after L2A          Includes readout packing
Min interval between L2As     12 μs                    L2A is Level-2 Accept

Prompt (Trigger) Outputs
XFT trigger bits/wire         6                        From 11 time windows
XFT time-window               6 ns                     Min 1.2 ns; Max 12 ns
# of XFT time-windows         11/wire                  Mapped into 6 trigger bits
Trigger latency               80 ns after BC           First word out
Trigger output freq.          32 bits/22 ns
Trigger output                43 pins                  TTL, on VME P3

Readout characteristics
VME interface                 VME64                    Implemented in FPGA
VME readout modes             A32/D32, A32/D64         D64 in CBLT mode only
CBLT64 transfer rate          47 MBytes/s              Burst speed
Test modes                    Data generator           Internal 8192-word memory

Physical characteristics
Physical format               9U VME                   ANSI VIPA [9]
Power requirements (V/A)      5 V/15 A; 5 V/2 A
Input connectors              68-pin                   Mini-D Ribbon
Input levels                  LVDS                     CDF uses quasi-LVDS [12]
Front panel LEDs              1 Triple LED/FPGA        Configurable in firmware
Trigger output connector      VME P3

[Figure 2: simulation traces showing the input pulse with its leading and trailing edges, the
12-ns clock, and the ten output bits (bit 9 to bit 0) for Words 0 to 6; the example parallel
output reads 0000000111 1111111111 1111000000.]

Fig. 2. An example serial-to-parallel conversion of one of the 96 input channels into a 10-bit-wide parallel data stream as seen in the
Quartus II [14] simulation window. The top two traces are one input line from the tracking chamber and the locally-generated 12-ns
clock. The next 10 traces are the 10 bits of parallel data from the serial-to-parallel conversion, output as one word every 12 ns. The
parallel data, shown at the bottom of the figure, are then examined two words at a time for transitions that signify a leading or trailing
edge of a hit in the COT.

• The Front Panel (on the left in Fig. 3) receives
  96 differential inputs, arranged in four 24-channel
  connectors, that receive pulses from
  the amplifier/shaper circuits of the COT. The
  signals are first applied to a receiver block
  that converts them from a CDF-specific
  quasi-LVDS signal [12] into standard LVDS
  and passes 48 of them directly to each of the
  two FPGA TDC chips, which have identical
  designs.
• Each of the two TDC FPGAs ("TDC Chips")
  does the time-to-digital conversion for 48 wires
  to generate the Hit-Count and Hit-Data results.
  These are stored in internal VME Readout
  buffers implemented on the chip. The TDC
  Chips also generate prompt data for the Level-1
  track trigger processor (XFT).
• The VME interface block is implemented with
  an Altera Apex FPGA [15]. This block coordinates
  VME access to the TDC Chips for
  regular and Chain Block Transfers (CBLT) [9]
  in both 32- and 64-bit modes. The VME chip
  itself is connected to only the 16 least significant
  bits of the VME data bus (the TDC Chips
  connect to all 32 data lines).
• Control signals from the CDF data acquisition
  and trigger systems are brought onto the TDC
  board using user-defined pins of the VME P2
  backplane connector. The control signals are
  bussed on the backplane of the CDF-standard
  9U VME/VIPA crate [9,10] from a CDF Tracer
  Card [16] to each of the TDC boards. The
  specific control signals used by the TDC are as
  follows:
  - The CDF system clock: this is the master
    reference signal for the CDF data-acquisition
    system. The clock has a 132-ns period
    and is synchronous with the accelerator RF
    structure. On the TDC board the differential
    PECL signal from the backplane is first
    converted to TTL, then phase-locked, buffered
    and applied to the TDC chips. The
    clock signal is also applied to the TDC chips
    after passing through a pair of programmable
    delay lines (0.25 ns granularity, 64 ns
    range).
  - The Bunch Crossing signal (BC), indicating
    that a clock corresponds to a crossing of
    proton and antiproton bunches.
  - The Bunch 0 signal (B0), which marks the first
    proton bunch and comes once per cycle
    around the Tevatron ring.
  - The Level 1 and 2 trigger Accept/Reject
    signals, as well as the Level 2 Buffer address
    bits.
  - A calibration pulse from the VME backplane.
    This is converted from PECL to ECL
    and can be applied to a pair of pins on each
    of the four front panel connectors. The pulse
    is thus sent to the amplifier-shaper-discriminator
    card (ASDQ) of the COT [1] and is
    used for testing and calibration.

[Figure 3: board layout diagram. Labels in the figure: FP Conn1 to FP Conn4 (68 pin LVDS, from
COT), four Receiver blocks, TDC Chip_0 and TDC Chip_1 (STRATIX EP1S30F780C6), VME
(EP20K100QC240-3V), CDF_CLK_DEL (2 x 3D3418-0.25S), VME Buffers, Buffer, P2 (CDF Control, VME),
P3 (to XFT).]

Fig. 3. The physical layout of the TDC board. The four input connectors, each with 24 LVDS channels, are on the left; the VME
backplane connectors are on the right. The elements are described in turn in the text.

• The VME P3 backplane connector is used to
  transmit trigger flags generated by the TDC to
  the eXtremely Fast Tracker (XFT) processor to
  identify tracks in the COT for the Level-1
  trigger.
• Each FPGA is connected to a 20-pin header
  so that it is easy to use a logic analyzer for
  testing and diagnosis. Signals can be routed
  to the header by programming the FPGA
  firmware.

5. The TDC FPGA Chip


The block diagram of a TDC Chip is presented
in Fig. 4. There are two major data paths inside
the TDC Chip, one to record the COT hits for
VME readout, and one for the generation of the
prompt trigger bits (Trigger Primitives) for the
XFT trigger track processor. The Chip is also
provided with a Test Data generator, an LVDS
pulse generator and a PLL clock generator.
The major functional blocks inside the TDC
Chip are described individually below. These are
implemented as firmware and are optimized for the
CDF application; other applications can be
accommodated by firmware changes.
5.1. The input block
Each TDC Chip has four banks of 12 high-speed
LVDS inputs. From each bank a 120-bit
wide data bus passes data to the MUX/MASK
block (see Fig. 4), which can be set under VME
control to block out any unwanted channel (for
example, a COT wire that is continuously set true
due to some failure). The MUX/MASK block
also allows internal testing of the TDC Chip by
allowing the inputs to be switched to a test pattern
generated inside the Chip with the Test Data
Generator block (described in Section 5.5). The
fast digitization and conversion to a 10-bit wide
data stream for each channel then follows.

[Figure 4: functional block diagram. Labels in the figure: Inputs from COT, SERDES IN,
MUX/MASK*, the Pipeline (RAM*, 512 words), L2 Buffers 00, 01, 10, 11 (RAM*, 64 words; write and
read ports), Edge Detector*, Hit Data Buffer* (168 VME32 words), Hit Count Buffer* (7 VME32
words), TEST-DATA RAM*, VME Decoder, VME Access(*), To VME Interface, L1A, L2A, XFT Block*,
XFT-DAQ*, To P3, CDF CLOCK, PLL (12 ns, 22 ns, 66 ns clocks), Delayed Tx Pulse, RAM* (512
words), SERDES OUT, Tx Out Pulse to front panel.]

Fig. 4. The functional block diagram of the TDC FPGA (TDC Chip). All processing is determined by programming in firmware. Each
Chip handles 48 LVDS channels (shown as coming into the SERDES block in the upper left). The prompt trigger flags for the XFT
trigger processor are output through the P3 VME connector. Data are read out by the CDF Data Acquisition system from the Hit
Count and Hit Data Buffers. An asterisk indicates registers or memories that are VME-accessible. The individual blocks in the diagram
are described in the text.
5.2. The pipeline and the Level 2 buffer system
The CDF Level-1 trigger is deadtime-less, with
all front-end data held in a pipeline for 5.544 μs
while the Level-1 trigger decision is being made
[17]. On the TDC card the delay is implemented
with a clocked pipeline. On receipt of a Beam
Crossing (BC) signal, an input memory address
counter is set to zero; the counter then increments
on every 12 ns clock. The phase of this input pulse
can be adjusted on each board in 0.25 ns steps, set
by VME, to compensate for signal propagation
and input signal length differences. The memory is
512 words deep, each word containing 480 bits (48
channels × 10 bits/channel). The memory has two
ports; the first port is always writing and the
second port is always reading. The adjustable
address of the second port, an offset from the first
port, is set to establish the desired delay period for
the pipeline. The maximum delay is 6.1 μs.
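The two-port memory with an offset read address behaves as a fixed delay line. The following toy model is a sketch under stated assumptions, not the firmware: the class name `Pipeline` and its methods are hypothetical, the reset of the address counter on the BC signal is omitted, and the offset of 462 words is simply 5.544 μs divided by the 12 ns clock.

```python
class Pipeline:
    """Toy model of a two-port circular memory used as a fixed delay line."""
    DEPTH = 512        # words in the memory
    CLOCK_NS = 12      # one word is written per clock

    def __init__(self, offset_words):
        if not 0 < offset_words < self.DEPTH:
            raise ValueError("offset must be between 1 and DEPTH - 1 words")
        self.mem = [None] * self.DEPTH
        self.write_addr = 0
        self.offset = offset_words

    def clock(self, word):
        """Write one word, then read the word written `offset` clocks earlier."""
        self.mem[self.write_addr] = word
        read_addr = (self.write_addr - self.offset) % self.DEPTH
        out = self.mem[read_addr]
        self.write_addr = (self.write_addr + 1) % self.DEPTH
        return out

    def delay_ns(self):
        return self.offset * self.CLOCK_NS

pipe = Pipeline(offset_words=462)            # 462 x 12 ns = 5544 ns
outputs = [pipe.clock(i) for i in range(1000)]
print(pipe.delay_ns(), outputs[462], outputs[999])
```

In this model the word written at clock i reappears at the read port at clock i + 462, and the full 512-word depth corresponds to 512 × 12 ns = 6144 ns, the pipe-line size quoted in Table 1.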
To reduce deadtime in the CDF Level-2 (L2)
trigger system, on a Level-1 Accept signal (L1A)
the data from a given beam crossing are transferred
to one of four Level-2 buffers, awaiting an L2
trigger decision [17]. The four L2 buffers are
independently controlled by an accompanying L2
buffer-selection signal. These signals are in phase
with the CDF clock pulse. The L2 buffers are
two-port memories; each has a respective write
address counter for the input write port and all
share a single address counter for their (output)
read port. This allows the writing of a second
Level-1 buffer while a first is still collecting data.
The write clock is synchronous with the logic in
the pipeline, 12 ns per tick. The data are written
10-bits-wide per channel, so that 48 channels are
written at each tick in a 480-bit wide word. To
achieve a maximum of 384 ns for the time range,
the maximum length is 32 words, set via VME at
initialization (the maximum drift time in the CDF
COT is 200 ns).
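The relation between buffer length and time range is simple arithmetic on the 12 ns write clock. A minimal sketch, with hypothetical helper names:

```python
CLOCK_NS = 12   # one 480-bit word is written per 12 ns tick

def range_ns(n_words):
    """Time range covered by a buffer of n_words."""
    return n_words * CLOCK_NS

def words_needed(window_ns):
    """Smallest number of words covering window_ns (integer ceiling division)."""
    return -(-window_ns // CLOCK_NS)

print(range_ns(32), words_needed(200))
```

A 32-word buffer spans 384 ns, comfortably more than the 17 words needed to cover the 200 ns maximum drift time.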
A Level-2 Accept signal, together with an
address pointer, selects one of the four Level-2
buffers to be transferred to the Edge Detector
Block. The read port of the selected memory is
driven through the full range of stored data
addresses to present all the stored data. As not
all buffers contain data from beam crossings that
pass the Level-2 trigger (and therefore these would
not receive a Level-2 Accept signal), the logic
allows any Level-2 buffer to be overwritten
whether or not it has been read. No data memory
buffer is ever erased.
5.3. The edge detector block
The purpose of the Edge Detector Block is to
find hits on the wires. Hits are defined as pulses of
at least 4.8 ns in width. Since the pulses can be
of indefinite length, the techniques of pattern
matching or look-up tables cannot be used. The
technique used in the Edge Detector Block is to
look for leading and trailing edges of a pulse. A
leading edge is defined as a transition from low (0)
to high (1) and a trailing edge is defined as a
transition from high (1) to low (0). It is assumed
that all wires start out in a low (0) state and the
first transition to find is a leading edge.
The Edge Detector Block is made up of two
modules. The first, called the ED, finds and stores
the edges on each of the 48 wires. The second,
called the ED48, controls the timing of the data
transfer into the Edge Detector Block, collects and
packs the output hit data, and signals when the
Block is finished. Each wire has its own dedicated
ED module, making 48 on each TDC Chip; there
is only one ED48 module on each Chip.
The data from the Level-2 Buffer are fed into
the Edge Detector Block in 10-bit words. A single
ED looks for hits in two consecutive 10-bit words
at a time. The beginning of a hit is defined as a
zero followed by at least four ones, or in the case
of the first word, four ones in a row. The end of a
hit is defined as a one followed by four zeroes.
There are three possible transitions in each word
and each transition needs a memory cycle.
The maximum number of words looked at per
beam crossing for each wire is variable, with a
maximum of 33 words (396/12). The maximum is
set by the time between Tevatron beam crossings,
396 ns, and the data clocking period of 12 ns for
each word (1.2 ns per bit times 10 bits per word).
The number of words to be searched by the ED is
set in a register via VME at initialization.
5.3.1. A worked example
Table 2 shows the relationship between the
position of bits in the words sent to the ED and
their respective time value as an example in a word
that has three possible transitions.

Table 2
A sample word showing three possible transitions in the data word sent to the Edge Detector

Bit position   [9]  [8]  [7]  [6]  [5]  [4]  [3]  [2]  [1]  [0]
Sample data     0    1    1    1    1    0    0    0    0    1
Time value      0    1    2    3    4    5    6    7    8    9

The first is a leading edge starting at time value 1 and ending at
time value 4, which is the second transition. There is also
possibly another leading edge transition starting at time value 9,
depending on what is in the next word. Time runs from
early (bit position [9]) to late (bit position [0]).

In this case, there is a leading edge (transition
#1) starting at time value 1 and ending at time
value 4 (transition #2). There is also possibly
another leading edge (transition #3), starting at
time value 9, depending on what is in the next
word. In this case, the ED, which looks at 2 words
at a time, would find a hit if the next word started
with three ones.
Once a hit has been found, the data describing
the hit are stored in a RAM in the ED and the hit
total is incremented. Each hit is characterized by
the number of the time bin of the leading edge and
the width, expressed as the number of time bins in
the hit. Thus if the data in the example of Table 2
were the first word in the data stream, the leading
edge stored would be 1 and the width would be 4.
If instead the example data were the third word,
the leading edge time would be 21 (10 bits each for
the first and second words plus time value 1 in the
third word) and the width would still be 4.
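The arithmetic of this worked example can be checked with a few lines of code. This is only an illustrative sketch: the sample word is written with bit position [9] (time value 0) first, the word index is counted from zero, and the helper name `absolute_bin` is hypothetical.

```python
BITS_PER_WORD = 10

def absolute_bin(word_index, time_value):
    """Time bin within the data stream; word_index counts from 0."""
    return word_index * BITS_PER_WORD + time_value

SAMPLE = "0111100001"    # Table 2 sample data, time value 0 on the left

lead = SAMPLE.index("1")                  # time value of the first high bin
width = SAMPLE.index("0", lead) - lead    # number of high bins in the pulse

print(lead, width, absolute_bin(0, lead), absolute_bin(2, lead))
```

The sample word gives a leading edge at time value 1 and a width of 4; placed as the first word the stored leading edge is 1, and placed as the third word it is 21, as stated in the text.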
5.4. The XFT Block
The choice of implementing the TDC functionality by firmware in an FPGA facilitates
integrating an associated functionality, the implementation
of complex deadtimeless pattern recognition to
feed the tracking trigger system. The TDC XFT
block generates Trigger Primitives used by the
eXtremely Fast Tracker (XFT) [18], which identifies
tracks in the Central Outer Tracker (COT).
The tracks are used in the Level 1 Trigger, as well
as in Level 2, and consequently the complete
pattern recognition and momentum reconstruction
have to be available within 5.5 μs after the beam
crossing. The sense-wire planes of each superlayer
in the COT are tilted in the r–φ plane so that a
high momentum track will traverse each superlayer
[1], and consequently will travel between a pair of
sense wires in each plane, resulting in some hits
that are prompt. The present Run IIa XFT splits
the 396 ns interval between beam crossings into
three time bins, a "prompt", a "not sure", and a "delayed"
time bin. These three bins are logically combined to
give 2 trigger bits per wire every 396 ns. The new
TDC/XFT design described here uses 11 time
windows to produce 6 output trigger bits per COT
wire every 396 ns. The 6 bits, referred to below
as Trigger Primitives, are derived from the hit
occupancies in the 11 time windows with Boolean
logic. The larger number of time windows allows
better momentum resolution and fake track rejection.
The Trigger Primitives, consisting of the 6 bits
per wire times 48 wires, are output each 396 ns
from the XFT block on each of the 2 FPGAs.
Every 22 ns the TDC Chip sends out 16 bits to
the P3 backplane (48 bits for 66 ns, and so 48 ×
6 = 288 bits for 396 ns). In order to speed up the
transmission of data to the XFT, the bit for each
of the time bins is transmitted in turn after the
calculation for its corresponding three time-windows (see below) is completed. The time windows
can be reprogrammed to optimize performance by
changing data in VME registers without firmware
changes.
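The output bandwidth quoted above can be verified by counting transfers per beam crossing. The constant names below are chosen here for illustration only.

```python
WIRES_PER_CHIP = 48
BITS_PER_WIRE = 6
CROSSING_NS = 396
OUTPUT_CLOCK_NS = 22
BITS_PER_TRANSFER = 16    # per TDC Chip, every 22 ns

primitives_per_crossing = WIRES_PER_CHIP * BITS_PER_WIRE       # bits to send
transfers_per_crossing = CROSSING_NS // OUTPUT_CLOCK_NS        # 22 ns cycles
capacity_per_crossing = transfers_per_crossing * BITS_PER_TRANSFER

print(primitives_per_crossing, transfers_per_crossing, capacity_per_crossing)
```

The 18 transfers of 16 bits available in one 396 ns crossing carry exactly the 288 Trigger Primitive bits of one chip, so the link has no spare capacity at this word width.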
The TDC XFT block receives two types of
signals, the CDF control pulses Bunch 0 (B0) and
Bunch Crossing (BC), and the primary (digital)
data-stream from the COT. The XFT block sends
out timing alignment signals and the Trigger
Primitives (trigger bits) to the P3 connector on
the VME backplane for transmission to the XFT.
The TDC XFT block is controlled by two VME
registers and an internal RAM [19]. The values of
the registers determine the input and output
delays, and the RAM contents define the time-window intervals.
The TDC XFT block includes three main blocks
(Fig. 5):

[Figure 5: block diagram. Labels in the figure: Data Stream, 48 Occupancy Detectors, Trigger
Bits, Time-window enable bits, Time-window clear bits, Trigger Logic Control, Output
Multiplexer, Trigger Primitives, CDF_CLK, BC, B0, BC Delayed.]

Fig. 5. Block diagram of the TDC XFT Logic. The data stream comes from the MUX/MASK Block (Section 5.1) as 480-bit words
every 12 ns. The CDF_CLK, BC and B0 signals are the 132 ns CDF master clock, and the Bunch Crossing and Bunch Zero signals
generated by the Tevatron, and are transmitted to the VME backplane through the CDF DAQ system [16]. The TDC XFT block sends
out the Trigger Primitives, which are the multiplexed trigger bits, and trigger control signals.

• Occupancy Detectors (OD).
  The 48 Occupancy Detectors, one per channel,
  receive the digitized COT data stream (48
  channels × 10 bits/channel every 12 ns), perform
  hit recognition, and send out 48 × 6
  Trigger Primitive bits every bunch crossing to
  the Output Multiplexer. Each OD looks for a
  '1111' pattern (a hit) in two consecutive words
  in the data stream (20 bits), which corresponds
  to a 24 ns time interval. The search for hits is
  performed separately by two Hit Scanners
  inside each OD: one for hits starting in the first
  5 bits (6 ns), and the second for hits in the next 5
  bits of the 20-bit data segment. Each Occupancy
  Detector contains 11 independent 1-bit
  registers (Time-Window Registers), controlled
  by the Trigger Logic Control, that store hit
  information. Each register records if there is a
  hit during the corresponding 12-ns time-window.
  In the current firmware design each
  time-window is controlled by two bits (time-window
  control bits), used to implement
  separate hit scanning for the first and the last
  6 ns of each 12-ns clock interval. If the first
  control bit is high, the register is sensitive to the
  Hit Scanner for the first 6 ns of every 12 ns
  interval (the first 5 bits of the 10-bit data word).
  Similarly, if the second bit is high, the register is
  sensitive to the Hit Scanner for the last 6 ns of
  every 12 ns (last 5 bits of the 10-bit data word). The
  time-window register records the logical OR of
  the output of the two Hit Scanners. For each of
  the 11 registers, the time-windows covering a
  396 ns cycle (33 12-ns intervals) are defined by a
  2 × 33 bitmap stored in the Trigger Logic
  Control. At the end of every 396-ns cycle the
  Time Window Register bits are sent to the
  Output Multiplexer. The registers are then
  cleared.
  This implementation allows defining time-window
  ranges in units of 6 ns for the CDF XFT.
  However, hit scanning can be done separately
  for every bit of the 10-bit data word (this would
  require configuring the firmware for 10 time-window
  control bits per channel). In this case
  the time-window unit would be 1.2 ns. The 6 ns
  resolution is thought to be sufficient for the
  existing XFT system.


• Trigger Logic Control (TLC). The TLC controls
  the XFT logic. Every 12 ns the Trigger
  Logic Control block sends 22 bits, two per time-window,
  to the Occupancy Detectors, defining
  which of the 11 time-windows are enabled for this
  clock cycle. All Occupancy Detectors get the
  same 22 time-window bits. These time-window
  bits are stored in a 22-bit wide RAM, accessible
  via VME. The RAM's Address Counter is
  incremented every 12-ns clock and hence the
  time-window bits refresh every 12 ns. The
  Address Counter starts to count on receiving
  an XFT-Enable pulse, which is adjusted to be
  delayed by the same amount as the COT data
  stream relative to the Bunch Crossing (BC) and
  Bunch Zero (B0).
 Output Multiplexer (OM). This receives the
trigger bits from the 48 Occupancy Detectors
and sends the Trigger Primitives in parallel
to the P3 backplane. Each OM sends out a
16-bit Trigger Primitive word every 22 ns on
the P3 connector to the XFT. The signals
are buffered to the P3 connector as TTL levels.
The OM also sends synchronously a Word_0
marker, the B0 marker if appropriate (i.e. the
crossing is that of the Tevatron bunch 0), and
an alignment signal (Data Strobe). The OM
does not perform any logical operations on
the trigger bits, but sends them in the order
required by the XFT [18].
The existing cables to the XFT have too low a
band-width to transmit the 22-ns Main Clock as
a data strobe, and so Trigger Primitive bits are
sent on the leading and trailing edges of a
slower clock which is also transmitted on the
cables. The Data Strobe (DS) is a 44 ns clock
formed from doubling the period of the Main
Clock. Thirty-two bits, 16 from each TDC
Chip, are sent every 22 ns, with 18 such cycles in
the beam-crossing period of 396 ns.
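The double-edge transfer described above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not taken from the firmware: it lists the word slots of one 396-ns crossing and labels each with the Data Strobe edge that would accompany it. Which edge carries the first word is an assumption made for the example, and the names `word_schedule`, `MAIN_CLOCK_NS` and so on are hypothetical.

```python
# Illustrative timing model of the Trigger Primitive transfer (not firmware code).
MAIN_CLOCK_NS = 22               # one 16-bit word per TDC Chip every 22 ns (from the text)
CROSSING_NS = 396                # beam-crossing period (from the text)
STROBE_NS = 2 * MAIN_CLOCK_NS    # Data Strobe period: the Main Clock period doubled
BITS_PER_WORD = 16
CHIPS_PER_BOARD = 2


def word_schedule():
    """Return (word index, start time in ns, strobe edge) for one crossing."""
    n_words = CROSSING_NS // MAIN_CLOCK_NS
    schedule = []
    for i in range(n_words):
        # Assumption: even-numbered words ride the leading edge of the strobe.
        edge = "leading" if i % 2 == 0 else "trailing"
        schedule.append((i, i * MAIN_CLOCK_NS, edge))
    return schedule


schedule = word_schedule()
bits_per_crossing = len(schedule) * BITS_PER_WORD * CHIPS_PER_BOARD

for index, start_ns, edge in schedule[:4]:
    print(f"word {index:2d} at {start_ns:3d} ns on the {edge} edge")
print(f"{len(schedule)} words, {bits_per_crossing} bits per board per crossing")
```

Because every strobe period carries two words, the strobe toggles at half the rate of the Main Clock while the data rate stays at 32 bits per 22 ns per board.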
5.4.1. TDC XFT-DAQ Block
The TDC XFT-DAQ block of the firmware,
used for testing and diagnostic purposes only,
connects to a dedicated DAQ system similar to the
hit-data stream. It has the same structure, consisting of a Pipeline, L2 buffers, and VME-readout
buffers, and follows the same L1A/L2A sequence
as that of the main hit-data stream. The length of
the XFT-L2 buffers is also VME controlled. For
testing, the XFT block is fitted with another simple
VME Readout RAM that contains the current
XFT Trigger Primitives. This RAM can be frozen
and read out via VME for diagnostic tests, in
particular for debugging the TDC-XFT connection.
5.5. The Test Data Generator
The Test Data Pattern Generator inside the
TDC Chip allows testing of the functionality of the
TDC by running test data through the entire
sequence. The Pattern Generator is implemented
in the firmware as a VME accessible dual-port,
512-bit wide RAM, which can store 8192 32-bit
words. The first 480 bits of the 512 are used to
drive the 48 channels with 10-bit words of test
data. The Pattern Generator is used in the tests
described below in Section 8.
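As an illustration of the RAM layout just described, the following Python sketch packs 48 ten-bit test words into one 512-bit row and splits the row into 32-bit words. The bit ordering (channel 0 in the lowest 10 bits, lowest 32-bit word first) and the helper names are assumptions for the example; the text states only that the first 480 bits carry the channel data.

```python
# Sketch of one Pattern Generator row: 48 channels x 10 bits in a 512-bit word.
N_CHANNELS = 48
BITS_PER_CHANNEL = 10
ROW_BITS = 512
N_ROWS = (8192 * 32) // ROW_BITS   # 8192 32-bit words form 512 rows of 512 bits


def pack_row(samples):
    """Pack 48 ten-bit samples into a single integer representing one RAM row."""
    if len(samples) != N_CHANNELS:
        raise ValueError("expected one sample per channel")
    row = 0
    for channel, sample in enumerate(samples):
        if not 0 <= sample < (1 << BITS_PER_CHANNEL):
            raise ValueError("sample does not fit in 10 bits")
        # Assumed ordering: channel 0 occupies the lowest 10 bits.
        row |= sample << (channel * BITS_PER_CHANNEL)
    return row


def unpack_row(row):
    """Recover the 48 ten-bit samples from a packed row."""
    mask = (1 << BITS_PER_CHANNEL) - 1
    return [(row >> (ch * BITS_PER_CHANNEL)) & mask for ch in range(N_CHANNELS)]


def row_to_vme_words(row):
    """Split a 512-bit row into sixteen 32-bit words, lowest word first."""
    return [(row >> (32 * i)) & 0xFFFFFFFF for i in range(ROW_BITS // 32)]


# Example: four consecutive 1.2-ns bins set on channel 5 only (a 4.8-ns pulse).
samples = [0] * N_CHANNELS
samples[5] = 0b0000111100
row = pack_row(samples)
words = row_to_vme_words(row)
```

With this layout the top 32 bits of every row are unused, which matches the statement that only 480 of the 512 bits drive the channels.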
5.6. The ASDQ Pulse Generator
The COT electronics chain of ASDQ-Repeater-TDC boards can currently be calibrated by having
each TDC send a differential-ECL calibration
pulse upstream on a pair of dedicated lines in the
multi-conductor signal cable to its associated
ASDQ cards. Each ASDQ uses this input to
generate a calibration pulse at its input, which then
propagates back through the data lines of the same
signal cable [1]. This loop allows a time calibration
of the signal path, including the actual cable
length. Multiple chains can be calibrated simultaneously by providing the bussed calibration backplane signal to each TDC (see Section 4). The
TDC firmware also provides for local generation
on the TDC of the pulses to the ASDQ cards. The
choice of the two calibration modes is made by
VME register selection.
In the local generation mode, the SERDES
OUT block (see Fig. 4) generates a serial LVDS
pulse pattern, available as an ECL signal on the
front panel. The timing and number of pulses are
controlled via VME by writing the contents of the
Tx Pulse RAM (Fig. 4), which is implemented as a
10-bit wide, 512-word memory.
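To make the role of the Tx Pulse RAM concrete, the sketch below fills a 512-word, 10-bit memory from a list of pulse start times and widths. It assumes that each RAM word is serialized over one 12-ns clock period, i.e. 1.2 ns per bit with bit 0 sent first; the text does not state the output bit period or ordering, so both are assumptions, and `build_pulse_ram` is a hypothetical helper.

```python
# Sketch: fill a 512-word x 10-bit pulse RAM from a list of pulses.
RAM_WORDS = 512
BITS_PER_WORD = 10
BIT_PS = 1200   # assumed serial bit period: 1.2 ns, expressed in picoseconds


def build_pulse_ram(pulses_ps):
    """pulses_ps: iterable of (start_ps, width_ps), both multiples of BIT_PS."""
    ram = [0] * RAM_WORDS
    total_bits = RAM_WORDS * BITS_PER_WORD
    for start_ps, width_ps in pulses_ps:
        if start_ps % BIT_PS or width_ps % BIT_PS:
            raise ValueError("pulse edges must fall on 1.2-ns bit boundaries")
        first = start_ps // BIT_PS
        last = first + width_ps // BIT_PS
        if last > total_bits:
            raise ValueError("pulse extends past the end of the RAM")
        for bit in range(first, last):
            word, offset = divmod(bit, BITS_PER_WORD)
            ram[word] |= 1 << offset   # assumed order: bit 0 is sent first
    return ram


# Two 12-ns-wide pulses, 396 ns apart.
ram = build_pulse_ram([(0, 12_000), (396_000, 12_000)])
pattern_length_ns = RAM_WORDS * BITS_PER_WORD * BIT_PS / 1000
```

Under these assumptions the full memory spans 6144 ns, so a single pass through the RAM would cover several beam-crossing periods.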

5.7. The Clock Generator Block


The Clock Generator Block, implemented in
each TDC FPGA with PLLs, generates the 12-ns
and 22-ns clocks used inside the Chip. All the
clocks are synchronous with the delayed CDF
clock. The prompt CDF clock is also received and
is used to latch the CDF-specific back-plane
control signals (described in Section 4). Four 12-ns clocks are generated onto output pins of the
FPGA as LVDS signals and routed back into the
Chip onto the FPGA's dedicated high-speed clock
input pins, one for each high-speed I/O bank.

6. The VME Interface Block


The VME Interface is implemented with a third,
smaller, Altera FPGA [15]. The design permits
Chained Block Transfer (CBLT) read commands
in both 32- and 64-bit modes for data transfer.
CBLT uses geographical addresses different from
the ones normally used in the crate, recognized
only by the participating modules. In CDF there
are typically 18 TDC cards per VME crate.
There are two possible CBLT read commands:
(1) Read Block Transfer from virtual slot 30: Hit
Count words are read from every TDC module
in the crate. Each TDC produces 14 words/
board in 32-bit mode and 8 words/board in 64-bit mode.
(2) Read Block Transfer from virtual slot 31: Hit
Data words are read from every TDC module
in the crate. Each TDC produces up to 336
words/board in 32-bit mode and up to 168
words/board in 64-bit mode.
In this firmware implementation, the CBLT mode
is enabled by default. Up to 18 TDC cards sit in a
VME crate on the CDF detector; the TDC module
closest to the Crate CPU is automatically considered first in the chain. The setting of a module
as last in the chain, or the possible removal of a
module from the readout, is done by writing to a
register in the module's VME Chip to disable
CBLT mode [19].


7. Module power requirements


The TDC board receives 5 V/15 A and
5 V/2 A on the P0, P1, and P2 backplane
connectors. The board generates 1.5 V/15 A
and 3.3 V/10 A with DC/DC converters, and
2.5 V/3 A using a linear regulator. A 3.3 V
voltage is also generated and passed through the
front panel connectors to the amplifier-discriminator-shaper card (ASDQ) [1]. Spare locations on
the TDC board are provided for two additional
DC/DC converters.

8. Performance of the TDC and Readout


The implementation of both the TDC functionality and the processor with its extensive trigger
logic and self-testing capabilities in one chip with
digital inputs and outputs has an impact on
characterizing the properties of the TDC. We
discuss the linearity, cross-talk, and channel-to-channel time skew in turn below [20]. We also
describe the bandwidth of the module for both 32- and 64-bit VME block transfers.
8.1. Linearity
Unlike in more conventional analog TDCs, the
time-to-digital conversion is basically digital, with
the time binning determined by the sampling of the
input signal during de-serializing. This is controlled by a phase-locked loop. The PLL locks a
local oscillator; an error bit monitors loss of lock.
The jitter in the clock driven by the PLL is
specified to be less than 200 ps; the drift in the PLL
is negligible [13]. The differential non-linearity
consequently is less than 200 ps. Channel-to-channel skew is specified to be less than 100 ps
[13]. Each of these effects is significantly smaller
than the least count of 1.2 ns; we (conservatively)
quote 1 count to be the inherent precision of the
device.
In the CDF application described here, the data
arrive at the front face of the TDC asynchronously
with the CDF clock. There is consequently an
undetermined phase between the clock and the
data, resulting in a possible additional 1 count
jitter.¹ Fig. 6 shows measurements of the linearity,
with no deviation from expectations.

Fig. 6. Measurement of linearity of one channel: counts versus
delay in the input pulse. Because the sampling is controlled by a
phase-locked loop the differential non-linearity is expected to
be less than the least count, consistent with the measurement.
[Plot: Hit Data Count (Time) versus Time Change of ASDQ Input Pulse, from T0 + 0 ns to +100 ns in 20-ns steps. Annotated average counts: 91.5, 108.1, 125.0, 142.0, 158.4 and 175.0. Slope is 1 count/1.202 ns.]
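The one-count uncertainty that comes from an undetermined clock phase can be illustrated with a simple binning model. The Python sketch below is not the firmware algorithm; it assumes only that a hit is assigned to the 1.2-ns bin in which it falls and that the phase offset lies anywhere within one bin. The function names are hypothetical, and times are kept in integer picoseconds to avoid rounding effects.

```python
# Sketch: effect of an unknown clock phase on a 1.2-ns-binned time measurement.
BIN_PS = 1200  # least count, 1.2 ns, in picoseconds


def count_for(arrival_ps, phase_ps):
    """Bin index recorded for a hit, given a phase offset with 0 <= phase < BIN_PS."""
    return (arrival_ps + phase_ps) // BIN_PS


def counts_over_all_phases(arrival_ps, step_ps=100):
    """Set of counts seen as the phase is swept across one full bin."""
    return {count_for(arrival_ps, phase) for phase in range(0, BIN_PS, step_ps)}


# A hit 20 ns after T0 lands in one of two adjacent bins, depending on the phase.
print(sorted(counts_over_all_phases(20_000)))

# Delaying the input by a further 20 ns shifts the count by 16 or 17 bins,
# since 20 ns / 1.2 ns is not an integer.
print(count_for(40_000, 0) - count_for(20_000, 0))
```

For a fixed phase the same arrival time always gives the same count, which is why this contribution is absent in a synchronous application.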
8.2. Channel-to-channel skew
Skew measurements were made from the LVDS
receiver output pins to the FPGA input pins. The
measuring point on the FPGA was accessed by
soldering test pick-off wires to the FPGA BGA
hole pattern on the solder-side of the PC board.
The channel-to-channel skew from contributions
other than the FPGA was measured to be less than
100 ps. Channel-to-channel variations inside the
FPGA are negligible.
8.3. Crosstalk
To test for crosstalk we compare the output of
the TDC boards with a reference set, computed
automatically by the test routines from the
input data stored in the RAM in the TDC Chips.
All test routines are implemented as C-code,
which is executed on the PowerPC (MVME2301
or MVME5500) crate controller after compilation.
Other crate controllers can also be used without
any significant differences. Both walking 1s and
0s, and random patterns were used. The test
routines exercise the Edge Detector (ED48 test),
the TDC XFT block (XFT test) and the Chain
Block Transfer in both 32- and 64-bit modes
(CBLT32/64 tests).

¹ This is not inherent in the TDC electronics, but is from the
application. For a synchronous application this does not apply.

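The pattern types named in the crosstalk test can be sketched as follows. The actual test routines are written in C and run on the crate controller; the Python below is only an illustration of walking-1s, walking-0s and random patterns across the 48 channels of one TDC Chip, with one bit per channel. The function names, the seed and the one-bit-per-channel representation are assumptions made for the example.

```python
# Sketch: walking-1s, walking-0s and random test patterns for 48 channels.
import random

N_CHANNELS = 48


def walking_ones(n=N_CHANNELS):
    """Yield n patterns, each with exactly one channel set."""
    for ch in range(n):
        yield 1 << ch


def walking_zeros(n=N_CHANNELS):
    """Yield n patterns, each with exactly one channel cleared."""
    mask = (1 << n) - 1
    for ch in range(n):
        yield mask & ~(1 << ch)


def random_patterns(count, seed=0, n=N_CHANNELS):
    """Yield reproducible pseudo-random patterns (the seed is arbitrary)."""
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.getrandbits(n)


def bit_errors(expected, observed):
    """Number of channels whose observed bit differs from the expected bit."""
    return bin(expected ^ observed).count("1")
```

A walking pattern exercises one channel against a quiet (or fully active) background, so any bit error in another channel points directly at coupling from the single channel that changed.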
A total of 5 × 10⁹ cycles were performed; zero
bit-errors were observed. This puts a limit at 90%
C.L. on the bit-error rate due to cross-talk inside the FPGA of 5 × 10⁻¹⁰.
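A limit of this size follows from standard Poisson statistics for zero observed events. The sketch below shows one common way to obtain it; it is not necessarily the calculation the authors used, and the function name is hypothetical.

```python
import math


def zero_event_upper_limit(n_trials, confidence=0.90):
    """Upper limit on a per-trial error probability when no errors were seen.

    For a Poisson process with zero observed events, the upper limit on the
    expected number of events at the given confidence is -ln(1 - confidence).
    """
    return -math.log(1.0 - confidence) / n_trials


limit = zero_event_upper_limit(5e9)
print(f"90% C.L. upper limit: {limit:.2e} errors per cycle")
```

With 5 × 10⁹ error-free cycles this gives about 4.6 × 10⁻¹⁰ per cycle, consistent with the rounded value quoted in the text.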
8.4. Readout bandwidth
The results of tests of the readout speed are
described below in more detail. The burst-mode
readout speed achieved in CBLT64 mode is
47 MB/s. The Edge Detector (ED48) processing
time is always less than 12 μs (for 7 hits/wire).
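The burst rate can be turned into a rough upper bound on the transfer time for one full-crate Hit Data read. The sketch below uses the maximum word count per board from Section 6 and takes 1 MB as 10⁶ bytes; it ignores any per-board or per-transfer overhead, which the text does not quantify. The names are hypothetical.

```python
# Sketch: time to read the largest Hit Data block from a full crate in CBLT64 mode.
BOARDS = 18
MAX_WORDS_64 = 168          # maximum Hit Data words per board in 64-bit mode
BYTES_PER_WORD_64 = 8
BURST_RATE_MB_S = 47.0      # measured CBLT64 burst rate; 1 MB assumed to be 10**6 bytes


def crate_readout_us(boards=BOARDS, words=MAX_WORDS_64, rate_mb_s=BURST_RATE_MB_S):
    """Transfer time in microseconds for one CBLT64 Hit Data read."""
    n_bytes = boards * words * BYTES_PER_WORD_64
    # 1 MB/s equals 1 byte per microsecond, so bytes / (MB/s) is in microseconds.
    return n_bytes / rate_mb_s


n_bytes = BOARDS * MAX_WORDS_64 * BYTES_PER_WORD_64
print(f"{n_bytes} bytes in about {crate_readout_us():.0f} us")
```

Under these assumptions a crate in which every wire records the maximum of 7 hits would take roughly half a millisecond to read out; typical events, with far fewer hits, would be correspondingly faster.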
8.5. Testing chain block transfer: the CBLT32/64 Test
The CBLT 32/64 tests read out multiple boards
in a crate sequentially. Tests were performed with
up to 18 TDC boards in a VME crate. The tests
are designed to check the VME chain-block
transfer capabilities of the board, and also crate-wide characteristics such as the crate backplane
capability, stability, and multi-board performance.
In the CBLT tests the initial contents of the
Level 2 buffer RAMs are used to predict the
output of the ED48 modules as read from the Hit
Count and the Hit Data RAMs. In the full-crate
test 18 TDC boards are used. For each of the 36
TDC Chips in the test (2/board), the L2 buffer
lengths are set so that all possible patterns are
sampled. The order of accessing the four L2
buffers is also selected differently for each TDC
board, so that all combinations of L2 buffer and
buffer length are sampled. The CBLT32 and
CBLT64 tests were repeated 5 × 10⁹ times each
without a single failure, such as would be due to
cross-talk or timing jitter in the logic.

9. Conclusions and summary


A new 96-channel TDC and trigger processor
module has been designed for the CDF Experiment at Fermilab using the multichannel bit-sampling
capabilities of the Altera Stratix FPGA
family. The board, built in a 9U VME format,
contains few components other than the 2
TDC FPGAs, a VME controller implemented in a
3rd FPGA, DC-to-DC converters, and input/output buffers. The functionality is exceptionally
flexible, being controlled by firmware, so that it
can be reprogrammed for different applications.
The TDC has extensive test capabilities, implemented directly in the FPGAs. Thirty boards have
been built and tested. The reliability of the board is
high as the chip count is very low. The TDC
differential non-linearity is less than 200 ps, i.e. within 1
count at any point in the range, and no cross-talk
was detected in 5 × 10⁹ 96-channel tests. The
channel-to-channel skew is less than 100 ps. A full
crate of the CDF-II TDC has been operated and
read out in 64-bit block-transfer mode at a speed
of 47 Mbytes/s.

Acknowledgements
We thank Bill Badgett, Frank Chlebana, Pat
Lukens, Aseet Mukherjee, and Kevin Pitts for
help, support, and advice. Nils Krumnack and Ed
Rogers deserve special thanks for providing
critical input on the XFT specifications and the
design of the XFT sections of the TDC. We thank
Rich Northrop for the picture of the board.
This work was supported in part by the
National Science Foundation under Grant No.
5-43270, and the US Department of Energy.

References
[1] T. Affolder, et al., Nucl. Instr. and Meth. A 526 (2004)
249.
[2] The CDF-II detector is described in the CDF Technical
Design Report (TDR), FERMILAB-Pub-96/390-E. The
TDC described here is intended as a further upgrade
beyond that described in the TDR.
[3] R.S. Moore [the CDF Run II collaboration], A custom 96-channel VME TDC for the CDF detector for Tevatron
collider Run II; FERMILAB-CONF-04-262. Prepared for
2004 IEEE Nuclear Science Symposium and Medical
Imaging Conference (NSS/MIC), Rome, Italy, 16–22
October 2004.
[4] D.I. Porat, IEEE Trans. Nucl. Sci. NS-20 (1973) 36.

[5] J. Kalisz, Metrologia 41 (2004) 17
http://www.iop.org/EJ/abstract/0026-1394/41/1/004.
[6] J.F. Genat, Nucl. Instr. and Meth. A 315 (1–3) (1992) 411.
[7] An extensive list of references can be found in:
A. Mantyniemi, A High Resolution Time-to-Digital
Converter Based on Stabilised Three-stage Delay Line
Interpolation, Department of Electrical and Information
Engineering, University of Oulu; OULU 2004 Thesis;
ISBN 951-42-7460-I; ISBN 951-42-7460-X; http://herkules.oulu.fi/isbn951427461X/isbn951427461X.pdf
[8] J. Kalisz, R. Szplet, J. Pasierbinski, A. Poniecki, IEEE
Trans. Instrum. Meas. 46 (1997) 51.
[9] ANSI/VIPA 23-1998, March 22, 1998. These crates
support geographical addressing. Chain Block Transfer is
described in Appendix E.
[10] T. Shaw, G. Sullivan, A Standard Front-End Trigger
VME Bus-Based Readout Crate for the CDF Upgrade,
CDF/DOC/TRIGGER/CDFR/2388, May 12, 1998. The
CDF readout crate is a 21-slot 9U VME crate based on the
VIPA standard, with added bussed signals on rows A and
C of the P2 connector. All 64 of the user-defined pins on
these rows are bussed between slots 2 and 21 using the
same termination scheme as the standard VME bussed
lines. Additional power pins are provided on the P0
connector. The J3 backplane, which is physically separate
from J1/J2, is customized for I/O specific to each CDF
subsystem.
[11] The schematics and test results are available online
at: http://edg.uchicago.edu/bogdan/tdc/index.html. The
firmware is in CVS at http://www-cdfonline.fnal.gov/cgi-bin/cvsweb.cgi/NuTDC. The schematics and code are
available on request.
[12] B. Bevensee, et al., IEEE Trans. Nucl. Sci. NS-43 (1996) 1725;
K. Pitts, private communication.
[13] Altera Corporation, Stratix Device Handbook, v3.0, Apr.
2004. The serializer/deserializer block is labeled SERDES
in this Handbook, and in the figures and text here. The
specifications on differential non-linearity and skew in the
chip are from communications from Altera.
[14] Altera Corporation, Quartus II Handbook, v2.1, August
2004.
[15] Altera Apex AP20K100.
[16] T. Shaw, Specification for Trigger And Clock + Event
Readout Module (TRACER), CDF Internal
Note 4686, 8/1/98.
[17] M. Campbell, H. Frisch, M. Shochet, G. Sullivan,
D. Toback, J. Wahl, P. Wilson, CDF Internal Note
2038, April 1993.
[18] E. Rogers, N. Krumnack, CDF Internal Note 7193,
August 2004. This note is the specification for the XFT
implementation.
[19] M. Bogdan, H. Sanders, CDF Internal Note 6998, June
2004.
[20] Tests were made with VxWorks V5.3c [21] and FISION
V2.12 [22] running on a RedHat V6.2 Linux operating
system. CDF control pulses were provided by a CDF
TESTCLK card [23].
[21] © 2003 WindRiver Systems, Inc. MCL-DS-VXW-0309.
[22] J. Pangburn, FISION V2.12 User's Guide, http://www-cdfonline.fnal.gov/vme/FISION.html.
[23] T. Shaw, W. Stuermer, Internal note TESTCLK V7, CDF
Crate Clock and Trigger Driver, Fermilab, ETT/CDF
Upgrade Group, July, 1998.
