
DS-P17

80-GHz Operation of an 8-bit RSFQ Arithmetic Logic Unit

Yuki Ando, Ryo Sato, Masamitsu Tanaka, Kazuyoshi Takagi, and Naofumi Takagi
Abstract: We have designed and demonstrated an arithmetic logic unit (ALU) based on rapid single-flux-quantum (RSFQ) technology. The ALU has been developed toward the demonstration of high-performance information processing using a general-purpose microprocessor, CORE e4. The target operation frequency of the ALU is 50 GHz. In this paper, we present the design and high-speed functionality test results of the ALU. The ALU can execute 10 instructions. We aimed at high-frequency operation with precise timing design using logic gates synchronized to the clock signal. The ALU contains 1555 Josephson junctions (JJs) and occupies a circuit area of 0.53 mm². We have demonstrated the correct operation of the ALU using an on-chip high-speed test. The estimated maximum operating frequency based on measurement results is 80 GHz.

Keywords: Digital arithmetic, rapid single-flux-quantum, superconducting integrated circuits, ALU.

I. INTRODUCTION

In recent years, the increase in power consumption of CMOS circuits has become a problem. A rapid single-flux-quantum (RSFQ) circuit is a next-generation circuit using superconductor devices and is able to realize ultra-high-speed computation at tens of GHz with low power consumption [1].

There have been several designs of arithmetic logic units (ALUs) for SFQ circuits. The first successful demonstration of a bit-serial RSFQ microprocessor, CORE1α, was reported in [2]. Then CORE1β, an improved version of CORE1α, was developed [3]. CORE1β is equipped with two ALUs and can execute 14 instructions (if each ALU function is counted as an individual instruction). Each ALU executes 2 arithmetical and 3 logical operations between two bit-serial data. The maximum operation frequency based on measurement results is 23 GHz. An 8-bit parallel ALU with a much larger set of operations was designed and its correct operation was demonstrated [4]. This ALU design is based on a Kogge-Stone adder and employs an asynchronous wave-pipelined approach scalable for wide data path processors. The maximum operation frequency based on measurement results is 20 GHz. An 8-bit asynchronous sparse-tree RSFQ ALU has significantly reduced circuit complexity while maintaining robust operational margins at high frequency [5]. This ALU executes 8 arithmetical and 12 logical operations. Simulations show that the ALU can operate at a maximum frequency of 42 GHz.

With the advances in process technology [6], it has become possible to design more complex circuits. We began the development of microprocessors that have higher performance than ever. Our goal is to show that it is possible to make a stored-program computer, CORE e4, with RSFQ circuits. CORE e4 is equipped with a bit-serial ALU, four 8-bit registers (Reg0, Reg1, Reg2, and Reg3), an instruction memory, and a data memory. The instruction memory and data memory have 256 bits (= 8 bits × 32) each. We designed a new bit-serial ALU to perform 10 instructions for use in CORE e4. The target operation frequency of the ALU is 50 GHz. At this target frequency, strict timing design becomes more important than ever. For this reason, unlike the previous designs, the ALU selects its operation not with non-destructive readout (NDRO) gates but with AND gates. The operation to be executed is determined by 4-bit control signals. The ALU decodes the 4-bit control signals and performs ALU operations at the same time.

II. LOGICAL AND PHYSICAL ALU DESIGN

The ALU has 10 functions: arithmetic operations (ADD, MV, SUB, CMP, INC, and DEC) and logical operations (AND, XOR, OR, and NOR). Table I shows the instruction set. The SUB and CMP instructions have the same behavior in the ALU. They differ only in whether the subtraction result is written back to the register in the microprocessor. The operation to be executed is determined by 4-bit control signals. The bits are named alu1, alu2, alu3, and alu4 from the left.

TABLE I. INSTRUCTION SET OF THE ALU

Instruction   Control Signal   Meaning
ADD           1100             X + Y
MV            1101             0 + Y (Move)
SUB           1110             X - Y (Subtract)
CMP           1111             X - Y (Compare)
INC           0100             X + 1 (Increment)
DEC           0110             X - 1 (Decrement)
AND           1010             X AND Y
XOR           1001             X XOR Y (Exclusive OR)
OR            1011             X OR Y
NOR           1000             X NOR Y

The inputs of the ALU are two signed integers (X = x7, ..., x0, Y = y7, ..., y0), four control signals (alu1, alu2, alu3, and alu4), an increment signal (incr), and a check signal (check). An SFQ pulse is fed to the incr input when the SUB, CMP, or INC instruction is executed. The outputs are the result (Z = z7, ..., z0) and two flags, not zero (NZ) and negative (NEG).

Fig. 1 shows a schematic diagram of the bit-serial ALU. The clock line is omitted. The ALU has five gate-level pipeline stages (stage 1 to stage 5). In stages 1 to 4, the ALU performs decoding of the 4-bit control signal and the arithmetic operation simultaneously. In stage 4, the ALU performs logical operations. The SUB, CMP, and DEC instructions are realized by addition of the two's complement. Logical operations are performed using an AND cell, a NOR cell, and an XOR cell. The OR operation is realized by taking an OR of the results of AND and XOR. Arithmetic operations are performed using a bit-serial adder. In stage 5, the output of the operation result is selected by a combinational circuit using AND cells.
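The subtraction by two's-complement addition described above can be illustrated with a small behavioral sketch. This is not the authors' circuit: it assumes that bits arrive least significant bit first, that the pulse on incr acts as the initial carry, and that Y is inverted bit by bit for subtraction. The function names (to_serial, from_serial, serial_add, serial_sub) are hypothetical.

```python
from typing import List


def to_serial(value: int, width: int = 8) -> List[int]:
    """Split an unsigned value into bits, least significant bit first (assumed order)."""
    return [(value >> i) & 1 for i in range(width)]


def from_serial(bits: List[int]) -> int:
    """Rebuild an unsigned integer from LSB-first bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def serial_add(a_bits: List[int], b_bits: List[int], incr: int = 0) -> List[int]:
    """Add two bit streams with one full adder reused on every clock.

    The carry is held between bits. `incr` models the pulse on the incr
    input as the initial carry, which is an assumption of this sketch.
    """
    carry = incr
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out  # the final carry is dropped, so results wrap at the word width


def serial_sub(x: int, y: int, width: int = 8) -> int:
    """X - Y computed as X + NOT(Y) + 1, i.e. addition of the two's complement."""
    inverted_y = [1 - b for b in to_serial(y, width)]
    return from_serial(serial_add(to_serial(x, width), inverted_y, incr=1))
```

Under these assumptions, serial_sub(21, 3) returns 18. Increment is the same adder with a zero second operand and incr set, and decrement can be modeled as adding 11111111 without incr.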

978-1-4673-8348-6/15/$31.00 ©2015 IEEE



The NZ and NEG flags are set by examining the operation result using a D flip-flop (D) and a D flip-flop with reset function (RD). If there is a 1 in the operation result, the NZ flag is set. If the last bit of the operation result is 1, the NEG flag is set.
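As a word-level illustration of the instruction set and the two flags, the following sketch maps the control codes of Table I to 8-bit results. It is a functional model only, not a description of the gate-level pipeline. It assumes that the last bit of the serial result is the most significant bit z7, so NEG equals the sign bit. The names CONTROL and alu are hypothetical.

```python
from typing import Tuple

# Control codes (alu1 alu2 alu3 alu4, left to right) taken from Table I.
CONTROL = {
    "1100": "ADD", "1101": "MV", "1110": "SUB", "1111": "CMP",
    "0100": "INC", "0110": "DEC", "1010": "AND", "1001": "XOR",
    "1011": "OR", "1000": "NOR",
}

MASK = 0xFF  # 8-bit word


def alu(ctrl: str, x: int, y: int) -> Tuple[int, int, int]:
    """Return (Z, NZ, NEG) for one instruction on 8-bit operands."""
    op = CONTROL[ctrl]
    x &= MASK
    y &= MASK
    if op == "ADD":
        z = x + y
    elif op == "MV":
        z = y                      # 0 + Y
    elif op in ("SUB", "CMP"):
        z = x - y                  # same ALU behavior; write-back differs outside the ALU
    elif op == "INC":
        z = x + 1
    elif op == "DEC":
        z = x - 1
    elif op == "AND":
        z = x & y
    elif op == "XOR":
        z = x ^ y
    elif op == "OR":
        z = (x & y) | (x ^ y)      # OR built from the AND and XOR results
    else:                          # NOR
        z = ~(x | y)
    z &= MASK
    nz = int(z != 0)               # set if any result bit is 1
    neg = (z >> 7) & 1             # assumed: last serial bit is z7, the sign bit
    return z, nz, neg
```

With these assumptions, alu("1110", 0b00010101, 0b00000011) returns (18, 1, 0) and alu("1011", 0b00010101, 0b00000011) returns (23, 1, 0).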

We designed the ALU using the CONNECT cell library [7] developed by Yokohama National University and Nagoya University. The ALU design has a target operation frequency of 50 GHz with sufficiently wide setup/hold timing margins for clocked gates. We simulated the design and obtained frequency-dependent DC bias margins for our circuit. We wrote the test bench in Verilog HDL. All of the instructions were confirmed to work properly. At the target operation frequency of 50 GHz, the ALU has a DC bias margin ranging from 95% to 125%. The DC bias margin is normalized by the designed value.

[Figure: inputs X, Y, incr, check, and alu1 to alu4; a bit-serial adder; D and RD flip-flops; decode patterns 1000, 10*1, 101*, 01**, and 11**; outputs Z, NZ, and NEG.]
Fig. 1. Schematic diagram of bit-serial ALU

III. IMPLEMENTATION AND TESTING OF THE ALU

We implemented the bit-serial ALU using the AIST 10-kA/cm² Advanced Process (ADP2). Fig. 2 shows a microphotograph of the ALU. The ALU contains 1555 JJs and occupies a circuit area of 0.53 mm². It has a bias current of 194 mA and a latency of 586 ps at the bias voltage of 2.5 mV. Latency is the time from the first clock input to the output of the first bit. The chip includes additional input and output interfaces for testing the ALU.

Fig. 2. Microphotograph of the ALU

All operations of the ALU have been successfully demonstrated at 50 GHz. Fig. 3 shows the input and output waveforms for the SUB operation in the high-frequency test. We applied the control signal (1110) and X and Y as input. The second line (green) represents the clock line and the third line (red) represents the output result. The fourth line (cyan) represents the NEG flag and the fifth line (blue) represents the NZ flag. These results show the correct operation of 00010101 (21) - 00000011 (3) = 00010010 (18), and the NZ flag is set. Fig. 4 shows the input and output waveforms for the OR operation in the high-frequency test. We applied the control signal (1011) and X and Y as input. These results show the correct operation of 00010101 OR 00000011 = 00010111, and the NZ flag is set.

[Figure: waveform traces labeled NZ, NEG, Z (annotated 01001000), clock out, alu3, alu2, alu1, clock, cg_trg, incr, reset, and check.]
Fig. 3. Input and output waveforms of the SUB instruction

Fig. 5 and Fig. 6 show the frequency dependence of the bias margins. The simulation was executed with the designed circuit parameters. The critical currents of the measured chips were 10 percent greater than the designed values. Considering this shift in the critical currents, we think that the upper side of the measured DC bias margins agreed with the simulation and that the lower margins were narrowed. At an operation frequency of 50 GHz, the SUB instruction had a DC bias margin of 114% to 136% of the designed value, as illustrated in Fig. 5. The maximum operation frequency was 76 GHz, where the DC bias margin was 135% to 136% of the designed value. Likewise, at 50 GHz, the OR instruction had a DC bias margin of 111% to 136% of the designed value, as illustrated in Fig. 6. The maximum operation frequency was 80 GHz, where the DC bias margin was 133% to 135% of the designed value.

IV. CONCLUSION

We have designed, fabricated, and tested an 8-bit bit-serial RSFQ ALU. It has simulated DC bias margins ranging from 95% to 125% at the target operating frequency of 50 GHz. In the measurement, all instructions were successfully demonstrated at 50 GHz. The estimated maximum operating frequency based on measurement results is 80 GHz. By using this ALU, we will be able to develop a microprocessor with high performance.

[Figure: waveform traces labeled NZ, NEG, Z (annotated 11101000), clock out, alu3, alu2, alu1, clock, cg_trg, incr, reset, and check.]
Fig. 4. Input and output waveforms of the OR instruction

Fig. 5. Frequency dependence of bias margins at the SUB instruction

Fig. 6. Frequency dependence of bias margins at the OR instruction

ACKNOWLEDGMENT

This work is partly supported by JST-ALCA and the VLSI Design and Education Center (VDEC) of the University of Tokyo, in collaboration with Cadence Design System, Inc. The circuits were fabricated in the clean room for analog-digital superconductivity (CRAVITY) of the National Institute of Advanced Industrial Science and Technology (AIST) with the Advanced Process (ADP2).

REFERENCES

[1] K.K. Likharev and V.K. Semenov, "RSFQ logic/memory family: a new Josephson-junction technology for sub-terahertz-clock-frequency digital systems," Applied Superconductivity, IEEE Transactions on, vol. 1, pp. 3-28, 1991.
[2] M. Tanaka, T. Kondo, N. Nakajima, Y. Yamanashi, A. Fujimaki, H. Hayakawa, N. Yoshikawa, H. Terai and S. Yorozu, "A single-flux-quantum logic prototype microprocessor," in Tech. Dig. IEEE Int. Solid-State Circuit Conf., vol. 1, 2004.
[3] Y. Yamanashi, M. Tanaka, A. Akimoto, H. Park, Y. Kamiya, N. Irie, N. Yoshikawa, A. Fujimaki, H. Terai and Y. Hashimoto, "Design and Implementation of a Pipelined Bit-Serial SFQ Microprocessor, CORE1β," Applied Superconductivity, IEEE Transactions on, vol. 17, pp. 474-477, 2007.
[4] T.V. Filippov, A. Sahu, A.F. Kirichenko, I.V. Vernik, M. Dorojevets, C.L. Ayala and O.A. Mukhanov, "20 GHz operation of an asynchronous wave-pipelined RSFQ arithmetic-logic unit," Phys. Proc., vol. 36, pp. 59-65, 2012.
[5] M. Dorojevets, C.L. Ayala, N. Yoshikawa and A. Fujimaki, "8-Bit Asynchronous Sparse-Tree Superconductor RSFQ Arithmetic-Logic Unit With a Rich Set of Operations," Applied Superconductivity, IEEE Transactions on, vol. 23, 2013.
[6] S. Nagasawa, K. Hinode, T. Satoh, M. Hidaka, H. Akaike, A. Fujimaki, N. Yoshikawa, K. Takagi and N. Takagi, "Nb 9-Layer Fabrication Process for Superconducting Large-Scale SFQ Circuits and Its Process Evaluation," IEICE Trans. Electron., vol. E97-C, no. 3, pp. 132-140, 2014.
[7] H. Akaike, M. Tanaka, K. Takagi, I. Kataeva, R. Kasagi, A. Fujimaki, M. Igarashi, H. Park, Y. Yamanashi, N. Yoshikawa, K. Fujiwara, S. Nagasawa, M. Hidaka and N. Takagi, "Design of single flux quantum cells for a 10-Nb-layer process," Phys. C: Supercond., vol. 469, no. 15-20, pp. 1670-1673, Oct. 2009.
