Alu Design

Computers and Electrical Engineering 39 (2013) 2045–2052
Contents lists available at SciVerse ScienceDirect
Computers and Electrical Engineering

journal homepage: www.elsevier.com/locate/compeleceng
An arithmetic controller design for numerical control

Shu-Chung Yi a,b,*, Chin-Fa Hsieh c
a Department of Computer Science and Information Engineering, National Changhua University of Education, No. 2, Shi-Da Road, Changhua 500, Taiwan, ROC
b Department of Computer Science and Information Engineering, Asia University, Taiwan, ROC
c Department of Electronic Engineering, China University of Science and Technology, No. 245, Sec. 3, Academia Rd., Nangang Dist., Taipei City 11581, Taiwan, ROC
Article info
Article history:
Received 18 July 2012
Received in revised form 9 June 2013
Accepted 10 June 2013
Available online 4 July 2013
Abstract
This paper presents an arithmetic controller built around an arithmetic logic unit that supports a set of arithmetic instructions: ADD (addition), SUB (subtraction), MUL (multiplication), and DIV (division). The arithmetic processor was implemented with a cell-based flow and supports basic mathematical operations and numerical control. Every mathematical instruction is composed of three bytes: the first byte contains the operation code and the address of the operands, and the second and third bytes hold the operands. Compared with a conventional CPU, the architecture is faster because the instruction cycle has fewer states: it is reduced to five T-states. All circuits were implemented with the TSMC 0.35 μm cell library, and a 20-pin I/O pad was selected to package the processor. Experimental results are presented and discussed.
© 2013 Elsevier Ltd. All rights reserved.
1. Introduction
Numerical control (NC) is the control of a machine or a process by symbolic codes consisting of characters and numerals [1]. The development of numerical control owes much to the United States Air Force. In 1949, the U.S. Air Force awarded Parsons a contract to develop a new type of machine tool that would speed up production methods. Parsons sub-contracted the Massachusetts Institute of Technology (MIT) to develop a practical implementation of the new machine. The prototype was produced by retrofitting a conventional tracer mill with numerical control [2] servo mechanisms for the three axes of the machine. Commercial NC units were then introduced by the machine tool builders. As integrated circuits were developed, the NC control unit became smaller, more reliable, and less expensive [3].
Computer numerical control (CNC) is a control system in which a dedicated processor performs the basic and advanced NC functions. CNC is also a computer-assisted process: it controls a general-purpose machine from stored instructions generated by a processor.
The computer allows for the following: storage of additional programs, program editing, running of programs from memory, machine and control diagnostics, special routines, and inch/metric units [4]. CNC machines can be used as stand-alone units or in a network of machines such as flexible machine centers. The controller uses a permanently resident program, called an executive program, to process the codes into the electrical pulses that control the machine. In any CNC machine [5], the executive program resides in ROM and all the NC codes reside in RAM. The information in ROM is written into the electronic chips and cannot be erased; it becomes active whenever the machine is on. The contents of RAM are lost when the controller is turned off. Some controllers use a special type of RAM called CMOS memory, which retains its contents even when the power is turned off.
Reviews processed and approved for publication by Editor-in-Chief Dr. Manu Malek.
* Corresponding author at: Department of Computer Science and Information Engineering, National Changhua University of Education, No. 2, Shi-Da Road, Changhua 500, Taiwan, ROC. Tel.: +886 4 7232105x8425, mobile: +886 930524443; fax: +886 4 7211258.
E-mail addresses: scyi@cc.ncue.edu.tw (S.-C. Yi), c0935@cc.cust.edu.tw (C.-F. Hsieh).
0045-7906/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.compeleceng.2013.06.002
Many applications of numerical control have been proposed, such as fuzzy control [6], digital control [7], motor control [8,9], and co-processor system optimization [10].
The architecture of SAP-1 [11,12] is shown in Fig. 1. The SAP-1 instruction cycle is divided into six T-states, denoted T1, T2, T3, T4, T5, and T6. The fetch, decode, and execute stages of each instruction are completed within T1 to T6. An instruction is fetched from memory and transferred to the decoder during T1 to T3, and the execute stage takes place during T4 to T6.
In Section 2, classical model control strategies are briefly depicted by a simple four-block representation in order to introduce the (single-loop) BMC. The principle of the behaviour model control is then presented. A double-loop structure is suggested and analysed in Section 3. Finally, this BMC extension is applied to a DC machine drive control in Section 4. Simulation and experimental results are provided to validate the double-loop BMC.
This paper is organized as follows. Section 2 presents the design and implementation results for the multiplexer based reconfigurable VD. Section 3 presents the design method and implementation results for the dynamic partial reconfigurable VD. Section 4 presents the results for static reconfiguration VD and for multiplexer based reconfigurable VD design. Comparisons of the different implementation methods are discussed in Section 5. Concluding remarks are given in Section 6.
2. Architecture of the proposed processor

The architecture of the mathematical processor is shown in Fig. 2. The function of each block is described as follows.
The architecture reserves five T-states (two fetch and three execute cycles) for each instruction. Each T-state is called a machine cycle. A ring counter generates a new T-state at every falling edge of the clock pulse.
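The ring counter described above can be illustrated with a short Python sketch. It is only a behavioural model under assumptions: the one-hot encoding, the function names, and the choice of T1 as the reset state are illustrative and are not taken from the paper's circuit.

```python
def ring_counter(num_states=5):
    """Yield one-hot T-state words, one per clock falling edge (assumed encoding)."""
    mask = (1 << num_states) - 1
    state = 1  # assumption: the counter starts with T1 active (bit 0 set)
    while True:
        yield state
        # Rotate the single hot bit left; the top bit wraps around to bit 0.
        state = ((state << 1) | (state >> (num_states - 1))) & mask


def t_state_name(word):
    """Translate a one-hot word into its T-state label, e.g. 0b00100 -> 'T3'."""
    return f"T{word.bit_length()}"


counter = ring_counter()
sequence = [t_state_name(next(counter)) for _ in range(7)]
print(sequence)
```

After T5 the hot bit wraps back to T1, so one instruction cycle spans exactly five clock periods in this model.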
A Program Counter (PC) is used to generate the address for the next instruction. At start, the PC is zero.
Fig. 1. The architecture of SAP-1.
Fig. 2. The architecture of the proposed arithmetic processor.
A Memory Address Register (MAR) is used to latch the address from the PC. The address is latched into the MAR on the leading edge of T1.
A memory (RAM) is used to store the instructions. Each instruction is coded in eight bits: the most significant bits hold the operation code, while the least significant bits hold the address of the operands. The two 8-bit operands are stored consecutively in RAM starting at that address.
An Instruction Register (IR) latches the instruction from memory. The 8-bit word is split into two parts: the most significant four bits are the operation code, and the least significant four bits are the address. While decoding a mathematical instruction, the instruction register generates two successive addresses for the RAM, so the mathematical unit receives two successive operands.
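The instruction split performed by the IR can be sketched as follows. The paper does not list the numeric operation codes, so the `OPCODES` table below is a hypothetical encoding chosen only for illustration; the 4-bit/4-bit split and the two successive operand addresses follow the description above.

```python
# Hypothetical opcode values: the paper does not give the actual encoding.
OPCODES = {0x1: "ADD", 0x2: "SUB", 0x3: "MUL", 0x4: "DIV"}


def decode(instruction_byte):
    """Split an 8-bit instruction into (mnemonic, first address, second address)."""
    opcode = (instruction_byte >> 4) & 0xF   # most significant four bits
    address = instruction_byte & 0xF         # least significant four bits
    # The second operand sits at the next address (4-bit address space assumed).
    return OPCODES[opcode], address, (address + 1) & 0xF


print(decode(0x11))
```

With this assumed encoding, the byte 11h decodes to ADD with operands at addresses 01h and 02h, which matches the form of the ADD 01h example used later in the paper.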
A control unit is used to generate all the control signals. It controls the most important signals such as the enable and
latch controls. The input of the control unit comprises the instruction code and phase information.
A mathematical unit executes the mathematical operations: addition, subtraction, multiplication, and division.
An output unit displays the result. The mathematical operations are completed after T4. The addition/subtraction unit is built from full-adder sub-circuits arranged as a ripple-carry adder (RCA). Multiplication is based on the add-and-shift algorithm shown in Fig. 3, where A is the multiplicand and B is the multiplier. Division is likewise based on a subtract-and-shift algorithm. The multiplication/division circuit requires a three-level adder tree, and each adder in the tree is a carry-look-ahead adder (CLA). Hence the computation time is three times the delay of one CLA.
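One way to read the add-and-shift multiplier with a three-level adder tree is sketched below in Python. This is an interpretation, not the circuit of Fig. 3: it assumes eight shifted partial products that are summed pairwise, and the function names are illustrative.

```python
def partial_products(a, b, width=8):
    """Return one shifted copy of a for each bit of b (zero where the bit is 0)."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def tree_sum(values):
    """Add values pairwise, level by level; return (sum, number of adder levels)."""
    levels = 0
    while len(values) > 1:
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
        levels += 1
    return values[0], levels


def multiply(a, b, width=8):
    """Multiply two unsigned width-bit numbers by shift-and-add."""
    return tree_sum(partial_products(a, b, width))


product, levels = multiply(0x05, 0x0A)
print(hex(product), levels)
```

For 8-bit operands the eight partial products collapse in three pairwise levels (8 to 4 to 2 to 1), which is consistent with the statement that the delay equals three adder delays.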
In this architecture, the control unit has a new feature: in the T2 state, the fetch stage only increments the PC. With this feature, all control signals are generated just one T-state before they are required, which removes one T-state from the instruction cycle. The fetch cycle occupies T1 to T2, and the execute cycle occupies T3 to T5. With the MUL (multiplication) and DIV (division) instructions [13] added, the arithmetic unit selects the appropriate function directly from the operation code. Each instruction cycle of the proposed processor therefore saves one state in the fetch cycle.
Fig. 3. The block diagram of the multiplier of A×B.
3. Experimental results
The architecture was implemented and verified in Verilog HDL [14–16] and simulated with the TSMC 0.35 μm CMOS technology. The design was also synthesized for an FPGA development board, which provides a hardware platform for developing embedded systems based on Altera Stratix devices. The NIOS development board contains a Stratix EP1S40F780C5 device running at a fixed frequency of 50 MHz. The experimental results for each instruction are as follows.
3.1. Addition cycle (ADD instruction)
The addition is performed on two eight-bit operands. For example, suppose the two operands are placed at addresses 01h and 02h. The instruction ADD 01h then adds the two operands located at addresses 01h and 02h. With operands 0Ah and 05h, the mathematical unit generates 0Fh. The timing diagram is shown in Fig. 4; the value 0Fh appears in state T5. An 8-bit ripple-carry adder (RCA) is used in the addition/subtraction unit.
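The ADD example can be reproduced with a bit-level model of an 8-bit ripple-carry adder. The sketch below is a generic textbook RCA written in Python for illustration; the function names are assumptions and it does not model the gate-level netlist of the chip.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: return (sum bit, carry out)."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def ripple_carry_add(x, y, width=8, carry_in=0):
    """Add two unsigned width-bit numbers; the carry ripples from bit 0 upward."""
    result = 0
    carry = carry_in
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry


total, carry = ripple_carry_add(0x0A, 0x05)
print(hex(total), carry)
```

Each bit position waits for the carry of the position below it, which is why the delay of an RCA grows linearly with the word width.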
3.2. Subtraction cycle (SUB instruction)
The subtraction is performed on two eight-bit operands. For example, suppose the two operands are placed at addresses 01h and 02h. The instruction SUB 01h then subtracts the operand at address 02h from the operand at address 01h. Taking 0Fh - 0Ah as an example, the timing diagram is shown in Fig. 5; the value 05h appears in the T5 period.
Fig. 4. Timing diagram of ADD instruction.
Fig. 5. Timing diagram of SUB instruction.
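The paper does not say how the shared addition/subtraction unit forms a difference. A common approach, assumed here purely for illustration, is two's-complement subtraction: invert the second operand and feed a carry-in of 1 into the same ripple-carry adder. The Python sketch below follows that assumption.

```python
def ripple_carry_add(x, y, width=8, carry_in=0):
    """Bit-serial model of a ripple-carry adder; returns (sum, carry out)."""
    result, carry = 0, carry_in
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        result |= (a ^ b ^ carry) << i
        carry = (a & b) | (carry & (a ^ b))
    return result, carry


def ripple_carry_sub(x, y, width=8):
    """Compute x - y as x + (~y) + 1 (assumed scheme); returns (difference, borrow)."""
    mask = (1 << width) - 1
    diff, carry = ripple_carry_add(x, ~y & mask, width, carry_in=1)
    return diff, 1 - carry  # a missing carry out signals a borrow


print(ripple_carry_sub(0x0F, 0x0A))
```

Under this scheme the SUB example 0Fh - 0Ah gives 05h with no borrow, while a negative result such as 0Ah - 0Fh comes out as its two's-complement encoding FBh with the borrow flag set.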
3.3. Multiplication cycle (MUL instruction)

The operands 05h and 0Ah are used in this multiplication example. The timing diagram is shown in Fig. 6. The value 032h appears in the T5 period. The multiplication starts at T3 and finishes at T5. The multiplication circuit requires a three-level adder tree, so the computation time is three times the delay of one CLA [17].
3.4. Division cycle (DIV instruction)

The operation 032h/05h is used in this division example. The timing diagram is shown in Fig. 7. The division starts at T3 and finishes at T5. The division circuit requires a three-level adder tree, so the computation time is three times the delay of one CLA. The value 0Ah appears in the T5 period.
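The subtract-and-shift principle behind the DIV instruction can be illustrated with a sequential restoring divider. The Python sketch below is a textbook formulation chosen for clarity; it is an assumption about the algorithm's general shape and does not reproduce the tree-structured circuit used on the chip.

```python
def shift_subtract_divide(dividend, divisor, width=8):
    """Unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient, remainder = 0, 0
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # Subtract only when the divisor fits; otherwise keep the remainder.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


print(shift_subtract_divide(0x32, 0x05))
```

For the example in the text, 32h divided by 05h yields a quotient of 0Ah with a zero remainder.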
4. The implementation of chip and discussion

The arithmetic processor was implemented in the TSMC 0.35 μm CMOS technology. Verilog-XL and Debussy were used in the simulation and synthesis process. Silicon Ensemble was used for layout, and Calibre was used for verification. Finally, a 20-pin I/O pad was selected to package the chip. The layout of the processor is shown in Fig. 8, and the post-layout specification is given in Table 1.
The architecture was also implemented on an FPGA development board. The board provides a hardware platform for developing embedded systems based on Altera Stratix devices. The NIOS development board contains a Stratix EP1S40F780C5 device running at a fixed frequency of 50 MHz. Figs. 4–7 show the measured results.
Fig. 6. Timing diagram of MUL instruction.
Fig. 7. Timing diagram of DIV instruction.
Fig. 8. Layout of the arithmetic processor.
Table 1
Performance of the arithmetical processor.

Technology              0.35 μm cell-based
Supply voltage          3.3 V
Max clock frequency     200 MHz
Power consumption       86.6 mW @ 200 MHz
Die area without pad    260 μm × 260 μm
The PC is built from a chain of edge-sensitive D-latches and counts from 0 to 15. In the T1 state, the instruction is fetched from RAM directly. This fetch process is faster than that of SAP-1, so one T-state is omitted, which shortens the instruction execution time.
The counter is incremented by three during phase T2 of the microinstruction sequence, because the two operands follow the first microinstruction. Hence each complete instruction occupies three bytes: one instruction byte, which includes the address, and two operand bytes.
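The three-byte instruction layout and the PC increment of three can be tied together in a small behavioural model. The Python sketch below rests on several assumptions that the paper does not confirm: the opcode values are hypothetical, the 16-byte memory follows from the 4-bit PC, and ADD/SUB results are wrapped to eight bits.

```python
# Hypothetical opcode values; the paper does not list the real encoding.
OPCODES = {0x1: "ADD", 0x2: "SUB", 0x3: "MUL", 0x4: "DIV"}


def run(memory, num_instructions):
    """Execute instructions from a 16-byte memory; return (mnemonic, result) pairs."""
    pc = 0  # the PC starts at zero
    results = []
    for _ in range(num_instructions):
        instruction = memory[pc]       # T1: fetch the instruction byte
        pc = (pc + 3) & 0xF            # T2: skip the two operand bytes (4-bit PC)
        opcode, address = instruction >> 4, instruction & 0xF
        a, b = memory[address], memory[(address + 1) & 0xF]
        name = OPCODES[opcode]         # T3-T5: execute
        if name == "ADD":
            result = (a + b) & 0xFF    # assumed 8-bit wrap-around
        elif name == "SUB":
            result = (a - b) & 0xFF    # assumed 8-bit wrap-around
        elif name == "MUL":
            result = a * b
        else:
            result = a // b            # raises ZeroDivisionError when b is 0
        results.append((name, result))
    return results


# ADD 01h with operands 0Ah, 05h; then MUL 04h with operands 05h, 0Ah.
memory = [0x11, 0x0A, 0x05, 0x34, 0x05, 0x0A] + [0] * 10
print(run(memory, 2))
```

Because the operands sit directly after the instruction byte, adding three to the PC lands on the next instruction without any separate operand-fetch state.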
Each instruction cycle of the proposed processor saves one state in the fetch cycle. The critical path of the CPU execution time is in the fetch cycle, which accesses instructions over the memory bus. The memory bus has the heaviest capacitance in a controller or CPU. The aim of this architecture is to lessen the load on the W-bus. Reducing the number of states speeds up the controller. The approach also shortens the clock period.
Table 2
Comparisons among some processors.

                          dsPIC30F4011 [20]   Pic12F50x [19]    Pic16F5x [18]     SAP-1 [11]           This work
Max clock frequency       120 MHz             20 MHz            20 MHz            200 MHz              200 MHz
Power                     660 mW @ 120 MHz    800 mW @ 20 MHz   800 mW @ 20 MHz   118.5 mW @ 200 MHz   86.6 mW @ 200 MHz
Supply voltage            5 V                 2.0–5.5 V         2.0–5.5 V         3.3 V                3.3 V
Multiplication operand    16 × 16 bits        -                 -                 -                    8 × 8 bits
Division operand          16/16 bits          -                 -                 -                    8/8 bits
Instructions per second   30 MIPS             20 MIPS           20 MIPS           33 MIPS              40 MIPS
Instruction pipeline      -                   2 stages          2 stages          -                    -

The total execution time of an instruction cycle of the conventional controller, Tinstruction_cycle, is given in Eqs. (1)–(3):

$$T_{\text{instruction\_cycle}} = T_{\text{fetch\_cycle}} + T_{\text{execute\_cycle}} \tag{1}$$
$$= \tfrac{3}{5}T + \tfrac{2}{5}T \tag{2}$$
$$= T \tag{3}$$

where Tfetch_cycle and Texecute_cycle equal (T1 + T2 + T3) and (T4 + T5 + T6), respectively. Owing to the heavy capacitance of the memory bus, Tfetch_cycle occupies about 60% of the total execution time.
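The total execution time of the proposed arithmetic controller, T'instruction_cycle, is given in Eqs. (4)–(7):

$$T'_{\text{instruction\_cycle}} = T'_{\text{fetch\_cycle}} + T'_{\text{execute\_cycle}} \tag{4}$$
$$= \tfrac{2}{3}\,T_{\text{fetch\_cycle}} + T_{\text{execute\_cycle}} \tag{5}$$
$$= \tfrac{2}{3}\cdot\tfrac{3}{5}T + \tfrac{2}{5}T \tag{6}$$
$$= \tfrac{2}{5}T + \tfrac{2}{5}T = \tfrac{4}{5}T \tag{7}$$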
From Eq. (7), the instruction cycle of the proposed arithmetic controller is 20% shorter than that of the conventional SAP-1 processor. The improvement comes from omitting one T-state in the fetch cycle.
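The arithmetic behind Eqs. (1)–(7) can be checked with exact fractions. The Python sketch below simply restates the paper's assumptions (the fetch cycle takes 3/5 of the SAP-1 instruction time and is shortened to 2/3 of its length); the variable names are illustrative.

```python
from fractions import Fraction

T = Fraction(1)                   # SAP-1 instruction cycle, normalised to 1
fetch_share = Fraction(3, 5)      # fetch cycle share of T, as stated in the text
execute_share = Fraction(2, 5)    # execute cycle share of T

sap1_cycle = fetch_share * T + execute_share * T        # Eqs. (1)-(3)
proposed_fetch = Fraction(2, 3) * fetch_share * T       # two of three fetch states
proposed_cycle = proposed_fetch + execute_share * T     # Eqs. (4)-(7)

reduction = 1 - proposed_cycle / sap1_cycle             # shorter cycle time
throughput_gain = sap1_cycle / proposed_cycle - 1       # more instructions per second

print(proposed_cycle, reduction, throughput_gain)
```

Under these assumptions the cycle time falls by one fifth; expressed as instruction throughput at an unchanged clock, the same change corresponds to a gain of one quarter.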
Fig. 8 shows the layout of the arithmetic processor.
Since the architecture is simple and the die is small, the capacitance of the W-bus and of all modules is small, and so are the power consumption and latency. The maximum clock frequency is about 200 MHz in the TSMC 0.35 μm CMOS technology. Table 2 compares several popular processors.
A two-stage pipeline overlaps the fetch and execution of instructions in the Pic16F5x [18] and Pic12F50x [19] series controllers. Each instruction comprises 8 cycles. With the two-stage instruction pipeline, the speed of these two controllers reaches 20 MIPS.
The instruction cycle of the dsPIC30F4011 [20] takes four clock cycles. The maximum clock frequency is about 120 MHz. Hence the speed of the dsPIC30F4011 is 30 MIPS.
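Several of the throughput figures in Table 2 follow from dividing the clock frequency by the number of clock periods per instruction. The Python sketch below applies that simple, non-pipelined formula to the three processors for which the text gives both numbers; the function name is illustrative, and the PIC entries are left out because the text does not give enough detail to reproduce them this way.

```python
def mips(clock_mhz, clocks_per_instruction):
    """Instructions per second in millions, assuming no pipelining or stalls."""
    return clock_mhz / clocks_per_instruction


sap1 = mips(200, 6)       # six T-states per instruction
proposed = mips(200, 5)   # five T-states per instruction
dspic = mips(120, 4)      # four clock cycles per instruction

print(round(sap1, 1), proposed, dspic)
```

The results agree with the 33, 40, and 30 MIPS entries in Table 2 once the SAP-1 figure is rounded down.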
Since the proposed architecture is simple and small, the proposed controller offers the highest operating speed in the comparison together with low power consumption. It is suitable as a low-cost, high-speed controller, and also as a co-processor for arithmetic operations.
5. Conclusions
An arithmetic processor that performs the basic arithmetic operations was proposed. Its performance is about 20% better than that of the general-purpose SAP-1 processor, owing to the omission of one T-state in the fetch cycle.
The arithmetic processor is suitable for numerical control. The controller was implemented with the 0.35 μm TSMC cell library. The timing diagrams, layout, and architecture were presented. The proposed processor was also verified on an FPGA development board for functional testing. It is suitable as a low-cost, high-speed controller and as a co-processor for arithmetic operations.
References
[1] Amstead BH, Ostwald PF, Begeman ML. Manufacturing processes. 8th ed. John Wiley & Sons Inc.; 1987.
[2] Andrews Michael. Influence of architecture on numerical algorithms. Microprocessors 1978;2(3).
[3] Nave Ra. Implementation of transcendental functions on a numerics processor. Microproc Microprog 1983;11(3–4):221–5.
[4] Anderson KW, Shannon GF. Numerical techniques for the inverse optimal control problem in limited state feedback systems. Comput Electr Eng 1975;2(1):53–66.
[5] Wang Jun, Xu Xun, Sun Jun, Li Rentang, Wang Wanshan. Development of an NC controller for next generation CNCs. Int J Innovative Comput, Inf Control 2008;4(3):593–604.
[6] Jamshidi M, Barak D, Baugh S, Vadiee N. Computational and experimental environments for fuzzy logic and control. Comput Electr Eng 1993;19(4):289–98.
[7] Patra A, Mukhopadhyay S. A software tool for performance evaluation of digital control algorithms on finite wordlength processors. Comput Electr Eng 1996;22(6):403–19.
[8] Hadj Saïd S, M'Sahli F, Mimouni MF, Farza M. Adaptive high gain observer based output feedback predictive controller for induction motors. Comput Electr Eng 2013;39:151–63.
[9] Yi Shu-Chung. An 8-bit current-steering digital to analog converter. Int J Electron Commun 2012;66:433–7.
[10] Guo S, Peters L. A high-speed fuzzy co-processor implemented in analogue/digital technique. Comput Electr Eng 1998;24:89–98.
[11] Malvino Albert Paul, Brown Jerald A. Digital computer electronics. 3rd ed. McGraw-Hill Inc.; 1983.
[12] http://help.sap.com/saphelp_nw04/helpdata/en/1a/7dc33a0f374932e10000000a11402f/content.htm.
[13] Patterson David A, Hennessy John L. Computer organization & design. 2nd ed. Morgan Kaufmann Publishers Inc.; 1998.
[14] Dawson C, Pattanam SK, Roberts D. The Verilog procedural interface for the Verilog hardware description language. In: 1996 IEEE international conference on Verilog HDL; 26–28 February 1996. p. 17–23.
[15] Sauge P, Thuau G. Integrating of Verilog HDL and VHDL languages in the SMASH mixed-signal multi-level simulator. In: Proceedings of Verilog HDL conference and VHDL, international users forum; 16–19 March 1998. p. 26.
[16] Gannot G, Ligthart M, Lyon RF. Verilog HDL based FPGA design. In: 1994 international conference on Verilog HDL; 14–16 March 1994. p. 86–92.
[17] Yi Shu-Chung. A new construction adder based on Chinese abacus algorithm. Comput Electr Eng 2012;38:185–93.
[18] PIC16F5X Data Sheet http://ww1.microchip.com/downloads/en/DeviceDoc/41213D.pdf.
[19] PIC12F508/509/16F505 8/14-Pin, 8-Bit Flash Microcontrollers http://ww1.microchip.com/downloads/en/DeviceDoc/41236E.pdf.
[20] dsPIC30F4011/4012 Data Sheet http://ww1.microchip.com/downloads/en/devicedoc/70135C.pdf.
Shu-Chung Yi received the B.S., M.S. and Ph.D. degrees in Electrical Engineering from National Cheng Kung University, National Taiwan University, National
Taiwan University, in 1986, 1990 and 1998, respectively. He is currently an associate professor in the Department of Computer Science and Information
Engineering, National Changhwa University of Education. His research interests include algorithms, VLSI architectures, and circuit level techniques.
Chin-Fa Hsieh was born in Tainan, Taiwan, R.O.C. He received the M.S. degree in electrical engineering from the National Cheng Kung University, Tainan,
Taiwan, in 1988. In 1993, he joined the Department of Electronic Engineering, China University of Science and Technology, Taiwan. His research interests
include VLSI signal processing, video coding algorithms and System-On-Chip design.
