
3rd International Conference on Advanced Manufacturing Technology (ICAMT 2004), 11-13 May, Kuala Lumpur, Malaysia

HDL-BASED DESIGN METHODOLOGY OF 16-bit RISC MICROPROCESSOR


Ismail Saad 1, Pukhraj Vaya 1, Abu Bakar Abd Rahman 2
1 Lecturer, 2 MSc Student
School of Engineering and Information Technology, University Malaysia Sabah, Locked Bag 2073, 88999 Kota Kinabalu, Sabah, Malaysia
Tel: +60-8-832-0000 x 3147/3066, Fax: +60-8-832-0348
(e-mail: ismail_s@ums.edu.my, vaya@ums.edu.my, abubakar@seit.ums.edu.my)

Abstract
This paper presents the design and simulation of a behavioral model of a 16-bit RISC processor architecture, based on an HDL methodology using Verilog-HDL. The processor system consists of ROM, RAM, I/O and a CPU. The CPU module is merely a shell that instantiates the real processor definition in the files cpu_core.v, control.v, datapath.v and alsu.v. The behavioral model of the control module, which comprises the controller state machine, the Instruction Register (IR) and a group of control signals, is explained in detail. The modeling of the read, write and tri-state buffer operations of the datapath module is also explained in detail. The functionality of the processor design was tested by executing three types of instruction. The results show that Verilog-HDL can be used to improve the design process of a new microprocessor architecture.

Keywords: Verilog-HDL, RISC, Behavioral Model, Register, Microprocessor

1. Introduction
Microprocessor applications are not limited to personal computers; microprocessors are also used in specific fields such as robotics, communications and control systems [1-5]. However, designing a new processor for such applications is very complicated, as it involves millions of transistors on a single chip [6-9]. Therefore, to improve the design process and thus minimize error, time and cost, the Verilog Hardware Description Language (Verilog-HDL) can be used to simulate and verify the functionality of the microprocessor components before the real device is fabricated [9-13]. This paper therefore presents the design and simulation of a 16-bit RISC processor based on an HDL methodology using Verilog-HDL on the Synopsys Front-end Compiler.

2. Processor Architecture
The processor has a multiplexed 16-bit data and address path. The instructions have a variable length: an instruction that operates on registers only takes one word, while instructions that operate on Register/Memory or Register/Immediate operands take two words.
The 16-bit instruction word consists of a 2-bit Mode field, 1 bit each for Set Condition (set_bit) and Test Condition (test_bit), a 3-bit ALU Function field (ALU_func) and 3 bits each for the Destination Register (Rd), Source-1 Register (Rs1) and Source-2 Register (Rs2). The processor can execute 36 instructions, which are grouped into two instruction types: Arithmetic/Logical and Load/Store instructions. There are six registers in the processor: three are general purpose, while the other three are dedicated registers, namely the PC (Program Counter), IR (Instruction Register) and DR (Data Register). In addition, a dummy register (always zero) is included in the register file.

3. Verilog-HDL Model for Processor System

The processor system consists of 256 words of ROM (addresses 0-255), 256 words of RAM (addresses 256-511) and I/O consisting of a bank of 16 switches (mapped at address 512) and a bank of 16 LEDs (mapped at address 513). The cpu.v file is merely a shell that simulates the pad ring and instantiates the real processor definition in the following files:
cpu_core.v
control.v
datapath.v
alsu.v

4. Verilog HDL Module Codes

4.1 Cpu_Core Module
This module has a single internal system address/data bus. Because there is only one bus, all data from memory (Data_in) must pass through a tri-state buffer, controlled by the TrisMem signal, before it reaches the system bus. This module also instantiates the control and datapath modules of the processor.

4.2 Control Module
The control module has two main functions: to execute operations in the proper sequence by means of the controller state machine, and to generate the control signals that cause each instruction to be executed.


The control module consists of the 16-bit Instruction Register (IR), the 1-bit zero flag register, the controller state machine with its memory-cycle sub states, and the logic that generates the different types of control signal, as illustrated in Figure 1.

4.2.2 Instruction Register and Zero Flag Register
An instruction stored in the rom.v file is placed on the system bus when the tri-state buffer control signal TrisMem goes high. The instruction is then loaded from the system bus into the Instruction Register (IR) during the Fetch1 state and the data_setup sub state of the memory cycle. This update occurs only on the positive edge of the clock. To model the transfer of the instruction from the system bus to the IR in the control module, a Verilog if statement is used, as shown below.
always @(posedge Clock)
begin
  if ((state == `Fetch1) && (sub_state == `data_setup))
    IR <= #20 Sysbus;
end

[Figure 1 (block diagram). The control module contains the controller state machine (State: 0 = Fetch1, 3 = Fetch2, 1 = Execute; Sub_state: 0 = address_setup, 1 = address_hold, 3 = data_setup, 2 = data_hold), the zero flag register (zero_flag_reg, fed by the Zero signal) and the IR with its fields testbit, setbit, ModeBit, Opcode, ALUfunc, Rd, Rs1 and Rs2. The control signals shown are Function, TrisALU, TrisPC, TrisRs2, TrisRd, nTrisRd, PC_inc, Rs2_sel, LoadDR, LoadPC, ReadPC_1, ReadR0_1 to ReadR3_1, ReadPC_2, ReadR0_2 to ReadR3_2, WriteR1, WriteR2, WriteR3 and WritePC.]

The control module continuously decodes the instruction in the IR into the Opcode, ModeBit, Destination Register (Rd), Source-1 Register (Rs1), Source-2 Register (Rs2), ALU function, set bit and test bit fields. The control unit also contains a 1-bit zero flag register that is specifically designed for the execution of conditional instructions. The Zero signal from the ALU unit is loaded into the zero flag register, on the positive edge of the clock only, when both of the following conditions are satisfied:
1. The setbit field of the instruction is TRUE.
2. The processor is in the Execute state and data_hold sub state with ModeBit equal to 01, or in the Fetch1 state and data_setup sub state with ModeBit equal to 00.

Fig. 1. Processor Control Module Architecture

4.2.1 Controller State Machine
The controller state machine has three states, Fetch1 (00), Fetch2 (11) and Execute (01), which are encoded using Gray code. It also has four memory-cycle sub states: address_setup (00), address_hold (01), data_setup (11) and data_hold (10). Transitions from one state to another are determined by the data_hold sub state of the memory cycle and the 2-bit mode field of the instruction. In general, a Register + Register instruction needs only the Fetch1 state and takes 1 memory cycle (4 clock cycles) to execute. A Register + Immediate instruction also uses the Execute state and takes 2 memory cycles (8 clock cycles). Load and Store instructions, which take the longest to execute, use the Fetch2 and Execute states as well and take 3 memory cycles (12 clock cycles). Hence, every instruction completes in at most 12 clock cycles. The controller state machine is coded in Verilog using a case statement; in outline, the algorithm is as follows.
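always @(posedge Clock)
begin
  if (!nReset)
    state <= 2'bzz;  // high impedance on reset, as stated in the outline
  else
    case (state)
      `Fetch1:   // 0
        if (sub_state == `data_hold && ModeBit == 2'b00) state <= `Fetch1;
        else if (sub_state == `data_hold && ModeBit == 2'b01) state <= `Execute;
        else if (sub_state == `data_hold && (ModeBit == 2'b10 || ModeBit == 2'b11)) state <= `Fetch2;
        else if (ModeBit == 2'b01 || ModeBit == 2'b00) state <= `Fetch1;
      `Fetch2:   // 3
        if (sub_state == `data_hold && (ModeBit == 2'b10 || ModeBit == 2'b11)) state <= `Execute;
        else if (ModeBit == 2'b10 || ModeBit == 2'b11) state <= `Fetch2;
      `Execute:  // 1
        if (sub_state == `data_hold) state <= `Fetch1;
    endcase
end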
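The paper does not list the code that steps through the four memory-cycle sub states. The following is a minimal sketch of such a sequencer, assuming that the sub state advances on every rising clock edge in the Gray-coded order given above (so that one memory cycle takes 4 clock cycles) and that it restarts at address_setup on reset. The module name sub_state_seq and its port list are hypothetical and are not taken from the original design files.

```verilog
// Gray-coded memory-cycle sub states, with the encodings given in the text.
`define address_setup 2'b00
`define address_hold  2'b01
`define data_setup    2'b11
`define data_hold     2'b10

// Hypothetical module: an illustration only, not part of the original files.
module sub_state_seq (Clock, nReset, sub_state);
  input        Clock;
  input        nReset;      // active-low reset
  output [1:0] sub_state;
  reg    [1:0] sub_state;

  // Assumption: one sub state per clock, so one memory cycle = 4 clocks.
  always @(posedge Clock or negedge nReset)
    if (!nReset)
      sub_state <= `address_setup;
    else
      case (sub_state)
        `address_setup: sub_state <= `address_hold;
        `address_hold:  sub_state <= `data_setup;
        `data_setup:    sub_state <= `data_hold;
        default:        sub_state <= `address_setup;  // leaving data_hold
      endcase
endmodule
```

Because consecutive sub states differ in only one bit, signals decoded from sub_state are less likely to glitch between states, which is presumably why the paper chose a Gray encoding.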

The behavioural model of this zero flag register uses a Verilog if statement, written as below:
always @(posedge Clock)
begin
  if ((((state == `Execute && ModeBit == 2'b01) && sub_state == `data_hold) ||
       (state == `Fetch1 && ModeBit == 2'b00 && sub_state == `data_setup)) &&
      setbit == 1'b1)
    zero_flag_reg <= Zero;
end

4.2.3 Control Signals
Four groups of control signals must be generated in the control unit:
1. The memory control signals (nME, nALE, RnW, nOE and ENB) and the signal that identifies a memory write
2. Tri-state buffer control signals for the system bus
3. Datapath control signals
4. ALU function control signal


The behavioural model of the first group of control signals is written in Verilog using continuous assignments as follows:
assign memory_write = (Opcode == `ST) && (state == `Execute);
assign nME  = (sub_state == `address_setup) || (sub_state == `data_hold);
assign nALE = (sub_state == `address_setup);
assign RnW  = (sub_state == `address_setup) || (sub_state == `address_hold) || ~memory_write;
assign nOE  = (sub_state == `address_setup) || (sub_state == `address_hold) || memory_write;
assign ENB  = ~nOE;

1. En_read_dec: the state is Fetch1 and ModeBit is 00, or the state is Execute and ModeBit is 01, 10 or 11, or the state is Fetch2 and ModeBit is 11 or 10.
2. En_wrt_dec: the state is Fetch1, the sub state is data_hold and ModeBit is 00, or the state is Execute, the sub state is data_setup and ModeBit is 10, or the state is Execute, the sub state is data_hold, ModeBit is 01, zero_flag_reg is TRUE and testbit is either 1 or 0.
For example, the behavioral model of the read decoder (En_read_dec) is written in Verilog as below:
assign En_read_dec =
  ((state == `Fetch1 && ModeBit == 2'b00) ||
   (state == `Execute && (ModeBit == 2'b01 || ModeBit == 2'b10 || ModeBit == 2'b11)) ||
   (state == `Fetch2 && (ModeBit == 2'b11 || ModeBit == 2'b10)));

For the second group of control signals, there are 5 tri-state buffer controls (TrisALU, TrisPC, TrisRs2, TrisRd and nTrisRd) for the datapath unit and only 1 (TrisMem) for memory. Each of these signals is modeled in Verilog with a continuous assignment statement. The conditions under which each tri-state buffer control signal is asserted are as follows:
1. TrisMem: the sub state is either data_setup or data_hold, and the state is Fetch1, Fetch2, or Execute with ModeBit equal to 10 or 01.
2. TrisALU: the sub state is either address_setup or address_hold, and the state is Execute with ModeBit equal to 11 or 10.
3. TrisPC: the sub state is either address_setup or address_hold, and the state is Fetch1, Fetch2, or Execute with ModeBit equal to 01.
4. TrisRs2: the sub state is either data_setup or data_hold, memory_write is asserted and ModeBit is 11.
5. TrisRd: the sub state is data_setup, the state is Execute and ModeBit is 10.
6. nTrisRd: the inverse of TrisRd (~TrisRd).
For example, the behavioral model of the TrisMem tri-state buffer control is written as follows:
assign TrisMem =
  ((sub_state == `data_setup || sub_state == `data_hold) &&
   (state == `Fetch1 || state == `Fetch2 ||
    (state == `Execute && (ModeBit == 2'b10 || ModeBit == 2'b01))));
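Only TrisMem is listed in the paper. The remaining five tri-state controls can be sketched in the same style directly from the conditions listed for them. This is an illustration under stated assumptions rather than the authors' code: the module name tris_ctrl, its port list and the helper wires addr_phase and data_phase are hypothetical, and the ModeBit qualifier in the TrisPC condition is read as applying to the Execute state only, by analogy with the TrisMem listing.

```verilog
// State and sub-state encodings as given in Section 4.2.1.
`define Fetch1        2'b00
`define Execute       2'b01
`define Fetch2        2'b11
`define address_setup 2'b00
`define address_hold  2'b01
`define data_setup    2'b11
`define data_hold     2'b10

// Hypothetical wrapper module, for illustration only.
module tris_ctrl (state, sub_state, ModeBit, memory_write,
                  TrisALU, TrisPC, TrisRs2, TrisRd, nTrisRd);
  input  [1:0] state, sub_state, ModeBit;
  input        memory_write;
  output       TrisALU, TrisPC, TrisRs2, TrisRd, nTrisRd;

  // Helper wires (not in the paper): address half and data half of a memory cycle.
  wire addr_phase = (sub_state == `address_setup) || (sub_state == `address_hold);
  wire data_phase = (sub_state == `data_setup)    || (sub_state == `data_hold);

  // ALU result drives the bus as an address for ModeBit 11 or 10 in Execute.
  assign TrisALU = addr_phase && (state == `Execute) &&
                   (ModeBit == 2'b11 || ModeBit == 2'b10);

  // PC drives the bus as an address during the fetch states and R+I Execute.
  assign TrisPC  = addr_phase &&
                   (state == `Fetch1 || state == `Fetch2 ||
                    (state == `Execute && ModeBit == 2'b01));

  // Rs2 drives the bus with the data to be written to memory.
  assign TrisRs2 = data_phase && memory_write && (ModeBit == 2'b11);

  assign TrisRd  = (sub_state == `data_setup) && (state == `Execute) &&
                   (ModeBit == 2'b10);
  assign nTrisRd = ~TrisRd;
endmodule
```

Under these conditions TrisALU and TrisPC are never asserted together, because in the Execute state they require different ModeBit values.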

The read and write selections are coded as multiplexors using continuous assignments. A read select signal is asserted only when En_read_dec is asserted and the source-1 (Rs1) or source-2 (Rs2) instruction field selects the corresponding register; a write select signal is asserted only when En_wrt_dec is asserted and the destination register (Rd) field selects the corresponding register. The control signals for reading the source-1 register (Rs1) are named ReadR0_1, ReadR1_1, ReadR2_1, ReadR3_1 and ReadPC_1, and those for the source-2 register (Rs2) are named ReadR0_2, ReadR1_2, ReadR2_2, ReadR3_2 and ReadPC_2. The control signals for the write operation are named WriteR1, WriteR2, WriteR3 and WritePC. For example, the Verilog code for one read select (read R0 as source 1) and one write select (write to R1) is given below:
assign WriteR1  = (En_wrt_dec && Rd == 3'b001) ? 1'b1 : 1'b0;
assign ReadR0_1 = (En_read_dec && Rs1 == 3'b000) ? 1'b1 : 1'b0;

The other datapath control signals generated in the control unit are Rs2_sel, PC_inc, LoadDR and LoadPC. All of these control signals are coded in Verilog using continuous assignment statements. For example, the behavioral model of PC_inc is:
assign PC_inc = ((sub_state == `address_hold) &&
                 ((state == `Fetch1) || (state == `Fetch2) ||
                  (state == `Execute && ModeBit == 2'b01)));
The control signal for the ALSU function, named Function, is simply the 3-bit ALUfunc field of the instruction, defined with a continuous assignment statement as follows:
assign Function = ALUfunc;

To model the third group of control signals, the datapath unit control signals, two decoders are needed, one for the read control signals and one for the write control signals. They control reading the contents of any one of the five readable registers in the datapath unit (PC, R0, R1, R2 and R3) and writing results or computed data into any one of the four writable registers (R1, R2, R3 and PC). Continuous assignment statements are used to code the two decoder signals (En_read_dec and En_wrt_dec), and multiplexors written as continuous assignments select which register in the datapath unit is read or written. The conditions for En_read_dec and En_wrt_dec are given in the two-item list earlier in this section.

An asynchronous reset that overrides the synchronous behavior of the state, sub state, IR and zero flag registers must be included in the control unit. It is described with procedural assign and deassign statements, a common way of modeling an asynchronous reset within a synchronous sequential system, as follows:


always @(nReset)
  if (!nReset)
  begin
    assign state = 0;
    assign sub_state = 0;
    assign IR = 0;
    assign zero_flag_reg = 0;
  end
  else
  begin
    deassign state;
    deassign sub_state;
    deassign IR;
    deassign zero_flag_reg;
  end

assign Rs1 = (ReadR2_1) ? R2 : 16'bz;

The same structure applies to all the other registers (R0, R1, R3 and PC), for both Rs1 and Rs2. The second task uses a procedural block and Verilog if statements. A write can only happen on the rising (positive) edge of the clock, which is why a procedural block is used here. For a write to the destination register (Rd), the control signals generated by the control unit specifically for this operation (LoadPC, WriteR1, WriteR2, WriteR3 and LoadDR) are used as the conditions of the if statements. The complete Verilog code for this task is written below:
always @(posedge Clock)
begin
  // Non-blocking assignments: all registers update together on the clock edge
  if (LoadPC)  PC <= Mux1_out;
  if (WriteR1) R1 <= Rd;
  if (WriteR2) R2 <= Rd;
  if (WriteR3) R3 <= Rd;
  if (LoadDR)  DR <= Sysbus;
end

4.3 Datapath Module
The main tasks of the datapath unit that must be modeled in Verilog to match the processor design, shown in Figure 2, are:
1. To model the read operation for both Rs1 (source-1 register) and Rs2 (source-2 register) from any one of the 5 available registers (R0, R1, R2, R3, PC).
2. To model the write operation from Rd (destination register) to any one of the 4 available registers (R1, R2, R3, PC).
3. To model the tri-state buffers (TrisALU, TrisPC, TrisRs2, TrisRd) for the system bus.
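[Figure 2 (block diagram). The datapath contains the registers PC, R0, R1, R2, R3 and DR; the PC+1 incrementer feeding multiplexor Mux1_out (selected by PC_inc); multiplexor Mux2_out (selected by Rs2_sel); the ALU with outputs result and Zero, controlled by Function; and the tri-state buffers TrisALU, TrisPC, TrisRd, TrisRs2 and nTrisRd. Register reads are enabled by ReadPC_1, ReadR0_1 to ReadR3_1, ReadPC_2 and ReadR0_2 to ReadR3_2; register writes by WritePC, WriteR1, WriteR2, WriteR3 and LoadDR. The data paths are 16 bits wide.]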

Note that the LoadPC and LoadDR control signals are not used directly for the destination register (Rd). LoadDR controls a write from the system bus to the Data Register (DR), while LoadPC controls a write from the output of multiplexor 1 (Mux1_out) to the Program Counter (PC). The LoadPC control signal is necessary because the architecture allows any value to be written into the PC; a multiplexor is therefore needed to select between the two PC functions: incrementing automatically, or accepting a value written to it as a destination register (Rd). As Figure 2 shows, there are two multiplexors. The first, with output Mux1_out, selects the PC function; the second, with output Mux2_out, selects between the Rs2 (source-2 register) and DR (Data Register) values. The second multiplexor uses the Rs2_sel control signal generated by the control unit to select Rs2 or DR, while the first uses the PC_inc control signal. The second multiplexor is needed to distinguish between the execution of an R+R instruction (Rs2 is selected) and an R+I instruction (DR is selected). Both multiplexors are coded in Verilog using continuous assignments as below:
assign Mux1_out = (PC_inc) ? PC + 1 : Rd;
assign Mux2_out = (Rs2_sel) ? Rs2 : DR;
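To show how Mux1_out and LoadPC work together, the program counter path can be isolated into a small stand-alone module. This is a minimal sketch, not the authors' datapath.v: the module name pc_unit and its port list are hypothetical, and the reset is written with the conventional negedge-sensitive always block rather than the procedural assign/deassign style used in the paper.

```verilog
// Hypothetical stand-alone model of the program counter path.
module pc_unit (Clock, nReset, PC_inc, LoadPC, Rd, PC);
  input         Clock;
  input         nReset;    // active-low asynchronous reset (assumed style)
  input         PC_inc;    // 1: select PC + 1, 0: select the Rd bus
  input         LoadPC;    // write enable for the PC
  input  [15:0] Rd;        // destination-register bus
  output [15:0] PC;
  reg    [15:0] PC;

  // Multiplexor 1: increment, or accept a value written to PC as Rd.
  wire [15:0] Mux1_out = PC_inc ? (PC + 16'd1) : Rd;

  always @(posedge Clock or negedge nReset)
    if (!nReset)
      PC <= 16'd0;
    else if (LoadPC)
      PC <= Mux1_out;
endmodule
```

With LoadPC and PC_inc both high the PC steps to the next word; with LoadPC high and PC_inc low it takes the value on Rd, which is how a write to the PC as a destination register acts as a jump.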

Fig. 2. Processor Datapath Module Architecture

For the first task, a tri-state structure written as multiplexors using continuous assignments is used. All the read control signals previously defined in the control unit for the datapath are used in this structure. As Figure 2 shows, each register read is controlled by a tri-state buffer, so a source bus (Rs1 or Rs2) is in the high impedance state when no register is selected to drive it. For example, the continuous assignment to Rs1 listed in this section selects R2 when the control signal ReadR2_1 is 1 and leaves Rs1 in the high impedance state when ReadR2_1 is 0.

For the third task, the same tri-state structure of multiplexors using continuous assignments that was used for the first task is used again. The Verilog code for each of the tri-state buffers in the datapath unit is as follows:
assign Sysbus = (TrisALU) ? result : 16'bz;
assign Sysbus = (TrisPC)  ? PC     : 16'bz;
assign Sysbus = (TrisRs2) ? Rs2    : 16'bz;
assign Sysbus = (TrisRd)  ? Rd     : 16'bz;
assign Rd     = (nTrisRd) ? result : 16'bz;
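Several continuous assignments can drive the same net because a driver that outputs z releases the bus. The following minimal sketch isolates two of the drivers to make that behavior easy to simulate; the module name sysbus_demo and its port list are hypothetical and are not part of the original datapath.v.

```verilog
// Hypothetical demonstration of two tri-state drivers sharing one bus.
module sysbus_demo (TrisALU, TrisPC, result, PC, Sysbus);
  input         TrisALU, TrisPC;   // tri-state enables from the control unit
  input  [15:0] result, PC;        // values that may be placed on the bus
  output [15:0] Sysbus;            // shared net with multiple drivers

  // Each driver puts its value on the bus when enabled and
  // releases the bus (high impedance) otherwise.
  assign Sysbus = TrisALU ? result : 16'bz;
  assign Sysbus = TrisPC  ? PC     : 16'bz;
endmodule
```

If neither enable is asserted the bus floats and reads as 16'hzzzz in simulation. If both are asserted with different values, the conflicting bits resolve to x, so the control unit has to guarantee that at most one driver is enabled at a time.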


Lastly, an asynchronous reset that overrides the synchronous behavior of all the registers in the datapath unit must be included, described in the same way as the asynchronous reset of the control unit.

4.4 ALSU Module
The final module in the microprocessor is the ALSU (Arithmetic & Logic Shift Unit), which performs seven basic arithmetic, logic and shift operations. In addition, a Zero flag is included. The Zero flag is modeled using an assign statement as below:
assign Zero = (result == 0);

5.2 Register + Register operation
Rd (R3) <- Rs1 (R1) ADDr Rs2 (R2)
This instruction performs an add operation between registers. In this example, Register 1 and Register 2 are added and the result is stored in Register 3. The value 259 (103 hex) in Register 1 is added to the value 93 (5D hex) initially stored in Register 2, and the result, 352 (160 hex), is stored in Register 3. Details of the process are shown in Figure 4.

The arithmetic, logic and shift operations are coded using procedural assignments inside a case statement, as below:
always @(input1 or input2 or Function)
  case (Function)
    `ADD : result = input1 + input2;
    `SUB : result = input1 - input2;
    `AND : result = input1 & input2;
    `OR  : result = input1 | input2;
    `XOR : result = input1 ^ input2;
    `NOT : result = ~input1;
    `SRA : result = input1 >> 1;
    default : result = input1;
  endcase
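For simulation outside the full processor, the case statement and the Zero flag can be wrapped into one stand-alone module. This is a sketch under stated assumptions: the paper does not give the 3-bit encodings of the ALU functions, so the `define values below are hypothetical, as are the module name alsu_sketch and the port list. The shift is written with the same >> 1 expression as the listing above.

```verilog
// Hypothetical function encodings: the paper does not list them.
`define ADD 3'b000
`define SUB 3'b001
`define AND 3'b010
`define OR  3'b011
`define XOR 3'b100
`define NOT 3'b101
`define SRA 3'b110

// Hypothetical stand-alone wrapper around the ALSU behavior.
module alsu_sketch (input1, input2, Function, result, Zero);
  input  [15:0] input1, input2;
  input  [2:0]  Function;
  output [15:0] result;
  output        Zero;
  reg    [15:0] result;

  // Combinational logic: re-evaluated whenever an operand or the function changes.
  always @(input1 or input2 or Function)
    case (Function)
      `ADD : result = input1 + input2;
      `SUB : result = input1 - input2;
      `AND : result = input1 & input2;
      `OR  : result = input1 | input2;
      `XOR : result = input1 ^ input2;
      `NOT : result = ~input1;
      `SRA : result = input1 >> 1;
      default : result = input1;   // pass input1 through unchanged
    endcase

  // Zero flag: high when every bit of the result is 0.
  assign Zero = (result == 16'd0);
endmodule
```

Using the operands of the Register + Register example, input1 = 16'h0103 (259) and input2 = 16'h005D (93) with the ADD function give result = 16'h0160 (352) and Zero = 0; subtracting a value from itself gives result = 0 and Zero = 1.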

Fig. 4. Timing Diagram of Register + Register operation

5.3 Load operation
Rd (R2) <- mem[Rs1 (R0) + SWITCHES]
This instruction performs a load operation. In this example, the instruction loads from `SWITCHES into Register 2. The content of the memory location at address [`SWITCHES + Register 0 (always zero)] is loaded into the destination register (Register 2). In the processor system, SWITCHES is mapped at address 512 and its content is the unsigned value 7. Details of the process are shown in Figure 5.

5. Processor Functionality
The processor functionality has been verified for the basic operations, which include arithmetic, logic and shift operations. The processor architecture offers 36 instructions. At the simulation level, the functionality of the processor was verified through the timing diagrams of every module. As examples, only three types of instruction are shown here: Register + Immediate, Register + Register and Load instructions.

5.1 Register + Immediate Value operation
Rd <- Rs1 (R1) Addi Imm16 (259 / 103 hex);

This instruction performs an add operation between Register 1 (R1) and the immediate value 259, after which the value 259 is stored in Register 1. Details of the process are shown in Figure 3.

Fig. 5. Timing Diagram of Load operation

Fig. 3. Timing Diagram of Register + Immediate operation

6. Conclusions
A new and simple 16-bit RISC processor architecture has been successfully designed based on an HDL methodology, and the processor functionality has been simulated and verified using Verilog-HDL on the Synopsys Compiler. This simple processor model can be used as a basic platform for designing application-specific processors in a given field. The advantages of using an HDL methodology such as Verilog-HDL to design a system are that it improves the design process by minimizing error, time and cost, and that the resulting system model is fully reusable, since the code can be changed to suit the specific needs of an application.

7. References
[1] D. D. Gajski, Principles of Digital Design, Prentice Hall, 1997.
[2] M. Zwolinski, Digital System Design with VHDL, Prentice Hall, 2000.
[3] D. A. Patterson and J. L. Hennessy, Computer Organization and Design - The Hardware/Software Interface, Morgan Kaufmann, 1999.
[4] M. Morris Mano, Digital Logic and Computer Design, Prentice Hall, 1997.
[5] I. McNally, EZ431/631 VLSI Group Design Project - Microprocessor Specification Document, University of Southampton, 2000.
[6] Dally, W-J., Chang, A., The Role of Custom Design in ASIC Chips, Proceedings of the 37th Conference on Design Automation, ACM Press, pp. 643-647, 2000.
[7] Flynn, M-J., Winner, R-I., ASIC microprocessor, Proceedings of the 22nd Annual International Workshop on Microprogramming and Microarchitecture, ACM Press, pp. 237-243, 1989.

[8] Samir Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, Prentice Hall, 1995.
[9] Lioupis, D., Papagiannis, A., Psihogiou, D., A Systematic approach to software peripherals for embedded system, Proceedings of the Ninth International Symposium on Hardware/Software Codesign, ACM Press, pp. 14-145, 2001.
[10] Diaz, J-C., Plaza, P., Merayo, L-A., Scarfone, P., Zamboni, M., Design and validation with HDL of a complex input/output processor for an ATM switch: the CMC, Verilog HDL Conference, Proceedings, pp. 67-71, 1995.
[11] Mahdi, A-E., Grout, I-A., PLL based ASIC system for DSP real-time analogue interface, www.ece.ul.ie/hompage/ian_grout/publications.html, 2002.
[12] Arnold, M-G., Bailey, T-A., Cowles, J-R., Cupal, J-J., Wallace, A-W., A purely data structure for accurate high level timing simulation of synchronous designs, Verilog HDL Conference, pp. 101-107, 1994.
[13] Hebert, O., Kraljic, I-C., Savaria, Y., A Method to Derive Application-Specific Embedded Processing Cores, International Conference on Hardware Software Codesign, 2000, San Diego, California, United States, ACM Press, pp. 88-92.
