
Defence Against the Dark ASM

Harun Šiljak

Preface
This short text is meant to serve as a quick reference for Nios II assembly
language. It does not attempt to replace the official Nios II processor reference
guide, which remains the most authoritative reference, providing comprehensive information
and all necessary details. This booklet, on the other hand, gives the basic information
needed for drafting relatively simple code for Nios II, troubleshooting assembly
language and hex files, and getting a brief overview of Nios II architecture and
assembly syntax from a practical perspective.
Technical details on how to use Altera software for hardware implementation of Nios II and for its programming are not provided, as they are outside the
scope of this text.
Trademarks for Harry Potter references belong to J.K. Rowling, her publishers
and Warner Bros. Nios II is a registered trademark of the Altera Corporation. This
text is not an official publication and no copyright infringement is intended.

[Margin note: Marginal notes added by the Half-Blood Prince are usually comments about exceptions and other unexpected behaviour.]

[Margin note: If you still think copyright is infringed, confundo.]

Contents
Preface
Chapter 1. Marauder's Map of Nios II
Chapter 2. ASM Spells with examples for muggles
2.1. Types of instructions
2.2. Arithmetic & logic instructions
2.3. Comparison instructions
2.4. Branching instructions
2.5. Subroutines and exception handling instructions
2.6. Miscellaneous instructions
2.7. Moving and data manipulation instructions
2.8. Assembler macros
Chapter 3. ASM spells for wizards
3.1. Instruction fields
3.2. Machine code example

CHAPTER 1

Marauder's Map of Nios II


The Nios II based media computer used in this course is built on Altera's DE2-115
development board. Its schematic is given in Figure 1.
Memory: The computer has 128 MB of SDRAM organised as 32M x 32 bits.
SRAM on the board is organised as 1M x 16 bits, but has a 32-bit interface. Finally,
8 Kbyte of memory on the FPGA chip itself is used as a character buffer for the
video-out port and is organised as 8K x 8 bits.
Parallel ports: Parallel ports have up to four 32-bit registers: a writable or
readable Data register, an optional Direction register for input-output ports, as
well as Mask and Edge registers used for interrupts. The 18 red and 9 green LEDs have
only a Data register, as they are output ports. Similarly, the 18 slider switches have
only a Data register, as they are input ports. The four-pushbutton parallel port can
be used for interrupts (except for KEY0, which is the reset button for the Nios II).
Figure 2 shows the organisation of registers for the pushbuttons. When accessing the Data
register of the general purpose pins on JP5, note that some of the pins are
not used as inputs/outputs but as supply pins (see Figure 3).
[Margin note: Upper bits in word accesses to these Data registers are ignored.]

Communication: The JTAG port connects the DE2-115 media
computer to the host computer for programming and monitoring. It also includes
a UART, and the registers for it are shown in Figure 4. The serial port of the
media computer also implements a UART, in this case connected to an RS-232 chip. Its register
organisation is equivalent to the JTAG one. The JTAG port can also be used to query
the System ID module and confirm that the media computer is properly configured.

Figure 1.0.1. The DE2-115 media computer

Figure 1.0.2. Pushbutton parallel port

Figure 1.0.3. General purpose pins and their Data register numeration

Figure 1.0.4. UART registers
Timer: The media computer also features a timer driven by a 50 MHz clock, able to
produce interrupts, with the register structure shown in Figure 5.
Media components: The LCD display is controlled with two 8-bit registers, one
for the instruction related to placing the cursor and the other for the character
to be placed (Figure 6). For the video output, each pixel can be coloured with an
RGB value. For the audio port, four related registers are shown in Figure 7. For the
PS/2 port (for keyboard/mouse), see Figure 8.

[Margin note: You can use UART to send an owl to the home computer and vice versa.]

[Margin note: UART can trigger an interrupt, don't forget that.]

Figure 1.0.5. Registers of the interval timer

Figure 1.0.6. LCD display registers

Figure 1.0.7. Audio port registers

Figure 1.0.8. PS/2 registers


In Table 1, base and end addresses of I/O peripherals are given for reference.
Between these two addresses, relevant registers are positioned and may be accessed
for peripheral control. For more details on the use of some more complicated
peripherals, check the Altera University Program Media Computer Manual.

Base address  End address  I/O peripheral
0x00000000    0x07FFFFFF   SDRAM
0x08000000    0x081FFFFF   SRAM
0x10003020    0x1000302F   Pixel buffer control
0x09000000    0x09001FFF   On-chip memory character buffer
0x10003030    0x10003037   Character buffer control
0x10000000    0x1000000F   Red LED parallel port
0x10000010    0x1000001F   Green LED parallel port
0x10000020    0x1000002F   7-segment HEX3-HEX0 displays parallel port
0x10000030    0x1000003F   7-segment HEX7-HEX4 displays parallel port
0x10000040    0x1000004F   Slider switch parallel port
0x10000050    0x1000005F   Pushbutton parallel port
0x10000060    0x1000006F   JP5 expansion parallel port
0x10000100    0x10000107   PS/2 port
0x10000108    0x1000010F   PS/2 port dual
0x10001000    0x10001007   JTAG UART port
0x10001010    0x10001017   Serial port
0x10002000    0x1000201F   Interval timer
0x10002020    0x10002027   System ID
0x10003000    0x1000301F   Audio/video configuration
0x10003040    0x1000304F   Audio port
0x10003050    0x10003051   LCD display port
Table 1. Memory map of DE2-115 Media Computer

CHAPTER 2

ASM Spells with examples for muggles


2.1. Types of instructions
There are three basic types of instructions in Nios II assembly language.
R-type instructions are executed with values from registers (at most three
of them, usually denoted in this text as RA, RB, RC) and possibly a 5-bit
constant (immediate constant, usually denoted as IMM5).
I-type instructions are executed on at most two registers (usually denoted
in this text as RA, RB) and a 16-bit constant (immediate constant, usually
denoted as IMM16).
J-type instructions are executed on a 26-bit constant (immediate constant,
usually denoted as IMM26) and perform a jump to the address defined by
IMM26.
Note that there is a certain number of pseudo-instructions in the Nios II instruction
set: they are translated into real instructions before the machine code is produced, i.e.
they exist only before assembly. Also bear in mind that the assembler macros in Nios
II assembly are used to extract high and low bits from constants efficiently, and
they are not instructions. Which instructions are pseudo-instructions and which are
real doesn't matter for the muggles. For wizards investigating the machine code, it
is important, as the pseudo-instructions have no machine code of their own.
2.2. Arithmetic & logic instructions
2.2.1. Basic arithmetic. Nios II assembly provides instructions for addition, subtraction, division and multiplication, although not all Nios II processors
may have the latter two implemented.
In case you wish to add numbers in two registers and place the result into a
third register, use the R-type instruction add RC, RA, RB. In case you are adding
a 16-bit immediate constant to the contents of a register, use the I-type instruction
addi RB, RA, IMM16. Note that the constant is sign-extended to 32 bits, so if
the constant was 0x8000, it would be padded with ones to 0xFFFF8000, and if it
was 0x7000 it would be padded with zeros to 0x00007000. This is arithmetical
padding.
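The arithmetical padding described above can be checked with a small Python model. This is only a sketch of the arithmetic, not Nios II code; the helper names sign_extend16 and addi are made up for this illustration.

```python
def sign_extend16(imm16: int) -> int:
    """Return the 32-bit pattern obtained by sign-extending a 16-bit value."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:           # bit 15 set: pad the upper half with ones
        return imm16 | 0xFFFF0000
    return imm16                 # bit 15 clear: pad the upper half with zeros


def addi(ra: int, imm16: int) -> int:
    """Model of addi: 32-bit wrap-around sum of ra and the sign-extended IMM16."""
    return (ra + sign_extend16(imm16)) & 0xFFFFFFFF


print(hex(sign_extend16(0x8000)))     # 0xffff8000
print(hex(sign_extend16(0x7000)))     # 0x7000
print(hex(addi(0x00010000, 0x8000)))  # 0x8000
```

The last line shows why the padding matters: adding 0x8000 with addi actually subtracts 32768.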
In the same fashion, subtraction is done by using the R-type instruction sub
RC, RA, RB and I-type subi RB, RA, IMM16. However, subi is a pseudo-instruction,
implemented as addi RB, RA, -IMM16.
Division, if implemented in the particular Nios II processor, is only possible with
numbers already in registers and it results in the integer part of the quotient. Two
possible R-type instructions are div RC, RA, RB which assumes the two numbers
in input registers are signed, and divu RC, RA, RB where numbers being divided
are taken as unsigned.
If the processor has multiplication implemented, there are five possible ways to
perform it. If an immediate 16-bit constant is multiplied with the contents of a register,
use the I-type instruction muli RB, RA, IMM16. The result is the 32 low-order
bits of the product. Analogously, mul RC, RA, RB results in the 32 low-order bits of the product of
the numbers in two registers. If you need the 32 high-order bits and you are multiplying
two signed integers in registers, use mulxss RC, RA, RB. If the integers are to be
taken as unsigned, use mulxuu RC, RA, RB. Finally, the instruction mulxsu RC,
RA, RB treats the contents of RA as a signed and RB as an unsigned integer.

[Margin note: If division and/or multiplication are not supported by the processor, an Unimplemented instruction exception is generated.]

As an example, we will calculate the division remainder for numbers in two
registers.

    div  r6, r4, r5
    mul  r7, r6, r5
    sub  r8, r4, r7

Note that in this case, we know that the multiplication is not going to produce
a number larger in magnitude than the number stored in r4, hence the product is known to
fit in the 32 low-order bits.
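The remainder example can be traced step by step with a Python model of the three instructions. This is a sketch under one assumption: div is taken to keep the integer part of the quotient, i.e. to truncate toward zero. The function names are hypothetical helpers, and registers are modelled as unsigned 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x


def div(ra: int, rb: int) -> int:
    """Signed division keeping the integer part of the quotient (assumed rounding)."""
    a, b = to_signed32(ra), to_signed32(rb)
    if b == 0:
        raise ZeroDivisionError("models the Division error exception")
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q & MASK32


def mul(ra: int, rb: int) -> int:
    """32 low-order bits of the product."""
    return (ra * rb) & MASK32


def sub(ra: int, rb: int) -> int:
    """32-bit wrap-around difference."""
    return (ra - rb) & MASK32


def remainder(r4: int, r5: int) -> int:
    r6 = div(r4, r5)      # div r6, r4, r5
    r7 = mul(r6, r5)      # mul r7, r6, r5
    return sub(r4, r7)    # sub r8, r4, r7


print(remainder(17, 5))                          # 2
print(to_signed32(remainder(-17 & MASK32, 5)))   # -2
```

Under the truncation assumption the remainder takes the sign of the dividend, as the second line shows.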
2.2.2. Basic bitwise logical operations. Logical bitwise operations directly
implemented in Nios II assembly are AND, OR, XOR and partially NOR.
Logical bitwise conjunction for contents of two registers is performed by R-type
instruction and RC, RA, RB. When a 16-bit constant is used, it can be conjuncted
with the 16 low-order bits of a register (i.e. padding it with 16 zeros to the left before
conjunction) using I-type instruction andi RB, RA, IMM16. If the conjunction of
the constant should be performed with the 16 high-order bits of a register (i.e.
padding it with 16 zeros to the right first), instruction andhi RB, RA, IMM16 is
used. Note that there is no sign extension, padding is always done with zeros. This
is called logical padding.
Logical bitwise disjunction is performed in the same manner: for numbers
stored in two registers, R-type instruction or RC, RB, RA is used. If an immediate
16-bit constant is used with the 16 low-order bits of a register, the instruction is
ori RB, RA, IMM16, while disjunction with the 16 high-order bits is performed
with orhi RB, RA, IMM16.
In the same manner, xor RC, RA, RB is used for exclusive disjunction of two
registers. If an immediate 16-bit constant is used with the 16 low-order bits of a
register, the instruction is xori RB, RA, IMM16, while exclusive disjunction with
the 16 high-order bits is performed with xorhi RB, RA, IMM16.
Bitwise logical NOR operation only exists in R-type instruction form for two
registers as nor RC, RA, RB.
As an example, we will set the least significant and the most significant bit of a register.

    ori   r4, r4, 0x0001
    orhi  r4, r4, 0x8000

2.2.3. Shifting. Contents of a register can be rotated (i.e. circularly shifted)
either to the left or to the right. Rotation to the left can be performed by means
of the R-type instruction rol RC, RA, RB, where the first register will contain
the contents of the second register rotated to the left by n positions, where n is the
number represented by the 5 least significant bits of the third register (the other 27
bits are ignored). Since this is rotation, the bits leaving the register on the most
significant side reappear on the least significant side. If the number of rotation
positions should be given as an immediate value, another R-type instruction is
used, roli RC, RA, IMM5. Note that this is an R-type instruction as it takes a 5-bit
immediate value, and not a 16-bit one like the I-type instructions.
If you need a right circular shift (rotation), it can be done as ror RC, RA, RB,
where again the first register takes the value of the second register after rotation to
the right by n positions, where n is the number represented by the 5 least significant bits
of the third register. However, in the default implementation of the Nios II processor,
there is no rori, but its behaviour can be achieved by using roli with 32 - n as
the immediate value.

[Margin note: Another exception, Division error, detects divide instructions that produce a quotient that can't be represented: division by zero and division of the largest negative number by -1.]
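The claim that a right rotation by n equals a left rotation by 32 - n can be verified with a short Python model. This is only a sketch; rol, ror and rori_via_roli are names invented for this illustration, and only the 5 least significant bits of the count are used, as described above.

```python
MASK32 = 0xFFFFFFFF


def rol(value: int, n: int) -> int:
    """Rotate a 32-bit value left by the 5 least significant bits of n."""
    n &= 0x1F
    value &= MASK32
    return ((value << n) | (value >> (32 - n))) & MASK32


def ror(value: int, n: int) -> int:
    """Rotate a 32-bit value right by the 5 least significant bits of n."""
    n &= 0x1F
    value &= MASK32
    return ((value >> n) | (value << (32 - n))) & MASK32


def rori_via_roli(value: int, n: int) -> int:
    """Emulate the missing rori by rotating left by 32 - n (kept to 5 bits)."""
    return rol(value, (32 - n) & 0x1F)


print(hex(rol(0x80000001, 1)))             # 0x3
print(hex(ror(0x80000001, 1)))             # 0xc0000000
print(hex(rori_via_roli(0x80000001, 1)))   # 0xc0000000
```

For n = 0 the value 32 - n does not fit in 5 bits; masking it to 0 gives the same result, since rotating by 32 positions changes nothing.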
If the shift is supposed to be non-circular, then it is assumed that free places in
a register whose contents are being shifted are filled with zeros. Left logical shift is
done either with R-type instruction sll RC, RA, RB or another R-type instruction
slli RC, RA, IMM5. The logic is the same as in the case of rotation.
Logical shift to the right is similarly done with srl RC, RA, RB or with srli
RC, RA, IMM5.
Another type of shift to the right is the arithmetic shift done with sra RC, RA,
RB or with srai RC, RA, IMM5. Unlike logic shift, filling the newly freed places in
the register is in this case done with duplicating the sign bit, just like in arithmetic
operations covered in the beginning of this chapter. Notice that the arithmetic shift
is not implemented for left shift, as in the case of left shift, the empty places appear
on the least significant bits, making the padding with sign bit nonsensical.
As an example, we will introduce multiplication by 4, followed by signed and unsigned
division by 4, using shifts.

    slli  r4, r4, 0x02
    srai  r4, r4, 0x02
    srli  r4, r4, 0x02
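The difference between the logical and the arithmetic right shift is easiest to see on a negative number. The following Python model is a sketch only; slli, srli and srai are hypothetical helper functions named after the instructions, working on 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF


def slli(value: int, n: int) -> int:
    """Logical shift left: zeros enter on the least significant side."""
    return (value << (n & 0x1F)) & MASK32


def srli(value: int, n: int) -> int:
    """Logical shift right: zeros enter on the most significant side."""
    return (value & MASK32) >> (n & 0x1F)


def srai(value: int, n: int) -> int:
    """Arithmetic shift right: copies of the sign bit enter on the left."""
    value &= MASK32
    signed = value - 0x100000000 if value & 0x80000000 else value
    return (signed >> (n & 0x1F)) & MASK32


minus_8 = -8 & MASK32              # 0xFFFFFFF8
print(hex(slli(3, 2)))             # 0xc        (3 * 4 = 12)
print(hex(srai(minus_8, 2)))       # 0xfffffffe (-8 / 4 = -2)
print(hex(srli(minus_8, 2)))       # 0x3ffffffe (large positive number)
print(hex(srai(-7 & MASK32, 2)))   # 0xfffffffe (-2, not -1)
```

The last line is worth noting: an arithmetic shift rounds toward minus infinity, so for negative numbers that are not multiples of 4 it does not give the same result as a division that keeps only the integer part of the quotient.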

2.3. Comparison instructions


Comparison instructions place a boolean value (zero or one) in a register based
on a comparison of A and B, where A is a register, and B can be a register or an immediate
value.
An R-type instruction cmpeq RC, RA, RB compares RA and RB and if they
are equal, places 1 in RC, otherwise 0. The I-type equivalent cmpeqi RB, RA,
IMM16 does the same but with an immediate value. Inversely, cmpne RC, RA, RB
compares RA and RB and if they are not equal, places 1 in RC, otherwise 0. The
I-type equivalent cmpnei RB, RA, IMM16 does the same but with an immediate
value.
Similarly, cmpge RC, RA, RB compares RA and RB and if RA ≥ RB, places 1 in
RC, otherwise 0, while the I-type equivalent cmpgei RB, RA, IMM16 does the same
but with an immediate value. If RA and RB are supposed to be considered unsigned
values, this comparison is done with cmpgeu RC, RA, RB for registers and cmpgeui
RB, RA, IMM16 for a register and an immediate value.
Similarly, cmplt RC, RA, RB compares RA and RB and if RA < RB, places 1 in
RC, otherwise 0, while the I-type equivalent cmplti RB, RA, IMM16 does the same
but with an immediate value. If RA and RB are supposed to be considered unsigned
values, this comparison is done with cmpltu RC, RA, RB for registers and cmpltui
RB, RA, IMM16 for a register and an immediate value.
The rest of the comparison instructions are pseudo-instructions, implemented
using those above.
cmpgt RC, RA, RB places 1 in RC if RA > RB, 0 otherwise, and it is implemented as
cmplt with swapped register operands. Its immediate equivalent cmpgti RB, RA, IMM16
cannot swap a register with a constant, so it is implemented as cmpgei with the immediate
increased by one (RA > IMM16 is the same as RA ≥ IMM16 + 1). The unsigned equivalents
cmpgtu RC, RA, RB and cmpgtui RB, RA, IMM16 are implemented in the same way, using cmpltu and
cmpgeui.
cmple RC, RA, RB places 1 in RC if RA ≤ RB, 0 otherwise, and it is implemented as cmpge with swapped register operands. Its immediate equivalent cmplei RB,
RA, IMM16 is implemented as cmplti with the immediate increased by one. The unsigned
equivalents cmpleu RC, RA, RB and cmpleui RB, RA, IMM16 are implemented in the same way, using
cmpgeu and cmpltui.
2.4. Branching instructions
Nios II assembly offers several branching instructions. The I-type instruction
beq RA, RB, label moves the execution of the program to the address PC+4+IMM16 denoted by the label if the contents of the two registers are the same. Otherwise, it
continues with the next instruction. The I-type instruction bge RA, RB, label
does the same under the condition RA ≥ RB for signed values in registers. The I-type
instruction bgeu RA, RB, label does the same with unsigned values.
Signed comparison RA < RB branching is performed with the I-type instruction
blt RA, RB, label, while the unsigned version of it is bltu RA, RB, label.
Signed comparison RA > RB branching is performed with the pseudo-instruction
bgt RA, RB, label, which is interpreted as blt with swapped parameters. The unsigned version of it, bgtu RA, RB, label, is interpreted as bltu with swapped
parameters.
Signed comparison RA ≤ RB branching is performed with the pseudo-instruction
ble RA, RB, label, which is interpreted as bge with swapped parameters. The unsigned version of it, bleu RA, RB, label, is interpreted as bgeu with swapped
parameters.
Branching if the two registers are not equal is done with the I-type instruction
bne RA, RB, label.
An unconditional branch (a GOTO) is performed by the I-type instruction
br label. If the address where the program should continue is not a constant (an
immediate value) but a calculated value in a register, then the R-type instruction
jmp RA is used.
Finally, if the full address within the current 256 MB range of the PC has to be provided, the J-type
instruction jmpi label, where label is encoded as an IMM26, is used. The jump is performed to
PC[31..28] : IMM26 x 4, i.e. the top four bits of the PC followed by IMM26 and two zero bits.
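The two ways of forming a destination address, PC+4+IMM16 for branches and PC[31..28] : IMM26 x 4 for jmpi, can be written out as a small Python model. This is a sketch for checking addresses by hand; the function names and the example addresses are made up for this illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    """Sign-extend a 16-bit field to a Python integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc: int, imm16: int) -> int:
    """Destination of a taken I-type branch: PC + 4 + sign-extended IMM16."""
    return (pc + 4 + sign_extend16(imm16)) & MASK32


def jmpi_target(pc: int, imm26: int) -> int:
    """Destination of jmpi: top four bits of PC, then IMM26, then two zero bits."""
    return (pc & 0xF0000000) | ((imm26 & 0x03FFFFFF) << 2)


def imm26_for(address: int) -> int:
    """IMM26 field that encodes a word-aligned address for jmpi."""
    if address & 0x3:
        raise ValueError("address is not divisible by 4")
    return (address >> 2) & 0x03FFFFFF


print(hex(branch_target(0x100, 0x0008)))                      # 0x10c
print(hex(branch_target(0x100, 0xFFFC)))                      # 0x100
print(hex(jmpi_target(0x10000040, imm26_for(0x10001000))))    # 0x10001000
```

The second line is a branch back to itself: an offset of -4 cancels the +4 that is always added.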

[Margin note: Since the desired address may happen to be not divisible by 4 (i.e. not to end in 00 as an address is supposed to), all branching instructions except jmpi can throw the "misaligned destination address" exception.]

2.5. Subroutines and exception handling instructions


Subroutines in Nios II assembly language are called with the J-type instruction
call label. Here, the label denotes an address in the 256 MB space determined
by the highest four bits of the PC, and hence defines the next 26 bits of the PC (the last
two bits of the PC are zero for alignment). So, the label (address) actually represents
an IMM26 value, as expected in a J-type instruction. The instruction performs the
following:

    ra <- PC + 4
    PC <- PC[31..28] : IMM26 : 00

which essentially saves the return address (the address of the instruction right after
the call instruction) for the happy return from the subroutine and moves the PC to the
place where the label given to call points.
It is also possible to call a subroutine through a register, i.e. to point to an instruction
by giving its address (the whole address, i.e. the whole new content of the PC) in a
register with the R-type instruction callr Raddress.
[Margin note: Since the content of the register may happen to be not divisible by 4 (i.e. not to end in 00 as an address is supposed to), this instruction can throw the "misaligned destination address" exception.]

Calls of subroutines access the PC, or to be more precise, get the address of
the next instruction and place it in the ra register. The only way to access the PC
directly (actually, PC+4 again) is the R-type instruction nextpc RC. The content of
the PC incremented by four is saved in the specified register.
Return from a subroutine simply returns the content of ra register to PC and
it is performed by an R-type instruction without parameters ret.
Return from an exception is done with an R-type instruction eret. The content
of ea register moves to PC and content of estatus moves to status.
An R-type instruction trap is used either as trap or trap IMM5 to save the
address of the next instruction in ea register, contents of status to estatus,
disable interrupts and start the exception handler. IMM5 is used only for debugging
purposes.
Registers like status, estatus etc. are called control registers and can be
read and written using the R-type instruction rdctl RC, N, which
copies the contents of the Nth control register to register RC, and the writing instruction
wrctl N, RA, which writes the contents of register RA into the Nth control register.
The I-type instruction rdprs RB, RA, IMM16 reads register RA in the previous register set, adds the sign-extended value IMM16 to its value and places the result in RB.
This only functions if the version of Nios II used allows shadow register sets.
Writing to the previous register set is done via the R-type instruction wrprs RC,
RA. It copies the value of register RA in the current register set to register RC in the
previous register set. Note that to write to an arbitrary register set, software can
insert the desired register set number in status.PRS prior to executing wrprs.
2.6. Miscellaneous instructions
The most powerful magical unforgivable curse of the Nios II language is an R-type
instruction named custom. It enables the introduction of 256 different custom user-designed
instructions to Nios II assembly. You design a custom hardware structure,
using a hardware description language, adjacent to the Nios II ALU, which can use two registers
as inputs and one as an output (but it doesn't have to, it can use its own custom
registers). The syntax is custom N, xresult, xone, xtwo where x can stand
either for R, a general purpose Nios II register, or C, a custom register. The part about
machine code of custom instructions will provide more explanations.
Most assembly languages include an instruction which does nothing, usually
called nop. In Nios II assembly language nop is implemented as a pseudo-instruction. The instruction behind it is add r0, r0, r0. It is used to lose
one instruction cycle for timing purposes.
Debuggers place breakpoints using special R-type instructions.
Such instructions are exclusively used by debuggers and hence they should not appear in exception handling routines, user programs and operating systems. The syntax
of the breakpoint placement instruction is either break or break IMM5, where the 5-bit immediate constant can be used by the debugger as the descriptor of the breakpoint
type. The effect of a breakpoint is

    bstatus <- status
    PIE <- 0
    U <- 0
    ba <- PC + 4
    PC <- break handler address

On the other hand, the bret instruction returns from the break by performing the
following:

    status <- bstatus
    PC <- ba

[Margin note: Since the content of the ra or ea register may happen to be not divisible by 4 (i.e. not to end in 00 as an address is supposed to), these instructions can throw the "misaligned destination address" exception.]

[Margin note: Otherwise, it throws the illegal operation exception.]

[Margin note: Manipulation of control registers, register sets, traps and eret can throw a supervisor-only instruction exception.]

[Margin note: Off to Azkaban with you... or Intel.]

[Margin note: It throws a "break" exception when executed.]

[Margin note: It is possible to have a misaligned address in the ba register. Then the bret instruction can lead to the "misaligned destination address" exception. If it is accessed in user mode, and not in supervisor mode, it throws the "supervisor-only instruction" exception.]

2.7. Moving and data manipulation instructions


Cache manipulation in Nios II assembly is straightforward. Initialisation of
cache line is done by using an I-type instruction initd IMM16(RA) which initialises
the data cache line associated with address RA+IMM16 regardless of whether the
address data is currently cached. On the other hand, I-type instruction initda
IMM16(RA) initialises the cache line only when address data is currently cached.
The R-type instruction initi RA initialises the instruction cache line associated
with address RA.
Similarly, I-type instruction flushd IMM16(RA) flushes the data cache line associated with address RA+IMM16 regardless of whether the address data is currently
cached. On the other hand, I-type instruction flushda IMM16(RA) flushes the cache
line only when address data is currently cached. The R-type instruction flushi
RA flushes the instruction cache line associated with address RA. Finally, an R-type
instruction flushp flushes the processor pipeline of any prefetched instructions.
Loading data from memory or I/O peripherals is performed using a set of
dual commands which can be explained on the example of a basic instruction for
loading a byte from memory or I/O peripheral, ldb RB, IMM16(RA) or ldbio RB,
IMM16(RA). Both of these instructions load a byte in RB from the address specified
in RA, offset for the value of IMM16, but the former one may return the value from
cache if cache is implemented. That is why the latter one is preferred for input/output devices. If there is no cache implemented, they perform the same operation. If
the byte loaded should be unsigned, (zero extended in the register), use the I-type
instruction ldbu RB, IMM16(RA) or ldbuio RB, IMM16(RA). Loading a half-word
(16 bits) from memory or I/O peripheral is done by using an I-type instruction
ldh RB, IMM16(RA) or ldhio RB, IMM16(RA), and if the half-word should be unsigned (zero-padded when loaded in the register), use I-type instruction ldhu RB,
IMM16(RA) or ldhuio RB, IMM16(RA). Finally, if the whole word should be loaded,
I-type instruction ldw RB, IMM16(RA) or ldwio RB, IMM16(RA) is used.
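The difference between the signed and unsigned loads lies only in how the upper bits of the destination register are filled. The Python model below is a sketch of that behaviour on a tiny byte array; it ignores caches and the io variants, uses the little-endian byte order of Nios II, and raises an error for a misaligned address. All function names are hypothetical helpers named after the instructions.

```python
MASK32 = 0xFFFFFFFF


def _load(memory: bytes, address: int, size: int, signed: bool) -> int:
    """Read size bytes (little-endian) and extend the value to 32 bits."""
    if address % size:
        raise ValueError("misaligned data address")
    value = int.from_bytes(memory[address:address + size], "little")
    if signed and value & (1 << (8 * size - 1)):
        value -= 1 << (8 * size)       # sign bit set: pad with ones
    return value & MASK32


def ldb(memory, address):  return _load(memory, address, 1, True)
def ldbu(memory, address): return _load(memory, address, 1, False)
def ldh(memory, address):  return _load(memory, address, 2, True)
def ldhu(memory, address): return _load(memory, address, 2, False)
def ldw(memory, address):  return _load(memory, address, 4, False)


memory = bytes([0x80, 0xFF, 0x34, 0x12])
print(hex(ldb(memory, 0)))    # 0xffffff80
print(hex(ldbu(memory, 0)))   # 0x80
print(hex(ldh(memory, 0)))    # 0xffffff80
print(hex(ldhu(memory, 2)))   # 0x1234
print(hex(ldw(memory, 0)))    # 0x1234ff80
```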
Storing data to memory or I/O peripherals is done by similar instructions: stb
RB, IMM16(RA) or stbio RB, IMM16(RA) stores a byte from RB to the address
specified in RA, offset for the value of IMM16. However, the former one can be
delayed by using cache, so the latter one is preferred for I/O peripherals. If a halfword should be written, sth RB, IMM16(RA) or sthio RB, IMM16(RA) is used,
while for whole words, stw RB, IMM16(RA) or stwio RB, IMM16(RA) is used.
Moving data from register to register and moving immediate values to registers is performed by using pseudo-instructions. Moving from register to register is
done by mov RC, RA, which is actually add RC, RA, r0. Moving a signed immediate
to a register is done by movi RB, IMM16, which is actually addi RB, r0, IMM16.
Moving an unsigned immediate to a register is done by movui RB, IMM16, which
is actually ori RB, r0, IMM16. Moving an immediate to the high half-word is done
by movhi RB, IMM16, which is implemented as orhi RB, r0, IMM16. Moving a 32-bit
immediate address into a register is done by movia RB, IMM32, which is in turn implemented as orhi RB, r0, %hiadj(IMM32) followed by addi RB, RB, %lo(IMM32) (see the
next section about assembler macros for reference).
If you are writing a whole word (32-bit constant) to a register, you can do it in
two steps, as:

    movhi  RB, %hi(value)
    ori    RB, RB, %lo(value)

or

    movhi  RB, %hiadj(value)
    addi   RB, RB, %lo(value)

The second form is actually the movia pseudo-instruction.

[Margin note on loads and stores: It can throw a number of exceptions: supervisor-only data access, misaligned data address, TLB permission violation, fast or double TLB miss or MPU region violation.]


2.8. Assembler macros
The following four assembler macros are implemented for convenience:
%hiadj(expression): Extract the upper 16 bits of expression and add one if bit 15 is set. Useful to obtain zeros from the sign-padded upper 16 bits in case expression is a signed 16-bit constant.
%hi(expression): Extract the upper 16 bits of expression. If expression was a signed 16-bit constant, the upper 16 bits might be padded with zeros or ones.
%lo(expression): Extract the lower 16 bits of expression.
%gprel(expression): Subtract the value of the symbol _gp (global pointer) from expression. The intention of the %gprel relocation is to have a fast, small area of memory which only takes a 16-bit immediate to access.
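The first three macros can be modelled in a few lines of Python, which also shows why %hiadj has to add one when bit 15 is set: addi sign-extends the lower half, so the upper half must compensate. This is a sketch only; hi, lo, hiadj and the two load helpers are hypothetical Python functions, not assembler syntax.

```python
MASK32 = 0xFFFFFFFF


def hi(value: int) -> int:
    """%hi: upper 16 bits."""
    return (value >> 16) & 0xFFFF


def lo(value: int) -> int:
    """%lo: lower 16 bits."""
    return value & 0xFFFF


def hiadj(value: int) -> int:
    """%hiadj: upper 16 bits, plus one if bit 15 is set."""
    return ((value >> 16) + ((value >> 15) & 1)) & 0xFFFF


def load_with_ori(value: int) -> int:
    """movhi RB, %hi(value) followed by ori RB, RB, %lo(value)."""
    rb = hi(value) << 16
    return rb | lo(value)              # ori pads its immediate with zeros


def load_with_addi(value: int) -> int:
    """movhi RB, %hiadj(value) followed by addi RB, RB, %lo(value)."""
    rb = hiadj(value) << 16
    low = lo(value)
    if low & 0x8000:                   # addi sign-extends its immediate
        low -= 0x10000
    return (rb + low) & MASK32


for value in (5000, 0x12348000, 0xFFFF8000, 0xDEADBEEF):
    print(hex(value), hex(hiadj(value)), hex(lo(value)),
          load_with_ori(value) == value, load_with_addi(value) == value)
```

Both sequences rebuild the original constant for every value in the loop; for 0x12348000 the adjusted upper half is 0x1235, not 0x1234.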

CHAPTER 3

ASM spells for wizards


3.1. Instruction fields
The following table lists machine code for all instructions. Pseudo-instructions
have no machine code, as they are translated into real instructions before conversion
to machine code. A, B and C in the table denote register arguments, IMM26,
IMM16 and IMM5 denote immediate constants, N is the number of control register
or the custom instruction, reada, readb and readc are bits for determining use of
registers A, B and C in custom instructions. Number in parentheses denotes the
number of bits used.
Instruction  Instruction fields
add          A(5) B(5) C(5) 0x31(6) 0x0(5) 0x3A(6)
addi         A(5) B(5) IMM16(16) 0x04(6)
and          A(5) B(5) C(5) 0x0E(6) 0x0(5) 0x3A(6)
andhi        A(5) B(5) IMM16(16) 0x2C(6)
andi         A(5) B(5) IMM16(16) 0x0C(6)
beq          A(5) B(5) IMM16(16) 0x26(6)
bge          A(5) B(5) IMM16(16) 0x0E(6)
bgeu         A(5) B(5) IMM16(16) 0x2E(6)
blt          A(5) B(5) IMM16(16) 0x16(6)
bltu         A(5) B(5) IMM16(16) 0x36(6)
bne          A(5) B(5) IMM16(16) 0x1E(6)
br           0x0(5) 0x0(5) IMM16(16) 0x06(6)
break        0x0(5) 0x0(5) 0x1E(5) 0x34(6) IMM5(5) 0x3A(6)
bret         0x1E(5) 0x0(5) 0x1E(5) 0x09(6) 0x0(5) 0x3A(6)
call         IMM26(26) 0x0(6)
callr        A(5) 0x0(5) 0x1F(5) 0x1D(6) 0x0(5) 0x3A(6)
cmpeq        A(5) B(5) C(5) 0x20(6) 0x0(5) 0x3A(6)
cmpeqi       A(5) B(5) IMM16(16) 0x20(6)
cmpge        A(5) B(5) C(5) 0x08(6) 0x0(5) 0x3A(6)
cmpgei       A(5) B(5) IMM16(16) 0x08(6)
cmpgeu       A(5) B(5) C(5) 0x28(6) 0x0(5) 0x3A(6)
cmpgeui      A(5) B(5) IMM16(16) 0x28(6)
cmplt        A(5) B(5) C(5) 0x10(6) 0x0(5) 0x3A(6)
cmplti       A(5) B(5) IMM16(16) 0x10(6)
cmpltu       A(5) B(5) C(5) 0x30(6) 0x0(5) 0x3A(6)
cmpltui      A(5) B(5) IMM16(16) 0x30(6)
cmpne        A(5) B(5) C(5) 0x18(6) 0x0(5) 0x3A(6)
cmpnei       A(5) B(5) IMM16(16) 0x18(6)
custom       A(5) B(5) C(5) reada(1) readb(1) readc(1) N(8) 0x32(6)
div          A(5) B(5) C(5) 0x25(6) 0x0(5) 0x3A(6)
divu         A(5) B(5) C(5) 0x24(6) 0x0(5) 0x3A(6)
eret         0x1D(5) 0x1E(5) C(5) 0x01(6) 0x0(5) 0x3A(6)
flushd       A(5) 0x0(5) IMM16(16) 0x3B(6)
flushda      A(5) 0x0(5) IMM16(16) 0x1B(6)
flushi       A(5) 0x0(5) 0x0(5) 0x0C(6) 0x0(5) 0x3A(6)
flushp       A(5) 0x0(5) 0x0(5) 0x04(6) 0x0(5) 0x3A(6)
initd        A(5) 0x0(5) IMM16(16) 0x33(6)
initda       A(5) 0x0(5) IMM16(16) 0x13(6)
initi        A(5) 0x0(5) 0x0(5) 0x29(6) 0x0(5) 0x3A(6)
jmp          A(5) 0x0(5) 0x0(5) 0x0D(6) 0x0(5) 0x3A(6)
jmpi         IMM26(26) 0x01(6)
ldb          A(5) B(5) IMM16(16) 0x07(6)
ldbio        A(5) B(5) IMM16(16) 0x27(6)
ldbu         A(5) B(5) IMM16(16) 0x03(6)
ldbuio       A(5) B(5) IMM16(16) 0x23(6)
ldh          A(5) B(5) IMM16(16) 0x0F(6)
ldhio        A(5) B(5) IMM16(16) 0x2F(6)
ldhu         A(5) B(5) IMM16(16) 0x0B(6)
ldhuio       A(5) B(5) IMM16(16) 0x2B(6)
ldw          A(5) B(5) IMM16(16) 0x17(6)
ldwio        A(5) B(5) IMM16(16) 0x37(6)
mul          A(5) B(5) C(5) 0x27(6) 0x0(5) 0x3A(6)
muli         A(5) B(5) IMM16(16) 0x24(6)
mulxss       A(5) B(5) C(5) 0x1F(6) 0x0(5) 0x3A(6)
mulxsu       A(5) B(5) C(5) 0x17(6) 0x0(5) 0x3A(6)
mulxuu       A(5) B(5) C(5) 0x07(6) 0x0(5) 0x3A(6)
nextpc       0x0(5) 0x0(5) C(5) 0x1C(6) 0x0(5) 0x3A(6)
nor          A(5) B(5) C(5) 0x06(6) 0x0(5) 0x3A(6)
or           A(5) B(5) C(5) 0x16(6) 0x0(5) 0x3A(6)
orhi         A(5) B(5) IMM16(16) 0x34(6)
ori          A(5) B(5) IMM16(16) 0x14(6)
rdctl        A(5) B(5) C(5) 0x26(6) N(5) 0x3A(6)
rdprs        A(5) B(5) IMM16(16) 0x38(6)
ret          0x1F(5) 0x0(5) 0x0(5) 0x05(6) 0x0(5) 0x3A(6)
rol          A(5) B(5) C(5) 0x03(6) 0x0(5) 0x3A(6)
roli         A(5) 0x0(5) C(5) 0x02(6) IMM5(5) 0x3A(6)
ror          A(5) B(5) C(5) 0x0B(6) 0x0(5) 0x3A(6)
sll          A(5) B(5) C(5) 0x13(6) 0x0(5) 0x3A(6)
slli         A(5) 0x0(5) C(5) 0x12(6) IMM5(5) 0x3A(6)
sra          A(5) B(5) C(5) 0x3B(6) 0x0(5) 0x3A(6)
srai         A(5) 0x0(5) C(5) 0x3A(6) IMM5(5) 0x3A(6)
srl          A(5) B(5) C(5) 0x1B(6) 0x0(5) 0x3A(6)
srli         A(5) 0x0(5) C(5) 0x1A(6) IMM5(5) 0x3A(6)
stb          A(5) B(5) IMM16(16) 0x05(6)
stbio        A(5) B(5) IMM16(16) 0x25(6)
sth          A(5) B(5) IMM16(16) 0x0D(6)
sthio        A(5) B(5) IMM16(16) 0x2D(6)
stw          A(5) B(5) IMM16(16) 0x15(6)
stwio        A(5) B(5) IMM16(16) 0x35(6)
sub          A(5) B(5) C(5) 0x39(6) 0x0(5) 0x3A(6)
sync         0x0(5) 0x0(5) 0x0(5) 0x36(6) 0x0(5) 0x3A(6)
trap         0x0(5) 0x0(5) 0x1D(5) 0x2D(6) IMM5(5) 0x3A(6)
wrctl        A(5) 0x0(5) 0x0(5) 0x2E(6) N(5) 0x3A(6)
wrprs        A(5) 0x0(5) C(5) 0x14(6) 0x0(5) 0x3A(6)
xor          A(5) B(5) C(5) 0x1E(6) 0x0(5) 0x3A(6)
xorhi        A(5) B(5) IMM16(16) 0x3C(6)
xori         A(5) B(5) IMM16(16) 0x1C(6)
3.2. Machine code example

Let us take a sample code:

    START_TIMER = 0xF68C
    label = 5000
    orhi  r8, r0, %hiadj(label)
    addi  r8, r8, %lo(label)
    subi  r8, r8, 1
    bne   r8, r0, START_TIMER

Now, the question is how to translate the four instructions to machine code.
Note that the code uses two assembler macros. The I-type fields are written below in the order A|B|IMM16|opcode.
(1) 5000 is 00000000000000000001001110001000 in binary, so %hiadj(5000), the IMM16 for orhi, is 0000000000000000.
With A = r0 and B = r8 (orhi RB, RA, IMM16), the instruction is going to be 00000|01000|0000000000000000|110100.
(2) Now, IMM16 is %lo(5000), i.e. 0001001110001000, so the instruction is
going to be 01000|01000|0001001110001000|000100.
(3) subi is a pseudo-instruction implemented as addi with negative IMM16,
so the instruction is going to be 01000|01000|1111111111111111|000100.
(4) Finally, taking IMM16 to be the value of START_TIMER, the last instruction is 01000|00000|1111011010001100|011110.
(An assembler would normally store in IMM16 the offset of the label from PC+4, which depends on where the code is placed.)
The whole code is then

    00000010000000000000000000110100
    01000010000001001110001000000100
    01000010001111111111111111000100
    01000000001111011010001100011110
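The translation above can be cross-checked with a short Python encoder for I-type instructions, using the field order A(5) B(5) IMM16(16) opcode(6) from the table in section 3.1. This is a sketch, not an assembler: encode_i is a hypothetical helper, and the immediate of bne is taken to be the value 0xF68C as in the text rather than a PC-relative offset. For orhi r8, r0 the A field holds r0 and the B field holds r8, following the operand order orhi RB, RA, IMM16.

```python
OP_ADDI = 0x04
OP_BNE = 0x1E
OP_ORHI = 0x34


def encode_i(a: int, b: int, imm16: int, opcode: int) -> int:
    """Pack an I-type instruction: A(5) B(5) IMM16(16) opcode(6)."""
    return ((a & 0x1F) << 27) | ((b & 0x1F) << 22) | ((imm16 & 0xFFFF) << 6) | (opcode & 0x3F)


label = 5000
hiadj = ((label >> 16) + ((label >> 15) & 1)) & 0xFFFF   # %hiadj(label)
lo = label & 0xFFFF                                      # %lo(label)

program = [
    encode_i(0, 8, hiadj, OP_ORHI),   # orhi r8, r0, %hiadj(label)
    encode_i(8, 8, lo, OP_ADDI),      # addi r8, r8, %lo(label)
    encode_i(8, 8, -1, OP_ADDI),      # subi r8, r8, 1  ->  addi r8, r8, -1
    encode_i(8, 0, 0xF68C, OP_BNE),   # bne  r8, r0, START_TIMER (value used as IMM16)
]

for word in program:
    print(f"{word:032b}  0x{word:08X}")
```

Printing each word in both binary and hexadecimal makes it easy to compare against a hex file when troubleshooting.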
