LEARNING GOALS:
Figure 9.1
The 68000 User Registers
Here are the important points you should know about the 68000 user registers:
All the data registers, all the address registers and the Program Counter are 32 bits (4 bytes) wide.
The Status Register, or SR, is 16 bits (2 bytes) wide. Only the low-order byte of the SR, which is called the CCR, or Condition Code Register, can be
accessed by the user. The high-order byte of the SR, the so-called System Byte, can be accessed only by the operating system, for example while it handles the special
situations called interrupts and exceptions, which will be discussed in the second part of the course. Until then, we can forget about the System
Byte and work only with the CCR.
The data registers are used to store any data. They are general-purpose registers, because they haven't been reserved for any specific task by the 68000
chip designers, and they are interchangeable, in the sense that whatever you can do with register Di you can also do with register Dj. There are some rare
instructions that require a specific register as an operand, but those are really special cases and we won't see many of them.
The address registers are used to store addresses of locations in main memory. In other words, address registers are pointers to locations in memory.
Registers A0 to A6 are general-purpose and interchangeable, just like their cousins the data registers. Register A7, also referred to as SP, is more special:
it is the processor's stack pointer. It is used by the system to maintain a stack of subroutine return addresses (to be discussed later). You are free to access
and modify the contents of register A7, just like any other address register, but you always have to bear in mind what A7 is used for. Note that any
address register can be used to maintain a stack. Thus, it is possible to handle several stacks at the same time. Note also that although you can store the
address of a memory location in a data register, you cannot use a data register as a pointer to that location. The differences between address and data
registers will become clearer when we start looking at the instructions that use those registers.
As we've already mentioned, the Program Counter is a register used to store the address in main memory of the next instruction to be executed. It is 32
bits wide, because memory addresses in the 68000 system are 32-bit numbers. The contents of the PC are automatically updated as each instruction is
fetched and executed. The contents of the PC are always an even number, because an instruction can begin only at an even address. The user has more
restricted access to the PC than to a data or address register. We'll see later how the PC can be put to use in a program.
As we said, most of the 68000 user registers are 32 bits wide, i.e. they can accommodate 32 bit numbers. In order to be able to reference a particular bit in the
register, each bit is given a number. The convention is to number the bits from 0 to 31, bit 0 being the least significant, i.e. the rightmost bit, and bit 31 being the
most significant, i.e. the leftmost bit. In other words, the bit numbering starts at 0 at the right end and increases from right to left (see figure 9.2).
This numbering system is called little-endian numbering, because number 0 is given to the little (least significant) end of the number. The opposite system,
called big-endian, consists of starting the numbering at 0 at the most significant (i.e. leftmost) bit and going in increasing order to the right, until the rightmost
bit receives number 31. There is nothing wrong with this numbering system, but the Motorola chip designers decided to go with the little-endian system, because
it is more convenient to use: with this numbering, bit n always has the weight 2ⁿ, whatever the size of the operand.
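To make the numbering concrete, here is a small C sketch that extracts bit n of a 32 bit value and rebuilds a value from its individual bits. The helper names `bit_of` and `from_bits` are hypothetical, chosen only for this illustration; the point is that, with bit 0 as the least significant bit, bit n simply contributes 2ⁿ to the value.

```c
#include <stdint.h>

/* Returns bit n of a 32 bit value, where bit 0 is the least
   significant (rightmost) bit and bit 31 the most significant. */
static int bit_of(uint32_t value, unsigned n)
{
    return (int)((value >> n) & 1u);
}

/* Rebuilds a 32 bit value from its bits: with this numbering,
   bit n always contributes 2^n to the value. */
static uint32_t from_bits(const int bits[32])
{
    uint32_t value = 0;
    for (unsigned n = 0; n < 32; n++) {
        if (bits[n])
            value |= (uint32_t)1 << n;
    }
    return value;
}
```

For example, in 80000001₁₆ only bits 0 and 31 are set, so `bit_of` returns 1 for n = 0 and n = 31 and 0 for every other n.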
Registers have the potential to accommodate 32 bit numbers (one longword), but they don't limit the user to performing operations only on 32 bit numbers. In fact,
the programmer can work with either bytes (8 bits), words (16 bits) or longwords (32 bits), as we'll see later in this chapter.
The CCR
Let's now look in more detail at the CCR. The CCR is a very important piece of hardware because it allows conditional behavior (i.e. high level constructs of the
kind if X then Y; else Z;) to be implemented, and the Control Unit often bases its decisions on the contents of the CCR.
Figure 9.3
The CCR
The CCR is an 8 bit register, but only bits 0 to 4 are actually used. Bits 5, 6 and 7 are unused: their value doesn't matter, and you may assume that they
are always 0. Bits 0 to 4 are flags: bit 0 is C (carry), bit 1 is V (overflow), bit 2 is Z (zero), bit 3 is N (negative) and bit 4 is X (extend). Each flag is affected
in specific ways by the various operations performed by the CPU, and most instructions update the value of one or more CCR bits.
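As an illustration, the following C sketch decodes a CCR byte, assuming the standard 68000 layout (C in bit 0, V in bit 1, Z in bit 2, N in bit 3, X in bit 4). The masks and the helpers `ccr_flag` and `ccr_to_string` are hypothetical names used only for this example; note how the unused bits 5 to 7 are simply ignored.

```c
#include <stdint.h>

/* Masks for the five CCR flags; bits 5 to 7 are unused. */
enum {
    CCR_C = 0x01, /* bit 0: carry    */
    CCR_V = 0x02, /* bit 1: overflow */
    CCR_Z = 0x04, /* bit 2: zero     */
    CCR_N = 0x08, /* bit 3: negative */
    CCR_X = 0x10  /* bit 4: extend   */
};

/* Returns 1 if the flag selected by mask is set in the CCR byte, else 0. */
static int ccr_flag(uint8_t ccr, unsigned mask)
{
    return (ccr & mask) != 0;
}

/* Writes the flags as "XNZVC" into out (6 chars including the
   terminator), with '-' for each cleared flag. Bits 5 to 7 are ignored. */
static void ccr_to_string(uint8_t ccr, char out[6])
{
    static const char names[] = "XNZVC";
    for (int i = 0; i < 5; i++)
        out[i] = (ccr & (1u << (4 - i))) ? names[i] : '-';
    out[5] = '\0';
}
```

For instance, a CCR value of F5₁₆ is printed as "X-Z-C": bits 0, 2 and 4 are set, and the set bits 5 to 7 have no effect.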
We'll come back to the CCR when we start examining the 68000 assembler instructions. But before attacking the actual 68000 language, let's examine the
organization of the 68000 memory space.
The memory space of the 68000 processor is one big linear array of memory locations, each of them being able to store one byte. The memory is said to be
byte-addressable, i.e. each byte within the memory has its own unique address and can be accessed directly. Note that the memory of the 68000 is not bit-
addressable, which means that you cannot access data in memory bit by bit. That also means that you cannot start reading memory in the middle of a memory
location, only at the beginning. Later versions of the 68000 processor, like the 68020, allow you to overcome those access restrictions.
By convention, when the memory space is represented vertically, the memory locations with a small address are drawn at the top, and those with a big
address at the bottom. The 68000 is a big-endian machine: when a word or longword is stored in memory, its most significant byte goes to the lowest address.
Figure 9.4
The linear memory of the 68000.
Each memory location is one byte in size. The 32 bit addresses are given as 8 hexadecimal digits.
As you know, the 68000 has a 32 bit Program Counter and 32 bit address registers. This is so because addresses of locations in memory are 32 bit numbers, and
consequently you can address up to 2³² locations, i.e. 2³² bytes, or 4 gigabytes (each memory location is one byte). In order to do so, you need a 32 bit address
bus to carry a 32 bit address from the CPU to memory. The 68000 and its successors the 68020, the 68030, etc. indeed have a 32 bit
address bus. However, the 68000 processor as such is a special case: because of the limited number of pins on its package, only the first 24 lines of the address
bus (address lines 0 to 23) actually leave the chip and connect it to memory. Address lines 24 to 31 are simply not brought off-chip. Since one address line carries
one bit, only bits 0 to 23 within a 32 bit address are used to specify a memory location. For example, if you write a program to access locations 80123456₁₆
and 45123456₁₆, you will access the same physical location, 123456₁₆. The state of bits 24 to 31, represented by the two leftmost hexadecimal digits, simply doesn't
matter. In other words, the 68000 behaves as if its addresses were 24 bit quantities, not 32 bit quantities, which means that the addressable memory space of the
68000 is in practice only 2²⁴ bytes, or 16 megabytes. Note that addresses in the 68000 are still represented and stored as 32 bit numbers, even if only the low-order
24 bits of those numbers are actually used.
This limitation does not exist with the newer members of the 68000 family. The 68020, 68030 and 68040 have a fully connected 32 bit address bus and a true
address space of 4 gigabytes.
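The effect of the missing address lines can be modelled in C by masking off the upper 8 bits of an address. This is a sketch of the behaviour described above for the original 68000 only (not the 68020 and later); `physical_address_68000` and `ADDRESS_MASK_68000` are hypothetical names.

```c
#include <stdint.h>

/* Only address lines 0 to 23 leave the 68000 chip, so only the
   low 24 bits of a 32 bit address select a physical location. */
#define ADDRESS_MASK_68000 0x00FFFFFFu

/* Physical location reached by a 32 bit address on the original 68000. */
static uint32_t physical_address_68000(uint32_t address)
{
    return address & ADDRESS_MASK_68000;
}
```

Both 80123456₁₆ and 45123456₁₆ reduce to 123456₁₆ under this mask, which is why they reach the same physical location, and the mask allows 2²⁴ distinct locations, i.e. 16 megabytes.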
Figure 9.5
The address bus of the 68000
The most common 68000 instructions and assembler directives, as well as the most common addressing modes are described in detail in the NeXT textbook
written by David Cloutier. We will therefore simply introduce the most important topics covered in the NeXT textbook, and we'll concentrate on giving some
examples of programs, Motorola syntax, and interfacing assembly language with C.
Let's introduce some instructions, and then we'll put them to use in a simple program.
Instruction   Operation                       Description
SUB #N,D0     [D0] <- [D0] - N                The number N is subtracted from the contents of register D0
                                              and the result is stored in D0.
SUB D1,D5     [D5] <- [D5] - [D1]             The contents of register D1 are subtracted from the contents
                                              of register D5, and the result is stored in register D5.
CMP #N,D2     [D2] - N                        Subtract the number N from the contents of register D2.
                                              The result is discarded and the CCR is set up.
CMP D1,D2     [D2] - [D1]                     Subtract the contents of D1 from the contents of D2.
                                              The result is discarded and the CCR is set up.
BEQ X         IF CCR(Z) = 1 THEN [PC] <- X    Branch to location X if the Z bit of the CCR is set,
                                              i.e. if the previous operation yielded zero as result.
BNE X         IF CCR(Z) = 0 THEN [PC] <- X    Branch to location X if the Z bit of the CCR is cleared,
                                              i.e. if the previous operation didn't yield zero as result.
Here is how some of the above instructions can be used in an assembly language program. Consider the following C code fragment:
x = 0;
y = Q;
if (y == 5)
    x = x + y;
y = y - 6;
x = y;
MOTOROLA SYNTAX
         MOVE    #0,D0       x = 0; loads D0 with the value 0;
*                            we use D0 to represent x
         MOVE    Q,D1        y = Q; loads D1 with Q;
*                            we use D1 to represent y; Q is a reference
*                            to a memory location
         CMP     #5,D1       Compare the number 5 with D1 (y)
         BNE     EXIT_IF     If not equal, branch to (go to) label EXIT_IF
         ADD     D1,D0       x = x + y; this statement is executed
*                            only if y == 5
EXIT_IF  SUB     #6,D1       y = y - 6; subtracts 6 from D1
         MOVE    D1,D0       x = y; moves the value of y (D1) into x (D0)
MILO SYNTAX
         move    #0,d0       |x = 0; loads D0 with the value 0;
#                            we use D0 to represent x
         move    q,d1        |y = Q; loads D1 with q;
#                            we use D1 to represent y; q is a reference
#                            to a memory location
         cmp     #5,d1       |Compare the number 5 with D1 (y)
         bne     exit_if     |If not equal, branch to (go to) label exit_if
         add     d1,d0       |x = x + y; this statement is executed
#                            only if y == 5
exit_if: sub     #6,d1       |y = y - 6; subtracts 6 from D1
         move    d1,d0       |x = y; moves the value of y (D1) into x (D0)
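Assembly has no if statement, only compare-and-branch. The following C sketch mirrors the program above line by line, using `goto` for the branch to EXIT_IF, to show how the if was lowered. The function `run_example` and its parameters are hypothetical, and plain `int` variables stand in for D0 and D1 (the assembly actually performs word-sized operations, which makes no difference for small values like these).

```c
/* Mirrors the assembly program above; d0 plays x, d1 plays y.
   x_after_if receives the value of x just before the label EXIT_IF. */
static int run_example(int q, int *x_after_if)
{
    int d0, d1;

    d0 = 0;                /* MOVE #0,D0    x = 0                        */
    d1 = q;                /* MOVE Q,D1     y = Q                        */
    if (d1 != 5)           /* CMP #5,D1     sets Z only if D1 - 5 == 0   */
        goto exit_if;      /* BNE EXIT_IF   taken when Z is clear        */
    d0 = d0 + d1;          /* ADD D1,D0     x = x + y, only if y == 5    */
exit_if:
    *x_after_if = d0;
    d1 = d1 - 6;           /* SUB #6,D1     y = y - 6                    */
    d0 = d1;               /* MOVE D1,D0    x = y                        */
    return d0;
}
```

Because the last instruction overwrites D0 with D1, the final value of x is always Q - 6; the `x_after_if` output shows that the ADD is executed only when Q equals 5.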
You certainly recognize the 4 fields in the layout of this program: the label field, the instruction field, the operands field, and the comments field. Here are some
points to note about the above example:
The instructions are executed in sequence from top to bottom. Only control-transfer instructions, such as Bcc LABEL (Branch on Condition Code), BRA LABEL
(Branch Always) or JMP LABEL (Jump), can force a non-sequential execution of the instructions, by loading the PC with the address of the instruction bearing
the label LABEL (see the NeXT textbook for more details). This is how conditional statements and loops are implemented in assembly language.
You may be starting to appreciate the presence of 8 general purpose data registers, which provide you with a lot of on-chip working space. Note that the
above operations could have been implemented by using memory locations instead of data registers, but the program would have been slower, because each
memory access takes additional bus cycles.
Note how the if statement is constructed. First we execute a CMP instruction. Its effect is to subtract 5 from the contents of D1 without storing the result
in D1, i.e. without affecting the value stored in D1. This instruction is used only to set up the CCR bits, so that a branch instruction can be executed next.
We then execute the BNE (Branch on Not Equal) instruction: if the contents of D1 are not equal to 5, we take a branch to the instruction
labeled EXIT_IF. How does the CPU know whether the contents of D1 are equal to 5? If [D1] - 5 equals 0, the CMP instruction sets the Z bit, and the branch
is not taken. If [D1] - 5 is not equal to 0, the Z bit is cleared (set to 0) and the branch is taken. If the branch is taken, the PC is loaded with the address of
the instruction labeled EXIT_IF; if the branch is not taken, the instruction directly following the branch instruction is executed.
Other conditional constructs found in if statements, loops etc. are executed in a very similar manner, by using Branch or Jump instructions based on the
value of one or more CCR bits.
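The flag mechanics of CMP and the two branches can also be sketched in C. The model below tracks only the Z bit (bit 2 of the CCR) and assumes a word-sized compare, which is what CMP #5,D1 is when no size is given; the names `cmp_w_z`, `beq_taken` and `bne_taken` are hypothetical. Note that the register value is read but never written, which is exactly what distinguishes CMP from SUB.

```c
#include <stdint.h>

#define CCR_Z 0x04u /* Z is bit 2 of the CCR */

/* Z-flag part of CMP.W src,Dn: [Dn] - src is computed on the low word,
   used to update Z, and then discarded. Dn is passed by value, so
   this function cannot modify the register. Other flags are not modelled. */
static uint8_t cmp_w_z(uint8_t ccr, uint16_t src, uint32_t dn)
{
    uint16_t diff = (uint16_t)((uint16_t)dn - src);
    if (diff == 0)
        ccr = (uint8_t)(ccr | CCR_Z);
    else
        ccr = (uint8_t)(ccr & ~CCR_Z);
    return ccr;
}

/* BEQ is taken when Z is set, BNE when Z is clear. */
static int beq_taken(uint8_t ccr) { return (ccr & CCR_Z) != 0; }
static int bne_taken(uint8_t ccr) { return (ccr & CCR_Z) == 0; }
```

Because only the low word takes part in a word compare, a register holding ABCD0005₁₆ also compares equal to 5 in this model.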
The 68000 allows you to work with operands of 3 different sizes: bytes, words, and longwords; registers can be used to store bytes, words, or longwords. For
example, when you work with character data, you may want to work with bytes; if you work with integers you'll probably work with words or longwords. The
assembler, which translates your program into machine code, has no way to know whether you want to perform a longword operation or a word operation
unless you explicitly specify the size. When the size is omitted, as in the first example program above, Motorola-syntax assemblers assume a word operation.
Operations on 32 bit numbers are specified by appending the suffix .L to the end of the instruction mnemonic, operations on 16 bit numbers are specified by
appending a .W suffix to the end of the instruction mnemonic, and operations on 8 bit numbers are specified by appending a .B suffix to the end of the
instruction mnemonic.
It is very important to keep in mind that only the bits specified by the size of the operation are affected by the operation. For example, if register D4 contains the
number FFFFFFFF₁₆ and you perform the operation
         ADD.W   #1,D4
the result (which will be stored in D4 and overwrite the previously held value) will not be 00000000₁₆, as might be expected at first sight, but rather
FFFF0000₁₆.
Furthermore, the values of the CCR bits calculated after an operation are also determined by the size of the operation. Thus, in the above case:
the Z bit will be set, because the 16 bit result is zero, even if D4 as a whole does not contain 0.
the C bit (and with it the X bit) will be set, since a carry was generated out of bit 15: FFFF₁₆ + 0001₁₆ = 10000₁₆, which cannot be stored within a 16 bit word.
the N bit will be cleared, since bit 15 of the result is 0.
the V bit will be cleared, since no signed (two's complement) overflow occurred: as a signed 16 bit number, FFFF₁₆ is -1, and -1 + 1 = 0 is perfectly
representable. The V bit is set only when adding two numbers of the same sign produces a result of the opposite sign, e.g. 7FFF₁₆ + 0001₁₆ = 8000₁₆.
Note that the CPU doesn't know whether you perform signed or unsigned arithmetic, or any other kind of operation; it updates its CCR bits blindly, and
it's up to you to decide what use to make of the CCR bits.
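The following C sketch models ADD.B, ADD.W and ADD.L with an immediate source and a data register destination, computing all five flags from the low 8, 16 or 32 bits only. It is a simplified model written for this text: the function `add_sized` is hypothetical, and the flag rules are the standard 68000 ones for ADD (C and X for an unsigned carry out of the top bit, V for signed overflow, Z and N from the sized result).

```c
#include <stdint.h>

enum { CCR_C = 0x01, CCR_V = 0x02, CCR_Z = 0x04, CCR_N = 0x08, CCR_X = 0x10 };

/* Models ADD.<size> #imm,Dn, with size 1 (.B), 2 (.W) or 4 (.L).
   Only the low 8, 16 or 32 bits of *dn change, and the flags are
   computed from those bits alone. Returns the new CCR value. */
static uint8_t add_sized(uint32_t *dn, uint32_t imm, int size)
{
    unsigned bits = 8u * (unsigned)size;
    uint32_t mask = (size == 4) ? 0xFFFFFFFFu : (((uint32_t)1 << bits) - 1u);
    uint32_t sign = (uint32_t)1 << (bits - 1u);

    uint32_t a = *dn & mask;                /* destination operand   */
    uint32_t b = imm & mask;                /* source operand        */
    uint64_t wide = (uint64_t)a + b;        /* keeps the carry out   */
    uint32_t r = (uint32_t)wide & mask;     /* sized result          */

    uint8_t ccr = 0;
    if (wide > mask)
        ccr |= CCR_C | CCR_X;               /* unsigned carry out of the top bit */
    if ((a ^ r) & (b ^ r) & sign)
        ccr |= CCR_V;                       /* same-sign operands, opposite-sign result */
    if (r == 0)
        ccr |= CCR_Z;
    if (r & sign)
        ccr |= CCR_N;

    *dn = (*dn & ~mask) | r;                /* upper bits are left untouched */
    return ccr;
}
```

With D4 = FFFFFFFF₁₆ and a word-sized add of 1, the sketch gives D4 = FFFF0000₁₆ with Z, C and X set and N and V cleared. The same function reproduces the ADD.W #$9001,D2 versus ADD.L #$9001,D2 example discussed later in this chapter: the word version leaves 20003112₁₆ in D2, while the longword version gives 20013112₁₆.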
The size of an operation is specified in a slightly different way in the Milo syntax. The Motorola syntax uses a dot to separate the instruction from the size,
while the Milo syntax does not use a dot: it simply appends the letter of the size to the end of the instruction (for example, ADD.W becomes addw).
Let's now look at another simple program which specifies its operand sizes. Consider the following fragment of C code:
/*
   We assume we have the following declarations
   (chars are 1 byte, ints are 2 bytes, long ints are 4 bytes):
*/
char C = 'A';
int X = 0x100;
long int Y = 0x2000A111;

X++;
if (C != 'B')
    X -= 0x5;
Y += 0x9001;
MOTOROLA SYNTAX
*        First, fetch the data from memory
*
         MOVE.W  X,D1        Fetch X and place it in D1. Note: X is 2 bytes!
         MOVE.L  Y,D2        Fetch Y and place it in D2. Note: Y is 4 bytes!
         MOVE.B  C,D3        Fetch C and place it in D3. Note: C is 1 byte!
*
         ADD.W   #1,D1       Executes X++
         CMP.B   #$42,D3     Compares the ASCII code for 'B' ($42) to C
         BEQ     EXIT_IF     Go to label EXIT_IF, thus skipping the next
*                            instruction, if C == 'B'
         SUB.W   #$5,D1      Executes X -= 0x5
EXIT_IF  ADD.L   #$9001,D2   Executes Y += 0x9001
MILO SYNTAX
#        First, fetch the data from memory
#
         movew   X,d1        |Fetch X and place it in d1. Note: X is 2 bytes!
         movel   Y,d2        |Fetch Y and place it in d2. Note: Y is 4 bytes!
         moveb   C,d3        |Fetch C and place it in d3. Note: C is 1 byte!
#
         addw    #1,d1       |Executes X++
         cmpb    #0x42,d3    |Compares the ASCII code for 'B' (0x42) to C
         beq     exit_if     |Go to label exit_if, thus skipping the next
#                            instruction, if C == 'B'
         subw    #0x5,d1     |Executes X -= 0x5
exit_if: addl    #0x9001,d2  |Executes Y += 0x9001. Note that Milo indicates
#                            hex numbers with 0x, just like C
This program is intended to show you the importance of operand sizes. Let's assume that C, X, and Y are originally stored in memory as shown in the following
diagram:
Figure 9.6
The above diagram represents an area of memory where our 3 variables are stored. Note that all numbers are in hexadecimal. Note also the big-endian ordering
of memory. Variable C, which holds the ASCII code for A (41 hexadecimal), is stored in location 1001. C takes up only one location, because char type
variables are one byte in size. Variable X is an int, which is 2 bytes in size, so it takes the next two locations, 1002 and 1003. Y is a long int; it is stored in 4
consecutive locations, from 1004 to 1007.
Figure 9.7
Transfer of data from memory to registers D1, D2 and D3 by MOVE.W X,D1, MOVE.L Y,D2 and MOVE.B C,D3.
Note very carefully the order in which bytes are transferred from memory to registers and vice versa: the most significant byte is stored at the smallest
address in memory, and the least significant byte is stored at the greatest address. A word at location N occupies byte addresses N and N+1. A longword at
location N occupies byte addresses N, N+1, N+2 and N+3.
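The byte ordering described above can be checked with a small C model of the memory in figure 9.6. The array below holds the values from the C declarations (C at 1001, X at 1002 to 1003, Y at 1004 to 1007); `memory`, `read_be` and `move_to_dn` are hypothetical names, and the model only covers reads from this tiny array (the CCR is not updated).

```c
#include <stdint.h>

/* Tiny model of memory from address 1000 (hex) upward, holding the
   variables of figure 9.6 in big-endian order. */
#define MEM_BASE 0x1000u
static const uint8_t memory[8] = {
    0x00,                    /* 1000: not used in this example */
    0x41,                    /* 1001: C = 'A'                  */
    0x01, 0x00,              /* 1002: X = 0x0100               */
    0x20, 0x00, 0xA1, 0x11   /* 1004: Y = 0x2000A111           */
};

/* Reads size bytes (1, 2 or 4) starting at addr: the byte at the lowest
   address becomes the most significant byte of the result. */
static uint32_t read_be(uint32_t addr, int size)
{
    uint32_t value = 0;
    for (int i = 0; i < size; i++)
        value = (value << 8) | memory[addr - MEM_BASE + (uint32_t)i];
    return value;
}

/* MOVE.<size> addr,Dn: only the low 8, 16 or 32 bits of Dn are replaced. */
static void move_to_dn(uint32_t *dn, uint32_t addr, int size)
{
    uint32_t mask = (size == 4) ? 0xFFFFFFFFu : (((uint32_t)1 << (8 * size)) - 1u);
    *dn = (*dn & ~mask) | read_be(addr, size);
}
```

Starting from D1 = FFFFFFFF₁₆, a word move of X gives FFFF0100₁₆, because only the low word is replaced; in the same way, a word move from address 1004 (the MOVE.W Y,D2 mistake discussed next) loads only 2000₁₆.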
What would have happened if we had written, say, MOVE.W Y,D2? There is nothing illegal about such an instruction, but it would be incorrect in our case: only the
least significant word of D2, i.e. bits 0 to 15, would be loaded, with 2000₁₆ (the word stored at 1004 and 1005), and bits 16 to 31 would stay untouched.
It is important to remain consistent with the operand sizes throughout the entire program. When you know that D2 holds a 32 bit number, keep
performing 32 bit operations on that register until you start using the register for something else. What would have happened if you had performed ADD.W
#$9001,D2 instead of ADD.L #$9001,D2? After the ADD.W #$9001,D2 instruction, register D2 would hold the value 20003112₁₆. Your program would not crash
because of that, but it would give you an incorrect result. The correct result is 20013112₁₆, but you get 20003112₁₆ because a word operation affects only
bits 0 to 15 and leaves bits 16 to 31 unaffected. Thus, the carry out of bit 15, which should normally be added to bit 16, goes instead to the C bit of the CCR.
In the above example, D3 and D1 are not used to their full capacity. In fact, it is possible to use bits 8 to 31 in D3 and bits 16 to 31 in D1 to store other useful
information not related to this program. Such "multiple-purpose" use of data registers is perfectly legal, but highly discouraged because it is confusing and may
lead to errors.
1) How does the computer know exactly where in memory variables C, X, and Y are actually stored?
2) And how do you declare variables in assembly language? What is the assembly language equivalent of char C = 'A'; ?
3) How do you declare constants?
4) How can the programmer participate in the management of memory space? How do you indicate the start and the end of a program?
Question 1:
The letters C, X, and Y are identifiers that refer to addresses of locations in memory: in the above example, C refers to 1001₁₆, X refers to 1002₁₆ and Y refers
to 1004₁₆. In general, programmers don't have to worry about the actual numerical address, as the assembler automatically takes care of that, as we'll see later.
However, if for some reason you want to store one byte of information in a specific location, e.g. 1001₁₆, then you can use the assembler directive EQU to equate
the name C to the value 1001₁₆. The syntax of the EQU directive is
NAME EQU <value>
so in our example you would write
C EQU $1001
This way, you give a name to the number 1001₁₆, and you can use the name and the actual numerical value interchangeably in calculations and other
expressions.
In the above example, we use EQU to equate the name C to the absolute address $1001. The EQU directive can be used in other circumstances as well. For
example you can have
LENGTH EQU 35
WIDTH EQU 15
AREA EQU LENGTH*WIDTH
Later in your program, you can use the identifiers instead of the actual numerical values they represent, and the assembler will take care of replacing the
identifiers with the numerical values. The use of EQU is very similar to the use of #define in C. It makes the program clearer and more readable. Just beware of
illegal forward references, typically an EQU expression that uses an identifier which is defined only later in the program.
Note that when an identifier equated through EQU is meant to be used as immediate data, it must be prefixed by the # sign, as in MOVE.W #LENGTH,D4; without
the # sign, the assembler would treat the value as a memory address.
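Since EQU is compared to #define, here is the C counterpart of the LENGTH/WIDTH/AREA example. One difference is worth noting: EQU evaluates its expression to a number, whereas #define performs a textual substitution, so the C version needs parentheses around the expression to behave the same way inside larger expressions.

```c
/* C counterparts of LENGTH EQU 35, WIDTH EQU 15 and AREA EQU LENGTH*WIDTH.
   #define substitutes text rather than a computed value, so AREA
   needs parentheses to behave like the EQU version inside larger expressions. */
#define LENGTH 35
#define WIDTH  15
#define AREA   (LENGTH * WIDTH)
```

With the parentheses, 1050 / AREA evaluates to 2. Without them, it would expand to 1050 / 35 * 15, which C evaluates as (1050 / 35) * 15 = 450.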
The equivalent of EQU in the Milo syntax is the .set directive; its syntax is:
.set identifier,expression
This directive tells the assembler to replace all occurrences of identifier with expression. Consult the NeXT textbook for details.
Question 2:
The use of the EQU directive is not the method of choice for declaring variables and constants. There is a specific assembler directive used for purposes
equivalent to variable declarations in high-level languages. This directive is DS and is qualified by .B, .W or .L. Its syntax:
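NAME DS.S <number of items>
where S stands for size, i.e. B, W, or L, and the operand gives how many bytes, words or longwords to reserve (DS.W 3, for example, reserves 3 words, i.e. 6 bytes).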
DS means "define storage" and it reserves, or allocates, storage locations in memory. Here is how it works:
When you use the DS directive, you don't care anymore about the absolute numerical address of the location(s) where the variables are stored, everything is
being taken care of by the Assembler. Just as in a high level language, you don't need to know where a variable is stored, only that space is being allocated
somewhere in memory for that variable.
The Milo syntax for the DS directive is quite different. In Milo, if you want to reserve one word for a variable called milovar and initialize it to the
value 16, you write
milovar: .word 16
Don't be confused by the number 16: here you don't reserve 16 words for the variable milovar, you reserve one word and give it the initial value 16. For more
details about Milo directives, consult the NeXT textbook.
Question 3:
Constants are declared by the directive DC, which means "define constant". DC is qualified by .B, .W or .L, depending on the storage space that the constant is
meant to occupy. Its syntax:
NAME DC.S constant
where S stands for size, i.e. B, W, or L. DC stores the constant in memory at the current location, and NAME becomes a label for the address of that location:
unlike an EQU identifier, NAME is not replaced by the constant itself.
The constant that is being stored with the DC directive can be a decimal number, a hex number prefixed with $, a binary number prefixed with %, or an ASCII
string enclosed in single quotes. You can store constants in consecutive locations by separating them with a comma. We'll show you shortly how they are
actually stored in memory.
You can combine the DC directive with other directives, for example
To declare constants with Milo, you could use the same directives used for variable declaration, i.e. .byte, .word and .long. Consult the NeXT textbook
for details.
Question 4:
There is a directive called ORG, the origin directive, whose operand specifies the absolute address of the beginning of the area of memory where a program
and its associated data are located. Its syntax:
ORG <address>
An ORG directive can be located at any point in the program; it simply resets the value of the location counter that keeps track of where the next item is to be
located in the processor's memory.
         ORG     $001200     Sets the origin of the data area at address $001200
         DC.B    12          The value $0C is stored in one byte of memory
         DC.W    3,$3B2      The values $0003 and $03B2 are stored in consecutive
*                            locations, each of them taking up 2 bytes
         DC.L    $DEADBEEF   The value $DEADBEEF (4 bytes) is stored in memory
         DC.L    $3B2        The value $000003B2 is stored in 4 bytes of memory
         DC.B    'Arnie'     The ASCII characters are stored as 5 bytes
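Here is what the memory of the processor would look like (note: all numbers are in base 16). For simplicity, this layout ignores alignment: the 68000 cannot
access a word or longword at an odd address, so in practice many assemblers insert a padding byte to make DC.W and DC.L data start at an even address.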
Address Contents
001200 0C
001201 00
001202 03
001203 03
001204 B2
001205 DE
001206 AD
001207 BE
001208 EF
001209 00
00120A 00
00120B 03
00120C B2
00120D 41
00120E 72
00120F 6E
001210 69
001211 65
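The layout in the table can be reproduced with a small C model of the location counter. This is a hypothetical mini-assembler written only for this illustration (the `Section` type and the `org`, `emit`, `dc`, `dc_string` and `byte_at` functions are made-up names): each DC writes its value most significant byte first, starting at the address set by ORG, and, like the table, it inserts no alignment padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mini-assembler state: an origin set by ORG and the bytes
   emitted since then. No alignment padding is inserted. */
typedef struct {
    uint32_t origin;      /* address given to ORG                   */
    size_t   count;       /* location counter, relative to origin   */
    uint8_t  bytes[64];   /* emitted data (enough for this example) */
} Section;

/* ORG address: start a new section at the given absolute address. */
static void org(Section *s, uint32_t address)
{
    s->origin = address;
    s->count = 0;
}

/* Append one byte; bytes that do not fit in the small buffer are dropped. */
static void emit(Section *s, uint8_t b)
{
    if (s->count < sizeof s->bytes)
        s->bytes[s->count++] = b;
}

/* DC.B, DC.W or DC.L value (size 1, 2 or 4): most significant byte first. */
static void dc(Section *s, uint32_t value, int size)
{
    for (int i = size - 1; i >= 0; i--)
        emit(s, (uint8_t)(value >> (8 * i)));
}

/* DC.B 'text': one byte per ASCII character, no terminator stored. */
static void dc_string(Section *s, const char *text)
{
    while (*text != '\0')
        emit(s, (uint8_t)*text++);
}

/* Contents of memory at an absolute address inside the section. */
static uint8_t byte_at(const Section *s, uint32_t address)
{
    return s->bytes[address - s->origin];
}
```

DC.W 3,$3B2 corresponds to two calls of `dc`, one per value. Running the six directives of the example through the model yields 18 bytes, from 001200 (0C) to 001211 (65, the 'e' of 'Arnie').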
A similar directive exists in Milo; its syntax is .org <address>, and its use is essentially the same.
The ORG directive is thus used to specify the beginning of a program in memory. One program can have multiple origins, e.g. one for the data region and one
for the instructions region. Many assemblers, however, don't require the use of the ORG directive, and they locate the program in memory wherever there is
space for it. In fact, the use of the ORG directive to specify an absolute location for the program in memory can be very dangerous, since it can overwrite
another program or data region already present at that address. It is therefore recommended not to use ORG unless you are certain there is no other program
running on your computer at the same time, which happens virtually never on modern systems.
In order to know where the source code of a program ends, some Assemblers require the presence of the END directive at the last line of the assembly language
program. This directive simply tells the Assembler that there are no more instructions or directives to be assembled. Most of the Assemblers that employ this
directive use it without parameters, but if you use the University of Teesside cross-assembler then you need to supply a single parameter: the address in memory
where the code is located, i.e. the point where it is to start executing. This address is in general the same as the one supplied by the ORG directive.
/*
We assume chars are 1 byte in size,
ints are 2 bytes in size,
and long ints are 4 bytes in size;
*/
char C = 'A';
int X = 0x100;
long int Y = 0x2000A111;
X++;
if (C != 'B')
    X -= 0x5;
Y += 0x9001;
Finally, here is the Milo version of the same program. You have probably noticed that the most important difference between the Motorola and the Milo syntax is
found in the way assembler directives are written and used. Once again, you are encouraged to refer to the NeXT textbook for information about the Milo
syntax.
.data
#Reserve one byte for variable C and initialize it to the ascii value of A
C: .ascii "A"
.even
#Reserve one word for variable X and initialize it to 0x0100
X: .word 0x100
.text