
Computer Architecture and Assembly Language

Why assembly?
- Assembly is widely used in industry:
  - Embedded systems.
  - Real-time systems.
  - Low-level, direct access to hardware.
- Assembly is also widely used outside industry:
  - Cracking software protections: patching, patch loaders and emulators.
  - Hacking into computer systems: buffer underflows/overflows, worms and Trojans.

Byte structure:
A byte has 8 bits. The leftmost bit is the MSB (most significant bit) and the rightmost bit is the LSB (least significant bit).

Data storage in memory:


The 80x86 processor stores data in little-endian order.

Little endian means that the low-order byte of the number is stored in the memory at the lowest address, and the high-order byte at the highest address.
Example: You want to store 0x1AB3 (a hex number) in memory. This number has two bytes: 1A and B3. It would be stored this way (offsets are counted from the start of the memory block):

offset:   0    1
content:  B3   1A
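
The byte order described above can be sketched in C. This is only an illustration: store_le16 is a hypothetical helper name, and the array stands in for a block of memory. It writes the bytes explicitly, so it behaves the same on any host.

```c
#include <stdint.h>

/* Hypothetical helper: store a 16-bit value in little-endian order. */
void store_le16(uint8_t *mem, uint16_t value)
{
    mem[0] = (uint8_t)(value & 0xFF);  /* low-order byte at the lowest address */
    mem[1] = (uint8_t)(value >> 8);    /* high-order byte at the next address */
}
```

Calling store_le16(mem, 0x1AB3) leaves B3 in mem[0] and 1A in mem[1].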

Registers:
The CPU contains a unit called the register file. This unit contains registers of the following types:
1. 8-bit general registers: AL, BL, CL, DL, AH, BH, CH, DH
2. 16-bit general registers: AX, BX, CX, DX, SP, BP, SI, DI
3. 32-bit general registers: EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI
   (Accumulator, Base, Counter, Data, Stack pointer, Base pointer, Source index, Destination index)
4. Segment registers: ES, CS, SS, DS, FS, GS
5. Instruction pointer: EIP


Note: the registers above are a partial list. There are more registers.

EIP - instruction pointer:


Contains the offset (address) of the next instruction to be executed. Exists only at run time. Software changes it by performing an unconditional jump, a conditional jump, a procedure call, or a return.

AX,BX,CX,DX - 16-bit general registers:


Each contains two 8-bit registers: a high byte (XH) and a low byte (XL). Example: AH (high byte) and AL (low byte) for AX.

EAX - 32-bit general purpose register: the lower 16 bits are AX.

Segment registers: we use a flat 32-bit memory model with a 4GB address space, without segments. So for this course you can ignore segment registers.

ESP - stack pointer: contains the address of the top of the stack, i.e. of the most recently pushed item.
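
The nesting of EAX, AX, AH and AL can be mimicked in C with masks and shifts. This is a rough sketch only: the helper names low16, high8 and low8 are assumptions made for illustration, not CPU or NASM features.

```c
#include <stdint.h>

/* AX is the lower 16 bits of EAX. */
uint16_t low16(uint32_t eax) { return (uint16_t)(eax & 0xFFFF); }

/* AH is the high byte of AX. */
uint8_t high8(uint16_t ax) { return (uint8_t)(ax >> 8); }

/* AL is the low byte of AX. */
uint8_t low8(uint16_t ax) { return (uint8_t)(ax & 0xFF); }
```

For EAX = 0x2334AAFF this gives AX = 0xAAFF, AH = 0xAA and AL = 0xFF.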

Basic assembly instructions:


Each NASM standard source line contains a combination of the 4 fields:

label:    (pseudo) instruction    operands    ; comment

Most of these fields are optional; operands are either required or forbidden, depending on the instruction.

Notes:

1. A backslash (\) is used as the line continuation character: if a line ends with a backslash, the next line is considered to be part of the backslash-ended line.
2. There are no restrictions on white space within a line.
3. A colon after a label is optional.

Examples:
1. mov ax, 2           ; moves constant 2 to the register ax
2. buffer: resb 64     ; reserves 64 bytes

Instruction arguments
A typical instruction has 2 operands. The left operand is the target operand, while the right operand is the source operand.

3 kinds of operands exist:

1. Immediate, i.e. a value
2. Register, such as AX, EBP, DL
3. Memory location; a variable or a pointer.

Note that the x86 processor does not allow both operands to be memory locations, so the following is illegal:

mov [var1],[var2]

Move instructions:
MOV: move data

mov r/m8,reg8
(copies the content of an 8-bit register (source) to an 8-bit register or 8-bit memory unit (destination))

mov reg32,imm32
(copies a 32-bit immediate (constant) to a 32-bit register)

- In all forms of the MOV instruction, the two operands are the same size.

Examples:
mov EAX, 0x2334AAFF
mov word [buffer], ax

Note: NASM doesn't remember the types of variables you declare. It will deliberately remember nothing about the symbol var except where it begins, and so you must explicitly code mov word [var], 2.

Basic arithmetical instructions:


ADD: add integers

add r/m16,imm16
(adds its two operands together, and leaves the result in its destination (first) operand)

Example: add AX, BX (AX gets a value of AX+BX)

ADC: add with carry

adc r/m16,imm8
(adds its two operands together, plus the value of the carry flag, and leaves the result in its destination (first) operand)

Example: adc AX, BX (AX gets a value of AX+BX+CF)

Basic arithmetical instructions (Cont.):


SUB: subtract integers

sub reg16,r/m16
(subtracts its second operand from its first, and leaves the result in its destination (first) operand)

Example: sub AX, BX (AX gets a value of AX-BX)

SBB: subtract with borrow

sbb r/m16,imm8
(subtracts its second operand, plus the value of the carry flag, from its first, and leaves the result in its destination (first) operand)

Example: sbb AX, BX (AX gets a value of AX-BX-CF)
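
One common use of ADC and SBB is arithmetic on values wider than a register: the low halves are combined first, and the carry flag propagates into the high halves. The C sketch below models that idea for 32-bit values built from 16-bit halves. The function names and the cf variable are assumptions made for illustration; real code would use the instructions themselves.

```c
#include <stdint.h>

/* Models: add low halves (ADD), then add high halves plus carry (ADC). */
uint32_t add32_via_16(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)a, a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)b, b_hi = (uint16_t)(b >> 16);

    uint32_t lo_sum = (uint32_t)a_lo + b_lo;        /* add: low halves        */
    unsigned cf = (lo_sum >> 16) & 1u;              /* carry out of bit 15    */
    uint32_t hi_sum = (uint32_t)a_hi + b_hi + cf;   /* adc: high halves + CF  */

    return ((hi_sum & 0xFFFFu) << 16) | (lo_sum & 0xFFFFu);
}

/* Models: subtract low halves (SUB), then high halves minus borrow (SBB). */
uint32_t sub32_via_16(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)a, a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)b, b_hi = (uint16_t)(b >> 16);

    unsigned cf = a_lo < b_lo;                      /* sub: a borrow sets CF  */
    uint16_t lo = (uint16_t)(a_lo - b_lo);
    uint16_t hi = (uint16_t)(a_hi - b_hi - cf);     /* sbb: minus CF as well  */

    return ((uint32_t)hi << 16) | lo;
}
```

Results that do not fit in 32 bits wrap around, just as the final carry would be left in CF on the processor.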

Basic arithmetical instructions (Cont.):


INC: increment integer

inc r/m16

(adds 1 to its operand)


* does not affect the carry flag; affects all the other flags according to the result

Examples: inc AX

DEC: decrement integer

dec reg16

(subtracts 1 from its operand)


* does not affect the carry flag; affects all the other flags according to the result

Examples: dec byte [buffer]

Basic logical instructions:


NEG, NOT: two's and one's complement

neg r/m16

(replaces the contents of its operand by the two's complement negation - invert all
the bits, and then add one)

not r/m16

(performs one's complement negation- inverts all the bits)

Examples:
neg AL   (if AL = (11111110), it becomes (00000010))
not AL   (if AL = (11111110), it becomes (00000001))
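
The relation between the two instructions (NEG is NOT followed by adding one) can be checked with a small C model on 8-bit values. The names not8 and neg8 are hypothetical and exist only for this sketch.

```c
#include <stdint.h>

/* One's complement: invert all the bits (models NOT on an 8-bit operand). */
uint8_t not8(uint8_t v) { return (uint8_t)~v; }

/* Two's complement: invert all the bits, then add one (models NEG). */
uint8_t neg8(uint8_t v) { return (uint8_t)(not8(v) + 1u); }
```

With v = 0xFE (11111110), not8 returns 0x01 and neg8 returns 0x02, matching the examples above.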

Basic logical instructions (Cont.):


OR: bitwise or

or r/m32,imm32

(each bit of the result is 1 if and only if at least one of the corresponding bits of the
two inputs was 1; stores the result in the destination (first) operand)

Example: or AL, BL (if AL = (11111100), BL= (00000010) => AL would be (11111110))

AND: bitwise and

and r/m32,imm32

(each bit of the result is 1 if and only if the corresponding bits of the two inputs were
both 1; stores the result in the destination (first) operand)

Example: and AL, BL (if AL = (11111100), BL= (11000010) => AL would be (11000000))

Compare instruction:
CMP: compare integers

cmp r/m32,imm8

(performs a 'mental' subtraction of its second operand from its first operand, and affects the flags as if the subtraction had taken place, but does not store the result of the subtraction anywhere)

Examples:
cmp AL, BL   (if AL = (11111100), BL = (00000010) => ZF would be 0)
cmp AL, BL   (if AL = (11111100), BL = (11111100) => ZF would be 1)
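
As a rough model of what CMP does to the zero flag, the C sketch below performs the subtraction, derives ZF from the result, and then throws the result away. Only ZF is modelled here; the function name cmp8_zf is an assumption for illustration, and the real instruction updates other flags too.

```c
#include <stdint.h>

/* Returns the value ZF would have after "cmp first, second" on 8-bit operands. */
int cmp8_zf(uint8_t first, uint8_t second)
{
    uint8_t diff = (uint8_t)(first - second);  /* computed but never stored back */
    return diff == 0;                          /* ZF = 1 only when operands are equal */
}
```

Neither operand is modified, which is what distinguishes CMP from SUB.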

Labels definition (basic):


Each instruction of the code has its offset (address from the beginning of the address space). If we want to refer to the specific instruction in the code, we should mark it with a label:

my_loop1: add ax, ax

- a label can be written with or without a colon
- the instruction that follows it can be on the same line or on the next line
- the code can't contain two different non-local labels (such as the one above) with the same name

Loop definition:
LOOP, LOOPE, LOOPZ, LOOPNE, LOOPNZ: loop with counter
* for all the possible variants of operands look at NASM manual, B.4.142

Example:

        mov ax, 1
        mov cx, 3
my_loop:
        add ax, ax
        loop my_loop, cx

1. decrements its counter register (in this case it is CX register)

2. if the counter does not become zero as a result of this operation, it jumps to the given label

Note: counter register can be either CX or ECX - if one is not specified explicitly, the BITS setting dictates which is used.
LOOPE (or its synonym LOOPZ) adds the additional condition that it only jumps if the counter is nonzero and the zero flag is set. Similarly, LOOPNE (and LOOPNZ) jumps only if the counter is nonzero and the zero flag is clear.
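
To trace the example, the same control flow can be written as a C do/while loop: the body runs first, then the counter is decremented and tested. This is a sketch of the behaviour only; run_loop_example is a hypothetical name and the variables merely stand in for the registers.

```c
#include <stdint.h>

/* Models: mov ax,1 / mov cx,3 / my_loop: add ax,ax / loop my_loop */
uint16_t run_loop_example(void)
{
    uint16_t ax = 1;                 /* mov ax, 1 */
    uint16_t cx = 3;                 /* mov cx, 3 */
    do {
        ax = (uint16_t)(ax + ax);    /* add ax, ax */
        cx--;                        /* loop: decrement the counter ... */
    } while (cx != 0);               /* ... and jump back while it is nonzero */
    return ax;                       /* the body ran 3 times: 1 -> 2 -> 4 -> 8 */
}
```

Because the test comes after the decrement, the body always executes at least once.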

DB, DW, DD : declaring initialized data


DB, DW, DD, DQ (DT, DDQ, and DO) are used to declare initialized data in the output file. They can be invoked in a wide range of ways:

db 0x55                ; just the byte 0x55
db 0x55,0x56,0x57      ; three bytes in succession
db 'a',0x55            ; character constants are OK
db 'hello',13,10,'$'   ; so are string constants
dw 0x1234              ; 0x34 0x12
dw 'a'                 ; 0x61 0x00 (it's just a number)
dw 'ab'                ; 0x61 0x62 (character constant)
dw 'abc'               ; 0x61 0x62 0x63 0x00 (string)
dd 0x12345678          ; 0x78 0x56 0x34 0x12 (dword)
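
The byte sequences that DW and DD produce for numeric constants follow from little-endian order plus zero padding up to the declared size. A minimal C sketch of that rule is shown below; emit_le is a hypothetical helper, not part of NASM, and it covers numeric constants only (not multi-character strings).

```c
#include <stddef.h>
#include <stdint.h>

/* Write `size` bytes (1 to 4) of `value` into `out`, least significant byte first. */
size_t emit_le(uint8_t *out, uint32_t value, size_t size)
{
    for (size_t i = 0; i < size; i++)
        out[i] = (uint8_t)(value >> (8 * i));   /* byte i of the constant */
    return size;
}
```

Here emit_le(buf, 0x1234, 2) corresponds to dw 0x1234, and emit_le(buf, 'a', 2) shows why a single character declared with dw is followed by a zero byte.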

Assignment 0
You get a simple program that receives a string from the user. Then it calls a function (that you'll implement in assembly) that receives one string as an argument and should do the following:
1. Convert lower case to upper case.
2. Convert upper case to lower case.
3. Convert * into #.
4. Convert # into *.
5. Calculate the length of the string.
(characters that are not letters will remain as they are)

e.g. "1: heL*Lo WorLd! " becomes "1: HEl#lO wORlD! "

The function shall return the length of the string. The character conversion should be in-place.
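
Before writing the assembly, it can help to pin down the expected behaviour with a reference model in C. The sketch below is one possible reading of the rules above, not the course's official solution; do_str_reference is a hypothetical name, and ASCII input is assumed.

```c
/* Reference model: converts the string in place and returns its length. */
int do_str_reference(char *s)
{
    int length = 0;
    for (; *s != '\0'; s++, length++) {
        char c = *s;
        if (c >= 'a' && c <= 'z')
            *s = (char)(c - 'a' + 'A');   /* lower case to upper case */
        else if (c >= 'A' && c <= 'Z')
            *s = (char)(c - 'A' + 'a');   /* upper case to lower case */
        else if (c == '*')
            *s = '#';                     /* * into # */
        else if (c == '#')
            *s = '*';                     /* # into * */
        /* every other character is left as it is */
    }
    return length;
}
```

The assembly version can follow the same shape: walk the string one byte at a time, classify the byte with compares and conditional jumps, and count the iterations.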

section .data                           ; data section, read-write
        an:     DD 0                    ; this is a temporary var

section .text                           ; our code is always in the .text section
        global do_str                   ; makes the function appear in global scope
        extern printf                   ; tell linker that printf is defined elsewhere
                                        ; (not used in the program)

do_str:                                 ; functions are defined as labels
        push    ebp                     ; save Base Pointer (bp) original value
        mov     ebp, esp                ; use base pointer to access stack contents
        pushad                          ; push all general-purpose registers onto stack
        mov     ecx, dword [ebp+8]      ; get function argument

;;;;;;;;;;;;;;;; FUNCTION EFFECTIVE CODE STARTS HERE ;;;;;;;;;;;;;;;;

        mov     dword [an], 0           ; initialize answer
label_here:

        ; Your code goes somewhere around here...

        inc     ecx                     ; increment pointer
        cmp     byte [ecx], 0           ; check if byte pointed to is zero
        jnz     label_here              ; keep looping until it is null terminated

;;;;;;;;;;;;;;;; FUNCTION EFFECTIVE CODE ENDS HERE ;;;;;;;;;;;;;;;;

        popad                           ; restore all previously used registers
        mov     eax,[an]                ; return an (returned values are in eax)
        mov     esp, ebp
        pop     dword ebp
        ret

Running NASM
To assemble a file, you issue a command of the form

> nasm -f <format> <filename> [-o <output>] [-l listing]

Example:

> nasm -f elf mytry.s -o myelf.o

It would create the file myelf.o in ELF format (executable and linkable format).

We use a main.c file (written in C) to start our program, and sometimes also for input / output from a user. So to compile main.c with our assembly file we should execute the following command:

> gcc main.c myelf.o -o myexe.out

It would create the executable file myexe.out. In order to run it, write its name on the command line:

> myexe.out
