
Design and Test of Systems-on-a-Chip
Universität Stuttgart
Prof. Dr. H.-J. Wunderlich
Prof. Dr. M. Berroth
Dipl.-Phys. V. Gherman
Dr. A. Virazel

SS 2002

Tutorial 3
System Design with ARM

This tutorial sheet contains exercises on the ARM processor. Read the text in advance and familiarize
yourself with the topic, e.g. by reading the relevant documentation on the tools.
In particular, you should read:
1. The ARM presentation (see web page). Additional information can be found in the Reference
Manual (Chapter 3: Assembler; Chapter 4: ARM Instruction Set; Chapter 7: Symbolic Debugger)
2. The cheetah man page (type: man cheetah)

Exercise 1
Use the "Hello World" program shown in the following figure to familiarize yourself with the ARM
assembler, linker and symbolic debugger. Save this file as hello_world.s (the file name used by the commands below).

        AREA    HelloW, CODE, READONLY

SWI_WriteC EQU  &0                      ; output character in r0
SWI_Exit   EQU  &11                     ; finish program

        ENTRY                           ; code entry point

        ADR     r1, TEXT                ; r1 -> "Hello World"
LOOP    LDRB    r0, [r1], #1            ; get next byte
        CMP     r0, #0                  ; check for text end
        SWINE   SWI_WriteC              ; if not end print ..
        BNE     LOOP                    ; .. and loop back
        SWI     SWI_Exit                ; end of execution

TEXT    =       "Hello World", &0a, &0d, 0

        END

Annotations in the figure:
  AREA ...    area declaration
  ; ...       comment
  EQU         assign value to label
  LOOP        label
  =           enter data into the code
  END         end of file
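
The control flow of the listing can be sketched in Python. This is only an illustrative model, not part of the ARM toolchain: the function name run_hello and the use of a byte string as memory are assumptions made for the example. It shows how the post-indexed load advances r1 and how the NE condition suppresses both the print and the branch once the terminating 0 has been read.

```python
def run_hello(memory, start=0):
    """Model the LOOP of the listing; returns (printed text, final pointer)."""
    r1 = start                   # ADR r1, TEXT
    printed = []
    while True:
        r0 = memory[r1]          # LDRB r0, [r1], #1 : load one byte ...
        r1 += 1                  # ... then post-increment the pointer
        if r0 == 0:              # CMP r0, #0 sets Z for the terminator,
            break                # so SWINE and BNE are both skipped
        printed.append(chr(r0))  # SWINE SWI_WriteC : print if r0 != 0
    return "".join(printed), r1  # falls through to SWI SWI_Exit


# Same data as the TEXT line: the string, &0a, &0d and a terminating 0.
TEXT = b"Hello World" + bytes([0x0A, 0x0D, 0x00])
printed, final_r1 = run_hello(TEXT)
```

Note that the pointer ends one byte past the terminator, because the post-increment also happens on the final load.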

1. Simulating the “Hello World” program


• Step 1: Assemble program
socpraar:~/arm>armasm -g -o hello_world.o hello_world.s
• Step 2: Link program
socpraar:~/arm>armlink -o hello_world hello_world.o

• Step 3: Run Debugger/Simulator
socpraar:~/arm>armsd hello_world
A.R.M. Source-level Debugger vsn 4.45c (ARM Toolkit v2.02) [Mar 12 1996]
ARMulator vsn 1.08c [Feb 5 1996]
ARM7TDM, 2048Mbyte, MMU present, soft DEMON vsn 1.3, FPE, Little endian.
Memory map ...
00000000..80000000, 32-Bit, rw, R(N/S) = 135/80, W(N/S) = 135/80
Clock speed = 33.33MHz
Object program file hello_world
armsd:_
• Important Commands in armsd
help              - help
help <command>    - help on <command>
quit              - quit
go                - run program
step              - step through program
break @<label>    - set breakpoint
break             - list all breakpoints
unbreak           - remove breakpoint
registers         - display register contents
list              - examine memory contents (instruction, hex & character format)
lsym              - list symbols (e.g. labels)
print $clock      - number of microseconds since the simulation started
print $memstats   - print memory statistics & times
• Listing Memory Contents
armsd: l @loop
LOOP
+0000 0x00008084: 0xe4d10001 .... : ldrb r0,[r1],#1
+0004 0x00008088: 0xe3500000 ..P. : cmp r0,#0
+0008 0x0000808c: 0x1f000000 .... : swine 0x0
+000c 0x00008090: 0x1afffffb .... : bne LOOP
+0010 0x00008094: 0xef000011 .... : swi 0x11
Text
+0000 0x00008098: 0x6c6c6548 Hell : stcvsl p5,c6,[r12],#-0x120
+0004 0x0000809c: 0x6f57206f o Wo : swivs 0x57206f
+0008 0x000080a0: 0x0a646c72 rld. : beq 0x1923270
+000c 0x000080a4: 0x0000000d .... : andeq r0,r0,r13
...
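
The listing shows the string being disassembled as if it were code, because the debugger prints every word in all three formats. The following Python sketch decodes the same words by hand; the helper names words_to_bytes and branch_target are made up for this example. It assumes little-endian byte order (as reported in the debugger banner) and the usual ARM branch encoding, in which the low 24 bits hold a signed word offset relative to the instruction address plus 8.

```python
import struct

# Words taken from the memory listing, starting at 0x00008098.
WORDS = [0x6C6C6548, 0x6F57206F, 0x0A646C72, 0x0000000D]


def words_to_bytes(words):
    """Lay out 32-bit words in memory order for a little-endian machine."""
    return b"".join(struct.pack("<I", w) for w in words)


def branch_target(address, instruction):
    """Compute the destination of an ARM B/BL instruction word."""
    offset = instruction & 0x00FFFFFF      # 24-bit offset field
    if offset & 0x00800000:                # sign-extend negative offsets
        offset -= 0x01000000
    return address + 8 + 4 * offset        # pc reads as address + 8


data = words_to_bytes(WORDS)
target = branch_target(0x8090, 0x1AFFFFFB)  # the "bne LOOP" word
```

Here data starts with the bytes of "Hello World" followed by 0x0a, 0x0d and 0, and target is the address of LOOP in the listing.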

• Simulation & Execution Control
armsd: b @loop
armsd: b
#1 hello_world break @LOOP
armsd: g
Breakpoint #1 at #hello_world, line 9 of hello_world.s
9 LOOP LDRB r0, [r1], #1 ; get next byte
armsd: s
Step completed at #hello_world, line 10 of hello_world.s
10 CMP r0, #0 ; check for text end
armsd: s
Step completed at #hello_world, line 11 of hello_world.s
11 SWINE SWI_WriteC ; if not end print...
armsd: u @loop
armsd: b
armsd: g
Hello World
Program terminated normally at #hello_world, line 13 of hello_world.s
13 SWI SWI_Exit ; end of execution
armsd:
• Displaying Register
armsd: r
r0 = 0x00000000 r1 = 0x000080a6 r2 = 0x00000020 r3 = 0x00000000
r4 = 0x00000000 r5 = 0x00000000 r6 = 0x00000000 r7 = 0x00000000
r8 = 0x00000000 r9 = 0x00000000 r10 = 0x00000000 r11 = 0x00000000
r12 = 0x000080a8 r13 = 0x00000000 r14 = 0x00008010
pc = 0x00008094 psr = %nZCvift_User32
armsd:
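
The register dump can be cross-checked against the program. The Python sketch below is an illustration only; decode_flags is a hypothetical helper, and it assumes the armsd convention that an upper-case letter in the psr string marks a set bit. It also recomputes the final value of r1 from the address of the text (0x8098 in the memory listing) and its length.

```python
def decode_flags(psr):
    """Return {flag letter: is_set} for a psr string such as %nZCvift_User32."""
    letters = psr.lstrip("%").split("_")[0]   # keep only "nZCvift"
    return {ch.upper(): ch.isupper() for ch in letters}


flags = decode_flags("%nZCvift_User32")

# r1 is post-incremented once per byte read, including the terminating 0.
TEXT_ADDR = 0x8098
TEXT_LEN = len(b"Hello World") + 3            # + &0a, &0d, 0
final_r1 = TEXT_ADDR + TEXT_LEN
```

Under these assumptions Z and C are set, which is what the last CMP r0, #0 with r0 = 0 would produce, and final_r1 matches r1 = 0x000080a6 in the dump.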
• Performance Estimation
Required:
a) Memory description
File "armsd.map":
    0 80000000 RAM 4 rw 135/80 135/80
    ...
Fields:
    0 80000000        address range (hex)
    4                 word width (bytes)
    135/80 (first)    read time (normal/sequential access)
    135/80 (second)   write time (normal/sequential access)

b) CPU clock speed
rapra01:~/arm>armsd -clock 50Mhz hello_world
A.R.M. Source-level Debugger vsn 4.45c (ARM Toolkit v2.02) [Mar 12 1996]
ARMulator vsn 1.08c [Feb 5 1996]
ARM7TDM, 2048Mbyte, MMU present, soft DEMON vsn 1.3, FPE, Little endian.
Memory map ...                                   <- Check!
00000000..80000000, 32-Bit, rw, R(N/S) = 135/80, W(N/S) = 135/80
Clock speed = 50.00MHz
Object program file hello_world
armsd: g
Hello World
Program terminated normally at #hello_world, line 13 of hello_world.s
13 SWI SWI_Exit ; end of execution

armsd: p $clock
12                                               <- execution time in µs
armsd:
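
The reported time can be turned into an approximate cycle count. The Python sketch below is a rough aid, not a description of how the ARMulator computes its timing. It rests on two assumptions that the sheet does not state: that the access times in armsd.map are given in nanoseconds, and that a memory access is stretched to a whole number of clock cycles. The function names are invented for the example.

```python
import math


def elapsed_cycles(elapsed_us, clock_mhz):
    """Clock cycles in elapsed_us microseconds (1 MHz = 1 cycle per µs)."""
    return elapsed_us * clock_mhz


def access_cycles(access_ns, clock_mhz):
    """Whole clock cycles needed to cover one memory access (assumption)."""
    period_ns = 1000.0 / clock_mhz
    return math.ceil(access_ns / period_ns)


total = elapsed_cycles(12, 50)          # 12 µs at 50 MHz
non_sequential = access_cycles(135, 50)  # 135 read/write time from armsd.map
sequential = access_cycles(80, 50)       # 80 read/write time from armsd.map
```

Since $clock is reported in whole microseconds, total is only accurate to within one microsecond's worth of cycles.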

2. Cache Simulation
Tool: Cheetah (University of Michigan)
• Step 1: Generate address trace
socpraar:~/arm>armsd-hex hello_world ...
File "memaccess":
...
0000801C (Load Word - sequential cycle)
00008020 (Load Word - sequential cycle)
00008058 (Load Instruction - sequential cycle)
0000805C (Load Instruction - sequential cycle)
00008030 (Load Word - non-sequential cycle)
...
• Step 2: Convert trace file
socpraar:~/arm>hexbin < memaccess > memaccess.bin
• Step 3: Run cheetah
a. Fully-Associative Caches
cheetah < memaccess.bin -Cfa -i128
    -Cfa     mode (fully-associative)
    -i128    size increment
...
Addresses processed : 366
Line size : 16 bytes
Number of distinct lines 57
Cache size (bytes)    Miss Ratio
128                   0.218579
256                   0.180328
384                   0.177596
512                   0.166667
640                   0.161202
768                   0.155738
896                   0.155738
Miss ratio is 0.155738 for all bigger caches
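
The miss ratios in this output can be converted back into miss counts, which makes the lower bound visible. The Python sketch below uses only the numbers printed above; the variable names are chosen for the example. Each of the 57 distinct lines must miss at least once on first use, so no cache size can do better than 57 misses in 366 accesses.

```python
ADDRESSES = 366        # "Addresses processed"
DISTINCT_LINES = 57    # "Number of distinct lines"

# Cache size in bytes -> miss ratio, copied from the cheetah output.
MISS_RATIOS = {
    128: 0.218579,
    256: 0.180328,
    384: 0.177596,
    512: 0.166667,
    640: 0.161202,
    768: 0.155738,
    896: 0.155738,
}

# Absolute number of misses for each cache size.
miss_counts = {size: round(ratio * ADDRESSES)
               for size, ratio in MISS_RATIOS.items()}

# First-reference (compulsory) misses alone give the floor of the table.
compulsory_ratio = DISTINCT_LINES / ADDRESSES
```

The rounded compulsory_ratio equals the 0.155738 reported for all caches of 768 bytes and more.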

b. Direct-Mapped Caches
cheetah < memaccess.bin -Cdm -a4 -b12 -c12
    -Cdm    mode
    -a4     log2(min. line size) (16 bytes)
    -b12    log2(max. line size) (4 KB)
    -c12    log2(cache size) (4 KB)
...
Addresses processed 366
Cache size: 4096 bytes
Line size (bytes)    Miss ratio
16                   0.185792
32                   0.117486
64                   0.084699
128                  0.068306
256                  0.073770
512                  0.073770
1024                 0.073770
2048                 0.095628
4096                 0.112022
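
In a direct-mapped cache every address maps to exactly one line. The Python sketch below shows the usual split of an address into tag, index and offset for the 4096-byte cache used above; split_address is a name invented for this example, and the address 0x9058 is a made-up value chosen to show a conflict with 0x8058 from the trace.

```python
def split_address(addr, cache_bytes=4096, line_bytes=16):
    """Return (tag, index, offset) of addr in a direct-mapped cache."""
    lines = cache_bytes // line_bytes        # number of cache lines
    offset = addr % line_bytes               # byte within the line
    index = (addr // line_bytes) % lines     # which cache line is used
    tag = addr // cache_bytes                # identifies the memory block
    return tag, index, offset


a = split_address(0x8058)                    # address from the trace
b = split_address(0x9058)                    # hypothetical conflicting address
c = split_address(0x8058, line_bytes=4096)   # largest line size in the table
```

Addresses with the same index but different tags evict each other. With larger lines the cache holds fewer of them, which is one plausible reason why the miss ratio in the table rises again for the largest line sizes.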

c. Set-Associative Caches
cheetah < memaccess.bin -Csa -a4 -b8 -n2
    -Csa    mode
    -a4     log2(min. no. of sets)
    -b8     log2(max. no. of sets)
    -n2     log2(max. associativity)
...
Addresses processed: 366
Line size: 16 bytes
Miss Ratios
___________

               Associativity
No. of sets    1          2          3          4
16             0.229508   0.174863   0.166667   0.158470
32             0.207650   0.158470   0.155738   0.155738
64             0.193989   0.155738   0.155738   0.155738
128            0.185792   0.155738   0.155738   0.155738
256            0.185792   0.155738   0.155738   0.155738

Example (128 sets, associativity 2):
cache size
= 128 * 2 * 16 bytes
= 4 KB
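
The relation between sets, associativity and cache size, and the way a miss ratio is obtained from a trace, can be sketched in Python. This is a minimal illustration and not cheetah's algorithm: it assumes LRU replacement, the function names are invented, SAMPLE contains only the five addresses shown in the "memaccess" excerpt (not the full 366-address trace), and CONFLICT is a made-up trace.

```python
from collections import OrderedDict


def cache_size(sets, assoc, line_bytes=16):
    """Total capacity in bytes: sets * associativity * line size."""
    return sets * assoc * line_bytes


def miss_ratio(trace, sets, assoc, line_bytes=16):
    """Simulate a set-associative cache with LRU replacement (assumption)."""
    cache = [OrderedDict() for _ in range(sets)]   # one LRU list per set
    misses = 0
    for addr in trace:
        line = addr // line_bytes
        ways = cache[line % sets]
        if line in ways:
            ways.move_to_end(line)                 # hit: mark most recent
        else:
            misses += 1
            if len(ways) == assoc:
                ways.popitem(last=False)           # evict least recent line
            ways[line] = True
    return misses / len(trace)


SAMPLE = [0x801C, 0x8020, 0x8058, 0x805C, 0x8030]  # excerpt of the trace
CONFLICT = [0x0000, 0x0100, 0x0000]                # hypothetical addresses

size = cache_size(128, 2)
sample_ratio = miss_ratio(SAMPLE, sets=16, assoc=1)
direct = miss_ratio(CONFLICT, sets=16, assoc=1)
two_way = miss_ratio(CONFLICT, sets=16, assoc=2)
```

Associativity 1 corresponds to the direct-mapped case and a single set to the fully-associative case, so the same model covers all three cheetah modes.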

Exercise 2
Consider the following program:
        AREA    prg, CODE, READONLY

SWI_Exit EQU    &11

        ENTRY

        adr     r0, size
        ldr     r1, [r0], #4            ; r0 now points to...
                                        ; ... "array"
        ldr     r2, [r0], #4

loop    subs    r1, r1, #1
        beq     done
        ldr     r3, [r0], #4
        cmp     r2, r3
        movlo   r2, r3
        b       loop

done    swi     SWI_Exit                ; end of execution

        AREA    data, DATA

size    DCD     8
array   DCD     11, 17, 5, 23, 7, 3, 19, 2

        END

a) What is the output of the program in register r2?


b) Simulate the program and verify your answer to a).

Exercise 3: Bubble sort


Implement a bubble sort algorithm for the ARM using the data part of the program of exercise 2.
Optimize your code for speed!
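
A reference model in a high-level language can help to check the memory contents after the ARM implementation has run. The Python sketch below is one possible formulation and not the expected ARM solution. It assumes ascending order, which the exercise does not specify, and it uses one common speed optimization: each pass stops at the position of the last swap of the previous pass, because everything beyond it is already sorted.

```python
def bubble_sort(values):
    """Return a sorted copy of values using bubble sort with early exit."""
    data = list(values)
    n = len(data)
    while n > 1:
        last_swap = 0
        for i in range(1, n):
            if data[i - 1] > data[i]:
                data[i - 1], data[i] = data[i], data[i - 1]
                last_swap = i        # elements from index i on are in place
        n = last_swap                # no swap at all ends the loop
    return data


ARRAY = [11, 17, 5, 23, 7, 3, 19, 2]  # the "array" data of exercise 2
sorted_array = bubble_sort(ARRAY)
```

The contents of sorted_array can be compared word by word with the array in memory as displayed by the debugger.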
