
Computer Systems Lab

Matteo Corti

Informatikdienste, ETH Zürich

Summer 2007

Matteo Corti (Informatikdienste, ETH Zürich) Computer Systems Lab Summer 2007 1 / 244
Introduction
Course schedule
• lecture: Monday 4:15 – 5:00 p.m. (RZ F21)
• assignments: in teams, time to be arranged individually

Contact
Matteo Corti Mathias Payer
SOW E16 RZ H7
matteo.corti@id.ethz.ch mathias.payer@inf.ethz.ch

Resources
• Homepage:
http://www.lst.inf.ethz.ch/teaching/lectures/ss07/2100/index.html

Structure of the course
Projects

• Topics related to operating systems


• Practical focus

Organization

• programming assignments (labs)


• teams (max 2 students) are allowed
• final grade based on the labs

Environment
• Unix/C

Course schedule (tentative)
Date Lecture Project
2007-03-19
Memory allocation
2007-03-26 malloc lab
2007-04-02 Scheduling
2007-04-09 Easter Monday
2007-04-16 Sechseläuten scheduler lab
2007-04-23
File systems
2007-04-30
filesystem lab
2007-05-07
Networking
2007-05-14
proxy lab
2007-05-21 Distributed systems
2007-05-28 Whit Monday
2007-06-04 Distributed systems distributed lab
2007-06-11 Optional labs
2007-06-18 TBD Optional labs

Suggested literature
R. E. Bryant and D. O’Hallaron
Computer Systems — A Programmer’s Perspective
Prentice Hall, Upper Saddle River, NJ, 2003.

Project 1: Memory allocation
Summary
Implement a memory allocator: malloc, free, realloc
Goals
• Learn how to manage storage allocation.
• Good exercise to master pointers

Environment
• Unix like system
• C

Resources
• Computer Systems — A Programmer’s Perspective, Chapter 10.9
(Dynamic Memory Allocation), pages 730–755.
The Heap
Area of a process’s (virtual) memory which is dynamically allocated.
Example (Unix)
• grows upwards
• the brk pointer marks the top of the heap

[Figure: process memory layout, from high to low addresses]
User stack
Memory mapped region for shared libraries
Heap (top of the heap: brk)
Uninitialized data (.bss)
Initialized data (.data)
Program text (.text)
Memory Management
The memory manager (user space) maintains a process’s heap: a
collection of blocks (contiguous chunks of memory) which are either
allocated or free.
Allocation is explicit (e.g., malloc in C or new in Java).
Deallocation can be explicit (e.g., free in C) or implicit (garbage
collection in Java).
Issues
The memory manager needs to:
• distinguish block boundaries
• distinguish between free and allocated blocks

Memory allocation: Strategies

[Figure: a sequence of heap layouts made of blocks of 8, 16, 32 and 64 bytes]

• first-fit: use the first free block which is big enough


• best-fit: take smallest fitting block
• worst-fit: take biggest available block
• quick-fit: best-fit but multiple free-lists (one per block size)
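The first three strategies can be sketched as searches over the sizes of the free blocks. This is a minimal illustration only: the function names and the flat array of sizes are assumptions made for the example, a real allocator would walk its free list instead.

```c
#include <stddef.h>

/* Hypothetical helpers: each returns the index of the chosen free block
 * in free_sizes[], or -1 if no block can hold `request` bytes. */

int first_fit(const size_t *free_sizes, int n, size_t request)
{
    for (int i = 0; i < n; i++)
        if (free_sizes[i] >= request)
            return i;               /* stop at the first block that fits */
    return -1;
}

int best_fit(const size_t *free_sizes, int n, size_t request)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (free_sizes[i] >= request &&
            (best < 0 || free_sizes[i] < free_sizes[best]))
            best = i;               /* smallest block that still fits */
    return best;
}

int worst_fit(const size_t *free_sizes, int n, size_t request)
{
    int worst = -1;
    for (int i = 0; i < n; i++)
        if (free_sizes[i] >= request &&
            (worst < 0 || free_sizes[i] > free_sizes[worst]))
            worst = i;              /* largest available block */
    return worst;
}
```

With free blocks of 32, 8, 16 and 64 bytes and a 12-byte request, first-fit picks the 32-byte block, best-fit the 16-byte block and worst-fit the 64-byte block.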

Memory allocation: Bitmaps
The simplest (but often slowest) way to manage memory blocks is to
mark in a bitmap if a block is free or allocated.

[Figure: heap layouts made of blocks of 8 to 64 bytes next to the corresponding free/allocated bitmap]

Space
For a 4GB memory space and 4-byte words a 128MB table is necessary
to store the free/allocated information (2^30 words, one bit each).

Speed
Slow scan to find a free block
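A minimal sketch of a bitmap allocator, to show where the slow scan comes from. The heap size (NBLOCKS), the function names and the convention that a set bit means "allocated" are assumptions made for this example.

```c
#include <stddef.h>
#include <limits.h>

#define NBLOCKS 64                                 /* hypothetical heap size in blocks */
static unsigned char bitmap[NBLOCKS / CHAR_BIT];   /* one bit per block, 1 = allocated */

static int bit_get(size_t i)
{
    return (bitmap[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

static void bit_set(size_t i, int v)
{
    if (v)
        bitmap[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
    else
        bitmap[i / CHAR_BIT] &= (unsigned char)~(1u << (i % CHAR_BIT));
}

/* Linear scan for `count` consecutive free blocks; marks them allocated
 * and returns the index of the first one, or -1 if no run is long enough. */
long bitmap_alloc(size_t count)
{
    size_t run = 0;
    if (count == 0)
        return -1;
    for (size_t i = 0; i < NBLOCKS; i++) {
        run = bit_get(i) ? 0 : run + 1;
        if (run == count) {
            size_t start = i + 1 - count;
            for (size_t j = start; j <= i; j++)
                bit_set(j, 1);
            return (long)start;
        }
    }
    return -1;
}

/* Freeing is cheap: clear the bits; neighbors merge implicitly. */
void bitmap_free(size_t start, size_t count)
{
    for (size_t j = start; j < start + count; j++)
        bit_set(j, 0);
}
```

Note that the caller has to remember the size of each allocation, since the bitmap alone does not record block boundaries.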
Memory allocation: Dynamic data structures
Free blocks are stored in a dynamic data structure (ordered list,
ordered tree, . . .).

[Figure: free_list pointing to the free blocks of a heap made of blocks of 8 to 64 bytes]

Lists are usually ordered by allocation address to allow a faster
merging of free blocks.

Memory allocation: Lists
Space
Linking information as well as some flags (i.e., free or allocated)
has to be stored in the block.

Speed
Depending on the chosen structure operations can be implemented
efficiently.

Memory allocation: Space utilization
External fragmentation
Unused space among the blocks. Depends on the allocation
strategy.

Internal fragmentation
Unused space inside the blocks. Depends on the possible block
sizes (granularity).

Data structures
Space used to maintain data structures cannot be allocated (bitmaps,
pointers, flags, . . .)

Memory allocation: Buddy System
An example of an allocation strategy:

• Each block has a size of 2^k
• The system maintains a list for each size
• Block allocation:
  • Find a block which fits
  • If necessary split
  • Put the remaining part in the right list
• Free:
  • Mark the block as free
  • If possible, merge with a free neighbor

The buddy system is used in several operating systems (Windows,
Linux, Oberon, . . .).
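Two small helpers illustrate why power-of-two sizes are convenient: the request is rounded up to the next power of two, and the neighbor a block may merge with (its buddy) is found with a single XOR. The function names are hypothetical and offsets are measured from the start of the heap.

```c
#include <stddef.h>

/* Round a request up to the next power of two (block sizes are 2^k).
 * Overflow for very large requests is not handled in this sketch. */
size_t buddy_round_up(size_t n)
{
    size_t size = 1;
    while (size < n)
        size <<= 1;
    return size;
}

/* Offset of the buddy of the block at `offset` (relative to the heap start).
 * Assumes `size` is a power of two and `offset` is a multiple of `size`. */
size_t buddy_of(size_t offset, size_t size)
{
    return offset ^ size;
}
```

For example, a 17-byte request is served from a 32-byte block, and the 16-byte block at offset 32 can only merge with the 16-byte block at offset 48.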

Memory allocation: Buddy System (example)
[Figure: animated example in two panels built from blocks of 8, 16 and 32 bytes. "Free": freed blocks are merged step by step with free neighbors into larger blocks. "Allocate (8)": a larger free block is split step by step until an 8-byte block is available.]

Memory allocation: Slabs
Disadvantages of the buddy system:
• internal fragmentation (max 50%)
• bad distribution of the block addresses (bad cache
performance)
Solaris (J. Bonwick 1994) introduced the concept of a slab allocator:
• based on the idea that processes are likely to request a lot of
objects of the same size: these objects can be kept and reused
• slabs are collections of objects of the same size
• sizes (and addresses) are not geometrically distributed
Used by AmigaOS (4), Linux (≥ 2.2) and Solaris (≥ 2.4)
J. Bonwick
The Slab Allocator: An Object-Caching Kernel Memory Allocator
Proc. 1994 USENIX Annual Tech. Conf
pp. 87–98, June 1994, Boston, MA.
Lab: Task
Write a dynamic memory allocator, i.e., implement:
• malloc
• free
• realloc

Goals of the lab


• understand how a real memory allocator works
• evaluate different design strategies
• a nice exercise for low level C programming

Lab: Task
void *mm_malloc(size_t size)

• returns a pointer to a memory block of at least size bytes.
• 8 byte aligned
• if no allocation is possible returns NULL

void mm_free(void *ptr)

• frees the block pointed to by ptr
• if ptr is not the first address of an allocated block the behavior is
undefined.

Lab: Task
void *mm_realloc(void *ptr, size_t size)

• if ptr is NULL: mm_malloc(size)
• if size is 0: mm_free(ptr)
• otherwise returns a pointer to a new block of size size with the same
content as the block pointed to by ptr (up to the minimum of the old
and new size).

Lab: Provided framework
We provide one of the simplest possible implementations:

• malloc: sequentially allocate a block increasing the top of heap
pointer (brk).
• free: do nothing
• realloc: allocate a new block with the requested size

Although this solution is formally correct and very fast it has a very
bad space utilization.
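A sketch of such a naive allocator, not the provided mm.c itself. A fixed array stands in for the memlib heap and brk_off for the brk pointer; the bump_ function names, the heap size and the size header are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

#define ALIGNMENT 8
#define ALIGN(size) (((size) + (ALIGNMENT - 1)) & ~(size_t)(ALIGNMENT - 1))
#define HEAP_BYTES (1 << 16)            /* hypothetical heap size */

/* Stand-in for memlib: an 8-byte aligned array plays the role of the heap. */
static long long heap_store[HEAP_BYTES / sizeof(long long)];
static unsigned char *const heap = (unsigned char *)heap_store;
static size_t brk_off = 0;              /* plays the role of brk */

void *bump_malloc(size_t size)
{
    size_t hdr = ALIGN(sizeof(size_t));
    size_t total = ALIGN(size) + hdr;   /* header + aligned payload */
    if (total > HEAP_BYTES - brk_off)
        return NULL;                    /* heap exhausted */
    unsigned char *p = heap + brk_off;
    brk_off += total;                   /* move the top of the heap */
    memcpy(p, &size, sizeof size);      /* remember the size for realloc */
    return p + hdr;
}

void bump_free(void *ptr)
{
    (void)ptr;                          /* do nothing: memory is never reused */
}

void *bump_realloc(void *ptr, size_t size)
{
    if (ptr == NULL)
        return bump_malloc(size);
    if (size == 0) {
        bump_free(ptr);
        return NULL;
    }
    void *newp = bump_malloc(size);
    if (newp == NULL)
        return NULL;
    size_t old;
    memcpy(&old, (unsigned char *)ptr - ALIGN(sizeof(size_t)), sizeof old);
    memcpy(newp, ptr, old < size ? old : size);   /* copy min(old, new) bytes */
    return newp;
}
```

Every call consumes fresh heap space, which is why throughput is excellent and space utilization is poor.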

Lab: Design
Before beginning think about the design of your memory allocator:

• Choose the allocation strategy: first fit, best fit, . . .
• Choose one or more data structures: ordered list, binary tree, . . .
• Think about how to mark the blocks: in general how to store the
size of the block, the free tag and any pointers needed to
manage the data structure (e.g., next pointer for a list, left, right
for tree, . . .)
• Think about the API of the functions to manage your structure
(flexibility)
• Choose a time to perform coalescing (merging): immediate
(when putting a block in the free list) or deferred (e.g., when
allocation fails)?

Lab: Design
How to choose the correct strategy?

• analyze the needs (space, speed, predictability)


• analyze the environment (experimentally)
• test different strategies

Lab: Block format
Most allocators store information (allocated/free, size, . . .) in the
blocks themselves.
Example:
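[Figure: two block layouts, with a pointer ptr into the block. Left: size and flags, linking information, payload. Right: size and flags, payload. Padding (optional) follows the payload.]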

Linking:
• implicit: blocks are traversed using the size field only.
• explicit: linking information is embedded in the block.

Lab: Block format (footers)
Implicit allocators have the problem that to perform coalescing they
need information about the previous block.
Knuth [K73] suggested putting a copy of the block header at the end of
the block (footer).
When freeing a block an implicit allocator can then check the header
of the next block and the footer of the previous one.

[Figure: block layout with footer: size and flags (header), payload, padding (optional), size and flags (footer); ptr points into the block]

Footers are not necessary for allocated blocks if we store the
allocated/free information in the header of the next block.
D. E. Knuth
The Art of Computer Programming
Addison Wesley, 1973
Chapter 2 of Volume 1, pages 228–463.
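A minimal sketch of coalescing with headers and footers. The 4-byte word size, the packing of the allocated flag into the low bit of the size, the helper names and the allocated sentinel blocks at both ends of the heap are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A header or footer packs the block size and the allocated flag into one
 * word; sizes are multiples of 8, so the low bit is free for the flag. */
#define WSIZE 4                         /* bytes per header/footer word (assumption) */
#define PACK(size, alloc) ((uint32_t)(size) | (uint32_t)(alloc))
#define GET_SIZE(w)       ((w) & ~(uint32_t)0x7)
#define GET_ALLOC(w)      ((w) & 0x1)

static uint32_t get_word(const unsigned char *p)
{
    uint32_t w;
    memcpy(&w, p, WSIZE);
    return w;
}

static void put_word(unsigned char *p, uint32_t w)
{
    memcpy(p, &w, WSIZE);
}

/* Write header and footer of the block whose header is at `blk`. */
void set_block(unsigned char *blk, uint32_t size, int alloc)
{
    put_word(blk, PACK(size, alloc));
    put_word(blk + size - WSIZE, PACK(size, alloc));
}

unsigned char *next_block(unsigned char *blk)
{
    return blk + GET_SIZE(get_word(blk));
}

/* The previous block's footer sits immediately before our header. */
unsigned char *prev_block(unsigned char *blk)
{
    return blk - GET_SIZE(get_word(blk - WSIZE));
}

int is_free(const unsigned char *blk)
{
    return !GET_ALLOC(get_word(blk));
}

/* Free `blk` and merge it with free neighbors; returns the merged block.
 * Assumes allocated sentinel blocks at both ends of the heap. */
unsigned char *free_and_coalesce(unsigned char *blk)
{
    uint32_t size = GET_SIZE(get_word(blk));
    unsigned char *next = next_block(blk);

    if (is_free(next))                          /* header of the next block */
        size += GET_SIZE(get_word(next));
    if (!GET_ALLOC(get_word(blk - WSIZE))) {    /* footer of the previous block */
        blk = prev_block(blk);
        size += GET_SIZE(get_word(blk));
    }
    set_block(blk, size, 0);
    return blk;
}
```

Both neighbor checks take constant time, which is the point of the footer: without it, finding the previous block would require a scan from the start of the heap.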

Admin: Blog
Blog
The Informatikdienste is now offering a blog service in collaboration
with the NET.
http://blogs.ethz.ch/syslab
To comment you must register yourself as a blog user with the nethz
admin tool (https://password.ethz.ch).

Admin: Bug fix
The macros to align the addresses returned by malloc in the given
example are wrong since they only deliver 8-byte aligned results,
regardless of the value of ALIGNMENT.
The results must of course be and-ed with
~(ALIGNMENT-1)
and not
~0x7
(which is only OK if ALIGNMENT is 8)

Please patch your mm.c file:

#define ALIGNMENT 8
#define ALIGN(size) (((size) + (ALIGNMENT-1)) & ~(ALIGNMENT-1))
#define SIZE_T_SIZE (ALIGN(sizeof(size_t)))
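To see the effect of the mask, the rounding can be written as a function that takes the alignment as a parameter. The helper align_up is hypothetical and only meant for experimenting; it assumes the alignment is a power of two.

```c
#include <stddef.h>

#define ALIGNMENT 8
#define ALIGN(size) (((size) + (ALIGNMENT - 1)) & ~(ALIGNMENT - 1))

/* Hypothetical helper: round `size` up to a multiple of `alignment`
 * (a power of two). The mask is derived from the alignment instead of
 * being hard-coded to ~0x7. */
size_t align_up(size_t size, size_t alignment)
{
    return (size + (alignment - 1)) & ~(alignment - 1);
}
```

With a 16-byte alignment, align_up(13, 16) yields 16, whereas the hard-coded mask would give (13 + 15) & ~0x7 = 24, which is not a multiple of 16.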

Lab: Goals of a dynamic memory allocator
• Correctness (will be discussed later)
• Maximize throughput:
• number of malloc and free per second
• performance related to the number of allocated blocks
• Maximize memory utilization:
  • maximize the ratio between the allocated memory and the heap
    size (i.e., minimize fragmentation).

Maximizing throughput could worsen memory utilization and
minimizing fragmentation could worsen performance.
Find a balance between performance and memory utilization.

Design: Advanced topics
Locking
On modern systems (especially on SMP and multi-core architecture)
more than one thread can execute a memory operation concurrently.
Memory management data structures (e.g., free lists) must be
properly protected by locks.

Resources (extended)
R. E. Bryant and D. O’Hallaron
Computer Systems — A Programmer’s Perspective
Prentice Hall, Upper Saddle River, NJ, 2003.
Chapter 10.9 (Dynamic Memory Allocation), pages 730–755.
Paul R. Wilson, Mark S. Johnstone, Michael Neely, and David
Boles
Dynamic Storage Allocation: A Survey and Critical Review
Proc. Int’l Workshop on Memory Management
Kinross, Scotland, UK, September 1995.

Lab: Support
We provide:
mm.c
The actual implementation, the API is fixed.

mdriver.c
A test program which evaluates mm.c with various allocation patterns
(real traces).

memlib.c
Provides a simulation of a memory system

Lab: Support (memlib)
void *mem_sbrk(int incr)
expands the heap and returns the first byte of the allocated area

void *mem_heap_lo(void)
returns the address of the first byte in the heap

void *mem_heap_hi(void)
returns the address of the last byte in the heap

size_t mem_heapsize(void)
returns the size of the heap

size_t mem_pagesize(void)
returns the page size

[Figure: the heap, annotated with mem_heap_lo, mem_heap_hi, mem_heapsize and mem_pagesize]


Lab: Restrictions
• Do not change the API of mm.c.
• Do not invoke any memory-management related library calls or
system calls (e.g., malloc, calloc, free, realloc, sbrk,
brk, . . .).
• Do not define any global or static compound data structures
(e.g., arrays, structs). You are allowed to declare global scalar
variables (i.e., integers, floats, and pointers).

Lab: Testing
Consistency checks
You are encouraged to include consistency checks in int
mm_check(void). Traverse the heap blocks and check:
• the block flags (are the blocks in the free list marked as free?)
• if there are contiguous free blocks (not merged)
• if there is any free block not in the free list
• if any of the blocks are overlapping
• if pointers in the free list point to free blocks only
• if pointers point to valid heap addresses
• ...

Lab: Testing
Trace driven driver
Use mdriver to check the correctness, space utilization and speed of
your solution.

./mdriver -v
Team Name: MyTeam
Member 1 : Matteo Corti matteo.corti@id.ethz.ch
Member 2 : Cristian Tuduce tuduce@inf.ethz.ch
Using default tracefiles in /home/corti/edu/comp_sys_lab/svn/labs/lab_1/traces/
Measuring performance with gettimeofday().

Results for mm malloc:


trace valid util ops secs Kops
0 yes 99% 5694 0.005680 1002
1 yes 98% 5848 0.005870 996
2 yes 99% 6648 0.006605 1006
3 yes 99% 5380 0.004915 1095
4 yes 99% 14400 0.009588 1502
5 yes 94% 4800 0.009901 485
6 yes 94% 4800 0.009632 498
7 yes 91% 12000 0.013163 912
8 yes 88% 24000 0.021968 1092
9 yes 99% 14401 0.008420 1710
10 yes 86% 14401 0.009596 1501
Total 95% 112372 0.105339 1067

Perf index = 57 (util) + 40 (thru) = 97/100

Lab: Traces
You can (and are encouraged to) generate your own simple traces
for testing.
A .rep trace is a text file formed by a header:
sugg_heapsize /* suggested heap size (unused) */
num_ids /* number of request id’s */
num_ops /* number of requests (operations) */
weight /* weight for this trace (unused) */

Followed by num_ops lines defining an operation (allocate (a),
reallocate (r) or free (f)):
a id bytes /* ptr_id = malloc(bytes) */
r id bytes /* realloc(ptr_id, bytes) */
f id /* free(ptr_id) */

Example:
0 suggested heap size
2 2 objects
4 4 operations
0 weight
a 0 16 allocate a 16-byte object [0]
r 0 32 reallocate object [0] with 32 bytes
a 1 8 allocate an 8-byte object [1]
f 0 deallocate object [0]
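When writing your own traces it can help to validate the operation lines. The following sketch parses one line of the format above; the struct and function names are illustrative and not part of mdriver.

```c
#include <stdio.h>

/* One parsed request; field names are illustrative. */
struct trace_op {
    char type;      /* 'a', 'r' or 'f' */
    int  id;        /* request id */
    int  bytes;     /* size for 'a' and 'r'; 0 for 'f' */
};

/* Parse one operation line; returns 1 on success, 0 on a malformed line. */
int parse_op(const char *line, struct trace_op *op)
{
    char type = 0;
    int id = 0, bytes = 0;
    int n = sscanf(line, " %c %d %d", &type, &id, &bytes);

    if (n < 2 || id < 0)
        return 0;
    if (type == 'a' || type == 'r') {
        if (n != 3 || bytes < 0)
            return 0;               /* allocate and reallocate need a size */
    } else if (type == 'f') {
        bytes = 0;                  /* free takes only the id */
    } else {
        return 0;                   /* unknown operation */
    }
    op->type = type;
    op->id = id;
    op->bytes = bytes;
    return 1;
}
```

A line such as "r 0" is rejected because a reallocation without a size is incomplete.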

Evaluation
Your work will be evaluated following various criteria

• Correctness: no crashes, correct heap layout


• Performance:
• space: ratio between the heap size and allocated memory
• throughput: operations per second
• Style
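$$P = wU + (1 - w)\,\min\left(1, \frac{T}{T_{libc}}\right)$$

U space utilization
w 0.6 (favors space utilization)
T throughput

[Plot: throughput T (axis up to 600) against space utilization U (0 to 1), with the labels "max performance" and "solutions"]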
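The performance index P = wU + (1 − w) min(1, T / T_libc) can be checked by hand with a few lines of C. The function name is hypothetical, and the reference throughput T_libc = 600 Kops used in the usage note is an assumption, chosen because it reproduces the sample mdriver output (57 + 40 = 97).

```c
/* Performance index P = w*U + (1 - w) * min(1, T / T_libc). */
double perf_index(double util, double thru, double thru_libc, double w)
{
    double ratio = thru / thru_libc;
    if (ratio > 1.0)
        ratio = 1.0;            /* throughput credit is capped at 1 */
    return w * util + (1.0 - w) * ratio;
}
```

With U = 0.95, T = 1067 Kops, w = 0.6 and the assumed T_libc = 600 Kops, the utilization part contributes 0.57 and the capped throughput part 0.40, so P is about 0.97.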

Amdahl’s Law
Speedup (S) of a program where a fraction α of the code was
optimized by k.

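$$T_{new} = (1 - \alpha)\,T_{old} + \frac{\alpha\,T_{old}}{k} = T_{old}\left((1 - \alpha) + \frac{\alpha}{k}\right)$$

$$S = \frac{T_{old}}{T_{new}} = \frac{1}{(1 - \alpha) + \frac{\alpha}{k}}$$

General case:

$$S = \frac{1}{\sum_{i=0}^{n} \frac{\alpha_i}{k_i}}$$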

Amdahl’s Law
Example
We optimize a function which takes 20% of the total time by a factor of
3. The total application speedup will be:

$$S = \frac{1}{(1 - 0.2) + \frac{0.2}{3}} = 1.15$$

To speed up a system we must improve the speed of a large fraction
of the system.
Setting k to ∞ we get $S = \frac{1}{1 - \alpha}$:

[Plot: S (0 to 10) as a function of α (0 to 1)]
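Amdahl's law, S = 1 / ((1 − α) + α/k), is easy to evaluate numerically. The function names below are hypothetical; the general version assumes that the fractions sum to 1, with k = 1 for the parts that were not optimized.

```c
/* Speedup when a fraction `alpha` of the run time is accelerated by `k`. */
double amdahl(double alpha, double k)
{
    return 1.0 / ((1.0 - alpha) + alpha / k);
}

/* General case: fraction alpha[i] is sped up by k[i]; the fractions are
 * assumed to sum to 1 (use k[i] = 1 for unoptimized parts). */
double amdahl_general(const double *alpha, const double *k, int n)
{
    double denom = 0.0;
    for (int i = 0; i < n; i++)
        denom += alpha[i] / k[i];
    return 1.0 / denom;
}
```

For α = 0.2 and k = 3 this gives about 1.15; even for an arbitrarily large k the speedup stays below 1 / (1 − 0.2) = 1.25.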
Profilers
A profiler is a tool that analyzes a program as it runs, collecting
performance information (in particular frequency, duration and order of
function calls).
flat
computes, for each function, the average call time, number of calls, . . .

call-graph
collects information on the call-chains (but not on the call contexts)

Profilers
Event based
hooks to events like calls, exceptions, . . . (e.g., Java, Python, Ruby)

Statistical
• compile-time instrumentation (e.g., gprof)
• binary instrumentation (e.g., ATOM)
• runtime instrumentation (e.g., valgrind)
• simulator

gprof
gprof

• instrument code for profiling (-pg with gcc)


• generate a profile by running the program
• analyze the profiled (sampled) data
• flat: look at the most called functions and the function where the
majority of time is spent
• call graph: detect dependencies

Implementation notes

• Sampling-based: every interval δ (1–10ms) the program is
interrupted and the function currently in execution is taken into
account.
• Calling information is accurate
• Time spent in library calls is accounted to the callee

gprof: Example
Flat profile:

Each sample counts as 0.01 seconds.


% cumulative self self total
time seconds seconds calls us/call us/call name
61.45 0.43 0.43 41719 10.31 10.31 add_range
28.58 0.63 0.20 40042 5.00 5.00 remove_range
10.00 0.70 0.07 main
0.00 0.70 0.00 369359 0.00 0.00 mem_sbrk
0.00 0.70 0.00 369359 0.00 0.00 mm_malloc
0.00 0.70 0.00 364926 0.00 0.00 mm_free
0.00 0.70 0.00 83438 0.00 0.00 mem_heap_hi
0.00 0.70 0.00 83438 0.00 0.00 mem_heap_lo
0.00 0.70 0.00 2753 0.00 0.00 mm_realloc
0.00 0.70 0.00 77 0.00 0.00 mem_reset_brk
0.00 0.70 0.00 77 0.00 0.00 mm_init
0.00 0.70 0.00 60 0.00 0.00 eval_mm_speed
0.00 0.70 0.00 11 0.00 0.00 free_trace
0.00 0.70 0.00 11 0.00 0.00 read_trace
0.00 0.70 0.00 6 0.00 0.00 fsecs
0.00 0.70 0.00 6 0.00 0.00 ftimer_gettod
0.00 0.70 0.00 6 0.00 0.00 mem_heapsize
0.00 0.70 0.00 5 0.00 0.00 malloc_error
0.00 0.70 0.00 1 0.00 0.00 init_fsecs
0.00 0.70 0.00 1 0.00 0.00 mem_init

gprof: Example
Call graph (explanation follows)

granularity: each sample hit covers 2 byte(s) for 1.43% of 0.70 seconds

index % time self children called name
<spontaneous>
[1] 100.0 0.07 0.63 main [1]
0.43 0.00 41719/41719 add_range [2]
0.20 0.00 40042/40042 remove_range [3]
0.00 0.00 68756/369359 mm_malloc [5]
0.00 0.00 67076/364926 mm_free [6]
0.00 0.00 2753/2753 mm_realloc [9]
0.00 0.00 17/77 mem_reset_brk [10]
0.00 0.00 17/77 mm_init [11]
0.00 0.00 11/11 read_trace [14]
0.00 0.00 11/11 free_trace [13]
0.00 0.00 6/6 mem_heapsize [17]
0.00 0.00 6/6 fsecs [15]
0.00 0.00 5/5 malloc_error [18]
0.00 0.00 1/1 init_fsecs [19]
0.00 0.00 1/1 mem_init [20]
-----------------------------------------------
0.43 0.00 41719/41719 main [1]
[2] 61.4 0.43 0.00 41719 add_range [2]
0.00 0.00 83438/83438 mem_heap_lo [8]
0.00 0.00 83438/83438 mem_heap_hi [7]
-----------------------------------------------
0.20 0.00 40042/40042 main [1]
[3] 28.6 0.20 0.00 40042 remove_range [3]
-----------------------------------------------
0.00 0.00 369359/369359 mm_malloc [5]
[4] 0.0 0.00 0.00 369359 mem_sbrk [4]

Tools: Memory debuggers
Many memory errors are not fatal (i.e., they do not cause an
immediate crash) and are therefore difficult to debug.
Splint
Static check of C source files for possible coding errors (e.g.,
uninitialized values, unfreed temporaries, . . .)

Valgrind
Dynamic memory checker: the program is translated by a JIT compiler
and then executed as in a VM

Purify
Commercial (IBM/Rational) dynamic checker.

Project 2: A Process Scheduler
Summary
Implement a simple scheduler which supports priorities and
resource locking.

Goals
• Learn how a scheduler works
• Experiment with different scheduling strategies
• Deal with resources and locks

Environment
• Unix like system
• C

Scheduling
When more than one process (or thread) is runnable an operating
system has to decide which one to run first.
When

1. synchronization request
2. resource conflict (locking)

Preemptive systems:
3. timer interrupt (end of time quantum)
Cooperative systems:
3. explicit call (yield)

Scheduling: Batch systems
Usually non-preemptive, tasks run uninterrupted, no user
interaction.
Jobs to run are chosen from the wait queue with one of the following
criteria:
Strategies

• First Come First Served (FCFS): fair, may cause long waiting
times (also known as FIFO)
• Shortest Job First (SJF): requires knowledge about job length
• Longest Response Ratio: response ratio = (time in the system /
CPU time); it depends on the waiting time
• Highest Priority First: with or without preemption
• Mixed: the priority is adjusted dynamically (time in queue, length,
priority, . . .)
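The difference between FCFS and SJF shows up in the average waiting time. The sketch below assumes that all jobs arrive at time 0 and that the job lengths are known; the function names are hypothetical.

```c
#include <stdlib.h>

/* Average waiting time when jobs run back to back in the given order
 * (all jobs are assumed to arrive at time 0). */
double avg_wait(const int *len, int n)
{
    long waited = 0, clock = 0;
    for (int i = 0; i < n; i++) {
        waited += clock;        /* job i waits for everything before it */
        clock += len[i];
    }
    return n > 0 ? (double)waited / n : 0.0;
}

static int by_length(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* SJF: sort by job length first, then run in that order. Sorts in place. */
double avg_wait_sjf(int *len, int n)
{
    qsort(len, (size_t)n, sizeof len[0], by_length);
    return avg_wait(len, n);
}
```

For jobs of length 8, 4, 2 and 2, FCFS gives an average wait of 8.5 time units and SJF 3.5, at the price of having to know the job lengths in advance.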

Scheduling: Problems
Starvation
A task is never scheduled (although ready)
Algorithm | Starvation | Throughput | Comment             | Real life
FCFS      | no         |            |                     | post office
SJF       | yes        | max        | requires job length | VPP
LRR       | no         |            |                     |
HPF       | yes        |            | fixed priorities    |
Mixed     | no         |            |                     |

Deadlock
No task is ready (nor will any ever become ready)

Scheduling: Preemptive systems
Each task runs for a small time quantum. After this time period the
kernel schedules the next process (round robin).
Quantum length
A short quantum gives short response times but has high context switch
costs.

Priorities
• Processes with high priorities are scheduled first
• Priorities can be adjusted dynamically (e.g., inversely
proportional to the time in the system).
• The time quantum could be made proportional to the process
priority.
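The effect of the quantum length can be explored with a tiny round-robin simulation. This is a sketch under simplifying assumptions: all tasks are ready at time 0, there are no priorities and the context switch itself costs nothing. The function name is hypothetical.

```c
/* Simulate round robin over n CPU bursts (all ready at time 0).
 * Fills finish[i] with the completion time of task i and returns the
 * number of dispatches (each one implies a context switch), or -1 for
 * an invalid quantum. The contents of remaining[] are consumed. */
int round_robin(int *remaining, int *finish, int n, int quantum)
{
    int clock = 0, dispatches = 0, left = 0;

    if (quantum <= 0)
        return -1;
    for (int i = 0; i < n; i++) {
        if (remaining[i] > 0)
            left++;
        else
            finish[i] = 0;      /* nothing to run */
    }
    while (left > 0) {
        for (int i = 0; i < n; i++) {
            if (remaining[i] <= 0)
                continue;
            int run = remaining[i] < quantum ? remaining[i] : quantum;
            clock += run;       /* task i runs for at most one quantum */
            remaining[i] -= run;
            dispatches++;
            if (remaining[i] == 0) {
                finish[i] = clock;
                left--;
            }
        }
    }
    return dispatches;
}
```

For bursts of 3, 5 and 2 time units, a quantum of 2 needs 6 dispatches and the short third task finishes at time 6; a quantum of 10 needs only 3 dispatches but the third task finishes at time 10.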

Scheduling: Context switch
Store and restore the state (context) of a process.
Operations

• store the context


• choose the next process (scheduling strategy)
• accounting
• restore the context of the next process
• jump to the restored PC

Context
• program counter (PC) and CPU registers
• stack pointer
• state (new, running, waiting, . . . )

Scheduling: Performance
• CPU utilization
• Throughput
• number of jobs per time unit
• minimize context switch penalty
• Turnaround time
• $t_{ta} = t_{exit} - t_{arrival}$
• execution, wait, I/O
• Response time
• $t_{r} = t_{start} - t_{request}$
• Waiting time (I/O, waiting, . . . )
• Fairness

Scheduling: Goals
All systems

• Fairness: give every task a chance


• Policy enforcement
• Balance: keep all subsystems busy

Interactive systems

• Response time: respond quickly


• Proportionality: meet user’s expectations

Batch systems

• Throughput: maximize number of jobs


• Turnaround time: minimize time in system
• CPU utilization: keep CPU busy
Scheduling: States

• Running (executing, active): the process that is currently
running.
• Ready (runnable): processes that are waiting to be scheduled.
• Waiting (blocked): processes that are waiting for a system
resource (e.g., locked, waiting for I/O).

[Figure: state diagram. running → waiting (lock), waiting → ready (resource free), ready → running (scheduled), running → ready (preempted)]
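The three states and the four transitions (scheduled, preempted, lock, resource free) can be written down as a small state machine. The enum and function names are hypothetical, and the transition table is one reading of the slide's diagram.

```c
enum proc_state { READY, RUNNING, WAITING };
enum proc_event { EV_SCHEDULED, EV_PREEMPTED, EV_LOCK, EV_RESOURCE_FREE };

/* Returns the state after `ev`, or the unchanged state if the transition
 * is not part of the diagram. */
enum proc_state next_state(enum proc_state s, enum proc_event ev)
{
    switch (s) {
    case READY:
        if (ev == EV_SCHEDULED)
            return RUNNING;
        break;
    case RUNNING:
        if (ev == EV_PREEMPTED)
            return READY;
        if (ev == EV_LOCK)
            return WAITING;     /* blocked on a resource */
        break;
    case WAITING:
        if (ev == EV_RESOURCE_FREE)
            return READY;       /* must be scheduled again before running */
        break;
    }
    return s;
}
```

Note that a waiting process never becomes running directly: it first returns to the ready list.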

Scheduling: Additional states
• Created: the process is waiting to be put into the ready list
(relevant for real-time systems where the set of ready processes
is limited by the available system resources).
• Terminated: the process has terminated its execution but is kept
for accounting or dependencies on other processes.
• Swapped out: on systems supporting virtual memory waiting and
ready processes can be swapped out.
• Standby: on Windows systems the next-to-run process (ready
process with the highest priority). There is one standby process
for each CPU.
• Transition: on Windows systems a ready process which is
waiting for a swapped out resource.

Scheduling: States and transitions
[Figure: state diagram with the states created, ready, running, waiting, terminated, waiting (swap) and ready (swap), and the transitions scheduled, preempted, lock, resource free and killed]

Scheduling: time quanta
Processes can be broadly classified as

• I/O-bound: processes that are often runnable for a short period


• CPU-bound: processes that do not require to be scheduled often
(and do not often require access to system resources)

Quanta
The size of a time quantum can therefore be dynamically changed to
reflect a process’s behavior.

Scheduling: Process descriptors
A structure that stores information about the process state and
characteristics. It usually contains:

• process state
• PC, registers, . . . (context)
• identifier (PID)
• priority
• parent / children
• accounting information
• ...

Scheduling: SMP
Processor affinity
Try to dispatch a thread on the last processor the thread used.
+ good for caches
- might delay the redispatch

CPU
The choice of the CPU can be made with a heuristic based on:
• idle CPUs (load balancing)
• last CPU
• memory management of the thread
• time to invalidate the caches
• ...

Scheduling: Examples
UNIX
• preemption
• priority levels (round robin)
• each second the priorities are recomputed (CPU usage, nice
level, last run)

BSD
• every 4th tick priorities are recomputed (usage estimation)

Windows NT
• real time priorities: fixed, may run forever
• variable: dynamic priorities, preemption
• idle: last choice (swap manager)

Scheduling: Example quanta and priorities
Win2K
• quantum = 20ms (Professional), 120ms (Server), configurable
• depending on type (I/O bound)

BSD
• quantum = 100ms
• priority = f(load,nice,timelast)

Linux
• quantum = quantum / 2 + priority

Scheduling: The Linux ≥ 2.6 scheduler
Constant time scheduler O(1)

• Run queues: one per CPU, runnable tasks


• Each run queue is composed of two priority arrays: active and
expired
• Each array indexes a list of processes per priority
• A bitmap indicates which priority levels (140) have at least one task
• The scheduler chooses the highest priority task (round robin if there is more than one)
• If the timeslice is over move to the expired array
• No more tasks → swap the arrays
• Run queues are protected by locks

J. Aas
Understanding the Linux 2.6.8.1 CPU Scheduler
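The constant-time lookup relies on the bitmap: finding the highest priority level with a runnable task only requires scanning a fixed number of words. The sketch below is a simplification and not the kernel code; a counter per level stands in for the per-priority task list, and all names are hypothetical.

```c
#include <stdint.h>

#define NPRIO  140                      /* priority levels, as on the slide */
#define NWORDS ((NPRIO + 31) / 32)

/* One priority array: a bitmap plus a task count per level. A real run
 * queue keeps a linked list per level; a counter stands in for it here. */
struct prio_array {
    uint32_t bitmap[NWORDS];
    int      count[NPRIO];
};

void pa_enqueue(struct prio_array *a, int prio)
{
    a->count[prio]++;
    a->bitmap[prio / 32] |= (uint32_t)1 << (prio % 32);
}

void pa_dequeue(struct prio_array *a, int prio)
{
    /* Clear the bit only when the last task of this level leaves. */
    if (a->count[prio] > 0 && --a->count[prio] == 0)
        a->bitmap[prio / 32] &= ~((uint32_t)1 << (prio % 32));
}

/* Lowest set index = highest priority (a lower number wins, as in Linux).
 * The work is bounded by NWORDS and 32 bits per word, independent of the
 * number of tasks. */
int pa_highest(const struct prio_array *a)
{
    for (int w = 0; w < NWORDS; w++) {
        if (a->bitmap[w] != 0) {
            int bit = 0;
            while (!((a->bitmap[w] >> bit) & 1))
                bit++;
            return w * 32 + bit;
        }
    }
    return -1;      /* array is empty: time to swap the two arrays */
}
```

A run queue would hold two such arrays and exchange their roles once the first one reports -1.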

Scheduling Lab: Environment
Simulator (provided)
• Parses the command file (with the event specs)
• Generates events at the correct time
• Generates timing events

Scheduler
• Receives event notifications
• Receives timing events (end of time quantum)
• Performs scheduling decisions

Characteristics
• Event durations are multiples of the time quantum.
• Processes have three fixed priority levels 0 (low), 1 and 2 (high).
• The system has a set of numbered resources that can be locked
(e.g., hardware components, protected regions, . . .).

Scheduling Lab: API
void sch_schedule()
Called at the end of each time quantum. Must set int current to
the PID of the process chosen to be scheduled.

void sch_start(int PID, int priority)
Notifies that a process will start at the beginning of the next time
quantum.
PID Process ID of the new process
priority The initial priority of the new process.

Scheduling Lab: API
void sch_exit(int PID)
Notifies that a process has terminated its execution and should not be
scheduled anymore.
PID       Process ID of the terminated process

void sch_renice(int PID, int priority)
Notifies that a process has a new priority level.
PID       Process ID of the process to be reniced
priority  The new priority

Scheduling Lab: API
void sch_locked(int PID, int res)
Notifies that a process wants to lock a system resource.
PID  Process that requires a lock
res  ID of the resource to lock

void sch_unlocked(int PID, int res)
Notifies that a process wants to unlock a system resource.
PID  Process that issues the unlock
res  ID of the resource to unlock

Scheduling Lab: API
void sch_finalize()
Signals that no more processes will be scheduled and that memory
can be released.

Scheduling Lab: Data structures
Process descriptors

• PID
• priority
• locks
• ...

Resources
• Store whether the resource is locked

Process lists
• Ready (priorities: multiple queues)
• Waiting (per resource)
• Running
• ...
Scheduling Lab: Steps
Phase I

1 Simple round robin


2 Consider process priorities
3 Lock and unlock resources

Phase II: not mandatory

• Evaluate different strategies


• Handle priority inversion
• Detect deadlocks
• Real-time scheduling
• Variable priorities
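
Steps 1 and 2 of Phase I could be sketched as follows: one FIFO ready queue per priority level, with round robin inside each level. This is only one possible starting point, not the reference solution. MAX_PROCS, struct queue and the q_* helpers are hypothetical names, the definition of current here is an assumption (the provided test code may already declare it), there is no overflow checking, and locks (step 3) are not handled.

```c
#define NUM_PRIO  3    /* priority levels 0 (low) .. 2 (high) */
#define MAX_PROCS 64   /* hypothetical upper bound on live processes */

int current = -1;          /* PID chosen by sch_schedule(), -1 if none */
static int current_prio;   /* priority of the running process */

struct queue {             /* FIFO of PIDs stored in a circular buffer */
    int pid[MAX_PROCS];
    int head, len;
};
static struct queue ready[NUM_PRIO];

static void q_push(struct queue *q, int pid)
{
    q->pid[(q->head + q->len) % MAX_PROCS] = pid;
    q->len++;
}

static int q_pop(struct queue *q)
{
    int pid = q->pid[q->head];
    q->head = (q->head + 1) % MAX_PROCS;
    q->len--;
    return pid;
}

/* Remove pid from q, keeping the order of the others; returns 1 if found. */
static int q_remove(struct queue *q, int pid)
{
    int found = 0;
    for (int n = q->len; n > 0; n--) {
        int p = q_pop(q);
        if (p == pid)
            found = 1;
        else
            q_push(q, p);
    }
    return found;
}

void sch_start(int PID, int priority)
{
    q_push(&ready[priority], PID);
}

void sch_exit(int PID)
{
    if (PID == current) {
        current = -1;
        return;
    }
    for (int p = 0; p < NUM_PRIO; p++)
        q_remove(&ready[p], PID);
}

void sch_renice(int PID, int priority)
{
    if (PID == current) {
        current_prio = priority;
        return;
    }
    for (int p = 0; p < NUM_PRIO; p++) {
        if (q_remove(&ready[p], PID)) {
            q_push(&ready[priority], PID);
            return;
        }
    }
}

void sch_schedule(void)
{
    if (current != -1)                 /* quantum over: back of its queue */
        q_push(&ready[current_prio], current);
    current = -1;                      /* stays -1 if nothing is ready */
    for (int p = NUM_PRIO - 1; p >= 0; p--) {
        if (ready[p].len > 0) {
            current = q_pop(&ready[p]);
            current_prio = p;
            return;
        }
    }
}
```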

Scheduling Lab: Testing
Events are specified in a file with the following grammar (EBNF):

EventFile := { event }.
event := time action PID duration argument.

Where:
time start time of the event
action type of the event
• start for a new process
• lock for a lock on a resource
• renice
PID the PID of the process.
duration the run time
• of the process in case of a start
• spent in the locked region in case of a lock

argument
• the priority of the process in case of a start
• the resource to lock in case of a lock
• the new priority of the process in case of a renice
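
Reading one event line of this grammar could look like the sketch below. The simulator is provided and already parses the command file, so this is purely illustrative; struct event, enum action and parse_event are hypothetical names. Anything after the five fields (such as a trailing comment) is ignored.

```c
#include <stdio.h>
#include <string.h>

enum action { ACT_START, ACT_LOCK, ACT_RENICE };

struct event {
    int time;
    enum action action;
    int pid;
    int duration;
    int argument;
};

/* Parse "time action PID duration argument"; returns 0 on success, -1 otherwise. */
int parse_event(const char *line, struct event *ev)
{
    char word[16];

    if (sscanf(line, "%d %15s %d %d %d",
               &ev->time, word, &ev->pid, &ev->duration, &ev->argument) != 5)
        return -1;

    if (strcmp(word, "start") == 0)
        ev->action = ACT_START;
    else if (strcmp(word, "lock") == 0)
        ev->action = ACT_LOCK;
    else if (strcmp(word, "renice") == 0)
        ev->action = ACT_RENICE;
    else
        return -1;
    return 0;
}
```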

Scheduler Lab: Example
The following schedule:
0 start 1 3 1 /* @0 start PID_1, len 3, prio 1 */
1 start 2 2 2 /* @1 start PID_2, len 2, prio 2 */
1 lock 2 1 0 /* @1 (relative) lock res 0 for 1 */
will generate the following calls:
sch_start(1, 1); /* time 0 */
sch_schedule(); /* time 0: returns 1 */
sch_start(2, 2); /* time 1 */
sch_schedule(); /* time 1: returns 2 */
sch_locked(2, 0); /* time 1: 2 signals lock request */
sch_schedule(); /* time 2: returns 2 */
sch_unlocked(2, 0); /* time 2: 2 signals unlock */
sch_exit(2); /* time 2: 2 exits */
sch_schedule(); /* time 3: returns 1 */
sch_exit(1); /* time 3: 1 exits */
sch_finalize(); /* simulation finished */
Scheduler Lab: Performance
To evaluate the performance of your scheduler we will compute (for
each trace):
Average turnaround time
How long a process stays in the system:

\[ t_{tat} = \frac{1}{n} \sum_{i=0}^{n-1} \left( t_{exit}(p_i) - t_{start}(p_i) \right) \]

Average response time
How fast a process will have a CPU slot (reactivity):

\[ t_{rt} = \frac{1}{n} \sum_{i=0}^{n-1} \left( t_{schedule}(p_i) - t_{start}(p_i) \right) \]
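
The two averages can be computed as in the following sketch. struct proc_times is a hypothetical record, and t_schedule is assumed to be the time at which the process is scheduled for the first time; the actual evaluation code may differ.

```c
struct proc_times {
    int t_start;      /* time of the start event */
    int t_schedule;   /* first time the process gets the CPU (assumption) */
    int t_exit;       /* time of the exit event */
};

/* Average of t_exit - t_start over n processes. */
double avg_turnaround(const struct proc_times *p, int n)
{
    long sum = 0;
    if (n <= 0)
        return 0.0;
    for (int i = 0; i < n; i++)
        sum += p[i].t_exit - p[i].t_start;
    return (double)sum / n;
}

/* Average of t_schedule - t_start over n processes. */
double avg_response(const struct proc_times *p, int n)
{
    long sum = 0;
    if (n <= 0)
        return 0.0;
    for (int i = 0; i < n; i++)
        sum += p[i].t_schedule - p[i].t_start;
    return (double)sum / n;
}
```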
Scheduler lab: Hints
• Do not modify the test.c and test.h files:
you will just submit scheduler.c
• If no processes are available for scheduling set current to -1

Scheduler lab: Generating traces
generate.pl -- Generate scheduling traces

Usage:
generate.pl [OPTION] file

Options:
-d time max duration of a process
-h print this message
-p procs number of processes
-r prob probability of a renice for a given process
-s simple traces (no locks, no priorities)
-t time max delay between two start events
-v verbose

Scheduler lab: Generating traces
8 start 0 52 0
20 renice 0 0 0
9 lock 0 5 0
9 start 1 21 2
38 renice 1 0 2
11 lock 1 5 0
14 start 2 7 2
15 lock 2 1 0
18 start 3 44 1
22 renice 3 0 1
19 lock 3 1 0
18 start 4 78
18 lock 4 2 0
25 start 5 92 0
56 renice 5 0 0
27 lock 5 2 0
[...]

Comments on the malloc lab
• Coding style
• Coding errors
• Testing

Coding style
Your code should be understood by others
Variable names
Give variables clear and self-explanatory names: a and b are not
understandable (as an exception i is O.K. for indexes)

Comments
Comments should help to understand the code:
• explain the usage of a variable
• describe a function (arguments, return value and exception
handling)
• describe algorithms

Comments are not optional: they are a part of the program.

Coding style
goto

The goto statement generally produces code which is difficult to
manage and understand and should be avoided.
Many modern programming languages lack a goto statement.

E. Dijkstra,
Go To Statement Considered Harmful,
Communications of the ACM,
11(3):147–148, 1968

Coding errors
Variable initialization
The C language lacks automatic variable initialization. A declaration
assigns memory to a variable but does not initialize it with any
particular value.
void *ptr0;        // ptr0 is not initialized
void *ptr1 = NULL; // ptr1 is initialized

Source code
You are given the source code of the tester so that you can inspect it.
If you don’t know what mem_sbrk returns in case of error, check it
instead of assuming some arbitrary value:
if ((incr < 0) || ((mem_brk + incr) > mem_max_addr)) {
    errno = ENOMEM;
    (void) fprintf(stderr, "snip");
    return (void *) -1;
}

Testing
Testing an application is a difficult task.
Unit testing
Divide the application into units and test their functionality. You are
advised, for example, to test the functionality of your functions with
custom test cases.
Test cases
Think about possible problems and create special test cases for them.
Tools
Use tools to check for common coding errors:
• Compiler warnings (e.g., -Wall for gcc)
• Static checkers (e.g., splint)
• Memory analyzers (e.g., valgrind)
• Debuggers
Debugging
Compile your sources for debugging:
• remove optimizations (e.g., -O0 for gcc)
• include debugging symbols (e.g., -g for gcc)

gdb

• if not familiar with gdb use a GUI (e.g., ddd)


• look for a tutorial online

Filesystem lab: A Filesystem Driver
Summary
Implement a simplified driver for a FAT filesystem.

Goals
• Learn how an operating system deals with filesystems
• Explore a real and widely used filesystem and analyze its
advantages and shortcomings

Environment
• Unix like system
• C

Filesystems
Abstractions
To facilitate access to data stored on disks, filesystems usually
provide several layers of abstraction:
• A disk driver, which gives access to the sectors of the given medium
• A filesystem driver, which gives access to files and directories

File organization
The design of a file system comprises solutions to the following
problems:
• how can we group sectors into files?
• how do we keep track of free space?
• how can we jump to a given location?
• how can we read and write data?

Abstractions
Clusters (blocks)
Arrays of sectors
• user configured
• reduce the address space
• increase speed
• cause internal fragmentation

Disk
Array of sectors

File
Stream of bytes
• sequential access
• random access

Filesystems: Contiguous allocation
Files are allocated as a contiguous list of blocks:
• fast
• no internal fragmentation
• high external fragmentation
• problematic growth

Filesystems: Linked allocation
Files are allocated as a list of blocks:
• the linking information can be stored in the blocks or in a separate
table (e.g., FAT)
• internal fragmentation (but no external fragmentation)
• not reliable
• difficult positioning

Filesystems: Indexed allocation
• Fast positioning
• No external fragmentation
• High management overhead

Filesystems: Free Space Management
Bitmaps (e.g., HFS)

• bit vector to mark free blocks


• simple
• needs caching

Linked lists
• list of free blocks (similar to linked allocation)

Grouping

• free blocks contain n addresses of free blocks (similar to multilevel
indexing)

FAT: Partition Layout

A FAT partition is composed of:


• Boot block
• FAT (one or more)
• Root directory
• Data (files and directories)

FAT: Boot block

field offset length


jump (bootstrap) 0 3
OEM name 3 8
bytes per sector 11 2
sectors per cluster 13 1
reserved sectors 14 2
FATs 16 1
root entries 17 2
small sectors 19 2
media descriptor 21 1
sectors per FAT 22 2
sectors per track 24 2
heads 26 2
hidden sectors 28 4
large sectors 32 4
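
The table can be turned into code by reading each field at its byte offset. The sketch below assumes the sector has already been read into a 512-byte buffer (for instance with the provided disk driver, after a cast of the buffer pointer) and relies on FAT storing multi-byte fields in little-endian order. struct boot_info, rd16, rd32 and parse_boot_block are hypothetical names.

```c
#include <stdint.h>

struct boot_info {
    uint16_t bytes_per_sector;
    uint8_t  sectors_per_cluster;
    uint16_t reserved_sectors;
    uint8_t  fats;
    uint16_t root_entries;
    uint16_t sectors_per_fat;
    uint32_t total_sectors;
};

/* Read little-endian 16-bit and 32-bit values from a byte buffer. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Extract the fields of the boot block using the offsets from the table. */
void parse_boot_block(const uint8_t *sector, struct boot_info *bi)
{
    bi->bytes_per_sector    = rd16(sector + 11);
    bi->sectors_per_cluster = sector[13];
    bi->reserved_sectors    = rd16(sector + 14);
    bi->fats                = sector[16];
    bi->root_entries        = rd16(sector + 17);
    bi->sectors_per_fat     = rd16(sector + 22);
    bi->total_sectors       = rd16(sector + 19);      /* small sectors */
    if (bi->total_sectors == 0)
        bi->total_sectors = rd32(sector + 32);        /* large sectors */
}
```

Reading byte by byte avoids any dependence on the host's byte order or on structure padding.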

FAT: Table
Each entry refers to a cluster and specifies whether the cluster is free,
whether it is bad, or which cluster comes next in the given file.

value (FAT12)    meaning
0x000            free
0xFF8–0xFFF      EOF
0xFF7            bad cluster
any other value  number of the next cluster
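
Following a file through the table is a linked-list walk. The sketch below assumes the FAT12 entries have already been unpacked into an array of 16-bit values (fat, with nentries elements); the function name and the constants are hypothetical. It counts the clusters of a chain and reports broken chains.

```c
#include <stdint.h>

#define FAT12_FREE    0x000
#define FAT12_BAD     0xFF7
#define FAT12_EOF_MIN 0xFF8   /* 0xFF8..0xFFF mark the end of a file */

/*
 * Number of clusters in the chain starting at 'first', or -1 if the chain
 * is broken (free or bad entry, cluster number out of range, or a loop).
 */
int fat12_chain_length(const uint16_t *fat, unsigned nentries, unsigned first)
{
    unsigned cluster = first;
    unsigned count = 0;

    for (;;) {
        if (cluster < 2 || cluster >= nentries)
            return -1;                 /* clusters 0 and 1 hold no data */
        if (count >= nentries)
            return -1;                 /* more steps than entries: loop */
        count++;

        uint16_t next = fat[cluster];
        if (next >= FAT12_EOF_MIN)
            return (int)count;         /* reached the last cluster */
        if (next == FAT12_FREE || next == FAT12_BAD)
            return -1;
        cluster = next;
    }
}
```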

FAT: Partition size

Block size   FAT12    FAT16      FAT32
0.5 KB       2 MB     –          –
1 KB         4 MB     –          –
2 KB         8 MB     128 MB     –
4 KB         16 MB    256 MB     1 TB
8 KB         –        512 MB     2 TB
16 KB        –        1024 MB    2 TB
32 KB        –        2048 MB    2 TB

FAT: Directory

field offset length


name 0 8
extension 8 3
attribute 11 1
reserved 12 1
10ms unit 13 1
creation time 14 2
creation date 16 2
access date 18 2
high 16bits of the cluster (FAT32) 20 2
update time 22 2
update date 24 2
16bit cluster (first cluster) 26 2
size 28 4

FAT: Directory

bit attribute
0 read only
1 hidden
2 system
3 volume
4 directory

type directory volume


file pointers 0 0
subdirectory pointers 1 0
volume labels 0 1
long file name components 1 1
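
A directory entry can be classified from its attribute byte as sketched below. The constant and function names are hypothetical (the provided header defines its own attribute constants); 0x08 and 0x10 are bits 3 and 4 of the table, and 0x0F is the attribute value used by long file name entries.

```c
#include <stdint.h>

enum {
    ATTR_VOLUME_BIT = 0x08,   /* bit 3 */
    ATTR_DIR_BIT    = 0x10,   /* bit 4 */
    ATTR_LONG_NAME  = 0x0F,   /* long file name component */
    ATTR_LONG_MASK  = 0x3F    /* the six defined attribute bits */
};

enum entry_kind { ENTRY_FILE, ENTRY_DIR, ENTRY_VOLUME, ENTRY_LFN };

/* Decide what a directory entry describes, based on its attribute byte. */
enum entry_kind classify_attr(uint8_t attr)
{
    if ((attr & ATTR_LONG_MASK) == ATTR_LONG_NAME)
        return ENTRY_LFN;      /* must be tested before the volume bit */
    if (attr & ATTR_VOLUME_BIT)
        return ENTRY_VOLUME;
    if (attr & ATTR_DIR_BIT)
        return ENTRY_DIR;
    return ENTRY_FILE;
}
```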

A simple disk driver
bios.c provides a simple disk driver to access FAT disk images
using 512-byte blocks.

int bios_init(char *name)
initializes the disk driver with the given image

void bios_shutdown()
closes the image file

void bios_read(int number, char *block)
reads a 512-byte sector from the disk image

void bios_write(int number, char *block)
writes a 512-byte sector to the disk image

Goal
The goal of this lab is to implement a minimal FAT driver with the
following simplified API:
int fs_open(char *path)
opens the file specified by path for reading and writing and
returns a file descriptor

int fs_read(int fd, char *buff, int len)
reads len bytes from fd, puts them into buff and returns the actual
number of bytes read

int fs_write(int fd, char *buff, int len)
writes len bytes from buff to the file specified by the file descriptor
fd and returns the actual number of bytes written

Goal
void fs_close(int fd)
closes the file specified by the file descriptor fd

int fs_creat(char *path)
creates a new file specified by path and opens it for writing

Please note that this simple API does not allow specifying the read
position: a file must be read and written sequentially. In addition, we
do not specify a way to delete a file.

File descriptors
Your filesystem driver will hand the user a file descriptor: a key that
uniquely identifies an open file.
The driver then uses the file descriptor to find the kernel (or driver)
data structures belonging to that file.

[Figure: in user space, the user program calls fd = open("file"). In the fs driver, fd refers to *internal_data, which holds file, pos, buffer, ... The fs driver sits on top of the disk driver.]
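
One simple way to realize this mapping is a fixed table indexed by the file descriptor. The following is a sketch under that assumption; MAX_OPEN, struct open_file and the fd_* helpers are hypothetical names, and a real driver would probably also keep a sector buffer per file.

```c
#include <stddef.h>

#define MAX_OPEN 16   /* hypothetical limit on simultaneously open files */

struct open_file {
    int in_use;
    unsigned first_cluster;   /* where the file starts on disk */
    unsigned size;            /* file size in bytes */
    unsigned pos;             /* current read/write position */
};

static struct open_file open_files[MAX_OPEN];

/* Reserve a slot; its index is the file descriptor. Returns -1 if full. */
int fd_alloc(unsigned first_cluster, unsigned size)
{
    for (int fd = 0; fd < MAX_OPEN; fd++) {
        if (!open_files[fd].in_use) {
            open_files[fd].in_use = 1;
            open_files[fd].first_cluster = first_cluster;
            open_files[fd].size = size;
            open_files[fd].pos = 0;
            return fd;
        }
    }
    return -1;
}

/* Map a descriptor to the driver's data, or NULL if it is not valid. */
struct open_file *fd_lookup(int fd)
{
    if (fd < 0 || fd >= MAX_OPEN || !open_files[fd].in_use)
        return NULL;
    return &open_files[fd];
}

/* Give the slot back so that it can be reused. */
void fd_release(int fd)
{
    struct open_file *f = fd_lookup(fd);
    if (f != NULL)
        f->in_use = 0;
}
```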

Steps
Divide your work into steps:

1 Understand the filesystem layout
2 Read the boot sector and the structure of the FAT partition you
are handling (i.e., number of clusters, number of FATs and
position of the root directory).
3 Read the directory structure: parse the directory entries to
locate the starting cluster of the handled file.
4 Access files: read and write the requested clusters

Test your work after each step!

Testing
We provide several images for testing:
• each image is FAT12 and contains only files and directories
named with the 8.3 schema
• For each image we provide a tarball with the image content
• You can inspect the image content with:
> hexdump -C partition.img
...
00008000 74 65 73 74 20 20 20 20 20 20 20 08 00 00 7d 72 |test       ...}r|
00008010 b2 34 b2 34 00 00 7d 72 b2 34 00 00 00 00 00 00 |.4.4..}r.4......|
00008020 41 66 00 69 00 6c 00 65 00 00 00 0f 00 bc ff ff |Af.i.l.e........|
00008030 ff ff ff ff ff ff ff ff ff ff 00 00 ff ff ff ff |................|
00008040 46 49 4c 45 20 20 20 20 20 20 20 20 00 64 7d 72 |FILE        .d}r|
00008050 b2 34 b2 34 00 00 7d 72 b2 34 03 00 0d 00 00 00 |.4.4..}r.4......|
...

Create test images
Example for Linux:

# create an empty file
dd if=/dev/zero of=/tmp/disk.img \
   count=<number of blocks>
# format the disk image
mkdosfs -f 2 -F 12 -r 512 -S 512 -n disk \
   /tmp/disk.img
# mount the disk image
sudo mount -t vfat -o rw,loop=/dev/loop0 \
   /tmp/disk.img /mnt
# copy the files
sudo cp -vr * /mnt/
# unmount the disk image
sudo umount /mnt

Hints
• The first cluster is cluster 2 (clusters 0 and 1 don’t exist)
• Names (8+3) are padded with spaces: "EXAMPLE "."TXT"
• FAT12: two 12-bit entries are packed into three bytes:
  uv.wx.yz ⟹ xuv, yzw
• Root dir position (in sectors): reserved + number of FATs · sectors per FAT
• Root dir length (in sectors): (root dir entries · 32) / sector length
• Even if the file has an 8.3 name, a long name entry is usually
  created. You can recognize (and skip) these directory entries by
  their attribute: 0x0F.
• FAT type:
  • fewer than 4085 clusters: FAT12
  • fewer than 65525 clusters: FAT16
  • otherwise: FAT32
  We provide FAT12 images
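
Several of these hints translate directly into small helpers. The sketch below uses hypothetical function names and assumes that fat points to the raw bytes of one FAT as read from disk. The root directory length is rounded up to whole sectors, which is an assumption on top of the formula in the hints.

```c
#include <stdint.h>

/* Entry n of a packed FAT12: bytes uv.wx.yz hold the entries xuv and yzw. */
uint16_t fat12_entry(const uint8_t *fat, unsigned n)
{
    unsigned off = n + n / 2;                        /* n * 1.5, rounded down */
    unsigned pair = fat[off] | (fat[off + 1] << 8);  /* two bytes, low first */

    return (uint16_t)((n & 1) ? (pair >> 4) : (pair & 0x0FFF));
}

/* First sector of the root directory. */
unsigned root_dir_first_sector(unsigned reserved, unsigned fats,
                               unsigned sectors_per_fat)
{
    return reserved + fats * sectors_per_fat;
}

/* Length of the root directory in sectors (32 bytes per entry). */
unsigned root_dir_sectors(unsigned root_entries, unsigned sector_length)
{
    return (root_entries * 32 + sector_length - 1) / sector_length;
}

/* FAT variant (12, 16 or 32) for a given number of clusters. */
int fat_bits(unsigned clusters)
{
    if (clusters < 4085)
        return 12;
    if (clusters < 65525)
        return 16;
    return 32;
}
```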

Hints: packed structures
Tell the compiler not to word align structures. GCC example:
struct fat_boot_sector {
    __u8  ignored[3];    /* Boot strap short or near jump */
    __u8  system_id[8];  /* Name - can be used to special case
                            partition manager volumes */
    __u16 sector_size;   /* bytes per logical sector */
    __u8  sec_per_clus;  /* sectors/cluster */
    __u16 reserved;      /* reserved sectors */
    __u8  fats;          /* number of FATs */
    __u16 dir_entries;   /* root directory entries */
    __u16 sectors;       /* number of sectors */
    __u8  media;         /* media code */
    __u16 fat_length;    /* sectors/FAT */
    __u16 secs_track;    /* sectors per track */
    __u16 heads;         /* number of heads */
    __u32 hidden;        /* hidden sectors (unused) */
    __u32 total_sect;    /* number of sectors (if sectors = 0) */
} __attribute__((packed));

Hints: driver.h
We provide:
• Definitions for the boot sector and directory entries:
• struct fat_boot_sector {
__u8 ignored[3];
__u8 system_id[8];
__u16 sector_size;
__u8 sec_per_clus;

• struct dos_dir_entry {
__u8 name[8], ext[3];
__u8 attr;
__u8 lcase;
__u8 ctime_cs;

• Constants for attributes: FILE_ATTR_RONLY,
FILE_ATTR_HIDDEN, . . .
• Disk driver (bios.c)
Malloc lab: Sample solutions
Red-black tree: 57 (util) + 40 (thru) = 97/100

• Best-fit algorithm
• Free blocks stored in a red-black tree (sorted by size)
• Freed blocks are temporarily stored in a list (blob) and put in the
tree before a malloc.
• Blocks have a header and a footer

Buddy (Jürg Billeter): 57 (util) + 40 (thru) = 97/100

• 128 free lists: one unsorted (similar to the blob), one for large
blocks
• Blocks have a header and a footer
• Best-fit algorithm

Red-Black Trees (Bayer 1972)
• Self-balancing binary search tree
• Search, insert and delete in O(log n)
• Good worst-case run time
• Properties:
  • Each node is either red or black
  • The root is black
  • All leaves are black
  • Children of a red node are black
  • All paths from any node to its leaf nodes contain the same number of black nodes
• The longest path (root–leaf) is no more than twice the shortest
path.

[Figure: example red-black tree with root 11, children 8 and 42, then 1, 10, 12 and 100, then 150, and nil leaves]

R. Bayer
Symmetric binary B-Trees: Data structure and maintenance
algorithms
Acta Informatica, 1(4):290–306, Springer, December 1972
Red-Black Tree — Algorithm
free(b)
    if next block is free then
        coalesce
    end if
    if previous block is free then
        coalesce
    end if
    put b in the blob

Red-Black Tree — Algorithm
malloc(size)
    insert all the nodes of the blob into the tree
    search a block (b) in the free tree
    if no free blocks are available then
        if the heap can be grown then
            grow the heap and return the newly allocated block
        else
            error
        end if
    else
        if the block can be split then
            split the block
            put the remainder in the blob
        end if
        return b
    end if
Red-Black Tree — Algorithm
realloc(b, size)
    if b = NULL then
        return malloc(size)
    end if
    if size = 0 then
        free(b)
        return
    end if
    if we can expand to the next block (1) then
        coalesce the next block
        return new block
    end if
    b2 = malloc(size)
    copy b to b2
    free(b)
    return b2

(1) This includes growing the heap if b is the last block
Buddy — Algorithm
free(b)
    if next block is free then
        coalesce
    end if
    if previous block is free then
        coalesce
    end if
    put b in the unsorted list

Buddy — Algorithm
malloc(size)
    sort the unsorted list
    search a block (b) in the free lists
    if no free blocks are available then
        if the heap can be grown then
            grow the heap and return the newly allocated block
        else
            error
        end if
    else
        if the block can be split then
            split the block
            put the remainder in the corresponding list
        end if
        return b
    end if
Buddy — Algorithm
realloc(b, size)
    if b = NULL then
        return malloc(size)
    end if
    if size = 0 then
        free(b)
        return
    end if
    if we can expand to the previous or next block then
        coalesce with that block
        return new block
    end if
    b2 = malloc(size)
    copy b to b2
    free(b)
    return b2

Sample solution: review
• Data structures: they do not necessarily need to be complicated
(red-black tree vs. linear lists). Data characteristics (in our case
a certain degree of homogeneity) have to be taken into
account.
• Lazy processing: blocks are reorganized only if needed

Malloc lab: Comments
realloc semantics
The newly returned block must preserve the content (up to its size).
Pseudocode:
new = malloc(size)
memcpy(new, old, min(size_old, size_new))
free(old)
An efficient implementation could check if it is possible to enlarge the
current block.
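
The copy-based variant of the pseudocode can be written out as below. To keep the sketch self-contained it sits on top of the C library allocator and records the payload size in a small header; union header and the my_* names are hypothetical, and in the lab the size would come from your own block header instead.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

union header {
    size_t size;         /* payload size in bytes */
    max_align_t align;   /* keeps the payload suitably aligned */
};

void *my_malloc(size_t size)
{
    union header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;        /* the payload starts right after the header */
}

void my_free(void *p)
{
    if (p != NULL)
        free((union header *)p - 1);
}

void *my_realloc(void *old, size_t size)
{
    if (old == NULL)
        return my_malloc(size);
    if (size == 0) {
        my_free(old);
        return NULL;
    }

    size_t old_size = ((union header *)old - 1)->size;
    void *fresh = my_malloc(size);
    if (fresh == NULL)
        return NULL;     /* the old block is left untouched */

    /* preserve the content up to the smaller of the two sizes */
    memcpy(fresh, old, old_size < size ? old_size : size);
    my_free(old);
    return fresh;
}
```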

Malloc lab: Comments
Block format
[Figure: two block layouts. Left: size and flags, then linking information, then payload. Right: linking information, then size and flags, then payload.]

Put the linking information after the header: the memory can be used
in allocated blocks.

RSDL: Rotating Staircase DeadLine
Characteristics
Starvation free, strict fairness, O(1), simple fixed accounting.

Principle

• Priority array (one ready list per priority)
• Each level has a priority quota: when the quota is used up, all
the remaining processes are moved to the next lower priority level.
• Each process has a priority quota: when it is used up, the process is
moved to the next lower priority level.
• When a process quantum is used up, the process is moved to the
expired array (see the O(1) scheduler by J. Aas).

Con Kolivas,
RSDL completely fair starvation free interactive cpu scheduler
http://thread.gmane.org/gmane.linux.kernel.ck/6462, March 2007
Project: Web Proxy
Summary
Implement a concurrent logging HTTP proxy:
1 sequential: one connection
2 concurrent: more than one simultaneous connection

Goals
• Learn how to deal with a network protocol
• Learn how an operating system deals with I/O
• Implement a concurrent application

Environment
• Unix like system
• C

Proxy
A proxy server is a program (or hardware device) that allows clients to
make indirect network connections to a server.

browser ←→ proxy ←→ server

Proxy goals
• speed: a proxy could cache or prefetch pages to improve load
time
• logging: monitor traffic
• security:
• block connections or allow only certain connections
• log all the connection endpoints
• adaptation: transform the viewed pages:
• translation
• virus scanning
• improve readability (e.g., for people with disabilities)
• anonymity: hide the endpoint of the connection from the server

HTTP
• Hypertext Transfer Protocol: open (not patented) internet protocol.
• RFC 2616: HTTP/1.1
• request/response protocol between clients and servers

Connection

Server  Listen on a given TCP port (usually 80)
Client  Connect to the server using TCP (usually port 80)
Client  Send request (usually to get a page)
Server  Send acknowledgment and requested text

R. Fielding, et al.
Hypertext Transfer Protocol
RFC 2616, June 1999

HTTP
A simple HTTP request is composed of a method, a resource and
the version. Example:
HEAD /index.php HTTP/1.1
Methods:
GET request a page (idempotent, safe)
HEAD request a page header (idempotent, safe)
PUT request to store a page (idempotent)
POST append to a resource
DELETE delete a resource (idempotent)
TRACE echoes back the received request (idempotent, safe)
OPTIONS returns the supported methods (idempotent, safe)
CONNECT tell a proxy to just forward the data (e.g., SSL tunneling)
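
Splitting a request line into its three parts can be done with sscanf, as in this sketch. The function name and the buffer sizes are assumptions; a proxy receives the full URL as the resource, which the parse_uri helper from the skeleton can then take apart.

```c
#include <stdio.h>

#define METHOD_LEN   16
#define RESOURCE_LEN 1024
#define VERSION_LEN  16

/*
 * Split "METHOD resource VERSION" into its parts.
 * Returns 0 on success, -1 if the line does not have three fields.
 * The field widths in the format string match the buffer sizes above.
 */
int parse_request_line(const char *line, char method[METHOD_LEN],
                       char resource[RESOURCE_LEN], char version[VERSION_LEN])
{
    if (sscanf(line, "%15s %1023s %15s", method, resource, version) != 3)
        return -1;
    return 0;
}
```

The %s conversion stops at whitespace, so the trailing carriage return and line feed of a request line are not copied into version.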

HTTP — Example
Client request

GET /index.html HTTP/1.1


Host: www.example.com
empty line

Server answer

HTTP/1.1 200 OK
Date: Mon, 01 May 2006 22:38:34 GMT
Server: Apache/2.2.0 (Fedora)
Accept-Ranges: bytes
Content-Length: 438
Connection: close
Content-Type: text/html; charset=UTF-8

The Logging Web Proxy — Tasks
• Listen on a random port p (1024 ≤ p ≤ 65535)
• Accept connections on port p
• Parse the request URI (server, port and resource)
• Connect to the server
• Forward the request
• Receive the reply
• Update the log file
• Send back the reply to the client

The Logging Web Proxy — Log format
Log format
You should log the date, the IP of the client, the requested URL and the
size of the request:

Date: browserIP URL size

Example:
Wed 03 May 2006 09:40:44 CEST: 127.0.0.1 http://www.ethz.ch/ 25764

The Web Proxy — Connections
Your proxy will be both a server and a client:
• server: it will listen to the given port and accept connections
• client: it will (when requested) connect to the web server and get
the requested resource.

browser proxy server

Sockets
A socket is a software abstraction providing a standard API for
sending and receiving data across a computer network.

• A socket has an address and a port.


• A connection is identified by:
clientaddr:clientport ←→ serveraddr:serverport

Working with sockets
int socket(int domain, int type, int protocol);

socket() creates an endpoint for communication and returns a
descriptor.

domain    AF_INET: IPv4 Internet protocols
type      SOCK_STREAM: sequenced, reliable, two-way,
          connection-based byte streams
protocol  0

Example

int fd;
/* the inner parentheses are needed: < binds tighter than = */
if ((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
    exit(EXIT_FAILURE);
}

Working with sockets
int bind(int sockfd, const struct sockaddr *my_addr,
         socklen_t addrlen);
bind() gives the socket a local address.

sockfd   the socket
my_addr  the address
addrlen  the length of my_addr

Working with sockets
struct sockaddr_in sin;

memset(&sin, 0, sizeof(sin));
sin.sin_family = AF_INET;
sin.sin_port = htons(8080); /* port in network byte order */

/* all interfaces */
sin.sin_addr.s_addr = htonl(INADDR_ANY);

if (bind(fd, (struct sockaddr *) &sin, sizeof(sin)) < 0) {
    exit(EXIT_FAILURE);
}

Working with sockets
int listen(int sockfd, int backlog);

listen() listens for connections on a socket

sockfd   the socket
backlog  maximum length of the queue of pending connections

Example

if (listen(fd, Q_LEN) < 0) {
    exit(EXIT_FAILURE);
}

Working with sockets
int accept(int sockfd, struct sockaddr *addr,
           socklen_t *addrlen);
accept() extracts the first connection on the queue and creates a
new non-listening socket.
sockfd the socket
addr address of the peer socket
addrlen length of addr

Working with sockets
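struct sockaddr_in from;
socklen_t len;
int sfd;

/* while true -> infinite loop */
while (1) {
    /* in: size of from; out: size of the peer address */
    len = sizeof(from);
    if ((sfd = accept(fd, (struct sockaddr *) &from, &len)) < 0) {
        exit(EXIT_FAILURE);
    }
    if (!fork()) {
        /* child: work with sfd */
        exit(0);
    }
    close(sfd); /* parent: the child keeps its own copy */
}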

Getting information
man pages

• man 2 socket
• man 2 accept
• man 2 bind
• man 2 connect
• man 2 listen
• man 7 ip
• man 7 tcp

csapp.c
The file csapp.c provides the Robust reading and writing (RIO)
package (from Computer Systems, A Programmer’s Perspective,
CS:APP, 11.4).
Sometimes read and write transfer fewer bytes than
expected (EOF on read, reading from a terminal, network
sockets, . . .).

It provides:
• Wrappers for socket operations with additional checks
(Socket(), Accept(), Bind(), . . .)
• Client/server helper functions:
• int Open_clientfd(char *hostname, int port)
• int Open_listenfd(int port)
• Wrappers for the RIO buffered and unbuffered functions (see next
slide).

csapp.c — RIO package
Unbuffered:
• ssize_t Rio_readn(int fd, void *usrbuf, size_t n):
read n bytes (unbuffered)
• ssize_t Rio_writen(int fd, void *usrbuf, size_t n):
write n bytes (unbuffered)
Buffered:
• void Rio_readinitb(rio_t *rp, int fd): associate a
descriptor with a read buffer and reset the buffer
• ssize_t Rio_readnb(rio_t *rp, void *usrbuf, size_t n):
transfers n raw bytes from the buffer
• ssize_t Rio_readlineb(rio_t *rp, void *usrbuf, size_t maxlen):
reads a line of text from the buffer
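
The idea behind an unbuffered "read n bytes" helper is a loop around read() that continues after short reads. The sketch below illustrates the technique under the hypothetical name readn_sketch; it is not the code from csapp.c.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read up to n bytes from fd, retrying after short reads.
 * Returns the number of bytes read (less than n only at EOF) or -1 on error.
 */
ssize_t readn_sketch(int fd, void *buf, size_t n)
{
    size_t left = n;
    char *p = buf;

    while (left > 0) {
        ssize_t r = read(fd, p, left);
        if (r < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: try again */
            return -1;
        }
        if (r == 0)
            break;                 /* EOF */
        left -= (size_t)r;
        p += r;
    }
    return (ssize_t)(n - left);
}
```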

proxy.c
proxy.c provides a skeleton for the implementation of your proxy.

• int parse_uri(char *uri, char *target_addr, char *path, int *port):
given a URI from an HTTP proxy GET request (i.e., a URL),
extract the host name, path name, and port.
• void format_log_entry(char *logstring, struct sockaddr_in *sockaddr,
char *uri, int size):
create a formatted log entry in logstring.

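Sockets — Overview
[Figure: sequence of socket calls on the client and on the server]
client: socket → connect (both done by open_clientfd) → rio_writen → rio_readlineb → close
server: socket → bind → listen (all three done by open_listenfd) → accept → rio_readlineb → rio_writen → rio_readlineb (EOF) → close → await connection request from the next client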

Testing with telnet
You can test the connection to your proxy with telnet server
port. Example:
> telnet www.google.com 80
Trying 66.249.85.99...
Connected to www.google.com.
Escape character is '^]'.
GET /index.html HTTP/1.1
Host: www.google.com
empty line
HTTP/1.1 302 Found
Location: http://www.google.ch/index.html
Content-Type: text/html
Server: GWS/2.1
Content-Length: 228
Date: Thu, 04 May 2006 13:24:45 GMT

<HTML>

Testing with a browser
• Firefox/Mozilla/Explorer: proxy settings (127.0.0.1:port)
• Command line tools: set the HTTP_PROXY environment variable.
Example (bash):
$ export HTTP_PROXY=127.0.0.1:8888
$ links --dump www.ethz.ch

Hints — Signals
Signal
A signal is a message from the OS indicating that an event of some
type has occurred.
sighandler_t signal(int signum, sighandler_t handler)
Installs a new signal handler for the signal with number signum.
signum signal number (see signal(7))
handler signal handler (SIG_IGN: ignore the signal, SIG_DFL:
default action)

SIGPIPE
If you write twice to a connection that the peer has already closed,
the second write makes the system generate a SIGPIPE (broken pipe)
signal, whose default action terminates your application.
Hints — Signals: SIGPIPE
Hint: catch SIGPIPE signals to avoid crashes

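void handler(int sig) {
    /* write() is async-signal-safe, fprintf() is not */
    static const char msg[] = "Caught SIGPIPE\n";
    (void) sig;
    write(STDERR_FILENO, msg, sizeof(msg) - 1);
}

/* ... */
/* Install the signal handler */
if (signal(SIGPIPE, handler) == SIG_ERR) {
    fprintf(stderr, "Error: cannot install signal handler\n");
    exit(EXIT_FAILURE);
}
/* ... */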
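An alternative to catching SIGPIPE is to ignore it with SIG_IGN; write() then fails with errno set to EPIPE and the proxy can simply drop the connection. The sketch below assumes this approach; the helper names ignore_sigpipe and write_or_detect_close are hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Call once at startup: with SIGPIPE ignored, write() on a closed
 * connection fails with EPIPE instead of terminating the process. */
int ignore_sigpipe(void)
{
    return (signal(SIGPIPE, SIG_IGN) == SIG_ERR) ? -1 : 0;
}

/* Hypothetical helper: write n bytes and report whether the peer is gone.
 * Returns 0 on success, 1 if the other end was closed (EPIPE),
 * and -1 on any other error. Assumes SIGPIPE is ignored. */
int write_or_detect_close(int fd, const void *buf, size_t n)
{
    const char *p = buf;

    while (n > 0) {
        ssize_t nw = write(fd, p, n);
        if (nw < 0) {
            if (errno == EINTR)
                continue;                   /* interrupted: retry */
            return (errno == EPIPE) ? 1 : -1;
        }
        p += nw;
        n -= (size_t)nw;
    }
    return 0;
}
```

A return value of 1 tells the connection handler to close its descriptors and end the thread or process without affecting the other connections.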

A concurrent proxy
Your proxy should be able to handle more than one simultaneous
connection.
General structure:
• The main program listens for incoming connections
• For each new connection: generate a new thread (or process) to
handle it.
Some system calls or data structures are not thread-safe.
Examples:
resource the log file
call gethostbyname and gethostbyaddr (and consequently open_clientfd)

Thread unsafe functions
CS:APP defines four classes of thread-unsafe functions:

1 functions that do not protect shared variables
2 functions that keep state across invocations
3 functions that return a pointer to a static variable
4 functions that call a thread-unsafe function

Unsafe calls should be protected by a semaphore
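A common way to protect a class 3 function is "lock and copy": hold a semaphore while calling the unsafe function and copy the result into caller-owned storage before releasing it. The sketch below applies this to ctime(), which returns a pointer to a static buffer; the names ctime_ts and ctime_ts_init are illustrative assumptions, and unnamed POSIX semaphores are assumed to be available (as on Linux).

```c
#include <semaphore.h>
#include <string.h>
#include <time.h>

static sem_t ctime_mutex;   /* binary semaphore guarding ctime() */

/* Must be called once before any thread uses ctime_ts(). */
int ctime_ts_init(void)
{
    return sem_init(&ctime_mutex, 0, 1);
}

/* Lock-and-copy wrapper: the result of ctime() is copied into buf while
 * the semaphore is held. buf must provide at least 26 bytes. */
char *ctime_ts(const time_t *timep, char *buf)
{
    char *shared;

    sem_wait(&ctime_mutex);             /* P */
    shared = ctime(timep);
    if (shared != NULL)
        strcpy(buf, shared);            /* private copy for the caller */
    sem_post(&ctime_mutex);             /* V */
    return shared != NULL ? buf : NULL;
}
```

The same pattern fits the host lookup functions used by the proxy, at the cost of serializing all calls to the wrapped function.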

Threads — A very short summary
Threads
Simultaneously running tasks with a common memory space.

Problems
• resource protection
• thread synchronization
• deadlocks

(Binary) semaphores
One activity (thread) at a time is allowed to get the resource
P wait for the semaphore
critical section
V release the semaphore

Threads and processes

Filesystem lab — Deadline extension
The deadline for the filesystem lab is extended to 2007-05-20
Date Topic Lab Notes
2007-03-19 memory allocation First meeting
memory allocation malloc lab
2007-03-26
scheduling
2007-04-02 scheduling
2007-04-09 Easter Monday
2007-04-16 Sechseläuten scheduler lab
2007-04-23 file systems
file systems
2007-04-30 memory allocation
scheduling
filesystem lab
2007-05-07 HTTP proxy
2007-05-14 file systems, scheduling
HTTP proxy
2007-05-21
proxy lab
2007-05-28 Whit Monday
2007-06-04
2007-06-11
2007-06-18 optional labs Last meeting
Filesystem — Hints and pitfalls
FAT 12 entries
FAT 12: two 12-bit entries are packed into three bytes:
bytes uv wx yz ⟹ entries xuv, yzw
Example:
[...]
00000200 f8 ff ff 00 f0 ff 05 60 00 07 80 00 09 a0 00 0b
00000210 c0 00 ff ff ff ff 0f 00 00 00 00 00 00 00 00 00
00000220 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[...]

Entry Value Comment


0 0xff8
1 0xfff
2 0x000 free
3 0xfff EOF
4 0x005
5 0x006
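The packing rule can be checked by writing entries back into a byte array and comparing with the hex dump. The following is a minimal sketch; fat12_set is a hypothetical helper name, not part of the lab code.

```c
#include <stdint.h>

/* Hypothetical helper: store the 12-bit value val as entry i of a FAT12
 * table. Each entry takes 1.5 bytes, so entry i starts at byte i + i/2. */
void fat12_set(uint8_t *fat, unsigned i, uint16_t val)
{
    unsigned off = i + i / 2;

    val &= 0x0FFF;
    if (i % 2 == 0) {
        /* even entry: low 8 bits in the first byte,
         * high 4 bits in the low nibble of the second byte */
        fat[off]     = (uint8_t)(val & 0xFF);
        fat[off + 1] = (uint8_t)((fat[off + 1] & 0xF0) | (val >> 8));
    } else {
        /* odd entry: low 4 bits in the high nibble of the first byte,
         * high 8 bits in the second byte */
        fat[off]     = (uint8_t)((fat[off] & 0x0F) | ((val & 0x0F) << 4));
        fat[off + 1] = (uint8_t)(val >> 4);
    }
}
```

Storing the six table values 0xff8, 0xfff, 0x000, 0xfff, 0x005, 0x006 as entries 0 to 5 of a zeroed buffer reproduces the first nine bytes of the dump: f8 ff ff 00 f0 ff 05 60 00.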

Filesystem — Hints and pitfalls
Simplifications

• You can limit yourself to FAT12 images


• The driver should be able to create new files but not necessarily
new directories
• You can ignore long file names
• Required operations:
• Open and close a file
• Open a file, read the content, close it
• Open a file in a subdirectory, read the content, close it
• Create a file, write “Hello world!” and close it
• Create a file in a subdirectory, write “Hello world!” and close it
• Work with more than one open file at the same time
• Create a file with more than 2K of content and close it
• Append content to an existing file
A .command file will be posted on the blog

POSIX Threads — pthreads
POSIX standard for threads (API for creating and manipulating
threads).
In a UNIX environment threads:
• exist within a process (use the process resources)
• have their own independent flow of control
• duplicate only the essential resources
• die if the parent process dies

Resource sharing:
• changes to system resources (e.g., closing a file) are seen by all
other threads
• two pointers having the same value point to the same data.
• reading and writing to the same memory locations is possible:
explicit synchronization is required.

pthreads
Threads share:
• instructions
• memory space
• file descriptors
• signals and signal handlers
• user and group id

Each thread has a unique:
• thread id
• set of registers, stack pointer
• stack, return addresses
• signal mask
• priority

pthreads — A couple of examples
From csapp.c:
void Pthread_create(pthread_t *tidp, pthread_attr_t *attrp,
void * (*routine)(void *), void *argp)
Creates a new thread
tidp the thread
attrp attributes (e.g., NULL)
routine the start routine of the thread
argp the argument of the routine

pthreads — A couple of examples
void Pthread_detach(pthread_t tid)

Indicates that storage for the thread can be reclaimed when that
thread terminates.
tid the thread

pthreads — A couple of examples
Example

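void *foo(void *arg);

void bar() {
    char *string = "hello";
    pthread_t thread1;
    Pthread_create(&thread1,
                   NULL,
                   foo,
                   (void *) string);
}

/* a start routine takes a void * and returns a void * */
void *foo(void *arg) {
    char *s = (char *) arg;
    Pthread_detach(pthread_self());
    /* ... work with s ... */
    return NULL;
}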

pthreads — A couple of examples
void Pthread_cancel(pthread_t tid)

Cancel execution of a thread
tid the thread

void Pthread_join(pthread_t tid, void **thread_return)
Suspend execution until the target thread terminates
tid the thread
thread_return the value passed to Pthread_exit (or returned by the
start routine)

pthreads — A couple of examples
Example

pthread_t thread1, thread2;

/* the csapp.c wrappers return void and exit on error */
Pthread_create(&thread1, NULL, foo, NULL);
Pthread_create(&thread2, NULL, foo, NULL);

/* Wait until the threads are complete */
Pthread_join(thread1, NULL);
Pthread_join(thread2, NULL);

exit(0);
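The examples above pass NULL as the argument and ignore the return value. The sketch below shows both directions with the plain pthread API (not the csapp.c wrappers): each thread receives its own heap-allocated argument, which avoids races on a shared variable, and hands a result back through pthread_join. The names square and sum_of_squares are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* Thread routine: takes ownership of a heap-allocated int, squares it,
 * and returns the result in a new heap block. */
static void *square(void *argp)
{
    int n = *(int *)argp;
    long *result;

    free(argp);
    result = malloc(sizeof *result);
    if (result != NULL)
        *result = (long)n * n;
    return result;              /* picked up by pthread_join() */
}

/* Hypothetical driver: one thread per value 1..count, results summed.
 * Returns -1 if a thread could not be created or produced no result. */
long sum_of_squares(int count)
{
    pthread_t *tids = malloc((size_t)count * sizeof *tids);
    long sum = 0;
    int started = 0, failed = (tids == NULL);

    for (int i = 0; !failed && i < count; i++) {
        int *arg = malloc(sizeof *arg);     /* private argument per thread */
        if (arg == NULL) { failed = 1; break; }
        *arg = i + 1;
        if (pthread_create(&tids[i], NULL, square, arg) != 0) {
            free(arg);
            failed = 1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        void *ret = NULL;
        if (pthread_join(tids[i], &ret) != 0 || ret == NULL) {
            failed = 1;
        } else {
            sum += *(long *)ret;
            free(ret);
        }
    }
    free(tids);
    return failed ? -1 : sum;
}
```

In the proxy the same idea applies to the connected descriptor: allocate it per connection, pass the pointer to the new thread, and let the thread free it.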

Semaphores — Short summary
A semaphore is a protected variable that controls access to a critical
section: the value of the variable represents the number of processes
that can concurrently lock the semaphore.
P(s) await s > 0, then s := s-1
V(s) s := s+1
Init(s, v) s := v
Both P and V must be atomic (can be implemented with test-and-set
operations).
Problems
Primitive synchronization mechanism in many operating systems. No
protection against mistakes:
• there is no guarantee that each access to a protected variable is
guarded by a semaphore
• no guarantee that a lock is released
Semaphores alternatives
Monitors
• a set of procedures that allow interaction with the shared resource
• a mutual exclusion lock
• the variables associated with the shared resource
• an invariant defining the assumptions needed to avoid race
conditions

Example: Java
class Test {
private int shared;
synchronized void foo() {
/* access shared */
}
}

POSIX Semaphores
Semaphores store a value corresponding to the maximum number of
activities that can access the resource.

• sem_close: closes a named semaphore
• sem_destroy: destroys an unnamed semaphore
• sem_getvalue: gets the value of a semaphore
• sem_init: initializes an unnamed semaphore
• sem_open: opens/creates a named semaphore
• sem_post: unlocks a locked semaphore
• sem_trywait: performs a semaphore lock only if it can lock the
semaphore without waiting for another process to unlock it
• sem_unlink: removes a named semaphore
• sem_wait: performs a lock on a semaphore

POSIX Semaphores
• unnamed semaphores are used within the same memory space
while named semaphores are used with multiple processes.
• semaphores are global entities (not associated with any
process).
• POSIX semaphores are persistent (the value of a semaphore is
preserved after a close).

POSIX Semaphores — Example
sem_t mutex;

sem_init(&mutex, 0, 1);

/* ... */

sem_wait(&mutex);

/* critical section */

sem_post(&mutex);
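To see the fragment above in context, here is a small sketch in which two threads increment a shared counter inside the critical section. The names worker and count_with_two_threads are hypothetical, and unnamed POSIX semaphores are assumed to be supported (as on Linux).

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t mutex;         /* binary semaphore, initialized to 1 */
static long counter;        /* shared resource */

struct job { long iterations; };

static void *worker(void *argp)
{
    long n = ((struct job *)argp)->iterations;

    for (long i = 0; i < n; i++) {
        sem_wait(&mutex);   /* P: enter the critical section */
        counter++;
        sem_post(&mutex);   /* V: leave the critical section */
    }
    return NULL;
}

/* Hypothetical driver: two threads each add `iterations` to the counter.
 * Returns the final counter value, or -1 on a setup error. */
long count_with_two_threads(long iterations)
{
    pthread_t t1, t2;
    struct job job = { iterations };

    counter = 0;
    if (sem_init(&mutex, 0, 1) != 0)
        return -1;
    if (pthread_create(&t1, NULL, worker, &job) != 0) {
        sem_destroy(&mutex);
        return -1;
    }
    if (pthread_create(&t2, NULL, worker, &job) != 0) {
        pthread_join(t1, NULL);
        sem_destroy(&mutex);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&mutex);
    return counter;
}
```

Without the sem_wait/sem_post pair the two read-modify-write sequences can interleave and the final value may be smaller than twice the iteration count.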

Debugging multithreaded programs
GDB
• automatic notification of new threads
• thread threadno: switch among threads
• info threads: inquire about existing threads
• thread apply [threadno] [all] args: apply a command
to a list of threads
• thread-specific breakpoints

DDD
• currently active threads: Status −→ Threads.
• the current thread is highlighted
• select any thread to make it the current thread.

Debugging multithreaded programs
Problem: thread scheduling is not deterministic, so the interleaving
that triggered an error is difficult to reproduce.
Some approaches modify the OS in order to obtain a reproducible
execution trace. This may include the recording (and replay)
of:
• scheduling timing and decisions
• signals

I’ll try to give a short introduction toward the end of the semester if
the course schedule allows it.

Debugging network applications
Privacy

• Monitor your traffic only


• Respect the privacy of other users

tcpdump

dumps information on network packets.


Examples:
• tcpdump host www.google.com
• tcpdump dst port 80
• tcpdump tcp

tcpdump example
# tcpdump host www.google.com
listening on en0, link-type EN10MB (Ethernet), capture size 96 bytes
08:39:21.151736 IP src.58239 > dst.http: S 3886383149:3886383149(0)
win 65535 <mss 1460,nop,wscale 0,nop,nop,timestamp 1645559094 0>
08:39:21.157868 IP dst.http > src.58239: S 1960310735:1960310735(0)
ack 3886383150 win 8190 <mss 1460>
08:39:21.158004 IP src.58239 > dst.http: . ack 1 win 65535
08:39:21.203720 IP src.58239 > dst.http: P 1:428(427) ack 1 win 65535
08:39:21.209715 IP dst.http > src.58239: . ack 428 win 7763
08:39:21.210489 IP dst.http > src.58239: . ack 428 win 6432
08:39:21.254266 IP dst.http > src.58239: P 1:550(549) ack 428 win 6432
08:39:21.254432 IP src.58239 > dst.http: . ack 550 win 65151
08:39:21.271985 IP src.58240 > dst.http: S 760319562:760319562(0)
win 65535 <mss 1460,nop,wscale 0,nop,nop,timestamp 1645559094 0>
08:39:21.279091 IP dst.http > src.58240: S 3146126223:3146126223(0)
ack 760319563 win 8190 <mss 1460>
08:39:21.279194 IP src.58240 > dst.http: . ack 1 win 65535
08:39:21.279418 IP src.58240 > dst.http: P 1:427(426) ack 1 win 65535
08:39:21.288789 IP dst.http > src.58240: . ack 427 win 7764
08:39:21.291592 IP dst.http > src.58240: . ack 427 win 6432

Abbreviations: src source IP, dst destination IP (66.249.85.99)

tcpdump example
# tcpdump -A host www.google.com
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
10:51:37.787765 IP src.36989 > dst.http:S 2333173671:2333173671(0)
win 5840 <mss 1460,sackOK,timestamp 2147888 0,nop,wscale 2>
E..<..@.@.E...9_B.Uh.}.P..g....................
. .0........
10:51:37.793757 IP dst.http > src.36989:S 2140124340:2140124340(0)
ack 2333173672 win 8190 <mss 1460>
E..,.#.....cB.Uh..9_.P.}......g.‘...m.........
10:51:37.793788 IP src.36989 > dst.http:. ack 1 win 5840
E..(..@.@.E...9_B.Uh.}.P..g.....P.......
10:51:37.793964 IP src.36989 > dst.http: P 1:170(169) ack 1 win 5840
E.....@.@.EI..9_B.Uh.}.P..g.....P...T...GET / HTTP/1.1
Host: www.google.com
User
10:51:37.800087 IP dst.http > src.36989: . ack 170 win 8021
E..(.......fB.Uh..9_.P.}......hQP..U..........
10:51:37.801042 IP dst.http > src.36989: . ack 170 win 5720
E..(....8...B.Uh..9_.P.}......hQP..X..........
10:51:37.891655 IP dst.http > src.36989: P 1:550(549) ack 170 win 5720
E..M....8...B.Uh..9_.P.}......hQP..X....HTTP/1.1 302 Found

ethereal

Scheduler lab: example solution
Resources
struct resource_desc {
int status; /* locker or -1 if free */
struct p_desc **wait_list;
size_t wait_size;
int no_wait;
};

Processes
struct p_desc {
int PID;
int priority;
struct p_desc *next;
struct p_desc *previous;
int locks[16];
int n_locks;
};

Scheduler lab: example solution
Resources
/* list of ready lists (one per priority) */
struct p_desc **ready_list = NULL;
struct p_desc *wait_list = NULL;
struct p_desc **processes = NULL;
extern int current;
struct resource_desc resource[RESOURCES];

Scheduler lab: example solution
sch_schedule

void sch_schedule() {
    int i;
    init();
    /* scan the ready lists from the highest to the lowest priority */
    for (i = PRIORITIES - 1; i >= 0; i--) {
        if (ready_list[i] != NULL) {
            if (current_proc[i] == NULL) {
                /* start */
                current_proc[i] = ready_list[i];
            }
            /* round robin */
            current_proc[i] = current_proc[i]->next;
            current = current_proc[i]->PID;
            return;
        }
    }
    /* nothing is ready */
    current = -1;
}

Scheduler lab: example solution
sch_locked (simplified)

void sch_locked(int PID, int res) {
    /* If the resource is free lock it, otherwise
     * - remove from the ready list
     * - put in the wait list of the specified resource
     */
    if (resource[res].status == -1) {
        resource[res].status = PID;
    } else if (resource[res].status != PID) {
        /* remove from the ready list */
        remove_ready_proc(processes[PID]);
        /* put the process in the wait list for "res" */
        if (resource[res].no_wait >= resource[res].wait_size) {
            /* grow the wait list */
            resource[res].wait_size++;
            resource[res].wait_list =
                realloc(resource[res].wait_list,
                        sizeof(struct p_desc *) * resource[res].wait_size);
        }
        resource[res].wait_list[resource[res].no_wait++] = processes[PID];
    }
}

Scheduler lab: example solution
sch_unlocked (simplified)

void sch_unlocked(int PID, int res) {
    /* Unlock the resource and lock it for a process waiting for it
     * (this version picks the most recently queued waiter) */
    if (resource[res].status == -1) {
        // print error: the resource is not locked
        exit(EXIT_FAILURE);
    } else if (resource[res].status != PID) {
        // print error: the resource is locked by another process
        exit(EXIT_FAILURE);
    } else {
        if (resource[res].no_wait > 0) {
            insert_ready_proc(
                resource[res].wait_list[--resource[res].no_wait]
            );
            resource[res].status =
                resource[res].wait_list[resource[res].no_wait]->PID;
        } else {
            resource[res].status = -1;
        }
    }
}

Filesystem Lab: Example solution
File table
struct file_table_entry {
__u16 cluster;
__u16 first_cluster; /**< the first cluster of the file */
__u32 position; /**< the current pos in the file */
__u32 size; /**< the size of the file */
__u32 dir_entry_offset; /**< pos. of the dir entry: size */
__u16 dir_entry_cluster; /**< pos. of the dir entry: cluster */
};

...
fte = (struct file_table_entry *)malloc(
sizeof(struct file_table_entry)*MAX_FILES
);
...

Filesystem Lab: Example solution
Reading FAT entries

static __u16 get_fat_entry(__u16 i) {
    /* FAT is assumed to be an array of unsigned bytes */
    __u16 index;
    __u16 next_index;
    // uv.wx.yz -> xuv, yzw
    if ((i & 0x1) == 1) {
        /* odd index */
        next_index = i + 1;
        index =
            (FAT[next_index + next_index / 2 - 1] << 4) +
            ((FAT[next_index + next_index / 2 - 2] & 0xF0) >> 4);
    } else {
        /* even index: the shift must be parenthesized,
         * because + binds tighter than << */
        index =
            FAT[i + i / 2] +
            ((FAT[i + i / 2 + 1] & 0xF) << 8);
    }
    return index;
}

Filesystem Lab: Example solution
Reading clusters

static void get_cluster(__u16 cluster_number, void *buffer) {
    __u8 sector; /* current sector */
    // #define CLUSTER2SECTOR(cluster)
    //   (cluster-FIRST_CLUSTER)*(int)fbs.sec_per_clus
    // #define DIR_SIZE
    //   (fbs.dir_entries*
    //    sizeof(struct dos_dir_entry))/fbs.sector_size
    // #define ROOT_DIR fbs.reserved+fbs.fats*fbs.fat_length
    for (sector = 0;
         sector < fbs.sec_per_clus;
         sector = sector + 1) {
        /* the data area starts after the root directory */
        bios_read((int)(ROOT_DIR +
                        DIR_SIZE +
                        CLUSTER2SECTOR(cluster_number) +
                        sector),
                  /* cast: no pointer arithmetic on void * in standard C */
                  (char *)buffer +
                  (sector * fbs.sector_size));
    }
}
Filesystem Lab: Example solution
Date and time

static void get_date_and_time(__u16 *dos_time,
                              __u16 *dos_date) {
    time_t cur_time_t = time(NULL);
    struct tm *cur_time = localtime(&cur_time_t);
    /* 15-09 year  (0 = 1980, 127 = 2107)
     * 08-05 month (1 = January, 12 = December)
     * 04-00 day   (1 - 31) */
    *dos_date  = cur_time->tm_mday;
    *dos_date |= ((cur_time->tm_mon + 1) << 5) & 0x01e0;
    *dos_date |= ((cur_time->tm_year - 80) << 9) & 0xfe00;
    /* 15-11 hours     (0-23)
     * 10-05 minutes   (0-59)
     * 04-00 seconds/2 (0-29) */
    *dos_time  = cur_time->tm_sec / 2;
    *dos_time |= (cur_time->tm_min << 5) & 0x07e0;   /* bits 10-05 */
    *dos_time |= (cur_time->tm_hour << 11) & 0xf800; /* bits 15-11 */
}
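The inverse operation is a convenient way to check the bit layout. The sketch below decodes a date/time pair back into calendar fields; struct dos_stamp and dos_unpack are hypothetical names introduced for illustration.

```c
#include <stdint.h>

struct dos_stamp {
    int year, month, day;       /* calendar date */
    int hour, minute, second;   /* second is always even (2 s resolution) */
};

/* Hypothetical helper: decode a FAT directory date/time pair. */
struct dos_stamp dos_unpack(uint16_t dos_date, uint16_t dos_time)
{
    struct dos_stamp s;

    s.year   = 1980 + ((dos_date >> 9) & 0x7f);  /* bits 15-09 */
    s.month  = (dos_date >> 5) & 0x0f;           /* bits 08-05 */
    s.day    = dos_date & 0x1f;                  /* bits 04-00 */
    s.hour   = (dos_time >> 11) & 0x1f;          /* bits 15-11 */
    s.minute = (dos_time >> 5) & 0x3f;           /* bits 10-05 */
    s.second = (dos_time & 0x1f) * 2;            /* bits 04-00 */
    return s;
}
```

For example, the date word 0x36B4 decodes to 2007-05-20 and the time word 0x6B16 to 13:24:44.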

Filesystem Lab: Example solution
Writing (simplified)

int fs_write(int fd, void *buffer, int len) {

/* get the file table entry from the file descriptor */


fte = (struct file_table_entry *)file_table[fd];

if (fte->position != fte->size) {
/* the file is not empty */
fte->position = fte->size;
fte->cluster = get_cluster_number(fte->first_cluster, fte->p
}

/* allocate a buffer for the RW operations */


disk_buffer = (char *)malloc(BUFFER_SIZE);

[...]

Filesystem Lab: Example solution
[...]
/* compute the current location in the current cluster */
t_rem = len - t_wri;
while (t_rem) {

get_cluster(fte->cluster, disk_buffer);

cl_offset = ((int)fte->position) % ((int)CLUSTER_SIZE);


cl_remain = (CLUSTER_SIZE - cl_offset);

/* don’t write more than requested */


if (cl_remain > t_rem) { cl_remain = t_rem; }

/* copy the buffer at the right location */


memcpy(disk_buffer + cl_offset,
buffer + t_wri,
(size_t) cl_remain);

/* write back the cluster */


put_cluster(fte->cluster, disk_buffer);
[...]
Filesystem Lab: Example solution
[...]
t_rem -= cl_remain;
fte->position += cl_remain;
t_wri += cl_remain;
fte->size = fte->position;

if (t_rem) { /* we need another cluster */


fte->cluster = append_FAT_cluster(fte->cluster, FAT1, FAT2)
}
}
[...]

Filesystem Lab: Example solution
[...]
/* update dir structures */
dep = malloc(sizeof(struct dos_dir_entry_pos));

dep->cluster = fte->dir_entry_cluster;
dep->offset = fte->dir_entry_offset;
dep->de = NULL;

read_directory_entry(&dep);

get_date_and_time(&dos_time, &dos_date);
dep->de->size = fte->size;
dep->de->date = dos_date;
dep->de->time = dos_time;

write_directory_entry(dep);
[...]
return (int)t_wri;

Filesystem Lab: Doxygen

Filesystem Lab: Example solution
Example: functions

/** Gets the cluster number at a given position in a file


* @param first the first cluster of the file
* @param position the position in the file
* @return the cluster that contains the specified pos
*/
static __u16 get_cluster_number(__u16 first, int position)

Some tags:
• @param: a function parameter
• @return: the return value

Filesystem Lab: Example solution
Example: data structures

/** File table entry


*/
struct file_table_entry {
__u16 cluster; /**< the cluster to read from */
__u16 first_cluster; /**< the first cluster of the file */
__u32 position; /**< the current pos. in the file */
__u32 size; /**< the size of the file */
__u32 dir_entry_offset; /**< pos. of the dir entry: size */
__u16 dir_entry_cluster; /**< pos. of the dir entry: cluster *
(0 if root dir) */
};

Filesystem Lab: Example solution
Example: varia
Some tags:
• @todo: annotate a missing part
• @bug: annotate a bug
• @def: a macro
• @file: a source file

Scheduler lab: Additions

1 Evaluate different strategies


2 Handle priority inversion
3 Process syncing
4 Detect deadlocks
5 Variable priorities
6 Real-time scheduling

Scheduler lab: Strategies
Criteria
Good performance
• maximize CPU utilization
• maximize throughput
• minimize turnaround time
• minimize waiting time
• minimize response time

Scheduler lab: Strategies
Task
• Implement different scheduling strategies and evaluate the
system’s performance
• Generate scheduling traces to test starvation and particular
cases.
• Submit a written report

Scheduler lab: Priority inversion
A low priority process holds a lock on a resource needed by a high
priority process. While the high priority process is blocked, a medium
priority process can preempt the lock holder, so the high priority
process ends up waiting for lower priority work.

[Figure: timeline with three processes (high, medium and low priority) and one resource; the low and the high priority process each request the lock, and the scheduling decisions are marked.]

Scheduler lab: Avoiding priority inversion
Priority ceiling
Each resource is assigned a priority ceiling, which is a priority equal
to the highest priority of any task which may lock the resource.
When a task locks the resource, its priority is temporarily raised to the
priority ceiling

Priority inheritance
The priority of a process is increased to the maximum of the priorities
of any process waiting for any resource that the process has a
resource lock on.
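Priority inheritance as defined above can be sketched as a function that computes the effective priority of a process from the wait lists of the resources it holds. The structures below are simplified stand-ins, not the lab's data structures; all names are assumptions, larger numbers mean higher priority, and transitive inheritance along chains of locks is not handled.

```c
#include <stddef.h>

/* Simplified stand-ins for the scheduler's data structures (assumptions). */
struct proc {
    int base_priority;          /* priority assigned at creation */
};

struct res {
    struct proc *owner;         /* NULL if the resource is free */
    struct proc **waiters;      /* processes blocked on this resource */
    size_t n_waiters;
};

/* Effective priority of p: the maximum of its own priority and the
 * priorities of all processes waiting for a resource that p holds. */
int effective_priority(const struct proc *p,
                       const struct res *resources, size_t n_res)
{
    int prio = p->base_priority;

    for (size_t r = 0; r < n_res; r++) {
        if (resources[r].owner != p)
            continue;                       /* p does not hold this lock */
        for (size_t w = 0; w < resources[r].n_waiters; w++) {
            int wp = resources[r].waiters[w]->base_priority;
            if (wp > prio)
                prio = wp;                  /* inherit from the waiter */
        }
    }
    return prio;
}
```

A scheduler would use this value instead of the base priority when choosing the ready list; the boost disappears automatically once the wait list of the resource is empty or the lock is released.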

Scheduler lab: Mars Pathfinder (Case study)
• Rover on the surface of Mars (1997)
• RT OS: Wind River VxWorks
• preemptive priority scheduling
• RT scheduling algorithm (e.g., Earliest Deadline First)
• Resource:
• information bus (information passing)
• Tasks involved (locking the bus):
high priority Bus management, runs frequently
medium priority Communication
low priority Meteorological data gathering, runs infrequently

Scheduler lab: Mars Pathfinder (Case study)
• The medium priority task could delay the low priority task while
the latter was blocking the high priority task.
• A watchdog noticed that the bus management routine was
missing its deadline and periodically reset the system.
• Problem: the priority inheritance flag of the communication bus
was switched off.
• Fix: VxWorks allows patches to be uploaded and applied
on the fly.

L. Sha, R. Rajkumar, and J. P. Lehoczky.
Priority Inheritance Protocols: An Approach to Real-Time
Synchronization
In IEEE Transactions on Computers, vol. 39, pp. 1175-1185, 1990.

Scheduler lab: Priority inversion
Task
• Modify your scheduler to avoid priority inversion
• Generate scheduling traces to test the correctness of your
implementation

Scheduler lab: Process syncing
Processes are allowed to sync (simplified API):
void wait(int PID)
Waits for process PID to finish.

void wait_signal(int s)
Waits for signal s.

void notify(int s)
Sends signal s.

Scheduler lab: Process syncing
Task
• Implement three new system calls (wait, wait_signal and
notify)
• Generate scheduling traces to test the correctness of your
implementation

Scheduler lab: Coffman conditions
E. G. Coffman, 1971:

1 mutual exclusion: a resource is locked by one process or it is
available
2 hold and wait: processes already locking resources may request
new resources
3 no preemption: only a process locking a resource may release it
4 circular wait: circular chain of locks: each process waits for a
resource that the next process in the chain holds

Scheduler lab: Resource graphs
Directed graph containing two types of nodes:
• processes
• resources

A directed edge from a process pi to a resource rj means that pi
requested a lock on rj.
A directed edge from a resource rj to a process pi means that rj has
been locked by pi.
Since a resource can be locked by only one process at a time, a cycle
in the resource graph implies that a deadlock has occurred.

Scheduler lab: Resource graphs
Example:
• 3 processes (P1, P2 and P3) and 3 resources (R1, R2 and R3)
• R1 locked by P2
• R2 locked by P1
• R3 locked by P3
• P1 request lock on R1
• P2 request lock on R2 and R3
• P3 request lock on P3 R3

[Figure: resource graph with the nodes R1, R2, R3, P1, P2 and P3]

Scheduler lab: Detect cycles
Topological sort, O(|V| + |E|):
Q := set of nodes with no incoming edges
while Q ≠ ∅ do
    remove a node n from Q
    for all edges e : n → m do
        remove e from the graph
        if m has no other incoming edges then
            Q := Q ∪ {m}
        end if
    end for
end while
if the graph still has edges then
    cycle detected
end if
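The pseudocode above can be sketched in C as follows. The function name has_cycle, the edge-list representation, and the MAX_NODES limit are assumptions made for this example; because it scans the whole edge list for every removed node, this simple variant runs in O(|V|·|E|) rather than linear time.

```c
#include <stddef.h>

#define MAX_NODES 32

/* Returns 1 if the directed graph contains a cycle, 0 if it does not,
 * and -1 if n_nodes is out of range. Edge k goes from edges[k][0] to
 * edges[k][1]; node ids are 0..n_nodes-1. */
int has_cycle(int n_nodes, const int edges[][2], size_t n_edges)
{
    int indegree[MAX_NODES] = {0};
    int queue[MAX_NODES];
    int head = 0, tail = 0, removed = 0;

    if (n_nodes < 0 || n_nodes > MAX_NODES)
        return -1;
    for (size_t k = 0; k < n_edges; k++)
        indegree[edges[k][1]]++;
    for (int v = 0; v < n_nodes; v++)
        if (indegree[v] == 0)
            queue[tail++] = v;      /* Q := nodes with no incoming edges */

    while (head < tail) {
        int n = queue[head++];      /* remove a node n from Q */
        removed++;
        for (size_t k = 0; k < n_edges; k++) {
            if (edges[k][0] != n)
                continue;
            /* "remove" the edge n -> m by lowering the in-degree of m */
            if (--indegree[edges[k][1]] == 0)
                queue[tail++] = edges[k][1];
        }
    }
    /* nodes that never reached in-degree 0 lie on or behind a cycle */
    return removed < n_nodes;
}
```

With the numbering P1=0, P2=1, P3=2, R1=3, R2=4, R3=5, the locks and the requests of P1 and P2 from the resource graph example give the edges {3,1}, {4,0}, {5,2}, {0,3}, {1,4}, {1,5}; the function reports a cycle (P1 → R1 → P2 → R2 → P1). Dropping the request of P2 for R2 removes it.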

Scheduler lab: Deadlock recovery
One of the four Coffman conditions has to be removed:

1 mutual exclusion: a resource is locked by one process or it is
available
2 hold and wait: processes already locking resources may request
new resources
3 no preemption: only a process locking a resource may release it
4 circular wait: circular chain of locks: each process waits for a
resource that the next process in the chain holds

Choose a process in the deadlock cycle (randomly, depending on
running-time, priority, ...) and disable it.

Scheduler lab: Deadlock recovery
Task
• Detect deadlocks
• Recover from deadlocks trying to minimize the damage (kill as
few processes as possible).
• Generate scheduling traces to test the correctness of your
implementation

Scheduler lab: Variable priorities
Starvation
A process with a low priority might never be executed as long as
processes with a higher priority are ready.

Solution
Periodically increase the priority of each process:
• after how much time?
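One possible answer is aging: count how long a ready process has been waiting and raise its priority by one step every time a threshold is reached. The sketch below is only one way to do this; struct aged_proc, age_priorities, and the choice to reset a process to its base priority once it has run are assumptions, not part of the lab specification.

```c
#include <stddef.h>

struct aged_proc {
    int priority;       /* current (possibly boosted) priority */
    int base_priority;  /* priority to fall back to after running */
    int waiting_ticks;  /* ticks spent ready but not running */
};

/* Hypothetical aging step, called once per scheduler tick. Every process
 * except the running one waits one more tick; after `threshold` ticks its
 * priority is raised by one, up to max_priority. */
void age_priorities(struct aged_proc *procs, size_t n, size_t running,
                    int threshold, int max_priority)
{
    for (size_t i = 0; i < n; i++) {
        if (i == running) {
            /* the running process got the CPU: its boost is used up */
            procs[i].waiting_ticks = 0;
            procs[i].priority = procs[i].base_priority;
            continue;
        }
        if (++procs[i].waiting_ticks >= threshold) {
            procs[i].waiting_ticks = 0;
            if (procs[i].priority < max_priority)
                procs[i].priority++;
        }
    }
}
```

The threshold trades responsiveness of high priority work against the maximum time a low priority process can be starved; suitable values are best found with scheduling traces.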

Scheduler lab: Variable priorities
Task
• Avoid starvation by adapting process priorities.
• Generate scheduling traces to test the correctness of your
implementation

Scheduler lab: Real-Time Systems
• The correctness of the system depends on functionality and
timeliness.
• Speed and performance are less important than temporal
aspects and predictability.

Hard RT systems
The response time is specified as an absolute value (from the
environment).

Soft RT systems
The response time is specified as an average value.

Scheduler lab: Real-Time Scheduling
Each process has a deadline and a worst-case execution time (i.e.
maximum duration).
The scheduler has to guarantee that each process will finish
before its deadline.
Scheduling strategies:
Dynamic: Earliest Deadline First
The process with the earliest deadline is chosen.

Static: Rate-Monotonic
Static priorities (shorter periods are given higher priorities, with
each deadline equal to the period), no resource sharing;
schedulability is guaranteed up to a CPU utilization of ≈ 70%
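
The ≈ 70% figure is the limit of the utilization bound that Liu and Layland derive for rate-monotonic scheduling of n independent periodic tasks. The symbols C_i (worst-case execution time) and T_i (period) are chosen here for illustration:

```latex
\[
U = \sum_{i=1}^{n} \frac{C_i}{T_i} \le n\left(2^{1/n} - 1\right),
\qquad
\lim_{n \to \infty} n\left(2^{1/n} - 1\right) = \ln 2 \approx 0.693
\]
```

For n = 2 the bound is 2(√2 − 1) ≈ 0.828. The test is sufficient but not necessary: a task set above the bound may still be schedulable.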

C. L. Liu and J. W. Layland.
Scheduling algorithms for multiprogramming in a hard real-time
environment.
Journal of the ACM, 20(1), 1973.
Scheduler lab: Issues
Worst-Case Execution Time
Computing the WCET is a hard problem; it is estimated either
automatically or manually.

Estimation
• overestimation: waste of resources (CPU)
• underestimation: some deadlines will be missed (probable crash)

Techniques

• manual annotations
• abstract interpretation
• data-flow analysis

Scheduler lab: Real-time scheduling
Task
• Change the scheduler API to support the submission of the
WCET, deadline and period of each job.
• Implement the Earliest Deadline First scheduling strategy.
• Reject jobs if there are not enough resources.
• Generate scheduling traces to test the correctness of your
implementation.
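
A minimal sketch of the two core pieces of this task is given below. The struct rt_job and the functions edf_admit and edf_pick are hypothetical names, not the lab's API. The admission test assumes independent, preemptible periodic jobs on one CPU whose relative deadline equals their period; under these assumptions EDF can schedule a job set if and only if the total utilization does not exceed 1.

```c
#include <assert.h>
#include <stddef.h>

struct rt_job {
    double wcet;      /* worst-case execution time */
    double period;    /* period; relative deadline assumed equal to it */
    double deadline;  /* absolute deadline of the current instance */
    int    ready;     /* non-zero if the job can run */
};

/* Admission test: accept the candidate only if the total CPU
 * utilization of all jobs stays at or below 1. */
int edf_admit(const struct rt_job *jobs, size_t n,
              const struct rt_job *candidate)
{
    assert(candidate != NULL && candidate->period > 0.0);
    double u = candidate->wcet / candidate->period;
    for (size_t i = 0; i < n; i++)
        u += jobs[i].wcet / jobs[i].period;
    return u <= 1.0;
}

/* Returns the index of the ready job with the earliest absolute
 * deadline, or -1 if no job is ready. */
int edf_pick(const struct rt_job *jobs, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!jobs[i].ready)
            continue;
        if (best < 0 || jobs[i].deadline < jobs[best].deadline)
            best = (int)i;
    }
    return best;
}
```

The linear scan in edf_pick keeps the sketch short; a priority queue ordered by deadline would be the usual choice for larger job sets.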

Project 5: Automatic parallelization
Summary
Use OpenMP to automatically parallelize a program and measure the
speedup

Goals
• Learn how to automatically parallelize programs
• Experiment with OpenMP

Environment
• Unix like system
• C, OpenMP

OpenMP
Open Multi-Processing

• API supporting multiplatform shared memory multiprocessing
(Unix and Windows).
• Set of compiler directives, libraries and environment
variables.
• Parallel code sections are executed in parallel using several
threads and are managed by the runtime environment.

Implementations

• Visual C++ 2005
• Intel compilers
• Sun Studio
• GCC 4.2 (or GCC 4.1 on some RH platforms)

OpenMP
Advantages

• simple: need not deal with message passing
• automatic data layout and decomposition
• incremental parallelism
• unified code for both serial and parallel applications

Disadvantages

• only runs efficiently in shared-memory multiprocessor platforms
• requires a compiler that supports OpenMP
• scalability is limited by memory architecture
• synchronization between a subset of threads is not allowed

OpenMP: Model

[Figure: fork-join model. The master thread forks a team of threads at the start of each parallelized code region and joins them at its end.]

Directives
OpenMP directives are expressed using pragmas:
#ifdef _OPENMP
#pragma omp directive
#endif
section
Defines a block of code that is executed by one thread of the team
#pragma omp parallel sections num_threads(2)
{
    #pragma omp section
    { /* thread-1 */ }
    #pragma omp section
    { /* thread-2 */ }
}
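
As a small, self-contained illustration of the sections directive, the following sketch sums the two halves of an array in two sections. The function name sum_two_sections is made up for this example. Each section writes only its own accumulator, so no further synchronization is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Sums v[0..n-1]; each half of the array is handled by one section. */
long sum_two_sections(const int *v, int n)
{
    long lo = 0, hi = 0;

    assert(v != NULL && n >= 0);
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {   /* first half, written only by this section */
            for (int i = 0; i < n / 2; i++)
                lo += v[i];
        }
        #pragma omp section
        {   /* second half, written only by this section */
            for (int i = n / 2; i < n; i++)
                hi += v[i];
        }
    }
    return lo + hi;
}
```

When the file is compiled without OpenMP support the pragmas are ignored and the two blocks simply run one after the other, which is the "unified code" advantage listed earlier.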

Directives
Loops
Loops can be automatically parallelized
/* j is declared outside the loop, so it must be made private */
#pragma omp parallel for shared(A, row, col) private(j)
for (i = k+1; i < SIZE; i++) {
    for (j = k+1; j < SIZE; j++) {
        A[i][j] = A[i][j] - row[i] * col[j];
    }
}

Data reduction
Data from different threads can be merged
sum = 0;
/* x is a per-iteration temporary, so it must be private as well */
#pragma omp parallel for private(x) reduction(+: sum)
for (i = 0; i < NUM_STEPS; i++) {
    x = 2.0 * (double)i / (double)(NUM_STEPS);
    sum += x * x / NUM_STEPS;
}
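
The reduction example can be wrapped into a complete function, sketched here under the assumption that the number of steps is passed as a parameter (the name integrate is not from the slides). The loop is a left Riemann sum of (2t)² over [0, 1], so the result approaches 4/3 as the number of steps grows.

```c
#include <assert.h>

/* Approximates the integral of (2t)^2 over [0, 1] with num_steps
 * rectangles, using the same loop body as the reduction example. */
double integrate(long num_steps)
{
    double sum = 0.0;
    long i;

    assert(num_steps > 0);
    /* Every thread accumulates a private partial sum; OpenMP adds the
     * partial sums together when the loop ends. x is declared inside
     * the loop body and is therefore private to each iteration. */
    #pragma omp parallel for reduction(+: sum)
    for (i = 0; i < num_steps; i++) {
        double x = 2.0 * (double)i / (double)num_steps;
        sum += x * x / (double)num_steps;
    }
    return sum;
}
```

To measure the speedup asked for in this project, one option is to time calls to such a function for different values of the OMP_NUM_THREADS environment variable.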

Directives
Critical
Defines a critical section (only one thread at a time)
#pragma omp critical
{ /* critical section */ }

Barrier
A thread reaching a barrier must wait for all the other threads of the team
#pragma omp barrier

Directives
Ordered
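Execute the block in the order in which it would be executed in a
sequential execution of the loop
/* the ordered clause is required on the loop directive;
 * j and res are per-thread temporaries and must be private */
#pragma omp parallel for ordered private(j, res)
for (i = 0; i < 1000; i++) {
    for (j = 0; j < 1000; j++) {
        res = foo();
    }
    #pragma omp ordered
    {
        if (i < 5) {
            printf("%i: %i\n", i, res);
        }
    }
}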

GCC
Compile and link with:
gcc -fopenmp -lgomp

The number of threads is determined by the runtime environment or
can be set with the OMP_NUM_THREADS environment variable.

LU decomposition

A = LU
A is a square matrix and L,U are lower and upper triangular matrices
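Example (3 × 3 matrix)
\[
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
=
\begin{pmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{pmatrix}
\begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix}
\]
Used to solve systems of linear equations and compute the inverse of
a matrix.
In situ decomposition
\[
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\Longrightarrow
\begin{pmatrix} l_{11} & u_{12} & u_{13} \\ l_{21} & l_{22} & u_{23} \\ l_{31} & l_{32} & l_{33} \end{pmatrix}
\]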
LU in situ decomposition algorithm
for k := 1 to n do
    for i := k + 1 to n do
        A(i, k) := A(i, k)/A(k, k)
    end for
    for i := k + 1 to n do
        for j := k + 1 to n do
            A(i, j) := A(i, j) − A(i, k) · A(k, j)
        end for
    end for
end for
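
A direct C transcription of this algorithm might look as follows. It is a sketch under several assumptions: the matrix is stored row-major in a flat array, indices start at 0, there is no pivoting, and the name lu_in_place is invented here. With the algorithm as written, the multipliers stored below the diagonal form L with an implied unit diagonal, while the diagonal and everything above it hold U.

```c
#include <assert.h>
#include <stddef.h>

/* In-place LU decomposition, without pivoting, of the n x n matrix
 * stored row-major in a[]. Afterwards the strict lower triangle holds
 * L (unit diagonal implied) and the upper triangle, including the
 * diagonal, holds U. */
void lu_in_place(double *a, int n)
{
    assert(a != NULL && n > 0);
    for (int k = 0; k < n; k++) {
        /* No pivoting: a zero pivot would cause a division by zero. */
        assert(a[k * n + k] != 0.0);

        /* Column k of L: divide by the pivot. */
        for (int i = k + 1; i < n; i++)
            a[i * n + k] /= a[k * n + k];

        /* Update the trailing submatrix; its rows do not depend on
         * each other, so the i loop can run in parallel. */
        #pragma omp parallel for
        for (int i = k + 1; i < n; i++)
            for (int j = k + 1; j < n; j++)
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
    }
}
```

The outer k loop carries a dependency from one step to the next and stays sequential; only the update of the trailing submatrix is parallelized, so the achievable speedup shrinks as k approaches n.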

Course evaluation
Questions from the lecturer

• Did you attend the following courses? (1: yes, 5: no)
    D1 Interprozesskommunikation in Unix (Interprocess communication in Unix)
    D2 Compiler design
    D3 System construction
• Rate the difficulty of the labs: (1: easy, 5: difficult)
    D4 malloc
    D5 scheduler
    D6 filesystem
    D7 proxy
• Infrastructure:
    D8 Did you find the blog a useful resource? (1: −−, 5: ++)
    D9 Did you use your own PC or the ETH infrastructure? (1: PC, 5: ETH)

Proxy: The main loop
listenfd = Open_listenfd(port);

/* Wait for and process client connections */
while (1) {
    /* Each thread gets its own heap-allocated argument block */
    argp = (arglist_t *)malloc(sizeof(arglist_t));
    clientlen = sizeof(argp->clientaddr);
    /* SA is the struct sockaddr typedef also used in open_clientfd_ts */
    argp->connfd =
        accept(listenfd, (SA *)&argp->clientaddr,
               &clientlen);

    /* Start a new thread to process the HTTP request */
    argp->myid = request_count++;
    pthread_create(&tid, NULL, process_request, argp);
}

Proxy: Handling a connection
/* Read the entire HTTP request into the request buffer,
 * one line at a time. */
request_size = MAXLINE; /* current capacity; helper variable added here */
request = (char *)malloc(request_size);
request[0] = '\0';
request_len = 0;
Rio_readinitb(&rio, connfd);

while (1) {
    if ((n = Rio_readlineb_w(&rio, buf, MAXLINE)) <= 0) {
        /* handle error */
    }
    /* If needed, enlarge the buffer and keep the pointer
     * returned by Realloc */
    if (request_len + n + 1 > request_size) {
        request_size *= 2;
        request = (char *)Realloc(request, request_size);
    }
    strcat(request, buf);
    request_len += n;
    /* An HTTP request is terminated by a blank line */
    if (strcmp(buf, "\r\n") == 0) break;
}
Proxy: Check the request
/* Make sure that this is indeed a GET request */
if (strncmp(request, "GET ", strlen("GET "))) {
    printf("process_request: Received non-GET request\n");
    close(connfd);
    free(request);
    return NULL;
}
request_uri = request + 4;

/* Extract the URI from the request; request_uri starts 4 bytes
 * into request, so only request_len - 4 bytes remain */
request_uri_end = NULL;
for (i = 0; i < request_len - 4; i++) {
    if (request_uri[i] == ' ') {
        request_uri[i] = '\0';
        request_uri_end = &request_uri[i];
        break;
    }
}
Proxy: Check the request
/* Make sure that the HTTP version field
 * follows the URI (request_uri_end is NULL if no space was found) */
if (request_uri_end == NULL ||
    (strncmp(request_uri_end + 1, "HTTP/1.0\r\n",
             strlen("HTTP/1.0\r\n")) &&
     strncmp(request_uri_end + 1, "HTTP/1.1\r\n",
             strlen("HTTP/1.1\r\n")))) {
    /* handle error */
}

/* We'll be forwarding the remaining lines in the request
 * to the end server without modification */
rest_of_request = request_uri_end +
                  strlen("HTTP/1.0\r\n") + 1;

/* Parse the URI into its hostname, pathname and port */
if (parse_uri(request_uri, hostname, pathname, &port) < 0) {
    /* handle error */
}
Proxy: Forward the request
/* Forward the request to the end server */
if ((serverfd = open_clientfd_ts(hostname, port, &mutex))
    < 0) {
    /* handle error */
}
Rio_writen_w(serverfd, "GET /", strlen("GET /"));
Rio_writen_w(serverfd, pathname, strlen(pathname));
Rio_writen_w(serverfd, " HTTP/1.0\r\n",
             strlen(" HTTP/1.0\r\n"));
Rio_writen_w(serverfd, rest_of_request,
             strlen(rest_of_request));

Proxy: Send reply to the client
/* Receive reply from server and forward on to client */
Rio_readinitb(&rio, serverfd);
response_len = 0;
while ((n = Rio_readn_w(serverfd, buf, MAXLINE)) > 0) {
    response_len += n;
    Rio_writen_w(connfd, buf, n);
    bzero(buf, MAXLINE);
}

Proxy: Log the request
format_log_entry(log_entry, &clientaddr,
request_uri, response_len);
P(&mutex);
fprintf(log_file, "%s %d\n", log_entry, response_len);
fflush(log_file);
V(&mutex);

Proxy: open_clientfd
int open_clientfd_ts(char *hostname,
                     int port, sem_t *mutexp) {
    int clientfd;
    struct hostent hostent, *hp = &hostent;
    struct hostent *temp_hp;
    struct sockaddr_in serveraddr;

    if ((clientfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1; /* check errno for cause of error */

    P(mutexp); /* lock */
    /* Class 3 thread unsafe: gethostbyname returns a pointer
     * to static data */
    temp_hp = gethostbyname(hostname);
    if (temp_hp != NULL)
        hostent = *temp_hp; /* copy */
    V(mutexp); /* unlock */
    if (temp_hp == NULL) {
        close(clientfd); /* do not leak the socket */
        return -2; /* check h_errno for cause of error */
    }
Proxy: open_clientfd
    /* Fill in the server's IP address and port */
    bzero((char *) &serveraddr, sizeof(serveraddr));
    serveraddr.sin_family = AF_INET;
    bcopy((char *)hp->h_addr,
          (char *)&serveraddr.sin_addr.s_addr,
          hp->h_length);
    serveraddr.sin_port = htons(port);

    /* Establish a connection with the server */
    if (connect(clientfd, (SA *) &serveraddr,
                sizeof(serveraddr)) < 0) {
        close(clientfd); /* do not leak the socket */
        return -1;
    }
    return clientfd;
}

Case study
Distributed and concurrent debugging

H. Thane.
Monitoring, Testing and Debugging of Distributed Real-Time
Systems.
PhD thesis, Royal Institute of Technology (KTH), Mechatronics
Laboratory, TRITA-MMK 2000:16, Sweden, 2000.
M. Russinovich and B. Cogswell.
Replay for concurrent non-deterministic shared-memory
applications.
Proc. ACM SIGPLAN Conf. Programming Language Design and
Implementation, Philadelphia, PA, pp. 258–266, 1996.

Reproducibility
Problem
Requirements:
• knowledge of the start conditions
• deterministic execution

Sources of non-determinism
• asynchronous interrupts
• concurrent access to shared resources (e.g., shared memory)
• dependence on external data (e.g., time)

Examples

• Non-interactive programs: reproduce with the same input.
• Interactive programs: store and reproduce the input
    • terminal
    • GUI events
    • sensors / interrupts
• Multithreaded programs: reproduce the scheduling decisions
• Distributed programs: reproduce time synchronization.

Distributed and concurrent debugging
Classical debugging

• breakpoints
• single stepping

Multi-tasking/Distributed debugging

• special hardware (Tai 1991, Tsai 1990)
• software (Le Blanc 1987, McDowell 1989, Thane 2000)

Deterministic replay
• record significant events
• replay the execution off-line

Problems
• only recorded events can be analyzed
• probe effect: timing can be disturbed by measurement
(Heisenberg uncertainty principle applied to computer software)

Significant events

• synchronization
• scheduling
• communication (internal and external) including the data

Example: Thane 2000
Distributed real-time systems
Assumptions
A system is composed of nodes, each with:
• CPU
• memory
• network access
• local clock (synchronized with a global clock)
• I/O
• set of concurrent processes and interrupt routines
• kernel
• preemptive scheduling
• recording mechanism (both for system and user-defined events)

Thane 2000: Probe effects
The recording mechanism cannot be removed after the development
phase
• Minimize the required resources
• Plan, and allocate resources, early.

Thane 2000: Offline kernel
Able to replay the recorded events with the support of a regular debugger.

Thane 2000: Storing events
Instrument the code and store:
• PC (loops: the same PC is executed multiple times)
• cycle count
• Hardware counters (high-end CPUs: PPC, Pentium, ...)
• Software: store the context (similarly to context switches)

Thane 2000: Replaying events
Instrument the off-line kernel and
• trap at the recorded PCs
• if necessary, replay the event (and context) at the correct time.

Thane 2000: Recording time
Events must be ordered according to a global time
• multitasking system: system’s clock
• distributed system: total ordering by forming a synchronized
time base

Thane 2000: Issues
How much information do we need?
• minimize the stored information
• do not store redundant information (periodic messages and
events)

Costs?
The costs (in time and system resources) should be minimal on the
target and acceptable on the replay system.

Example: Russinovich 1996
Concurrent shared memory system
Problem similar to distributed systems but requires more resources:
traces of shared-memory accesses

Issues
• size of the traces
• bandwidth of the collector
• detect shared memory accesses

Approach
Again: reproduce execution traces (repeatable scheduling)

Russinovich 1996: Assumptions
• multithreaded/multitasking uniprocessor system
• global or local (group of threads/processes) memory sharing
• replay system:
• is notified of scheduling events
• can control the scheduling process

Russinovich 1996: The algorithm
Instruction counters
• Incremented at each backward control transfer
• Each pair (instruction counter, instruction pointer) identifies a
precise event

Event logging

• In case of an asynchronous event, store the event location in a
log
• During replay, execute the events according to the log.

Implementation

• Code is instrumented to count backward control transfers
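
The idea can be illustrated with a toy sketch; it is not the instrumentation used by Russinovich and Cogswell. The "program" is a single loop, so the loop's only backward branch is counted and the instruction pointer half of the pair is omitted. The asynchronous event is simulated: in record mode a hypothetical event source fires every `period` iterations. All names (event_log, run, LOG_CAPACITY) are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_CAPACITY 8  /* assumed size of the event log */

struct event_log {
    unsigned long at[LOG_CAPACITY]; /* counter value at each event */
    int count;
};

/* Executes `iterations` loop iterations. In record mode (record != 0)
 * the simulated event fires every `period` iterations and the counter
 * value is logged. In replay mode the events are delivered exactly at
 * the logged counter values. Returns the final program state. */
long run(int iterations, int record, int period, struct event_log *log)
{
    unsigned long counter = 0;  /* software instruction counter */
    long state = 0;
    int next = 0;               /* next log entry to replay */

    assert(log != NULL && period > 0);
    if (record)
        log->count = 0;

    for (int i = 0; i < iterations; i++) {
        state += i;   /* the deterministic part of the program */
        counter++;    /* one backward control transfer */

        if (record) {
            if (counter % period == 0 && log->count < LOG_CAPACITY) {
                log->at[log->count++] = counter;
                state += 1000;  /* effect of the asynchronous event */
            }
        } else if (next < log->count && log->at[next] == counter) {
            next++;
            state += 1000;      /* same effect, at the same position */
        }
    }
    return state;
}
```

Recording a run and then replaying it with the same log yields the same final state, which is the property deterministic replay relies on. A real system would additionally store the instruction pointer and instrument every backward branch of the program.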

Distributed and concurrent debugging: Summary
Problem
Deterministic execution

Techniques
Store program and environment states to replay execution

Issues
• How much to store
• When to store
• Probe effect
• ...
