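National Chung Cheng University (國立中正大學)
Graduate Institute of Computer Science and Information Engineering
Instructor: Prof. 羅習五
Some content adapted from Prof. 薛智文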
Chapter 5: Kernel Synchronization
• Kernel Control Paths
• When Synchronization is Not Necessary
• Synchronization Primitives
• Synchronizing Accesses to Kernel Data Structures
• Examples of Race Condition Prevention
Kernel
• You could think of the kernel as a server that
answers requests; these requests can come
either from a process running on a CPU or an
external device issuing an interrupt request.
[Figure: top halves and bottom halves]
Kernel Control Paths
• Kernel Control Path (KCP)
– a sequence of instructions executed by the kernel
to handle interrupts or exceptions of different kinds
• Each kernel request is handled by a different
KCP
– e.g., a system call request (software interrupt) is handled by the path from system_call to ret_from_sys_call
[Figure: system call path, with top halves and bottom halves]
Kernel Requests
• A process executing in User Mode causes an
exception. (e.g., x/0)
• A process executing in Kernel Mode causes a Page
Fault exception.
• An external device sends a signal to a programmable
interrupt controller (PIC), and the corresponding
interrupt is enabled
• A running process raises an interprocessor interrupt (IPI).
Kernel Control Paths
• The CPU interleaves KCPs when:
– A process switch occurs. (it relinquishes control of
CPU, e.g., sleep/wait)
– An interrupt occurs.
– A deferrable function is executed.
• Interleaving improves the throughput of programmable interrupt controllers and device controllers.
A fully preemptable kernel
• Nonpreemptive kernel vs. preemptive kernel
– Nonpreemptive kernel: Linux kernels up to 2.4
– Preemptive kernel: Linux kernel 2.6
• Kernel 2.4 + preempt_count = kernel 2.6
• preempt_count is greater than 0 when:
– The kernel is executing an ISR
– The deferrable functions are disabled
– The kernel preemption level has been explicitly
disabled
Synchronization Primitives
• Per-CPU variables
– One element per CPU in the system
• Atomic operations
– memory bus lock, read-modify-write (rmw) ops
• Memory barriers
– avoid compiler and CPU instruction re-ordering
• Spin locks
– only on SMP systems; keep them short!
– general, read/write, big reader
• Semaphores
– general, read/write
• Local interrupt disabling
• Local softirq disabling
• Read-copy-update (RCU)
Synchronization Primitives
Technique                    Description                                           Scope
Atomic operation             Atomic read-modify-write instruction to a counter     All CPUs
Memory barrier               Avoid instruction re-ordering                         Local CPU
Spin lock                    Lock with busy wait                                   All CPUs
Semaphore                    Lock with blocking wait                               All CPUs
Local interrupt disabling    Forbid interrupt handling on a single CPU             Local CPU
Local softirq disabling      Forbid deferrable function handling on a single CPU   Local CPU
Global interrupt disabling   Forbid interrupt and softirq handling on all CPUs     All CPUs
Atomic Operations
• Many instructions are not atomic in hardware on multiprocessor (MP) systems
– rmw instructions: inc, test-and-set, swap
– unaligned memory accesses
– rep instructions
• The compiler may not generate atomic code
– even i++ is not necessarily atomic! (i = i + 1 is a load, an add, and a store)
• Linux – atomic_ macros
– atomic_t – 24-bit atomic counters
– Intel implementation (atomic, for MP)
• lock prefix byte 0xf0 – locks the memory bus
Atomic operations in Linux
Function                     Description
atomic_read(v)               Return *v
atomic_set(v,i)              Set *v to i
atomic_add(i,v)              Add i to *v
atomic_sub(i,v)              Subtract i from *v
atomic_sub_and_test(i, v)    Subtract i from *v and return 1 if the result is zero; 0 otherwise
atomic_inc(v)                Add 1 to *v
atomic_dec(v)                Subtract 1 from *v
atomic_dec_and_test(v)       Subtract 1 from *v and return 1 if the result is zero; 0 otherwise
atomic_inc_and_test(v)       Add 1 to *v and return 1 if the result is zero; 0 otherwise
atomic_add_negative(i, v)    Add i to *v and return 1 if the result is negative; 0 otherwise
Atomic bit handling functions in Linux
Function                         Description
test_bit(nr, addr)               Return the value of the nr-th bit of *addr
set_bit(nr, addr)                Set the nr-th bit of *addr
clear_bit(nr, addr)              Clear the nr-th bit of *addr
change_bit(nr, addr)             Invert the nr-th bit of *addr
test_and_set_bit(nr, addr)       Set the nr-th bit of *addr and return its old value
test_and_clear_bit(nr, addr)     Clear the nr-th bit of *addr and return its old value
test_and_change_bit(nr, addr)    Invert the nr-th bit of *addr and return its old value
atomic_clear_mask(mask, addr)    Clear all bits of addr specified by mask
atomic_set_mask(mask, addr)      Set all bits of addr specified by mask
Memory Barriers
• Compilers and hw re-order memory accesses
– as an optimization
– true on SMP and even UP systems!
• Memory barrier – instruction to hw/compiler to complete all
pending accesses before issuing more
– read memory barrier – acts on read requests
– write memory barrier – acts on write requests
• Linux macros
– for UP and MP: mb(), rmb(), wmb()
– for MP only: smp_mb(), smp_rmb(), smp_wmb()
Memory barriers in Linux
Macro Description
mb( ) Memory barrier for MP and UP
rmb( ) Read memory barrier for MP and UP
wmb( ) Write memory barrier for MP and UP
smp_mb( ) Memory barrier for MP only
smp_rmb( ) Read memory barrier for MP only
smp_wmb( ) Write memory barrier for MP only
Peterson’s Solution
• Two process solution
• Assume that the LOAD and STORE instructions are atomic;
that is, cannot be interrupted.
• The two processes share two variables:
– int turn;
– Boolean flag[2]
• The variable turn indicates whose turn it is to enter the critical
section.
• The flag array is used to indicate if a process is ready to enter
the critical section. flag[i] = true implies that process Pi is
ready!
Algorithm for Process Pi
while (true) {
    flag[i] = TRUE;
    turn = j;
    while ( flag[j] && turn == j);
    /*CRITICAL SECTION*/
    flag[i] = FALSE;
    /*REMAINDER SECTION*/
}
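Example: if the two stores of Task_i are reordered, both tasks can enter the critical section.
Task_i                              Task_j
turn = j;
                                    turn = i;
                                    flag[j] = TRUE;
                                    while ( flag[i] && turn == i);   /* flag[i] is still FALSE */
                                    /*CRITICAL SECTION*/
flag[i] = TRUE;
while ( flag[j] && turn == j);      /* turn == i */
/*CRITICAL SECTION*/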
Peterson’s Solution
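while (true) {
    flag[i] = TRUE;
    mb( );      /* keep the store to flag[i] ahead of the store to turn */
    turn = j;
    mb( );      /* added: both stores must be visible before flag[j] and turn are read */
    while ( flag[j] && turn == j);
    /*CRITICAL SECTION*/
    flag[i] = FALSE;
    /*REMAINDER SECTION*/
}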
Spin Lock
– Sequential lock
Read/Write Spin Locks
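__read_lock_failed:
    lock; incl (%eax)          # undo the failed decrement made by read_lock
1:  cmpl $1,(%eax)             # spin until the lock value is at least 1
    js 1b
    lock; decl (%eax)          # try again to take a reader slot
    js __read_lock_failed      # still negative: a writer holds the lock, retry
    ret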
Write Spin Lock
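write_lock(rwlp):
    movl $rwlp,%eax
    lock; subl $0x01000000,(%eax)    # subtract the bias; zero means the lock was free
    jz 1f
    call __write_lock_failed
1:

write_unlock(rwlp):
    lock; addl $0x01000000,rwlp      # give the bias back

__write_lock_failed:
    lock; addl $0x01000000,(%eax)    # undo the failed subtraction
1:  cmpl $0x01000000,(%eax)          # spin until no reader or writer holds the lock
    jne 1b
    lock; subl $0x01000000,(%eax)    # try again
    jnz __write_lock_failed
    ret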
Seqlock (sequential lock)
• A seqlock is a locking mechanism in Linux that supports fast writes of shared variables.
• seqlock := sequence number + lock
– The lock synchronizes two writers
– The sequence number lets readers detect whether the data they read is consistent
Seqlock (sequential lock)
– The writer increments the sequence number both after acquiring the lock and before releasing it, so the number is odd while a write is in progress.
– Readers read the sequence number before and after reading the shared data, and retry if it was odd or has changed.
do {
    while (((old_seq_num = seq_num)%2) != 0);
    //READER: critical section
} while (old_seq_num != seq_num);
• Seqlock was first applied to system time counter
updating.
MONITOR & MWAIT
(x86, for thread synchronization)
• MONITOR defines an address range used to
monitor write-back stores.
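[Figures: read-copy-update sequence. Readers hold a Local_PTR copy of PTR, which points to the data. The writer allocates a new copy (kmalloc + copy + update), reachable through New_PTR. PTR is switched to data (new) with an atomic operation. Later the writer or a garbage collector (GC) releases the old data with kfree(old_ptr).]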
Read-copy-update (RCU)
[Figure: reader, writer, and GC timelines with Lock scheduler, Unlock scheduler, and context switch (CTX_SW) events]
Lock_scheduler := preempt_count++
Unlock_scheduler := preempt_count--
Semaphores
• Kernel semaphores
– used by kernel control paths.
– can be acquired only by functions that are allowed
to sleep; interrupt handlers and deferrable
functions cannot use them.
• System V IPC semaphores
– used by User Mode processes
Semaphores
• struct semaphore
– count (atomic_t):
• >0: free; 0: in use, no waiters; <0: in use, with waiters
– wait: wait queue
– sleepers: 0 (none), 1 (some), occasionally 2
• implementation requires lower-level synch!
– atomic updates, spinlock, interrupt disabling
• optimized assembly code for the normal case (down())
– C code for the slower “contended” case (__down())
Semaphores
up:
    movl $sem,%ecx
    lock; incl (%ecx)
    jg 1f
    pushl %eax
    pushl %edx
    pushl %ecx
    call __up
    popl %ecx
    popl %edx
    popl %eax
1:

down:
    movl $sem,%ecx
    lock; decl (%ecx)
    jns 1f
    pushl %eax
    pushl %edx
    pushl %ecx
    call __down
    popl %ecx
    popl %edx
    popl %eax
1:
__down()
[Figure: __down() flow, with wait queue insertion (WaitingQ.ins) and removal (WaitingQ.del)]
Read/Write Semaphores
• New feature of Linux 2.4
• Read/Write Semaphores
• FIFO
• complex implementation
– similar to regular semaphores
• operations:
– down_read(), down_write()
– up_read(), up_write()
Read/Write Semaphores
• The first process is always awoken.
– If it is a writer, the other processes in the wait
queue continue to sleep.
– If it is a reader, any other reader following the first
process is also woken up and gets the lock.
However, readers that have been queued after a
writer continue to sleep.
Example wait queue: R R R W R W R R (the first three readers are woken together; the rest keep sleeping)
Completions
• The current implementation of up( ) and
down( ) also allows them to execute
concurrently on the same semaphore.
• up( ) might attempt to access a data structure
that no longer exists.
• Replace up( ) with complete( ).
• Replace down( ) with wait_for_completion( ).
Completions
[Figure: two kernel control paths (1 and 2) operating on one semaphore: create_sem, down, del_sem, up, del_sem]
Local Interrupt Disabling
• Local interrupt disabling does not protect
against concurrent accesses to data structures
by interrupt handlers running on other CPUs.
• Spin locks: only on SMP systems; keep them short! (general, read/write, big reader)
Global Interrupt Disabling
• A typical scenario consists of a driver that
needs to reset the hardware device.
• Global interrupt disabling significantly lowers
the system concurrency level.
• An interrupt service routine should never
execute the cli( ) macro.
__global_cli()
• wait for top and bottom halves to complete
• disable local interrupts
• grab spinlock
• disable all interrupts
Disabling Deferrable Functions
• disabling interrupts on a CPU also disables deferrable functions on that CPU
• it is possible to disable deferrable functions without disabling interrupts
• ops (macros):
– local_bh_disable()
– local_bh_enable()
Choosing Synch Primitives
• avoid synch if possible! (clever instruction
ordering)
– example: inserting in linked list (needs barrier still)
– Example: task migration
• use atomics or rw spinlocks if possible
• use semaphores if you need to sleep
• complicated structures accessed by deferred
functions
Example Race Conditions
• reference counters for sharing structs
– get/put functions
– deallocate when 0
• memory map semaphore
• slab cache list semaphore
• inode semaphore