ch4. Process Scheduling

Process Scheduling
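National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
Instructor: Shi-Wu Lo (羅習五)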
Outline
• OS schedulers
• Unix scheduling
• Linux scheduling
• Linux 2.4 scheduler
• Linux 2.6 scheduler
– O(1) scheduler
– CFS scheduler
Schedule, from 5/31 onward
Mon   Tue   Wed   Thu   Fri   Sat   Sun
5/31  6/1   6/2   6/3   6/4   6/5   6/6
6/7   6/8   6/9   6/10  6/11  6/12  6/13
6/14  6/15  6/16  6/17  6/18  6/19  6/20
6/21  6/22  6/23  6/24  6/25  6/26  6/27
6/28  6/29  6/30  7/1   7/2   7/3   7/4
7/5   7/6   7/7   7/8   7/9   7/10  7/11
Introduction
preemptive & cooperative multitasking
• A multitasking operating system is one that
can simultaneously interleave execution of
more than one process.
• Multitasking operating systems come in two
flavors: cooperative multitasking and
preemptive multitasking.
– Linux provides preemptive multitasking.
– Mac OS 9 and earlier are the most notable
examples of cooperative multitasking.
Linux scheduler –
Scheduling Policy
• Scheduling policy determines what runs when
– fast process response time (low latency)
– maximal system utilization (high throughput)
• Process classification:
– I/O-bound processes: spend much of their time
submitting and waiting on I/O requests
– Processor-bound processes: spend much of their time
executing code
• Unix variants tend to favor I/O-bound processes,
thus providing good process response time
Linux scheduler –
Process Priority
• Linux’s priority-based scheduling
– Rank processes based on their worth and need for
processor time.
– Both the user and the system may set a process's priority
to influence the scheduling behavior of the system.
• Dynamic priority-based scheduling
– Begins with an initial base priority
– Then enables the scheduler to increase or decrease the
priority dynamically to fulfill scheduling objectives.
– E.g., a process that is spending more time waiting on I/O
will receive an elevated dynamic priority.
Linux scheduler –
Priority Ranges
• Two separate priority ranges.
– nice value, from -20 to +19 with a default of 0.
• Larger nice values correspond to a lower priority (you
are being nice to the other processes on the system).
– real-time priority, by default ranging from 0 to 99.
• All real-time processes are at a higher priority than
normal processes.
• Linux implements real-time priorities in accordance
with the POSIX standards on the matter.
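As a rough illustration of how the two ranges fit together, the sketch below maps a nice value onto a single 0-139 scale. It assumes the 2.6 convention that real-time priorities occupy 0-99 and the 40 nice levels follow at 100-139. The helper name nice_to_static_prio() is hypothetical; the constants are named after the kernel's MAX_RT_PRIO and MAX_PRIO.

```c
#include <assert.h>

#define MAX_RT_PRIO 100                 /* real-time priorities use 0..99 */
#define MAX_PRIO    (MAX_RT_PRIO + 40)  /* 40 nice levels follow: 100..139 */

/* Hypothetical helper: map a nice value (-20..+19) to a static priority.
 * A larger nice value gives a larger number, i.e. a lower priority. */
int nice_to_static_prio(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return MAX_RT_PRIO + nice + 20;
}
```

With this mapping the default nice value 0 lands at 120, in the middle of the range reserved for normal tasks.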
2.4 scheduler
• Non-preemptible kernel
– Set p->need_resched if schedule() should be
invoked at the ‘next opportunity’ (kernel => user
mode).
• Round-robin
– task_struct->counter: number of clock ticks left to
run in this scheduling slice, decremented by a
timer.
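The interplay of counter and need_resched can be sketched as follows. This is a simplified model, not kernel code: struct task and timer_tick() are hypothetical stand-ins that carry only the two fields named above.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the 2.4 task descriptor; only the two fields
 * discussed above are modeled. */
struct task {
    int counter;       /* clock ticks left in the current slice */
    int need_resched;  /* set when schedule() should run at the next opportunity */
};

/* Called once per timer tick for the running task (illustrative only). */
void timer_tick(struct task *p)
{
    assert(p != NULL);
    if (p->counter > 0)
        p->counter--;
    if (p->counter == 0)
        p->need_resched = 1;  /* acted on when returning to user mode */
}
```

Because the kernel is non-preemptible, setting the flag does not switch tasks immediately; the switch is deferred until the return to user mode.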
2.4 scheduler
1. Check if schedule() was invoked from interrupt
handler (due to a bug) and panic if so.
2. Use spin_lock_irq() to lock ‘runqueue_lock’
3. Check if a task is ‘runnable’
– in TASK_RUNNING state
– in TASK_INTERRUPTIBLE state and a signal is pending
4. Examine the ‘goodness’ of each process
5. Context switch
2.4 scheduler – ‘goodness’
• ‘goodness’: identifying the best candidate
among all processes in the runqueue list.
– ‘goodness’ = 0: the entity has exhausted its
quantum.
– 0 < ‘goodness’ < 1000: the entity is a conventional
process/thread that has not exhausted its
quantum; a higher value denotes a higher level of
goodness.
2.4 scheduler – ‘goodness’
    if (p->mm == prev->mm)
        return p->counter + p->priority + 1;
    else
        return p->counter + p->priority;
• A small bonus is given to the task p if it
shares the address space with the previous
task.
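To show why this selection costs time proportional to the number of runnable tasks, here is a minimal sketch that combines the simplified goodness rule with a linear scan of the run queue. The struct layout, the integer mm field (standing in for the address-space pointer), and pick_next() are assumptions made for illustration; real-time tasks and SMP bonuses are left out.

```c
#include <assert.h>
#include <stddef.h>

struct task {
    int counter;    /* remaining ticks in the current quantum */
    int priority;   /* base priority */
    int mm;         /* stand-in for the address-space pointer */
};

/* Simplified goodness, following the snippet above; 0 means the
 * quantum is exhausted. */
int goodness(const struct task *p, const struct task *prev)
{
    if (p->counter == 0)
        return 0;
    return p->counter + p->priority + (p->mm == prev->mm ? 1 : 0);
}

/* Scan every runnable task and return the index of the best one:
 * the whole queue is visited, so the cost grows as O(n). */
size_t pick_next(const struct task *rq, size_t n, const struct task *prev)
{
    size_t best = 0;
    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (goodness(&rq[i], prev) > goodness(&rq[best], prev))
            best = i;
    return best;
}
```

Two tasks with equal counter and priority are separated only by the address-space bonus, so the one sharing prev's address space wins.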
2.4 scheduler - SMP
[Figure: run queue]
The scheduler examines the processor field of each process
and gives a fixed bonus (PROC_CHANGE_PENALTY, usually 15)
to the process that was last executed on the ‘this_cpu’
CPU.
2.4 scheduler - performance
• The algorithm does not scale well
– It is inefficient to re-compute all dynamic priorities at
once.
• The predefined quantum is too large for high
system loads (for example: a server)
• I/O-bound process boosting strategy is not
optimal
– a good strategy to ensure a short response time for
interactive programs, but…
– some batch programs with almost no user interaction
are I/O-bound.
Recalculating Timeslices
(kernel 2.4)
• Problems:
– Can take a long time. Worse, it scales O(n) for n tasks
on the system.
– Recalculation must occur under some sort of lock
protecting the task list and the individual process
descriptors. This results in high lock contention.
– The nondeterminism of the recalculation is a problem for
deterministic real-time programs.
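The O(n) cost is easy to see in a sketch of the recalculation pass. The loop below is loosely modeled on the 2.4 approach of keeping half of any unused ticks and adding a base quantum; the base_ticks field and the function name are hypothetical, and the locking that the real kernel needs around this loop is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct task {
    int counter;     /* ticks left in the current slice */
    int base_ticks;  /* hypothetical per-task base quantum */
};

/* Recompute every task's slice in one pass: keep half of any unused
 * ticks and add the base quantum. Touches all n tasks, hence O(n).
 * Returns the number of tasks visited. */
size_t recalc_timeslices(struct task *tasks, size_t n)
{
    size_t visited = 0;
    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        tasks[i].counter = (tasks[i].counter >> 1) + tasks[i].base_ticks;
        visited++;
    }
    return visited;
}
```

A task that still had ticks left, typically one that was sleeping on I/O, ends up with a larger counter than one that used its whole slice.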
2.6 scheduler
[Figure: two run queues with task migration (put + pull) between them]
2.6 scheduler –
User Preemption
• User preemption can occur
– When returning to user-space from a system call
– When returning to user-space from an interrupt
handler
2.6 scheduler –
Kernel Preemption
• The Linux kernel is a fully preemptive kernel.
– It is possible to preempt a task at any point, so long as the
kernel is in a state in which it is safe to reschedule.
– “safe to reschedule”: kernel does not hold a lock
• The Linux design:
– addition of a preemption counter, preempt_count, to each
process's thread_info
– This count increments once for each lock that is acquired and
decrements once for each lock that is released
• Kernel preemption can also occur explicitly, when a task in
the kernel blocks or explicitly calls schedule().
– no additional logic is required to ensure that the kernel is in a
state that is safe to preempt!
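The counter-based rule can be sketched in a few lines. This is a user-space model under stated assumptions, not the kernel's implementation: thread_info_sketch, sketch_lock(), sketch_unlock(), and can_preempt() are hypothetical names, and only the counting behavior described above is modeled.

```c
#include <assert.h>

struct thread_info_sketch {
    int preempt_count;  /* > 0 while at least one lock is held */
    int need_resched;   /* a reschedule has been requested */
};

/* Acquiring a lock increments the counter. */
void sketch_lock(struct thread_info_sketch *ti)
{
    ti->preempt_count++;
}

/* Releasing a lock decrements it again. */
void sketch_unlock(struct thread_info_sketch *ti)
{
    assert(ti->preempt_count > 0);
    ti->preempt_count--;
}

/* Preemption is safe only when no lock is held and a reschedule is pending. */
int can_preempt(const struct thread_info_sketch *ti)
{
    return ti->need_resched && ti->preempt_count == 0;
}
```

Nested locks are handled naturally: the task becomes preemptible again only when the last lock is released and the count returns to zero.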
Kernel Preemption
• Kernel preemption can occur
– When an interrupt handler exits, before returning
to kernel-space
– When kernel code becomes preemptible again
– If a task in the kernel explicitly calls schedule()
– If a task in the kernel blocks (which results in a call
to schedule())
O(1) & CFS scheduler
• 2.5 ~ 2.6.22: O(1) scheduler
– Time complexity: O(1)
– Using “run queue” (an active Q and an expired Q)
to realize the ready queue
• 2.6.23~present: Completely Fair Scheduler
(CFS)
– Time complexity: O(log n)
– the ready queue is implemented as a red-black
tree
O(1) scheduler
• Implement fully O(1) scheduling.
– Every algorithm in the new scheduler completes in constant-time, regardless of the
number of running processes. (Since the 2.5 kernel).
• Implement perfect SMP scalability.
– Each processor has its own locking and individual runqueue.
• Implement improved SMP affinity.
– Attempt to group tasks to a specific CPU and continue to run them there.
– Only migrate tasks from one CPU to another to resolve imbalances in runqueue sizes.
• Provide good interactive performance.
– Even during considerable system load, the system should react and schedule interactive
tasks immediately.
• Provide fairness.
– No process should find itself starved of timeslice for any reasonable amount of time.
Likewise, no process should receive an unfairly high amount of timeslice.
• Optimize for the common case of only one or two runnable processes, yet
scale well to multiple processors, each with many processes.
The Priority Arrays
• Each runqueue contains two priority arrays (defined in
kernel/sched.c as struct prio_array)
– Active array: all tasks with timeslice left.
– Expired array: all tasks that have exhausted their timeslice.
• Priority arrays provide O(1) scheduling.
– Each priority array contains one queue of runnable processes per
priority level.
– The priority arrays also contain a priority bitmap used to efficiently
discover the highest-priority runnable task in the system.
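A minimal sketch of such an array follows, assuming 140 priority levels and replacing the per-priority task lists with simple counters. The names prio_array_sketch, enqueue(), dequeue(), and highest_prio() are hypothetical, and __builtin_ctzl is a GCC/Clang builtin used here in place of the kernel's own bit-search helper.

```c
#include <assert.h>
#include <limits.h>

#define MAX_PRIO      140
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS  ((MAX_PRIO + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct prio_array_sketch {
    unsigned long bitmap[BITMAP_WORDS]; /* bit p set <=> queue p non-empty */
    int nr_queued[MAX_PRIO];            /* stand-in for the per-priority lists */
};

void enqueue(struct prio_array_sketch *a, int prio)
{
    assert(prio >= 0 && prio < MAX_PRIO);
    a->nr_queued[prio]++;
    a->bitmap[prio / BITS_PER_WORD] |= 1UL << (prio % BITS_PER_WORD);
}

void dequeue(struct prio_array_sketch *a, int prio)
{
    assert(prio >= 0 && prio < MAX_PRIO && a->nr_queued[prio] > 0);
    if (--a->nr_queued[prio] == 0)
        a->bitmap[prio / BITS_PER_WORD] &= ~(1UL << (prio % BITS_PER_WORD));
}

/* Return the lowest set bit (the highest priority), or -1 if empty.
 * The loop runs over a fixed number of words, independent of the task count. */
int highest_prio(const struct prio_array_sketch *a)
{
    for (unsigned w = 0; w < BITMAP_WORDS; w++)
        if (a->bitmap[w])
            return (int)(w * BITS_PER_WORD) + __builtin_ctzl(a->bitmap[w]);
    return -1;
}
```

Every operation touches one counter and at most a fixed number of bitmap words, which is what makes the lookup constant-time regardless of how many tasks are queued.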
The Linux O(1) scheduler algorithm
• Each runqueue contains two priority arrays: active and expired.
• Each of these priority arrays contains a list of tasks indexed
according to priority (priority queues 0-139).
• Linux assigns higher-priority tasks a longer time slice
(time quantum ≈ 1/priority, where a smaller priority number means a
higher priority).
• Linux chooses the task with the highest priority from the active
array for execution.
• Tasks at the same priority level (tsk2 and tsk3 in the figure) are
run round-robin.
[Figures: a runqueue whose active and expired arrays hold tsk1, tsk2, and tsk3]
• Most tasks have dynamic priorities that are based on their “nice”
value (static priority) plus or minus 5.
– dynPrio = staticPrio + bonus, with bonus = -5 ~ +5
– bonus ≈ 1/sleep_time: a task that sleeps a lot (I/O-bound, e.g. tsk3)
is treated as interactive and gets a smaller, that is better,
dynamic priority value.
[Figure: runqueue with tsk3 marked as I/O-bound]
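The adjustment can be sketched as a small function. It follows the sign convention used here (a negative bonus is a boost) and assumes the result is clamped to the 100-139 range of normal tasks; the helper name dynamic_prio() is hypothetical, and how the bonus itself is derived from sleep time is not modeled.

```c
#include <assert.h>

#define MAX_RT_PRIO 100
#define MAX_PRIO    140

/* Hypothetical helper: combine a static priority with an interactivity
 * bonus in -5..+5 (negative = long sleeper = boosted) and clamp the
 * result to the range used by normal tasks, 100..139. */
int dynamic_prio(int static_prio, int bonus)
{
    assert(bonus >= -5 && bonus <= 5);
    int prio = static_prio + bonus;
    if (prio < MAX_RT_PRIO)
        prio = MAX_RT_PRIO;
    if (prio > MAX_PRIO - 1)
        prio = MAX_PRIO - 1;
    return prio;
}
```

The clamp keeps a boosted normal task from ever entering the real-time range.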
• When all tasks have exhausted their time slices, the two priority
arrays are exchanged!
[Figure: runqueue with tsk1, tsk2, and tsk3; the active and expired arrays swap roles]
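The exchange is cheap because only two pointers change. The sketch below assumes a runqueue that holds both arrays plus an active and an expired pointer; the type and function names are hypothetical, and each array is reduced to a task count.

```c
#include <assert.h>

struct prio_array_sketch {
    int nr_active;  /* number of tasks queued in this array */
};

struct runqueue_sketch {
    struct prio_array_sketch *active;
    struct prio_array_sketch *expired;
    struct prio_array_sketch arrays[2];
};

/* If no task in the active array has timeslice left, exchange the two
 * array pointers. Returns 1 if a swap happened. The cost does not
 * depend on how many tasks are queued. */
int swap_if_exhausted(struct runqueue_sketch *rq)
{
    assert(rq->active != rq->expired);
    if (rq->active->nr_active != 0)
        return 0;
    struct prio_array_sketch *tmp = rq->active;
    rq->active = rq->expired;
    rq->expired = tmp;
    return 1;
}
```

This replaces the 2.4-style loop over all tasks: timeslices are refilled when a task moves to the expired array, so nothing remains to be recalculated at swap time.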
The O(1) scheduling algorithm
• sched_find_first_bit() scans the priority bitmap (one bit per
priority queue; 1 = queue not empty).
• Insert: O(1)
• Remove: O(1)
• Find first set bit: O(1)
find first set bit O(1)
static inline unsigned long __ffs(unsigned long word)
{
    int num = 0;

#if BITS_PER_LONG == 64
    if ((word & 0xffffffff) == 0) {
        num += 32;
        word >>= 32;
    }
#endif
    if ((word & 0xffff) == 0) {
        num += 16;
        word >>= 16;
    }
    if ((word & 0xff) == 0) {
        num += 8;
        word >>= 8;
    }
    if ((word & 0xf) == 0) {
        num += 4;
        word >>= 4;
    }
    if ((word & 0x3) == 0) {
        num += 2;
        word >>= 2;
    }
    if ((word & 0x1) == 0)
        num += 1;
    return num;
}
2.6 scheduler –
CFS
• Classical schedulers compute time slices for
each process in the system and allow them to
run until their time slice/quantum is used up.
– After that, the time slices of all processes need to be recalculated.
• CFS considers only the wait time of a process
– The task with the most need for CPU time is
scheduled.
$$\text{Fairness} \cong \frac{1}{\sigma(\text{waiting time})}$$
2.6 scheduler –
CFS (motivation)
• Traditional Unix scheduling policy
[Figure: timeline of high-priority (HP) and low-priority (LP) tasks, with the waiting time (WT) of each task marked]
2.6 scheduler –
CFS (the ideal case)
[Figure sequence: six tasks, three high priority and three low priority, each on its own virtualized processor; the high-priority tasks run fast and the low-priority tasks slowly, at relative speeds 8 : 8 : 8 : 3 : 3 : 3, shown progressing along a time axis]
2.6 scheduler –
CFS (the implementation)
[Figure sequence: animation of the same six tasks (8, 8, 8, 3, 3, 3) being scheduled along a time axis]
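The 8 : 8 : 8 : 3 : 3 : 3 ideal case can be approximated on a single CPU by always running the task that has received the least weighted service so far. The sketch below is a simplified virtual-time model written for illustration, not the kernel's algorithm; struct vtask, simulate(), and the charge of 24/weight per tick are assumptions chosen so that the integer arithmetic stays exact.

```c
#include <assert.h>

#define NTASKS 6

struct vtask {
    int weight;     /* relative speed: 8 for high, 3 for low priority */
    long vtime;     /* virtual time consumed so far */
    int ticks_run;  /* real ticks actually received */
};

/* Run `ticks` scheduling decisions: always pick the task with the
 * smallest virtual time and charge it 24/weight per tick (24 is
 * divisible by both 8 and 3, so no rounding occurs). */
void simulate(struct vtask t[NTASKS], int ticks)
{
    for (int k = 0; k < ticks; k++) {
        int best = 0;
        for (int i = 1; i < NTASKS; i++)
            if (t[i].vtime < t[best].vtime)
                best = i;
        assert(24 % t[best].weight == 0);
        t[best].vtime += 24 / t[best].weight;
        t[best].ticks_run++;
    }
}
```

Over any window of 33 ticks starting from equal virtual times, each weight-8 task receives 8 ticks and each weight-3 task receives 3, matching the ratio in the figures.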
2.6 scheduler –
the RB-Tree
• To sort tasks on the red-black tree, the kernel
uses the difference fair_clock - wait_runtime.
– fair_clock is a measure of the CPU time a
task would have gotten if scheduling were
completely fair.
– wait_runtime is a direct measure of the
unfairness caused by the imperfection of real
systems.
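The effect of this key can be shown with a small sketch in which a linear scan stands in for taking the leftmost node of the red-black tree. The struct and function names are hypothetical; only the key computation comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

struct cfs_task {
    long wait_runtime;  /* how much CPU time the task is owed */
};

/* Sorting key as described above; a smaller key sorts further left. */
long rb_key(long fair_clock, const struct cfs_task *p)
{
    return fair_clock - p->wait_runtime;
}

/* Linear scan standing in for "take the leftmost node of the tree". */
size_t pick_leftmost(const struct cfs_task *tasks, size_t n, long fair_clock)
{
    size_t best = 0;
    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (rb_key(fair_clock, &tasks[i]) < rb_key(fair_clock, &tasks[best]))
            best = i;
    return best;
}
```

Since fair_clock is the same for every task at a given moment, the task with the largest wait_runtime, the one treated most unfairly so far, gets the smallest key and runs next.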
2.6 scheduler –
issues
• Different priority levels for tasks (i.e., nice
values) must be taken into account
• Tasks must not be switched too often because
a context switch has a certain overhead.
2.6 scheduler –
fields in the task_struct
2.6 scheduler –
• prio and normal_prio indicate the dynamic
priorities; static_prio indicates the static priority of a
process.
– The static priority is the priority assigned to the
process when it was started.
– normal_prio denotes a priority that is
computed based on the static priority and the
scheduling policy of the process.
2.6 scheduler –
• The scheduler is not limited to scheduling
processes, but can also work with larger
entities. This allows for implementing group
scheduling.
2.6 scheduler –
• cpus_allowed is a bit field used on
multiprocessor systems to restrict the CPUs on
which a process may run.
– sched_setaffinity()
– sched_getaffinity()
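From user space the allowed-CPU mask can be read and changed with the Linux system calls sched_getaffinity() and sched_setaffinity(). The sketch below assumes Linux with glibc (hence _GNU_SOURCE); the helper pin_to_first_allowed_cpu() is a hypothetical name introduced for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread to the lowest-numbered CPU it is
 * currently allowed to use. Returns that CPU number, or -1 on error. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    assert(CPU_COUNT(&allowed) > 0);  /* a running thread has at least one CPU */

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (sched_setaffinity(0, sizeof(one), &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;
}
```

Passing 0 as the first argument refers to the calling thread; after a successful call the scheduler will not migrate it to any other CPU.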
2.6 scheduler –
priority
kernel/sched.c
static const int prio_to_weight[40] = {
/* -20 */ 88761, 71755, 56483, 46273, 36291,
/* -15 */ 29154, 23254, 18705, 14949, 11916,
/* -10 */ 9548, 7620, 6100, 4904, 3906,
/* -5 */ 3121, 2501, 1991, 1586, 1277,
/* 0 */ 1024, 820, 655, 526, 423,
/* 5 */ 335, 272, 215, 172, 137,
/* 10 */ 110, 87, 70, 56, 45,
/* 15 */ 36, 29, 23, 18, 15,
};
Each step of one nice level changes the weight by a factor of about 1.25:
prio_to_weight[i] ≈ prio_to_weight[i+1] × 1.25
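To see what the factor of 1.25 means in practice, the sketch below computes the CPU share of one task competing with a single other task, assuming that shares are proportional to the weights in the table. The helper cpu_share_percent() and the two macros are hypothetical names; the values 1024 and 820 are the nice 0 and nice 1 entries.

```c
#include <assert.h>

/* Two entries copied from the table above (nice 0 and nice 1). */
#define WEIGHT_NICE_0 1024
#define WEIGHT_NICE_1 820

/* CPU share, in percent (rounded down), of a task with weight w
 * competing with one other task of weight other. */
int cpu_share_percent(int w, int other)
{
    assert(w > 0 && other > 0);
    return (100 * w) / (w + other);
}
```

Under this assumption a nice 0 task running against a nice 1 task gets roughly 55% of the CPU and the nice 1 task roughly 45%, so one nice level shifts the split by about 10 percentage points.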
Summary
• The concept of OS schedulers
• Maximize throughput.
– This is what system administrators care about.
– How to maximize throughput (CPU & I/O).
• What is the major drawback of the Linux 2.4
scheduler?
• Pros and cons of Linux 2.6 schedulers
– O(1)
– CFS