Lecture Plan
Lectures
Introduction to Operating System (OS) & OS structure
Process & Scheduling
Synchronization & Deadlock
Memory management
Tutorial Plan
Tutorials
Books
Silberschatz & Galvin, Operating System Concepts, 5th Ed., Addison-Wesley, 1998, or
Silberschatz, Galvin & Gagne, Operating System Concepts, 6th Ed., Wiley, 2002.
Nutt, Gary J., Operating Systems: A Modern Perspective, 2nd Ed., Addison-Wesley, 2000, ISBN 0-201-61251-8.
Division of Information Engineering School of Electrical and Electronic Engineering
Copyright Notice
The contents found in this part of the course material are adapted mainly from the listed textbooks and reference books for the purpose of teaching. Some contents in the slides of this course are copyrighted by the respective publishers, for which they are acknowledged here. The instructor does not claim any copyright on such material. You should use this material strictly for your own personal study only. No distribution of this teaching material is allowed without permission.
What is an operating system (OS)? The OS controls and coordinates the use of the hardware among the various application programs for the various users. It provides the means for the proper use of these resources in the operation of the computer system, and provides an environment within which other programs can do useful work.
User View
The OS for a PC is designed mostly for ease of use. For a mainframe or minicomputer, the OS is designed to maximize resource utilization. For workstations connected to networks of other workstations and servers, the OS is designed to compromise between individual usability and resource utilization. For handheld computers, mostly standalone and used by single individual users, the OS is designed mostly for individual usability and battery saving. For embedded computers in home devices and automobiles, the OS is designed to run without user intervention.
Workstation users have dedicated resources, but they also share resources such as networking and servers (file, compute, and print servers).
System View
The OS is a resource allocator: it must decide how to allocate resources to specific programs and users so that it can operate the computer system efficiently and fairly. The OS is also a control program: it manages the execution of user programs to prevent errors and improper use of the computer, and it is especially concerned with the operation and control of I/O devices.
System Goals
The primary goal of some operating systems is convenience for the user, e.g. OSs for small PCs. The primary goal of other operating systems is efficient operation of the computer system, e.g. on workstations and mainframe computers. These two goals, convenience and efficiency, are sometimes contradictory.
Multiprogrammed Systems Multiprogramming increases CPU utilization by organizing jobs so that the CPU always has one to execute. The idea is as follows: The operating system keeps several jobs in memory simultaneously Operating system picks and begins to execute one of the jobs in the memory. Job may have to wait for some task, such as an I/O operation, to complete. In a multiprogramming system, the operating system simply switches to, and executes, another job, and so on. Multiprogramming is the first instance where the operating system must make decisions for the users.
Time-Sharing Systems
Time sharing (or multitasking) is a logical extension of multiprogramming, but time-sharing systems are more complex than multiprogrammed systems. A time-sharing system is interactive and allows many users to share the computer simultaneously. Such a system uses CPU scheduling and multiprogramming to provide each user with a small portion of a time-shared computer. As the system switches rapidly from one user to the next, each user is given the impression that the entire computer system is dedicated to him/her, even though it is being shared among many users. Response time should be short. It uses virtual memory.
Desktop Systems
The goals of these operating systems have changed with time: instead of maximizing CPU and peripheral utilization, the systems opt for maximizing user convenience and responsiveness. These systems include:
Intel PCs running Microsoft Windows;
the Apple Macintosh running Mac OS X, which is now based on Mach and FreeBSD UNIX for scalability, performance, and features, but retains the same rich GUI;
Linux, a UNIX-like operating system available on PCs and Macintoshes.
Distributed Systems
A distributed system consists of a collection of processors that do not share memory or a clock; instead, each processor has its own local memory, and the processors communicate with one another through various communication lines. Distributed systems depend on networking for their functionality and are loosely coupled systems.
Client-Server Systems. Server systems can be broadly categorized as:
1. Compute-server systems, which provide an interface to which clients can send requests to perform an action; in response, they execute the action and send back results to the client.
2. File-server systems, which provide a file-system interface where clients can create, update, read, and delete files.
Peer-to-Peer Systems are a form of distributed system in which the participating machines act as both clients and servers.
Client-Server System
Based on the virtual machine concept, most of the OS functions are implemented as user processes. A user process (client process) sends a request to a server process, which performs the work and sends back the answer. The kernel handles the communication between clients and servers; hence, the size of the kernel is reduced to a minimum. All client/server processes run in user mode and do not have direct access to the hardware. A bug in a client/server process will cause that process to crash without bringing down the whole system.
(Figure: example of a client/server system reading a file. A client process on one machine obtains service by sending a message to the file server process on another machine; on each machine the kernel passes messages between client processes and the process, terminal, and file servers, all of which run in user mode while only the kernel runs in kernel mode.)
Clustered Systems
Clustered systems gather together multiple CPUs to accomplish computational work. They differ from parallel systems, however, in that they are composed of two or more individual systems coupled together. Clustering is usually performed to provide high availability, but it can also be for high throughput or high performance.
Real-time Systems
As a special-purpose system, a real-time system has well-defined, fixed time constraints. In a hard real-time system, processing must be done within the defined constraints. In a soft real-time system, a critical real-time task gets priority over other tasks, and retains that priority until it completes.
Handheld Systems
Examples: personal digital assistants (PDAs), or cellular telephones with connectivity to a network such as the Internet. Many handheld devices have between 512 KB and 8 MB of memory, so the operating system and applications must manage memory efficiently. Processors for most handheld devices often run at a fraction of the speed of a processor in a PC, so the operating system and applications must be designed not to tax the processor. Only small display screens are typically available, so web clipping, where only a small subset of a web page is delivered and displayed on the handheld device, is necessary. Some handheld devices may use wireless technology, such as Bluetooth.
Process Management
A process can be thought of as a program in execution. A program by itself is a passive entity, hence not a process; a process is an active entity, with a program counter specifying the next instruction to execute. A process needs resources, such as CPU time, memory, files, and I/O devices, to accomplish its task. The operating system is responsible for:
1. Creating and deleting both user and system processes;
2. Suspending and resuming processes;
3. Providing mechanisms for process synchronization;
4. Providing mechanisms for process communication;
5. Providing mechanisms for deadlock handling.
Main-memory Management
The operating system is responsible for the following activities in connection with memory management: 1. Keeping track of which parts of memory are currently being used and by whom 2. Deciding which processes are to be loaded into memory when memory space becomes available. 3. Allocating and deallocating memory space as needed.
Operating system services include:
Program execution: load a program into memory and run it; handle the program return code when it finishes or crashes.
I/O operations: handle I/O requests from programs.
File-system manipulation: maintain files and directories.
Communications: handle the exchange of information between programs via shared memory or message passing.
Error detection: trap errors in the CPU, memory, I/O devices, user programs, etc., and take appropriate actions.
(Figure: two communication models. In message passing, processes A and B exchange messages (M) through the kernel; in shared memory, processes A and B both access a shared memory region, and the kernel is not involved in each transfer.)
(continued)
Some OS services are provided for efficient operation of the system itself:
Resource allocation: determine how best to use the CPU, I/O devices, etc.
Accounting: keep track of users' usage of system resources.
Protection: ensure that all access to system resources is controlled and that user log-ins are authenticated.
System Calls
System calls provide the interface between a process and the operating system. System calls can be grouped into 5 categories: process control, file management, device management, information maintenance, and communications.
Communications: create/delete a communication connection; send/receive messages; transfer status information; attach/detach remote devices.
Running a program
The command interpreter (CI) is invoked when the computer is switched on. A program is loaded into memory, writing over most of the CI to give the program as much memory as possible. The instruction pointer is set to the first instruction of the program. The program then runs and terminates by executing a system call or by an error trap. The CI is then reloaded, and the error code is reported or passed to the next program. (Figure: memory layout with the kernel at the bottom, the command interpreter above it, and the running process occupying the free memory.)
When a user logs on to the system, a command interpreter, called a shell, is run. The shell may continue running while another program is executed. To start a new process, the shell executes a fork system call. The selected program is loaded into memory via an exec system call, and then the program is executed. When the process is done, it executes an exit system call to terminate, returning an error code of 0; if there is an error, it returns a non-zero code.
MS-DOS structure
Interfaces and levels of functionality are not well separated. The layers are: application program, resident system program, MS-DOS device drivers, ROM BIOS device drivers. Application programs are able to access the basic I/O routines to write directly to the display and disk drives. The Intel 8088 chip does not support dual mode* and hence provides no hardware protection, so the OS has no choice but to allow access to the hardware. MS-DOS is therefore vulnerable to errant or malicious programs: an entire-system crash or disk erasure can occur when user programs fail.
*Dual mode: user mode and system mode. Certain instructions are allowed in system mode only, thus providing system protection.
(Figure: UNIX system structure. Below the interface to the kernel sit the kernel components: signals, file system, CPU scheduling, terminal handling, swapping, page replacement, demand paging, virtual memory, the character and block I/O systems, and the terminal, disk, and tape drivers. Below the kernel sit the terminal controllers, device controllers, and memory controllers, and finally the physical devices: terminals, disks and tapes, physical memory.)
Unix consists of two separate parts: the kernel, and the system programs. The kernel is further separated into a series of interfaces and device drivers - as many as you want to add in, hence fully expandable. Unix can be viewed as a layered structure. Everything below the system-call interface and above the physical hardware is the kernel. The kernel provides the file system, CPU scheduling, memory management, and other OS functions through system calls. The kernel layer contains a lot of functionality.
Problem of the layered approach: it is less efficient. E.g., for a user program to execute an I/O operation, it executes a system call that is trapped to layer 4, which calls layer 2 functions, through to layer 1 and finally layer 0.
Microkernels
As OSs expanded, the kernel became large and difficult to manage. The microkernel approach removes all nonessential components from the kernel and implements them as system- and user-level programs; the result is a smaller kernel. The main function of the microkernel is to provide a communication facility (message passing) between the client program and the various services, all of which run in user space. The benefits: the ease of extending the operating system (all new services are added to user space and consequently do not require modification of the kernel), and more security and reliability, since most services run as user rather than kernel processes; if a service fails, the rest of the operating system remains untouched.
By using CPU scheduling and virtual-memory techniques, an OS creates the illusion of multiple processes, each executing on its own processor with its own (virtual) memory. The virtual-machine approach does not provide any additional functionality, but rather provides an interface that is identical to the underlying bare hardware: each process is provided with a (virtual) copy of the underlying computer. Each VM is completely isolated from all other VMs. As an example, an MS-DOS program can run on a Sun or DEC workstation by creating a virtual Intel machine.
(Figure: virtual machine structure. On top of the hardware and the virtual-machine implementation, each of VM1, VM2, and VM3 presents a programming interface identical to the bare hardware, and each runs its own kernel and processes.)
The virtual-machine concept is useful, but it is difficult to implement. Virtual machines are increasing in popularity as a means of solving system-compatibility problems: e.g., a virtual Intel machine can be created on top of a native processor, and Microsoft Windows run inside this VM.
A Java virtual machine (JVM)
The JVM is a specification for an abstract computer. It consists of a class loader, a class verifier, and a Java interpreter that executes the architecture-neutral bytecodes. The Java interpreter may be a software module that interprets the bytecodes one at a time, or it may be a just-in-time (JIT) compiler that turns the architecture-neutral bytecodes into native machine language for the host computer.
Processes
Process concept:
A process is a program in execution. A process is more than the program code (the text section); it also includes the program counter, registers, a stack, and a data section containing global variables. A program by itself is not a process: a process is an active entity, with a program counter specifying the next instruction to execute and a set of associated resources. Two or more processes may be associated with the same program, e.g. several users running copies of the mail program, or the same user invoking many copies of the editor program. Each of these is a separate process: the text sections are equivalent, but the data sections vary.
Process State
As a process executes, it changes state. A process may be in one of the following states: New: The process is being created. Running: Instructions are being executed. Waiting: The process is waiting for some event to occur (e.g. I/O completion or reception of a signal). Ready: The process is waiting to be assigned to a processor. Terminated: The process has finished execution.
(Figure: CPU switch from process to process. While process P0 is executing, P1 is idle. On an interrupt or system call, the operating system saves P0's state into PCB0 and reloads P1's state from PCB1; P1 then executes while P0 is idle. On the next interrupt or system call, the OS saves P1's state into PCB1 and reloads P0's state from PCB0, and P0 resumes.)
Process Scheduling
Scheduling Queues
As processes enter the system, they are put into a job queue, which consists of all processes in the system. Queues are implemented as linked lists. Processes residing in main memory and ready for execution are kept in the ready queue: the ready-queue header contains pointers to the first and last PCBs, and each PCB has a pointer field that points to the next process in the queue. The operating system also has other queues: each device has its own device queue, and processes waiting for a particular I/O device are put in that device queue.
(Figure: the ready queue and the device queues, each holding a linked list of PCBs.)
Long-term scheduler (job scheduler)
Selects processes from the job pool and loads them into memory for execution. It controls the degree of multiprogramming, i.e. the number of processes in memory, and should select a good mix of I/O-bound and CPU-bound processes. An I/O-bound process spends more of its time doing I/O than doing computation; a CPU-bound process generates I/O requests infrequently, using more of its time doing computation.
(continued)
The short-term scheduler, or CPU scheduler, selects from among the processes that are ready to execute and allocates the CPU to one of them. The primary distinction between these two schedulers is the frequency of their execution: the short-term scheduler must select a new process for the CPU frequently, while the long-term scheduler executes much less frequently.
Medium-term scheduler
Some operating systems, such as time-sharing systems, may introduce an additional, intermediate level of scheduling. The medium-term scheduler removes processes from memory (and from active contention for the CPU), thus reducing the degree of multiprogramming. At some later time, a process can be reintroduced into memory and its execution continued where it left off. This scheme is called swapping.
Context Switch
A context switch is the switching of the CPU to another process by saving the state of the old process and loading the saved state of the new process. Context-switch time is pure overhead, because the system does no useful work while switching. Context-switch times are highly dependent on hardware support: e.g., some processors (such as the Sun UltraSPARC) provide multiple sets of registers, and a context switch then simply involves changing the pointer to the current register set. Otherwise, the system resorts to copying register data to and from memory.
Cooperating Processes
Cooperating processes are processes that can affect or be affected by each other when executing concurrently. Any process that shares data with other processes is a cooperating process. A process is independent if it cannot affect or be affected by the other processes executing in the system, and is thus non-cooperating.
A producer process produces information that is consumed by a consumer process. To allow producer and consumer processes to run concurrently, we must have available a buffer of items that can be filled by the producer and emptied by the consumer. A producer can produce one item while the consumer is consuming another item. The producer and consumer must be synchronized, so that the consumer does not try to consume an item that has not yet been produced. In this situation, the consumer must wait until an item is produced.
The unbounded-buffer producer-consumer problem places no practical limit on the size of the buffer: the consumer may have to wait for new items, but the producer can always produce new items. The bounded-buffer producer-consumer problem assumes a fixed buffer size: the consumer must wait if the buffer is empty, and the producer must wait if the buffer is full. The buffer may either be provided by the operating system through the use of an inter-process communication (IPC) facility, or explicitly coded by the application programmer with the use of shared memory.
A shared-memory solution for the bounded-buffer producer-consumer problem.
Shared variables:

  #define BUFFER_SIZE 10   /* n = 10 */
  typedef struct { ... } item;
  item buffer[BUFFER_SIZE];
  int in = 0;    /* in, out initialized to 0 */
  int out = 0;

  in == out                        : buffer is empty
  ((in + 1) % BUFFER_SIZE) == out  : buffer is full
A shared-memory solution for the producer-consumer problem (continued)

The producer process:

  while (1) {
      /* produce an item in nextProduced */
      while (((in + 1) % BUFFER_SIZE) == out)
          ;   /* do nothing */
      buffer[in] = nextProduced;
      in = (in + 1) % BUFFER_SIZE;
  }

The consumer process:

  while (1) {
      while (in == out)
          ;   /* do nothing */
      nextConsumed = buffer[out];
      out = (out + 1) % BUFFER_SIZE;
      /* consume the item in nextConsumed */
  }
IPC provides a mechanism for processes to communicate and to synchronize their actions, and is best provided by a message system. An IPC facility provides at least two operations: send(message) and receive(message). Messages sent by a process can be of either fixed or variable size.
Direct Communication
send(P, message): send a message to process P.
receive(Q, message): receive a message from process Q.
Indirect Communication (mailboxes)
send(A, message): send a message to mailbox A.
receive(A, message): receive a message from mailbox A.
Synchronization
Blocking send: the sending process is blocked until the message is received by the receiving process or by the mailbox.
Nonblocking send: the sending process sends the message and resumes operation.
Blocking receive: the receiver blocks until a message is available.
Nonblocking receive: the receiver retrieves either a valid message or null.
Buffering
Zero capacity: the queue has maximum length 0; the sender must block until the recipient receives the message.
Bounded capacity: the queue has finite length n; at most n messages can reside in it.
Unbounded capacity: the queue has potentially infinite length.
Remote Procedure Calls (RPCs)
The RPC was designed as a way to abstract the procedure-call mechanism for use between systems with network connections. The semantics of RPCs allow a client to invoke a procedure on a remote host as it would invoke a procedure locally.
Threads
A thread is a basic unit of CPU utilization, sometimes called a lightweight process (LWP). It consists of a program counter, a register set, and a stack space. It shares with its peer threads its code section, data section, and OS resources, e.g. open files and signals. A task can have many threads, whereas a traditional (or heavyweight) process has a single thread of control.
(Figure: a single-threaded process vs. a multithreaded process. The threads of one process share its code, data, and files; each thread has its own registers and stack.)
Why Threads ?
Responsiveness: multithreading an interactive application may allow a program to continue running even if part of it is blocked or is performing a lengthy operation, thereby increasing responsiveness to the user.
Resource sharing: threads share the memory and the resources of the process to which they belong.
Economy: allocating memory and resources for process creation is costly; because threads share the resources of the process to which they belong, it is more economical to create and context-switch threads.
Utilization of multiprocessor architectures: each thread may run in parallel on a different processor.
Kernel Threads
Kernel threads are supported directly by the operating system. Because thread management is done by the operating system, kernel threads are generally slower to create and manage. Since the kernel is managing the threads, if a thread performs a blocking system call, the kernel can schedule another thread in the application for execution. In a multiprocessor environment, the kernel can schedule threads on different processors.
Protection between the peer threads of a process is impossible; it should also not be necessary, since the threads are under the control of their own parent task only.
CPU Scheduling
Basic Concepts:
CPU-I/O Burst Cycle: process execution begins with a CPU burst, followed by an I/O burst, which is followed by another CPU burst, then another I/O burst, and so on. The process terminates with a final CPU burst that ends with a system request to terminate execution, rather than with another I/O burst.
(Figure: an alternating sequence of CPU and I/O bursts, e.g. a CPU burst of load, store, add, store, read from file; then an I/O burst waiting for I/O; then a CPU burst of store, increment index, write to file; then another I/O burst; and so on.)
(continue)
An I/O-bound program has many very short CPU bursts. A CPU-bound program has a few very long CPU bursts. The burst distribution can be important in CPU scheduling.
CPU Scheduler
The CPU scheduler selects from among the processes in memory that are ready to execute and allocates the CPU to one of them. The ready queue is not necessarily a first-in, first-out (FIFO) queue; it may be implemented as a FIFO queue, a priority queue, a tree, or simply an unordered linked list. The records in the queues are generally the PCBs of the processes.
The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This involves:
1. Switching context
2. Switching to user mode
3. Jumping to the proper location in the user program to restart that program
Preemptive Scheduling
CPU-scheduling decisions may take place under the following four circumstances:
1. When a process switches from the running state to the waiting state (for example, an I/O request, or an invocation of wait for the termination of one of the child processes)
2. When a process switches from the running state to the ready state (for example, when an interrupt occurs)
3. When a process switches from the waiting state to the ready state (for example, completion of I/O)
4. When a process terminates
Scheduling under circumstances 1 and 4 is nonpreemptive, as there is no choice in terms of scheduling. Scheduling under circumstances 2 and 3 is preemptive, as there is a choice. Under nonpreemptive scheduling, once the CPU has been allocated to a process, the process keeps the CPU until it releases it, either by terminating or by switching to the waiting state. Preemptive scheduling incurs a cost; it also has an effect on the design of the operating-system kernel.
Scheduling Criteria
CPU utilization: we want to keep the CPU as busy as possible (to be maximized).
Throughput: a measure of the number of processes that are completed per time unit (to be maximized).
Turnaround time: the interval from the time of submission of a process to the time of completion; it is the sum of the periods spent waiting to get into memory, waiting in the ready queue, executing on the CPU, and doing I/O (to be minimized).
Waiting time: the sum of the periods a process spends waiting in the ready queue (to be minimized).
Response time: in an interactive system, the amount of time it takes to start responding, not the time it takes to output that response (to be minimized).
Example of FCFS scheduling (processes P1, P2, P3 arriving at time 0, with burst times 24, 3, 3 ms):
Gantt chart for arrival order P1, P2, P3: P1 runs from 0 to 24, P2 from 24 to 27, P3 from 27 to 30.
The average waiting time (awt) = (0+24+27)/3 = 17 ms.
If the order of arrival is P2, P3, P1: P2 runs from 0 to 3, P3 from 3 to 6, P1 from 6 to 30.
awt = (6+0+3)/3 = 3 ms.
Example of SJF scheduling Consider the following set of processes that arrive at time 0, with the length of CPU-burst time given in millisecond as:
Process   Burst Time
P1        6
P2        8
P3        7
P4        3
We get the following Gantt chart: P4 runs from 0 to 3, P1 from 3 to 9, P3 from 9 to 16, P2 from 16 to 24.
The average waiting time = (3+16+9+0)/4 = 7 ms.
If FCFS is used, then average waiting time = 10.25 milliseconds
(continue)
For a batch process, a user specifies the length of the process time limit when submitting the job, so SJF is used frequently in long-term scheduling. SJF is difficult to implement in short-term CPU scheduling, as there is no way to know the length of the next CPU burst. We may instead predict the length of the next CPU burst and select the process with the shortest predicted burst. The next CPU burst is generally predicted as an exponential average of the measured lengths of previous CPU bursts.
Prediction of the next CPU burst
Let tn be the length of the nth CPU burst, and let τn+1 be the predicted value for the next CPU burst. Then for α, 0 ≤ α ≤ 1, define

  τn+1 = α·tn + (1 − α)·τn

tn contains the most recent information; τn stores the past history. α controls the relative weight of recent and past history: if α = 0, then τn+1 = τn; if α = 1, then τn+1 = tn. Usually α = 0.5, and recent history and past history are equally weighted.
The SJF algorithm may be either preemptive or nonpreemptive. A preemptive SJF algorithm will preempt the currently executing process if a newly arrived process has a shorter remaining time; a nonpreemptive SJF algorithm will allow the currently running process to finish its CPU burst. Preemptive SJF scheduling is sometimes called shortest-remaining-time-first scheduling.
We get the following Gantt chart: P1 runs from 0 to 1, P2 from 1 to 5, P4 from 5 to 10, P1 from 10 to 17, P3 from 17 to 26.
The average waiting time = ((10-1)+(1-1)+(17-2)+(5-3))/4 = 26/4 = 6.5 ms.
How long is the average waiting time for nonpreemptive SJF ? (7.75 milliseconds)
Priority Scheduling
In priority scheduling, a priority is associated with each process, and the CPU is allocated to the process with the highest priority. Equal-priority processes are scheduled in FCFS order. The SJF algorithm is a priority algorithm where the priority is the inverse of the (predicted) next CPU burst. In some priority representations, lower numbers represent higher priority.
Example of priority scheduling. Consider the following set of processes that arrive at time 0, with the length of the CPU burst given in milliseconds (lower number means higher priority):

Process   Burst Time   Priority
P1        10           3
P2        1            1
P3        2            3
P4        1            4
P5        5            2

Gantt chart: P2 runs from 0 to 1, P5 from 1 to 6, P1 from 6 to 16, P3 from 16 to 18, P4 from 18 to 19.
The average waiting time = (6+0+16+18+1)/5 = 8.2 ms.
Priorities can be defined internally or externally. Internally defined priorities use some measurable quantities, e.g. time limits, memory requirements, etc. Externally defined priorities can be based on the importance of the task, the funds paid, etc. Priority scheduling can be preemptive or nonpreemptive. A major problem is indefinite blocking, or starvation: in a heavily loaded system, a low-priority process may never get CPU time. Aging is a technique of gradually increasing the priority of processes that wait in the system for a long time; it can be used to avoid indefinite blocking.
The RR scheduling algorithm is preemptive. If a process's CPU burst exceeds one time quantum, that process is preempted and put back in the ready queue.
Example of RR scheduling. Consider the following set of processes that arrive at time 0, with the length of the burst time given in milliseconds:
Process   Burst Time
P1        24
P2        3
P3        3
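The schedule can be simulated as below. This is a sketch: the time quantum is not stated on this slide, so a quantum of 4 ms is assumed here.

```python
from collections import deque

# Sketch: round-robin simulation; all processes arrive at t = 0.
def rr_avg_wait(burst, quantum):
    n = len(burst)
    remaining = list(burst)
    queue = deque(range(n))
    t, wait = 0, [0] * n
    last_left = [0] * n              # time each process last left the CPU
    while queue:
        i = queue.popleft()
        wait[i] += t - last_left[i]  # time just spent in the ready queue
        run = min(quantum, remaining[i])
        t += run
        remaining[i] -= run
        last_left[i] = t
        if remaining[i] > 0:
            queue.append(i)
    return sum(wait) / n

print(rr_avg_wait([24, 3, 3], 4))    # 17/3, about 5.67 ms
```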
The performance of RR depends on the size of the time quantum. If the time quantum is very large, RR behaves the same as FCFS. If the time quantum is very small, RR becomes processor sharing. Context-switch time also needs to be taken into account. Turnaround time likewise depends on the size of the time quantum. As a rule of thumb, for good performance about 80% of the CPU bursts should be shorter than the time quantum.
[Figure: multilevel queue scheduling - separate ready queues ordered from high priority down to low priority, with batch processes in a low-priority queue]
Consider FCFS, SJF and RR (quantum = 10 ms) for five processes arriving at time 0.
FCFS:
P1 | P2 | P3 | P4 | P5
0    10   39   42   49   61
The average waiting time = (0+10+39+42+49)/5 = 28 ms
SJF: the average waiting time = (10+32+0+3+20)/5 = 13 ms
RR (time slice = 10 ms):
P1 | P2 | P3 | P4 | P5 | P2 | P5 | P2
0    10   20   23   30   40   50   52   61
The average waiting time = (0+32+20+23+40)/5 = 23 ms
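The three averages can be reproduced with a short sketch; the burst lengths (10, 29, 3, 7, 12 ms) are read off the FCFS Gantt chart above.

```python
from collections import deque

# Sketch: FCFS, nonpreemptive SJF and RR for five jobs arriving at t = 0.
def fcfs_avg_wait(burst):
    t, total = 0, 0
    for b in burst:
        total += t      # each job waits for all jobs ahead of it
        t += b
    return total / len(burst)

def sjf_avg_wait(burst):
    # With all arrivals at t = 0, SJF is just FCFS on the sorted bursts.
    return fcfs_avg_wait(sorted(burst))

def rr_avg_wait(burst, quantum):
    remaining = list(burst)
    queue = deque(range(len(burst)))
    t, wait, last = 0, [0] * len(burst), [0] * len(burst)
    while queue:
        i = queue.popleft()
        wait[i] += t - last[i]
        run = min(quantum, remaining[i])
        t += run
        remaining[i] -= run
        last[i] = t
        if remaining[i]:
            queue.append(i)
    return sum(wait) / len(burst)

burst = [10, 29, 3, 7, 12]
print(fcfs_avg_wait(burst))      # 28.0
print(sjf_avg_wait(burst))       # 13.0
print(rr_avg_wait(burst, 10))    # 23.0
```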
Queuing modeling
The CPU is treated as a server and the ready queue contains the waiting processes. Knowing the arrival rates and the service rates, the queue can be analyzed and optimized, e.g. for shorter waiting time. The difficulty of applying this model lies in the complexity of the queue analysis for the various cases.
Little's formula: n = λW
where
n is the average queue length (excluding the process being served),
λ is the average arrival rate for new processes,
W is the average waiting time in the queue.
Example: 7 processes arrive per second and normally 14 processes are found in the queue, so W = 14/7 = 2 seconds.
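Rearranged as W = n / λ, the example works out as:

```python
# Sketch: Little's formula n = lambda * W, solved for W.
arrival_rate = 7        # processes per second (lambda)
queue_length = 14       # average number of processes in the queue (n)
W = queue_length / arrival_rate
print(W)                # 2.0 seconds
```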
Race condition in the concurrent execution of the modified solution to the producer-consumer problem (the version using a counter variable so that more than n-1 items can be held in the buffer)
Code for the producer process:
while (1) {
    /* produce an item in nextProduced */
    while (counter == BUFFER_SIZE) ;   /* do nothing */
    buffer[in] = nextProduced;
    in = (in + 1) % BUFFER_SIZE;
    counter++;
}

Code for the consumer process:
while (1) {
    while (counter == 0) ;   /* do nothing */
    nextConsumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    counter--;
    /* consume the item in nextConsumed */
}
Process Synchronization
When the modified algorithms are executed in parallel, the final result depends on the actual sequence of the instruction execution. Example:
counter = 5; the correct result after concurrent execution of both the consumer and producer algorithms should be 5. What will the answer be if the counter increment and decrement are implemented in assembly language as follows?
Code for the producer process:
register1 := counter;
register1 := register1 + 1;
counter := register1;
Code for the consumer process:
register2 := counter;
register2 := register2 - 1;
counter := register2;
The actual concurrent execution interleaves the instructions of both processes, e.g.:
T0: producer executes register1 := counter        {register1 = 5}
T1: producer executes register1 := register1 + 1  {register1 = 6}
T2: consumer executes register2 := counter        {register2 = 5}
T3: consumer executes register2 := register2 - 1  {register2 = 4}
T4: producer executes counter := register1        {counter = 6}
T5: consumer executes counter := register2        {counter = 4}
By allowing both processes to manipulate the counter variable concurrently, the final result can be 4. If T4 and T5 are executed in the reverse order, counter = 6. Hence, process synchronization is necessary.
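The interleaving can be replayed mechanically; the sketch below models each process's register and the shared counter:

```python
# Sketch: replaying the T0..T5 interleaving on a shared counter.
def run(schedule, counter=5):
    reg = {}                       # each process's private register
    for who, op in schedule:
        if op == "load":
            reg[who] = counter
        elif op == "inc":
            reg[who] += 1
        elif op == "dec":
            reg[who] -= 1
        elif op == "store":
            counter = reg[who]
    return counter

interleaved = [("prod", "load"), ("prod", "inc"),
               ("cons", "load"), ("cons", "dec"),
               ("prod", "store"), ("cons", "store")]
print(run(interleaved))            # 4: the consumer's store wins

swapped = interleaved[:4] + [("cons", "store"), ("prod", "store")]
print(run(swapped))                # 6: the producer's store wins
```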
Consider a system consisting of n processes. Each process has a segment of code, called a critical section.
The critical section may contain instructions that change common variables, updating a table and so on.
A protocol is to be designed such that each process must request permission to enter its critical section.
General structure of a typical process The section of code implementing the request to enter its critical-section is the entry section. The critical-section may be followed by the exit section. The remaining code is the remainder section. do { entry section critical section exit section remainder section } while (1);
A solution to the critical-section problem must satisfy the following three requirements:
1. Mutual Exclusion: if a process is executing in its critical section, then no other process can be executing in its critical section.
2. Progress: if no process is executing in its critical section and some processes wish to enter their critical sections, then only those processes that are not executing in their remainder sections can participate in the decision of which will enter its critical section next, and this selection cannot be postponed indefinitely.
3. Bounded Waiting: there is a bound on the number of times that other processes are allowed to enter their critical sections after a process has made a request to enter its critical section and before that request is granted.
Two-process solution (Algorithm 1)
For two processes named Pi and Pj, with i == 0 and j == 1. One common variable is shared by Pi and Pj:
int turn;
Initially, turn = 0 or turn = 1.
do {   /* for process Pi */
    while (turn != i) ;
        critical section
    turn = j;
        remainder section
} while (1);
(replace i by j and j by i in the above for process Pj)
It fails requirement 2 (progress): the processes must strictly alternate in executing their critical sections. E.g. with turn == 0, P1 cannot enter its critical section even while P0 stays in its remainder section.
Two-process solution (Algorithm 2)
For two processes named Pi and Pj, with i == 0 and j == 1. One common variable is shared by Pi and Pj:
boolean flag[2];
Initially both flags are set to false.
do {   /* for process Pi */
    flag[i] = true;
    while (flag[j]) ;
        critical section
    flag[i] = false;
        remainder section
} while (1);
It fails requirement 2 (progress). Consider the following execution sequence: P0 sets flag[0] = true, then P1 sets flag[1] = true; both processes will then loop forever. Switching the order of the two entry-section instructions instead violates requirement 1 (mutual exclusion).
Two-process solution (Algorithm 3)
Two common variables are shared by Pi and Pj:
boolean flag[2];   /* initially both false */
int turn;          /* initially 0 or 1 */
do {   /* for process Pi */
    flag[i] = true;
    turn = j;
    while (flag[j] && turn == j) ;
        critical section
    flag[i] = false;
        remainder section
} while (1);
How are the three requirements met?
Requirement 1: Pi enters its critical section only if either flag[j] == false (i.e. Pj is not trying to enter) or turn == i. Since turn can be 0 or 1 but not both at once, mutual exclusion is preserved.
Requirements 2 & 3: Pi can be prevented from entering its critical section only if it is stuck in the while loop with flag[j] == true and turn == j. When Pj is in its remainder section, flag[j] == false, so Pi will not wait forever. Also, since Pj must set turn = i before trying to re-enter, Pj cannot enter its critical section repeatedly while Pi waits. Progress and bounded waiting are therefore satisfied.
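As an illustration, Algorithm 3 can be exercised with two real threads. This is a sketch: CPython interleaves threads at bytecode granularity, which is sequentially consistent enough for the protocol to hold here; on real hardware with a weaker memory model, explicit memory barriers would be needed.

```python
import sys
import threading

# Sketch: Algorithm 3 (Peterson's algorithm) protecting a shared counter.
sys.setswitchinterval(1e-4)     # force frequent thread interleaving

flag = [False, False]
turn = 0
counter = 0                     # shared variable the protocol protects
N = 1000

def worker(i):
    global turn, counter
    j = 1 - i
    for _ in range(N):
        flag[i] = True          # entry section: declare interest
        turn = j                # give the other process priority
        while flag[j] and turn == j:
            pass                # busy wait
        counter += 1            # critical section (not atomic by itself)
        flag[i] = False         # exit section

threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                  # 2 * N = 2000: no update was lost
```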
Multiple-process solution (the bakery algorithm). Each process receives a number. The process with the lowest number is served next. If two processes receive the same number, the process with the smaller process identification number is served first. The common data structures are
boolean choosing[n];
int number[n];
Initially these are set to false and 0 respectively. Define the following notation:
(a, b) < (c, d) if a < c, or if a == c and b < d;
k = max(a0, ..., an-1) is a number k such that k >= ai for i = 0, ..., n-1.
do {   /* bakery algorithm for process Pi */
    choosing[i] = true;
    number[i] = max(number[0], ..., number[n-1]) + 1;
    choosing[i] = false;
    for (j = 0; j < n; j++) {
        while (choosing[j]) ;
        while ((number[j] != 0) && ((number[j], j) < (number[i], i))) ;
    }
        critical section
    number[i] = 0;
        remainder section
} while (1);
Semaphores
It is a synchronization tool for more complex problems. A semaphore S is an integer variable that, apart from initialization, is accessed only through two standard atomic operations: wait and signal (also termed P and V respectively). An atomic operation is one that, once started, runs to completion without interruption; i.e. when one process modifies the semaphore value, no other process can simultaneously modify that same semaphore value.
The classical definitions of wait and signal are:
wait(S) {
    while (S <= 0) ;   /* no-op */
    S--;
}
signal(S) {
    S++;
}
However, the mutual-exclusion solutions studied so far all require busy waiting, which wastes CPU cycles. This type of semaphore is also called a spinlock.
Example of using semaphores: the producer-consumer problem.
Semaphore empty = n;   /* counts empty buffer slots */
Semaphore full = 0;    /* counts full buffer slots */
Semaphore mutex = 1;   /* protects the buffer */
Code for the producer process:
do {
    /* produce an item in nextp */
    wait(empty);
    wait(mutex);
    /* add nextp to the buffer */
    signal(mutex);
    signal(full);
} while (1);
Code for the consumer process:
do {
    wait(full);
    wait(mutex);
    /* remove an item from the buffer into nextc */
    signal(mutex);
    signal(empty);
    /* consume the item in nextc */
} while (1);
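A runnable sketch of the same structure, substituting Python's threading.Semaphore for the abstract wait and signal operations; the buffer size and item count are chosen arbitrarily for illustration:

```python
import threading
from collections import deque

# Sketch: bounded-buffer producer-consumer with counting semaphores.
BUFFER_SIZE = 4
buffer = deque()
empty = threading.Semaphore(BUFFER_SIZE)   # counts empty buffer slots
full = threading.Semaphore(0)              # counts full buffer slots
mutex = threading.Semaphore(1)             # protects the buffer
ITEMS = 100
consumed = []

def producer():
    for item in range(ITEMS):
        empty.acquire()        # wait(empty)
        mutex.acquire()        # wait(mutex)
        buffer.append(item)
        mutex.release()        # signal(mutex)
        full.release()         # signal(full)

def consumer():
    for _ in range(ITEMS):
        full.acquire()         # wait(full)
        mutex.acquire()        # wait(mutex)
        consumed.append(buffer.popleft())
        mutex.release()        # signal(mutex)
        empty.release()        # signal(empty)

p = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
p.start(); c.start(); p.join(); c.join()
print(consumed == list(range(ITEMS)))      # True: every item arrives in order
```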
Examples of semaphore use. Let us define
P(s): [ while (s == 0) ; s--; ]
V(s): [ s++; ]
P(s) is then wait(s) and V(s) is signal(s).
semaphore mutex = 1;
fork(proc_0, 0); fork(proc_1, 0);
proc_0() {
    while (TRUE) {
        <compute section>;
        P(mutex);
        <critical section>
        V(mutex);
    }
}
proc_1() {
    while (TRUE) {
        <compute section>;
        P(mutex);
        <critical section>
        V(mutex);
    }
}
Using semaphores to synchronize two processes.
semaphore s1 = 0; semaphore s2 = 0;
fork(proc_A, 0); fork(proc_B, 0);
proc_A() {
    while (TRUE) {
        <compute A1>;
        write(x);
        V(s1);   /* signal to proc_B */
        <compute A2>;
        P(s2);   /* wait for proc_B */
        read(y);
    }
}
proc_B() {
    while (TRUE) {
        P(s1);   /* wait for proc_A */
        read(x);
        <compute B1>;
        write(y);
        V(s2);   /* signal to proc_A */
        <compute B2>;
    }
}
Example: the reader-writer problem. Problem statement: a resource is shared by a community of processes of two distinct types, readers and writers. A reader can share the resource with any other reader process but not with any writer process. A writer requires exclusive access to the resource whenever it has any access to it.
Deadlock and starvation. Consider two processes, P0 and P1, each accessing two semaphores, S and Q, initially set to the value 1:
P0:            P1:
wait(S);       wait(Q);
wait(Q);       wait(S);
...            ...
signal(S);     signal(Q);
signal(Q);     signal(S);
Suppose P0 executes wait(S) and then P1 executes wait(Q). When P0 executes wait(Q), it must wait until P1 executes signal(Q). Similarly, when P1 executes wait(S), it must wait until P0 executes signal(S). Since neither signal can ever be executed, P0 and P1 are deadlocked.
The Dining-Philosophers Problem. Consider 5 philosophers who spend their lives thinking and eating. On a round table there are 5 plates and 5 chopsticks. When a philosopher eats, he uses the 2 chopsticks closest to him, and releases them only when he finishes. Each chopstick is represented by a semaphore. A philosopher executes a wait on a chopstick's semaphore to grab it; when he finishes, a signal is executed on the semaphore to release the chopstick.
semaphore chopsticks[5];
This algorithm guarantees that no two neighbours eat at the same time. However, if all philosophers become hungry at the same time and each grabs his left chopstick, none can grab a right chopstick. The result is a deadlock in which every philosopher starves.
Deadlock
Necessary conditions for deadlock:
1. Mutual exclusion: at least one resource must be held in a non-sharable mode, i.e. only one process at a time can use the resource. Other processes requesting the resource must wait until it is released.
2. Hold and wait: a process is holding at least one resource and is waiting to acquire additional resources that are currently held by other processes.
(E455, 1998/99 semester)
3. No preemption: resources cannot be preempted, i.e. a resource can be released only voluntarily by the process holding it, after that process has completed its task.
4. Circular wait: a set of waiting processes {P0, P1, ..., Pn} exists such that P0 is waiting for a resource held by P1, P1 is waiting for a resource held by P2, ..., Pn-1 is waiting for a resource held by Pn, and Pn is waiting for a resource held by P0.
All four conditions must hold for a deadlock to occur. The circular-wait condition implies the hold-and-wait condition.
Resource-Allocation (R-A) graph for deadlock description. The R-A graph is a directed graph consisting of a set of vertices V and a set of edges E. V is partitioned into P = {P1, P2, ..., Pn}, the set of all processes in the system, and R = {R1, R2, ..., Rm}, the set of all resource types in the system.
[Figure: a resource-allocation graph with processes P1, P2, P3 and resource types R1-R4]
A directed edge from a process to a resource, Pi -> Rj, is a request edge; it signifies that Pi has requested an instance of Rj and is currently waiting for it. A directed edge from a resource to a process, Rj -> Pi, is an assignment edge; it signifies the allocation of an instance of Rj to Pi. If the graph contains no cycle, then no process in the system is deadlocked. If the graph contains a cycle, then a deadlock may exist: when a resource type has several instances, a cycle does not necessarily imply that a deadlock has occurred.
[Figure: a resource-allocation graph containing a cycle that does imply deadlock]
[Figure: a resource-allocation graph containing a cycle but no deadlock]
To ensure that deadlocks never occur, a deadlock-prevention or a deadlock-avoidance scheme can be used. Deadlock detection and recovery, in contrast, allows deadlocks to occur.
Deadlock Prevention
1. Mutual exclusion (prevention of): allow resources to be sharable as much as possible.
2. Hold and wait (prevention of): ensure that whenever a process requests a resource, it does not hold any other resources. Possible ways are: allocate all requested resources to a process before the process begins execution, or let a process request resources only when it holds none. Two main disadvantages:
1. low resource utilization: resources may be allocated but unused for a long time;
2. starvation: a process that needs several popular resources may have to wait indefinitely.
3. No preemption (prevention of): one way to allow resource preemption is: if a process that is holding some resources requests another resource that cannot be immediately allocated, then all the resources it currently holds are preempted. The preempted resources are added to the list of resources for which the process is waiting. The process is restarted only when it can regain all its old resources as well as the new ones.
4. Circular wait (prevention of): impose a total ordering on all resource types and require that each process requests resources in increasing order of enumeration. E.g. with F(tape drive) = 1, F(disk drive) = 5, F(printer) = 12, a process that needs both the tape drive and the printer must request the tape drive first and then the printer.
Deadlock Avoidance
Given a priori information about each process, e.g. the maximum number of resources of each type that it may request, a deadlock-avoidance algorithm can be constructed. The algorithm dynamically examines the resource-allocation state to ensure that a circular-wait condition can never arise. The resource-allocation state is defined by the number of available and allocated resources and the maximum demands of the processes.
Safe state
A safe state is one in which the system can allocate resources to each process (up to its maximum) in some order and still avoid a deadlock. A system is in a safe state only if there exists a safe sequence of resource allocation. A safe state is a deadlock-free state.
[Figure: a claim edge from P2 to R2 in a resource-allocation graph; if the request for R2 made by P2 (left) is granted, the system goes into an unsafe state (right)]
Banker's algorithm
A bank never allocates its available cash in such a way that it can no longer satisfy the needs of all its customers. Likewise, when a new process enters the system, it must declare the maximum number of instances of each resource type that it may need. On each request, the system determines whether granting the resources would leave the system in a safe state. If a safe sequence of resource allocation exists, the request is granted; otherwise, the process must wait until some other processes release enough resources.
Data structures for the banker's algorithm (n processes, m resource types):
Available: a vector of length m indicating the number of available resources of each type. Available[j] = k means k instances of Rj are available.
Max: an n x m matrix defining the maximum demand of each process. Max[i, j] = k means Pi may request at most k instances of Rj.
Allocation: an n x m matrix defining the number of resources of each type currently allocated to each process. Allocation[i, j] = k means Pi is currently allocated k instances of Rj.
(continued)
Need: an n x m matrix indicating the remaining resource need of each process, Need[i, j] = Max[i, j] - Allocation[i, j].
Allocationi denotes the resources currently allocated to Pi; Needi specifies the additional resources Pi may still request to complete its task.
The safety algorithm
1. Let Work and Finish be vectors of length m and n respectively. Initialize Work := Available and Finish[i] := false for i = 1, 2, ..., n.
2. Find an i such that Finish[i] = false and Needi <= Work. If no such i exists, go to step 4.
3. Work := Work + Allocationi; Finish[i] := true; go to step 2.
4. If Finish[i] = true for all i, the system is in a safe state; otherwise it is unsafe.
The resource-request algorithm
1. If Requesti <= Needi, go to step 2; otherwise, signal an error (the process has exceeded its maximum claim).
2. If Requesti <= Available, go to step 3; otherwise Pi must wait, since the resources are not available.
3. Pretend to allocate the requested resources to Pi:
Available := Available - Requesti;
Allocationi := Allocationi + Requesti;
Needi := Needi - Requesti;
If the resulting state is safe, Pi is allocated the requested resources. Otherwise, the old resource-allocation state is restored and Pi must wait.
Example. Consider 5 processes P0, P1, ..., P4 and 3 resource types A, B, C. A has 10 instances, B has 5 instances and C has 7 instances. At time T0, the snapshot of the system is as follows.
      Allocation   Max
      A B C        A B C
P0    0 1 0        7 5 3
P1    2 0 0        3 2 2
P2    3 0 2        9 0 2
P3    2 1 1        2 2 2
P4    0 0 2        4 3 3
Available: A B C = 3 3 2
      Need
      A B C
P0    7 4 3
P1    1 2 2
P2    6 0 0
P3    0 1 1
P4    4 3 1
< P1, P3, P4, P0, P2 > satisfies the safety criteria. Therefore, the system is safe.
A trace of the safety algorithm for the example:
Work = 3 3 2,   Finish = 0 0 0 0 0
P1 runs:  Work = 5 3 2,   Finish = 0 1 0 0 0,  order = 1
P3 runs:  Work = 7 4 3,   Finish = 0 1 0 1 0,  order = 1 3
P4 runs:  Work = 7 4 5,   Finish = 0 1 0 1 1,  order = 1 3 4
P0 runs:  Work = 7 5 5,   Finish = 1 1 0 1 1,  order = 1 3 4 0
P2 runs:  Work = 10 5 7,  Finish = 1 1 1 1 1,  order = 1 3 4 0 2
Suppose P1 requests one additional instance of A and two instances of C: Request1 = (1, 0, 2). Since Request1 <= Available, pretend that this request can be fulfilled; the new state will be
      Allocation   Need
      A B C        A B C
P0    0 1 0        7 4 3
P1    3 0 2        0 2 0
P2    3 0 2        6 0 0
P3    2 1 1        0 1 1
P4    0 0 2        4 3 1
Available: A B C = 2 3 0
< P1, P3, P4, P0, P2 > satisfies the safety criteria. Therefore, the new state is safe and the request can be granted.
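The safety algorithm and the resource-request check can be sketched directly from the definitions, using the matrix values of the example above:

```python
# Sketch: the banker's safety algorithm plus the resource-request check.
def is_safe(available, allocation, need):
    """Return (safe?, one safe order found by greedy search)."""
    work = list(available)
    finish = [False] * len(allocation)
    order = []
    while True:
        for i, done in enumerate(finish):
            if not done and all(n <= w for n, w in zip(need[i], work)):
                # Pi can finish: it releases everything it holds.
                work = [w + a for w, a in zip(work, allocation[i])]
                finish[i] = True
                order.append(i)
                break
        else:
            return all(finish), order

available = [3, 3, 2]
allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
maxdemand  = [[7, 5, 3], [3, 2, 2], [9, 0, 2], [2, 2, 2], [4, 3, 3]]
need = [[m - a for m, a in zip(mx, al)]
        for mx, al in zip(maxdemand, allocation)]

safe, order = is_safe(available, allocation, need)
print(safe, order)        # True [1, 3, 0, 2, 4] (another valid safe order)

# Resource-request algorithm: pretend to grant P1's request (1, 0, 2).
req, i = [1, 0, 2], 1
assert all(r <= n for r, n in zip(req, need[i]))        # within Need1
assert all(r <= a for r, a in zip(req, available))      # within Available
available = [a - r for a, r in zip(available, req)]
allocation[i] = [al + r for al, r in zip(allocation[i], req)]
need[i] = [n - r for n, r in zip(need[i], req)]
print(is_safe(available, allocation, need)[0])          # True: grant it
```

Note the greedy search returns the first safe order it finds, which need not match the < P1, P3, P4, P0, P2 > sequence quoted above; any safe order suffices.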
If P0 then requests (0, 2, 0) and we pretend to grant it, the state becomes:
      Need
      A B C
P0    7 2 3
P1    0 2 0
P2    6 0 0
P3    0 1 1
P4    4 3 1
Available: A B C = 2 1 0
No process's Need can now be satisfied from Available, so this state is unsafe: the request must be denied.
Deadlock Detection
Single Instance of Each Resource Type:
We can define a deadlock-detection algorithm that uses a variant of the resource-allocation graph, called a wait-for graph.
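A minimal sketch of that detection scheme: represent the wait-for graph as an adjacency map and look for a cycle with depth-first search (the process names below are illustrative):

```python
# Sketch: deadlock detection on a wait-for graph (single instance of
# each resource type) by searching for a cycle among the processes.
def has_cycle(wait_for):
    """wait_for maps each process to the processes it is waiting on."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {p: WHITE for p in wait_for}

    def visit(p):
        color[p] = GREY
        for q in wait_for.get(p, ()):
            if color.get(q) == GREY:
                return True                  # back edge: a cycle exists
            if color.get(q) == WHITE and visit(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and visit(p) for p in wait_for)

# P1 waits for P2, P2 waits for P3: no cycle, no deadlock.
print(has_cycle({"P1": ["P2"], "P2": ["P3"], "P3": []}))        # False
# Closing the cycle P3 -> P1 deadlocks all three processes.
print(has_cycle({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}))    # True
```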
Deadlock Recovery
Process Termination
a. Abort all deadlocked processes b. Abort one process at a time until the deadlock cycle is eliminated.
Resource Preemption
a. Selecting a victim: choose which process (and which of its resources) to preempt.
b. Rollback: roll the victim back to some safe state and restart it from there.
c. Starvation: ensure that a process can be picked as a victim only a (small) finite number of times.
Memory Management
Background
Address binding: a user program goes through several steps before being executed. Addresses in the source program are generally symbolic (such as count). A compiler binds symbolic addresses to relocatable addresses (e.g. "14 bytes from the beginning of this module"). The linker or loader in turn binds relocatable addresses to absolute addresses. Each binding is a mapping from one address space to another.
[Figure: multistep processing of a user program - the source program is compiled into an object module, which the linker combines with system libraries into an executable image]
Logical vs physical address space. A logical address is an address generated by the CPU; a physical address is the address seen by the memory unit. Compile-time and load-time address binding result in identical logical and physical addresses. Execution-time address binding results in differing logical and physical addresses; the logical address is then referred to as a virtual address. The run-time mapping from virtual to physical addresses is done by a hardware device known as the memory-management unit (MMU). Dynamic loading: a routine is not loaded until it is called.
Dynamic linking and shared libraries: with static linking, the system libraries are treated like any other module and are combined by the loader into the binary program image. With dynamic linking, a stub is included in the image for each library-routine reference. A stub is a small piece of code that indicates how to locate the appropriate memory-resident library routine, or how to load the library if the routine is not already present. With shared libraries, more than one version of a library may be loaded into memory; programs compiled with different library versions use the corresponding version.
Swapping
A process needs to be in memory to be executed. A process, however, can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution.
Contiguous memory allocation. Memory is usually divided into two partitions: one for the resident OS, and one for user processes. We usually want several user processes to reside in memory at the same time. In contiguous memory allocation, each process is contained in a single contiguous section of memory.
Memory protection is needed: a limit register and a relocation register can be used together to check the range of each logical address and map it into physical memory.
[Figure: hardware support for relocation and limit registers - the logical address is compared with the limit register (a trap/addressing error is raised if it exceeds the limit), then added to the relocation register to give the physical memory address]
Multiple-partition allocation. When several user processes need to reside in memory at the same time, a fixed-size partition can be allocated to each process. Each partition may contain exactly one process, so the degree of multiprogramming is bounded by the number of partitions. When a partition is free, a process is loaded into it; when the process terminates, the partition becomes available for another process. The operating system keeps a table indicating which parts of memory are available and which are occupied.
Multiple partitions by pure memory segmentation. The OS keeps a table tracking current memory usage. A block of available memory is called a hole; the OS keeps a list of the holes. When a process arrives, a hole large enough for it is allocated (filled).
Example: total memory = 2560K, resident OS = 400K, leaving 2160K for user processes. Job scheduling: FCFS.
[Memory map: OS occupies 0K-400K; user memory runs from 400K to 2560K]
(continued)
Job queue:
Process   Memory   Time
P1        600K     10
P2        1000K    5
P3        300K     20
P4        700K     8
P5        500K     15
[Memory maps: initially P1 (400K-1000K), P2 (1000K-2000K) and P3 (2000K-2300K) are allocated, with a 260K hole at the top; P2 terminates, leaving a 1000K hole; P4 is then allocated into part of it (1000K-1700K)]
(continued)
[Memory maps: P1 terminates, leaving a 600K hole at 400K; P5 is then allocated into part of it (400K-900K), leaving holes of 100K (900K-1000K), 300K (1700K-2000K) and 260K (2300K-2560K)]
Memory hole-filling strategies:
First-fit: allocate the first hole that is big enough.
Best-fit: allocate the smallest hole that is big enough.
Worst-fit: allocate the largest hole.
The best-fit and worst-fit strategies require searching the entire list of holes before a decision can be made, unless the list is kept sorted by size.
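The three strategies can be sketched as follows; the hole sizes and the 212K request below are hypothetical values chosen for illustration:

```python
# Sketch: first-fit, best-fit and worst-fit hole selection.
def first_fit(holes, size):
    return next((i for i, h in enumerate(holes) if h >= size), None)

def best_fit(holes, size):
    fits = [(h, i) for i, h in enumerate(holes) if h >= size]
    return min(fits)[1] if fits else None   # smallest hole that fits

def worst_fit(holes, size):
    fits = [(h, i) for i, h in enumerate(holes) if h >= size]
    return max(fits)[1] if fits else None   # largest hole

holes = [100, 500, 200, 300, 600]   # hypothetical hole sizes in KB
print(first_fit(holes, 212))   # 1: the 500K hole (first big enough)
print(best_fit(holes, 212))    # 3: the 300K hole (tightest fit)
print(worst_fit(holes, 212))   # 4: the 600K hole (largest)
```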
External fragmentation. In a multiple-partition allocation scheme, as processes are loaded into and removed from memory, the free memory space is broken into little pieces, i.e. fragments. External fragmentation exists when enough total memory space exists to satisfy a request, but the space is not contiguous. In the previous example, the total external fragmentation is 300K + 260K = 560K; P5 needs only 500K but cannot be allocated, because no single hole is big enough. External fragmentation can lead to memory wastage.
Internal fragmentation is memory that is internal to a partition but is not being used.
Example: the next request is for 18462 bytes and the available hole is 18464 bytes. Allocating exactly 18462 bytes would leave a tiny 2-byte hole, and keeping track of many such tiny holes creates more overhead than they are worth. The whole hole is therefore allocated, and the 2 unused bytes become internal fragmentation.
Compaction: memory contents are shuffled to place all free memory together in one large block.
Example (continuing the previous memory map):
Before compaction: OS (0-400K), P5 (400-900K), hole (900-1000K), P4 (1000-1700K), hole (1700-2000K), P3 (2000-2300K), hole (2300-2560K).
After compaction: OS (0-400K), P5 (400-900K), P4 (900-1600K), P3 (1600-1900K), and a single 660K hole (1900-2560K).
Paging allows non-contiguous memory allocation. Physical memory is broken into fixed-size blocks called frames. Logical memory is broken into blocks of the same size called pages (typically 4KB to 8KB per page). A process specifies its memory requirement in terms of pages. When a process is to be executed, its pages are loaded into any available memory frames.
[Figure: paging hardware - the CPU's logical address (p, d) is translated through the page table into the physical address (f, d)]
[Figure: paging example - pages 0, 1, 2, 3 of logical memory are mapped by the page table to frames 1, 4, 3, 7 of physical memory]
Every address generated by the CPU is divided into two parts: a page number (p) and a page offset (d). p is used as an index into the page table, which contains the base address of each page in physical memory. External fragmentation is avoided, but internal fragmentation can still exist. From the user's view, memory is seen as one single contiguous space; in reality the user program is scattered throughout physical memory. The address-translation hardware hides this difference from the user, and the OS keeps track of all the allocation details.
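A sketch of the translation, using the 4-entry page table from the figure above and an artificially small page size of 4 bytes (an assumption made purely so the arithmetic is easy to follow):

```python
# Sketch: logical-to-physical address translation in a paged system.
PAGE_SIZE = 4                 # hypothetical tiny page for illustration
page_table = [1, 4, 3, 7]     # page p is stored in frame page_table[p]

def translate(logical):
    p, d = divmod(logical, PAGE_SIZE)     # page number, page offset
    return page_table[p] * PAGE_SIZE + d  # frame base + offset

print(translate(0))    # page 0, offset 0 -> frame 1 -> physical 4
print(translate(13))   # page 3, offset 1 -> frame 7 -> physical 29
```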
Page tables are normally kept in main memory, so each lookup costs a (slow) memory access. Special hardware called the translation look-aside buffer (TLB) is used to speed up page lookup. The percentage of times that a particular page number is found in the TLB is called the hit ratio. Example: if it takes 20 ns to search the TLB and 100 ns to access memory, then a mapped memory access takes 120 ns when the page number is in the TLB. If we fail to find the page number in the TLB (20 ns), we must first access memory for the page-table entry and frame number (100 ns), and then access the desired byte in memory (100 ns), for a total of 220 ns.
To find the effective memory-access time, we weight each case by its probability:
for an 80% hit ratio, effective access time = 0.80 x 120 + 0.20 x 220 = 140 ns;
for a 98% hit ratio, effective access time = 0.98 x 120 + 0.02 x 220 = 122 ns.
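The weighted average can be checked directly (20 ns TLB search and 100 ns memory access, as above):

```python
# Sketch: TLB effective access time as a weighted average (in ns).
def effective_access_time(hit_ratio, tlb=20, mem=100):
    hit_time = tlb + mem           # 120 ns: page number found in the TLB
    miss_time = tlb + mem + mem    # 220 ns: extra page-table access
    return hit_ratio * hit_time + (1 - hit_ratio) * miss_time

print(effective_access_time(0.80))   # ~140 ns
print(effective_access_time(0.98))   # ~122 ns
```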
Memory protection in a paged environment: protection bits are associated with each frame. One bit can define a page as read/write or read-only. An attempt to write to a read-only page causes a hardware trap to the OS (a memory-protection violation).
Program segmentation. A program containing subroutines, procedures, functions or modules is, from the user's view, a collection of code segments, each performing its intended task. Segments can have different lengths. A segment table records the segmentation of the program, and a logical address consists of a two-tuple <segment-number, offset>.
[Figure: segmentation hardware - the segment number indexes the segment table; the offset is checked against the segment's limit (trap on addressing error) and added to its base to form the physical-memory address]
[Figure: example of segmentation - a logical address space with segment 0 (subroutine), segment 1 (Sqrt), segment 2 (main program), segment 3 (stack) and segment 4 (symbol table); the segment table maps them to bases 1400, 6300, 4300, 3200 and 4700 in physical memory]
Segmentation with paging. Example: the Intel 386. The logical address space of a process is divided into two partitions. The first partition consists of up to 8K segments that are private to that process; information about it is kept in the local descriptor table (LDT). The second partition consists of up to 8K segments that are shared among all processes; information about it is kept in the global descriptor table (GDT). Each segment is paged and each page is 4KB. A two-level paging scheme is used: the first level is a page directory, the second level the page table.
The logical address is a pair (selector, offset). The selector is a 16-bit number:
s (13 bits) | g (1 bit) | p (2 bits)
where s is the segment number, g indicates whether the segment is in the GDT or the LDT, and p deals with protection. The offset is a 32-bit number specifying the location of the byte (or word) within the segment. The physical address is 32 bits. The paged logical address is divided as follows:
page number, level 1 (p1): 10 bits | page number, level 2 (p2): 10 bits | page offset (d): 12 bits
Virtual Memory
Virtual memory is a technique that allows the execution of processes that may not be completely in memory. Virtual memory makes the task of programming much easier, because the programmer no longer needs to worry about the amount of physical memory available. Virtual memory is commonly implemented by demand paging. Demand segmentation can also be used to provide virtual memory.
[Figure: diagram showing virtual memory larger than physical memory, with the virtual memory map backed by pages on disk]
[Figure: paged swapping - pages of programs A and B are swapped out to disk and swapped in to physical memory as needed]
Demand paging: a paging system with swapping. When a process is to be executed, only the pages needed immediately are swapped into memory, rather than all the pages of the entire process. This decreases the swap time and the amount of physical memory required. A valid-invalid bit scheme distinguishes the pages that are in memory from those on the disk: valid indicates that the associated page is both legal and in memory; invalid indicates that the page is either invalid (not in the logical address space of the process) or valid but currently on the disk.
[Diagram: page table when some pages are not in main memory. Logical memory holds pages A to H; pages A, C, and F are resident in physical memory with their entries marked valid, while the remaining pages reside on disk with their entries marked invalid]
The hardware needed to support demand paging: a page table (with valid-invalid bits) and swap space on disk. Access to a page marked invalid causes a page-fault trap to the OS, which services it in six steps:
1. A reference to the page (e.g. "load M") finds its entry marked invalid.
2. Trap to the OS.
3. The OS locates the page on the disk (swap space).
4. The page is loaded into a free frame of physical memory.
5. The page table is reset (frame number entered, bit set to valid).
6. The instruction is restarted.
[Diagram: page-fault handling, showing the page table, a free frame in physical memory, and the page on the backing store]
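The valid-invalid mechanism can be illustrated with a toy simulator. This is a sketch only: the data structures and names are invented for illustration, and it assumes a free frame is always available (no replacement yet).

```python
# Minimal demand-paging sketch: page-table entries carry a valid bit;
# touching an invalid page triggers the fault steps listed above.

disk = {0: "A", 1: "B", 2: "C"}          # backing store: page -> contents
page_table = {0: {"valid": False, "frame": None},
              1: {"valid": False, "frame": None},
              2: {"valid": False, "frame": None}}
memory = [None, None]                     # two physical frames
free_frames = [0, 1]

def access(page):
    entry = page_table[page]
    if not entry["valid"]:                # trap: page fault
        frame = free_frames.pop()         # assumes a free frame exists
        memory[frame] = disk[page]        # locate page on disk, load it in
        entry["frame"], entry["valid"] = frame, True  # reset page table
    return memory[entry["frame"]]         # restart the access

print(access(0), access(2), access(0))   # prints: A C A (two faults, one hit)
```

When no free frame exists, step 2b of the page-replacement procedure described later would choose a victim instead.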
Performance of demand paging. Let p be the probability of a page fault (0 <= p <= 1). The effective access time (e.a.t.) is
    e.a.t. = (1 - p) * ma + p * page fault time
where ma is the memory access time. If no page faults occur, p = 0 and e.a.t. = ma. To compute the e.a.t., we must know the time needed to service a page fault. The page-fault service time is due to:
- servicing the page-fault interrupt
- reading in the needed page
- restarting (resuming) the process
Example. What is the e.a.t. if the page fault time is 25 ms and ma = 100 ns?
    e.a.t. = (1 - p) * 100 + p * 25 * 10^6 = 100 + 24,999,900p (in ns)
Therefore the e.a.t. is proportional to the page-fault rate p. If one access out of 1,000 causes a page fault (p = 10^-3), e.a.t. is about 25 microseconds. The computer would be slowed down by a factor of about 250 (25 microseconds / 100 ns) because of demand paging. It is important to keep the page-fault rate low in a demand-paging system.
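The calculation above can be reproduced directly. A small Python sketch, using the example's numbers; the function name is invented here:

```python
# Effective access time for demand paging, in nanoseconds.
# ma = 100 ns memory access, 25 ms = 25,000,000 ns page-fault service time.

def eat_ns(p, ma_ns=100, fault_ns=25_000_000):
    return (1 - p) * ma_ns + p * fault_ns

print(eat_ns(0))      # 100: no faults, e.a.t. equals ma
print(eat_ns(1e-3))   # ~25100 ns, i.e. about 25 microseconds
```

Even a fault rate of one in a thousand dominates the access time, which is why the fault service time (disk latency) must be amortized over very many fault-free accesses.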
Page Replacement. If no frame is free, find one that is not currently being used and free it. Steps:
1. Find the location of the desired page on the disk.
2. Find a free frame:
   a) if there is a free frame, use it;
   b) otherwise, use a page-replacement algorithm to select a victim frame;
   c) write the victim to the disk; update the page and frame tables.
3. Read the desired page into the free frame; update the page and frame tables.
4. Restart (resume) the user process.
Page-Replacement Algorithm Example. Tracing a process, we find the following address sequence: 0100, 0432, 0101, 0612, 0612, 0102, 0103, 0104, 0101, 0611, 0102, 0103, 0104, 0101, 0610, 0102, 0103, 0104, 0101, 0609, 0102, 0105. At 100 bytes per page, this yields the reference string 1, 4, 1, 6, 1, 6, 1, 6, 1, 6, 1. If one frame is available, there will be 11 faults; if 3 frames are available, there will be only 3 faults (remember: one fault for the first reference to each page).
FIFO Algorithm: when a page must be replaced, the oldest page in memory is chosen. Example:
Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
Frame contents after each fault (3 frames):
    7  7  7  2  2  2  4  4  4  0  0  0  7  7  7
       0  0  0  3  3  3  2  2  2  1  1  1  0  0
          1  1  1  0  0  0  3  3  3  2  2  2  1
This gives 15 page faults.
FIFO may suffer from Belady's anomaly: the page-fault rate may increase as the number of allocated frames increases.
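A FIFO fault counter can be sketched as follows (illustrative code, not from the course material). Run on the reference string above it reproduces the trace; run on the classic string 1,2,3,4,1,2,5,1,2,3,4,5 it exhibits Belady's anomaly.

```python
# FIFO page replacement: count faults for a given number of frames.
from collections import deque

def fifo_faults(refs, nframes):
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:        # no free frame: evict oldest
                frames.discard(queue.popleft())
            frames.add(page)
            queue.append(page)
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
print(fifo_faults(refs, 3))                       # 15 faults
anomaly = [1,2,3,4,1,2,5,1,2,3,4,5]
print(fifo_faults(anomaly, 3), fifo_faults(anomaly, 4))  # 9 then 10: Belady's anomaly
```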
Optimal Page Replacement: replace the page that will not be used for the longest period of time. Unfortunately, the optimal page-replacement algorithm is difficult to implement, because it requires future knowledge of the reference string. Example:
Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
Frame contents after each fault (3 frames):
    7  7  7  2  2  2  2  2  7
       0  0  0  0  4  0  0  0
          1  1  3  3  3  1  1
This gives 9 page faults.
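The optimal policy can be simulated when the whole reference string is known in advance, which is exactly why it cannot be implemented in a real OS. A sketch (illustrative code, function name invented here):

```python
# Optimal (farthest-future-use) page replacement: evict the resident
# page whose next use lies farthest in the future.

def opt_faults(refs, nframes):
    frames, faults = [], 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) < nframes:
            frames.append(page)
        else:
            def next_use(q):                 # distance to q's next reference
                rest = refs[i + 1:]
                return rest.index(q) if q in rest else float("inf")
            victim = max(frames, key=next_use)
            frames[frames.index(victim)] = page
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
print(opt_faults(refs, 3))   # 9 faults, the minimum possible
```

In practice OPT serves as a lower bound against which implementable algorithms such as FIFO and LRU are compared.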
LRU (least recently used) Algorithm: the page that has not been used for the longest time is replaced. Example:
Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
Frame contents after each fault (3 frames):
    7  7  7  2  2  4  4  4  0  1  1  1
       0  0  0  0  0  0  3  3  3  0  0
          1  1  3  3  2  2  2  2  2  7
This gives 12 page faults.
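LRU can be sketched with an ordered map used as a recency queue (illustrative only; real hardware can only approximate LRU, e.g. with reference bits):

```python
# LRU page replacement: an OrderedDict keeps pages in recency order,
# least recently used first.
from collections import OrderedDict

def lru_faults(refs, nframes):
    recency, faults = OrderedDict(), 0
    for page in refs:
        if page in recency:
            recency.move_to_end(page)          # page is now most recent
        else:
            faults += 1
            if len(recency) == nframes:
                recency.popitem(last=False)    # evict least recently used
            recency[page] = True
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
print(lru_faults(refs, 3))   # 12 faults: between FIFO (15) and OPT (9)
```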
Thrashing refers to high paging activity: a process is thrashing if it is spending more time paging than executing. To prevent thrashing, we must provide a process with as many frames as it needs. The Working-Set Model: a working-set window of D page references is defined; the set of pages in the most recent D page references is the working set. A page that is actively in use will be in the working set; if it is no longer used, it will drop from the working set.
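The working set at time t is simply the distinct pages in the last D references. A sketch (the window size D is called `delta` here; the reference string is made up for illustration):

```python
# Working-set model: pages referenced in the window of the last
# `delta` references ending at position t.

def working_set(refs, t, delta):
    """Set of pages referenced in refs[t - delta + 1 .. t]."""
    return set(refs[max(0, t - delta + 1): t + 1])

refs = [1, 2, 1, 5, 7, 7, 7, 7, 5, 1]
print(working_set(refs, 4, 5))   # {1, 2, 5, 7}
print(working_set(refs, 9, 5))   # {1, 5, 7}: page 2 has dropped out
```

If the sum of working-set sizes over all processes exceeds the number of physical frames, thrashing is likely, and the OS should suspend a process.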
Operating-System Examples: Windows NT. Windows NT implements virtual memory using demand paging with clustering: clustering handles a page fault by bringing in not only the faulting page but also several pages surrounding it. When a process is first created, it is assigned a working-set minimum and maximum. The working-set minimum is the minimum number of pages the process is guaranteed to have in memory; if sufficient memory is available, a process may be assigned as many pages as its working-set maximum. Automatic working-set trimming works by evaluating the number of pages allocated to each process and reclaiming frames from processes that hold more than their working-set minimum.