
PREFACE

The world has changed a great deal since the first edition of this book appeared in 1992. Computer networks and distributed systems of all kinds have become very common. Small children now roam the Internet, where previously only computer professionals went. As a consequence, this book has changed a great deal, too.
The most obvious change is that the first edition was about half on single-processor operating systems and half on distributed systems. I chose that format in 1991 because few universities then had courses on distributed systems and whatever students learned about distributed systems had to be put into the operating systems course, for which this book was intended. Now most universities have a separate course on distributed systems, so it is not necessary to try to combine the two subjects into one course and one book. This book is intended for a first course on operating systems, and as such focuses mostly on traditional single-processor systems.
I have coauthored two other books on operating systems. This leads to two possible course sequences.

Practically-oriented sequence:
1. Operating Systems Design and Implementation by Tanenbaum and Woodhull
2. Distributed Systems by Tanenbaum and Van Steen

Traditional sequence:
1. Modern Operating Systems by Tanenbaum
2. Distributed Systems by Tanenbaum and Van Steen

The former sequence uses MINIX and the students are expected to experiment with MINIX in an accompanying laboratory supplementing the first course. The latter sequence does not use MINIX. Instead, some small simulators are available that can be used for student exercises during a first course using this book. These can be found starting on the author's Web page, www.cs.vu.nl/~ast/, by clicking on Software and supplementary material for my books.
In addition to the major change of switching the emphasis to single-processor operating systems in this book, other major changes include the addition of entire chapters on computer security, multimedia operating systems, and Windows 2000, all important and timely topics. In addition, a new and unique chapter on operating system design has been added.
Another new feature is that many chapters now have a section on research about the topic of the chapter. This is intended to introduce the reader to modern work in processes, memory management, and so on. These sections have numerous references to the current research literature for the interested reader. In addition, Chapter 13 has many introductory and tutorial references.
Finally, numerous topics have been added to this book or heavily revised. These topics include: graphical user interfaces, multiprocessor operating systems, power management for laptops, trusted systems, viruses, network terminals, CD-ROM file systems, mutexes, RAID, soft timers, stable storage, fair-share scheduling, and new paging algorithms. Many new problems have been added and old ones updated. The total number of problems now exceeds 450. A solutions manual is available to professors using this book in a course. They can obtain a copy from their local Prentice Hall representative. In addition, over 250 new references to the current literature have been added to bring the book up to date.
Despite the removal of more than 400 pages of old material, the book has increased in size due to the large amount of new material added. While the book is still suitable for a one-semester or two-quarter course, it is probably too long for a one-quarter or one-trimester course at most universities. For this reason, the book has been designed in a modular way. Any course on operating systems should cover chapters 1 through 6. This is basic material that every student should know.
If additional time is available, additional chapters can be covered. Each of them assumes the reader has finished chapters 1 through 6, but Chaps. 7 through 12 are each self contained, so any desired subset can be used and in any order, depending on the interests of the instructor. In the author's opinion, Chaps. 7 through 12 are much more interesting than the earlier ones. Instructors should tell their students that they have to eat their broccoli before they can have the double chocolate fudge cake dessert.
I would like to thank the following people for their help in reviewing parts of the manuscript: Rida Bazzi, Riccardo Bettati, Felipe Cabrera, Richard Chapman, John Connely, John Dickinson, John Elliott, Deborah Frincke, Chandana Gamage, Robbert Geist, David Golds, Jim Griffioen, Gary Harkin, Frans Kaashoek, Mukkai Krishnamoorthy, Monica Lam, Jussi Leiwo, Herb Mayer, Kirk McKusick, Evi Nemeth, Bill Potvin, Prasant Shenoy, Thomas Skinner, Xian-He Sun, William Terry, Robbert Van Renesse, and Maarten van Steen. Jamie Hanrahan, Mark Russinovich, and Dave Solomon were enormously knowledgeable about Windows 2000 and very helpful. Special thanks go to Al Woodhull for valuable reviews and thinking of many new end-of-chapter problems.
My students were also helpful with comments and feedback, especially Staas de Jong, Jan de Vos, Niels Drost, David Fokkema, Auke Folkerts, Peter Groenewegen, Wilco Ibes, Stefan Jansen, Jeroen Ketema, Joeri Mulder, Irwin Oppenheim, Stef Post, Umar Rehman, Daniel Rijkhof, Maarten Sander, Maurits van der Schee, Rik van der Stoel, Mark van Driel, Dennis van Veen, and Thomas Zeeman.
Barbara and Marvin are still wonderful, as usual, each in a unique way. Finally, last but not least, I would like to thank Suzanne for her love and patience, not to mention all the druiven and kersen, which have replaced the sinaasappelsap in recent times.

Andrew S. Tanenbaum
CONTENTS

1 INTRODUCTION

1.1 WHAT IS AN OPERATING SYSTEM?
1.1.1 The Operating System as an Extended Machine
1.1.2 The Operating System as a Resource Manager

1.2 HISTORY OF OPERATING SYSTEMS
1.2.1 The First Generation (1945-55)
1.2.2 The Second Generation (1955-65)
1.2.3 The Third Generation (1965-1980)
1.2.4 The Fourth Generation (1980-Present)
1.2.5 Ontogeny Recapitulates Phylogeny

1.3 THE OPERATING SYSTEM ZOO
1.3.1 Mainframe Operating Systems
1.3.2 Server Operating Systems
1.3.3 Multiprocessor Operating Systems
1.3.4 Personal Computer Operating Systems
1.3.5 Real-Time Operating Systems
1.3.6 Embedded Operating Systems
1.3.7 Smart Card Operating Systems

1.4 COMPUTER HARDWARE REVIEW
1.4.1 Processors
1.4.2 Memory
1.4.3 I/O Devices
1.4.4 Buses

1.5 OPERATING SYSTEM CONCEPTS
1.5.1 Processes
1.5.2 Deadlocks
1.5.3 Memory Management
1.5.4 Input/Output
1.5.5 Files
1.5.6 Security
1.5.7 The Shell
1.5.8 Recycling of Concepts

1.6 SYSTEM CALLS
1.6.1 System Calls for Process Management
1.6.2 System Calls for File Management
1.6.3 System Calls for Directory Management
1.6.4 Miscellaneous System Calls
1.6.5 The Windows Win32 API

1.7 OPERATING SYSTEM STRUCTURE
1.7.1 Monolithic Systems
1.7.2 Layered Systems
1.7.3 Virtual Machines
1.7.4 Exokernels
1.7.5 Client-Server Model

1.8 RESEARCH ON OPERATING SYSTEMS

1.9 OUTLINE OF THE REST OF THIS BOOK

1.10 METRIC UNITS

1.11 SUMMARY

2 PROCESSES AND THREADS

2.1 PROCESSES
2.1.1 The Process Model
2.1.2 Process Creation
2.1.3 Process Termination
2.1.4 Process Hierarchies
2.1.5 Process States
2.1.6 Implementation of Processes

2.2 THREADS
2.2.1 The Thread Model
2.2.2 Thread Usage
2.2.3 Implementing Threads in User Space
2.2.4 Implementing Threads in the Kernel
2.2.5 Hybrid Implementations
2.2.6 Scheduler Activations
2.2.7 Pop-Up Threads
2.2.8 Making Single-Threaded Code Multithreaded

2.3 INTERPROCESS COMMUNICATION
2.3.1 Race Conditions
2.3.2 Critical Regions
2.3.3 Mutual Exclusion with Busy Waiting
2.3.4 Sleep and Wakeup
2.3.5 Semaphores
2.3.6 Mutexes
2.3.7 Monitors
2.3.8 Message Passing
2.3.9 Barriers

2.4 CLASSICAL IPC PROBLEMS
2.4.1 The Dining Philosophers Problem
2.4.2 The Readers and Writers Problem
2.4.3 The Sleeping Barber Problem

2.5 SCHEDULING
2.5.1 Introduction to Scheduling
2.5.2 Scheduling in Batch Systems
2.5.3 Scheduling in Interactive Systems
2.5.4 Scheduling in Real-Time Systems
2.5.5 Policy versus Mechanism
2.5.6 Thread Scheduling

2.6 RESEARCH ON PROCESSES AND THREADS

2.7 SUMMARY

3 DEADLOCKS

3.1 RESOURCES
3.1.1 Preemptable and Nonpreemptable Resources
3.1.2 Resource Acquisition

3.2 INTRODUCTION TO DEADLOCKS
3.2.1 Conditions for Deadlock
3.2.2 Deadlock Modeling

3.3 THE OSTRICH ALGORITHM

3.4 DEADLOCK DETECTION AND RECOVERY
3.4.1 Deadlock Detection with One Resource of Each Type
3.4.2 Deadlock Detection with Multiple Resources of Each Type
3.4.3 Recovery from Deadlock

3.5 DEADLOCK AVOIDANCE
3.5.1 Resource Trajectories
3.5.2 Safe and Unsafe States
3.5.3 The Banker's Algorithm for a Single Resource
3.5.4 The Banker's Algorithm for Multiple Resources

3.6 DEADLOCK PREVENTION
3.6.1 Attacking the Mutual Exclusion Condition
3.6.2 Attacking the Hold and Wait Condition
3.6.3 Attacking the No Preemption Condition
3.6.4 Attacking the Circular Wait Condition

3.7 OTHER ISSUES
3.7.1 Two-Phase Locking
3.7.2 Nonresource Deadlocks
3.7.3 Starvation

3.8 RESEARCH ON DEADLOCKS

3.9 SUMMARY



4 MEMORY MANAGEMENT

4.1 BASIC MEMORY MANAGEMENT
4.1.1 Monoprogramming without Swapping or Paging
4.1.2 Multiprogramming with Fixed Partitions
4.1.3 Modeling Multiprogramming
4.1.4 Analysis of Multiprogramming System Performance
4.1.5 Relocation and Protection

4.2 SWAPPING
4.2.1 Memory Management with Bitmaps
4.2.2 Memory Management with Linked Lists

4.3 VIRTUAL MEMORY
4.3.1 Paging
4.3.2 Page Tables
4.3.3 TLBs-Translation Lookaside Buffers
4.3.4 Inverted Page Tables

4.4 PAGE REPLACEMENT ALGORITHMS
4.4.1 The Optimal Page Replacement Algorithm
4.4.2 The Not Recently Used Page Replacement Algorithm
4.4.3 The First-In, First-Out (FIFO) Page Replacement Algorithm
4.4.4 The Second Chance Page Replacement Algorithm
4.4.5 The Clock Page Replacement Algorithm
4.4.6 The Least Recently Used (LRU) Page Replacement Algorithm
4.4.7 Simulating LRU in Software
4.4.8 The Working Set Page Replacement Algorithm
4.4.9 The WSClock Page Replacement Algorithm
4.4.10 Summary of Page Replacement Algorithms

4.5 MODELING PAGE REPLACEMENT ALGORITHMS
4.5.1 Belady's Anomaly
4.5.2 Stack Algorithms
4.5.3 The Distance String
4.5.4 Predicting Page Fault Rates

4.6 DESIGN ISSUES FOR PAGING SYSTEMS
4.6.1 Local versus Global Allocation Policies
4.6.2 Load Control
4.6.3 Page Size
4.6.4 Separate Instruction and Data Spaces
4.6.5 Shared Pages
4.6.6 Cleaning Policy
4.6.7 Virtual Memory Interface

4.7 IMPLEMENTATION ISSUES
4.7.1 Operating System Involvement with Paging
4.7.2 Page Fault Handling
4.7.3 Instruction Backup
4.7.4 Locking Pages in Memory
4.7.5 Backing Store
4.7.6 Separation of Policy and Mechanism

4.8 SEGMENTATION
4.8.1 Implementation of Pure Segmentation
4.8.2 Segmentation with Paging: MULTICS
4.8.3 Segmentation with Paging: The Intel Pentium

4.9 RESEARCH ON MEMORY MANAGEMENT

4.10 SUMMARY

5 INPUT/OUTPUT

5.1 PRINCIPLES OF I/O HARDWARE
5.1.1 I/O Devices
5.1.2 Device Controllers
5.1.3 Memory-Mapped I/O
5.1.4 Direct Memory Access
5.1.5 Interrupts Revisited

5.2 PRINCIPLES OF I/O SOFTWARE
5.2.1 Goals of the I/O Software
5.2.2 Programmed I/O
5.2.3 Interrupt-Driven I/O
5.2.4 I/O Using DMA

5.3 I/O SOFTWARE LAYERS
5.3.1 Interrupt Handlers
5.3.2 Device Drivers
5.3.3 Device-Independent I/O Software
5.3.4 User-Space I/O Software

5.4 DISKS
5.4.1 Disk Hardware
5.4.2 Disk Formatting
5.4.3 Disk Arm Scheduling Algorithms
5.4.4 Error Handling
5.4.5 Stable Storage

5.5 CLOCKS
5.5.1 Clock Hardware
5.5.2 Clock Software
5.5.3 Soft Timers

5.6 CHARACTER-ORIENTED TERMINALS
5.6.1 RS-232 Terminal Hardware
5.6.2 Input Software
5.6.3 Output Software

5.7 GRAPHICAL USER INTERFACES
5.7.1 Personal Computer Keyboard, Mouse, and Display Hardware
5.7.2 Input Software
5.7.3 Output Software for Windows

5.8 NETWORK TERMINALS
5.8.1 The X Window System
5.8.2 The SLIM Network Terminal

5.9 POWER MANAGEMENT
5.9.1 Hardware Issues
5.9.2 Operating System Issues
5.9.3 Degraded Operation

5.10 RESEARCH ON INPUT/OUTPUT

5.11 SUMMARY



6 FILE SYSTEMS

6.1 FILES
6.1.1 File Naming
6.1.2 File Structure
6.1.3 File Types
6.1.4 File Access
6.1.5 File Attributes
6.1.6 File Operations
6.1.7 An Example Program Using File System Calls
6.1.8 Memory-Mapped Files

6.2 DIRECTORIES
6.2.1 Single-Level Directory Systems
6.2.2 Two-Level Directory Systems
6.2.3 Hierarchical Directory Systems
6.2.4 Path Names
6.2.5 Directory Operations

6.3 FILE SYSTEM IMPLEMENTATION
6.3.1 File System Layout
6.3.2 Implementing Files
6.3.3 Implementing Directories
6.3.4 Shared Files
6.3.5 Disk Space Management
6.3.6 File System Reliability
6.3.7 File System Performance
6.3.8 Log-Structured File Systems

6.4 EXAMPLE FILE SYSTEMS
6.4.1 CD-ROM File Systems
6.4.2 The CP/M File System
6.4.3 The MS-DOS File System
6.4.4 The Windows 98 File System
6.4.5 The UNIX V7 File System

6.5 RESEARCH ON FILE SYSTEMS

6.6 SUMMARY

7 MULTIMEDIA OPERATING SYSTEMS

7.1 INTRODUCTION TO MULTIMEDIA

7.2 MULTIMEDIA FILES
7.2.1 Audio Encoding
7.2.2 Video Encoding

7.3 VIDEO COMPRESSION
7.3.1 The JPEG Standard
7.3.2 The MPEG Standard

7.4 MULTIMEDIA PROCESS SCHEDULING
7.4.1 Scheduling Homogeneous Processes
7.4.2 General Real-Time Scheduling
7.4.3 Rate Monotonic Scheduling
7.4.4 Earliest Deadline First Scheduling

7.5 MULTIMEDIA FILE SYSTEM PARADIGMS
7.5.1 VCR Control Functions
7.5.2 Near Video on Demand
7.5.3 Near Video on Demand with VCR Functions

7.6 FILE PLACEMENT
7.6.1 Placing a File on a Single Disk
7.6.2 Two Alternative File Organization Strategies
7.6.3 Placing Files for Near Video on Demand
7.6.4 Placing Multiple Files on a Single Disk
7.6.5 Placing Files on Multiple Disks

7.7 CACHING
7.7.1 Block Caching
7.7.2 File Caching

7.8 DISK SCHEDULING FOR MULTIMEDIA
7.8.1 Static Disk Scheduling
7.8.2 Dynamic Disk Scheduling

7.9 RESEARCH ON MULTIMEDIA

7.10 SUMMARY
8 MULTIPLE PROCESSOR SYSTEMS

8.1 MULTIPROCESSORS
8.1.1 Multiprocessor Hardware
8.1.2 Multiprocessor Operating System Types
8.1.3 Multiprocessor Synchronization
8.1.4 Multiprocessor Scheduling

8.2 MULTICOMPUTERS
8.2.1 Multicomputer Hardware
8.2.2 Low-Level Communication Software
8.2.3 User-Level Communication Software
8.2.4 Remote Procedure Call
8.2.5 Distributed Shared Memory
8.2.6 Multicomputer Scheduling
8.2.7 Load Balancing

8.3 DISTRIBUTED SYSTEMS
8.3.1 Network Hardware
8.3.2 Network Services and Protocols
8.3.3 Document-Based Middleware
8.3.4 File System-Based Middleware
8.3.5 Shared Object-Based Middleware
8.3.6 Coordination-Based Middleware

8.4 RESEARCH ON MULTIPLE PROCESSOR SYSTEMS

8.5 SUMMARY

9 SECURITY

9.1 THE SECURITY ENVIRONMENT
9.1.1 Threats
9.1.2 Intruders
9.1.3 Accidental Data Loss

9.2 BASICS OF CRYPTOGRAPHY
9.2.1 Secret-Key Cryptography
9.2.2 Public-Key Cryptography
9.2.3 One-Way Functions
9.2.4 Digital Signatures

9.3 USER AUTHENTICATION
9.3.1 Authentication Using Passwords
9.3.2 Authentication Using a Physical Object
9.3.3 Authentication Using Biometrics
9.3.4 Countermeasures

9.4 ATTACKS FROM INSIDE THE SYSTEM
9.4.1 Trojan Horses
9.4.2 Login Spoofing
9.4.3 Logic Bombs
9.4.4 Trap Doors
9.4.5 Buffer Overflow
9.4.6 Generic Security Attacks
9.4.7 Famous Security Flaws
9.4.8 Design Principles for Security

9.5 ATTACKS FROM OUTSIDE THE SYSTEM
9.5.1 Virus Damage Scenarios
9.5.2 How Viruses Work
9.5.3 How Viruses Spread
9.5.4 Antivirus and Anti-Antivirus Techniques
9.5.5 The Internet Worm
9.5.6 Mobile Code
9.5.7 Java Security

9.6 PROTECTION MECHANISMS
9.6.1 Protection Domains
9.6.2 Access Control Lists
9.6.3 Capabilities

9.7 TRUSTED SYSTEMS
9.7.1 Trusted Computing Base
9.7.2 Formal Models of Secure Systems
9.7.3 Multilevel Security
9.7.4 Orange Book Security
9.7.5 Covert Channels

9.8 RESEARCH ON SECURITY

9.9 SUMMARY


10 CASE STUDY 1: UNIX AND LINUX

10.1 HISTORY OF UNIX
10.1.1 UNICS
10.1.2 PDP-11 UNIX
10.1.3 Portable UNIX
10.1.4 Berkeley UNIX
10.1.5 Standard UNIX
10.1.6 MINIX
10.1.7 Linux

10.2 OVERVIEW OF UNIX
10.2.1 UNIX Goals
10.2.2 Interfaces to UNIX
10.2.3 The UNIX Shell
10.2.4 UNIX Utility Programs
10.2.5 Kernel Structure

10.3 PROCESSES IN UNIX
10.3.1 Fundamental Concepts
10.3.2 Process Management System Calls in UNIX
10.3.3 Implementation of Processes in UNIX
10.3.4 Booting UNIX

10.4 MEMORY MANAGEMENT IN UNIX
10.4.1 Fundamental Concepts
10.4.2 Memory Management System Calls in UNIX
10.4.3 Implementation of Memory Management in UNIX

10.5 INPUT/OUTPUT IN UNIX
10.5.1 Fundamental Concepts
10.5.2 Input/Output System Calls in UNIX
10.5.3 Implementation of Input/Output in UNIX
10.5.4 Streams

10.6 THE UNIX FILE SYSTEM
10.6.1 Fundamental Concepts
10.6.2 File System Calls in UNIX
10.6.3 Implementation of the UNIX File System
10.6.4 NFS: The Network File System

10.7 SECURITY IN UNIX
10.7.1 Fundamental Concepts
10.7.2 Security System Calls in UNIX
10.7.3 Implementation of Security in UNIX

10.8 SUMMARY

11 CASE STUDY 2: WINDOWS 2000

11.1 HISTORY OF WINDOWS 2000
11.1.1 MS-DOS
11.1.2 Windows 95/98/Me
11.1.3 Windows NT
11.1.4 Windows 2000

11.2 PROGRAMMING WINDOWS 2000
11.2.1 The Win32 Application Programming Interface
11.2.2 The Registry

11.3 SYSTEM STRUCTURE
11.3.1 Operating System Structure
11.3.2 Implementation of Objects
11.3.3 Environment Subsystems

11.4 PROCESSES AND THREADS IN WINDOWS 2000
11.4.1 Fundamental Concepts
11.4.2 Job, Process, Thread and Fiber Management API Calls
11.4.3 Implementation of Processes and Threads
11.4.4 MS-DOS Emulation
11.4.5 Booting Windows 2000

11.5 MEMORY MANAGEMENT
11.5.1 Fundamental Concepts
11.5.2 Memory Management System Calls
11.5.3 Implementation of Memory Management

11.6 INPUT/OUTPUT IN WINDOWS 2000
11.6.1 Fundamental Concepts
11.6.2 Input/Output API Calls
11.6.3 Implementation of I/O
11.6.4 Device Drivers

11.7 THE WINDOWS 2000 FILE SYSTEM
11.7.1 Fundamental Concepts
11.7.2 File System API Calls in Windows 2000
11.7.3 Implementation of the Windows 2000 File System

11.8 SECURITY IN WINDOWS 2000
11.8.1 Fundamental Concepts
11.8.2 Security API Calls
11.8.3 Implementation of Security

11.9 CACHING IN WINDOWS 2000

11.10 SUMMARY

12 OPERATING SYSTEM DESIGN

12.1 THE NATURE OF THE DESIGN PROBLEM
12.1.1 Goals
12.1.2 Why Is It Hard to Design an Operating System?

12.2 INTERFACE DESIGN
12.2.1 Guiding Principles
12.2.2 Paradigms
12.2.3 The System Call Interface

12.3 IMPLEMENTATION
12.3.1 System Structure
12.3.2 Mechanism versus Policy
12.3.3 Orthogonality
12.3.4 Naming
12.3.5 Binding Time
12.3.6 Static versus Dynamic Structures
12.3.7 Top-Down versus Bottom-Up Implementation
12.3.8 Useful Techniques

12.4 PERFORMANCE
12.4.1 Why Are Operating Systems Slow?
12.4.2 What Should Be Optimized?
12.4.3 Space-Time Trade-offs
12.4.4 Caching
12.4.5 Hints
12.4.6 Exploiting Locality
12.4.7 Optimize the Common Case

12.5 PROJECT MANAGEMENT
12.5.1 The Mythical Man Month
12.5.2 Team Structure
12.5.3 The Role of Experience
12.5.4 No Silver Bullet

12.6 TRENDS IN OPERATING SYSTEM DESIGN
12.6.1 Large Address Space Operating Systems
12.6.2 Networking
12.6.3 Parallel and Distributed Systems
12.6.4 Multimedia
12.6.5 Battery-Powered Computers
12.6.6 Embedded Systems

12.7 SUMMARY

13 READING LIST AND BIBLIOGRAPHY

13.1 SUGGESTIONS FOR FURTHER READING
13.1.1 Introduction and General Works
13.1.2 Processes and Threads
13.1.3 Deadlocks
13.1.4 Memory Management
13.1.5 Input/Output
13.1.6 File Systems
13.1.7 Multimedia Operating Systems
13.1.8 Multiple Processor Systems
13.1.9 Security
13.1.10 UNIX and Linux
13.1.11 Windows 2000
13.1.12 Design Principles

13.2 ALPHABETICAL BIBLIOGRAPHY

INTRODUCTION

A modern computer system consists of one or more processors, some main memory, disks, printers, a keyboard, a display, network interfaces, and other input/output devices. All in all, a complex system. Writing programs that keep track of all these components and use them correctly, let alone optimally, is an extremely difficult job. For this reason, computers are equipped with a layer of software called the operating system, whose job is to manage all these devices and provide user programs with a simpler interface to the hardware. These systems are the subject of this book.
The placement of the operating system is shown in Fig. 1-1. At the bottom is the hardware, which, in many cases, is itself composed of two or more levels (or layers). The lowest level contains physical devices, consisting of integrated circuit chips, wires, power supplies, cathode ray tubes, and similar physical devices. How these are constructed and how they work are the provinces of the electrical engineer.
Next comes the microarchitecture level, in which the physical devices are grouped together to form functional units. Typically this level contains some registers internal to the CPU (Central Processing Unit) and a data path containing an arithmetic logic unit. In each clock cycle, one or two operands are fetched from the registers and combined in the arithmetic logic unit (for example, by addition or Boolean AND). The result is stored in one or more registers. On some machines, the operation of the data path is controlled by software, called the microprogram. On other machines, it is controlled directly by hardware circuits.
[Figure 1-1 here: a layered view with application programs (a banking system, an airline reservation system, a Web browser) at the top, the operating system and machine language below them, and the microarchitecture and physical devices at the bottom.]

Figure 1-1. A computer system consists of hardware, system programs, and application programs.

The purpose of the data path is to execute some set of instructions. Some of these can be carried out in one data path cycle; others may require multiple data path cycles. These instructions may use registers or other hardware facilities. Together, the hardware and instructions visible to an assembly language programmer form the ISA (Instruction Set Architecture) level. This level is often called machine language.
The machine language typically has between 50 and 300 instructions, mostly for moving data around the machine, doing arithmetic, and comparing values. In this level, the input/output devices are controlled by loading values into special device registers. For example, a disk can be commanded to read by loading the values of the disk address, main memory address, byte count, and direction (read or write) into its registers. In practice, many more parameters are needed, and the status returned by the drive after an operation is highly complex. Furthermore, for many I/O (Input/Output) devices, timing plays an important role in the programming.
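As a rough sketch of what programming at this level looks like, the fragment below starts a disk read by storing each parameter into a memory-mapped device register and then polling a status register. The register addresses, layout, and command codes are invented purely for illustration; they do not correspond to any real controller.

#include <stdint.h>

/* Hypothetical memory-mapped disk controller registers (addresses and
   layout invented for this illustration; a real controller defines its own). */
#define DISK_BLOCK_REG   ((volatile uint32_t *) 0xFFFF0000)  /* disk block address */
#define DISK_MEM_REG     ((volatile uint32_t *) 0xFFFF0004)  /* main memory address */
#define DISK_COUNT_REG   ((volatile uint32_t *) 0xFFFF0008)  /* byte count */
#define DISK_CMD_REG     ((volatile uint32_t *) 0xFFFF000C)  /* command: 1 = read, 2 = write */
#define DISK_STATUS_REG  ((volatile uint32_t *) 0xFFFF0010)  /* nonzero while busy */

/* Command a disk read the way the bare hardware sees it: every parameter is
   loaded into a device register by hand, and the caller then busy-waits on
   the status register until the transfer is finished (errors ignored here). */
void raw_disk_read(uint32_t block, uint32_t mem_addr, uint32_t nbytes)
{
    *DISK_BLOCK_REG = block;
    *DISK_MEM_REG   = mem_addr;
    *DISK_COUNT_REG = nbytes;
    *DISK_CMD_REG   = 1;                /* issue the read command */
    while (*DISK_STATUS_REG != 0)
        ;                               /* wait for the controller to finish */
}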
To hide this complexity, an operating system is provided. It consists of a layer of software that (partially) hides the hardware and gives the programmer a more convenient set of instructions to work with. For example, read block from file is conceptually simpler than having to worry about the details of moving disk heads, waiting for them to settle down, and so on.
On top of the operating system is the rest of the system software. Here we find the command interpreter (shell), window systems, compilers, editors, and similar application-independent programs. It is important to realize that these programs are definitely not part of the operating system, even though they are typically supplied by the computer manufacturer. This is a crucial, but subtle, point. The operating system is (usually) that portion of the software that runs in kernel mode or supervisor mode. It is protected from user tampering by the hardware (ignoring for the moment some older or low-end microprocessors that do not have hardware protection at all). Compilers and editors run in user mode. If a user does not like a particular compiler, he† is free to write his own if he so chooses; he is not free to write his own clock interrupt handler, which is part of the operating system and is normally protected by hardware against attempts by users to modify it.

† "He" should be read as "he or she" throughout the book.
This distinction, however, is sometimes blurred in embedded systems (which may not have kernel mode) or interpreted systems (such as Java-based operating systems that use interpretation, not hardware, to separate the components). Still, for traditional computers, the operating system is what runs in kernel mode.
That said, in many systems there are programs that run in user mode but which help the operating system or perform privileged functions. For example, there is often a program that allows users to change their passwords. This program is not part of the operating system and does not run in kernel mode, but it clearly carries out a sensitive function and has to be protected in a special way. In some systems, this idea is carried to an extreme form, and pieces of what is traditionally considered to be the operating system (such as the file system) run in user space. In such systems, it is difficult to draw a clear boundary. Everything running in kernel mode is clearly part of the operating system, but some programs running outside it are arguably also part of it, or at least closely associated with it.
Finally, above the system programs come the application programs. These programs are purchased or written by the users to solve their particular problems, such as word processing, spreadsheets, engineering calculations, or storing information in a database.

1.1 WHAT IS AN OPERATING SYSTEM?

Most computer users have had some experience with an operating system, but it is difficult to pin down precisely what an operating system is. Part of the problem is that operating systems perform two basically unrelated functions, extending the machine and managing resources, and depending on who is doing the talking, you hear mostly about one function or the other. Let us now look at both.

1.1.1 The Operating System as an Extended Machine

As mentioned earlier, the architecture (instruction set, memory organization, I/O, and bus structure) of most computers at the machine language level is primitive and awkward to program, especially for input/output. To make this point more concrete, let us briefly look at how floppy disk I/O is done using the NEC PD765 compatible controller chips used on most Intel-based personal computers. (Throughout this book we will use the terms "floppy disk" and "diskette" interchangeably.) The PD765 has 16 commands, each specified by loading between 1 and 9 bytes into a device register. These commands are for reading and writing data, moving the disk arm, and formatting tracks, as well as initializing, sensing, resetting, and recalibrating the controller and the drives.
The most basic commands are read and write, each of which requires 13 parameters, packed into 9 bytes. These parameters specify such items as the address of the disk block to be read, the number of sectors per track, the recording mode used on the physical medium, the intersector gap spacing, and what to do with a deleted-data-address-mark. If you do not understand this mumbo jumbo, do not worry; that is precisely the point: it is rather esoteric. When the operation is completed, the controller chip returns 23 status and error fields packed into 7 bytes. As if this were not enough, the floppy disk programmer must also be constantly aware of whether the motor is on or off. If the motor is off, it must be turned on (with a long startup delay) before data can be read or written. The motor cannot be left on too long, however, or the floppy disk will wear out. The programmer is thus forced to deal with the trade-off between long startup delays versus wearing out floppy disks (and losing the data on them).
Without going into the real details, it should be clear that the average programmer probably does not want to get too intimately involved with the programming of floppy disks (or hard disks, which are just as complex and quite different). Instead, what the programmer wants is a simple, high-level abstraction to deal with. In the case of disks, a typical abstraction would be that the disk contains a collection of named files. Each file can be opened for reading or writing, then read or written, and finally closed. Details such as whether or not recording should use modified frequency modulation and what the current state of the motor is should not appear in the abstraction presented to the user.
The program that hides the truth about the hardware from the programmer and presents a nice, simple view of named files that can be read and written is, of course, the operating system. Just as the operating system shields the programmer from the disk hardware and presents a simple file-oriented interface, it also conceals a lot of unpleasant business concerning interrupts, timers, memory management, and other low-level features. In each case, the abstraction offered by the operating system is simpler and easier to use than that offered by the underlying hardware.
In this view, the function of the operating system is to present the user with the equivalent of an extended machine or virtual machine that is easier to program than the underlying hardware. How the operating system achieves this goal is a long story, which we will study in detail throughout this book. To summarize it in a nutshell, the operating system provides a variety of services that programs can obtain using special instructions called system calls. We will examine some of the more common system calls later in this chapter.
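To make the contrast with the register-level sketch above concrete, here is a minimal example of reading data through the file abstraction the operating system provides, using the standard POSIX open, read, and close calls (POSIX is discussed later in this chapter). The file name is made up for the example.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Read the first 512 bytes of a file through the operating system's
   abstraction: no device registers, no motors, no sector gaps. */
int main(void)
{
    char buf[512];
    int fd = open("data.txt", O_RDONLY);     /* example file name */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    ssize_t n = read(fd, buf, sizeof(buf));  /* the OS locates the blocks on disk */
    if (n >= 0)
        printf("Read %zd bytes\n", n);
    close(fd);
    return 0;
}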
especially if it only needs a small fraction of the total. Of course, this raises issues of fairness, protection, and so on, and it is up to the operating system to solve them. Another resource that is space multiplexed is the (hard) disk. In many systems a single disk can hold files from many users at the same time. Allocating disk space and keeping track of who is using which disk blocks is a typical operating system resource management task.

1.2 HISTORY OF OPERATING SYSTEMS

Operating systems have been evolving through the years. In the following sections we will briefly look at a few of the highlights. Since operating systems have historically been closely tied to the architecture of the computers on which they run, we will look at successive generations of computers to see what their operating systems were like. This mapping of operating system generations to computer generations is crude, but it does provide some structure where there would otherwise be none.
The first true digital computer was designed by the English mathematician Charles Babbage (1792-1871). Although Babbage spent most of his life and fortune trying to build his "analytical engine," he never got it working properly because it was purely mechanical, and the technology of his day could not produce the required wheels, gears, and cogs to the high precision that he needed. Needless to say, the analytical engine did not have an operating system.
As an interesting historical aside, Babbage realized that he would need software for his analytical engine, so he hired a young woman named Ada Lovelace, who was the daughter of the famed British poet Lord Byron, as the world's first programmer. The programming language Ada is named after her.

1.2.1 The First Generation (1945-55) Vacuum Tubes and Plugboards

After Babbage's unsuccessful efforts, little progress was made in constructing digital computers until World War II. Around the mid-1940s, Howard Aiken at Harvard, John von Neumann at the Institute for Advanced Study in Princeton, J. Presper Eckert and William Mauchley at the University of Pennsylvania, and Konrad Zuse in Germany, among others, all succeeded in building calculating engines. The first ones used mechanical relays but were very slow, with cycle times measured in seconds. Relays were later replaced by vacuum tubes. These machines were enormous, filling up entire rooms with tens of thousands of vacuum tubes, but they were still millions of times slower than even the cheapest personal computers available today.
In these early days, a single group of people designed, built, programmed, operated, and maintained each machine. All programming was done in absolute machine language, often by wiring up plugboards to control the machine's basic functions. Programming languages were unknown (even assembly language was unknown). Operating systems were unheard of. The usual mode of operation was for the programmer to sign up for a block of time on the signup sheet on the wall, then come down to the machine room, insert his or her plugboard into the computer, and spend the next few hours hoping that none of the 20,000 or so vacuum tubes would burn out during the run. Virtually all the problems were straightforward numerical calculations, such as grinding out tables of sines, cosines, and logarithms.
By the early 1950s, the routine had improved somewhat with the introduction of punched cards. It was now possible to write programs on cards and read them in instead of using plugboards; otherwise, the procedure was the same.

1.2.2 The Second Generation (1955-65) Transistors and Batch Systems

The introduction of the transistor in the mid-1950s changed the picture radically. Computers became reliable enough that they could be manufactured and sold to paying customers with the expectation that they would continue to function long enough to get some useful work done. For the first time, there was a clear separation between designers, builders, operators, programmers, and maintenance personnel.
These machines, now called mainframes, were locked away in specially air conditioned computer rooms, with staffs of professional operators to run them. Only big corporations or major government agencies or universities could afford the multimillion dollar price tag. To run a job (i.e., a program or set of programs), a programmer would first write the program on paper (in FORTRAN or assembler), then punch it on cards. He would then bring the card deck down to the input room and hand it to one of the operators and go drink coffee until the output was ready.
When the computer finished whatever job it was currently running, an operator would go over to the printer and tear off the output and carry it over to the output room, so that the programmer could collect it later. Then he would take one of the card decks that had been brought from the input room and read it in. If the FORTRAN compiler was needed, the operator would have to get it from a file cabinet and read it in. Much computer time was wasted while operators were walking around the machine room.
Given the high cost of the equipment, it is not surprising that people quickly looked for ways to reduce the wasted time. The solution generally adopted was the batch system. The idea behind it was to collect a tray full of jobs in the input room and then read them onto a magnetic tape using a small (relatively) inexpensive computer, such as the IBM 1401, which was very good at reading cards, copying tapes, and printing output, but not at all good at numerical calculations.

Other, much more expensive machines, such as the IBM 7094, were used for the real computing. This situation is shown in Fig. 1-2.
The operator then loaded a special program (the ancestor of today's operating system), which read the first job from tape and ran it. The output was written onto a second tape, instead of being printed. After each job finished, the operating system automatically read the next job from the tape and began running it. When the whole batch was done, the operator removed the input and output tapes, replaced the input tape with the next batch, and brought the output tape to a 1401 for printing off line (i.e., not connected to the main computer).
The structure of a typical input job is shown in Fig. 1-3. It started out with a $JOB card, specifying the maximum run time in minutes, the account number to be charged, and the programmer's name. Then came a $FORTRAN card, telling the operating system to load the FORTRAN compiler from the system tape. It was followed by the program to be compiled, and then a $LOAD card, directing the operating system to load the object program just compiled. (Compiled programs were often written on scratch tapes and had to be loaded explicitly.) Next came the $RUN card, telling the operating system to run the program with the data following it. Finally, the $END card marked the end of the job. These primitive control cards were the forerunners of modern job control languages and command interpreters.
Large second-generation computers were used mostly for scientific and engineering calculations, such as solving the partial differential equations that often recur in physics and engineering. They were largely programmed in FORTRAN and assembly language. Typical operating systems were FMS (the Fortran Monitor System) and IBSYS, IBM's operating system for the 7094.

1.2.3 The Third Generation (1965-1980) ICs and Multiprogramming


By the early 1960s, most computer manufacturers had two distinct, and totally incompatible, product lines. On the one hand there were the word-oriented, large-scale scientific computers, such as the 7094, which were used for numerical calculations in science and engineering. On the other hand, there were the character-oriented, commercial computers, such as the 1401, which were widely used for tape sorting and printing by banks and insurance companies.
Developing and maintaining two completely different product lines was an expensive proposition for the manufacturers. In addition, many new computer customers initially needed a small machine but later outgrew it and wanted a bigger machine that would run all their old programs, but faster.
IBM attempted to solve both of these problems at a single stroke by introducing the System/360. The 360 was a series of software-compatible machines ranging from 1401-sized to much more powerful than the 7094. The machines differed only in price and performance (maximum memory, processor speed, number of I/O devices permitted, and so forth). Since all the machines had the same architecture and instruction set, programs written for one machine could run on all the others, at least in theory. Furthermore, the 360 was designed to handle both scientific (i.e., numerical) and commercial computing. Thus a single family of machines could satisfy the needs of all customers. In subsequent years, IBM has come out with compatible successors to the 360 line, using more modern technology, known as the 370, 4300, 3080, and 3090 series.
The 360 was the first major computer line to use (small-scale) Integrated Circuits (ICs), thus providing a major price/performance advantage over the second-generation machines, which were built up from individual transistors. It was an immediate success, and the idea of a family of compatible computers was soon adopted by all the other major manufacturers. The descendants of these machines are still in use at computer centers today. Nowadays they are often used for managing huge databases (e.g., for airline reservation systems) or as servers for World Wide Web sites that must process thousands of requests per second.
The greatest strength of the "one family" idea was simultaneously its greatest weakness. The intention was that all software, including the operating system, OS/360, had to work on all models. It had to run on small systems, which often just replaced 1401s for copying cards to tape, and on very large systems, which often replaced 7094s for doing weather forecasting and other heavy computing. It had to be good on systems with few peripherals and on systems with many peripherals. It had to work in commercial environments and in scientific environments. Above all, it had to be efficient for all of these different uses.
There was no way that IBM (or anybody else) could write a piece of software to meet all those conflicting requirements. The result was an enormous and extraordinarily complex operating system, probably two to three orders of magnitude larger than FMS. It consisted of millions of lines of assembly language written by thousands of programmers, and contained thousands upon thousands of bugs, which necessitated a continuous stream of new releases in an attempt to correct them. Each new release fixed some bugs and introduced new ones, so the number of bugs probably remained constant in time.
One of the designers of OS/360, Fred Brooks, subsequently wrote a witty and incisive book (Brooks, 1996) describing his experiences with OS/360. While it would be impossible to summarize the book here, suffice it to say that the cover shows a herd of prehistoric beasts stuck in a tar pit. The cover of Silberschatz et al. (2000) makes a similar point about operating systems being dinosaurs.

Despite its enormous size and problems, OS/360 and the similar third-generation operating systems produced by other computer manufacturers actually satisfied most of their customers reasonably well. They also popularized several key techniques absent in second-generation operating systems. Probably the most important of these was multiprogramming. On the 7094, when the current job paused to wait for a tape or other I/O operation to complete, the CPU simply sat idle until the I/O finished. With heavily CPU-bound scientific calculations, I/O is infrequent, so this wasted time is not significant. With commercial data processing, the I/O wait time can often be 80 or 90 percent of the total time, so something had to be done to avoid having the (expensive) CPU be idle so much.
The solution that evolved was to partition memory into several pieces, with a different job in each partition, as shown in Fig. 1-4. While one job was waiting for I/O to complete, another job could be using the CPU. If enough jobs could be held in main memory at once, the CPU could be kept busy nearly 100 percent of the time. Having multiple jobs safely in memory at once requires special hardware to protect each job against snooping and mischief by the other ones, but the 360 and other third-generation systems were equipped with this hardware.

[Figure 1-4 here: memory divided into partitions, one holding the operating system and the others holding separate jobs.]

Figure 1-4. A multiprogramming system with three jobs in memory.
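As a small numeric sketch of why this pays off (using the simple probabilistic model developed in Chap. 4): if each job waits for I/O a fraction p of the time, the CPU is idle only when all n resident jobs are waiting at once, so utilization is roughly 1 - p^n. The few lines below evaluate this for the 80 percent I/O wait mentioned above.

#include <math.h>
#include <stdio.h>

/* CPU utilization under the simple multiprogramming model of Chap. 4:
   with n jobs each waiting for I/O a fraction p of the time, the CPU is
   idle only when all n are waiting, so utilization = 1 - p^n. */
int main(void)
{
    double p = 0.8;                     /* 80 percent I/O wait, as in the text */
    for (int n = 1; n <= 5; n++)
        printf("n = %d  utilization = %.0f%%\n", n, 100.0 * (1.0 - pow(p, n)));
    return 0;
}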

Another major feature present in third-generation operating systems was the ability to read jobs from cards onto the disk as soon as they were brought to the computer room. Then, whenever a running job finished, the operating system could load a new job from the disk into the now-empty partition and run it. This technique is called spooling (from Simultaneous Peripheral Operation On Line) and was also used for output. With spooling, the 1401s were no longer needed, and much carrying of tapes disappeared.
Although third-generation operating systems were well suited for big scientific calculations and massive commercial data processing runs, they were still basically batch systems. Many programmers pined for the first-generation days when they had the machine all to themselves for a few hours, so they could debug their programs quickly. With third-generation systems, the time between submitting a job and getting back the output was often several hours, so a single misplaced comma could cause a compilation to fail, and the programmer to waste half a day.
This desire for quick response time paved the way for timesharing, a variant of multiprogramming, in which each user has an online terminal. In a timesharing system, if 20 users are logged in and 17 of them are thinking or talking or drinking coffee, the CPU can be allocated in turn to the three jobs that want service. Since people debugging programs usually issue short commands (e.g., compile a five-page procedure†) rather than long ones (e.g., sort a million-record file), the computer can provide fast, interactive service to a number of users and perhaps also work on big batch jobs in the background when the CPU is otherwise idle. The first serious timesharing system, CTSS (Compatible Time Sharing System), was developed at M.I.T. on a specially modified 7094 (Corbató et al., 1962). However, timesharing did not really become popular until the necessary protection hardware became widespread during the third generation.

†We will use the terms "procedure," "subroutine," and "function" interchangeably in this book.
After the success of the CTSS system, M.I.T., Bell Labs, and General Electric (then a major computer manufacturer) decided to embark on the development of a "computer utility," a machine that would support hundreds of simultaneous timesharing users. Their model was the electricity distribution system: when you need electric power, you just stick a plug in the wall, and within reason, as much power as you need will be there. The designers of this system, known as MULTICS (MULTiplexed Information and Computing Service), envisioned one huge machine providing computing power for everyone in the Boston area. The idea that machines far more powerful than their GE-645 mainframe would be sold for a thousand dollars by the millions only 30 years later was pure science fiction. Sort of like the idea of supersonic trans-Atlantic undersea trains now.
MULTICS was a mixed success. It was designed to support hundreds of users on a machine only slightly more powerful than an Intel 386-based PC, although it had much more I/O capacity. This is not quite as crazy as it sounds, since people knew how to write small, efficient programs in those days, a skill that has subsequently been lost. There were many reasons that MULTICS did not take over the world, not the least of which is that it was written in PL/I, and the PL/I compiler was years late and barely worked at all when it finally arrived. In addition, MULTICS was enormously ambitious for its time, much like Charles Babbage's analytical engine in the nineteenth century.
To make a long story short, MULTICS introduced many seminal ideas into the computer literature, but turning it into a serious product and a major commercial success was a lot harder than anyone had expected. Bell Labs dropped out of the project, and General Electric quit the computer business altogether. However, M.I.T. persisted and eventually got MULTICS working. It was ultimately sold as a commercial product by the company that bought GE's computer business (Honeywell) and installed by about 80 major companies and universities worldwide. While their numbers were small, MULTICS users were fiercely loyal. General Motors, Ford, and the U.S. National Security Agency, for example, only shut down their MULTICS systems in the late 1990s, 30 years after MULTICS was released.
For the moment, the concept of a computer utility has fizzled out, but it may well come back in the form of massive centralized Internet servers to which relatively dumb user machines are attached, with most of the work happening on the big servers. The motivation here is likely to be that most people do not want to administrate an increasingly complex and finicky computer system and would prefer to have that work done by a team of professionals working for the company running the server. E-commerce is already evolving in this direction, with various companies running e-malls on multiprocessor servers to which simple client machines connect, very much in the spirit of the MULTICS design.
Despite its lack of commercial success, MULTICS had a huge influence on subsequent operating systems. It is described in (Corbató et al., 1972; Corbató and Vyssotsky, 1965; Daley and Dennis, 1968; Organick, 1972; and Saltzer, 1974). It also has a still-active Web site, www.multicians.org, with a great deal of information about the system, its designers, and its users.
Another major development during the third generation was the phenomenal growth of minicomputers, starting with the DEC PDP-1 in 1961. The PDP-1 had only 4K of 18-bit words, but at $120,000 per machine (less than 5 percent of the price of a 7094), it sold like hotcakes. For certain kinds of nonnumerical work, it was almost as fast as the 7094 and gave birth to a whole new industry. It was followed by a series of other PDPs (unlike IBM's family, all incompatible) culminating in the PDP-11.
One of the computer scientists at Bell Labs who had worked on the MULTICS project, Ken Thompson, subsequently found a small PDP-7 minicomputer that no one was using and set out to write a stripped-down, one-user version of MULTICS. This work later developed into the UNIX operating system, which became popular in the academic world, with government agencies, and with many companies.
The history of UNIX has been told elsewhere (e.g., Salus, 1994). Part of that story will be given in Chap. 10. For now, suffice it to say that because the source code was widely available, various organizations developed their own (incompatible) versions, which led to chaos. Two major versions developed, System V, from AT&T, and BSD (Berkeley Software Distribution) from the University of California at Berkeley. These had minor variants as well. To make it possible to write programs that could run on any UNIX system, IEEE developed a standard for UNIX, called POSIX, that most versions of UNIX now support. POSIX defines a minimal system call interface that conformant UNIX systems must support. In fact, some other operating systems now also support the POSIX interface.
As an aside, it is worth mentioning that in 1987, the author released a small clone of UNIX, called MINIX, for educational purposes. Functionally, MINIX is very similar to UNIX, including POSIX support. A book describing its internal operation and listing the source code in an appendix is also available (Tanenbaum and Woodhull, 1997). MINIX is available for free (including all the source code) over the Internet at URL www.cs.vu.nl/~ast/minix.html.
The desire for a free production (as opposed to educational) version of MINIX led a Finnish student, Linus Torvalds, to write Linux. This system was developed on MINIX and originally supported various MINIX features (e.g., the MINIX file system). It has since been extended in many ways but still retains a large amount of underlying structure common to MINIX, and to UNIX (upon which the former was based). Most of what will be said about UNIX in this book thus applies to System V, BSD, MINIX, Linux, and other versions and clones of UNIX as well.

1.2.4 The Fourth Generation (1980-Present) Personal Computers

With the development of LSI (Large Scale Integration) circuits, chips containing thousands of transistors on a square centimeter of silicon, the age of the personal computer dawned. In terms of architecture, personal computers (initially called microcomputers) were not all that different from minicomputers of the PDP-11 class, but in terms of price they certainly were different. Where the minicomputer made it possible for a department in a company or university to have its own computer, the microprocessor chip made it possible for a single individual to have his or her own personal computer.
In 1974, when Intel came out with the 8080, the first general-purpose 8-bit CPU, it wanted an operating system for the 8080, in part to be able to test it. Intel asked one of its consultants, Gary Kildall, to write one. Kildall and a friend first built a controller for the newly-released Shugart Associates 8-inch floppy disk and hooked the floppy disk up to the 8080, thus producing the first microcomputer with a disk. Kildall then wrote a disk-based operating system called CP/M (Control Program for Microcomputers) for it. Since Intel did not think that disk-based microcomputers had much of a future, when Kildall asked for the rights to CP/M, Intel granted his request. Kildall then formed a company, Digital Research, to further develop and sell CP/M.
In 1977, Digital Research rewrote CP/M to make it suitable for running on the many microcomputers using the 8080, Zilog Z80, and other CPU chips. Many application programs were written to run on CP/M, allowing it to completely dominate the world of microcomputing for about 5 years.
In the early 1980s, IBM designed the IBM PC and looked around for software to run on it. People from IBM contacted Bill Gates to license his BASIC interpreter. They also asked him if he knew of an operating system to run on the PC. Gates suggested that IBM contact Digital Research, then the world's dominant operating systems company. Making what was surely the worst business decision in recorded history, Kildall refused to meet with IBM, sending a subordinate instead. To make matters worse, his lawyer even refused to sign IBM's nondisclosure agreement covering the not-yet-announced PC. Consequently, IBM went back to Gates asking if he could provide them with an operating system.
When IBM came back, Gates realized that a local computer manufacturer, Seattle Computer Products, had a suitable operating system, DOS (Disk Operating System). He approached them and asked to buy it (allegedly for $50,000), which they readily accepted. Gates then offered IBM a DOS/BASIC package, which IBM accepted. IBM wanted certain modifications, so Gates hired the person who wrote DOS, Tim Paterson, as an employee of Gates' fledgling company, Microsoft, to make them. The revised system was renamed MS-DOS (MicroSoft Disk Operating System) and quickly came to dominate the IBM PC market. A key factor here was Gates' (in retrospect, extremely wise) decision to sell MS-DOS to computer companies for bundling with their hardware, compared to Kildall's attempt to sell CP/M to end users one at a time (at least initially).
By the time the IBM PC/AT came out in 1983 with the Intel 80286 CPU, MS-DOS was firmly entrenched and CP/M was on its last legs. MS-DOS was later widely used on the 80386 and 80486. Although the initial version of MS-DOS was fairly primitive, subsequent versions included more advanced features, including many taken from UNIX. (Microsoft was well aware of UNIX, even selling a microcomputer version of it called XENIX during the company's early years.)
CP/M, MS-DOS, and other operating systems for early microcomputers were based on users typing in commands from the keyboard. That eventually changed due to research done by Doug Engelbart at Stanford Research Institute in the 1960s. Engelbart invented the GUI (Graphical User Interface), pronounced "gooey," complete with windows, icons, menus, and mouse. These ideas were adopted by researchers at Xerox PARC and incorporated into machines they built.

One day, Steve Jobs, who co-invented the Apple computer in his garage, visited PARC, saw a GUI, and instantly realized its potential value, something Xerox management famously did not (Smith and Alexander, 1988). Jobs then embarked on building an Apple with a GUI. This project led to the Lisa, which was too expensive and failed commercially. Jobs' second attempt, the Apple Macintosh, was a huge success, not only because it was much cheaper than the Lisa, but also because it was user friendly, meaning that it was intended for users who not only knew nothing about computers but furthermore had absolutely no intention whatsoever of learning.
When Microsoft decided to build a successor to MS-DOS, it was strongly influenced by the success of the Macintosh. It produced a GUI-based system called Windows, which originally ran on top of MS-DOS (i.e., it was more like a shell than a true operating system). For about 10 years, from 1985 to 1995, Windows was just a graphical environment on top of MS-DOS. However, starting in 1995 a freestanding version of Windows, Windows 95, was released that incorporated many operating system features into it, using the underlying MS-DOS system only for booting and running old MS-DOS programs. In 1998, a slightly modified version of this system, called Windows 98, was released. Nevertheless, both Windows 95 and Windows 98 still contain a large amount of 16-bit Intel assembly language.

Another Microsoft operating system is Windows NT (NT stands for New Technology), which is compatible with Windows 95 at a certain level, but a complete rewrite from scratch internally. It is a full 32-bit system. The lead designer for Windows NT was David Cutler, who was also one of the designers of the VAX VMS operating system, so some ideas from VMS are present in NT. Microsoft expected that the first version of NT would kill off MS-DOS and all other versions of Windows since it was a vastly superior system, but it fizzled. Only with Windows NT 4.0 did it finally catch on in a big way, especially on corporate networks. Version 5 of Windows NT was renamed Windows 2000 in early 1999. It was intended to be the successor to both Windows 98 and Windows NT 4.0. That did not quite work out either, so Microsoft came out with yet another version of Windows 98 called Windows Me (Millennium edition).
The other major contender in the personal computer world is UNIX (and its various derivatives). UNIX is strongest on workstations and other high-end computers, such as network servers. It is especially popular on machines powered by high-performance RISC chips. On Pentium-based computers, Linux is becoming a popular alternative to Windows for students and increasingly many corporate users. (As an aside, throughout this book we will use the term "Pentium" to mean the Pentium I, II, III, and 4.)
Although many UNIX users, especially experienced programmers, prefer a command-based interface to a GUI, nearly all UNIX systems support a windowing system called the X Windows system produced at M.I.T. This system handles the basic window management, allowing users to create, delete, move, and resize windows using a mouse. Often a complete GUI, such as Motif, is available to run on top of the X Windows system, giving UNIX a look and feel something like the Macintosh or Microsoft Windows, for those UNIX users who want such a thing.
An interesting development that began taking place during the mid-1980s is the growth of networks of personal computers running network operating systems and distributed operating systems (Tanenbaum and Van Steen, 2002). In a network operating system, the users are aware of the existence of multiple computers and can log in to remote machines and copy files from one machine to another. Each machine runs its own local operating system and has its own local user (or users).

Network operating systems are not fundamentally different from single-processor operating systems. They obviously need a network interface controller and some low-level software to drive it, as well as programs to achieve remote login and remote file access, but these additions do not change the essential structure of the operating system.

A distributed operating system, in contrast, is one that appears to its users as a traditional uniprocessor system, even though it is actually composed of multiple processors. The users should not be aware of where their programs are being run or where their files are located; that should all be handled automatically and efficiently by the operating system.

True distributed operating systems require more than just adding a little code to a uniprocessor operating system, because distributed and centralized systems differ in critical ways. Distributed systems, for example, often allow applications to run on several processors at the same time, thus requiring more complex processor scheduling algorithms in order to optimize the amount of parallelism.

Communication delays within the network often mean that these (and other) algorithms must run with incomplete, outdated, or even incorrect information. This situation is radically different from a single-processor system in which the operating system has complete information about the system state.

1.2.5 Ontogeny Recapitulates Phylogeny


After Charles Darwin's book The Origin of the Species was published, the German zoologist Ernst Haeckel stated that "Ontogeny Recapitulates Phylogeny." By this he meant that the development of an embryo (ontogeny) repeats (i.e., recapitulates) the evolution of the species (phylogeny). In other words, after fertilization, a human egg goes through stages of being a fish, a pig, and so on before turning into a human baby. Modern biologists regard this as a gross simplification, but it still has a kernel of truth in it.
Something analogous has happened in the computer industry. Each new species (mainframe, minicomputer, personal computer, embedded computer, smart card, etc.) seems to go through the development that its ancestors did. The first mainframes were programmed entirely in assembly language. Even complex programs, like compilers and operating systems, were written in assembler. By the time minicomputers appeared on the scene, FORTRAN, COBOL, and other high-level languages were common on mainframes, but the new minicomputers were nevertheless programmed in assembler (for lack of memory). When microcomputers (early personal computers) were invented, they, too, were programmed in assembler, even though by then minicomputers were also programmed in high-level languages. Palmtop computers also started with assembly code but quickly moved on to high-level languages (mostly because the development work was done on bigger machines). The same is true for smart cards.
Now let us look at operating systems. The first mainframes initially had no protection hardware and no support for multiprogramming, so they ran simple operating systems that handled one manually-loaded program at a time. Later they acquired the hardware and operating system support to handle multiple programs at once, and then full timesharing capabilities.

When minicomputers first appeared, they also had no protection hardware and ran one manually-loaded program at a time, even though multiprogramming was well established in the mainframe world by then. Gradually, they acquired protection hardware and the ability to run two or more programs at once. The first microcomputers were also capable of running only one program at a time, but later acquired the ability to multiprogram. Palmtops and smart cards went the same route.
Disks first appeared on large mainframes, then on minicomputers, microcomputers, and so on down the line. Even now, smart cards do not have hard disks, but with the advent of flash ROM, they will soon have the equivalent of it. When disks first appeared, primitive file systems sprung up. On the CDC 6600, easily the most powerful mainframe in the world during much of the 1960s, the file system consisted of users having the ability to create a file and then declare it to be permanent, meaning it stayed on the disk even after the creating program exited. To access such a file later, a program had to attach it with a special command and give its password (supplied when the file was made permanent). In effect, there was a single directory shared by all users. It was up to the users to avoid file name conflicts. Early minicomputer file systems had a single directory shared by all users and so did early microcomputer file systems.
Virtual memory (the ability to run programs larger than the physical memory) had a similar development. It first appeared in mainframes, then minicomputers and microcomputers, and gradually worked its way down to smaller and smaller systems. Networking had a similar history.

In all cases, the software development was dictated by the technology. The first microcomputers, for example, had something like 4 KB of memory and no protection hardware. High-level languages and multiprogramming were simply too much for such a tiny system to handle. As the microcomputers evolved into modern personal computers, they acquired the necessary hardware and then the necessary software to handle more advanced features. It is likely that this development will continue for years to come. Other fields may also have this wheel of reincarnation, but in the computer industry it seems to spin faster.

1.3 THE OPERATING SYSTEM ZOO

All of this history and development has left us with a wide variety of operating systems, not all of which are widely known. In this section we will briefly touch upon seven of them. We will come back to some of these different kinds of systems later in the book.

1.3.1 Mainframe Operating Systems

At the high end are the operating systems for the mainframes, those room-sized computers still found in major corporate data centers. These computers distinguish themselves from personal computers in terms of their I/O capacity. A mainframe with 1000 disks and thousands of gigabytes of data is not unusual; a personal computer with these specifications would be odd indeed. Mainframes are also making something of a comeback as high-end Web servers, servers for large-scale electronic commerce sites, and servers for business-to-business transactions.

The operating systems for mainframes are heavily oriented toward processing many jobs at once, most of which need prodigious amounts of I/O. They typically offer three kinds of services: batch, transaction processing, and timesharing. A batch system is one that processes routine jobs without any interactive user present. Claims processing in an insurance company or sales reporting for a chain of stores is typically done in batch mode. Transaction processing systems handle large numbers of small requests, for example, check processing at a bank or airline reservations. Each unit of work is small, but the system must handle hundreds or thousands per second. Timesharing systems allow multiple remote users to run jobs on the computer at once, such as querying a big database. These functions are closely related; mainframe operating systems often perform all of them. An example mainframe operating system is OS/390, a descendant of OS/360.


1.3.2 Server Operating Systems
One level down are the server operating systems. They run on servers, which are either very large personal computers, workstations, or even mainframes. They serve multiple users at once over a network and allow the users to share hardware and software resources. Servers can provide print service, file service, or Web service. Internet providers run many server machines to support their customers and Web sites use servers to store the Web pages and handle the incoming requests. Typical server operating systems are UNIX and Windows 2000. Linux is also gaining ground for servers.

1.3.3 Multiprocessor Operating Systems

An increasingly common way to get major-league computing power is to connect multiple CPUs into a single system. Depending on precisely how they are connected and what is shared, these systems are called parallel computers, multicomputers, or multiprocessors. They need special operating systems, but often these are variations on the server operating systems, with special features for communication and connectivity.

1.3.4 Personal Computer Operating Systems

The next category is the personal computer operating system. Their job is to provide a good interface to a single user. They are widely used for word processing, spreadsheets, and Internet access. Common examples are Windows 98, Windows 2000, the Macintosh operating system, and Linux. Personal computer operating systems are so widely known that probably little introduction is needed. In fact, many people are not even aware that other kinds exist.

1.3.5 Real-Time Operating Systems


Another type of operating system is the real-time system. These systems are characterized by having time as a key parameter. For example, in industrial process control systems, real-time computers have to collect data about the production process and use it to control machines in the factory. Often there are hard deadlines that must be met. For example, if a car is moving down an assembly line, certain actions must take place at certain instants of time. If a welding robot welds too early or too late, the car will be ruined. If the action absolutely must occur at a certain moment (or within a certain range), we have a hard real-time system.

Another kind of real-time system is a soft real-time system, in which missing an occasional deadline is acceptable. Digital audio or multimedia systems fall in this category. VxWorks and QNX are well-known real-time operating systems.

1.3.6 Embedded Operating Systems


Continuing on down to smaller and smaller systems, we come to palmtop computers and embedded systems. A palmtop computer or PDA (Personal Digital Assistant) is a small computer that fits in a shirt pocket and performs a small number of functions, such as an electronic address book and memo pad. Embedded systems run on the computers that control devices that are not generally thought of as computers, such as TV sets, microwave ovens, and mobile telephones. These often have some characteristics of real-time systems but also have size, memory, and power restrictions that make them special. Examples of such operating systems are PalmOS and Windows CE (Consumer Electronics).

1.3.7 Smart Card Operating Systems

The smallest operating systems run on smart cards, which are credit card-sized devices containing a CPU chip. They have very severe processing power and memory constraints. Some of them can handle only a single function, such as electronic payments, but others can handle multiple functions on the same smart card. Often these are proprietary systems.

Some smart cards are Java oriented. What this means is that the ROM on the smart card holds an interpreter for the Java Virtual Machine (JVM). Java applets (small programs) are downloaded to the card and are interpreted by the JVM interpreter. Some of these cards can handle multiple Java applets at the same time, leading to multiprogramming and the need to schedule them. Resource management and protection also become an issue when two or more applets are present at the same time. These issues must be handled by the (usually extremely primitive) operating system present on the card.

1.4 COMPUTER HARDWARE REVIEW

An operating system is intimately tied to the hardware of the computer it runs on. It extends the computer's instruction set and manages its resources. To work, it must know a great deal about the hardware, at least about how the hardware appears to the programmer.

Conceptually, a simple personal computer can be abstracted to a model resembling that of Fig. 1-5. The CPU, memory, and I/O devices are all connected by a system bus and communicate with one another over it. Modern personal computers have a more complicated structure, involving multiple buses, which we will look at later. For the time being, this model will be sufficient. In the following sections, we will briefly review these components and examine some of the hardware issues that are of concern to operating system designers.
[Figure 1-5: Some of the components of a simple personal computer: the CPU, memory, video controller (monitor), keyboard controller, floppy disk controller, and hard disk controller (hard disk drive), all connected by a system bus.]

me "brain" of the computer is the CPU. It fetches instructinns frorrl memory


and executes them. I'hc basic cycle u l rvcry CPU is to fetch the first instmction
from memory, decode it to determine its type and operands. execute tt. and then
fetch, decode, and execute subsquem instructions. In this way, prugra~nsare car-
ried uut.
Each CPU has a specific set of instructims that it can execute. Thns a Pen-
tium cannot execute SPARC programs and a SPARC cannot execute Pentium pro-
grams. Because accessing memory tn get an instruction or data word takes much
longer than exesuttng an instrucrion, all CPUs contain sume registers inside tu
hold key variables and temporary results. Thus the instruutiim set generally con-
tains instructions to load a word from memory into a register, and stare a word
from a register into memory, Other ins~ructionscombine two operands from
registers, memory, or both into a result, such as adding twc~words and storing the
result in a register w in memory,
Ln addition to the general registers used t r ~bold variables and temporary
results, most computers have several special registers that arc visible to the pro-
grammer. One of these is the program counter. which contains rhc rnernory
address of the next instructinn to be fetched. After thn~instruction has been
fetched, the program counter is updated to point to its successor.
Another register is the stack pointer, which points to the top of the current stack in memory. The stack contains one frame for each procedure that has been entered but not yet exited. A procedure's stack frame holds those input parameters, local variables, and temporary variables that are not kept in registers.

Yet another register is the PSW (Program Status Word). This register contains the condition code bits, which are set by comparison instructions, the CPU priority, the mode (user or kernel), and various other control bits. User programs may normally read the entire PSW but typically may write only some of its fields. The PSW plays an important role in system calls and I/O.

The operating system must be aware of all the registers. When time multiplexing the CPU, the operating system will often stop the running program to (re)start another one. Every time it stops a running program, the operating system must save all the registers so they can be restored when the program runs later.
To improve performance, CPU designers have long abandoned the simple model of fetching, decoding, and executing one instruction at a time. Many modern CPUs have facilities for executing more than one instruction at the same time. For example, a CPU might have separate fetch, decode, and execute units, so that while it was executing instruction n, it could also be decoding instruction n + 1 and fetching instruction n + 2. Such an organization is called a pipeline and is illustrated in Fig. 1-6(a) for a pipeline with three stages. Longer pipelines are common. In most pipeline designs, once an instruction has been fetched into the pipeline, it must be executed, even if the preceding instruction was a conditional branch that was taken. Pipelines cause compiler writers and operating system writers great headaches because they expose the complexities of the underlying machine to them.

[Figure 1-6: (a) A three-stage pipeline with fetch, decode, and execute units. (b) A superscalar CPU with multiple decode and execute units.]

Even more advanced than a pipeline design is a superscalar CPU, shown in Fig. 1-6(b). In this design, multiple execution units are present, for example, one for integer arithmetic, one for floating-point arithmetic, and one for Boolean operations. Two or more instructions are fetched at once, decoded, and dumped into a holding buffer until they can be executed. As soon as an execution unit is free, it looks in the holding buffer to see if there is an instruction it can handle, and if so, it removes the instruction from the buffer and executes it. An implication of this design is that program instructions are often executed out of order. For the most part, it is up to the hardware to make sure the result produced is the same one a sequential implementation would have produced, but an annoying amount of the complexity is foisted onto the operating system, as we shall see.
Most CPUs, except very simple ones used in embedded systems, have two modes, kernel mode and user mode, as mentioned earlier. Usually a bit in the PSW controls the mode. When running in kernel mode, the CPU can execute every instruction in its instruction set and use every feature of the hardware. The operating system runs in kernel mode, giving it access to the complete hardware.

In contrast, user programs run in user mode, which permits only a subset of the instructions to be executed and a subset of the features to be accessed. Generally, all instructions involving I/O and memory protection are disallowed in user mode. Setting the PSW mode bit to kernel mode is also forbidden, of course.
To obtain services from the operating system, a user program must make a system call, which traps into the kernel and invokes the operating system. The TRAP instruction switches from user mode to kernel mode and starts the operating system. When the work has been completed, control is returned to the user program at the instruction following the system call. We will explain the details of the system call process later in this chapter. As a note on typography, we will use the lower case Helvetica font to indicate system calls in running text, like this: read.
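As a minimal sketch (the file name here is arbitrary and only illustrative), a user program invokes read through the C library, which issues the TRAP on the program's behalf:

    #include <fcntl.h>     /* open */
    #include <unistd.h>    /* read, close */

    int main(void)
    {
        char buf[128];
        int fd = open("data.txt", O_RDONLY);     /* hypothetical file name */
        if (fd >= 0) {
            /* The read call traps into the kernel, which does the I/O in
               kernel mode and then returns control to this program. */
            ssize_t n = read(fd, buf, sizeof(buf));
            (void)n;
            close(fd);
        }
        return 0;
    }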
It is worth noting that computers have traps other than the instruction for executing a system call. Most of the other traps are caused by the hardware to warn of an exceptional situation such as an attempt to divide by 0 or a floating-point underflow. In all cases the operating system gets control and must decide what to do. Sometimes the program must be terminated with an error. Other times the error can be ignored (an underflowed number can be set to 0). Finally, when the program has announced in advance that it wants to handle certain kinds of conditions, control can be passed back to the program to let it deal with the problem.

1.4.2 Memory

The second major component in any computer is the memory. Ideally, a memory should be extremely fast (faster than executing an instruction, so the CPU is not held up by the memory), abundantly large, and dirt cheap. No current technology satisfies all of these goals, so a different approach is taken. The memory system is constructed as a hierarchy of layers, as shown in Fig. 1-7.

The top layer consists of the registers internal to the CPU. They are made of the same material as the CPU and are thus just as fast as the CPU. Consequently, there is no delay in accessing them. The storage capacity available in them is typically 32 x 32 bits on a 32-bit CPU and 64 x 64 bits on a 64-bit CPU, less than 1 KB in both cases. Programs must manage the registers (i.e., decide what to keep in them) themselves, in software.
    Typical access time    Memory type      Typical capacity
    1 nsec                 Registers        < 1 KB
    2 nsec                 Cache            1 MB
    10 nsec                Main memory      64-512 MB
    10 msec                Magnetic disk    5-50 GB
    100 sec                Magnetic tape    20-100 GB

Figure 1-7. A typical memory hierarchy. The numbers are very rough approximations.

Next comes the cache memory, which is mostly controlled by the hardware. Main memory is divided up into cache lines, typically 64 bytes, with addresses 0 to 63 in cache line 0, addresses 64 to 127 in cache line 1, and so on. The most heavily used cache lines are kept in a high-speed cache located inside or very close to the CPU. When the program needs to read a memory word, the cache hardware checks to see if the line needed is in the cache. If it is, called a cache hit, the request is satisfied from the cache and no memory request is sent over the bus to the main memory. Cache hits normally take about two clock cycles. Cache misses have to go to memory, with a substantial time penalty. Cache memory is limited in size due to its high cost. Some machines have two or even three levels of cache, each one slower and bigger than the one before it.
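To make the line mapping concrete, here is a small illustrative C fragment (not part of the original text) that computes which cache line a byte address falls in, assuming the 64-byte line size mentioned above:

    #include <stdio.h>

    #define LINE_SIZE 64   /* cache line size assumed above */

    int main(void)
    {
        unsigned long addr = 130;                  /* an arbitrary byte address       */
        unsigned long line = addr / LINE_SIZE;     /* addresses 128-191 fall in line 2 */
        unsigned long offset = addr % LINE_SIZE;   /* byte 2 within that line          */
        printf("address %lu -> cache line %lu, offset %lu\n", addr, line, offset);
        return 0;
    }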
Main memory comes next. This is the workhorse of the memory system. Main memory is often called RAM (Random Access Memory). Old timers sometimes call it core memory, because computers in the 1950s and 1960s used tiny magnetizable ferrite cores for main memory. Currently, memories are tens to hundreds of megabytes and growing rapidly. All CPU requests that cannot be satisfied out of the cache go to main memory.
Next in the hierarchy is magnetic disk (hard disk). Disk storage is two orders of magnitude cheaper than RAM per bit and often two orders of magnitude larger as well. The only problem is that the time to randomly access data on it is close to three orders of magnitude slower. This low speed is due to the fact that a disk is a mechanical device, as shown in Fig. 1-8.

A disk consists of one or more metal platters that rotate at 5400, 7200, or 10,800 rpm. A mechanical arm pivots over the platters from the corner, similar to the pickup arm on an old 33 rpm phonograph for playing vinyl records. Information is written onto the disk in a series of concentric circles. At any given arm position, each of the heads can read an annular region called a track. Together, all the tracks for a given arm position form a cylinder.

Each track is divided into some number of sectors, typically 512 bytes per sector. On modern disks, the outer cylinders contain more sectors than the inner ones. Moving the arm from one cylinder to the next one takes about 1 msec. Moving it to a random cylinder typically takes 5 msec to 10 msec, depending on the drive. Once the arm is on the correct track, the drive must wait for the needed sector to rotate under the head, an additional delay of 5 msec to 10 msec, depending on the drive's rpm. Once the sector is under the head, reading or writing occurs at a rate of 5 MB/sec on low-end disks to 160 MB/sec on faster ones.
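As a rough worked example (the particular numbers below are assumptions chosen from within the ranges just given), the time to read one 512-byte sector is dominated by the mechanical delays, not the transfer itself:

    #include <stdio.h>

    int main(void)
    {
        double seek_ms     = 8.0;    /* random seek, assumed within 5-10 msec        */
        double rotation_ms = 6.0;    /* rotational delay, assumed within 5-10 msec   */
        double rate_mb_s   = 20.0;   /* transfer rate, assumed mid-range             */
        double transfer_ms = (512.0 / (rate_mb_s * 1e6)) * 1000.0;   /* about 0.026 msec */

        printf("estimated access time: %.3f msec\n",
               seek_ms + rotation_ms + transfer_ms);
        return 0;
    }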
[Figure 1-8: Structure of a disk drive, showing the read/write heads (one per surface), surfaces 0 through 7, and the direction of arm motion.]

The final layer in the memory hierarchy is magnetic tape. This medium is often used as a backup for disk storage and for holding very large data sets. To access a tape, it must first be put into a tape reader, either by a person or a robot (automated tape handling is common at installations with huge databases). Then the tape may have to be spooled forward to get to the requested block. All in all, this could take minutes. The big plus of tape is that it is exceedingly cheap per bit and removable, which is important for backup tapes that must be stored off-site in order to survive fires, floods, earthquakes, etc.

The memory hierarchy we have discussed is typical, but some installations do not have all the layers or have a few different ones (such as optical disk). Still, in all of them, as one goes down the hierarchy, the random access time increases dramatically, the capacity increases equally dramatically, and the cost per bit drops enormously. Consequently, it is likely that memory hierarchies will be around for years to come.
In addition to the kinds of memory discussed above, many computers have a small amount of nonvolatile random access memory. Unlike RAM, nonvolatile memory does not lose its contents when the power is switched off. ROM (Read Only Memory) is programmed at the factory and cannot be changed afterward. It is fast and inexpensive. On some computers, the bootstrap loader used to start the computer is contained in ROM. Also, some I/O cards come with ROM for handling low-level device control.

EEPROM (Electrically Erasable ROM) and flash RAM are also nonvolatile, but in contrast to ROM can be erased and rewritten. However, writing them takes orders of magnitude more time than writing RAM, so they are used in the same way ROM is, only with the additional feature that it is now possible to correct bugs in programs they hold by rewriting them in the field.
Yet another kind of memory is CMOS, which is volatile. Many computers use CMOS memory to hold the current time and date. The CMOS memory and the clock circuit that increments the time in it are powered by a small battery, so the time is correctly updated, even when the computer is unplugged. The CMOS memory can also hold the configuration parameters, such as which disk to boot from. CMOS is used because it draws so little power that the original factory-installed battery often lasts for several years. However, when it begins to fail, the computer can appear to have Alzheimer's disease, forgetting things that it has known for years, like which hard disk to boot from.
Let us now focus on main memory for a little while. It is often desirable to hold multiple programs in memory at once. If one program is blocked waiting for a disk read to complete, another program can use the CPU, giving better CPU utilization. However, with two or more programs in main memory at once, two problems must be solved:

1. How to protect the programs from one another (and the kernel from them all).

2. How to handle relocation.

Many solutions are possible. However, all of them involve equipping the CPU with special hardware. A simple solution is to equip the CPU with two special registers, the base and limit registers, as illustrated in Fig. 1-9(a). When a program runs, the base register is set to the start of its region of memory and the limit register tells how large that region is. On every memory reference the hardware checks that the address is below the limit and then adds the base register to it before sending it to memory, so a program cannot reference any part of memory outside its own region. Thus this scheme solves both the protection and the relocation problem at the cost of two new registers and a slight increase in cycle time (to perform the limit check and addition).
[Figure 1-9: (a) Use of one base-limit register pair. (b) Use of two base-limit pairs, one for the user program and one for the user data (Base-1/Limit-1 and Base-2/Limit-2), showing how the registers are set when program 1 is running and when program 2 is running.]
The check and mapping result in converting an address generated by the program, called a virtual address, into an address used by the memory, called a physical address. The device that performs the check and mapping is called the MMU (Memory Management Unit). It is located on the CPU chip or close to it, but is logically between the CPU and the memory.

A more sophisticated MMU is illustrated in Fig. 1-9(b). Here we have an MMU with two pairs of base and limit registers, one for the program text and one for the data. The program counter and all other references to the program text use pair 1 and data references use pair 2. As a consequence, it is now possible to have multiple users share the same program with only one copy of it in memory, something not possible with the first scheme. When program 1 is running, the four registers are set as indicated by the arrows to the left of Fig. 1-9(b). When program 2 is running, they are set as indicated by the arrows to the right of the figure. Much more sophisticated MMUs exist. We will study some of them later in this book.
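The following C sketch (an illustration, not the book's code; the register values are assumptions) mimics what the hardware of the simpler one-pair scheme does on every memory reference: check the virtual address against the limit register and add the base register to form the physical address:

    #include <stdio.h>
    #include <stdlib.h>

    unsigned long base_reg  = 0x40000;   /* assumed start of this program's memory   */
    unsigned long limit_reg = 0x10000;   /* assumed size of its program text and data */

    /* Translate a virtual address the way the base/limit hardware would. */
    unsigned long translate(unsigned long vaddr)
    {
        if (vaddr >= limit_reg) {             /* protection check */
            fprintf(stderr, "protection fault at 0x%lx\n", vaddr);
            exit(1);
        }
        return base_reg + vaddr;              /* relocation */
    }

    int main(void)
    {
        printf("virtual 0x1234 -> physical 0x%lx\n", translate(0x1234));
        return 0;
    }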
1.4.3 I/O Devices

Memory is not the only resource that the operating system must manage. I/O devices also interact heavily with the operating system. As we saw in Fig. 1-5, I/O devices generally consist of two parts: a controller and the device itself. The controller is a chip or a set of chips on a plug-in board that physically controls the device. It accepts commands from the operating system, for example, to read data from the device, and carries them out.
In many cases, the actual control of the device is very complicated and detailed, so it is the job of the controller to present a simpler interface to the operating system. For example, a disk controller might accept a command to read sector 11,206 from disk 2. The controller then has to convert this linear sector number to a cylinder, sector, and head. This conversion may be complicated by the fact that outer cylinders have more sectors than inner ones and that some bad sectors have been remapped onto other ones. Then the controller has to determine which cylinder the disk arm is on and give it a sequence of pulses to move in or out the requisite number of cylinders. It has to wait until the proper sector has rotated under the head and then start reading and storing the bits as they come off the drive, removing the preamble and computing the checksum. Finally, it has to assemble the incoming bits into words and store them in memory. To do all this work, controllers often contain small embedded computers that are programmed to do their work.

The other piece is the actual device itself. Devices have fairly simple interfaces, both because they cannot do much and to make them standard. The latter is needed so that any IDE disk controller can handle any IDE disk, for example. IDE stands for Integrated Drive Electronics and is the standard type of disk on Pentiums and some other computers. Since the actual device interface is hidden
behind the controller, all that the operating system sees is the interface to the controller, which may be different from the interface to the device.

Because each type of controller is different, different software is needed to control each one. The software that talks to a controller, giving it commands and accepting responses, is called a device driver. Each controller manufacturer has to supply a driver for each operating system it supports. Thus a scanner may come with drivers for Windows 98, Windows 2000, and UNIX, for example.

To be used, the driver has to be put into the operating system so it can run in kernel mode. Theoretically, drivers can run outside the kernel, but few current systems support this possibility because it requires the ability to allow a user-space driver to be able to access the device in a controlled way, a feature rarely supported. There are three ways the driver can be put into the kernel. The first way is to relink the kernel with the new driver and then reboot the system. Many UNIX systems work like this. The second way is to make an entry in an operating system file telling it that it needs the driver and then reboot the system. At boot time, the operating system goes and finds the drivers it needs and loads them. Windows works this way. The third way is for the operating system to be able to accept new drivers while running and install them on-the-fly without the need to reboot. This way used to be rare but is becoming much more common now. Hot pluggable devices, such as USB and IEEE 1394 devices (discussed below), always need dynamically loaded drivers.
Every controller has a small number of registers that are used to communicate with it. For example, a minimal disk controller might have registers for specifying the disk address, memory address, sector count, and direction (read or write). To activate the controller, the driver gets a command from the operating system, then translates it into the appropriate values to write into the device registers.
On some computers, the device registers are mapped into the operating system's address space, so they can be read and written like ordinary memory words. On such computers, no special I/O instructions are needed and user programs can be kept away from the hardware by not putting these memory addresses within their reach (e.g., by using base and limit registers). On other computers, the device registers are put in a special I/O port space, with each register having a port address. On these machines, special IN and OUT instructions are available in kernel mode to allow drivers to read and write the registers. The former scheme eliminates the need for special I/O instructions but uses up some of the address space. The latter uses no address space but requires special instructions. Both systems are widely used.
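As a hedged sketch (the register layout and field names below are invented for illustration, not taken from any real controller), a driver for the minimal disk controller just described might fill in memory-mapped registers like this before starting a read:

    /* Hypothetical register layout for a minimal disk controller. */
    struct disk_regs {
        volatile unsigned int disk_addr;      /* sector number on the disk        */
        volatile unsigned int mem_addr;       /* where in memory to put the data  */
        volatile unsigned int sector_count;   /* how many sectors to transfer     */
        volatile unsigned int direction;      /* 0 = read, 1 = write              */
    };

    /* With memory-mapped registers these are ordinary stores; on a machine with
       a separate I/O port space they would be IN/OUT instructions instead. */
    void start_disk_read(struct disk_regs *r, unsigned int sector,
                         unsigned int mem, unsigned int count)
    {
        r->disk_addr = sector;
        r->mem_addr = mem;
        r->sector_count = count;
        r->direction = 0;                     /* 0 = read */
    }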
Input and output can be done in three different ways. In the simplest method, a user program issues a system call, which the kernel then translates into a procedure call to the appropriate driver. The driver then starts the I/O and sits in a tight loop continuously polling the device to see if it is done (usually there is some bit that indicates that the device is still busy). When the I/O has completed, the driver puts the data where they are needed (if any), and returns. The operating system then returns control to the caller. This method is called busy waiting and has the disadvantage of tying up the CPU polling the device until it is finished.
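Continuing the same hypothetical controller, a minimal sketch of busy waiting looks like this (the busy bit and status register are assumptions):

    #define STATUS_BUSY 0x1u   /* assumed "device still busy" bit in the status register */

    /* Tight polling loop: the CPU does nothing useful until the controller
       clears the busy bit, which is exactly the drawback described above. */
    void wait_for_disk(volatile unsigned int *status_reg)
    {
        while (*status_reg & STATUS_BUSY)
            ;   /* busy waiting */
    }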
The second method is for the driver to start the device and ask it to give an interrupt when it is finished. At that point the driver returns. The operating system then blocks the caller if need be and looks for other work to do. When the controller detects the end of the transfer, it generates an interrupt to signal completion.

Interrupts are very important in operating systems, so let us examine the idea more closely. In Fig. 1-10(a) we see a three-step process for I/O. In step 1, the driver tells the controller what to do by writing into its device registers. The controller then starts the device. When the controller has finished the work it was told to do, it signals the interrupt controller chip using certain bus lines in step 2. If the interrupt controller is prepared to accept the interrupt (which it may not be if it is busy with a higher priority one), it asserts a pin on the CPU chip informing it, in step 3. In step 4, the interrupt controller puts the number of the device on the bus so the CPU can read it and know which device has just finished (many devices may be running at the same time).

[Figure 1-10: (a) The steps in starting an I/O device and getting an interrupt, involving the disk drive, disk controller, interrupt controller, and CPU. (b) How the CPU dispatches to the interrupt handler.]

Once the CPU has decided to take the interrupt, the program counter and PSW are typically then pushed onto the current stack and the CPU switched into kernel mode. The device number may be used as an index into part of memory to find the address of the interrupt handler for this device. This part of memory is called the interrupt vector. Once the interrupt handler (part of the driver for the interrupting device) has started, it removes the stacked program counter and PSW and saves them, then queries the device to learn its status. When the handler is all finished, it returns to the previously-running user program at the first instruction that was not yet executed. These steps are shown in Fig. 1-10(b).
The third method for doing I/O makes use of a special DMA (Direct Memory Access) chip that can control the flow of bits between memory and some controller without constant CPU intervention. The CPU sets up the DMA chip, telling it how many bytes to transfer, the device and memory addresses involved, and the direction, and lets it go. When the DMA chip is done, it causes an interrupt, which is handled as described above. DMA and I/O hardware in general will be discussed in more detail in Chap. 5.
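In the same hedged spirit, setting up a DMA transfer might look roughly like the sketch below; the register names are hypothetical, but the parameters (byte count, device and memory addresses, direction) are the ones mentioned in the text:

    /* Hypothetical DMA controller registers. */
    struct dma_regs {
        volatile unsigned int count;       /* number of bytes to transfer                 */
        volatile unsigned int dev_addr;    /* device address                              */
        volatile unsigned int mem_addr;    /* memory address                              */
        volatile unsigned int direction;   /* 0 = device-to-memory, 1 = memory-to-device  */
        volatile unsigned int start;       /* writing 1 starts the transfer               */
    };

    void dma_start(struct dma_regs *dma, unsigned int bytes,
                   unsigned int dev, unsigned int mem, unsigned int dir)
    {
        dma->count = bytes;
        dma->dev_addr = dev;
        dma->mem_addr = mem;
        dma->direction = dir;
        dma->start = 1;    /* the DMA chip now moves the data on its own and
                              interrupts the CPU when it is done */
    }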
Interrupts can often happen at highly inconvenient moments, for example, while another interrupt handler is running. For this reason, the CPU has a way to disable interrupts and then reenable them later. While interrupts are disabled, any devices that finish continue to assert their interrupt signals, but the CPU is not interrupted until interrupts are enabled again. If multiple devices finish while interrupts are disabled, the interrupt controller decides which one to let through first, usually based on static priorities assigned to each device. The highest priority device wins.

1.4.4 Buses

The organization of Fig. 1-5 was used on minicomputers for years and also on the original IBM PC. However, as processors and memories got faster, the ability of a single bus (and certainly the IBM PC bus) to handle all the traffic was strained to the breaking point. Something had to give. As a result, additional buses were added, both for faster I/O devices and for CPU-to-memory traffic. As a consequence of this evolution, a large Pentium system currently looks something like Fig. 1-11.

This system has eight buses (cache, local, memory, PCI, SCSI, USB, IDE, and ISA), each with a different transfer rate and function. The operating system must be aware of all of them for configuration and management. The two main buses are the original IBM PC ISA (Industry Standard Architecture) bus and its successor, the PCI (Peripheral Component Interconnect) bus. The ISA bus, which was originally the IBM PC/AT bus, runs at 8.33 MHz and can transfer 2 bytes at once, for a maximum speed of 16.67 MB/sec. It is included for backward compatibility with old and slow I/O cards. The PCI bus was invented by Intel as a successor to the ISA bus. It can run at 66 MHz and transfer 8 bytes at a time, for a data rate of 528 MB/sec. Most high-speed I/O devices use the PCI bus now. Even some non-Intel computers use the PCI bus due to the large number of I/O cards available for it.

In this configuration, the CPU talks to the PCI bridge chip over the local bus, and the PCI bridge chip talks to the memory over a dedicated memory bus, often running at 100 MHz. Pentium systems have a level-1 cache on chip and a much larger level-2 cache off chip, connected to the CPU by the cache bus.

In addition, this system contains three specialized buses: IDE, USB, and SCSI. The IDE bus is for attaching peripheral devices such as disks and CD-ROMs to the system.
[Figure 1-11: The structure of a large Pentium system, showing the cache bus (level-2 cache), local bus (CPU and PCI bridge), memory bus (main memory), PCI bus (ISA bridge, IDE disk, available PCI slots, and other devices), and ISA bus (printer, available ISA slots, and other slow devices).]

The IDE bus is an outgrowth of the disk controller interface on the PC/AT and is now standard on nearly all Pentium-based systems for the hard disk and often the CD-ROM.

The USB (Universal Serial Bus) was invented to attach all the slow I/O devices, such as the keyboard and mouse, to the computer. It uses a small four-wire connector, two of which supply electrical power to the USB devices. USB is a centralized bus in which a root device polls the I/O devices every 1 msec to see if they have any traffic. It can handle an aggregate load of 1.5 MB/sec. All the USB devices share a single USB device driver, making it unnecessary to install a new driver for each new USB device. Consequently, USB devices can be added to the computer without the need to reboot.

The SCSI (Small Computer System Interface) bus is a high-performance bus intended for fast disks, scanners, and other devices needing considerable bandwidth. It can run at up to 160 MB/sec. It has been present on Macintosh systems since they were invented and is also popular on UNIX and some Intel-based systems.
Yet another bus (not shown in Fig. 1-11) is IEEE 1394. Sometimes it is called FireWire, although strictly speaking, FireWire is the name Apple uses for its implementation of 1394. Like USB, IEEE 1394 is bit serial but is designed for packet transfers at speeds up to 50 MB/sec, making it useful for connecting digital camcorders and similar multimedia devices to a computer. Unlike USB, IEEE 1394 does not have a central controller. SCSI and IEEE 1394 face competition from a faster version of USB being developed.
To work in an environment such as that of Fig. 1-11, the operating system has to know what is out there and configure it. This requirement led Intel and Microsoft to design a PC system called plug and play, based on a similar concept first implemented in the Apple Macintosh. Before plug and play, each I/O card had a fixed interrupt request level and fixed addresses for its I/O registers. For example, the keyboard was interrupt 1 and used I/O addresses 0x60 to 0x64, the floppy disk controller was interrupt 6 and used I/O addresses 0x3F0 to 0x3F7, and the printer was interrupt 7 and used I/O addresses 0x378 to 0x37A, and so on.

So far, so good. The trouble came when the user bought a sound card and a modem card and both happened to use, say, interrupt 4. They would conflict and would not work together. The solution was to include DIP switches or jumpers on every I/O card and instruct the user to please set them to select an interrupt level and I/O device addresses that did not conflict with any others in the user's system. Teenagers who devoted their lives to the intricacies of the PC hardware could sometimes do this without making errors. Unfortunately, nobody else could, leading to chaos.
What plug and play does is have the system automatically collect information about the I/O devices, centrally assign interrupt levels and I/O addresses, and then tell each card what its numbers are. Very briefly, that works as follows on the Pentium. Every Pentium contains a parentboard (formerly called a motherboard before political correctness hit the computer industry). On the parentboard is a program called the system BIOS (Basic Input Output System). The BIOS contains low-level I/O software, including procedures to read the keyboard, write to the screen, and do disk I/O, among other things. Nowadays, it is held in a flash RAM, which is nonvolatile but which can be updated by the operating system when bugs are found in the BIOS.
When the computer is booted, the BIOS is started. It first checks to see how much RAM is installed and whether the keyboard and other basic devices are installed and responding correctly. It starts out by scanning the ISA and PCI buses to detect all the devices attached to them. Some of these devices are typically legacy (i.e., designed before plug and play was invented) and have fixed interrupt levels and I/O addresses (possibly set by switches or jumpers on the I/O card, but not modifiable by the operating system). These devices are recorded. The plug and play devices are also recorded. If the devices present are different from when the system was last booted, the new devices are configured.

The BIOS then determines the boot device by trying a list of devices stored in the CMOS memory. The user can change this list by entering a BIOS configuration program just after booting. Typically, an attempt is made to boot from the floppy disk. If that fails, the CD-ROM is tried. If neither a floppy nor a CD-ROM is present, the system is booted from the hard disk.
1.5 OPERATING SYSTEM CONCEPTS
All operating systems have certain basic concepts such as processes, memory, and files that are central to understanding them. In the following sections, we will look at some of these basic concepts ever so briefly, as an introduction. We will come back to each of them in great detail later in this book. To illustrate these concepts we will use examples from time to time, generally drawn from UNIX. Similar examples typically exist in other systems as well, however.

1.5.1 Processes

A key concept in all operating systems is the process. A process is basically a program in execution. Associated with each process is its address space, a list of memory locations from some minimum (usually 0) to some maximum, which the process can read and write. The address space contains the executable program, the program's data, and its stack. Also associated with each process is some set of registers, including the program counter, stack pointer, and other hardware registers, and all the other information needed to run the program.

We will come back to the process concept in much more detail in Chap. 2, but for the time being, the easiest way to get a good intuitive feel for a process is to think about timesharing systems. Periodically, the operating system decides to stop running one process and start running another, for example, because the first one has had more than its share of CPU time in the past second.

When a process is suspended temporarily like this, it must later be restarted in exactly the same state it had when it was stopped. This means that all information about the process must be explicitly saved somewhere during the suspension. For example, the process may have several files open for reading at once. Associated
with each of these files is a pointer giving the current position (i.e., the number of the byte or record to be read next). When a process is temporarily suspended, all these pointers must be saved so that a read call executed after the process is restarted will read the proper data. In many operating systems, all the information about each process, other than the contents of its own address space, is stored in an operating system table called the process table, which is an array (or linked list) of structures, one for each process currently in existence.

Thus, a (suspended) process consists of its address space, usually called the core image (in honor of the magnetic core memories used in days of yore), and its process table entry, which contains its registers, among other things.
The key process management system calls are those dealing with the creation and termination of processes. Consider a typical example. A process called the command interpreter or shell reads commands from a terminal. The user has just typed a command requesting that a program be compiled. The shell must now create a new process that will run the compiler. When that process has finished the compilation, it executes a system call to terminate itself.
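As an illustrative sketch (not taken from the text; the compiler command and file name are arbitrary), the heart of such a shell on UNIX uses the fork, exec, and waitpid system calls, which are treated at length in Chap. 2:

    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdio.h>

    int main(void)
    {
        pid_t pid = fork();                   /* create a child process           */
        if (pid == 0) {
            /* Child: run the compiler; "cc" and "prog.c" are just examples. */
            execlp("cc", "cc", "prog.c", (char *)0);
            perror("execlp");                 /* only reached if the exec failed  */
            _exit(1);
        } else if (pid > 0) {
            int status;
            waitpid(pid, &status, 0);         /* shell waits for the compilation  */
        } else {
            perror("fork");
        }
        return 0;
    }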
If a process can create one or more other processes (referred to as child processes) and these processes in turn can create child processes, we quickly arrive at the process tree structure of Fig. 1-12. Related processes that are cooperating to get some job done often need to communicate with one another and synchronize their activities. This communication is called interprocess communication, and will be addressed in detail in Chap. 2.

Figure 1-12. A process tree. Process A created two child processes, B and C. Process B created three child processes, D, E, and F.

Other process system calls are available to request more memory (or release unused memory), wait for a child process to terminate, and overlay its program with a different one.

Occasionally, there is a need to convey information to a running process that is not sitting around waiting for this information. For example, a process that is communicating with another process on a different computer does so by sending messages to the remote process over a computer network. To guard against the possibility that a message or its reply is lost, the sender may request that its own operating system notify it after a specified number of seconds, so that it can retransmit the message if no acknowledgement has been received yet. After setting this timer, the program may continue doing other work.
When the specified number of seconds has elapsed, the operating system sends an alarm signal to the process. The signal causes the process to temporarily suspend whatever it was doing, save its registers on the stack, and start running a special signal handling procedure, for example, to retransmit a presumably lost message. When the signal handler is done, the running process is restarted in the state it was in just before the signal. Signals are the software analog of hardware interrupts and can be generated by a variety of causes in addition to timers expiring. Many traps detected by hardware, such as executing an illegal instruction or using an invalid address, are also converted into signals to the guilty process.
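A minimal sketch of this timer-and-signal pattern on UNIX (illustrative only; the retransmission itself is just a placeholder message) uses the signal and alarm calls:

    #include <signal.h>
    #include <unistd.h>

    static void on_alarm(int sig)
    {
        /* No acknowledgement arrived in time: retransmit (placeholder). */
        (void)sig;
        write(1, "timeout: retransmitting message\n", 32);
    }

    int main(void)
    {
        signal(SIGALRM, on_alarm);   /* install the signal handling procedure */
        alarm(5);                    /* ask for an alarm signal in 5 seconds  */

        pause();                     /* continue (here: just wait) until a signal arrives */
        return 0;
    }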
Each person authorized to use a system is assigned a UID (User IDentification) by the system administrator. Every process started has the UID of the person who started it. A child process has the same UID as its parent. Users can be members of groups, each of which has a GID (Group IDentification).

One UID, called the superuser (in UNIX), has special power and may violate many of the protection rules. In large installations, only the system administrator knows the password needed to become superuser, but many of the ordinary users (especially students) devote considerable effort to trying to find flaws in the system that allow them to become superuser without the password.

We will study processes, interprocess communication, and related issues in Chap. 2.

1.5.2 Deadlocks

When two or more processes are interacting, they can sometimes get themselves into a stalemate situation they cannot get out of. Such a situation is called a deadlock.

Deadlocks can best be introduced with a real-world example everyone is familiar with: deadlock in traffic. Consider the situation of Fig. 1-13(a). Here four buses are approaching an intersection. Behind each one are more buses (not shown). With a little bit of bad luck, the first four could all arrive at the intersection simultaneously, leading to the situation of Fig. 1-13(b), in which they are deadlocked because none of them can go forward. Each one is blocking one of the others. They cannot go backward due to other buses behind them. There is no easy way out.
Processes in a computer can experience an analogous situation in which they cannot make any progress. For example, imagine a computer with a tape drive and CD-recorder. Now imagine that two processes each need to produce a CD-ROM from data on a tape. Process 1 requests and is granted the tape drive. Next process 2 requests and is granted the CD-recorder. Then process 1 requests the CD-recorder and is suspended until process 2 returns it. Finally, process 2 requests the tape drive and is also suspended because process 1 already has it. Here we have a deadlock from which there is no escape. We will study deadlocks and what can be done about them in detail in Chap. 3.
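The same resource-ordering problem can be reproduced in miniature with two locks acquired in opposite order; the sketch below (an illustration using POSIX threads, not the book's example) deadlocks just as the tape drive and CD-recorder scenario does:

    #include <pthread.h>
    #include <unistd.h>

    pthread_mutex_t tape = PTHREAD_MUTEX_INITIALIZER;   /* stands in for the tape drive  */
    pthread_mutex_t cd   = PTHREAD_MUTEX_INITIALIZER;   /* stands in for the CD-recorder */

    void *process1(void *arg)
    {
        pthread_mutex_lock(&tape);    /* process 1 gets the tape drive           */
        sleep(1);
        pthread_mutex_lock(&cd);      /* ... then blocks waiting for the recorder */
        pthread_mutex_unlock(&cd);
        pthread_mutex_unlock(&tape);
        return arg;
    }

    void *process2(void *arg)
    {
        pthread_mutex_lock(&cd);      /* process 2 gets the CD-recorder                  */
        sleep(1);
        pthread_mutex_lock(&tape);    /* ... then blocks on the tape drive: deadlock     */
        pthread_mutex_unlock(&tape);
        pthread_mutex_unlock(&cd);
        return arg;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, 0, process1, 0);
        pthread_create(&t2, 0, process2, 0);
        pthread_join(t1, 0);          /* never returns: both threads are stuck forever */
        pthread_join(t2, 0);
        return 0;
    }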

1.5.3 Memory Management

Every computer has some main memory that it uses to hold executing programs. In a very simple operating system, only one program at a time is in memory. To run a second program, the first one has to be removed and the second one placed in memory.
More sophisticated operating systems allow multiple programs to be in memory at the same time. To keep them from interfering with one another (and with the operating system), some kind of protection mechanism is needed. While this mechanism has to be in the hardware, it is controlled by the operating system.
The above viewpoint is concerned with managing and protecting the computer's main memory. A different, but equally important, memory-related issue is managing the address space of the processes. Normally, each process has some set of addresses it can use, typically running from 0 up to some maximum. In the simplest case, the maximum amount of address space a process has is less than the main memory. In this way, a process can fill up its address space and there will be enough room in main memory to hold it all.
However, on many computers addresses are 32 or 64 bits, giving an address space of 2^32 or 2^64 bytes, respectively. What happens if a process has more address space than the computer has main memory and the process wants to use it all? In the first computers, such a process was just out of luck. Nowadays, a technique called virtual memory exists, in which the operating system keeps part of the address space in main memory and part on disk and shuttles pieces back and forth between them as needed. This important operating system function, and other memory management-related functions, will be covered in Chap. 4.
1.5.4 Input/Output

All computers have physical devices for acquiring input and producing output. After all, what good would a computer be if the users could not tell it what to do and could not get the results after it did the work requested? Many kinds of input and output devices exist, including keyboards, monitors, printers, and so on. It is up to the operating system to manage these devices.
Consequently, every operating system has an I/O subsystem for managing its I/O devices. Some of the I/O software is device independent, that is, applies to many or all devices equally well. Other parts of it, such as device drivers, are specific to particular I/O devices. In Chap. 5 we will have a look at I/O software.

1.5.5 Files

Another key concept supported by virtually all operating systems is the file system. As noted before, a major function of the operating system is to hide the peculiarities of the disks and other I/O devices and present the programmer with a nice, clean abstract model of device-independent files. System calls are obviously needed to create files, remove files, read files, and write files. Before a file can be read, it must be located on the disk and opened, and after it has been read it should be closed, so calls are provided to do these things.
To provide a place to keep files, most operating systems have the concept of a directory as a way of grouping files together. A student, for example, might have one directory for each course he is taking (for the programs needed for that course), another directory for his electronic mail, and still another directory for his World Wide Web home page. System calls are then needed to create and remove directories. Calls are also provided to put an existing file in a directory, and to remove a file from a directory. Directory entries may be either files or other directories. This model also gives rise to a hierarchy, the file system, as shown in Fig. 1-14.
The process and file hierarchies both are organized as trees, but the similarity stops there. Process hierarchies usually are not very deep (more than three levels is unusual), whereas file hierarchies are commonly four, five, or even more levels deep. Process hierarchies are typically short-lived, generally a few minutes at most, whereas the directory hierarchy may exist for years. Ownership and protection also differ for processes and files. Typically, only a parent process may control or even access a child process, but mechanisms nearly always exist to allow files and directories to be read by a wider group than just the owner.
Every file within the directory hierarchy can be specified by giving its path name from the top of the directory hierarchy, the root directory. Such absolute path names consist of the list of directories that must be traversed from the root directory to get to the file, with slashes separating the components. In Fig. 1-14,

Figure 1-14. A file system for a university department.

the path for file CS101 is /Faculty/Prof.Brown/Courses/CS101. The leading slash indicates that the path is absolute, that is, starting at the root directory. As an aside, in MS-DOS and Windows, the backslash (\) character is used as the separator instead of the slash (/) character, so the file path given above would be written as \Faculty\Prof.Brown\Courses\CS101. Throughout this book we will generally use the UNIX convention for paths.
At every instant, each process has a current working directory, in which path names not beginning with a slash are looked for. As an example, in Fig. 1-14, if /Faculty/Prof.Brown were the working directory, then use of the path name Courses/CS101 would yield the same file as the absolute path name given above. Processes can change their working directory by issuing a system call specifying the new working directory.
Before a file can be read or written, it must be opened, at which time the permissions are checked. If the access is permitted, the system returns a small integer called a file descriptor to use in subsequent operations. If the access is prohibited, an error code is returned.
Another important concept in UNIX is the mounted file system. Nearly all personal computers have one or more floppy disk drives into which floppy disks can be inserted and removed. To provide an elegant way to deal with removable media (including CD-ROMs), UNIX allows the file system on a floppy disk to be attached to the main tree. Consider the situation of Fig. 1-15(a). Before the mount call, the root file system, on the hard disk, and a second file system, on a floppy disk, are separate and unrelated.

However, the file system on the floppy cannot be used, because there is no way to specify path names on it. UNIX does not allow path names to be prefixed by a drive name or number; that would be precisely the kind of device dependence that operating systems ought to eliminate. Instead, the mount system call allows the file system on the floppy to be attached to the root file system wherever the program wants it to be. In Fig. 1-15(b) the file system on the floppy has been mounted on directory b, thus allowing access to files /b/x and /b/y. If directory b had contained any files they would not be accessible while the floppy was mounted, since /b would refer to the root directory of the floppy. (Not being able to access these files is not as serious as it at first seems: file systems are nearly always mounted on empty directories.) If a system contains multiple hard disks, they can all be mounted into a single tree as well.
Another important concept in UNIX is the special file. Special files are provided in order to make I/O devices look like files. That way, they can be read and written using the same system calls as are used for reading and writing files. Two kinds of special files exist: block special files and character special files. Block special files are used to model devices that consist of a collection of randomly addressable blocks, such as disks. By opening a block special file and reading, say, block 4, a program can directly access the fourth block on the device, without regard to the structure of the file system contained on it. Similarly, character special files are used to model printers, modems, and other devices that accept or output a character stream. By convention, the special files are kept in the /dev directory. For example, /dev/lp might be the line printer.
The last feature we will discuss in this overview is one that relates to both processes and files: pipes. A pipe is a sort of pseudofile that can be used to connect two processes, as shown in Fig. 1-16. If processes A and B wish to talk using a pipe, they must set it up in advance. When process A wants to send data to process B, it writes on the pipe as though it were an output file. Process B can read the data by reading from the pipe as though it were an input file. Thus, communication between processes in UNIX looks very much like ordinary file reads and writes. Stronger yet, the only way a process can discover that the output file it is writing on is not really a file, but a pipe, is by making a special system call. File systems are very important. We will have much more to say about them in Chap. 6 and also in Chaps. 10 and 11.

Figure 1-16. Two processes connected by a pipe.
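As an illustration, here is a minimal sketch of two processes talking over a pipe, using only calls already discussed (pipe, fork, read, write); the message text is an arbitrary choice.

#include <unistd.h>

int main(void)
{
    int fd[2];
    char buf[64];

    pipe(fd);                            /* fd[0] is the read end, fd[1] the write end */

    if (fork() != 0) {
        /* Process A: writes on the pipe as though it were an output file. */
        close(fd[0]);
        write(fd[1], "hello via a pipe\n", 17);
        close(fd[1]);
    } else {
        /* Process B: reads from the pipe as though it were an input file. */
        close(fd[1]);
        int n = read(fd[0], buf, sizeof(buf));
        write(1, buf, n);                /* echo what arrived to standard output */
        close(fd[0]);
    }
    return 0;
}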

1.5.6 Security
Computers contain large amounts of information that users often want to keep confidential. This information may include electronic mail, business plans, tax returns, and much more. It is up to the operating system to manage the system security so that files, for example, are only accessible to authorized users.
As a simple example, just to get an idea of how security can work, consider UNIX. Files in UNIX are protected by assigning each one a 9-bit binary protection code. The protection code consists of three 3-bit fields, one for the owner, one for other members of the owner's group (users are divided into groups by the system administrator), and one for everyone else. Each field has a bit for read access, a bit for write access, and a bit for execute access. These 3 bits are known as the rwx bits. For example, the protection code rwxr-x--x means that the owner can read, write, or execute the file, other group members can read or execute (but not write) the file, and everyone else can execute (but not read or write) the file. For a directory, x indicates search permission. A dash means that the corresponding permission is absent.
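For readers who want to see the bits directly, the following sketch (the file name somefile is made up) shows one way a program might fetch a file's mode with stat and print its nine rwx bits in the notation used above.

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;
    const char perm[] = "rwxrwxrwx";     /* owner, group, others */
    char out[10] = "---------";

    if (stat("somefile", &st) < 0)       /* "somefile" is a made-up name */
        return 1;

    /* Walk the nine permission bits, from the owner's r bit downward. */
    for (int i = 0; i < 9; i++)
        if (st.st_mode & (0400 >> i))
            out[i] = perm[i];

    printf("%s\n", out);                 /* e.g., rwxr-x--x */
    return 0;
}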
In addition to file protection, there are many other security issues. Protecting the system from unwanted intruders, both human and nonhuman (e.g., viruses), is one of them. We will look at various security issues in Chap. 9.

1.5.7 The Shell

The operating system is the code that carries out the system calls. Editors, compilers, assemblers, linkers, and command interpreters definitely are not part of the operating system, even though they are important and useful. At the risk of confusing things somewhat, in this section we will look briefly at the UNIX command interpreter, called the shell. Although it is not part of the operating system, it makes heavy use of many operating system features and thus serves as a good example of how the system calls can be used. It is also the primary interface between a user sitting at his terminal and the operating system, unless the user is using a graphical user interface. Many shells exist, including sh, csh, ksh, and bash. All of them support the functionality described below, which derives from the original shell (sh).
When any user logs in, a shell is started up. The shell has the terminal as standard input and standard output. It starts out by typing the prompt, a character such as a dollar sign, which tells the user that the shell is waiting to accept a command. If the user now types

date

for example, the shell creates a child process and runs the date program as the child. While the child process is running, the shell waits for it to terminate. When the child finishes, the shell types the prompt again and tries to read the next input line.
The user can specify that standard output be redirected to a file, for example,

date >file

Similarly, standard input can be redirected, as in

sort <file1 >file2

which invokes the sort program with input taken from file1 and output sent to file2.
The output of one program can be used as the input for another program by connecting them with a pipe. Thus

cat file1 file2 file3 | sort >/dev/lp

invokes the cat program to concatenate three files and send the output to sort to arrange all the lines in alphabetical order. The output of sort is redirected to the file /dev/lp, typically the printer.
If a user puts an ampersand after a command, the shell does not wait for it to complete. Instead it just gives a prompt immediately. Consequently,

cat file1 file2 file3 | sort >/dev/lp &

starts up the sort as a background job, allowing the user to continue working normally while the sort is going on. The shell has a number of other interesting features, which we do not have space to discuss here. Most books on UNIX discuss the shell at some length (e.g., Kernighan and Pike, 1984; Kochan and Wood, 1990; Medinets, 1999; Newham and Rosenblatt, 1998; and Robbins, 1999).

1.5.8 Recycling of Concepts

Computer science, like many fields, is largely technology driven. The reason the ancient Romans lacked cars is not that they liked walking so much. It is because they did not know how to build cars. Personal computers exist not because millions of people had some long pent-up desire to own a computer, but because it is now possible to manufacture them cheaply. We often forget how much technology affects our view of systems and it is worth reflecting on this point from time to time.
In particular, it frequently happens that a change in technology renders some idea obsolete and it quickly vanishes. However, another change in technology could revive it again. This is especially true when the change has to do with the relative performance of different parts of the system. For example, when CPUs became much faster than memories, caches became important to speed up the "slow" memory. If new memory technology some day makes memories much faster than CPUs, caches will vanish. And if a new CPU technology makes them faster than memories again, caches will reappear. In biology, extinction is forever, but in computer science, it is sometimes only for a few years.
As a consequence of this impermanence, in this book we will from time to time look at "obsolete" concepts, that is, ideas that are not optimal with current technology. However, changes in the technology may bring back some of the so-called "obsolete concepts." For this reason, it is important to understand why a concept is obsolete and what changes in the environment might bring it back again.
To make this point clearer, let us consider a few examples. Early computers had hardwired instruction sets. The instructions were executed directly by hardware and could not be changed. Then came microprogramming, in which an underlying interpreter carried out the instructions in software. Hardwired execution became obsolete. Then RISC computers were invented, and microprogramming (i.e., interpreted execution) became obsolete because direct execution was faster. Now we are seeing the resurgence of interpretation in the form of Java applets that are sent over the Internet and interpreted upon arrival. Execution speed is not always crucial because network delays are so great that they tend to dominate. But that could change, too, some day.
Early operating systems allocated files on the disk by just placing them in contiguous sectors, one after another. Although this scheme was easy to implement, it was not flexible because when a file grew, there was not enough room to store it any more. Thus the concept of contiguously allocated files was discarded as obsolete. Until CD-ROMs came around. There the problem of growing files did not exist. All of a sudden, the simplicity of contiguous file allocation was seen as a great idea and CD-ROM file systems are now based on it.
As our final example, consider dynamic linking. The MULTICS system was designed to run day and night without ever stopping. To fix bugs in software, it was necessary to have a way to replace library procedures while they were being used. The concept of dynamic linking was invented for this purpose. After MULTICS died, the concept was forgotten for a while. However, it was rediscovered when modern operating systems needed a way to allow many programs to share the same library procedures without having their own private copies (because graphics libraries had grown so large). Most systems now support some form of dynamic linking once again. The list goes on, but these examples should make the point: an idea that is obsolete today may be the star of the party tomorrow.
Technology is not the only factor that drives systems and software. Economics plays a big role too. In the 1960s and 1970s, most terminals were mechanical printing terminals or 25 x 80 character-oriented CRTs rather than bitmap graphics terminals. This choice was not a question of technology. Bit-map graphics terminals were in use before 1960. It is just that they cost many tens of thousands of dollars each. Only when the price came down enormously could people (other than the military) think of dedicating one terminal to an individual user.

1.6 SYSTEM CALLS


The interface between the operating system and the user programs is defined by the set of system calls that the operating system provides. To really understand what operating systems do, we must examine this interface closely. The system calls available in the interface vary from operating system to operating system (although the underlying concepts tend to be similar).
We are thus forced to make a choice between (1) vague generalities ("operating systems have system calls for reading files") and (2) some specific system ("UNIX has a read system call with three parameters: one to specify the file, one to tell where the data are to be put, and one to tell how many bytes to read").
We have chosen the latter approach. It's more work that way, but it gives more insight into what operating systems really do. Although this discussion specifically refers to POSIX (International Standard 9945-1), hence also to UNIX, System V, BSD, Linux, MINIX, etc., most other modern operating systems have system calls that perform the same functions, even if the details differ. Since the actual mechanics of issuing a system call are highly machine dependent and often must be expressed in assembly code, a procedure library is provided to make it possible to make system calls from C programs and often from other languages as well.
It is useful to keep the following in mind. Any single-CPU computer can execute only one instruction at a time. If a process is running a user program in user mode and needs a system service, such as reading data from a file, it has to execute a trap or system call instruction to transfer control to the operating system. The operating system then figures out what the calling process wants by inspecting the parameters. Then it carries out the system call and returns control to the instruction following the system call. In a sense, making a system call is like making a special kind of procedure call, only system calls enter the kernel and procedure calls do not.
To make the system call mechanism clearer, let us take a quick look at the read system call. As mentioned above, it has three parameters: the first one specifying the file, the second one pointing to the buffer, and the third one giving the number of bytes to read. Like nearly all system calls, it is invoked from C programs by calling a library procedure with the same name as the system call: read. A call from a C program might look like this:

count = read(fd, buffer, nbytes);

The system call (and the library procedure) return the number of bytes actually read in count. This value is normally the same as nbytes, but may be smaller, if, for example, end-of-file is encountered while reading.
If the system call cannot be carried out, either due to an invalid parameter or a disk error, count is set to -1, and the error number is put in a global variable, errno. Programs should always check the results of a system call to see if an error occurred.
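A minimal sketch of such a check, reading from standard input and reporting errno on failure (the buffer size is an arbitrary choice):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buffer[128];
    int count;

    count = read(0, buffer, sizeof(buffer));      /* fd 0 is standard input */

    if (count < 0)                                /* -1 means the call failed */
        printf("read failed: %s\n", strerror(errno));
    else
        printf("read %d bytes\n", count);         /* may be less than requested */
    return 0;
}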
System calls are performed in a series of steps. To make this concept clearer, let us examine the read call discussed above. In preparation for calling the read library procedure, which actually makes the read system call, the calling program first pushes the parameters onto the stack, as shown in steps 1-3 in Fig. 1-17. C and C++ compilers push the parameters onto the stack in reverse order for historical reasons (having to do with making the first parameter to printf, the format string, appear on top of the stack). The first and third parameters are called by value, but the second parameter is passed by reference, meaning that the address of the buffer (indicated by &) is passed, not the contents of the buffer. Then comes the actual call to the library procedure (step 4). This instruction is the normal procedure call instruction used to call all procedures.
The library procedure, possibly written in assembly language, typically puts the system call number in a place where the operating system expects it, such as a register (step 5). Then it executes a TRAP instruction to switch from user mode to kernel mode and start execution at a fixed address within the kernel (step 6). The kernel code that starts examines the system call number and then dispatches to the correct system call handler, usually via a table of pointers to system call handlers indexed on system call number (step 7). At that point the system call handler runs (step 8). Once the system call handler has completed its work, control may be returned to the user-space library procedure at the instruction following the TRAP instruction (step 9). This procedure then returns to the user program in the usual way procedure calls return (step 10).
To finish the job, the user program has to clean up the stack, as it does after any procedure call (step 11). Assuming the stack grows downward, as it often does, the compiled code increments the stack pointer exactly enough to remove the parameters pushed before the call to read. The program is now free to do whatever it wants to do next.

Figure 1-17. The 11 steps in making the system call read(fd, buffer, nbytes).
In step 9 above, we said "may be returned to the user-space library procedure ..." for good reason. The system call may block the caller, preventing it from continuing. For example, if it is trying to read from the keyboard and nothing has been typed yet, the caller has to be blocked. In this case, the operating system will look around to see if some other process can be run next. Later, when the desired input is available, this process will get the attention of the system and steps 9-11 will occur.
In the following sections, we will examine some of the most heavily used POSIX system calls, or more specifically, the library procedures that make those system calls. POSIX has about 100 procedure calls. Some of the most important ones are listed in Fig. 1-18, grouped for convenience in four categories. In the text we will briefly examine each call to see what it does. To a large extent, the services offered by these calls determine most of what the operating system has to do, since the resource management on personal computers is minimal (at least compared to big machines with multiple users). The services include things like creating and terminating processes, creating, deleting, reading, and writing files, managing directories, and performing input and output.
Process management

    Call                                     Description
    pid = fork()                             Create a child process identical to the parent
    pid = waitpid(pid, &statloc, options)    Wait for a child to terminate
    s = execve(name, argv, environp)         Replace a process' core image
    exit(status)                             Terminate process execution and return status

File management

    Call                                     Description
    fd = open(file, how, ...)                Open a file for reading, writing, or both
    s = close(fd)                            Close an open file
    n = read(fd, buffer, nbytes)             Read data from a file into a buffer
    n = write(fd, buffer, nbytes)            Write data from a buffer into a file
    position = lseek(fd, offset, whence)     Move the file pointer
    s = stat(name, &buf)                     Get a file's status information

Directory and file system management

    Call                                     Description
    s = mkdir(name, mode)                    Create a new directory
    s = rmdir(name)                          Remove an empty directory
    s = link(name1, name2)                   Create a new entry, name2, pointing to name1
    s = unlink(name)                         Remove a directory entry
    s = mount(special, name, flag)           Mount a file system
    s = umount(special)                      Unmount a file system

Miscellaneous

    Call                                     Description
    s = chdir(dirname)                       Change the working directory
    s = chmod(name, mode)                    Change a file's protection bits
    s = kill(pid, signal)                    Send a signal to a process
    seconds = time(&seconds)                 Get the elapsed time since Jan. 1, 1970

Figure 1-18. Some of the major POSIX system calls. The return code s is -1 if an error has occurred. The return codes are as follows: pid is a process id, fd is a file descriptor, n is a byte count, position is an offset within the file, and seconds is the elapsed time. The parameters are explained in the text.

As an aside, it is worth pointing out that the mapping of POSIX procedure calls onto system calls is not one-to-one. The POSIX standard specifies a number of procedures that a conformant system must supply, but it does not specify whether they are system calls, library calls, or something else. If a procedure can be carried out without invoking a system call (i.e., without trapping to the kernel), it will usually be done in user space for reasons of performance. However, most of the POSIX procedures do invoke system calls, usually with one procedure mapping directly onto one system call. In a few cases, especially where several required procedures are only minor variations of one another, one system call handles more than one library call.

1.6.1 System Calls for Process Management

The first group of calls in Fig. 1-18 deals with process management. Fork is a good place to start the discussion. Fork is the only way to create a new process in UNIX. It creates an exact duplicate of the original process, including all the file descriptors, registers, everything. After the fork, the original process and the copy (the parent and child) go their separate ways. All the variables have identical values at the time of the fork, but since the parent's data are copied to create the child, subsequent changes in one of them do not affect the other one. (The program text, which is unchangeable, is shared between parent and child.) The fork call returns a value, which is zero in the child and equal to the child's process identifier or PID in the parent. Using the returned PID, the two processes can see which one is the parent process and which one is the child process.
In most cases, after a fork, the child will need to execute different code from the parent. Consider the case of the shell. It reads a command from the terminal, forks off a child process, waits for the child to execute the command, and then reads the next command when the child terminates. To wait for the child to finish, the parent executes a waitpid system call, which just waits until the child terminates (any child if more than one exists). Waitpid can wait for a specific child, or for any old child by setting the first parameter to -1. When waitpid completes, the address pointed to by the second parameter, statloc, will be set to the child's exit status (normal or abnormal termination and exit value). Various options are also provided, specified by the third parameter.
Now consider how fork is used by the shell. When a command is typed, the shell forks off a new process. This child process must execute the user command. It does this by using the execve system call, which causes its entire core image to be replaced by the file named in its first parameter. (Actually, the system call itself is exec, but several different library procedures call it with different parameters and slightly different names. We will treat these as system calls here.) A highly simplified shell illustrating the use of fork, waitpid, and execve is shown in Fig. 1-19.
In the most general case, execve has three parameters: the name of the file to be executed, a pointer to the argument array, and a pointer to the environment array. These will be described shortly. Various library routines, including execl, execv, execle, and execve, are provided to allow the parameters to be omitted or specified in various ways. Throughout this book we will use the name exec to represent the system call invoked by all of these.

#define TRUE 1

while (TRUE) {                              /* repeat forever */
    type_prompt();                          /* display prompt on the screen */
    read_command(command, parameters);      /* read input from terminal */

    if (fork() != 0) {                      /* fork off child process */
        /* Parent code. */
        waitpid(-1, &status, 0);            /* wait for child to exit */
    } else {
        /* Child code. */
        execve(command, parameters, 0);     /* execute command */
    }
}

Figure 1-19. A stripped-down shell. Throughout this book, TRUE is assumed to be defined as 1.
Let us consider the case of a command such as

cp file1 file2

used to copy file1 to file2. After the shell has forked, the child process locates and executes the file cp and passes to it the names of the source and target files.
The main program of cp (and main program of most other C programs) contains the declaration

main(argc, argv, envp)

where argc is a count of the number of items on the command line, including the program name. For the example above, argc is 3.
The second parameter, argv, is a pointer to an array. Element i of that array is a pointer to the i-th string on the command line. In our example, argv[0] would point to the string "cp", argv[1] would point to the string "file1", and argv[2] would point to the string "file2".
The third parameter of main, envp, is a pointer to the environment, an array of strings containing assignments of the form name = value used to pass information such as the terminal type and home directory name to a program. In Fig. 1-19, no environment is passed to the child, so the third parameter of execve is a zero.
If exec seems complicated, do not despair; it is (semantically) the most complex of all the POSIX system calls. All the other ones are much simpler. As an example of a simple one, consider exit, which processes should use when they are finished executing. It has one parameter, the exit status (0 to 255), which is returned to the parent via statloc in the waitpid system call.
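The following sketch (not one of the book's figures) shows the round trip: the child passes a status to exit and the parent recovers it from statloc using the standard WIFEXITED and WEXITSTATUS macros; the value 42 is an arbitrary choice.

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int status;

    if (fork() == 0)
        exit(42);                          /* child: exit status in the range 0-255 */

    waitpid(-1, &status, 0);               /* parent: wait for any child */
    if (WIFEXITED(status))                 /* did the child terminate normally? */
        printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}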

Processes in UNIX have their memory divided up into three segments: the text segment (i.e., the program code), the data segment (i.e., the variables), and the stack segment. The data segment grows upward and the stack grows downward, as shown in Fig. 1-20. Between them is a gap of unused address space. The stack grows into the gap automatically, as needed, but expansion of the data segment is done explicitly by using a system call, brk, which specifies the new address where the data segment is to end. This call, however, is not defined by the POSIX standard, since programmers are encouraged to use the malloc library procedure for dynamically allocating storage, and the underlying implementation of malloc was not thought to be a suitable subject for standardization since few programmers use it directly.

Figure 1-20. Processes have three segments: text, data, and stack.

1.6.2 System Calls for File Management

Many system calls relate to the file system. In this section we will look at calls that operate on individual files; in the next one we will examine those that involve directories or the file system as a whole.
To read or write a file, the file must first be opened using open. This call specifies the file name to be opened, either as an absolute path name or relative to the working directory, and a code of O_RDONLY, O_WRONLY, or O_RDWR, meaning open for reading, writing, or both. To create a new file, O_CREAT is used. The file descriptor returned can then be used for reading or writing. Afterward, the file can be closed by close, which makes the file descriptor available for reuse on a subsequent open.
The most heavily used calls are undoubtedly read and write. We saw read earlier. Write has the same parameters.
Although most programs read and write files sequentially, for some applications programs need to be able to access any part of a file at random. Associated with each file is a pointer that indicates the current position in the file. When reading (writing) sequentially, it normally points to the next byte to be read (written). The lseek call changes the value of the position pointer, so that subsequent calls to read or write can begin anywhere in the file.

Lseek has three parameters: the first is the file descriptor for the file, the second is a file position, and the third tells whether the file position is relative to the beginning of the file, the current position, or the end of the file. The value returned by lseek is the absolute position in the file after changing the pointer.
For each file, UNIX keeps track of the file mode (regular file, special file, directory, and so on), size, time of last modification, and other information. Programs can ask to see this information via the stat system call. The first parameter specifies the file to be inspected; the second one is a pointer to a structure where the information is to be put.
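Putting these calls together, a small sketch of random access might look like this; the file name data.bin and the offset 1024 are made-up values chosen only for illustration.

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    char buf[100];
    struct stat st;

    int fd = open("data.bin", O_RDONLY);   /* "data.bin" is a made-up name */
    if (fd < 0)
        return 1;

    lseek(fd, 1024, SEEK_SET);             /* move the pointer 1024 bytes from the start */
    int n = read(fd, buf, sizeof(buf));    /* the read begins at the new position */

    stat("data.bin", &st);                 /* fetch the file's status information */
    printf("read %d bytes; file size is %ld bytes\n", n, (long) st.st_size);

    close(fd);
    return 0;
}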

1.6.3 System Calls for Directory Management


In this section we will look at some system calls that relate more to directories or the file system as a whole, rather than just to one specific file as in the previous section. The first two calls, mkdir and rmdir, create and remove empty directories, respectively. The next call is link. Its purpose is to allow the same file to appear under two or more names, often in different directories. A typical use is to allow several members of the same programming team to share a common file, with each of them having the file appear in his own directory, possibly under different names. Sharing a file is not the same as giving every team member a private copy, because having a shared file means that changes that any member of the team makes are instantly visible to the other members; there is only one file. When copies are made of a file, subsequent changes made to one copy do not affect the other ones.
To see how link works, consider the situation of Fig. 1-21(a). Here are two users, ast and jim, each having their own directories with some files. If ast now executes a program containing the system call

link("/usr/jim/memo", "/usr/ast/note");

the file memo in jim's directory is now entered into ast's directory under the name note. Thereafter, /usr/jim/memo and /usr/ast/note refer to the same file. As an aside, whether user directories are kept in /usr, /user, /home, or somewhere else is simply a decision made by the local system administrator.
Understanding how link works will probably make it clearer what it does. Every file in UNIX has a unique number, its i-number, that identifies it. This i-number is an index into a table of i-nodes, one per file, telling who owns the file, where its disk blocks are, and so on. A directory is simply a file containing a set of (i-number, ASCII name) pairs. In the first versions of UNIX, each directory entry was 16 bytes: 2 bytes for the i-number and 14 bytes for the name. Now a more complicated structure is needed to support long file names, but conceptually a directory is still a set of (i-number, ASCII name) pairs. In Fig. 1-21, for example, one of the files has i-number 16, and so on. What link does is simply create a new directory entry with a (possibly new) name, using the i-number of an existing file. In Fig. 1-21(b), two entries have the same i-number (70) and thus refer to the same file. If either one is later removed, using the unlink system call, the other one remains. If both are removed, UNIX sees that no entries to the file exist (a field in the i-node keeps track of the number of directory entries pointing to the file), so the file is removed from the disk.
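A short sketch of this behavior, reusing the /usr/jim/memo and /usr/ast/note names from the example above (it assumes the caller has permission to write in both directories):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Give the existing file a second name; both directory entries
       now carry the same i-number. */
    if (link("/usr/jim/memo", "/usr/ast/note") < 0) {
        perror("link");
        return 1;
    }

    /* Removing one entry leaves the other (and the file) intact; the file
       itself disappears only when its last directory entry is removed. */
    unlink("/usr/jim/memo");
    return 0;
}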
As we have mentioned earlier, the mount system call allows two file systems to be merged into one. A common situation is to have the root file system, containing the binary (executable) versions of the common commands and other heavily used files, on a hard disk. The user can then insert a floppy disk with files to be read into the floppy disk drive.
By executing the mount system call, the floppy disk file system can be attached to the root file system, as shown in Fig. 1-22. A typical statement in C to perform the mount is

mount("/dev/fd0", "/mnt", 0);

where the first parameter is the name of a block special file for drive 0, the second parameter is the place in the tree where it is to be mounted, and the third parameter tells whether the file system is to be mounted read-write or read-only.

Figure 1-22. (a) File system before the mount. (b) File system after the mount.

After the mount call, a file on drive 0 can be accessed by just using its path from the root directory or the working directory, without regard to which drive it is on. In fact, second, third, and fourth drives can also be mounted anywhere in the tree. The mount call makes it possible to integrate removable media into a single integrated file hierarchy, without having to worry about which device a file is on. Although this example involves floppy disks, hard disks or portions of hard disks (often called partitions or minor devices) can also be mounted this way. When a file system is no longer needed, it can be unmounted with the umount system call.

1.6.4 Miscellaneous System Calls

A variety of other system calls exist as well. We will look at just four of them here. The chdir call changes the current working directory. After the call

chdir("/usr/ast/test");

an open on the file xyz will open /usr/ast/test/xyz. The concept of a working directory eliminates the need for typing (long) absolute path names all the time.
In UNIX every file has a mode used for protection. The mode includes the read-write-execute bits for the owner, group, and others. The chmod system call makes it possible to change the mode of a file. For example, to make a file read-only by everyone except the owner, one could execute

chmod("file", 0644);
The kill system call is the way users and user processes send signals. If a process is prepared to catch a particular signal, then when it arrives, a signal handler is run. If the process is not prepared to handle a signal, then its arrival kills the process (hence the name of the call).
POSIX defines several procedures for dealing with time. For example, time just returns the current time in seconds, with 0 corresponding to Jan. 1, 1970 at midnight (just as the day was starting, not ending). On computers with 32-bit words, the maximum value time can return is 2^32 - 1 seconds (assuming an unsigned integer is used). This value corresponds to a little over 136 years. Thus in the year 2106, 32-bit UNIX systems will go berserk, imitating the famous Y2K problem. If you currently have a 32-bit UNIX system, you are advised to trade it in for a 64-bit one sometime before the year 2106.

1.6.5 The Windows Win32 API

So far we have focused primarily on UNIX. Now it is time to look briefly at Windows. Windows and UNIX differ in a fundamental way in their respective programming models. A UNIX program consists of code that does something or other, making system calls to have certain services performed. In contrast, a Windows program is normally event driven. The main program waits for some event to happen, then calls a procedure to handle it. Typical events are keys being struck, the mouse being moved, a mouse button being pushed, or a floppy disk inserted. Handlers are then called to process the event, update the screen, and update the internal program state. All in all, this leads to a somewhat different style of programming than with UNIX, but since the focus of this book is on operating system function and structure, these different programming models will not concern us much more.
Of course, Windows also has system calls. With UNIX, there is almost a 1-to-1 relationship between the system calls (e.g., read) and the library procedures (e.g., read) used to invoke the system calls. In other words, for each system call, there is roughly one library procedure that is called to invoke it, as indicated in Fig. 1-17. Furthermore, POSIX has only about 100 procedure calls.
With Windows, the situation is radically different. To start with, the library calls and the actual system calls are highly decoupled. Microsoft has defined a set of procedures, called the Win32 API (Application Program Interface), that programmers are expected to use to get operating system services. This interface is (partially) supported on all versions of Windows since Windows 95. By decoupling the interface from the actual system calls, Microsoft retains the ability to change the actual system calls in time (even from release to release) without invalidating existing programs. What actually constitutes Win32 is also slightly ambiguous since Windows 2000 has many new calls that were not previously available. In this section, Win32 means the interface supported by all versions of Windows.
The number of Win32 API calls is extremely large, numbering in the thousands. Furthermore, while many of them do invoke system calls, a substantial number are carried out entirely in user space. As a consequence, with Windows it is impossible to see what is a system call (i.e., performed by the kernel) and what is simply a user-space library call. In fact, what is a system call in one version of Windows may be done in user space in a different version, and vice versa. When we discuss the Windows system calls in this book, we will use the Win32 procedures (where appropriate) since Microsoft guarantees that these will be stable over time. But it is worth remembering that not all of them are true system calls (i.e., traps to the kernel).
Another complication is that in UNIX, the GUI (e.g., X Windows and Motif) runs entirely in user space, so the only system calls needed for writing on the screen are write and a few other minor ones. Of course, there are a large number of calls to X Windows and the GUI, but these are not system calls in any sense.
In contrast, the Win32 API has a huge number of calls for managing windows, geometric figures, text, fonts, scrollbars, dialog boxes, menus, and other features of the GUI. To the extent that the graphics subsystem runs in the kernel (true on some versions of Windows but not on all), these are system calls; otherwise they are just library calls. Should we discuss these calls in this book or not? Since they are not really related to the function of an operating system, we have decided not to, even though they may be carried out by the kernel. Readers interested in the Win32 API should consult one of the many books on the subject, for example (Hart, 1997; Rector and Newcomer, 1997; and Simon, 1997).
    UNIX        Win32                   Description
    fork        CreateProcess           Create a new process
    waitpid     WaitForSingleObject     Can wait for a process to exit
    execve      (none)                  CreateProcess = fork + execve
    exit        ExitProcess             Terminate execution
    open        CreateFile              Create a file or open an existing file
    close       CloseHandle             Close a file
    read        ReadFile                Read data from a file
    write       WriteFile               Write data to a file
    lseek       SetFilePointer          Move the file pointer
    stat        GetFileAttributesEx     Get various file attributes
    mkdir       CreateDirectory         Create a new directory
    rmdir       RemoveDirectory         Remove an empty directory
    link        (none)                  Win32 does not support links
    unlink      DeleteFile              Destroy an existing file
    mount       (none)                  Win32 does not support mount
    umount      (none)                  Win32 does not support mount
    chdir       SetCurrentDirectory     Change the current working directory
    chmod       (none)                  Win32 does not support security (although NT does)
    kill        (none)                  Win32 does not support signals
    time        GetLocalTime            Get the current time

Figure 1-23. The Win32 API calls that roughly correspond to the UNIX calls of Fig. 1-18.

Let us now briefly go through the list of Fig. 1-23. CreateProcess creates a new process. It does the combined work of fork and execve in UNIX. It has many parameters specifying the properties of the newly created process. Windows does not have a process hierarchy as UNIX does, so there is no concept of a parent process and a child process. After a process is created, the creator and createe are equals. WaitForSingleObject is used to wait for an event. Many possible events can be waited for. If the parameter specifies a process, then the caller waits for the specified process to exit, which is done using ExitProcess.
The next six calls operate on files and are functionally similar to their UNIX counterparts, although they differ in the parameters and details. Still, files can be opened, closed, read, and written pretty much as in UNIX. The SetFilePointer and GetFileAttributesEx calls set the file position and get some of the file attributes.
Windows has directories, and they are created and removed with CreateDirectory and RemoveDirectory, respectively. There is also a notion of a current directory, set by SetCurrentDirectory. The current time is acquired using GetLocalTime.
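To give the flavor of the Win32 side, here is a minimal sketch of the analog of open, read, and close; the file name data.txt and the particular flag choices are illustrative assumptions, not the only possibilities.

#include <stdio.h>
#include <windows.h>

int main(void)
{
    char buffer[128];
    DWORD count;

    /* Win32 analog of open(); "data.txt" is a made-up name. */
    HANDLE h = CreateFileA("data.txt", GENERIC_READ, FILE_SHARE_READ,
                           NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    ReadFile(h, buffer, sizeof(buffer), &count, NULL);   /* analog of read() */
    printf("read %lu bytes\n", (unsigned long) count);

    CloseHandle(h);                                      /* analog of close() */
    return 0;
}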
The Win32 interface does not have links to files, mounted file systems, security, or signals, so the calls corresponding to the UNIX ones do not exist. Of course, Win32 has a huge number of other calls that UNIX does not have, especially for managing the GUI. And Windows 2000 has an elaborate security system and also supports file links.
One last note about Win32 is perhaps worth making. Win32 is not a terribly uniform or consistent interface. The main culprit here was the need to be backward compatible with the previous 16-bit interface used in Windows 3.x.

1.7 OPERATING SYSTEM STRUCTURE


Now that we have seen what operating systems look like on the outside (i.e., the programmer's interface), it is time to take a look inside. In the following sections, we will examine five different structures that have been tried, in order to get some idea of the spectrum of possibilities. These are by no means exhaustive, but they give an idea of some designs that have been tried in practice. The five designs are monolithic systems, layered systems, virtual machines, exokernels, and client-server systems.

1.7.1 Monolithic Systems

By far the most common organization, this approach might well be subtitled "The Big Mess." The structure is that there is no structure. The operating system is written as a collection of procedures, each of which can call any of the other ones whenever it needs to. When this technique is used, each procedure in the system has a well-defined interface in terms of parameters and results, and each one is free to call any other one, if the latter provides some useful computation that the former needs.
To construct the actual object program of the operating system when this approach is used, one first compiles all the individual procedures, or files containing the procedures, and then binds them all together into a single object file using the system linker. In terms of information hiding, there is essentially none; every procedure is visible to every other procedure (as opposed to a structure containing modules or packages, in which much of the information is hidden away inside modules, and only the officially designated entry points can be called from outside the module).
Even in monolithic systems, however, it is possible to have at least a little structure. The services (system calls) provided by the operating system are requested by putting the parameters in a well-defined place (e.g., on the stack) and

then executing a trap instruction. This instruction switches the machine from user mode to kernel mode and transfers control to the operating system, shown as step 6 in Fig. 1-17. The operating system then fetches the parameters and determines which system call is to be carried out. After that, it indexes into a table that contains in slot k a pointer to the procedure that carries out system call k (step 7 in Fig. 1-17).
This organization suggests a basic structure for the operating system:
1. A main program that invokes the requested service procedure.
2. A set of service procedures that carry out the system calls.
3. A set of utility procedures that help the service procedures.
In this model, for each system call there is one service procedure that takes care of it. The utility procedures do things that are needed by several service procedures, such as fetching data from user programs. This division of the procedures into three layers is shown in Fig. 1-24.

Figure 1-24. A simple structuring model for a monolithic system.
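To make the three-layer idea concrete, here is a small sketch (all names invented for illustration) of how the main program might dispatch through a table of pointers to service procedures indexed by system call number.

/* Hypothetical sketch of a monolithic kernel's dispatch mechanism. */
#define NCALLS 3

static int sys_read(long a, long b, long c)  { /* service procedure */ return 0; }
static int sys_write(long a, long b, long c) { /* service procedure */ return 0; }
static int sys_open(long a, long b, long c)  { /* service procedure */ return 0; }

/* Slot k holds a pointer to the procedure that carries out system call k. */
static int (*call_table[NCALLS])(long, long, long) = { sys_read, sys_write, sys_open };

/* Main program: runs after the trap, fetches the parameters, and
   invokes the requested service procedure. */
int dispatch(int callno, long p1, long p2, long p3)
{
    if (callno < 0 || callno >= NCALLS)
        return -1;                        /* unknown system call */
    return call_table[callno](p1, p2, p3);
}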

1.7.2 Layered Systems

A generalization of the approach of Fig. 1-24 is to organize the operating system as a hierarchy of layers, each one constructed upon the one below it. The first system constructed in this way was the THE system built at the Technische Hogeschool Eindhoven in the Netherlands by E. W. Dijkstra (1968) and his students. The THE system was a simple batch system for a Dutch computer, the Electrologica X8, which had 32K of 27-bit words (bits were expensive back then).
The system had 6 layers, as shown in Fig. 1-25. Layer 0 dealt with allocation of the processor, switching between processes when interrupts occurred or timers expired. Above layer 0, the system consisted of sequential processes, each of
which could be programmed without having to worry about the fact that multiple processes were running on a single processor. In other words, layer 0 provided the basic multiprogramming of the CPU.

Figure 1-25. Structure of the THE operating system.


Layer 1 did the memory management. It allocated space for processes in main memory and on a 512K word drum used for holding parts of processes (pages) for which there was no room in main memory. Above layer 1, processes did not have to worry about whether they were in memory or on the drum; the layer 1 software took care of making sure pages were brought into memory whenever they were needed.
Layer 2 handled communication between each process and the operator console. Above this layer each process effectively had its own operator console. Layer 3 took care of managing the I/O devices and buffering the information streams to and from them. Above layer 3 each process could deal with abstract I/O devices with nice properties, instead of real devices with many peculiarities. Layer 4 was where the user programs were found. They did not have to worry about process, memory, console, or I/O management. The system operator process was located in layer 5.
A further generalization of the layering concept was present in the MULTICS system. Instead of layers, MULTICS was described as having a series of concentric rings, with the inner ones being more privileged than the outer ones (which is effectively the same thing). When a procedure in an outer ring wanted to call a procedure in an inner ring, it had to make the equivalent of a system call, that is, a TRAP instruction whose parameters were carefully checked for validity before allowing the call to proceed. Although the entire operating system was part of the address space of each user process in MULTICS, the hardware made it possible to designate individual procedures (memory segments, actually) as protected against reading, writing, or executing.
Whereas the THE layering scheme was really only a design aid, because all the parts of the system were ultimately linked together into a single object program, in MULTICS, the ring mechanism was very much present at run time and enforced by the hardware. The advantage of the ring mechanism is that it can easily be extended to structure user subsystems. For example, a professor could

write a program to test and grade student programs and run this program in ring n,
with the student programs running in ring n + 1 so that they could not change their
grades.

1.7.3 Virtual Machines


The initial releases of OS/360 were strictly batch systems. Nevertheless, many 360 users wanted to have timesharing, so various groups, both inside and outside IBM, decided to write timesharing systems for it. The official IBM timesharing system, TSS/360, was delivered late, and when it finally arrived it was so big and slow that few sites converted to it. It was eventually abandoned after its development had consumed some $50 million (Graham, 1970). But a group at IBM's Scientific Center in Cambridge, Massachusetts, produced a radically different system that IBM eventually accepted as a product, and which is now widely used on its remaining mainframes.
This system, originally called CP/CMS and later renamed VM/370 (Seawright and MacKinnon, 1979), was based on an astute observation: a timesharing system provides (1) multiprogramming and (2) an extended machine with a more convenient interface than the bare hardware. The essence of VM/370 is to completely separate these two functions.
The heart of the system, known as the virtual machine monitor, runs on the bare hardware and does the multiprogramming, providing not one, but several virtual machines to the next layer up, as shown in Fig. 1-26. However, unlike all other operating systems, these virtual machines are not extended machines, with files and other nice features. Instead, they are exact copies of the bare hardware, including kernel/user mode, I/O, interrupts, and everything else the real machine has.

Figure 1-26. The structure of VM/370 with CMS.
Because each virtual machine is identical to the true hardware, each one can run any operating system that will run directly on the bare hardware. Different virtual machines can, and frequently do, run different operating systems. Some run one of the descendants of OS/360 for batch or transaction processing, while other ones run a single-user, interactive system called CMS (Conversational Monitor System) for interactive timesharing users.
When a CMS program executes a system call, the call is trapped to the operating system in its own virtual machine, not to VM/370, just as it would if it were running on a real machine instead of a virtual one. CMS then issues the normal hardware I/O instructions for reading its virtual disk or whatever is needed to carry out the call. These I/O instructions are trapped by VM/370, which then performs them as part of its simulation of the real hardware. By completely separating the functions of multiprogramming and providing an extended machine, each of the pieces can be much simpler, more flexible, and easier to maintain.
The idea of a virtual machine is heavily used nowadays in a different context: running old MS-DOS programs on a Pentium (or other 32-bit Intel CPU). When designing the Pentium and its software, both Intel and Microsoft realized that there would be a big demand for running old software on new hardware. For this reason, Intel provided a virtual 8086 mode on the Pentium. In this mode, the machine acts like an 8086 (which is identical to an 8088 from a software point of view), including 16-bit addressing with a 1-MB limit.
This mode is used by Windows and other operating systems for running MS-DOS programs. These programs are started up in virtual 8086 mode. As long as they execute normal instructions, they run on the bare hardware. However, when a program tries to trap to the operating system to make a system call, or tries to do protected I/O directly, a trap to the virtual machine monitor occurs.
T w u variants on this design are possible. In the first one, MS-DOS itself is
loaded into the virtual 8086's address spacu, so the virtual machine monitor just
reflects the trap back to MS-DOS, just as would happen on a real 8086. When
MS-DOS later tries to do the 1/0 itself. that uperation is caught and carried out by
the virtual machine monitor,
In ~ h other
c variant, the virtual machine monitor just cat.ches rhc first trnp and
dues the 110 itself, since it knows what all the MS-DOS system calls a r r and thus
knows what each trap i s supposcd to do. This v m i a n ~is less pure that] thc first
om, since i t only cmulatcs MS-DOS correctly. and not other operaling systcms, as
thc first one does. On the other hand, it is much fdster. since it saves thc trouble
of starting up MS-DOS €0do the I D . A further disadvantage of acriially nlnning
MS-DOS in virtual 8086 mode is that M S - W S fiddles art~undwith thc interrupt
enablddisable bit quite a lot. all of which must be einulatcd at considel-ahle cnst.
1t i s worth noting that neither of these approaches are redly thc same as
VM1370. since the machine being emi~latedi s nor a full Pentium. bur only an 8086.
With the VM1370 system, it i s possible to run VM1370, itself. in rhc virtual
machinc. With the Pentiurn, ir is not possible to run, say. Windows in the virtual
8086 because no version of Windows runs on an 8086: a 286 is rhe minimum for
even the oldest versiun, and 286 emulation is not provided (let alone Pcntiurn
emulation). However. by modifying the Windows binary slightly. this emulation
is possible and even available in commercial products.
Another area where virtual machines are used, but in a somewhat different
way, is for running Java programs. When Sun Microsystems invented the Java

programming language, it also invented a virtual machine (i.e., a computer architecture)
called the JVM (Java Virtual Machine). The Java compiler produces
code for the JVM, which then typically is executed by a software JVM interpreter.
The advantage of this approach is that the JVM code can be shipped over the
Internet to any computer that has a JVM interpreter and run there. If the compiler
had produced SPARC or Pentium binary programs, for example, they could not
have been shipped and run anywhere as easily. (Of course, Sun could have produced
a compiler that produced SPARC binaries and then distributed a SPARC
interpreter, but the JVM is a much simpler architecture to interpret.) Another
advantage of using the JVM is that if the interpreter is implemented properly,
which is not completely trivial, incoming JVM programs can be checked for
safety and then executed in a protected environment so they cannot steal data or
do any damage.

1.7.4 Exokernels

With VM/370, each user process gets an exact copy of the actual computer.
With virtual 8086 mode on the Pentium, each user process gets an exact copy of a
different computer. Going one step further, researchers at M.I.T. have built a system
that gives each user a clone of the actual computer, but with a subset of the
resources (Engler et al., 1995). Thus one virtual machine might get disk blocks 0
to 1023, the next one might get blocks 1024 to 2047, and so on.

At the bottom layer, running in kernel mode, is a program called the exokernel.
Its job is to allocate resources to virtual machines and then check attempts to
use them to make sure no machine is trying to use somebody else's resources.
Each user-level virtual machine can run its own operating system, as on VM/370
and the Pentium virtual 8086s, except that each one is restricted to using only the
resources it has asked for and been allocated.

The advantage of the exokernel scheme is that it saves a layer of mapping. In
the other designs, each virtual machine thinks it has its own disk, with blocks running
from 0 to some maximum, so the virtual machine monitor must maintain
tables to remap disk addresses (and all other resources). With the exokernel, this
remapping is not needed. The exokernel need only keep track of which virtual
machine has been assigned which resource. This method still has the advantage
of separating the multiprogramming (in the exokernel) from the user operating
system code (in user space), but with less overhead, since all the exokernel has to
do is keep the virtual machines out of each other's hair.
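The difference between remapping and mere ownership checking can be shown
with a toy sketch. The data structures and numbers below are invented for the
illustration and do not correspond to VM/370 or to any real exokernel.

#include <stdio.h>

#define NR_VMS    16
#define NR_BLOCKS 4096

/* Hypothetical bookkeeping for the two designs discussed above. */
int remap_table[NR_VMS][NR_BLOCKS];   /* virtual machine monitor: per-VM virtual-to-real map   */
int owner[NR_BLOCKS];                 /* exokernel: which VM owns each real block              */

int vmm_translate(int vm, int virtual_block)
{
    return remap_table[vm][virtual_block];    /* the extra layer of mapping */
}

int exokernel_check(int vm, int real_block)
{
    return owner[real_block] == vm;           /* just an ownership check */
}

int main(void)
{
    remap_table[1][0] = 1024;   /* VM 1's virtual block 0 lives at real block 1024 */
    owner[1024] = 1;            /* the exokernel records only who owns block 1024  */
    printf("VMM maps (vm 1, block 0) to real block %d\n", vmm_translate(1, 0));
    printf("exokernel: may vm 1 use block 1024? %d\n", exokernel_check(1, 1024));
    return 0;
}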

1.7.5 Client-Server Model

VM/370 gains much in simplicity by moving a large part of the traditional
operating system code (implementing the extended machine) into a higher layer,
CMS. Nevertheless, VM/370 itself is still a complex program because simulating a
number of virtual 370s in their entirety is not that simple (especially if you want
to do it reasonably efficiently).

A trend in modern operating systems is to take the idea of moving code up
into higher layers even further and remove as much as possible from kernel mode,
leaving a minimal microkernel. The usual approach is to implement most of the
operating system in user processes. To request a service, such as reading a block
of a file, a user process (now known as the client process) sends the request to a
server process, which then does the work and sends back the answer.
[Figure 1-27. The client-server model. Client processes and servers (process server, terminal server, file server) run in user mode on top of the microkernel, which runs in kernel mode; a client obtains service by sending messages to server processes.]


In this model, shown in Fig. 1-27, all the kernel does is handle the communication
between clients and servers. By splitting the operating system up into
parts, each of which only handles one facet of the system, such as file service,
process service, terminal service, or memory service, each part becomes small
and manageable. Furthermore, because all the servers run as user-mode processes,
and not in kernel mode, they do not have direct access to the hardware. As
a consequence, if a bug in the file server is triggered, the file service may crash,
but this will not usually bring the whole machine down.

Another advantage of the client-server model is its adaptability to use in distributed
systems (see Fig. 1-28). If a client communicates with a server by sending
it messages, the client need not know whether the message is handled locally
on its own machine, or whether it was sent across a network to a server on a
remote machine. As far as the client is concerned, the same thing happens in both
cases: a request was sent and a reply came back.
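The request-reply pattern can be sketched in a few lines of code. The following
is only an illustration, not the interface of any particular microkernel: a parent
process plays the client, a child process plays a file server, two UNIX pipes stand
in for the kernel's message transport, and the message layout and READ_REQUEST
code are invented for the example.

#include <stdio.h>
#include <unistd.h>

#define READ_REQUEST 3                  /* hypothetical operation code */

struct message {
    int opcode;                         /* which operation is requested */
    int block_nr;                       /* which "disk block" to read   */
    char data[64];                      /* room for the reply data      */
};

int main(void)
{
    int req[2], rep[2];                 /* request and reply channels */
    pipe(req);
    pipe(rep);

    if (fork() == 0) {                  /* child: the file server */
        struct message m;
        read(req[0], &m, sizeof(m));    /* wait for a request */
        if (m.opcode == READ_REQUEST)   /* do the work */
            snprintf(m.data, sizeof(m.data), "contents of block %d", m.block_nr);
        write(rep[1], &m, sizeof(m));   /* send the reply back */
        _exit(0);
    }

    struct message m = { READ_REQUEST, 42, "" };
    write(req[1], &m, sizeof(m));       /* client: send the request ...          */
    read(rep[0], &m, sizeof(m));        /* ... and block until the reply arrives */
    printf("client received: %s\n", m.data);
    return 0;
}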
The picture painted above of a kernel that handles only the transport of messages
from clients to servers and back is not completely realistic. Some operating
system functions (such as loading commands into the physical I/O device registers)
are difficult, if not impossible, to do from user-space programs. There are
two ways of dealing with this problem. One way is to have some critical server
processes (e.g., I/O device drivers) actually run in kernel mode, with complete
access to all the hardware, but still communicate with other processes using the
normal message mechanism.

The other way is to build a minimal amount of mechanism into the kernel but
leave the policy decisions up to servers in user space (Levin et al., 1975).

[Figure 1-28. The client-server model in a distributed system. A client on machine 1 sends a message over the network to the file server, process server, and terminal server on machines 2, 3, and 4, each machine running its own kernel.]


For example, the kernel might recognize that a message sent to a certain special
address means to take the contents of that message and load it into the I/O device
registers for some disk, to start a disk read. In this example, the kernel would not
even inspect the bytes in the message to see if they were valid or meaningful; it
would just blindly copy them into the disk's device registers. (Obviously, some
scheme for limiting such messages to authorized processes only must be used.)
The split between mechanism and policy is an important concept; it occurs again
and again in operating systems in various contexts.

1.8 RESEARCH ON OPERATING SYSTEMS


Computer science is a rapidly advancing field and it is hard to predict where it
is going. Researchers at universities and industrial research labs are constantly
thinking up new ideas, some of which go nowhere but some of which become the
cornerstone of future products and have massive impact on the industry and users.
Telling which is which turns out to be easier to do in hindsight than in real time.
Separating the wheat from the chaff is especially difficult because it often takes
20 to 30 years from idea to impact.

For example, when President Eisenhower set up the Dept. of Defense's
Advanced Research Projects Agency (ARPA) in 1958, he was trying to keep the
Army from killing the Navy and the Air Force over the Pentagon's research budget.
He was not trying to invent the Internet. But one of the things ARPA did
was fund some university research on the then-obscure concept of packet switching,
which quickly led to the first experimental packet-switched network, the
ARPANET. It went live in 1969. Before long, other ARPA-funded research networks
were connected to the ARPANET, and the Internet was born. The Internet
was then happily used by academic researchers for sending email to each other for
20 years. In the early 1990s, Tim Berners-Lee invented the World Wide Web at
the CERN research lab in Geneva and Marc Andreessen wrote a graphical browser
for it at the University of Illinois. All of a sudden the Internet was full of chatting
teenagers. President Eisenhower is probably rolling over in his grave.

Research in operating systems has also led to dramatic changes in practical
systems. As we discussed earlier, the first commercial computer systems were all
batch systems, until M.I.T. invented interactive timesharing in the early 1960s.
Computers were all text-based until Doug Engelbart invented the mouse and the
graphical user interface at Stanford Research Institute in the late 1960s. Who
knows what will come next?


In this section and in comparable sections throughout the book, we will take a
brief look at some of the research in operating systems that has taken place during
the past 5 to 10 years, just to give a flavor of what might be on the horizon. This
introduction is certainly not comprehensive and is based largely on papers that
have been published in the top research journals and conferences because these
ideas have at least survived a rigorous peer review process in order to get published.
Most of the papers cited in the research sections were published by either
ACM, the IEEE Computer Society, or USENIX and are available over the Internet
to (student) members of these organizations. For more information about
these organizations and their digital libraries, see

    ACM                      http://www.acm.org
    IEEE Computer Society    http://www.computer.org
    USENIX                   http://www.usenix.org

Virtually all operating systems researchers realize that current operating systems
are massive, inflexible, unreliable, insecure, and loaded with bugs, certain
ones more than others (names withheld here to protect the guilty). Consequently,
there is a lot of research on how to build flexible and dependable systems. Much
of the research concerns microkernel systems. These systems have a minimal
kernel, so there is a reasonable chance they can be made reliable and be
debugged. They are also flexible because much of the real operating system runs
as user-mode processes, and can thus be replaced or adapted easily, possibly even
during execution. Typically, all the microkernel does is handle low-level resource
management and message passing between the user processes.

The first-generation microkernels, such as Amoeba (Tanenbaum et al., 1990),
Chorus (Rozier et al., 1988), Mach (Accetta et al., 1986), and V (Cheriton, 1988),
proved that these systems could be built and made to work. The second generation
is trying to prove that they can not only work, but with high performance as
well (Ford et al., 1996; Hartig et al., 1997; Liedtke 1995, 1996; Rawson 1997; and
Zuberi et al., 1999). Based on published measurements, it appears that this goal
has been achieved.

Much kernel research is focused nowadays on building extensible operating
systems. These are typically microkernel systems with the ability to extend or
customize them in some direction. Some examples are Fluke (Ford et al., 1997),
Paramecium (Van Doorn et al., 1995), SPIN (Bershad et al., 1995b), and Vino
(Seltzer et al., 1996). Some researchers are also looking at how to extend existing

systems (Ghormley et al., 1998). Many of these systems allow users to add their
own code to the kernel, which brings up the obvious problem of how to allow user
extensions in a secure way. Techniques include interpreting the extensions, restricting
them to code sandboxes, using type-safe languages, and code signing
(Grimm and Bershad, 1997; and Small and Seltzer, 1998). Druschel et al. (1997)
present a dissenting view, saying that too much effort is going into security for
user-extendable systems. In their view, researchers should figure out which
extensions are useful and then just make those a normal part of the kernel, without
the ability to have users extend the kernel on the fly.

Although one approach to eliminating bloated, buggy, unreliable operating
systems is to make them smaller, a more radical one is to eliminate the operating
system altogether. This approach is being taken by the group of Kaashoek at
M.I.T. in their Exokernel research. Here the idea is to have a thin layer of
software running on the bare metal, whose only job is to securely allocate the
hardware resources among the users. For example, it must decide who gets to use
which part of the disk and where incoming network packets should be delivered.
Everything else is up to user-level processes, making it possible to build both
general-purpose and highly specialized operating systems (Engler and Kaashoek,
1995; Engler et al., 1995; and Kaashoek et al., 1997).

1.9 OUTLINE OF THE REST OF THIS BOOK


We have now completed our introduction and bird's-eye view of the operating
system. It is time to get down to the details. Chapter 2 is about processes. It
discusses their properties and how they communicate with one another. It also
gives a number of detailed examples of how interprocess communication works
and how to avoid some of the pitfalls.

Chapter 3 is about deadlocks. We briefly showed what deadlocks are in this
chapter, but there is much more to say. Ways to prevent or avoid them are discussed.

In Chap. 4 we will study memory management in detail. The important topic
of virtual memory will be examined, along with closely related concepts such as
paging and segmentation.

Input/Output is covered in Chap. 5. The concepts of device independence and
device dependence will be looked at. Several important devices, including disks,
keyboards, and displays, will be used as examples.

Then, in Chap. 6, we come to the all-important topic of file systems. To a
considerable extent, what the user sees is largely the file system. We will look at
both the file system interface and the file system implementation.

At this point we will have completed our study of the basic principles of
single-CPU operating systems. However, there is more to say, especially about
advanced topics. In Chap. 7, we examine multimedia systems, which have a number
of properties and requirements that differ from conventional operating systems.
Among other items, scheduling and the file system are affected by the
nature of multimedia. Another advanced topic is multiple processor systems,
including multiprocessors, parallel computers, and distributed systems. These
subjects are covered in Chap. 8.

A hugely important subject is operating system security, which is covered in
Chap. 9. Among the topics discussed in this chapter are threats (e.g., viruses and
worms), protection mechanisms, and security models.

Next we have some case studies of real operating systems. These are UNIX
(Chap. 10) and Windows 2000 (Chap. 11). The book concludes with some
thoughts about operating system design in Chap. 12.

1.10 METRIC UNITS


To avoid any confusion, it is worth stating explicitly that in this book, as in
computer science in general, metric units are used instead of traditional English
units (the furlong-stone-fortnight system). The principal metric prefixes are listed
in Fig. 1-29. The prefixes are typically abbreviated by their first letters, with the
units greater than 1 capitalized. Thus a 1-TB database occupies 10^12 bytes of
storage and a 100 psec (or 100 ps) clock ticks every 10^-10 seconds. Since milli
and micro both begin with the letter "m," a choice had to be made. Normally,
"m" is for milli and "µ" (the Greek letter mu) is for micro.

[Figure 1-29. The principal metric prefixes.]

It is also worth pointing out that for measuring memory sizes, in common
industry practice, the units have slightly different meanings. There Kilo means
2^10 (1024) rather than 10^3 (1000) because memories are always a power of two.
Thus a 1-KB memory contains 1024 bytes, not 1000 bytes. Similarly, a 1-MB
memory contains 2^20 (1,048,576) bytes and a 1-GB memory contains 2^30
(1,073,741,824) bytes. However, a 1-Kbps communication line transmits 1000
bits per second and a 10-Mbps LAN runs at 10,000,000 bits/sec because these
speeds are not powers of two. Unfortunately, many people tend to mix up these
two systems, especially for disk sizes. To avoid ambiguity, in this book, we will
use the symbols KB, MB, and GB for 2^10, 2^20, and 2^30 bytes respectively, and the
symbols Kbps, Mbps, and Gbps for 10^3, 10^6, and 10^9 bits/sec, respectively.
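As a quick check of the arithmetic behind this convention, the small program
below (our illustration, not an example from the text) prints the binary memory
sizes next to the decimal line speeds.

#include <stdio.h>

int main(void)
{
    long long kb = 1LL << 10;           /* 1 KB  = 2^10 = 1024 bytes          */
    long long mb = 1LL << 20;           /* 1 MB  = 2^20 = 1,048,576 bytes     */
    long long gb = 1LL << 30;           /* 1 GB  = 2^30 = 1,073,741,824 bytes */
    long long kbps = 1000;              /* 1 Kbps = 10^3 bits/sec             */
    long long mbps = 1000 * 1000;       /* 1 Mbps = 10^6 bits/sec             */

    printf("1 KB = %lld bytes, 1 MB = %lld bytes, 1 GB = %lld bytes\n", kb, mb, gb);
    printf("1 Kbps = %lld bits/sec, 1 Mbps = %lld bits/sec\n", kbps, mbps);
    return 0;
}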

1.11 SUMMARY
Operating systems can be viewed from two viewpoints: resource managers
and extended machines. In the resource manager view, the operating system's job
is to manage the different parts of the system efficiently. In the extended machine
view, the job of the system is to provide the users with a virtual machine that is
more convenient to use than the actual machine.

Operating systems have a long history, starting from the days when they
replaced the operator, to modern multiprogramming systems. Highlights include
early batch systems, multiprogramming systems, and personal computer systems.

Since operating systems interact closely with the hardware, some knowledge
of computer hardware is useful to understanding them. Computers are built up of
processors, memories, and I/O devices. These parts are connected by buses.

The basic concepts on which all operating systems are built are processes,
memory management, I/O management, the file system, and security. Each of
these will be treated in a subsequent chapter.

The heart of any operating system is the set of system calls that it can handle.
These tell what the operating system really does. For UNIX, we have looked at
four groups of system calls. The first group of system calls relates to process
creation and termination. The second group is for reading and writing files. The
third group is for directory management. The fourth group contains miscellaneous
calls.

Operating systems can be structured in several ways. The most common ones
are as a monolithic system, a hierarchy of layers, a virtual machine system, an
exokernel, or using the client-server model.

PROBLEMS

1. What are the two main functions of an operating system?

2. What is multiprogramming?

3. What is spooling? Do you think that advanced personal computers will have spooling
   as a standard feature in the future?

4. On early computers, every byte of data read or written was directly handled by the
   CPU (i.e., there was no DMA). What implications does this organization have for
   multiprogramming?

16. Why is the process table needed in a timesharing system? Is it also needed in personal
    computer systems in which only one process exists, that process taking over the entire
    machine until it is finished?

17. Is there any reason why you might want to mount a file system on a nonempty directory?
    If so, what is it?

19. Can the

        count = write(fd, buffer, nbytes);

    call return any value in count other than nbytes? If so, why?

20. A file whose file descriptor is fd contains the following sequence of bytes: 3, 1, 4, 1, 5,
    9, 2, 6, 5, 3, 5. The following system calls are made:

        lseek(fd, 3, SEEK_SET);
        read(fd, &buffer, 4);

    What does buffer contain after the read has completed?

21. What is the essential difference between a block special file and a character special
    file?
22. In the example given in Fig. 1-17, the library procedure is called read and the system
    call itself is called read. Is it essential that both of these have the same name? If not,
    which one is more important?

23. The client-server model is popular in distributed systems. Can it also be used in a
    single-computer system?

24. To a programmer, a system call looks like any other call to a library procedure. Is it
    important that a programmer know which library procedures result in system calls?
    Under what circumstances and why?

27. Write a shell that is similar to Fig. 1-19 but contains enough code that it actually
    works so you can test it. You might also add some features such as redirection of
    input and output, pipes, and background jobs.

    unlimited number of child processes and observe what happens. Before running the
    experiment, type sync to the shell to flush the file system buffers to disk to avoid ruining
    the file system. Note: Do not try this on a shared system without first getting permission
    from the system administrator. The consequences will be instantly obvious, so
    you are likely to be caught and sanctions may follow.

29. Examine and try to interpret the contents of a UNIX-like or Windows directory with a
    tool like the UNIX od program or the MS-DOS DEBUG program. Hint: How you do
    this will depend upon what the OS allows. One trick that may work is to create a
    directory on a floppy disk with one operating system and then read the raw disk data
    using a different operating system that allows such access.
PROCESSES AND THREADS

We are now about to embark on a detailed study of how operating systems are
designed and constructed. The most central concept in any operating system is
the process: an abstraction of a running program. Everything else hinges on this
concept, and it is important that the operating system designer (and student) have
a thorough understanding of what a process is as early as possible.

2.1 PROCESSES
All modern computers can do several things at the same time. While running
a user program, a computer can also be reading from a disk and outputting text to
a screen or printer. In a multiprogramming system, the CPU also switches from
program to program, running each for tens or hundreds of milliseconds. While,
strictly speaking, at any instant of time the CPU is running only one program, in
the course of 1 second it may work on several programs, thus giving the users the
illusion of parallelism. Sometimes people speak of pseudoparallelism in this
context, to contrast it with the true hardware parallelism of multiprocessor systems
(which have two or more CPUs sharing the same physical memory). Keeping
track of multiple, parallel activities is hard for people to do. Therefore, operating
system designers over the years have evolved a conceptual model (sequential
processes) that makes parallelism easier to deal with. That model, its uses,
and some of its consequences form the subject of this chapter.

2.1.1 The Process Model

In this model, all the runnable software on the computer, sometimes including
the operating system, is organized into a number of sequential processes, or just
processes for short. A process is just an executing program, including the current
values of the program counter, registers, and variables. Conceptually, each process
has its own virtual CPU. In reality, of course, the real CPU switches back and
forth from process to process, but to understand the system, it is much easier to
think about a collection of processes running in (pseudo) parallel, than to try to
keep track of how the CPU switches from program to program. This rapid
switching back and forth is called multiprogramming, as we saw in Chap. 1.

In Fig. 2-1(a) we see a computer multiprogramming four programs in
memory. In Fig. 2-1(b) we see four processes, each with its own flow of control
(i.e., its own logical program counter), and each one running independently of the
other ones. Of course, there is only one physical program counter, so when each
process runs, its logical program counter is loaded into the real program counter.
When it is finished for the time being, the physical program counter is saved in
the process' logical program counter in memory. In Fig. 2-1(c) we see that
viewed over a long enough time interval, all the processes have made progress,
but at any given instant only one process is actually running.
[Figure 2-1. (a) Multiprogramming of four programs, with one program counter. (b) Conceptual model of four independent, sequential processes, each with its own program counter. (c) Only one program is active at any instant.]

With the CPU switching back and forth among the processes, the rate at
which a process performs its computation will not be uniform and probably not
even reproducible if the same processes are run again. Thus, processes must not
be programmed with built-in assumptions about timing. Consider, for example,
an I/O process that starts a streamer tape to restore backed-up files, executes an
idle loop 10,000 times to let it get up to speed, and then issues a command to read
the first record. If the CPU decides to switch to another process during the idle
loop, the tape process might not run again until after the first record was already
past the read head. When a process has critical real-time requirements like this,

that is, particular events must occur within a specified number of milliseconds,
special measures must be taken to ensure that they do occur. Normally, however,
most processes are not affected by the underlying multiprogramming of the CPU
or the relative speeds of different processes.

The difference between a process and a program is subtle, but crucial. An
analogy may help here. Consider a culinary-minded computer scientist who is
baking a birthday cake for his daughter. He has a birthday cake recipe and a
kitchen well stocked with all the input: flour, eggs, sugar, extract of vanilla, and
so on. In this analogy, the recipe is the program (i.e., an algorithm expressed in
some suitable notation), the computer scientist is the processor (CPU), and the
cake ingredients are the input data. The process is the activity consisting of our
baker reading the recipe, fetching the ingredients, and baking the cake.

Now imagine that the computer scientist's son comes running in crying, saying
that he has been stung by a bee. The computer scientist records where he was
in the recipe (the state of the current process is saved), gets out a first aid book,
and begins following the directions in it. Here we see the processor being
switched from one process (baking) to a higher-priority process (administering
medical care), each having a different program (recipe versus first aid book).
When the bee sting has been taken care of, the computer scientist goes back to his
cake, continuing at the point where he left off.

The key idea here is that a process is an activity of some kind. It has a program,
input, output, and a state. A single processor may be shared among several
processes, with some scheduling algorithm being used to determine when to stop
work on one process and service a different one.

2.1.2 Process Creation


Operating systems need some way to make sure all the necessary processes
exist. In very simple systems, or in systems designed for running only a single
application (e.g., the controller in a microwave oven), it may be possible to have
all the processes that will ever be needed be present when the system comes up.
In general-purpose systems, however, some way is needed to create and terminate
processes as needed during operation. We will now look at some of the issues.

There are four principal events that cause processes to be created:

1. System initialization.
2. Execution of a process creation system call by a running process.
3. A user request to create a new process.
4. Initiation of a batch job.

When an operating system is booted, typically several processes are created.
Some of these are foreground processes, that is, processes that interact with

(human) users and perform work for them. Others are background processes,
which are not associated with particular users, but instead have some specific
function. For example, one background process may be designed to accept
incoming email, sleeping most of the day but suddenly springing to life when
email arrives. Another background process may be designed to accept incoming
requests for Web pages hosted on that machine, waking up when a request arrives
to service the request. Processes that stay in the background to handle some
activity such as email, Web pages, news, printing, and so on are called daemons.
Large systems commonly have dozens of them. In UNIX, the ps program can be
used to list the running processes. In Windows 95/98/Me, typing CTRL-ALT-DEL
once shows what's running. In Windows 2000, the task manager is used.

In addition to the processes created at boot time, new processes can be created
afterward as well. Often a running process will issue system calls to create one or
more new processes to help it do its job. Creating new processes is particularly
useful when the work to be done can easily be formulated in terms of several
related, but otherwise independent interacting processes. For example, if a large
amount of data is being fetched over a network for subsequent processing, it may
be convenient to create one process to fetch the data and put them in a buffer
while a second process removes and processes the data items.
In interactive systems, users can start a program by typing a command or
(double) clicking an icon. Taking either of these actions starts a new process and
runs the selected program in it. In command-based UNIX systems running X Windows,
the new process takes over the window in which it was started. In Microsoft
Windows, when a process is started it does not have a window, but it can
create one (or more) and most do. In both systems, users may have multiple windows
open at once, each running some process. Using the mouse, the user can
select a window and interact with the process, for example, providing input when
needed.

Technically, in all these cases, a new process is created by having an existing
process execute a process creation system call. That process may be a running
user process, a system process invoked from the keyboard or mouse, or a batch
manager process. What that process does is execute a system call to create the
new process. This system call tells the operating system to create a new process
and indicates, directly or indirectly, which program to run in it.

In UNIX, there is only one system call to create a new process: fork. This call
creates an exact clone of the calling process. After the fork, the two processes, the
parent and the child, have the same memory image, the same environment strings,
and the same open files. That is all there is. Usually, the child process then executes
execve or a similar system call to change its memory image and run a new
program. For example, when a user types a command, say, sort, to the shell, the
shell forks off a child process and the child executes sort. The reason for this
two-step process is to allow the child to manipulate its file descriptors after the
fork but before the execve to accomplish redirection of standard input, standard
output, and standard error.
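To make the two-step sequence concrete, here is a minimal sketch of roughly
what a shell might do to run sort with its standard input redirected from a file.
The path /usr/bin/sort and the file name infile are only examples, and error
handling is omitted.

#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();                       /* create an exact clone of this process */

    if (pid == 0) {                           /* child */
        int fd = open("infile", O_RDONLY);    /* manipulate file descriptors ...       */
        dup2(fd, 0);                          /* ... after the fork, before execve,    */
        close(fd);                            /* to redirect standard input            */
        char *argv[] = { "sort", NULL };
        char *envp[] = { NULL };
        execve("/usr/bin/sort", argv, envp);  /* replace the memory image with sort    */
        _exit(1);                             /* reached only if execve failed         */
    }
    waitpid(pid, NULL, 0);                    /* parent: wait for the child to finish  */
    return 0;
}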
In Windows, in contrast, a single Win32 function call, CreateProcess, handles
both process creation and loading the correct program into the new process.
This call has 10 parameters, which include the program to be executed, the command-line
parameters to feed that program, various security attributes, bits that
control whether open files are inherited, priority information, a specification of
the window to be created for the process (if any), and a pointer to a structure in
which information about the newly created process is returned to the caller. In
addition to CreateProcess, Win32 has about 100 other functions for managing
and synchronizing processes and related topics.

In both UNIX and Windows, after a process is created, both the parent and
child have their own distinct address spaces. If either process changes a word in
its address space, the change is not visible to the other process. In UNIX, the
child's initial address space is a copy of the parent's, but there are two distinct
address spaces involved; no writable memory is shared (some UNIX implementations
share the program text between the two since that cannot be modified). It is,
however, possible for a newly created process to share some of its creator's other
resources, such as open files. In Windows, the parent's and child's address spaces
are different from the start.

2.1.3 Process Termination


After a process has been created, it starts running and does whatever its job is.
However, nothing lasts forever, not even processes. Sooner or later the new process
will terminate, usually due to one of the following conditions:

1. Normal exit (voluntary).
2. Error exit (voluntary).
3. Fatal error (involuntary).
4. Killed by another process (involuntary).

Most processes terminate because they have done their work. When a compiler
has compiled the program given to it, the compiler executes a system call to
tell the operating system that it is finished. This call is exit in UNIX and ExitProcess
in Windows. Screen-oriented programs also support voluntary termination.

Word processors, Internet browsers, and similar programs always have an icon or
menu item that the user can click to tell the process to remove any temporary files
it has open and then terminate.

The second reason for termination is that the process discovers a fatal error.
For example, if a user types the command

cc foo.c

to compile the program foo.c and no such file exists, the compiler simply exits.
Screen-oriented interactive processes generally do not exit when given bad
parameters. Instead they pop up a dialog box and ask the user to try again.

The third reason for termination is an error caused by the process, often due to
a program bug. Examples include executing an illegal instruction, referencing
nonexistent memory, or dividing by zero. In some systems (e.g., UNIX), a process
can tell the operating system that it wishes to handle certain errors itself, in which
case the process is signaled (interrupted) instead of terminated when one of the
errors occurs.

The fourth reason a process might terminate is that a process executes a system
call telling the operating system to kill some other process. In UNIX this call
is kill. The corresponding Win32 function is TerminateProcess. In both cases, the
killer must have the necessary authorization to do in the killee. In some systems,
when a process terminates, either voluntarily or otherwise, all processes it created
are immediately killed as well. Neither UNIX nor Windows works this way, however.

2.1.4 Process Hierarchies

In some systems, when a process creates another process, the parent process
and child process continue to be associated in certain ways. The child process can
itself create more processes, forming a process hierarchy. Note that unlike plants
and animals that use sexual reproduction, a process has only one parent (but zero,
one, two, or more children).

In UNIX, a process and all of its children and further descendants together
form a process group. When a user sends a signal from the keyboard, the signal is
delivered to all members of the process group currently associated with the keyboard
(usually all active processes that were created in the current window). Individually,
each process can catch the signal, ignore the signal, or take the default
action, which is to be killed by the signal.

As another example of where the process hierarchy plays a role, let us look at
how UNIX initializes itself when it is started. A special process, called init, is
present in the boot image. When it starts running, it reads a file telling how many
terminals there are. Then it forks off one new process per terminal. These processes
wait for someone to log in. If a login is successful, the login process executes
a shell to accept commands. These commands may start up more processes,
and so forth. Thus, all the processes in the whole system belong to a single tree,
with init at the root.

In contrast, Windows does not have any concept of a process hierarchy. All
processes are equal. The only place where there is something like a process
hierarchy is that when a process is created, the parent is given a special token
(called a handle) that it can use to control the child. However, it is free to pass
this token to some other process, thus invalidating the hierarchy. Processes in
UNIX cannot disinherit their children.

2.1.5 Process States

Although each process is an independent entity, with its own program counter
and internal state, processes often need to interact with other processes. One
process may generate some output that another process uses as input. In the shell
command

cat chapter1 chapter2 chapter3 | grep tree

the first process, running cat, concatenates and outputs three files. The second
process, running grep, selects all lines containing the word "tree." Depending on
the relative speeds of the two processes (which depends on both the relative complexity
of the programs and how much CPU time each one has had), it may happen
that grep is ready to run, but there is no input waiting for it. It must then
block until some input is available.

When a process blocks, it does so because logically it cannot continue, typically
because it is waiting for input that is not yet available. It is also possible for
a process that is conceptually ready and able to run to be stopped because the
operating system has decided to allocate the CPU to another process for a while.
These two conditions are completely different. In the first case, the suspension is
inherent in the problem (you cannot process the user's command line until it has
been typed). In the second case, it is a technicality of the system (not enough
CPUs to give each process its own private processor). In Fig. 2-2 we see a state
diagram showing the three states a process may be in:

1. Running (actually using the CPU at that instant).
2. Ready (runnable; temporarily stopped to let another process run).
3. Blocked (unable to run until some external event happens).

Logically, the first two states are similar. In both cases the process is willing to
run, only in the second one, there is temporarily no CPU available for it. The
third state is different from the first two in that the process cannot run, even if the
CPU has nothing else to do.
[Figure 2-2. A process can be in running, blocked, or ready state. The transitions are: 1. process blocks for input; 2. scheduler picks another process; 3. scheduler picks this process; 4. input becomes available.]

Four transitions are possible among these three states, as shown. Transition 1
occurs when a process discovers that it cannot continue. In some systems the
process must execute a system call, such as block or pause, to get into blocked
state. In other systems, including UNIX, when a process reads from a pipe or special
file (e.g., a terminal) and there is no input available, the process is automatically
blocked.

Transitions 2 and 3 are caused by the process scheduler, a part of the operating
system, without the process even knowing about them. Transition 2 occurs
when the scheduler decides that the running process has run long enough, and it is
time to let another process have some CPU time. Transition 3 occurs when all the
other processes have had their fair share and it is time for the first process to get
the CPU to run again. The subject of scheduling, that is, deciding which process
should run when and for how long, is an important one; we will look at it later in
this chapter. Many algorithms have been devised to try to balance the competing
demands of efficiency for the system as a whole and fairness to individual
processes. We will study some of them later in this chapter.

Transition 4 occurs when the external event for which a process was waiting
(such as the arrival of some input) happens. If no other process is running at that
instant, transition 3 will be triggered and the process will start running. Otherwise
it may have to wait in ready state for a little while until the CPU is available and
its turn comes.
Using the process model, it becomes much easier to think about what is going
on inside the system. Some of the processes run programs that carry out commands
typed in by a user. Other processes are part of the system and handle tasks
such as carrying out requests for file services or managing the details of running a
disk or a tape drive. When a disk interrupt occurs, the system makes a decision to
stop running the current process and run the disk process, which was blocked
waiting for that interrupt. Thus, instead of thinking about interrupts, we can think
about user processes, disk processes, terminal processes, and so on, which block
when they are waiting for something to happen. When the disk has been read or
the character typed, the process waiting for it is unblocked and is eligible to run
again.

This view gives rise to the model shown in Fig. 2-3. Here the lowest level of
the operating system is the scheduler, with a variety of processes on top of it. All
the interrupt handling and details of actually starting and stopping processes are
hidden away in what is here called the scheduler, which is actually not much
code. The rest of the operating system is nicely structured in process form. Few
real systems are as nicely structured as this, however.
[Figure 2-3. The lowest layer of a process-structured operating system handles interrupts and scheduling; above that layer are sequential processes.]
2.1.6 Implementation of Processes

To implement the process model, the operating system maintains a table (an
array of structures), called the process table, with one entry per process. (Some
authors call these entries process control blocks.) This entry contains information
about the process' state, its program counter, stack pointer, memory allocation,
the status of its open files, its accounting and scheduling information, and
everything else about the process that must be saved when the process is switched
from running to ready or blocked state so that it can be restarted later as if it had
never been stopped.

Figure 2-4 shows some of the more important fields in a typical system. The
fields in the first column relate to process management. The other two columns
relate to memory management and file management, respectively. It should be
noted that precisely which fields the process table has is highly system dependent,
but this figure gives a general idea of the kinds of information needed.

Now that we have looked at the process table, it is possible to explain a little
more about how the illusion of multiple sequential processes is maintained on a
machine with one CPU and many I/O devices. Associated with each I/O device
class (e.g., floppy disks, hard disks, timers, terminals) is a location (often near the
bottom of memory) called the interrupt vector. It contains the address of the
interrupt service procedure. Suppose that user process 3 is running when a disk
interrupt occurs. User process 3's program counter, program status word, and
possibly one or more registers are pushed onto the (current) stack by the interrupt
hardware. The computer then jumps to the address specified in the disk interrupt
vector. That is all the hardware does. From here on, it is up to the software, in
particular, the interrupt service procedure.

All interrupts start by saving the registers, often in the process table entry for
the current process. Then the information pushed onto the stack by the interrupt is
removed and the stack pointer is set to point to a temporary stack used by the
process handler.
Process management           Memory management            File management
Registers                    Pointer to text segment      Root directory
Program counter              Pointer to data segment      Working directory
Program status word          Pointer to stack segment     File descriptors
Stack pointer                                             User ID
Process state                                             Group ID
Priority
Scheduling parameters
Process ID
Parent process
Process group
Signals
Time when process started
CPU time used
Children's CPU time
Time of next alarm

Figure 2-4. Some of the fields of a typical process table entry.
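As a rough illustration of how such a table might be declared, here is a sketch in
C; the field names, types, and sizes are invented for the example and do not match
any particular system's definitions.

#define NR_FDS 20

struct proc {
    /* Process management */
    unsigned long registers[16];      /* saved general registers          */
    unsigned long program_counter;
    unsigned long psw;                /* program status word              */
    unsigned long stack_pointer;
    int state;                        /* running, ready, or blocked       */
    int priority;
    int pid;                          /* process ID                       */
    int parent_pid;
    long cpu_time_used;

    /* Memory management */
    char *text_segment;
    char *data_segment;
    char *stack_segment;

    /* File management */
    int root_dir;
    int working_dir;
    int fd_table[NR_FDS];             /* open file descriptors            */
    int uid, gid;                     /* user and group IDs               */
};

struct proc process_table[64];        /* one entry per process */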

Actions such as saving the registers and setting the stack pointer
cannot even be expressed in high-level languages such as C, so they are performed
by a small assembly language routine, usually the same one for all interrupts
since the work of saving the registers is identical, no matter what the cause
of the interrupt is.

When this routine is finished, it calls a C procedure to do the rest of the work
for this specific interrupt type. (We assume the operating system is written in C,
the usual choice for all real operating systems.) When it has done its job, possibly
making some process now ready, the scheduler is called to see who to run next.
After that, control is passed back to the assembly language code to load up the
registers and memory map for the now-current process and start it running. Interrupt
handling and scheduling are summarized in Fig. 2-5. It is worth noting that
the details vary somewhat from system to system.
1. Hardware stacks program counter, etc.
2. Hardware loads new program counter from interrupt vector.
3. Assembly language procedure saves registers.
4. Assembly language procedure sets up new stack.
5. C interrupt service runs (typically reads and buffers input).
6. Scheduler decides which process is to run next.
7. C procedure returns to the assembly code.
8. Assembly language procedure starts up new current process.

Figure 2-5. Skeleton of what the lowest level of the operating system does when an interrupt occurs.
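The C-level portion (steps 5 through 7) might look roughly like the sketch
below. Every name and the stub bodies are invented for the illustration; a real
kernel would of course talk to the device hardware and to its own scheduler here.

#include <stdio.h>

static void buffer_disk_data(void)     { printf("read and buffer data from the disk\n"); }
static void unblock_disk_process(void) { printf("mark the waiting disk process as ready\n"); }
static void schedule(void)             { printf("scheduler picks the next process to run\n"); }

void disk_interrupt_service(void)         /* step 5: the C interrupt service routine */
{
    buffer_disk_data();                   /* read and buffer the input               */
    unblock_disk_process();               /* the blocked disk process becomes ready  */
    schedule();                           /* step 6: decide which process runs next  */
}                                         /* step 7: return to the assembly code     */

int main(void)                            /* small driver so the sketch can be run   */
{
    disk_interrupt_service();
    return 0;
}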

2.2 THREADS
In traditional operating systems, each process has an address space and a single
thread of control. In fact, that is almost the definition of a process. Nevertheless,
there are frequently situations in which it is desirable to have multiple
threads of control in the same address space running in quasi-parallel, as though
they were separate processes (except for the shared address space). In the following
sections we will discuss these situations and their implications.

2.2.1 The Thread Model

The process model as we have discussed it thus far is based on two independent
concepts: resource grouping and execution. Sometimes it is useful to
separate them; this is where threads come in.

One way of looking at a process is that it is a way to group related resources
together. A process has an address space containing program text and data, as
well as other resources. These resources may include open files, child processes,
pending alarms, signal handlers, accounting information, and more. By putting
them together in the form of a process, they can be managed more easily.

The other concept a process has is a thread of execution, usually shortened to
just thread. The thread has a program counter that keeps track of which instruction
to execute next. It has registers, which hold its current working variables. It
has a stack, which contains the execution history, with one frame for each procedure
called but not yet returned from. Although a thread must execute in some
process, the thread and its process are different concepts and can be treated
separately. Processes are used to group resources together; threads are the entities
scheduled for execution on the CPU.

What threads add to the process model is to allow multiple executions to take
place in the same process environment, to a large degree independent of one
another. Having multiple threads running in parallel in one process is analogous
to having multiple processes running in parallel in one computer. In the former
case, the threads share an address space, open files, and other resources. In the
latter case, processes share physical memory, disks, printers, and other resources.
Because threads have some of the properties of processes, they are sometimes
called lightweight processes. The term multithreading is also used to describe
the situation of allowing multiple threads in the same process.

In Fig. 2-6(a) we see three traditional processes. Each process has its own
address space and a single thread of control. In contrast, in Fig. 2-6(b) we see a
single process with three threads of control. Although in both cases we have three
threads, in Fig. 2-6(a) each of them operates in a different address space, whereas
in Fig. 2-6(b) all three of them share the same address space.

When a multithreaded process is run on a single-CPU system, the threads take
turns running. In Fig. 2-1, we saw how multiprogramming of processes works.
[Figure 2-6. (a) Three processes each with one thread. (b) One process with three threads. The processes and threads live in user space, above the kernel.]
By switching back and forth among multiple processes, the system gives the illusion
of separate sequential processes running in parallel. Multithreading works
the same way. The CPU switches rapidly back and forth among the threads, providing
the illusion that the threads are running in parallel, albeit on a slower CPU
than the real one. With three compute-bound threads in a process, the threads
would appear to be running in parallel, each one on a CPU with one-third the
speed of the real CPU.

Different threads in a process are not quite as independent as different
processes. All threads have exactly the same address space, which means that
they also share the same global variables. Since every thread can access every
memory address within the process' address space, one thread can read, write, or
even completely wipe out another thread's stack. There is no protection between
threads because (1) it is impossible, and (2) it should not be necessary. Unlike
different processes, which may be from different users and which may be hostile
to one another, a process is always owned by a single user, who has presumably
created multiple threads so that they can cooperate, not fight. In addition to sharing
an address space, all the threads share the same set of open files, child
processes, alarms, and signals, etc., as shown in Fig. 2-7. Thus the organization of
Fig. 2-6(a) would be used when the three processes are essentially unrelated,
whereas Fig. 2-6(b) would be appropriate when the three threads are actually part
of the same job and are actively and closely cooperating with each other.
The items in the first column are process properties, not thread properties.
For example, if one thread opens a file, that file is visible to the other threads in
the process and they can read and write it. This is logical since the process is the
unit of resource management, not the thread. If each thread had its own address
space, open files, pending alarms, and so on, it would be a separate process. What
we are trying to achieve with the thread concept is the ability for multiple threads
of execution to share a set of resources so they can work together closely to perform
some task.

Per process items               Per thread items
Address space                   Program counter
Global variables                Registers
Open files                      Stack
Child processes                 State
Pending alarms
Signals and signal handlers
Accounting information

Figure 2-7. Per-process items, shared by all threads in a process, and per-thread items, private to each thread.
Like a traditional process (i.e., a process with only one thread), a thread can
be in any one of several states: running, blocked, ready, or terminated. A running
thread currently has the CPU and is active. A blocked thread is waiting for some
event to unblock it. For example, when a thread performs a system call to read
from the keyboard, it is blocked until input is typed. A thread can block waiting
for some external event to happen or for some other thread to unblock it. A ready
thread is scheduled to run and will as soon as its turn comes up. The transitions
between thread states are the same as the transitions between process states and
are illustrated in Fig. 2-2.

It is important to realize that each thread has its own stack, as shown in
Fig. 2-8. Each thread's stack contains one frame for each procedure called but not
yet returned from. This frame contains the procedure's local variables and the
return address to use when the procedure call has finished. For example, if procedure
X calls procedure Y and this one calls procedure Z, while Z is executing the
frames for X, Y, and Z will all be on the stack. Each thread will generally call different
procedures and thus have a different execution history. This is why each
thread needs its own stack.

When multithreading is present, processes normally start with a single thread
present. This thread has the ability to create new threads by calling a library procedure,
for example, thread_create. A parameter to thread_create typically
specifies the name of a procedure for the new thread to run. It is not necessary (or
even possible) to specify anything about the new thread's address space since it
automatically runs in the address space of the creating thread. Sometimes threads
are hierarchical, with a parent-child relationship, but often no such relationship
exists, with all threads being equal. With or without a hierarchical relationship,
the creating thread is usually returned a thread identifier that names the new
thread.

When a thread has finished its work, it can exit by calling a library procedure,
say, thread_exit. It then vanishes and is no longer schedulable.

[Figure 2-8. Each thread has its own stack: a process containing threads 1, 2, and 3, each with its own stack, runs above the kernel.]

In some thread systems, one thread can wait for a (specific) thread to exit by
calling a procedure, for example, thread_wait. This procedure blocks the calling
thread until a (specific) thread has exited. In this regard, thread creation and termination
is very much like process creation and termination, with approximately
the same options as well.

Another common thread call is thread_yield, which allows a thread to voluntarily
give up the CPU to let another thread run. Such a call is important because
there is no clock interrupt to actually enforce timesharing as there is with
processes. Thus it is important for threads to be polite and voluntarily surrender
the CPU from time to time to give other threads a chance to run. Other calls
allow one thread to wait for another thread to finish some work, for a thread to
announce that it has finished some work, and so on.
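One widely available concrete version of these calls is the POSIX threads
(Pthreads) package found on most UNIX systems. The short sketch below uses
pthread_create, pthread_join, and pthread_exit to show the typical
create/wait/exit pattern; the worker function and its message are, of course, made
up for the example.

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)                 /* the procedure the new thread runs */
{
    printf("thread %d running\n", *(int *)arg);
    pthread_exit(NULL);                        /* like thread_exit                  */
}

int main(void)
{
    pthread_t tid;
    int id = 1;

    pthread_create(&tid, NULL, worker, &id);   /* like thread_create                     */
    pthread_join(tid, NULL);                   /* like thread_wait: block until it exits */
    printf("worker thread has finished\n");
    return 0;
}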
While threads are oftcn useful, they alsn intrrxluci: 3 number ~ n fr;urnpliuatirms
into the programming mr~dei. Tr, start w i h , consider the effects of thc l?YIX fork
system call. If the pawn1 process has multiple threads, shr,uld the child alsu have
thei-h'? If not. thc process may nat function prqserly, since d1 o f them may be
essential.
However, if the child process gets as many threads as the parent, what hap-
pens if a thread in the parent was blocked on a read call, say, from the keyboard?
Are two threads now blocked on the keyboard, one in the parent and one in the
child? When a line is typed, do both threads get a copy of it? Only the parent?
Only the child? The same problem exists with open network connections.
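
As an illustration of one common answer, in POSIX the child of fork contains a copy of only the thread that called fork; a thread blocked on the keyboard simply does not exist in the child. The sketch below assumes POSIX threads and is illustrative only; blocked_reader is an invented name.

#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* A thread that blocks reading from the keyboard, as in the question above. */
static void *blocked_reader(void *arg)
{
    char line[128];
    if (fgets(line, sizeof(line), stdin) != NULL)
        printf("parent's reader thread got: %s", line);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, blocked_reader, NULL);

    pid_t pid = fork();
    if (pid == 0) {
        /* Only the thread that called fork exists here; the reader thread
         * was not duplicated, so nothing in the child is waiting for input. */
        printf("child %d: one thread\n", (int) getpid());
        _exit(0);
    }
    waitpid(pid, NULL, 0);          /* parent waits for the child to finish */
    pthread_join(tid, NULL);        /* then waits for its own reader thread */
    return 0;
}
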
Another class of problems is related to the fact that threads share many data
structures. What happens if one thread closes a file while another one is still read-
ing from it? Suppose that one thread notices that there is too little memory and
starts allocating more memory. Part way through, a thread switch occurs, and the
new thread also notices that there is too little memory and also starts allocating
more memory. Memory will probably be allocated twice. These problems can be
solved with some effort, but careful thought and design are needed to make mul-
tithreaded programs work correctly.
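
The memory-allocation race just described can be sketched as follows. The program is purely illustrative (pool_size and allocator are invented names), and in so small a program the bad interleaving occurs only if a thread switch happens between the test and the update, which is exactly the scenario described above.

#include <pthread.h>
#include <stdio.h>

static int pool_size = 8;           /* shared by all threads in the process */

/* Each thread checks whether the pool is low and, if so, grows it.  Because
 * the check and the growth are not atomic, a thread switch in between lets
 * both threads decide to grow the pool, so memory ends up being allocated
 * twice.  Serializing this region (for example with a mutex, discussed later
 * in this chapter) removes the race. */
static void *allocator(void *arg)
{
    if (pool_size < 16)             /* both threads may see this as true ... */
        pool_size += 64;            /* ... and both will then grow the pool  */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, allocator, NULL);
    pthread_create(&t2, NULL, allocator, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("pool_size = %d (136 would mean it was grown twice)\n", pool_size);
    return 0;
}
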

2.2.2 Thread Usage

Having described what threads are, it is now time to explain why anyone
wants them. The main reason for having threads is that in many applications,
multiple activities are going on at once. Some of these may block from time to
time. By decomposing such an application into multiple sequential threads that
run in quasi-parallel, the programming model becomes simpler.
We have seen this argument before. It is precisely the argument for having
processes. Instead of thinking about interrupts, timers, and context switches, we
can think about parallel processes. Only now with threads we add a new element:
the ability for the parallel entities to share an address space and all of its data
among themselves. This ability is essential for certain applications, which is why
having multiple processes (with their separate address spaces) will not work.
A second argument for having threads is that since they do not have any
resources attached to them, they are easier to create and destroy than processes.
In many systems, creating a thread goes 100 times faster than creating a process.
When the number of threads needed changes dynamically and rapidly, this pro-
perty is useful.
A third reason for having threads is also a performance argument. Threads
yield no performance gain when all of them are CPU bound, but when there is
substantial computing and also substantial I/O, having threads allows these activi-
ties to overlap, thus speeding up the application.
Finally, threads are useful on systems with multiple CPUs, where real paral-
lelism is possible. We will come back to this issue in Chap. 8.
It is probably easiest to see why threads are useful by giving some concrete
examples. As a first example, consider a word processor. Most word processors
display the document being created on the screen formatted exactly as it will
appear on the printed page. In particular, all the line breaks and page breaks are
in their correct and final positions, so the user can inspect them and change the
document if need be (e.g., to eliminate widows and orphans, that is, incomplete top and
bottom lines on a page, which are considered esthetically unpleasing).
Suppose that the user is writing a book. From the author's point of view, it is
easiest to keep the entire book as a single file to make it easier to search for
topics, perform global substitutions, and so on. Alternatively, each chapter might
be a separate file. However, having every section and subsection as a separate
file is a real nuisance when global changes have to be made to the entire book,
since then hundreds of files have to be individually edited. For example, if pro-
posed standard xxxx is approved just before the book goes to press, all
occurrences of "Draft Standard xxxx" have to be changed to "Standard xxxx" at
the last minute. If the entire book is one file, typically a single command can do
all the substitutions. In contrast, if the book is spread over 300 files, each one
must be edited separately.
Now consider what happens when the user suddenly deletes one sentence
from page 1 of an 800-page document. After checking the changed page to make
sure it is correct, the user now wants to make another change on page 600 and
types in a command telling the word processor to go to that page (possibly by
searching for a phrase occurring only there). The word processor is now forced to
reformat the entire book up to page 600 on the spot because it does not know what
the first line of page 600 will be until it has processed all the previous pages.
There may be a substantial delay before page 600 can be displayed, leading to an
unhappy user.
Threads can help here. Suppose that the word processor is written as a two-
threaded program. One thread interacts with the user and the other handles refor-
matting in the background. As soon as the sentence is deleted from page 1, the
interactive thread tells the reformatting thread to reformat the whole book.
Meanwhile, the interactive thread continues to listen to the keyboard and mouse
and responds to simple commands like scrolling page 1 while the other thread is
computing madly in the background. With a little luck, the reformatting will be
completed before the user asks to see page 600, so it can be displayed instantly.
While we are at it, why not add a third thread? Many word processors have a
feature of automatically saving the entire file to disk every few minutes to protect
the user against losing a day's work in the event of a program crash, system crash,
or power failure. The third thread can handle the disk backups without interfering
with the other two. The situation with three threads is shown in Fig. 2-9.

Figure 2-9. A word processor with three threads.


If the program were single-threaded, then whenever a disk backup started,
commands from the keyboard and mouse would be ignored until the backup was
finished. The user would perceive this as sluggish performance. Alternatively,
keyboard and mouse events could interrupt the disk backup, allowing good perfor-
mance but leading to a complex interrupt-driven programming model. With three
threads, the programming model is much simpler. The first thread just interacts
with the user. The second thread reformats the document when told to. The third
thread writes the contents of RAM to disk periodically.
It should be clear that having three separate processes would not work here
because all three threads need to operate on the document. By having three
threads instead of three processes, they share a common memory and thus all have
access to the document being edited.
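
A skeleton of this three-thread structure might look as follows. It is only an illustrative sketch using POSIX threads, not the design of any real word processor; the document array and all procedure names are invented, and the bodies of the loops are left as comments.

#include <pthread.h>
#include <unistd.h>

/* The document lives in memory shared by all three threads. */
static char document[800 * 1000];

static void *interactive_thread(void *arg)
{
    for (;;)
        pause();    /* placeholder: block for keyboard/mouse events, then edit document[] */
}

static void *reformat_thread(void *arg)
{
    for (;;)
        pause();    /* placeholder: block until told, then recompute line and page breaks */
}

static void *backup_thread(void *arg)
{
    for (;;) {
        sleep(60);  /* once a minute ... */
        /* ... write document[] out to disk (details omitted) */
    }
}

int main(void)
{
    pthread_t t1, t2, t3;
    pthread_create(&t1, NULL, interactive_thread, NULL);
    pthread_create(&t2, NULL, reformat_thread, NULL);
    pthread_create(&t3, NULL, backup_thread, NULL);
    pthread_join(t1, NULL);     /* run until the interactive thread exits */
    return 0;
}
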
An analogous situation exists with many other interactive programs. For
example, an electronic spreadsheet is a program that allows a user to maintain a
matrix, some of whose elements are data provided by the user. Other elements
are computed based on the input data using potentially complex formulas. When
a user changes one element, many other elements may have to be recomputed. By
having a background thread do the recomputation, the interactive thread can allow
the user to make additional changes while the computation is going on. Similarly,
a third thread can handle periodic backups to disk on its own.
Now consider yet another example of where threads are useful: a server for a
World Wide Web site. Requests for pages come in and the requested page is sent
back to the client. At most Web sites, some pages are more commonly accessed
than other pages. For example, Sony's home page is accessed far more than a
page deep in the tree containing the technical specifications of some particular
camcorder. Web servers use this fact to improve performance by maintaining a
collection of heavily used pages in main memory to eliminate the need to go to
disk to get them. Such a collection is called a cache and is used in many other
contexts as well.
One way to organize the Web server is shown in Fig. 2-10. Here one
thread, the dispatcher, reads incoming requests for work from the network. After
examining the request, it chooses an idle (i.e., blocked) worker thread and hands
it the request, possibly by writing a pointer to the message into a special word
associated with each thread. The dispatcher then wakes up the sleeping worker,
moving it from blocked state to ready state.
When the worker wakes up, it checks to see if the request can be satisfied
from the Web page cache, to which all threads have access. If not, it starts a read
operation to get the page from the disk and blocks until the disk operation com-
pletes. When the thread blocks on the disk operation, another thread is chosen to
run, possibly the dispatcher, in order to acquire more work, or possibly another
worker that is now ready to run.
This model allows the server to be written as a collection of sequential
threads. The dispatcher's program consists of an infinite loop for getting a work
request and handing it off to a worker.

Figure 2-10. The Web server process: a dispatcher thread and worker threads in
user space, sharing a Web page cache, with the kernel below.

Each worker's code consists of an infinite
loop consisting of accepting a request from the dispatcher and checking the Web
cache to see if the page is present. If so, it is returned to the client and the worker
blocks waiting for a new request. If not, it gets the page from the disk, returns it
to the client, and blocks waiting for a new request.
A rough outline of the code is given in Fig. 2-11. Here, as in the rest of this
book, TRUE is assumed to be the constant 1. Also, buf and page are structures
appropriate for holding a work request and a Web page, respectively.

(a) Dispatcher thread:

while (TRUE) {
    get_next_request(&buf);
    handoff_work(&buf);
}

(b) Worker thread:

while (TRUE) {
    wait_for_work(&buf);
    look_for_page_in_cache(&buf, &page);
    if (page_not_in_cache(&page))
        read_page_from_disk(&buf, &page);
    return_page(&page);
}

Figure 2-11. A rough outline of the code for Fig. 2-10. (a) Dispatcher thread.
(b) Worker thread.
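
The outline above leaves the helper procedures undefined. The following is one possible fleshed-out sketch of the same dispatcher/worker structure, assuming POSIX threads; standard input stands in for the network, the cache and disk steps are reduced to a comment, and the hand-off uses a mutex and condition variables (synchronization primitives discussed later in this chapter) rather than the per-thread word mentioned above. All names are invented for the example.

#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NWORKERS 4

struct request { char url[128]; };      /* one pending request */

/* A one-slot mailbox standing in for the network queue and the hand-off. */
static struct request slot;
static int slot_full = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_ready = PTHREAD_COND_INITIALIZER;
static pthread_cond_t slot_empty = PTHREAD_COND_INITIALIZER;

/* Dispatcher: read a request "from the network" and hand it to a worker. */
static void *dispatcher(void *arg)
{
    char line[128];
    while (fgets(line, sizeof(line), stdin) != NULL) {
        pthread_mutex_lock(&lock);
        while (slot_full)
            pthread_cond_wait(&slot_empty, &lock);   /* wait until a worker took the last one */
        strncpy(slot.url, line, sizeof(slot.url) - 1);
        slot.url[sizeof(slot.url) - 1] = '\0';
        slot_full = 1;
        pthread_cond_signal(&work_ready);            /* wake up a sleeping (blocked) worker */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Worker: wait for work, then check the cache and fetch from disk if needed. */
static void *worker(void *arg)
{
    struct request req;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (!slot_full)
            pthread_cond_wait(&work_ready, &lock);   /* blocked here means "idle" */
        req = slot;
        slot_full = 0;
        pthread_cond_signal(&slot_empty);
        pthread_mutex_unlock(&lock);

        /* look_for_page_in_cache / read_page_from_disk / return_page would go here */
        printf("worker %ld served %s", (long) arg, req.url);
    }
    return NULL;
}

int main(void)
{
    pthread_t d, w[NWORKERS];
    long i;

    for (i = 0; i < NWORKERS; i++)
        pthread_create(&w[i], NULL, worker, (void *) i);
    pthread_create(&d, NULL, dispatcher, NULL);
    pthread_join(d, NULL);                           /* run until end of input */
    return 0;
}
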

Consider how the Web server could be written in the absence of threads. One
possibility is to have it operate as a single thread. The main loop of the Web
server gets a request, examines it, and carries it out to completion before getting
the next one. While waiting for the disk, the server is idle and does not process
any other incoming requests. If the Web server is running on a dedicated
machine, as is commonly the case, the CPU is simply idle while the Web server is
waiting for the disk. The net result is that many fewer requests/sec can be pro-
cessed. Thus threads gain considerable performance, but each thread is pro-
grammed sequentially, in the usual way.
So far we have seen two possible designs: a multithreaded Web server and a
single-threaded Web server. Suppose that threads are not available but the system
designers find the performance loss due to single threading unacceptable. If a
nonblocking version of the read system call is available, a third approach is possi-
ble. When a request comes in, the one and only thread examines it. If it can be
satisfied from the cache, fine, but if not, a nonblocking disk operation is started.
The server records the state of the current request in a table and then goes and
gets the next event. The next event may either be a request for new work or a
reply from the disk about a previous operation. If it is new work, that work is
started. If it is a reply from the disk, the relevant information is fetched from the
table and the reply processed. With nonblocking disk I/O, a reply probably will
have to take the form of a signal or interrupt.
In this design, the "sequential process" model that we had in the first two
cases is lost. The state of the computation must be explicitly saved and restored
in the table every time the server switches from working on one request to
another. In effect, we are simulating the threads and their stacks the hard way. A
design like this, in which each computation has a saved state and there exists some
set of events that can occur to change the state, is called a finite-state machine.
This concept is widely used throughout computer science.
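
A sketch of this event-loop structure is shown below. It is illustrative only: the event source is scripted rather than read from a real network, the cache lookup and the nonblocking disk read are reduced to stubs, and all names are invented. The point is that the per-request state a thread would keep on its stack is kept explicitly in a table instead.

#include <stdio.h>

#define MAX_REQ 64

/* The explicit per-request state that threads would otherwise keep on their stacks. */
enum state { IDLE, WAITING_FOR_DISK };
struct request_state { enum state st; int page; };
static struct request_state table[MAX_REQ];

/* Events standing in for "new work arrived" and "disk reply arrived". */
enum ev_type { EV_NEW_REQUEST, EV_DISK_DONE };
struct event { enum ev_type type; int slot; int page; };

static int  get_next_event(struct event *ev);       /* would block for the next event     */
static int  in_cache(int page);                     /* cache lookup (stub)                */
static void start_disk_read(int slot, int page);    /* nonblocking disk operation (stub)  */
static void send_reply(int slot, int page);         /* send the page to the client (stub) */

int main(void)
{
    struct event ev;

    while (get_next_event(&ev)) {                    /* one thread, one loop               */
        switch (ev.type) {
        case EV_NEW_REQUEST:
            if (in_cache(ev.page)) {
                send_reply(ev.slot, ev.page);        /* satisfied immediately from cache   */
            } else {
                table[ev.slot].st = WAITING_FOR_DISK;    /* save the request's state ...   */
                table[ev.slot].page = ev.page;
                start_disk_read(ev.slot, ev.page);       /* ... and return without blocking */
            }
            break;
        case EV_DISK_DONE:
            send_reply(ev.slot, table[ev.slot].page);    /* restore the state and reply    */
            table[ev.slot].st = IDLE;
            break;
        }
    }
    return 0;
}

/* Trivial stand-ins so the sketch is complete; a real server would read the
 * network here and learn of disk completion via a signal or interrupt. */
static int get_next_event(struct event *ev)
{
    static const struct event script[] = {
        { EV_NEW_REQUEST, 0, 7 },                    /* a request for page 7 arrives       */
        { EV_DISK_DONE,   0, 7 },                    /* later, the disk reply for it       */
    };
    static int n = 0;
    if (n >= 2)
        return 0;
    *ev = script[n++];
    return 1;
}
static int  in_cache(int page)                  { return 0; }
static void start_disk_read(int slot, int page) { printf("disk read started for page %d\n", page); }
static void send_reply(int slot, int page)      { printf("page %d sent for request %d\n", page, slot); }
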
It should now be clear what threads have to offer. They make it possible to
retain the idea of sequential processes that make blocking system calls (e.g., for
disk I/O) and still achieve parallelism. Blocking system calls make programming
easier and parallelism improves performance. The single-threaded server retains
the ease of blocking system calls but gives up performance. The third approach
achieves high performance through parallelism but uses nonblocking calls and
interrupts and is thus hard to program. These models are summarized in
Fig. 2-12.
Model                        Characteristics
Threads                      Parallelism, blocking system calls
Single-threaded process      No parallelism, blocking system calls
Finite-state machine         Parallelism, nonblocking system calls, interrupts

Figure 2-12. Three ways to construct a server.

A third example where threads are useful is in applications that must process
very large amounts of data. The normal approach is to read in a block of data,
process it, and then write it out again. The problem here is that if only blocking
system calls are available, the process blocks while data are coming in and data
