
Unix India CompuMaster Ltd.

INTRODUCTION TO UNIX & ACCESSING THE UNIX SYSTEM
Introduction to the UNIX Operating System

What is an operating system?

What is a UNIX system?


Starting a UNIX session - Logging In
Basic UNIX Commands
Introduction to the UNIX Operating System

UNIX is the most popular operating system for multi-user environments. It has evolved
over the past 25 years from its conceptual stage into a powerful and effective operating
system. Ken Thompson of Bell Laboratories originally designed the UNIX operating
system in 1969. It evolved from one of the early time-sharing operating systems,
MULTICS, in response to the programming requirements of the time.
UNIX was first loaded onto the PDP-7, a DEC minicomputer with modest processing
power. When the DEC PDP-11, a faster machine, was introduced, the entire project was
shifted to it.
UNIX was originally written in assembly language. However, with the development of
the ‘C’ programming language by Dennis Ritchie in the early 1970s, the core of the
UNIX system was recoded in ‘C’ in 1973, as were all further additions to the system.
This was the first instance of an entire operating system, or at least a major part of
one, being coded in a high-level language. The portability that resulted is one of the
major reasons for its popularity. However, some of the lower-level functions of the
UNIX core are still in assembly language.
What is an operating system?

An operating system (OS) is a software program that acts as an interface between a


user and a computer. The purpose of an operating system is to provide an environment
in which a user can execute programs.
An operating system is thus an important part of every computer system.
Operating systems are of two types:
● Single-user operating systems

Example: DOS, Windows-95.


● Multi-user operating systems

Example: UNIX, Xenix.


Single-user operating system
A personal computer is a single-user system: only one person can use a PC at a time.

Generally, the operating system used on single-user systems is MS-DOS.


Multi-user operating system
An operating system that can cater to the needs of a number of users at a time is
known as a multi-user operating system. This is done using a technique known as
time-sharing: the operating system divides the total time into a number of time-slices
and schedules the tasks one after another according to their priority. The switching is
done so fast that each user has the illusion that the total computer resources are
available to him.
What is a UNIX system?

A typical UNIX-based computer system includes a number of hardware and software


components.
Hardware
● A system unit that houses the system’s central processing unit and one or more disk
drives for mass storage, including a backup device (floppy disk or magnetic tape drive).
● User terminals from which users interact with the system, connected through cables.
● A console from which system operation is controlled. It may also be used as an
ordinary terminal.
● Communication lines connecting the system to other UNIX-based systems or
mainframe computers.
Software
Like all operating systems, UNIX is software; it functions on a computer system in
conceptual layers.
The UNIX system has built-in methods to keep the activities of each user separate.
Every user of the UNIX system has his or her distinct identification, which consists of a
user name and password. This distinct user identification is called the user’s account.
All systems have a user who looks after the administrative duties of the system and
has all privileges; this user is called the superuser or administrator. The user name of
the superuser is root. The superuser is responsible for creating the various users and
maintaining their accounts.
To use the system, the user has to log into the system. The procedure of entering the
system is called logging in, logging on or signing onto the system. A user’s interaction
with the system between login and logout is termed as a session.
Starting a UNIX session - Logging In

A user of a UNIX-based system works at a user terminal. After the boot procedure is
completed, the following prompt appears at each user’s terminal.
login:
Here, the user has to type in the login name (user account) and press the <Enter>
key, after which the system asks for the password. The password is used as a means of
security and is not echoed (displayed) when typed. The password must consist of at
least six characters, which may be a combination of letters, numbers and special
symbols.
Normally, only the first eight characters of the password are significant; this length is
configurable and is decided by the superuser. The password can be changed only by
the user (after a successful login) or by the superuser, and hence provides effective
security to the system.
If the login name or the password is invalid or typed with an error, the "login
incorrect" message is displayed and another login prompt appears. The login
procedure has to be repeated till successful. If the user is unable to login correctly after
a fixed number of attempts, the system locks out the account. Also, if the user is unable
to enter the password within a certain amount of time of entering the user-name, the
system disconnects by giving the message login time-out. These features help prevent
unauthorized access to the system.
So, to log onto the system the user has to prove to the system that he is an
authorized user. The login name, or user-id, identifies the user to the system and, in
turn, to the other users.
When a valid user name and password are entered at the terminal, the $ symbol is
displayed on the screen. This is the UNIX prompt. When a user logs in, he is taken
directly into his HOME directory, created when the user’s account was set up. The
prompt indicates that the system is now ready to accept commands from the user and
execute them. By default, the user has all permissions in his home directory. However,
to access other areas of the system, the user must have the appropriate permissions
on other users’ directories.
System Messages
After a successful login, the user may receive any or all of the following system
messages.
● Last successful login and last unsuccessful attempt to login

● Copyright notices

● Message of the Day

● Mail Notification

After a successful login, the system displays messages regarding the last time the user
logged in successfully and the last time the user tried to log in unsuccessfully. Finally,
it displays the Shell prompt, indicating that the system is ready to accept commands.
The $ is therefore the command prompt.
Cautions about logon
The UNIX system, unlike the DOS system, distinguishes between upper-case and
lower-case characters. Since the user name and password normally consist of
lower-case letters only, it is necessary to type them in lower case, or else the system
may not recognize the user.

Changing your password


The passwd command changes the password. Whether or not a password is required
for account access is a matter of system administration policy. The password helps
prevent unauthorized users from accessing a user’s account, and it is good practice to
change the password from time to time. The new password must differ from the old
one in at least three positions.
When a user tries to change the password the passwd program prompts the user for
the old password. This prevents unauthorized users from changing another user’s
account password. If the password is being set for the first time, the user is not
prompted for the old password.
The program then prompts the user to re-enter the new password, to make sure that
the user has actually remembered it.
If the user cannot supply the old password, or if the re-entered password does not
match, the program terminates with a ‘sorry’ message.
A password must contain at least two alphabetic characters and at least one numeric or
special character. The password must be different from the user’s login ID.
There are a number of situations under which the user cannot change the password. A
few conditions are listed below.
● The new password does not follow the rules for password.

● The minimum time between password changes has not elapsed.

● The superuser has not conferred upon the user the privilege to change his/her
password.
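As an aside, the password rules above can be sketched as a small shell function. This is illustrative only: the function name check_password is our own, and the real passwd program enforces these rules differently from system to system.

```shell
# A sketch of the password rules described above (illustrative only,
# not the actual passwd implementation).
check_password() {
    pw=$1; login=$2
    # Rule: at least six characters
    [ ${#pw} -ge 6 ] || return 1
    # Rule: at least two alphabetic characters
    alpha=$(printf '%s' "$pw" | tr -cd 'A-Za-z' | wc -c)
    [ "$alpha" -ge 2 ] || return 1
    # Rule: at least one numeric or special character
    other=$(printf '%s' "$pw" | tr -d 'A-Za-z' | wc -c)
    [ "$other" -ge 1 ] || return 1
    # Rule: must differ from the login ID
    [ "$pw" != "$login" ] || return 1
    return 0
}

check_password 'ab3def' raghu && echo accepted   # meets all the rules
check_password 'abc' raghu || echo rejected      # too short
```

The "differs in at least three positions from the old password" rule is omitted here for brevity.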
Ending a UNIX session - Logging Out
Logging out of the system ends a session. When a user logs out, the resources that
were being used can be allocated to other users on the system. Logging out ensures
that running commands are stopped and that the files the user was using are closed
properly.
By logging out of the system, the user also prevents the account from being misused
by some other user.
To log out of the system, the user types exit at the Shell prompt or presses ^D.
Basic UNIX Commands

UNIX commands can be entered after the $ prompt appears on the VDU. All UNIX
commands should be entered in lower case.
Command: date
Function: The date command displays the current date and time.
● Example:

$ date

Thu Apr 1 09:34:50 EST 1999


$
Command: who
Function: This command gives details of all users currently logged in to the UNIX
system.
● Example:

$ who
vijay ttyp2 Apr 1 11:56
rag ttyp1 Apr 1 11:53
$ who am i
rag ttyp1 Apr 1 11:53
Command: man
Function: This command provides help about the specified command.
● Example:

$ man who
#This command gives complete details about the
#usage of the who command.
Directory Commands
Command: pwd (print working directory)
Function: Displays the full path name for the current working directory.
● Example:

$ pwd <RET>
/usr/bin
$
#Here, /usr/bin is the directory in which the
#user is currently working.
Command: cd (change directory)
Function: Changes the working directory.
● Example:

#Assume that user Raghu has logged in and then
#gives the following command:
$ cd /user <RET>
$ pwd <RET>

/user
$
#Here, the full path name of the directory which
#the user wants to make current has been specified.
Relative path names can also be used to change to a parent directory. For example,
Raghu can enter the following command after logging in, to change to the parent
directory of his HOME directory:
● $ pwd <RET>

/user/raghu
$ cd .. <RET>
$ pwd <RET>
/user
$
#Here, .. refers to the parent directory of
#the current working directory.
Note: The cd command without any path name always takes the user back to the
user’s HOME directory.
Command: head
Function: Displays the initial part of a text file.
● Example:

#Assume that the user needs to display the first
#four lines of a file called wordlist.
$ head -4 wordlist
Command: tail
Function: It is a complement of the head command; it displays the last part of a text
file.
● Example:

#Assume that we need to display text starting at
#the 10th line of a file called wordlist.
$ tail +10 wordlist
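The "tail +10" form shown above is the historical syntax; modern (POSIX) systems spell it "tail -n +10". head and tail can also be combined to slice out a middle range of a file, as this small sketch shows (the file wordlist is generated here so the example is reproducible):

```shell
# Build a sample 20-line file, one number per line.
seq 1 20 > wordlist

# Modern spelling of "tail +10": start output at line 10.
tail -n +10 wordlist

# head and tail combined pick out a middle slice: lines 10 to 12.
head -n 12 wordlist | tail -n 3
```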
Command: mkdir (make directory)
Function: Creates sub-directories.

● Example:
#Assume that Raghu has just logged in and
#gives the following command
$ mkdir prog_files <RET>
$
The sub-directory prog_files is created under the current directory. However, the
current working directory does not change to the new directory.
Full path-names can be specified when creating a directory.
Command: rmdir (remove directory)
Function: Removes empty directories.
● Example:

$ rmdir cob_prog <RET>


$
#Here, cob_prog, the directory to be removed, is
#a directory under the current working directory.
#The directory cob_prog should:
#- be empty (not contain files or sub-directories)
#- not be the current working directory.
The full path name may also be specified with rmdir.
Command: ls (lists the contents of a directory)
Function: Lists names of files and sub-directories of a directory. There are many
options available with this command.
● ls –l: Lists files in the long format. The files are displayed along with their mode,
number of links, owner of the file, file size, modification date and time and
filename. (The command l is equivalent to ls -l).
● ls –t: Lists in order of last modification time.

● ls –a: Lists all entries including the hidden files.

● ls –d: Lists a directory file instead of its contents.

● ls –u: Lists files in order of last access time.

● ls –r: Lists the files and directories in reverse order.

● Example:

#Assume that in the directory /user/raghu,
#raghu has two files, data1 and data2, and a
#sub-directory called prog_files.
$ ls /user/raghu <RET>
data1
data2
prog_files
$
#Here, the names are printed in sorted order.
The directory name is optional if the names of files and directories under the current
working directory are to be listed out.
● Example:

$ ls -l
total 32
-rw-r--r-- 1 raghu group 63 Feb 22 17:44 100no
-rw-r--r-- 1 raghu group 100 Dec 8 17:35 add
-rw-r--r-- 1 raghu group 14 Jan 12 12:57 ca
drwxr-xr-x 2 raghu group 512 Apr 1 12:36 caa
-rw-r--r-- 1 raghu group 99 Jan 12 12:54 chco
-rw-r--r-- 1 raghu group 115 Dec 8 18:00 comp
-rw-r--r-- 1 raghu group 74 Feb 22 17:53 eod
-rw-r--r-- 1 raghu group 124 Feb 22 17:53 evodd
-rw-r--r-- 1 raghu group 40 Dec 8 17:37 ex1
-rw-r--r-- 1 raghu group 108 Dec 8 17:39 ex2
-rw-r--r-- 1 raghu group 69 Dec 9 14:29 ex5
-rw-r--r-- 1 raghu group 139 Dec 9 14:17 fibno
-rw-r--r-- 1 raghu group 61 Feb 22 18:28 forno
-rw-r--r-- 1 raghu group 149 Dec 8 17:51 greater
-rw-r--r-- 1 raghu group 205 Dec 8 17:57 greater1
-rw-r--r-- 1 raghu group 109 Dec 8 17:41 mul
$

MORE UNIX COMMANDS


File Commands
More on Relative Path-Name in File Commands

Pipes and filters


File Commands
Command: cat (concatenate files)
Function: Displays the contents of one or more files.
● Example:

$ cat data1 <RET>


A sample file
$
#Here, the command assumes that data1 is
#in the current working directory.
Full path-names can also be specified to display a file in another directory.
The cat command can also display more than one file as shown in the following
example:
● Example:

$ cat data1 data2 <RET>


A sample file
Another sample file #(contents of data2)
$
Command: cp (copy one file to another)
Function: Duplicates files.
● Example:

$ pwd <RET>
/user/raghu
$ ls <RET>
data1
data2

prog_files
$ cp data1 data3 <RET>
$ ls <RET>
data1
data2
data3
prog_files
$
#Here, the file data1 is copied to a new file
#called data3. If data3 already exists, its
#contents are destroyed and replaced by the
#contents of data1.
Files can also be copied to and from another sub-directory.
Command: rm (remove files)
Function: Removes files.
● Example:

$ rm data1 <RET>
$ ls <RET>
data2
data3
prog_files
$
#Here, the files to be removed are assumed to be
#in the current working directory.
Command: mv
Function: Renames a file or directory, or moves files from one directory to another.
● Example:

$ mv data3 newfile <RET>


$ ls <RET>
data2
newfile
prog_files
$

● Example:

#Renaming a directory
$ mv /user/raghu/prog_files /user/raghu/programs <RET>
$
In UNIX, a file can also be moved (not copied) to another directory, as shown below:
● Example:

$ mv data3 /user/raghu/programs/data3 <RET>
Command: ln
Function: Establishes an additional file name for the same ordinary file.
● Example:

$ ln add addition
$
The advantage of the ln command is that several users can have access to common
data files. Any modification made through either name is reflected in both, since both
names refer to the same file.
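The shared-data behaviour described above can be demonstrated in a short transcript. The file names add and addition are the ones used in the example; the file contents here are our own:

```shell
# Two names for one file: after ln, writing through either name
# changes the single shared copy of the data.
printf 'first line\n' > add
ln add addition                  # addition is a second name for add

printf 'second line\n' >> addition
cat add                          # both lines appear: the data is shared
```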
More on Relative Path-Name in File Commands

In any command, a directory can be referred to in one of two ways:

By its full path-name. For example:
● /user

#Refers to the directory user under the root
#directory.

If the root directory is the current working directory, then its sub-directory user can
be referred to simply by its name.
The usage of the above is illustrated below:
● $ pwd <RET>

/
$ cd /user <RET>
$ pwd <RET>
/user
$ cd .. <RET>
$ pwd <RET>
/
$ cd user <RET>
$ pwd <RET>
/user
$
A single dot (.) is used to refer to the current working directory. Consider the following
example:
● $ pwd <RET>

/user
$ cp /user/raghu/abc . <RET>
$
#Here, the dot (.) refers to the current working
#directory. So, this command copies the file abc
#from the directory /user/raghu to the current
#working directory, /user, under the same name.
#In place of the dot, /user could have been used,
#but using the dot saves on typing.
The rm –r command
With the rmdir command, UNIX allows only an empty directory to be removed.
UNIX provides an option with the rm command to remove a directory without having to
go through the rigmarole of removing each file and subdirectory.
The command to do so is shown below.
● Example:

$ rm –r olddir <RET>
$
#Here, olddir is assumed to be under the current
#working directory. The directory to be removed
#cannot be the current working directory.
#The -r option must immediately follow rm
#(separated by one or more spaces).
The full path name or relative path name of a directory may be specified with the
rm -r command.
Another option provided with the rm command confirms the removal of any file before
actually removing it.
● Example:

$ rm -i file1 <RET>
file1? y
#Here, file1 is a file under the current working
#directory. The -i option, like -r, must
#immediately follow rm. For interactive
#confirmation of the removal of a directory, the
#-r and -i options have to be used in combination,
#as follows:
$ rm -ri olddir <RET>
olddir?
#Here, the user is asked to confirm the removal
#of the directory. Once this confirmation is
#given, the files under the directory are
#displayed one by one for individual confirmation
#of removal.
$ rm -ri olddir <RET>
olddir? y
accounts? y
letters? y
$
Pipes and filters

In UNIX, commands were created to perform single tasks only; a single command
cannot perform multiple tasks. Redirection provides one answer to this problem, but it
creates a lot of temporary files, which are redundant and occupy disk space. Pipes and
filters overcome this obstacle.
Pipes
A pipe is a mechanism that takes the output of one command as the input of the next
command.
● Example:

$ who | wc -l
3
$
Here, the output of the who command is taken as the input of the wc -l command, and
the result is displayed.
The important advantage of pipes is that they save users from having to create
temporary files using I/O redirection.
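The contrast with temporary files can be sketched as follows. To keep the example reproducible on any system, printf stands in here for who; the file name userlist is our own:

```shell
# Without a pipe: the intermediate result needs a temporary file,
# which must then be removed.
printf 'vijay\nrag\n' > userlist     # stand-in for who's output
wc -l < userlist
rm userlist

# With a pipe: the output flows straight into wc -l, no file needed.
printf 'vijay\nrag\n' | wc -l
```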
Filters
A filter takes input from the standard input, processes it and then sends the output to
the standard output. A filter can also take its input from a file.
Filters are used to extract the lines that contain a specific pattern, to arrange the
contents of a file in a sorted order, to replace the existing characters with some other
characters etc.
Filters are also used to store the intermediate results of a long pipe. We can extract
specific columns of a file and can merge two or more files together using filters.
The following are some of the filters:

sort filter
The sort filter arranges the input taken from the standard input in alphabetical order.
This filter comes with various options like -r, -f, -n, -b, -t etc.
● Example:

$ sort
anitha
zarina
yamuna
<press Ctrl +d to see the sorted output>
anitha
yamuna
zarina
$
The -r option arranges input taken from the keyboard in reverse alphabetical order.
● Example:

$ sort -r
jaya
mala
rita
<press Ctrl +d to see the sorted output>
rita
mala
jaya
$
The -f option ignores case distinction and arranges the output in alphabetical order.
● Example:

$ sort -f
Vanita
vimala
girija
Gomati

<press Ctrl +d to see the sorted output>


girija
Gomati
Vanita
vimala
$
#Note: The ASCII values for A to Z are less
#than those of a to z.
The +pos1 -pos2 option pair is used to sort on any one field.
● Example:

#Let us assume that we have a file called names
#containing the following:
$ cat names
george mathew thomas
gideon kumar anand
slyvia mary peter
$
#Let us assume that we want to sort on
#the middle name. Then:
$ sort +1 -2 names
gideon kumar anand
slyvia mary peter
george mathew thomas
$
#Let us assume that we want to sort on the last
#name. Then:
$ sort +2 -3 names
gideon kumar anand
slyvia mary peter
george mathew thomas
$
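The +pos1 -pos2 notation above is the historical sort syntax; POSIX sort writes "sort +1 -2" as "sort -k2,2" (with -k, fields are numbered from 1 rather than from 0). A sketch with the same names file, generated here so it runs anywhere:

```shell
# Recreate the sample names file from the example above.
printf 'george mathew thomas\ngideon kumar anand\nslyvia mary peter\n' > names

# Modern equivalents of the historical field-sort syntax:
sort -k2,2 names    # sort on the middle name (was: sort +1 -2)
sort -k3,3 names    # sort on the last name   (was: sort +2 -3)
```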
grep filter
This command is used to search for a particular pattern from a file or from the standard
input and display those lines on the standard output. grep stands for "global search for
regular expression". The various options available with this command are:

● -v: Displays only those lines which do not match the specified pattern.
● -c: Displays only the count of the lines which match the specified pattern.
● -n: Displays the lines which match the specified pattern, with the line number at
the beginning of each line.
● -i: Displays the lines which match the specified pattern, ignoring case distinction.
● Example:
#Let us assume that we have a file called data
#whose contents are as follows:
$ cat data
India 6890 Asia
China 8765 Asia
France 3243 Europe
Nigeria 3212 Africa
Argentina 1234 South America
Mexico 4563 North America
$
#If we want to extract those lines that contain
#"Asia", then:
$ grep "Asia" data
India 6890 Asia
China 8765 Asia
#If we want the above example to ignore case
#distinction, then:
$ grep -i "asia" data
India 6890 Asia
China 8765 Asia
$
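The remaining options from the list above can be tried on the same data file (shortened here to three lines so the transcript stays small):

```shell
# A shortened copy of the data file from the example above.
printf 'India 6890 Asia\nChina 8765 Asia\nFrance 3243 Europe\n' > data

grep -c "Asia" data    # count of matching lines
grep -n "Asia" data    # matches, each prefixed with its line number
grep -v "Asia" data    # lines that do NOT match the pattern
```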
egrep command
This is an extension of the grep command. egrep means "extended global search for
regular expression". Multiple patterns can be searched for in a file using a single
command line. These multiple patterns should be separated by ‘|’ symbol.
● Example:

#Let us consider the same data file as above.
#Suppose we want to search for the lines which
#contain "India", "Nigeria" or "Argentina". Then:
$ egrep "India|Nigeria|Argentina" data

India 6890 Asia


Nigeria 3212 Africa
Argentina 1234 South America
$
fgrep command
The fgrep command stands for "fixed grep". It is used to search for fixed strings,
without interpreting regular expressions. When used with the -x option, fgrep extracts
only those lines that match the string exactly.
● Example:

#Let us suppose that we have a file called old
#whose contents are as follows:
$ cat old
this is a test
to extract the exact string.
this is only possible
by this command.
$
#Suppose we specify the string to be searched for
#as "this is", then:
$ fgrep "this is" old
this is a test
this is only possible
$
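The -x behaviour described above can be sketched with grep -F, the modern spelling of fgrep (the file contents are shortened from the example):

```shell
# A shortened copy of the old file from the example above.
printf 'this is a test\nthis is\nby this command.\n' > old

grep -F  "this is" old    # substring match: two lines contain it
grep -Fx "this is" old    # exact match: only the line "this is"
```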
pg filter
This displays the output of a command on the screen page by page.
● Example:

$ ls -l | pg
more filter
It functions in the same way as the pg filter. The difference between them is that more
advances a screenful when the space bar is pressed, while pg advances when the
<Enter> key is pressed.
● Example:

$ who | more
cut command

One particular field from any file, or from the output of any command, can be
extracted and displayed using the cut command. A particular character position can
also be extracted using the -c option of this command.
● Example:

#Let us consider the following file called mast:


$ cat mast
b001 : kane and abel : j archer
b002 : kane and abel : j archer
b003 : the naked face : s sheldon
$ cut -d ":" -f2 mast
kane and abel
kane and abel
the naked face
$ cat mast | cut -d ":" -f3
j archer
j archer
s sheldon
paste command
This command merges the contents of two files into a single file. It reads a line from
each file in the file list specified and combines them into a single line.
● Example:

#Let us suppose that we have two files, first
#and second.


$ cat first
george
victoria
slyvia
$ cat second
mathew
thomas
peter
$ paste first second

george mathew
victoria thomas
slyvia peter
$
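cut and paste can be seen as complements: cut splits a file into columns, and paste puts columns back side by side. A small round trip (note that paste joins fields with a tab by default, whereas the sample file uses spaces; the file names pairs, first and second are our own):

```shell
# A two-column sample file.
printf 'george mathew\nvictoria thomas\n' > pairs

cut -d " " -f1 pairs > first    # first names only
cut -d " " -f2 pairs > second   # surnames only
paste first second              # the two columns rejoined (tab-separated)
```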
tr command
This command is used to translate characters. When used with the -s option, it
squeezes multiple spaces into a single space.
● Example:

#Let us assume that we have a file called bag.


$ cat bag
The bag contains an assortment of things.
$ cat bag | tr "[a-z]" "[A-Z]"
THE BAG CONTAINS AN ASSORTMENT OF THINGS.
$
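The -s (squeeze) option mentioned above, which the example does not show, collapses runs of a repeated character into one:

```shell
# tr -s squeezes runs of the listed character into a single
# occurrence; here, repeated spaces become one space.
printf 'too    many     spaces\n' | tr -s ' '
```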

THE STRUCTURE OF UNIX
The Kernel
Functions of an operating system

Features of UNIX

Portability
Modularity

File structure and Security


I/O Redirection and Piping

File Management

The components are:

● The Kernel
● The Shell
● Commands and Utilities
● Applications

The Kernel

The Kernel is the heart of the UNIX operating system. All the other components call on
the services of the Kernel. When the system is booted, the Kernel is read into memory,
and it stays in memory while the system is running. The Kernel consists of about
10,000 lines of ‘C’ code and 1,000 lines of assembly code; of the assembly code,
about 200 lines exist for hardware efficiency and about 800 lines implement operations
that were not possible in ‘C’.
The function and services of the Kernel are:
● File Management and Security Management

● Input/Output Services

● Process Scheduling and Management

● Memory Management

● System Accounting

● Interrupt and Error Handling

● Date and Time Services

Functions of an operating system

Memory Management
Every program that is executed by the user initiates a process. Every process that is
initiated needs memory for execution. The operating system allocates and deallocates
memory for the various processes running on the system as per their requirements and
priorities.
Disk Management
Data and programs that the user generates while working on the system are stored in
magnetic form on the disks. Therefore, it becomes necessary to maintain a record of
the disk space occupancy by the various files on the disk. The OS maintains a record of
the occupancy in a separate location on the disk, which is called the File System. Each
OS has its own way of performing these tasks.
Peripheral Management
The various devices such as monitors, printers and communication ports connected to
the system are called peripherals. Just like programs, the peripherals also need some
memory on the system for their functioning. The peripherals have to be initialized
before they are actually used by the system. This is done by generating appropriate
interrupts. The operating system takes care of the proper functioning of the
peripherals attached to the system as per the user's requests.
The Shell
The Shell is the component that interacts directly with the user. It is a command
interpreter. It takes commands from the user and carries them out, one at a time.
There are several types of Shells available to UNIX users. The Bourne Shell and the C
Shell are available on most prominent UNIX systems.
The Shell begins to operate as soon as the user logs onto the system. Although the
Kernel allocates resources, the Shell accepts keystrokes and essentially arranges for the
Kernel to run a command.
The Shell thus forms a layer above the Kernel and hides it from the user. It interacts
with the Kernel by passing commands and other information (arguments or
parameters) to it. The Kernel, in turn, loads programs and commands and ensures
that the commands have access to the parameters and other information. This layer of
abstraction over the Kernel protects the system resources from incorrect accesses.
Commands and Utilities
Commands are programs that perform specific operating-system tasks. The
commands are invoked by name through the Shell. UNIX commands and utilities cover
a wide range of operations, from editing, copying, and erasing files to sending and
receiving electronic mail messages to managing the progress of a large software
development project.
Applications
An application program is software that performs a specific type of task. Word
processors, spreadsheets, database managers, and CAD/CAM programs are all
examples of applications. As far as the Kernel is concerned, applications are just
programs, as are Shells and commands. To start an application, you enter its name just
as you would a command’s name. An application is executed in a Shell.
Features of UNIX

UNIX has a number of features and capabilities. Its major features are:
● Multi-user, Time-sharing OS

● Multi-tasking OS

● Portability

● Modularity

● System security

● File Structure and Security

● I/O Redirection and Piping

● Device independence

● Communication
Multi-User Operating System
UNIX is a multi-user operating system, which allows several users to work on the
computer system at the same time. The users can work on the system through
multiple terminals connected to a host computer. To provide a multi-user environment,
UNIX uses the time-sharing method. The users’ programs are queued up and each is
executed for a fixed amount of time, after which its status is saved and the next
request from the job queue is executed. The first process gets its turn again after all
the other programs have executed once. This rule may be relaxed because of
differences in priority levels.
Multi-Tasking Operating System
UNIX, as a multi-tasking operating system allows a user to perform a number of tasks
at the same time. This can be done by placing some tasks in the background while the
user continues to work in the foreground. The user cannot control the execution of the
tasks in the background.
Portability

Portability refers to the ability of software that operates on one machine to operate as
efficiently on another machine, without major changes to the software.
Portability has three aspects:
Porting the Operating System
UNIX OS is portable i.e., it is hardware independent and can be easily installed on
machines with varying architecture. UNIX gets its portability not because of some
special design but because of the language used to develop the system. Around 90% of
UNIX is written in machine-independent ‘C’ language and only the remaining small part
of it is written in machine-dependent assembly language.
To port the UNIX OS to another machine, only the machine-specific part of the Kernel
has to be rewritten in the assembly language of the new machine, while the rest of the
system is obtained by compiling the higher level ‘C’ code.
Porting the Data
UNIX utilities are available for transferring data from one system to another.
Porting an Application
Application programs written in any high level language, such as COBOL, FORTRAN, C
or BASIC can be easily transported to a new system by recompiling the source codes.
Modularity

One of the unique features of the UNIX system is its modular design. The UNIX system
comes with a large number of utility programs, each designed to perform a specific
task independently. These programs are called modules. Any module can be mounted
on or unmounted from the system as and when the need arises.

Modular Structure of UNIX System


System Security
Since UNIX is a multi-user operating system, it becomes necessary to protect each
user’s information from illegal access. The system maintains a list of users who
are allowed to access the system. It also keeps track of what files and resources each
user is authorized to use.
New users have to be added to the list before they can have access to the system.
Users are assigned a user name and password. The concept of superuser helps in
maintaining and administering the system i.e., the administration of the entire system.
The system is thus not open to all the users at all levels. It is the system administrator
who creates user accounts and assigns them privileges. Without the necessary
privileges, the user cannot access the system’s resources.
File structure and Security

UNIX has a hierarchical file structure. Its file structure is like an upside down tree with
the roots at the top. The UNIX file structure allows the files to grow dynamically. It
allows the files to be organized into various directories for easy access. This structure
also allows implementing the File security system.
File Security system
UNIX provides security at the file and directory level. Read, write and execute
permissions are given to each file at three levels.


USER: The user is the owner of the file. The owner of the file is the one who has
created the file or the one to whom the ownership has been transferred by the creator
of the file.
GROUP: The members of a group have the same group-id but different user-ids. The
members of a group may, for example, be members of a programming team who share
the same data for testing their programs.
OTHERS: Others are users who are neither the owner nor group members of a file.
Read, write and execute permissions can be assigned to the owner, the group and
others.
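The three classes above appear directly in the first column of an ls -l listing. A minimal sketch (the file name and the permission mode chosen here are illustrative, not from the text):

```shell
# Create a scratch file and give each class a different permission set:
# user rwx (7), group r-x (5), others r-- (4).
file=/tmp/perm_demo.$$
touch "$file"
chmod 754 "$file"

# Characters 1-10 of ls -l: file type, then the user, group and other triplets.
perms=$(ls -l "$file" | cut -c1-10)
echo "$perms"        # -rwxr-xr--
rm -f "$file"
```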
I/O Redirection and Piping

UNIX commands are designed in such a way that they take input from a conceptual file
called the Standard Input and send their output to another conceptual file called the
Standard Output. Since UNIX is device independent, it provides a redirection facility
whereby input and output can be done to or from files and other devices like printers.
Furthermore, UNIX has a facility called piping, by which the output of one command
acts as the input of another command. Pipes allow data to be processed by a sequence
of commands and preclude the use of temporary files, and hence speed up the operation.
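A classic pipeline is who | wc -l, which counts logged-in users without any temporary file. In the sketch below, printf stands in for who so that the result is reproducible:

```shell
# who | wc -l would count logged-in users; printf supplies fixed sample
# lines here so the pipeline's result is deterministic.
count=$(printf 'alice tty01\nbob tty02\n' | wc -l | tr -d ' ')
echo "$count"        # 2
```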
Device Independence
The UNIX system treats all devices connected to it as files; this allows input/output to
be performed independently of the hardware device actually accessed. The special files
through which the devices are accessed are known as device files, and are serviced by
device drivers. By writing to and reading from these files, all input/output activity can
be performed. Once a device is associated with its special file, the device can be
treated as a file.
Communication
UNIX has three levels of communication:
● Communication between terminals connected to the same computer

● Communication between two computers at the same site which may not use either
the same hardware or software, through local area networks.
● Communication between two computers at remote locations through public and
international data communication networks.
File Management

UNIX directory structure

root
etc dev bin lib tmp usr


where
etc   directory contains the system administration files and commands
bin   directory contains all other commands
dev   directory contains all the device file entries
lib   directory contains all the library calls and functions
tmp   directory is used by the Kernel for dumping any data which is not processed
usr   directory has all the files related to users

vi EDITOR
Entering vi

Basic vi commands

Using vi

Deleting and changing text


Control commands

Editing is one of the major things done while working on a computer. A screen editor is
used for this purpose.
A screen editor allows you to see the portions of your file on the terminal’s screen and
to modify characters and lines by simply typing at the current cursor position. The
cursor is the little blinking line or box that shows you where the next character will be
printed, either by you or by the system. In a screen editor, you move the cursor around
your file until you find the part that you want to modify, then you add or change the
text to your liking.
The screen editor supplied with the UNIX system is vi, which is the most popular. vi
was developed at the University of California at Berkeley and is also supplied with the
Berkeley distribution of the UNIX system.
Entering vi
Now, you can run vi just like any other UNIX command. When you start vi, it will print
out the file name, number of lines, and number of characters at the bottom of your
screen.
Moving around
The basic screen motion commands are h, j, k and l. The motions of h, j, k, l are left,
down, up and right respectively. You can precede these keys with numbers, which allow
you to move more than one column or line at a time.
If you try to move quickly to the beginning or end of the file, vi will beep. In general,
when vi doesn’t like one of your commands, it will beep. Instead of the l key, the space
bar can also be used. The space bar has exactly the same effect on the cursor as the l
key.
Adding Text
For adding text, position the cursor over a character and type in ‘a’. This puts you in a
special mode of operation called append mode. Now, everything you type is appended
to the text after the character the cursor was positioned over.
When you press the Esc key, the cursor moves back to the last character you entered.
This way, you know that you are no longer in append mode.
The second way of adding text is with the i command. i works like the a command, but
it inserts instead of appending, meaning the characters you type in are placed before
the current character position. i also puts you in insert mode, requiring an Esc to finish
inserting. After the Esc is pressed, the cursor moves back to the last character inserted,
just as with the a command.
Deleting text
Now that you can add text to a file, the next thing to learn is how to delete text.
Basically, there are two commands that delete text in vi: x and dd.
To delete one character, you use the x command. x deletes the character at the current
position, moving the rest of the line left into the void created by the deleted character.
To delete a line, we use the dd command.
Saving the file
There are several ways to write a file in vi, but the easiest is with the ZZ command.
When you type in two capital Z’s, vi will automatically write the file and quit, putting
you back in the Shell. Once you’re back in the Shell, all the special screen-editing
features go away.
Summary of the basic vi commands

Command Operation

h       Move cursor left

j       Move cursor down

k       Move cursor up

l       Move cursor right

a       Append after cursor

i       Insert before cursor

x       Delete character at current position

dd      Delete current line

p       Paste

ZZ      Write file and quit

Using vi

Scrolling
Screen editors use scrolling to edit large files. Only the first 23 lines are displayed by vi
when you start it on a large file. As you work with the file and move about in it, the top
lines will sometimes scroll up off the screen as more lines are displayed at the bottom.
Sometimes the lines at the bottom will scroll down off the screen as more lines are
listed at the top. When you try to move past the bottom of the screen, vi scrolls the
screen up a line. After a scroll, the text on the message line usually gets clobbered.
vi also supplies you with commands for scrolling several lines at a time. ^d and ^u
scroll half a screen down or up, if there are enough lines in the file to do so. You also
have the commands ^f and ^b, which scroll forward and back one full screen,
respectively.
When you type in a /, vi puts a / on the message line at the bottom of the screen. As
you type in the characters you want to search for, vi puts these characters on the
message line, so you can see the string you’re searching for. The / command searches
forward or down through a file, finding the next occurrence of a string. If you want to
search for the next occurrence of the same string, you can use the / command without
typing the string again.
If you used the ? command instead of /, then it would search upwards for the previous
string.
Words
vi recognizes objects called words, which are simply letters and numbers separated by
blanks, tabs, or punctuation marks. vi allows you to move from word to word,
deleting them, and changing them with simple commands. The w command moves the
cursor to the next word, the b command moves the cursor backward a word, and the e
command moves the cursor to the end of a word.
Deleting and changing text

vi provides you with several ways to delete and change text. One method of deleting
the text is with the d command. The d command is always followed by another
character that specifies what will be deleted. The dd is a special case of the d command
that deletes the current line. The c command changes whatever lies within the specified motion.
It puts you in input mode so you can type in your changes.
Control commands

There are certain commands in vi that are preceded by a colon (:). These are control
commands that deal with external file manipulation and some special functions of vi.
The file manipulation commands include :q, which quits vi; :w, which writes the file
without quitting vi; and :q!, which quits vi without writing, discarding all changes.
The control commands that perform special functions include :number, which moves
the cursor to the specified line number, scrolling if necessary, and :!command, which
causes the specified command to be executed from within vi.
● The w command advances the cursor to the beginning of the next word.

● The b command moves the cursor back to the beginning of the current word.

● If the cursor is already at the beginning of the word, it will be moved back to the
beginning of the previous word.
● The e command will advance the cursor to the end of the current word. If the
cursor is already at the end of a word, a subsequent e command advances the
cursor to the end of the next word.
● The return key is used to end input for editing commands that take a variable
length argument.
Programming WITH THE SHELL


System variables
Environmental variables

The MAIL variable

The variables PS1 and PS2:


The SHELL variable
The TERM variable

The Shell plays an important role not only in command interpretation but in much more
besides. The Shell also has rudimentary programming features, which, coupled
with the use of UNIX commands and other programs, make it an extremely useful
programming language. The scope of Shell programming often goes beyond the limits
of conventional languages. In addition to the external commands, you can use the
entire set of the Shell’s internal commands inside a Shell program. These internal
commands can be strung together as a language, with its own variables, conditions and
loops. What makes Shell programs powerful is that an external command can be used
as a control command for any of the Shell’s constructs. The language has features
borrowed from C, though most of its constructs are compact and simpler to use than
those in C. However, Shell programs run slower than those written in high-level
languages.
System variables

While you are free to set up your own variables, there are a number of variables that
are separately set by the system, some during the booting sequence, and some after
logging in. These variables are called system variables. To see a complete list of
variables, you can use the set statement.
● $ set

HOME=/usr/icm
IFS=
MAIL=/usr/spool/mail/icm
PATH=/bin:/usr/bin
PS1=$
PS2=>
PWD=/usr/kk/icm
SHELL=/bin/sh
TERM=ansi
$
By convention, such built-in variable names are defined in uppercase. If you use
variables of your own, then for the purpose of distinguishing them from the ones built
in, it is preferable to use lowercase names only.
Environmental variables

Besides user-defined variables, the Shell has special variables called environmental
variables.
The PATH variable:
The PATH variable contains a list of all full path names of directories that are to be
searched for any executable program.
● $ echo $PATH

/bin:/usr/bin:
$
This shows the list of directories that the Shell scans while hunting for a
command. In this case, the search begins in /bin, moves on to /usr/bin, and finally
to the current directory, indicated by the trailing colon. This sequence makes the commonly used
UNIX commands universally available to every user. By default, every user has the
directories /bin and /usr/bin included in his PATH variable.
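A sketch of how PATH is typically extended (the directory appended here is an arbitrary example, not from the text):

```shell
# Append one more directory to the search path.
PATH=$PATH:/usr/local/bin

# tr makes the search order easy to read: one directory per line.
echo "$PATH" | tr ':' '\n'
```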
The HOME variable:
When you log in, UNIX normally places you in a directory named after your login name.
This directory is called the home or login directory, and is controlled directly by the
variable HOME. You can switch from any directory to your home directory by using the
evaluated variable $HOME as an argument to the cd command.
● $ pwd

/usr/bin
$ cd $HOME
$ pwd
/usr/icm
$
The IFS variable:
The IFS variable contains a string of characters that are used as word separators on
the command line. The string normally consists of the space, tab and newline
characters. All these characters are invisible; the blank line in the set output is the
only hint that the newline character is part of the string. You can confirm the contents
of this variable by taking its octal dump.
● $ echo "$IFS" | od –bc
0000000 040 011 012 012
             \t  \n  \n
0000004
$
The space character is represented by the ASCII octal value 040, while \t and \n
universally represent the tab and newline characters, respectively. You can reset the
value of this variable to a #, for instance, and use this character as the word
delimiter.
● $ IFS=#

$ cat#emp.lst
$
The MAIL variable:

MAIL determines where any mail addressed to the user is to be stored. All incoming
mail is stored in the mailbox file defined by this variable. The Shell checks this file
every time a user logs in. If any new mail is found there, the Shell informs the user
with the familiar message "You have mail".
The variables PS1 and PS2:

The Bourne shell has two prompts, stored in PS1 and PS2. The PS1 variable contains
the system prompt ($). You will see, when using the Shell loop structures, how a
multi-line command makes the Shell respond with a >. This is the secondary prompt
string, stored in PS2. Normally, PS1 and PS2 are set to the characters $ and >,
respectively.
You can change the primary prompt string to C> if you find the MS-DOS environment
more reassuring.
● $ PS1="C>"

C>
To reset the variable to the familiar $, simply make the reassignment at the C>
prompt.
● C> PS1="$"

$
Although $ is the most commonly used primary prompt string, the system administrator
uses # as the prompt while working as the superuser (root).
The SHELL variable:

SHELL determines the type of Shell that a user sees on logging in. Today, there are
a host of Shells that accompany any UNIX system, and you can select the one you like
the most. Besides the Bourne Shell, which is the most popular Shell in the UNIX world,
the C and Korn Shells have also carved out a niche for themselves, because of some
inherent advantages. The C shell is known by the program csh, and the Korn shell by
ksh.
The TERM variable:

TERM indicates the terminal type being used. Some utilities are terminal-dependent,
and they need to know the type of terminal being used. One of them is the vi editor,
which makes use of a control file in the sub-directory /usr/lib/terminfo. If TERM is
not set correctly, vi won’t work, and the display will be faulty.
The Script Executed During Login Time – .profile
The .profile in UNIX is similar to the Autoexec.bat file in DOS. The ls -a command
locates .profile in your login directory. The file is created when the system
administrator adds a user to the system. It is really a Shell script, which is executed
by the Shell when a user logs in.
● $ cat .profile

HOME=/usr/icm/progs/
PATH=/bin:/usr/bin:.:/usr/icm/progs
MAIL=/usr/mail/icm   # mailbox location
IFS=
PS1=$
PS2=>
echo "Today's date is `date`"
cd
echo "You are now in the HOME directory"
Some of the system variables have been assigned in this script. The HOME variable
here, is set to the progs sub-directory, so that when you cd without arguments, you
will switch to this directory.
The .profile alters the operating environment of a user, which remains in effect
throughout the login session. That is why, every time you make changes to it, you
should log out and log in again. The file is routinely executed at login time by the
Shell in a special manner.
CONDITIONS AND LOOPS


The if condition
if’s companion - test

test – file tests

test - string comparison


The case…esac construct
The if condition

Like other programming languages, the UNIX Shell offers constructs for looping and
decision-making that can be used in shell scripts.
if <condition>
then
<execute command(s)>
else
<execute command(s)>
fi
if evaluates the condition on its command line. If the condition is fulfilled,
the sequence of commands following it is executed. Note the keywords then and
fi, which must necessarily accompany every if conditional. The construct also provides
for an alternate action using the optional keyword else. This is not always required, and
the simplest form of the if statement thus condenses to:
if <condition>
then
<execute command(s)>
fi
if’s companion - test:

When if is used to evaluate an expression, the test statement is invariably used as its
control command. test evaluates the condition placed on its right, and returns either a
true or false exit status. This return value is used by if for taking decisions. For this
purpose, test uses certain operators to evaluate the condition. The complete set of
operators is shown in the following table.
Operator   Meaning
-eq        Equal to
-ne        Not equal to
-gt        Greater than
-ge        Greater than or equal to
-lt        Less than
-le        Less than or equal to
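A minimal if/test sketch using one of the numeric operators above (the variable and values are illustrative):

```shell
# -gt from the table above, driving an if.
x=10
if test "$x" -gt 5
then
    msg="greater"
else
    msg="not greater"
fi
echo "$msg"          # greater
```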
test – file tests

test can be used to test various file attributes. You can test whether a file has the
necessary read, write or execute permissions. An abridged list of the possible
file-related tests is shown in the following table.

Test Exit Status


-f <file> True if <file> exists and is a regular file
-r <file> True if <file> exists and is readable
-w <file> True if <file> exists and is writable
-x <file> True if <file> exists and is executable
-d <file> True if <file> exists and is a directory
-s <file> True if <file> exists and has a size greater than zero
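A short sketch of a file test (/etc is assumed to exist as a directory, which holds on any standard UNIX system):

```shell
# -d from the table above.
if test -d /etc
then
    kind="directory"
else
    kind="not a directory"
fi
echo "$kind"         # directory
```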

test - string comparison

The third use of the test statement is in testing and comparing strings. There is
nothing unusual in these tests that you can’t find elsewhere, though test uses a
different syntax. The following table lists the string-handling tests.

Test Exit Status

-n stg True if string stg is not a null string

-z stg True if string stg is a null string

s1 = s2 True if string s1 = s2

s1 != s2 True if string s1 is not equal to s2

stg True if string stg is assigned and not null

Such a test on two variables is true only if both hold non-null strings, i.e., the user
enters some non-whitespace characters each time the script pauses for input.
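The string tests above can be sketched as follows (the string values are illustrative):

```shell
# -n, -z and = from the table above.
s1=unix
s2=
test -n "$s1" && r1=yes || r1=no      # s1 is not a null string
test -z "$s2" && r2=yes || r2=no      # s2 is a null string
test "$s1" = unix && r3=yes || r3=no  # string equality
echo "$r1 $r2 $r3"   # yes yes yes
```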
The case…esac construct

The case statement is the second conditional statement offered by the Shell. It doesn’t
have a parallel in most languages. This construct is used in Shell scripts to perform a
specific set of instructions depending on the value of a variable, and is often used in
place of the if construct.


The general syntax of the case statement is as follows:
case <expression> in
<pattern1>) <command>
            :
            :
            ;;
<pattern2>) <command>
            :
            :
            ;;
<……….>
esac

The keywords here are in and esac, and the symbol ;; is used as the option
terminator. The construct also uses the ) to delimit the pattern from the action. It
matches the expression first against pattern1 and, if successful, executes the
commands associated with it. If it doesn’t match, it falls through and tries pattern2,
and so on. A pair of semi-colons terminates each command list, and the entire
construct is closed with esac.
● Example:

$ cat menu.sh
echo "MENU
1) List of Files
2) Process of User
3) Today's Date
4) Users of System
5) Quit to UNIX"
echo "enter your option: \c"
read choice
echo
case "$choice" in
1) ls -l ;;
2) ps -f ;;
3) date ;;
4) who ;;
5) exit ;;
esac
$
case can also match more than one pattern with each option. Programmers frequently
encounter logic that prompts the user for a response, where this comes in handy. case
becomes even more powerful in its handling of the Shell wild cards for matching
patterns. case and the for loop are the only Shell control-flow statements that accept
the same set of metacharacters as used in matching filenames.
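A sketch of case matching wild-card patterns, in the spirit of the prompt-for-response logic described above (the variable and patterns are illustrative):

```shell
# [Yy]* matches any response beginning with Y or y.
answer=yes
case "$answer" in
    [Yy]*) reply="affirmative" ;;
    [Nn]*) reply="negative"    ;;
    *)     reply="unknown"     ;;
esac
echo "$reply"        # affirmative
```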

CONDITIONS AND LOOPS (Continued)


The while Statement

The break and continue commands

The until Statement


The set and shift Statements
The shift command

The while Statement

The Shell features three types of loops – while, until and for. The first two are
basically complementary to each other.
The while statement should be quite familiar to most programmers. It repeatedly
performs a set of instructions as long as the control command returns a true exit
status. The general syntax of this command is as follows:
while <condition>
do
<execute commands>
done
The keywords here are do and done. The set of instructions enclosed by do and done
is performed as long as the condition remains true. As in the if statement, this
condition is actually the return value of a UNIX command or program. This means
that you can use the test command here also, with its associated expressions, numeric
and string comparisons, and file tests.
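A minimal while sketch (the bounds are illustrative; expr does the arithmetic in the Bourne shell):

```shell
# Collect the numbers 1 to 3 into a string.
i=1
result=
while test "$i" -le 3
do
    result="$result$i"
    i=`expr $i + 1`
done
echo "$result"       # 123
```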
The break and continue commands

break and continue are commands that are also found in C. The break and continue
commands are used with the for, while and until loops. The break statement causes
control to break out of the loop, while the continue statement suspends execution of
all statements following it and switches control to the top of the loop for the next
iteration. In other words, continue skips to the next iteration while the loop goes on,
whereas break leaves the loop entirely the moment it is executed.
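The difference between the two can be sketched as follows (the values tested are illustrative):

```shell
# continue skips the value 2; break stops the loop at 4.
out=
for n in 1 2 3 4 5
do
    if test "$n" -eq 2
    then
        continue     # skip this iteration only
    fi
    if test "$n" -eq 4
    then
        break        # leave the loop entirely
    fi
    out="$out$n"
done
echo "$out"          # 13
```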
The until Statement

The until statement complements the while construct in the sense that the loop body
here is executed repeatedly as long as the condition remains false.
$ until false
do
……
…….
done
Here, the while command is replaced by the until statement. Thus,
while true
has now become
until false
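A minimal until sketch (the bound is illustrative):

```shell
# The body runs while the condition stays false; it stops once i reaches 3.
i=0
until test "$i" -ge 3
do
    i=`expr $i + 1`
done
echo "$i"            # 3
```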
Looping with for
The for loop is different in structure from the one used in other programming
languages. There is no next statement here, nor can a step be specified. Unlike while
and until, it doesn’t test a condition but uses a list instead. The syntax of this
construct is as follows:
for <variable> in <list>
do
<command>
<command>
done
The loop body is identical in structure to that of the while and until loops. The
additional elements are the variable and the list. The list consists of a series of
character strings, with each string separated from the others by white space. Each
item in the list is assigned to the variable in turn, and the loop body is executed. The
loop is performed as many times as there are words in the list.
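A minimal for sketch (the list items are illustrative strings):

```shell
# Each word in the list is assigned to 'name' in turn; the loop runs three times.
joined=
for name in red green blue
do
    joined="$joined$name "
done
echo "$joined"       # red green blue
```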


The set and shift Statements

The set statement makes it possible to convert its arguments into positional
parameters. A simple use of this statement can be:
● $ set 9876 2345 6213

$
This assigns the value 9876 to the positional parameter $1, 2345 to $2 and 6213 to
$3. It also sets the other parameters $# and $*. This feature is especially useful for
picking up individual fields from the output of a program.
The shift command

The Shell creates up to only 9 positional parameters, i.e., $1 to $9. So, if there are
more than nine arguments the shift command can be used.
shift transfers the contents of each positional parameter to its immediate
lower-numbered neighbour, and does so every time the statement is called. When
called once, $2 becomes $1, $3 becomes $2, and so on. The important thing to
remember is that the contents of the leftmost parameter, i.e., $1, are lost every time
shift is invoked. In this way, you can access more than nine positional parameters in
a script.
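The set and shift behaviour can be sketched together (the argument values are illustrative):

```shell
# set turns its arguments into $1, $2, $3; shift then discards $1.
set alpha beta gamma
first=$1             # alpha
count=$#             # 3
shift
after=$1             # beta ($2 has moved into $1)
echo "$first $count $after"      # alpha 3 beta
```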

I/O REDIRECTION AND SYSTEM PROCESSES


Input redirection

Output redirection

System Processes
Input redirection

Taking input from a file other than the user’s terminal keyboard is termed input
redirection. For this, the ‘<’ symbol is used.
The syntax is
$ <program> < <file name> <RET>
Here, the < (less than symbol) implies redirection from the named file.
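A minimal input-redirection sketch (the file name is illustrative):

```shell
# wc reads its standard input from the file rather than the keyboard.
f=/tmp/redir_demo.$$
printf 'one\ntwo\nthree\n' > "$f"
lines=$(wc -l < "$f" | tr -d ' ')
echo "$lines"        # 3
rm -f "$f"
```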
Output redirection

Appending each record to a file with the >> symbol means opening and closing the
file once per record; this disk I/O results in significant overheads. When large data
files are processed in this way, performance becomes slow.
But this can be avoided by some simple means. All the Shell’s conditionals and loops
can be redirected. Though they are internal commands, the Shell accords them the
same status as the external commands. All these constructs are redirected by using the
redirection symbols after the final keyword. For example:
done > $filename
Here, the > (greater than) symbol implies redirection of output to the named file. The
file is opened only once, at the beginning of the loop, and is closed when the loop
ends. This speeds up execution, but note that all individual statements inside the loop
get redirected too.
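The loop-level redirection described above can be sketched as follows (the file name is illustrative):

```shell
# The > after 'done' opens the file once for the whole loop, not once per echo.
f=/tmp/loop_demo.$$
for n in 1 2 3
do
    echo "record $n"
done > "$f"
records=$(wc -l < "$f" | tr -d ' ')
echo "$records"      # 3
rm -f "$f"
```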

System Processes
Running a Process in the Background:
As we already know, UNIX is a multitasking operating system, which means that it can
schedule the execution of more than one program at the same time. In fact, the system
can execute only one program or process at any one instant, but it switches between
processes so quickly, usually within a thousandth of a second, that most of the time
all programs seem to be running at the same time.
If a program that we are running seems to be taking a long time to finish and we would
like to begin work on another task, we may schedule our program to run in what is
known as Background Processing. The main advantage of running programs in the
background is that unlike foreground processes (when you work on a task, which can be
executed only with your intervention), background processes don’t have to have
finished executing before you can begin another program. After beginning a process in
the background, your Shell prompt returns immediately, signalling that you may
invoke another program in the foreground. You will then have two programs running,
since the background process is still executing while the foreground one runs.
Initiating a background process is easy with the UNIX system. Simply type an
ampersand character (&) at the end of the command line invoking the process that we
wish to execute in the background. The Shell will respond by printing an identifying
number known as the process identification number, or PID, on our terminal and
immediately prompt for our next command. The PID serves to identify our background
process uniquely. In a moment we will see how to use this number to inquire about the
status of a background process or even to terminate a background process before it
finishes of its own accord. It’s important to note that all processes, including foreground
processes, have PIDs. We mention PIDs here because the & command happens to
display the PID of the background process that we have invoked.
It is important to note that although you may start more than one background process,
the Kernel program limits the total number of processes (foreground plus background)
to a reasonable value (usually some 20 to 50 per user and 100 to 250 or so
system-wide) for our particular UNIX implementation. From a practical point of view,
the more processes you start in the background, the slower your system runs overall.
One inconvenience of background processing is that its output will appear on your
terminal screen intermixed with any output from foreground and other background
processes. Thus, to avoid a confusing display, you may wish to redirect the output of
your background processes to a disk file. You must not initiate a background process
that reads input from the keyboard, because any foreground process (including your
Shell) that also reads input from the keyboard would then conflict with the input
requests of the background process. That is, input destined for the background process might be
read by the foreground process and vice versa. In addition, if the background process
reads its input from a disk file, be sure not to modify that file until the background
process has finished executing. Because the Shell protects such tasks from terminating
in response to an interrupt signal, we cannot terminate a background process with an
interrupt character.
To practice running a process in the background, let’s select a task that generally
requires a minute or so to complete, for example, a task like determining the disk usage
for your entire file system. You may have used the generally available du command to
accomplish this. Further more, we’ll redirect the output of the du command so that the
results are placed in a disk file and du’s output won’t be sent to your terminal screen.
Once du is placed in the background, you may continue with foreground tasks without
the output from the background task distributing the display on your terminal screen.
To invoke du in the background, enter du />du.all& and make a note of the PID
number that appears after you press <return>. The / argument tells du to begin
examining the file system at the root directory.
● $ du / >du.all &

800
$
The process numbers are unique and are assigned sequentially by the Kernel, so that in
our example, the very next process that is started in the system would have PID 801,
the next 802 and so on. With du in the background, we may go ahead and run
foreground processes.

UNIX SECURITY FEATURES


File Permission mask

Types of Files
File Access Permissions

Determining the File Access Permissions


Changing File Access Permissions
File Permission mask

All files inherit a system-defined permission when they are created. Normally, this
permission is rw-rw-rw- (octal value 666) for regular files, and rwxrwxrwx (octal
value 777) for directories. The system default is usually transformed by making a
subtraction from it. The value subtracted is set by the user and is represented by the
umask statement of the Shell. To find out the default value of this mask, enter umask
without any arguments:
● $ umask

022
$
The output shows a set of three octal digits. It is called a mask because each digit is
subtracted from the corresponding digit of the system default to remove permissions.
Thus, 1, 2 and 4 remove the execute, write and read permissions, respectively. After
subtracting the mask from the system default, the expression yields 666 - 022 = 644.
This represents the default permissions that you normally see when you create a
regular file (-rw-r--r--).
You should make sure that the system default is properly masked, so that other users
don’t automatically get permissions which you feel they shouldn’t have. Normally, the
system administrator takes care of this security feature, but you should also have
knowledge of it, in case the settings made by him are not adequate.
To change the umask setting, you have to use umask with an argument.
● $ umask 777

$
This means that all files are to be created with no permissions on. A value 000 means
that the default permissions will be used while creating files. One important thing to
remember while using this mask is that you can’t turn on permissions not specified in
the system-wide default settings; you can only remove them. However, you can always
use chmod to turn on permissions of files individually, as and when required. The
system default for directories is 777, which means that everyone gets all the permissions.
Subtracting the mask 022 from it leaves the permissions for the owner unchanged, and
lowers the group and other permissions by 2 each. The user mask thus changes this to
755 (rwxr-xr-x), which is what a directory gets by default.
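The subtraction described above can be verified directly at the Shell prompt. A minimal sketch, run in a scratch directory (the names demo.txt and demo.dir are hypothetical):

```shell
cd "$(mktemp -d)"      # work in a scratch directory
umask 022              # remove write permission for group and others
touch demo.txt         # regular file: 666 - 022 = 644 (rw-r--r--)
mkdir demo.dir         # directory:    777 - 022 = 755 (rwxr-xr-x)
ls -ld demo.txt demo.dir
```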
Types of Files

As you already know, in UNIX everything is treated as a file. So, apart from the user’s
program files there are also special files, such as files that contain information about
directory contents, or files that store information about the various input-output devices
connected to the system.
In UNIX, there are three types of files:
● Ordinary files: All the files created by the user come under this category.
Examples of these files are data files, program files, object files, executable files
etc. The user can make changes to such files.
● Directory files: For each directory there is a file by the same name as the
directory, which contains information about files under that directory. The user
cannot modify this directory file.
● Special files: In UNIX, most of the system files are special files. The user cannot
alter these special files.
File Access Permissions

In UNIX, there are three different classes of file users and three modes of file access.
These three classes of users and modes of access give rise to the nine different kinds of
access permission allowed within the UNIX file system.
There are three classes of system users. First, every file has an owner; the owner is
usually the system user who created the file. The owner has the full control over
restricting or permitting access to the file at any time.
● Owner: (denoted by u, for user). The owner is the system user who created the
file.
● Group: (denoted by g). The group is one or more users who may access the file
as a group.
● Other: (denoted by o). The "other" category refers to any other user of the
system.
The UNIX file system allows each user class to access the file independently of the
other classes; that is, the access rights for the file owner, for the group owner, and for
the "other" user category may be the same or different.
File accessing permissions and their meaning
Access Mode   Ordinary File                 Directory File
Read          Allows examination of         Allows listing of the files
              file contents                 within the directory
Write         Allows changing of            Allows creation of new files
              file contents                 and removal of old ones
Execute       Allows executing the file     Allows searching of the
              as a command                  directory
System users with read permission may read (examine) the contents of an ordinary file
by using cat command.
System users with write permission may write to the file and change its contents by
using an editor.
System users with execute permission for ordinary files may execute the file as a
command. Generally, an ordinary file is not given execute permission when it is created.
Executing a file only makes sense if the file is actually a program (command) or a Shell
script. A Shell script is a file that contains a list of one or more commands that can be
executed by the Shell.
The meaning of these same access modes is different for a directory file. The system
user with read permission may read (by listing) the contents of the directory using the
ls command. The system user with write permission, on the other hand, may use
certain privileged programs to write on the directory. Write permission is necessary in
order to create files or to remove files.


Determining the File Access Permissions

The three classes of file users (owners, groups, and others) may be combined with the
three types of access (read, write, and execute) to give nine possible sets of
permissions as shown here.
r w x (Owner)   r w x (Group)   r w x (Other)
The presence of permission is indicated by the appropriate letter being in its correct
location. The absence of permission is indicated by a dash (-) in the same place. For
instance, some common permission patterns for ordinary files and their meanings are:
r--r--r--
This ordinary file can be read (or examined) by all three user classes, but it can’t be
written to (changed) or executed by anyone. A file with this set of permissions is
known as read-only or write-protected.
--x--x--x
This ordinary file can be executed like any command program file by all system user
classes. That is, any user can type the name of the file after the Shell prompt, and the
file will be read into memory by the Shell and run as a command program. This file is
write protected and read protected. Generally, executable programs (commands) such
as cat, ls and the like, that are installed in publicly accessible directories are given this
set of permissions.
rw-------
This ordinary file is readable and writeable only by the file owner. Generally, a file that
the owner wishes to keep private from all other users has this set of permissions. Some
commonly seen permission patterns for directory files and their meanings are:
rwxrwxrwx
This directory is completely accessible by all system users. In general, public directories
that must be readable and writeable by all the users would have this set of permissions.
rwx------
This directory is only accessible by its owner, other users cannot search it or read or
write to any files contained in it. In general, users on a multiuser time-sharing system
would establish this pattern of permissions for their home directory workspace.
You can easily determine what type a file is – whether ordinary, directory, or special –
and what set of access permissions it has, as well as other information about the file, by
using the long listing option (-l) with the ls (directory listing) command. The format is
generally similar to the following:
● $ ls -l letter

-rw-rw-rw- 1 icmdoc docum 40 Nov 1 13:23 letter


$
Here, we have requested the long directory listing for the ordinary file called letter.
There is only one restriction on the use of the long listing option: the user requesting a
long directory listing must have execute (search) permission, in addition to read
permission, for the directory that contains the file.
Because the long listing is especially helpful for working with files, it is important to
know how to read it. Here is an explanation of what the various segments or fields in
the long listing mean:
-rw-rw-rw-   1      icmdoc  docum  40    Nov 1 13:23    letter
permissions  links  owner   group  size  time of last   filename
                                         modification
Reading from left to right, we see that the dash (-) in the file type field indicates that
the file is an ordinary file. Next, the access permissions are all enabled. Next, the "1"
indicates that there is only one link for this file from the directory, which means that
this file only has one name associated with it. The word "icmdoc" indicates that the file
owner has the user name "icmdoc"; the group that has access to this file is
referred to as "docum". In the next slot, we see that the file’s size is 40 characters.
The date and time "Nov 1 13:23" show when the file was last modified. Last of all, the
file name is listed. The file name is of course, "letter".
While there are actually four types of files that may be indicated in the first field, the
ones that are important for you to know are the dash (-), which indicates an ordinary
file, and d, which indicates a directory file.
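The file-type field is easy to observe in practice. A short sketch, using hypothetical names in a scratch directory:

```shell
cd "$(mktemp -d)"
touch letter           # an ordinary file
mkdir docs             # a directory file
ls -l letter           # first character is '-': ordinary file
ls -ld docs            # first character is 'd': directory file
```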
Changing File Access Permissions

The chmod command (for "change mode") allows you to alter the permission modes of
one or more files or directories. Because the command line format for chmod is
somewhat more complicated than the format for other commands you have seen so far,
we’ll lead you through the details one by one:
$ chmod [who]op-code permission file ...
The who argument tells chmod the user class and may be any of the following:
● u: User (individual file owner)
● g: Group file owner
● o: Users classified as "other"
● a: All system users (file owner, group owner, and the "other" category).
The op-code argument represents the operation to be performed by chmod:
● +: Add the specified permissions to the existing permissions.
● -: Remove the indicated permissions from the existing permissions.
● =: Assign the indicated permissions.
The permission argument uses the same abbreviations as you saw earlier in the
discussion of types of file access:
● r: Read permission
● w: Write permission
● x: Execute permission
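Combining the who, op-code and permission arguments gives command lines like the following sketch (script.sh is a hypothetical file in a scratch directory):

```shell
cd "$(mktemp -d)"
touch script.sh
chmod u+x script.sh    # add execute permission for the owner
chmod go-r script.sh   # remove read permission from group and other
chmod a=rw script.sh   # assign read and write to all three classes
ls -l script.sh
```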
Some related commands that change file ownership and access permissions:
chgrp: Changes the group ownership of files.
chgrp [OPTION]... GROUP FILE...
Options:
-c, --changes like verbose, but report only when a change is made.
-f, --silent, --quiet suppress most error messages.
-v, --verbose output a diagnostic for every file processed.
-R, --recursive change files and directories recursively.
chmod: Changes the access permissions of the file
chmod [OPTION]... MODE[,MODE]... FILE...
or
chmod [OPTION]... OCTAL_MODE FILE...
Options:
-c, --changes like verbose, but report only when a change is made.
-f, --silent, --quiet suppress most error messages.
-v, --verbose output a diagnostic for every file processed.
-R, --recursive change files and directories recursively.
chown: Changes the user and group ownership of files
chown [OPTION]... OWNER[.[GROUP]] FILE...
or
chown [OPTION]... .[GROUP] FILE...
Options:
-c, --changes be verbose whenever change occurs
-f, --silent, --quiet suppress most error messages
-v, --verbose explain what is being done
-R, --recursive change files and directories recursively
MORE COMMANDS IN UNIX


su: Become a User or another User.
su [ - ] [name] [arg ...]
● Become a superuser without logging off.

● Executes new Shell changing real user-id.

● An entry is made in /usr/adm/sulog.

pwck: Password file checker.


pwck [file]
● Scans the password file and notes any inconsistencies.

● The check includes validation of fields, login name, user-id, group-id, login
directory and program names specified.
● The default password file is /etc/passwd.

grpck: Group file checker.


grpck [file]
● Verifies all entries in the group file.

● Verification includes a check of the number of fields, group name, group-id, and the
existence of login names in the /etc/passwd file.
uname: Change or print current system name.
uname [ -snrvma]
Options:
-s: Print the system name (default).
-n: Print the node name.
-r: Print the operating system release.
-v: Print the operating system version.
-m: Print the machine hardware name.
-a: Print all of the above information.
-s name: Change the system name.
-n name: Change the node name.
umask: Set file creation mask.
umask [000]
● Sets user file creation mode.
● The octal number determines the read/write/execute permissions for owner, group and
others.
● The value of each specified digit is subtracted from the corresponding digit of the
system default.
● If the argument is omitted, then the current value is displayed.
● An entry can be made in the user’s .profile file.
mknod: Build a special file.
mknod name b | c major minor
mknod name p
Arguments:
name: Name of the special file.
b: Block type (example: disk, tape).
c: Character type (example: printer, terminal).
p: FIFO file (named pipe).
major: Major device number.
minor: Minor device number.
● Makes a directory entry and corresponding I-node.

● Only superusers are allowed to use the first form.

● The major device numbers are specific to each system (can be obtained using the
‘gendev’ command).
mkfs: Construct a file system.
mkfs special blocks [:inodes] [gap blocks/cyl]
Arguments:
special: Special file on which file system is to be constructed.
blocks: Physical disk blocks.
inodes: Numbers of I-nodes.
gap: Rotational gap.
blocks/cyl: Physical blocks per cylinder (by default 400).
● Only the superuser can use it.

● If the argument % is specified, then mkfs will automatically calculate the correct
number of logical blocks and I-nodes.
● -N option makes mkfs run faster.

Logical block size:


The size of the chunks the UNIX system Kernel uses to read or write files. The logical
block size is usually different from the physical block size (by default 2048 bytes).
Physical block size:
The size of the smallest chunk that the disk controller can read or write. The value on
Supermax is 512 bytes.
mount: Mount a file system.
mount [special_file directory [-r] ]
● Directory should exist.

● It can be used to check the validity of the file system.

● If invoked with arguments, an entry is added to the table of mounted devices.

● If invoked without arguments, it prints the table.

● Only superusers are allowed to use it.

● If the file system is damaged, then mount is not allowed.

● On a damaged file system, first run fsck, then mount it.

umount: Unmount a file system.

umount special_file
● Unmounts the file system.

● If pwd is the mounted directory, then unmounting is not possible.

● Only a superuser is allowed.

fsck: File system consistency check and repair.

fsck [options] [file_system]
Options:
-y: Yes response to all questions.
-n: No response to all questions.
-sx: Construct new free list ignoring the actual one.
-t: To specify the scratch file.
-q: Quiet fsck.
-d: Check directories for bad blocks.
-f: Fast check.
● If the file system is not specified, then it reads from the file /etc/checklist.

● If the mount command does not work on a file system, first run fsck on that file
system.
● Only superusers are allowed to use this command.

chroot: Change root directory for a command.


chroot newroot command
● The given command gets executed relative to the new root.

● The initial / points to the new root.

● Redirection is relative to the original root.

Example:
chroot /newroot ls > x


● Only the superuser can use it.

newgrp: Log into a new group.


newgrp [-] [group]
● Changes user's group identification.

● The user should be a member of the called new group.

● Current directory remains unchanged.

● Access permissions to files are checked using the new group identification.

● Without arguments, 'newgrp' changes the group identification back to the user's login group.

● If '-' is present as the first argument, the environment is changed to what it would be if the user logged in again as a member of the new group.

sync: Update the super block.


sync
● Flushes the internal buffers to the disk.
● To ensure file system integrity, sync must be called before the system is stopped.
nice: Run a command at low priority.
nice [-increment] command
● Runs a command with a lower CPU scheduling priority, which is good for other users and
hence the name 'nice'.
● Increment range: 1-19.

● The superuser may use a negative increment to run a command with higher than
normal priority.

find: Find files


find path_name_list expression
● Recursively searches the directory hierarchy for each path name.
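A short sketch of typical find invocations, using hypothetical file names in a scratch directory:

```shell
cd "$(mktemp -d)"
touch a.txt b.log
mkdir sub && touch sub/c.txt
find . -name '*.txt'   # search the whole hierarchy for matching names
find . -type d         # restrict the search to directories
```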

env: Display or set environment for command execution.


env [-] [name=value] ... [command args]
● Without arguments, env prints current environment.

● Modifies environment according to its arguments, then executes the command with
the modified environment.
● The - flag causes the inherited environment to be ignored completely, i.e., the
command is executed exactly with the environment specified by the arguments.
time: Time a command.


time command
After the command is complete, time prints the system time, real time and user time
spent in execution of that command.
● Times are printed on standard error.


Real time
Total elapsed time from the beginning of the command execution until it ends.
User time
The amount of time that the program spends executing its own code.
System time
The amount of time used directly by the UNIX system in the service of the command.
kill: Terminates a process.
kill [-signo] PID
● Sends terminate signal to the specified process.

● Process number can be found using 'ps'.

● To kill other users’ processes, the superuser mode is required.

● "kill -9 ..." is a sure kill.
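A sketch of the usual sequence: note the process number, then send it a signal (the background sleep stands in for any long-running process):

```shell
sleep 60 &                        # start a long-running background process
pid=$!                            # its process number (what ps would report)
kill "$pid"                       # send the default terminate signal
wait "$pid" 2>/dev/null || true   # reap the terminated process
```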

killall: Kill all active processes.


killall [signal]
● Kills all active processes.

● Makes mounted file systems unbusy so that they can be unmounted.

● By default signal 9 is sent.

fuser: Identify a process using a file or file structure.


fuser [-ku] files [-] [[-ku] files]
Options:
● -u: The user login name follows the process id.

● -k: The SIGKILL signal is sent to each process.

● Outputs the process ids of the processes that are using files as specified.

● Only the superuser can terminate another user's process.

od: Octal dump.


od [-bcdosx] [file] [[+]offset [.] [b]]
Options:
-b: Interpret bytes in octal.
-c: Interpret bytes in ASCII.
-d: Interpret words in unsigned decimal.
-o: Interpret words in octal.
-s: Interpret 16-bit words in signed decimal.


-x: Interpret words in hex.
● By default, output is as with the -o option.

● Offset argument specifies the offset in the file where dumping is to commence (by
default octal).
● If . is appended, the offset is interpreted in decimal.

● If b is appended, the offset is interpreted in blocks of 512 bytes.

● If the file is omitted, offset argument must be preceded by +.

● Dumping continues until end-of-file.
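A sketch of od on a tiny hypothetical file, showing the ASCII and octal interpretations:

```shell
cd "$(mktemp -d)"
printf 'AB\n' > demo.bin
od -c demo.bin         # interpret bytes in ASCII: A  B  \n
od -b demo.bin         # interpret bytes in octal: 101 102 012
```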

ps: Gives a snapshot of the current processes.


dd: Convert a file while copying it.
df: Summarize disk free space.
df [OPTION] [PATH]...
Options:
-a, --all Include file systems having 0 blocks.
-i, --inodes List inode information instead of block usage.
-k, --kilobytes Use 1024-byte blocks, not 512, despite POSIXLY_CORRECT.
--sync Invoke sync before getting usage info (default).
--no-sync Do not invoke sync before getting usage info.
-t, --type=TYPE Limit the listing to filesystems of type TYPE.
-x, --exclude-type=TYPE Limit the listing to filesystems not of type TYPE.
-v (ignored).
-P, --portability Use the POSIX output format.
-T, --print-type Print the filesystem type.
du: summarize disk usage.
du [OPTION]... [PATH]...
Options:
-a, --all Write counts for all files, not just directories.
-b, --bytes Print size in bytes.
-c, --total Produce a grand total.
-k, --kilobytes Use 1024-byte blocks, not 512, despite POSIXLY_CORRECT.
-l, --count-links Count sizes many times if hard linked.
-s, --summarize Display only a total for each argument.
-x, --one-file-system Skip directories on different filesystems.
-D, --dereference-args Dereference PATHs that are symbolic links.
-L, --dereference Dereference all symbolic links.
-S, --separate-dirs Do not include the size of sub-directories.
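A sketch contrasting the two commands: df reports free space on a filesystem, while du reports space used by files:

```shell
df -k .     # free space, in 1024-byte blocks, on the current filesystem
du -sk .    # total kilobytes used under the current directory
```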
echo: Display a line of text.
head: Output the first part of the file.
head [OPTION]... [FILE]...
Print first 10 lines of each FILE to standard output. With more than one FILE, precede
each with a header giving the file name. With no FILE, or when FILE is -, read standard
input.
Options:
-c, --bytes=SIZE Print the first SIZE bytes.
-n, --lines=NUMBER Print the first NUMBER lines instead of the first 10.
-q, --quiet, --silent Never print headers giving file names.
-v, --verbose Always print headers giving file names.
tail: Output the last part of the file.
tail [OPTION]... [FILE]...
Print last 10 lines of each FILE to standard output. With more than one FILE, precede
each with a header giving the file name. With no FILE, or when FILE is -, read standard
input.
Options:
-c, --bytes=N Output the last N bytes.
-f, --follow Output appended data as the file grows.
-n, --lines=N Output the last N lines, instead of the last 10.
-q, --quiet, --silent Never output headers giving file names.
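A sketch of head and tail on a generated file (nums.txt is a hypothetical name):

```shell
cd "$(mktemp -d)"
seq 1 100 > nums.txt   # one hundred numbered lines
head -n 3 nums.txt     # first three lines: 1, 2, 3
tail -n 2 nums.txt     # last two lines: 99, 100
```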
bc: An arbitrary precision calculator language.
cal: Displays a calendar.
For example, when the user enters the cal command, it displays the calendar for the
current month as follows:
June 1998
Su Mo Tu We Th Fr Sa
 1  2  3  4  5  6  7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30
file: Determine file type.
Options:
-v: Print the version of the program and exit.
-m list: Specify an alternate list of files containing magic numbers. This can be a
single file, or a colon-separated list of files.
-z: Try to look inside compressed files.
-c: Cause a checking printout of the parsed form of the magic file. This is usually used
in conjunction with -m to debug a new magic file before installing it.
-f namefile: Read the names of the files to be examined from namefile (one per line)
before the argument list. Either namefile or at least one filename argument must be
present; to test the standard input, use "-" as a filename argument.
id: Display the current user and group IDs and names.
id [OPTION]... [USERNAME]
Print information for USERNAME, or the current user.
Options:
-a: Ignored, for compatibility with other versions.
-g, --group: Print only the group ID.
-G, --groups: Print only the supplementary groups.
-n, --name: Print a name instead of a number, for -ugG.
-r, --real: Print the real ID instead of the effective ID, for -ugG.
-u, --user: Print only the user ID.
logname: Print user's login name.
logname [option]
lpr: Offline print.
Options:
-P: Force output to a specific printer. Normally, the default printer is used (site
dependent), or the value of the environment variable PRINTER is used.
-h: Suppress the printing of the burst page.
-m: Send mail upon completion.
-r: Remove the file upon completion of spooling or upon completion of printing (with the
-s option).
-s: Use symbolic links. Usually files are copied to the spool directory. The -s option will
use symlink (2) to link data files rather than trying to copy them so large files can be
printed.
This means the files should not be modified or removed until they have been printed.
make: To maintain group of programs.
make [ -f makefile ] [ option ] ... target ...
Options:
-C dir: Change to directory dir before reading the makefiles or doing anything else.
-d: Print debugging information in addition to normal processing.
-e: Gives variables taken from the environment precedence over variables from
makefiles.
-f file: Use file as a makefile.
-i: Ignore all errors in commands executed to remake files.
-I dir: Specifies a directory, dir, to search for included makefiles.
-k: Continue as much as possible after an error. While the target that failed, and those
that depend on it, cannot be remade, the other dependencies of these targets can be
processed all the same.
-l [load]: Specifies that no new jobs (commands) should be started if other
jobs are running and the load average is at least load (a floating-point number). With no
argument, removes a previous load limit.
-o file: Do not remake the file even if it is older than its dependencies, and do not
remake anything on account of changes in file. Essentially the file is treated as very old
and its rules are ignored.
-p: Print the data base (rules and variable values) that results from reading the
makefiles; then execute as usual or as otherwise specified.
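A minimal sketch of make in action, assuming GNU make or a compatible version is installed (the targets are hypothetical; note that the command line under a target must begin with a tab):

```shell
cd "$(mktemp -d)"
printf 'all: hello\nhello:\n\t@echo built\n' > Makefile
make                   # builds the default target, all, which depends on hello
```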
pr: Convert text files for printing.
Options:
-a: Print columns across rather than down.
-b: Balance columns on the last page.
-c: Print control characters using hat notation (e.g. ‘^G’); print other unprintable
characters in octal backslash notation.
-d: Double space the output.
-F, -f: Use a formfeed instead of a newline to separate output pages.
-h header: Replace the filename in the header with the string header.
-t: Do not print the 5-line header and the 5-line trailer that are normally on each page,
and do not fill out the bottoms of pages (with blank lines or formfeeds).
-v: Print unprintable characters in octal backslash notation.
--version: Print version information on standard output then exit.
-w page-width: Set the page width to page-width columns. The default is 72.
wall: Write a message to users.
wall [file]
write: Send a message to another user.
write user [@host] [ttyname]
sleep: Delay for a specified amount of time.


sleep [OPTION]... NUMBER [SUFFIX]
Pause for NUMBER seconds. SUFFIX may be s for seconds, m for minutes, h for
hours or d for days.
tty: Print the file name of the terminal connected to standard input.
tty [OPTION]...
Print the file name of the terminal connected to standard input.
-s, --silent, --quiet Print nothing, only return an exit status.
UNIX in the Office
"Talking-to" other users:
If you want to send a message to a user who is logged in on another terminal located
in another office on the floor, or even at another plant location, making it
inconvenient for you to talk directly with that person, the write command can be used
as an alternative to calling that person on the telephone. write allows you to send a
message to any other logged-in user. The message that you send will appear on that
user’s screen. The user will then have the option to send you a reply by initiating a
write command from his or her own terminal. With this technique, two users can
effectively have a "conversation" through their terminals.
The syntax of the write command is
write user tty
where user is the user-id of the logged-in user and tty is an optional tty number. This
latter information is needed when there is more than one logged-in user with the same
user-id. The tty number designates the terminal that the message is to be sent to. As
an example, the command
write tim
tells the UNIX system that you wish to start a conversation with the user tim. If tim is
not currently logged in, then the following will occur:
$ write tim
tim is not logged on
$
After initiating the write, the system will wait for you to type your message. This
message can contain as many lines as you like. Each line that you type will get
displayed on tim’s terminal. When you have finished typing your message, enter
CTRL-D as the first and only character on the line. This will terminate the write and
display the line <EOT> on tim’s terminal to tell him you have finished. As a matter of
convention, most UNIX users typically end their message lines with the characters -o to
tell the other user that their message line is finished and that they are awaiting a reply.
The characters -oo are often used to signal the end of the conversation.
Inhibiting messages with the mesg command
If you don’t want to receive any messages while you are doing some important work
and don’t want to be disturbed, you can use the mesg command. This
command takes a single argument, n or y. The former specifies that you don’t want to
receive any messages; the latter specifies that you do. So, to inhibit incoming messages
while you are editing a file, type the command
mesg n
before you enter the editor. Then, after your edits are complete, you can tell the system
that you’re willing to again receive messages by typing the command
mesg y
Electronic mail
Electronic mail is one of the most commonly used terms these days. Electronic mail gives you
the ability to send messages, memos, or any types of documents to other users
electronically, that is, without the use of paper.
The main difference between sending a message to someone using the electronic mail
facility and the write command, is that the latter requires that the person be logged in
at the time that the message is sent. With electronic mail, the mail is automatically kept
by the system until the user issues the necessary command to read his or her mail.
Under the UNIX system, the mail command handles the sending and receiving of
electronic mail. The format of the command to send mail to a user is simple:
mail user
where the user is the user-id of the person you want to send the mail to. Once this
command line has been typed, the mail program will then wait for you to type your
message to be sent to the user. Periodically, the Shell automatically checks to see if you
have received any new mail. If you have, then you will get the following message
displayed at your terminal.
You have mail

FILE SYSTEM INTERNALS


Partitions and file systems
The four components of a file system

Disk blocks and i-nodes


The Structure of the I-node

As it is already known, UNIX treats everything as a file and accesses directories and
devices with the same commands. However, the administrator can’t afford to remain
ignorant of the file system internals, as this knowledge is necessary for maintaining
a healthy and correct file system. This chapter thus contains material
mainly useful to him: knowledge of the file system is essential for the system administrator
to be able to fix the inconsistencies that tend to crop up.
Partitions and file systems

As with other operating systems, a formatted hard disk (also known as a fixed disk) must be
made available before UNIX can be installed. Normally, the formatting operation lies
outside the operating system’s domain and is carried out by special utilities supplied by
the vendor. Formatting creates address information on the platters of the disk and also
isolates bad sectors.
After the disk has been formatted the next step is to divide it into several slices called
partitions.
Partitioning the disk has several advantages:
● If the entire operating system, along with the user files and directories, were in one big
superstructure, conflicts might arise between the various data areas. Partitioning
the disk prevents this.
● If the disk is partitioned and there is corruption in one area, the other areas are
effectively shielded from it.
● Backups become easier. If the system has an adequate number of partitions,
then each partition can be backed up separately in a single volume of tape.
● For all these reasons, the job of system administration becomes a lot easier if the disk is
partitioned.
Each partition can contain a file system as shown:
[Figure: a disk drive divided into partitions, each of which can contain a filesystem]

The following figure shows the filesystem in more detail:

[Figure: a filesystem in more detail, showing directory blocks and data blocks; f1, f2 and f3 mark the first, second and third data blocks]

Most UNIX systems usually will have at least the following three partitions:
● The root partition - This is the partition that is used for bringing up the system and
keeping it in single-user mode. The system administrator uses this mode
for performing his maintenance tasks. This area includes the system’s root
directory and the /bin, /dev, and /etc directories, which are generally sufficient to keep
the system going.
● The /usr partition - This is the area where a lot of heterogeneous material can be
found.
● The swap partition - This is the partition that no user is allowed to access, but which is
used by the Kernel to control the movement of processes. When the system memory is
heavily loaded, the Kernel has to move processes out of memory to a special area of
the disk called the swap area. When these swapped processes are ready to run,
they are loaded back into memory. Since this is a vital activity, every system has a
swap partition.
The creation of a file system on each partition is done with the mkfs command, which
breaks up each partition into a number of distinct components.
The four components of a file system

The knowledge of how the Kernel and the components of the file system work in
tandem to organize the allocation of space for files is necessary mainly for the system
administrator. When the file system gets corrupted, he has to run the fsck command to
correct the distortions that take place. He should have an adequate knowledge of the
system internals, because if fsck throws out frightening messages and prompts him for
approval of the actions to be taken, he must make quick and correct decisions. The
entire file system is organized in a sequence of blocks, all numbered from zero to a
number determined at the time of its creation. Some of these blocks are kept reserved
for use by the Kernel, but the vast majority of them are set aside for the users to store
all their data and programs. To briefly recapitulate, these are the four components,
which every file system will have:
The boot block: This block contains a small boot program.
The super block: It contains global information about the file system. Additionally, it
also maintains a free list of I-nodes and data blocks, which can be immediately
allocated by the Kernel when creating a file.
The I-node blocks: This region contains a table for every file of the system. All
attributes of a file and directory are stored in this area except the name of the file or
directory itself.
The data blocks: All data and programs created by the users reside in this area.
Disk blocks and i-nodes

Every hard disk or floppy diskette is organized into blocks (or sectors) where all data
resides. To write something into a block, its address, which is unique, must be known.
Even though the blocks are numbered consecutively, one may not find the data
of a file arranged in contiguous blocks. For instance, suppose a newly created file
occupies, say, ten contiguous blocks. Later, more bytes are added to it, requiring a
further three blocks. The user may not be lucky enough to find sufficient free space
immediately after those ten blocks, which may already be occupied by another file. The
remaining data then has to be written to the next free blocks that are available,
wherever they may be.
UNIX has a complicated but elaborate scheme for maintaining all the disk block
addresses of a file. Because the blocks of a file are scattered throughout the disk, it is
obvious that the addresses of all its blocks have to be stored, and not just the starting
and ending block addresses. This, no doubt, leads to disk fragmentation, and
consequently increases the overheads of read/write operations. However, this
fragmentation also allows files to be enlarged or reduced at will and keeps wastage to a
minimum.
This situation immediately leads to two requirements:
A separate (also the largest) portion of the file system has to be earmarked for all data
contained in files. These are called data blocks, and contain only data and not the file
attributes.
For every file, there has to be a table which maintains, among other things, linked lists
of addresses of all the blocks used by the file. This table should also contain practically all
the file attributes except the file name. The table is called an I-node; it is normally 64 bytes
long (if the filename is restricted to 14 characters) and is maintained individually for each
file. All I-nodes are collectively stored in I-node blocks, arranged contiguously in
another dedicated area of the file system, preceding the data blocks. A disk block
contains sixteen 64-byte I-nodes.
The Structure of the I-node

An I-node for each file contains:


● File type (regular, directory or device).

● Size in bytes.

● Owner and group owner.

● Number of links.

● Permissions (read, write and execute) for the owner, group owner and others.

● Time of last modification and last access of file, and last changes of the I-node.

● A linked list of disk block addresses.

The I-node also records when a file was last read, written or executed (accessed) or
modified, or had any of the I-node parameters changed. Finally, if the user has forgotten
what a link is, it simply provides another name for a file. Every I-node is identified by a
unique number, sometimes called the I-node number, which simply references the
position of the I-node in the list. Note that this number is not stored in the I-node itself,
but since the I-nodes are arranged contiguously on the disk, an I-node is easily located
from its number. This value for any file can be found out with the -i option of ls:
● $ ls -il icmcomp

9087 -rw-r--r-- 1 kiran 51813 jan 13 1998 11:30 icmcomp


$
The file icmcomp has the I-node number 9087. No other file in the same file system
can have this number unless the file is removed. In that case, the Kernel will allocate
this number to a new file.

LOGICAL AND PHYSICAL BLOCKS


The block addressing scheme
Addressing a Directory
The Boot Block

The Super Block


Role of super block in file creation

The blocks immediately after the I-node blocks are the data blocks. These blocks contain all
the data held in files and directories. Apart from these direct data blocks, there are
also indirect blocks, which contain the addresses of the direct blocks. The I-node
maintains a list of these indirect block addresses.
Unlike terminals and printers (character devices), which read and write one character at
a time, hard disks, floppy diskettes and tapes (block devices) handle data in chunks or
blocks.
The standard System V block size is 1024 bytes, which is often called a logical block.
Each block has one unique address. Since a block is allocated in its entirety, there can be
considerable wastage: if a file uses only 3 bytes, the remaining 1021 bytes of that block
are wasted.
Apart from the logical block, there is also another unit, the physical block. A physical
block is 512 bytes long, so one logical block comprises two physical blocks.
The block addressing scheme

Even though the I-node is only 64 bytes long, it is sufficient to keep track of the
addresses of even very large files. There are thirteen entries (or addresses) in the
I-node table, containing the addresses of up to thirteen disk blocks. It will be interesting to
see just how these thirteen addresses suffice to locate the data.
The first ten addresses are simple enough. They contain the disk addresses of the first
ten blocks of the file. These blocks may occur either in a contiguous or fragmented
manner on the disk. However, reserving space for ten addresses in the I-node table
doesn’t mean that ten disk blocks are automatically allocated. If a file is only three
blocks long, the first three entries in the table contain the disk block numbers, and the
remaining entries are filled with zeroes.
The complication begins when files exceed ten blocks. Unlike the first ten entries of the
address table, the eleventh entry doesn’t store the address of the eleventh block of the
file. Instead, it has the address of another block that is neither an I-node block nor a
data block, but appropriates space from the data block area. This block is known as a
single indirect block. It contains the addresses of up to a further 256 data
blocks (for a 32-bit machine with a logical block size of 1024 bytes). Using this
additional entry in the I-node table increases the maximum file size to 266 (10 +
256) blocks. It also consumes one disk block for storing the 256 addresses. The
single indirect block thus acts as a second level of addressing for the file.
When you read a file larger than ten blocks (i.e. one requiring indirect block addressing), the
Kernel goes to the corresponding I-node table, notes the file size (that is also in the
table) and the direct block addresses. It then reads the indirect block and all its
associated direct block entries. Finally, it instructs the disk driver to move the heads to
the respective blocks, counts the number of bytes read, matches the count with the file
size, and reads till the two numbers match. This is the way it has to be, because there is
no other way for the Kernel to know that the end of the file has been reached.
Just as the eleventh entry of the address table points to (i.e. contains the address of)
another disk block, the twelfth entry comes into play when the file size exceeds 266 blocks.
It points not to a single, but to a double indirect block, which in turn contains the
addresses of up to 256 single indirect blocks. Each of these single indirect blocks in turn
contains 256 addresses. A double indirect block is thus able to access 65,536 (256
* 256) disk block addresses, and is used for file sizes from 267 KB to about 64 MB. The
maximum size of a file now gets enlarged to 65,802 (65,536 + 266) blocks.
Predictably, the thirteenth and final entry of the address table points to what is called a
triple indirect block. Each of the 256 addresses of this block further points to a double
indirect block, so you can now have up to 256 double indirect blocks, each able to
address 256 single indirect blocks. The maximum file size a UNIX file system can support,
using this triple indirect block, finally becomes 16,843,018 (256 * 256 * 256 + 65,802)
blocks. This implies a phenomenal 17 gigabytes.
ls uses the -s option to list each file in physical blocks. Combine it with the -l and -i
(I-node) options, and you can see as many file attributes as is possible with this
command.
● $ ls -lis icm1

305 98 -rw-r--r-- 1 kumar group 48527 mar 12 14:21 icm1


$
The second column shows the file size in 512-byte blocks. Dividing 48527 by 1024 and
rounding up to the next higher integer (since the file uses a partial block) gives 48 logical
blocks, i.e. 96 physical blocks. The file indeed uses 96 physical blocks to store its data.
But, since the file size exceeds ten logical blocks, the services of a single indirect
block are required to support this file. That adds an additional logical block, i.e. two
physical blocks, which raises the file "consumption" size to 96 + 2 = 98 blocks, as
shown in the output. So, you should remember that the block sizes shown by the df,
du, find and ls commands actually refer to the number of blocks allocated for the file,
and not simply the number storing only the data.
The location information of a file, represented by the array of thirteen pointers, applies
only to non-special files. This information is not easily displayed either, and ordinary users
can afford to remain ignorant of it. However, this area does create problems when I-node
structures become inconsistent, necessitating repair.
Addressing a Directory

Most of the preceding discussion applies equally to directories. A directory entry is a 16-byte


structure, with two bytes reserved for the I-node number and 14 bytes for the file name. If
a directory file is allowed to occupy an entire logical block, then the directory
can house 64 (1024/16) entries. Two of these are occupied by the . and .. directories,
which spring into existence when a directory is created. Since the first ten entries in the
I-node address table point directly to ten disk blocks, indirect blocks are usually never
required: a directory file can thus contain up to 640 file entries using direct blocks alone.
Now that you know the addressing mechanism used by the Kernel for accessing files
and directories, what happens when you issue the command cat icm? The file is
displayed on the standard output, but only after the Kernel has taken the following
steps:
The Kernel must first know the I-node number of the current directory, which is always
maintained in memory.
Using this number, it searches the I-node blocks and locates the I-node for this
directory. It fetches from this I-node the address of the data block that contains the
directory file. From the directory file, the Kernel looks for the file icm and its I-node
number. It then goes back to the I-node blocks and locates the I-node for icm. The
Kernel now reads the disk address entries, and then accesses the file using the disk
block addresses.
This is an enormous amount of work, and it can only slow down file access. Therefore,
for efficient organization of files, you should not let a single directory have more than
62 files.
The UNIX system also maintains an I-node table in memory for every file that is in use.
When a file is opened, its I-node is copied from the hard disk into the system’s
own I-node table. The system also maintains in the super block a partial list of free
I-nodes that can be immediately allocated when files are newly created. When a file is
created, the Kernel does not have to scan the disk; it looks up the list instead. The
efficiency of the system is greatly increased, as the list is always updated, and hence
quite reliable.
The Boot Block:

Preceding the I-node blocks are just two blocks. The first is known as the boot
block, having the number 0. It contains a small bootstrap program, which is loaded
into memory when the system is booted. This program may in turn load another
program from the disk, but eventually it will load the Kernel into memory. The file /unix
is loaded, and the startup operations commence. Note, however, that the bootstrap program
is read in only from the boot block of the root (main) file system. For other file systems this
block is simply kept blank.
The Super Block:

Next comes the block known as the super block (numbered 1) that contains global file
information about disk usage and availability of data blocks and I-nodes, along with a
number of other parameters. The Kernel first reads this area before allocating disk
blocks and I-nodes for new files. Information recorded on super blocks should,
therefore, be correct for the healthy operation of the system. This is mainly what it
contains:
● The size of the file system.

● The length of a disk block.

● The number of free data blocks available.

● A partial list of immediately allocable free data blocks.

● Number of free I-nodes available.

● A partial list of immediately usable I-nodes.

● Last time of updating.

While one disk block is kept reserved for storing this information, a copy of the super
block is also kept in memory. The Kernel reads and writes the copy in memory when
controlling allocation of I-nodes and data blocks. From time to time, it updates the disk
copy of the super block with the memory copy. The disk copy can, therefore, never be
newer than the corresponding copy in memory.
Role of super block in file creation:

Let’s see the importance of the super block in file creation. Since the super block maintains a
list of free I-node numbers, the Kernel takes the next I-node number from the list and assigns
it to the file. The disk copy of the I-node is then read into memory, initialized and
written back to disk. The Kernel then decrements the free I-node count, which is also
maintained in the super block. When it finds the free I-node list empty, it immediately
scans the I-node blocks on the disk, and fills the super block array with a fresh list.
Now, the disk blocks have to be allocated. Unlike the I-nodes, a complete list of free
data blocks is kept in a separate area of the disk. This complete list is initially created
during the creation of the file system itself with the mkfs command.
From this "address book", the super block maintains a working list of those free data
blocks that are available for allocation. The mechanism of allocation is similar to that for
I-nodes, with one notable difference. When the Kernel finds the free block list empty,
it immediately reads this address book, and not the disk blocks directly, and fills the
super block array. This is because a disk block itself doesn’t contain any mark to
indicate whether it is free or not, unlike the I-node, whose free status is determined by
the status of the file type field. The machine must be powered down by going through a
formal routine, because the information on the file system needs to be written to disk before
the power is turned off; the system checks this information during booting. Since
the Kernel works with the memory copy of the super block rather than the disk copy, the
disk super block needs to be updated periodically. This ensures that it is nearly as
recent as its copy in memory. If that is not done, there may be serious damage caused
to the file system.
CREATING A FILE SYSTEM


The mkfs command
The Hard Disk

Engaging and Disengaging File Systems


File system checking with fsck
Process Termination
Environment List

Memory layout of a C program

The mkfs command:

The data blocks, I-nodes, the super block and the boot block together describe the
structure of a single file system. But depending on the number of physical disks that
the machine has, there can be multiple file systems; sometimes, a single disk can
also be partitioned into two or more file systems.
Data which can be grouped into a category, and which changes frequently, should have a
file system of its own. Thus, it is quite common to have the /usr directory in a separate
file system. This reduces disk fragmentation a great deal, as the data will normally be
found clustered together.
Further, if a file system is to be backed up, then its size should not be more than the
capacity of the backup device. Multi-volume tapes are usually not supported by the
UNIX backup commands, so make sure that the file system size is smaller than the
capacity of the tape.
This organization of directories into multiple file systems is normally done at the time of
installing the UNIX operating system. Partitioning a disk into several file systems has
distinct advantages. It makes the system administrator’s task easier, because some of
his tools act and report on each file system separately.
Only the system administrator, who uses a different login name to perform his duties,
can use the commands featured in this section.
Using a Floppy Diskette to Create a File System
While you can have multiple file systems on fixed disks, you can also use a floppy
diskette to create a file system on it. It is just as well that we should choose the floppy
diskette as the device for illustrating the various file systems commands since we
access the floppy much more directly than we access the hard disk.
To create a file system on a diskette, it needs to be first formatted with the format
command available in all systems. After that is done, the mkfs (make file system)
command can be used to create a file system on the device that follows the command.
When used in this way:
● $ mkfs /dev/rdsk/foq15dt 2400

it creates a file system on the raw floppy device (1.2 MB). Though diskettes are block
devices (i.e. they read and write in blocks), they have a character mode as well. The
character mode files of floppy devices are stored in the /dev/rdsk directory. The raw
device has to be specified here because mkfs erases the contents of the disk and creates a
directory table instead. The other argument specifies the number of blocks to be used
for the new file system.
Typically, a 1.2 MB floppy contains 1,228,800 / 512 = 2400 physical blocks, the figure that
was specified. Optionally, you can also specify the number of I-nodes that you need to
allocate on the diskette. When it is omitted, the system assumes a default value, which is
usually one fourth of the number of logical blocks (i.e. one eighth of the number of
physical blocks). This figure, however, varies across systems. You can override this
default:
● $ mkfs /dev/rdsk/foq15dt 2400:250

The two parameters (2400 and 250) are separated by a colon (:). Note that there should be
no white space on either side of this delimiter. This specification indicates that there can
be a maximum of 250 files on the diskette, using a maximum of 2400 blocks. The
diskette fills up when either figure is reached first.
mkfs initializes the file system so that it has an empty root directory.
mkfs optionally accepts two important parameters - the rotational gap, and the number
of blocks/cylinder. These relate to the organization of the hard disk.
The Hard Disk

The system administrator should understand the significance of a couple of parameters


related to the hard disk. This knowledge is necessary when you create a file system on
a disk with the mkfs command.
Every disk contains one or more platters, each of which has two surfaces. There is a
magnetic head for reading and writing each surface. So, if there are eight usable
surfaces, they will require eight heads. The heads move in tandem, and it is not
possible to control the movement of a head individually.
Each surface is composed of a number of concentric tracks, numbered 0, 1, 2
and so on. Since a disk may have more than one platter (i.e. more than two surfaces),
there will be as many tracks bearing the same track number as there are surfaces. You can
then visualize a cylinder comprising all tracks bearing the same number on each disk surface.
So, there will be as many cylinders in the disk as there are tracks on each
usable surface. The disk heads move radially from track to track, and a head is above a
single track at any point of time.
Each track is further broken up into sectors or blocks. A physical block normally stores
512 bytes of data. The disk spins constantly (normally at 3600 r.p.m.), and when the
head is positioned above a particular track, all the blocks of the track pass under the
head in a very short time.


Now, when a block of data (say, block number 2) is read, it is transferred to the buffer
of the disk controller. This takes some time, and before the head starts reading again, a
number of blocks of the track would have passed under it already. If the
blocks were numbered sequentially, the disk would have to make a complete turn before block
number 3 came up for reading. Such a simple numbering scheme would greatly
increase read/write times, so it is obvious that the layout scheme has to be
different.
This bypassing of sectors is known as interleaving, and the number of blocks skipped is
known as the rotational gap. The higher the transfer rate, the lower the gap. By skipping
blocks, the possibility of the disk making a full turn to make the next block available is
greatly reduced.
When you use the mkfs command to create a file system, you may need to specify this
gap as well as the number of blocks/cylinder. The latter is actually the number of usable
surfaces multiplied by the number of blocks per track.
Engaging and Disengaging File Systems.

The mount and umount commands:


Once a file system has been created, it just sits in a stand-alone mode with an empty root
directory: the main file system (i.e., the root file system in the root partition) doesn’t
even know of its existence. Moreover, you can neither "cd" to the root directory of the new
file system, nor can you access its files. The attachment of the root
directory of the new file system to an empty directory (usually /mnt) in the main file
system is called mounting, and is achieved with the mount command. The
point at which this linkage takes place is called the mount point.
Remember that the objective is to have free access to the floppy so that you can
consider it, for all practical purposes, as a subdirectory of /usr/kumar. For instance, if
you want the diskette to appear as the sub-directory /usr/kumar/safe, you first have to
create this sub-directory in the main file system:
● $ mkdir safe
$ cd safe
$ pwd
/usr/kumar/safe
Once you have done that, you have to mount (i.e., engage) the secondary file system
at this point, indicated by the directory /usr/kumar/safe. You need to specify the
absolute pathname of this directory as an argument to mount, to indicate to the Kernel
that it is this directory under which the file system is to be mounted:
● $ mount /dev/dsk/foq15dt /usr/kumar/safe
$-
After this device is mounted, the root directory of the file system created on the floppy
loses its separate identity. It now becomes the directory /usr/kumar/safe, and is made
to appear as if it is part of the main file system. The main file system is thus logically
extended with the incorporation of the secondary file system on the diskette.
You can use this directory just like any other in the main file system, except that you
can’t link files across file systems. The device can now be considered mounted,
and you can treat it as a secondary hard disk, where you can copy your files or even
"cd" to it. To make sure, try copying the script files to the subdirectory safe:
● $ cp *.sh safe
$-
When you issue this command, the light on the floppy drive glows, indicating that the
files are being copied onto the floppy. Even though this directory has been created on
the main system, these files won’t be copied there. Try changing your directory to safe,
and then issuing ls. You will see the floppy drive lighting up again. You have thus
managed to create a backup of your files on the floppy. However, that doesn’t prove yet
that the sub-directory safe in the hard disk doesn’t also have these files.
To verify that nothing has actually been copied to the directory safe in the main
system, you must first unmount the file system you just created on the diskette. Make
sure that no one is using the device, and then empty the mount point. This is achieved
with the umount command, which takes only the device name as the argument:
● $ /etc/umount /dev/dsk/foq15dt
$ ls safe
$-
There are no files in the directory, implying that all the Shell script files have been
copied onto the floppy diskette. It is worth noting that the sub-directory safe could be
used as a mount point only because it was empty.
There is one important rule that needs to be observed when you issue this command:
you must be positioned in a directory which is not part of the file system you are trying
to unmount. For instance, you can't "cd" to safe and then issue umount from there.
Since the file system has been mounted at that point, it is better to "cd" to the home
directory, and then issue the command. Now the floppy can be removed and stored in a
safe place. Remember to mount it again when you use it.
Many installations have the /usr/man and /usr/spool directories as separate file
systems. These file systems are mounted at start up. After the attachment process is
complete, the separate identity of the root directories of all these systems disappears.
These attached file systems are then made to look as one to the user.
File system checking with fsck

The file system is one of the most sensitive components of the UNIX system. It
maintains all information pertaining to files at a number of places, with suitable links
between them, so that consistency is maintained. However, if the system is not shut
down properly, or power failure causes a system crash, inconsistencies tend to crop up
in the information maintained at these places. These relate mainly to the in-memory
copies of the superblock and i-nodes and their disk counterparts. Several things can
happen to cause these mismatches. Some of the more common ones are listed below.
● Two i-nodes can claim the same disk block. When a file is deleted, the Kernel frees
its data blocks and returns them to the free list maintained in the in-memory copy
of the superblock. It can then assign one of these disk blocks for use by another
file. If the power goes off after the Kernel writes the data and i-node of the new
file to disk, but before the old i-node has been updated, there will be two
claimants of the same disk block.
● A block may neither be in the free list nor be used by an i-node. This can easily
happen if the i-node of a removed file is written to disk just before the system
crashes and the superblock is left unwritten to disk. You now have an outdated
superblock, not having the block in the free list, and the latest copy of the i-node
not showing the disk block number in the list.
● An i-node is neither in the free i-node list nor in any directory. If the system
crashes after the file is created, but before updating the directory entry, the file is
then effectively not available in the file system.
● An allocated file doesn’t have at least one directory entry. An i-node must have as
many directory entries as there are links, and this correspondence may get disturbed.
There can also be unreferenced directories, or i-nodes with incorrect link counts.
● The i-node format could be garbled. You can have a bad block number, which is
out of range. There may be a mismatch between the number of blocks used by the
i-node and the file size. The file type may also be invalid. The i-node may show a
zero link count, and yet otherwise have a valid format.
● You can have a totally garbled directory. Since each entry in a directory uses 16
bytes, the size of the directory must normally be a multiple of 16. This is,
however, not always the case, leading to directory misalignment. Moreover,
filenames may contain an embedded slash, or there may be i-node numbers in a
directory which are out of range.
One of the most important functions of the system administrator is the maintenance of
the integrity of the file system. He should make use of the services of the fsck command to
achieve this task. The command takes the name of the file system as the argument, but
can be invoked without any argument as well.
● fsck

** Phase 1- Check Blocks and Sizes


** Phase 2- Check Pathnames
** Phase 3- Check Connectivity
** Phase 4- Check Reference Counts
** Phase 5- Check Free List
When invoked without any arguments, the command checks all the file systems specified in
the file /etc/checklist. This file usually contains a list of all the active file systems. The
output that you see above is normal when the file system is consistent. However, when
the file system is corrupted, you will see plenty of messages on the system console,
along with questions which you have to answer correctly.
● Phase1: The first phase scrutinizes the file system’s i-node list. It checks the
i-node types, their format and the block numbers for bad and duplicate blocks. It
also detects variations between a file size and the number of blocks used by its
i-node. It declares a block "BAD" if the block number is out of range and "DUP" if it

is claimed by another i-node. Apart from i-nodes, fsck also checks whether all
directory file sizes are exact multiples of 16.
● Phase 1B: This phase is normally not seen in fsck output, but once it detects a
single duplicate block, it rescans the disk for more duplicates.
● Phase 2: This phase checks all the directory entries, starting from root, for "OUT
OF RANGE" i-node numbers. If bad or duplicate blocks were detected in Phase 1 or
1B, then fsck prompts for their removal in this phase.
● Phase 3: Proper connectivity requires that each I-node has at least one directory
entry, and this is checked in the third phase. Unreferenced directories are reported
as error messages.
● Phase 4: Here, fsck checks the link count as stored in the I-node with the
directory entries from the information generated in the second and third phases.
Improper link counts may now be fixed.
● Phase 5: Finally, fsck’s free block count is compared with the figure maintained in
the superblock. The free block list is checked for bad and duplicate blocks, as well
as for unused blocks, which should have been in this list but are not. A salvage
operation may be carried out, with the user’s approval, which replaces the
erroneous free block list with a newly computed one.
● Phase 6: The last salvage operation in Phase 5 leads to this phase.
Normally, fsck operates by simply displaying the error messages encountered in each
phase. When used with the appropriate options, it prompts the administrator before repairing
any damage that it has detected; when run in rectification mode, it automatically
proceeds with the repair of every damaged file. Sometimes, after fsck rebuilds the file
system, the in-memory copies of the superblock and other tables may contain old
and inaccurate information, the information on the disk being more recent than the
memory copy. It may then flash the following message:
******** BOOT UNIX (NO SYNC!) **********
This means that if the administrator uses sync or /etc/shutdown to write the
incorrect superblock to disk, the entire good work done by fsck would be lost. This happens
when the root file system contained a serious problem; he should then
immediately press the reset button and reboot the system.
Setting maximum file size with ulimit
There should be a restriction on the maximum file size that a user is permitted to create.
The ulimit statement of the Shell sets the maximum size of files. It is a Shell built-in, and
when invoked without an argument, it simply displays the maximum size currently set:
● $ ulimit

2097151
$
This indicates that the maximum file size is restricted to 2097151 physical blocks; the
default ulimit is set inside the Kernel. An ordinary user can also issue this command
with an argument, but only to reduce the default value.
link function
The link function is used to create a link to an existing file.
Syntax:
#include <unistd.h>
int link(const char *existingpath, const char *newpath);
Returns 0 if OK, -1 on error.
This function creates a new directory entry, newpath, that references the existing file
existingpath. If newpath already exists, an error is returned.
unlink function
This function removes an existing directory entry.
Syntax:
#include <unistd.h>
int unlink(const char *pathname);
Returns 0 if OK, -1 on error.
This function removes the directory entry and decrements the link count of the file
referenced by pathname. If there are other links to the file, the data in the file is still
accessible through other links. The file is not changed if an error occurs.
Note: To unlink a file we must have write and execute permissions in the directory
containing the directory entry, since it is the directory entry that we may be removing.
Only when the link count reaches 0 can the contents of the file be deleted.
remove function
A file or directory can be unlinked using the remove function. For a file, remove is
identical to unlink. For a directory, remove is identical to rmdir.
Syntax:
#include <stdio.h>
int remove(const char *pathname);
Returns 0 if OK, -1 on error.
rename function
A file or directory is renamed with the rename function.
Syntax:
#include <stdio.h>
int rename(const char *oldname, const char *newname);
Returns 0 if OK, -1 on error.


ENVIRONMENT OF A UNIX PROCESS
main function
A C program starts execution with a function called main. The prototype for the main
function is
int main(int argc, char *argv[]);
argc is the number of command line arguments and argv is an array of pointers to the
arguments.
When a C program is started by the Kernel, a special startup routine is called before the
main function is called. The executable program file specifies this startup routine as the
starting address for the program. This is set up by the link editor when it is invoked by
the C compiler, usually cc. This startup routine takes values from the Kernel and sets
things up so that the main function is called.
Process Termination

There are five ways for a process to terminate:


1. Normal termination:
a) return from main
b) calling exit
c) calling _exit
2. Abnormal termination:
a) calling abort
b) terminated by a signal
The startup routine is written such that if the main function returns, the exit function is
called.
exit and _exit functions
Two functions terminate a program normally: _exit which returns to the Kernel
immediately, and exit, which performs certain cleanup processing and then returns to
the Kernel.
#include <stdlib.h>
void exit(int status);
#include <unistd.h>
void _exit(int status);
Both the exit and _exit functions expect a single integer argument, which is called the
exit status. Most UNIX Shells provide a way to examine the exit status of a process. If
either of these functions is called without an exit status, main does a return without a
return value, or main ‘falls off the end’ (an implicit return), the exit status is undefined.
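Most UNIX Shells record this exit status in the special parameter $?. A small illustrative Shell sketch:

```shell
# true exits with status 0, false with status 1
true
echo $?     # prints 0
false
echo $?     # prints 1
```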
Command-Line Arguments
When a program is executed, the process that does the execution can pass
command-line arguments to the new program.


● Example:

The following program echoes all command line arguments to standard output.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
int i;
/* echo all command line arguments */
for (i = 0; i < argc; i++)
printf("argv[%d]: %s\n", i, argv[i]);
exit(0);
}
If the above program is compiled and the executable is named echoarg, then:
$ ./echoarg arg1 TEST foo
argv[0]: ./echoarg
argv[1]: arg1
argv[2]: TEST
argv[3]: foo
Environment List

Each program is also passed an environment list. Like the argument list, the
environment list is an array of character pointers, with each pointer containing the
address of a null-terminated C string. The address of the array of pointers is contained
in the global variable environ.
extern char **environ;
For example, if the environment consisted of five strings, it could look like this:
Environment consisting of five C character strings

In the above figure, the null bytes have been shown explicitly at the end of each string.
Memory layout of a C program

A C program in general comprises the following pieces:


● Text segment: These are the machine instructions that are executed by the CPU.

● Initialized data segment: This is usually called the data segment and it contains
variables that are specifically initialized in the program.
● Uninitialized data segment: This segment is often called the "bss" segment.
Data in this segment is initialized by the Kernel to arithmetic 0 or null pointers
before the program starts executing.
● Stack: This is where automatic variables are stored, along with information that is
saved each time a function is called. Each time a function is called, the address of
where to return to, and certain information about the caller’s environment is saved
on the stack. The newly called function then allocates room on the stack for its
automatic and temporary variables.
● Heap: Dynamic memory allocation usually takes place on the heap.

The following figure shows the typical arrangement of these segments:


Typical memory arrangement


Memory Allocation
There are three functions in ANSI C for memory allocation:
● malloc: Allocates a specified number of bytes of memory. The initial value of
memory is indeterminate.
● calloc: Allocates space for a specified number of objects of a specified size. The
space is initialized to all 0 bits.
● realloc: Changes the size of a previously allocated area.

#include <stdlib.h>
void *malloc(size_t size);
void *calloc(size_t nobj, size_t size);
void *realloc(void *ptr, size_t newsize);
All three return a non-null pointer if OK, NULL on error.
void free(void *ptr);
The free function releases a previously allocated area back to the pool of available
memory.
STANDARD I/O LIBRARY


Standard Input, Standard Output, and Standard Error
Character at a time I/O

Line at a time I/O

Direct I/O
Formatted I/O
Pipes
A Unix pipe

The standard I/O library handles details like buffer allocation and performing I/O in
optimal-sized chunks, obviating the need to worry about using the correct block size.
When a file is opened or created with the standard I/O library, it may be considered as
if a stream is associated with that file.
When a stream is opened, the standard I/O function fopen returns a pointer to a FILE
object. The object is normally a structure that contains all the information required by
the standard I/O library to manage the stream: the file descriptor used for actual I/O, a
pointer to a buffer for the stream, the size of the buffer, a count of the number of
characters currently in the buffer, an error flag, and the like. To reference the stream,
its file pointer is passed as an argument to each standard I/O function.
Standard Input, Standard Output, and Standard Error

Three streams are predefined and are automatically available to a process:
● standard input
● standard output
● standard error
These refer to the same files as the file descriptors STDIN_FILENO, STDOUT_FILENO
and STDERR_FILENO. These three standard I/O streams are referenced through the
predefined file pointers stdin, stdout and stderr. These three file pointers are defined
in the <stdio.h> header.
Opening a Stream
The following three functions open a standard I/O stream:
#include <stdio.h>
FILE *fopen(const char *pathname, const char *type);
FILE *freopen(const char *pathname, const char *type, FILE *fp);
FILE *fdopen(int filedes, const char *type);
All three return file pointers if OK, NULL on error.
● fopen opens a file.
● freopen opens a specified file on a specified stream, closing the stream first, if it is
already open.
● fdopen takes an existing file descriptor and associates a standard I/O stream with
the descriptor.
The following are the different arguments for the type argument:
Type Description
r or rb Open for reading.
w or wb Truncate to 0 length or create for writing.
a or ab Append; open for writing at end of file, or create for writing.
r+ or r+b or rb+ Open for reading and writing.
w+ or w+b or wb+ Truncate to 0 length or create for reading and writing.
a+ or a+b or ab+ Open or create for reading and writing at end of file.

An open stream can be closed by calling fclose. The syntax is:
#include <stdio.h>
int fclose(FILE *fp);
Returns 0 if OK, EOF on error.


Reading and Writing a stream
Once a stream is opened, the following three types of functions are used to read or
write into the stream:
● Character at a time I/O.
● Line at a time I/O.
● Direct I/O.
Character at a time I/O

Input functions:
The following three functions are used to read one character at a time:
#include <stdio.h>
int getc(FILE *fp);
int fgetc(FILE *fp);
int getchar(void);
All three return the next character if OK, EOF on end of file or error.
The difference between the first two functions is that getc can be implemented as a
macro, while fgetc cannot. This means the following three things:
● The argument to getc should not be an expression with side effects.
● Since fgetc is guaranteed to be a function, the address of fgetc can be passed
as an argument to another function.
● Calls to fgetc typically take longer than calls to getc.

These three functions return the next character as an unsigned char converted to an
int.
These functions return the same value whether an error occurs or an end of file is
reached. To distinguish between the two, ferror or feof should be called.
#include <stdio.h>
int ferror(FILE *fp);
int feof(FILE *fp);
Both return nonzero if the condition is true, 0 otherwise.
void clearerr(FILE *fp);
clearerr clears both the error and end-of-file flags for the stream.
Output functions
The following are the output functions. Each correspond to their respective input
functions:
#include <stdio.h>
int putc(int c, FILE *fp);
int fputc(int c, FILE *fp);
int putchar(int c);
All three return c if OK, EOF on error
● Example:

Usage of getc and putc. The following example copies standard input to standard
output using getc and putc.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int c;
while ((c = getc(stdin)) != EOF)
if (putc(c, stdout) == EOF)
printf("output error");
if (ferror(stdin))
printf("input error");
exit(0);
}
Line at a time I/O

Input functions:
Line at a time input is provided by the following two functions:
#include <stdio.h>
char *fgets(char *buf, int n, FILE *fp);
char *gets(char *buf);
Both return buf if OK, NULL on end of file or error.
gets reads from standard input, while fgets reads from the specified stream. Since
gets cannot check for buffer overflow, it should be avoided in favour of fgets.
Output functions:
Line at time output is provided by fputs and puts.
#include <stdio.h>
int fputs(const char *str, FILE *fp);
int puts(const char *str);
Both return a nonnegative value if OK, EOF on error.
Example:
Usage of fgets and fputs. The following program copies standard input to standard
output using fgets and fputs.
#include <stdio.h>
#include <stdlib.h>
#define MAXLINE 4096
int main(void)
{
char buf[MAXLINE];
while (fgets(buf, MAXLINE, stdin) != NULL)
if (fputs(buf, stdout) == EOF)
printf("output error");
if (ferror(stdin))
printf("input error");
exit(0);
}
Direct I/O

This is also known as binary I/O.


Input and Output functions:
The following two functions are provided for binary I/O:
#include <stdio.h>
size_t fread(void *ptr, size_t size, size_t nobj, FILE *fp);
size_t fwrite(const void *ptr, size_t size, size_t nobj, FILE *fp);
Both return the number of objects read or written.
There are two common uses of these functions:
● Reading or writing a binary array.
For example, to write elements 2 through 5 of a floating point array:
float data[10];
if (fwrite(&data[2], sizeof(float), 4, fp) != 4)
printf("fwrite error");
Here size is specified as the size of each element of the array, and nobj as the number
of elements.
● Reading or writing a structure.
For example:
struct {
short count;
long total;
char name[NAMESIZE];
}item;
if (fwrite(&item, sizeof(item), 1, fp) != 1)
printf("fwrite error");
Here, size is specified as the size of the structure and nobj as 1.
Positioning a stream
There are two ways to position a standard I/O stream: ftell and fseek. These two
functions assume that a file’s position can be stored in a long integer.
#include <stdio.h>
long ftell(FILE *fp);
Returns current file position indicator if OK, -1L on error.
int fseek(FILE *fp, long offset, int whence);
Returns 0 if OK, nonzero on error.
void rewind(FILE *fp);
For a binary file, a file's position indicator is measured in bytes from the beginning of
the file. The value returned by ftell for a binary file is this byte position. To position a
binary file using fseek, a byte offset should be specified. The values of whence are
SEEK_SET, which means from the beginning of the file; SEEK_CUR, which means from
the current file position; and SEEK_END, which means from the end of the file.
The ANSI C standard introduced two further positioning functions, fgetpos and
fsetpos. These two functions introduce a new abstract datatype, fpos_t, that
records a file position. This datatype can be made as big as necessary to record a file's
position.
Portable applications that need to move to non-UNIX systems should use fgetpos and
fsetpos.
#include <stdio.h>
int fgetpos(FILE *fp, fpos_t *pos);
int fsetpos(FILE *fp, const fpos_t *pos);
Both return 0 if OK, nonzero on error.


fgetpos stores the current value of the file's position indicator in the object pointed to
by pos. This value can be used in a later call to fsetpos to reposition the stream to that
location.
Formatted I/O
Formatted Output
Formatted output is handled by the three printf functions.
#include <stdio.h>
int printf(const char *format, …);
int fprintf(FILE *fp, const char *format, …);
Both return the number of characters output if OK, a negative value on output error.
int sprintf(char *buf, const char *format, …);
Returns the number of characters stored in the array.
printf writes to standard output, fprintf writes to the specified stream, and sprintf places
the formatted characters in the array buf. sprintf automatically appends a null byte at
the end of the array, but this null byte is not included in the return value.
Formatted Input
Formatted input is handled by the following three scanf functions:
#include <stdio.h>
int scanf(const char *format, …);
int fscanf(FILE *fp, const char *format, …);
int sscanf(const char *buf, const char *format, …);
All three return number of input items assigned, EOF if input error or end of file before
any conversion.
INTERPROCESS COMMUNICATION
As we already know, UNIX is a multi-user environment. This means that at any point
in time, there will be a number of processes concurrently active. This gives rise to a
need for processes to communicate with each other. In fact, the design of the UNIX
operating system is such that no process needs to remain in isolation. There is always a
need for one process to be able to inform another process of the occurrence of an event
or to pass on data for processing.
Here, we discuss two forms of interprocess communication. They are -
pipes (also called unnamed pipes) and FIFOs (First In, First Out - also called
named pipes).
Pipes

You know that a new process is created using the fork() system call, wherein a process
spawns a child process. UNIX has historically provided a means for the parent process
and its child to communicate through the use of pipes. Pipes are the simplest form of
IPC and exist on all UNIX platforms. An unnamed pipe is essentially a gateway between
a parent process and its child through which they can exchange data.

The concept of Pipe


As stated earlier, a pipe is a means of transmitting the output of one program as input
to another program. Suppose, for example, that you sort a file of employee data with
the sort command. If you need to extract only a few lines of this output, you can send
it to the grep utility, which will extract the required lines and display them. You would
do this using a pipe:
sort empdata | grep "a01"
The process of piping output from one program as input to another can be visualized as
shown below:
The figure shows Process A talking to Process B. But programs talk to each other
through the medium of files. This implies that the pipe shown in the figure is a file.
Process A writes its output into this file, and Process B reads the file. This file which acts
as the medium of communication is a special kind of file.
Creating an unnamed pipe - The pipe() System Call
The pipe() call is used to create a pipe. The syntax of this call is as follows:
#include <unistd.h>
int pipe(int fd[2]);
Returns 0 if OK, -1 on error.
The system call opens a special file (called an unnamed pipe) twice - once for reading,
and once for writing. This is because the pipe is going to be both read from and written
into. It returns the two file descriptors through the integer array sent to it as a
parameter. The first, fd[0], is opened for reading, and the second, fd[1], is opened
for writing.
Data in the pipe flows through the Kernel. The Kernel takes care of opening the pipe,
which is physically located within the Kernel. When the processes terminate, the Kernel
deletes the pipe.
Now a pipe may be viewed as shown in the following figure:

A Unix pipe

Normally, the process that calls pipe then calls fork, creating an IPC channel from the
parent to the child or vice versa. A point to be kept in mind about a pipe is that it is
half-duplex in nature - that is, data can flow only in one direction at a time; either from
parent to child or vice versa. For a pipe from parent to child, the parent closes the read
end of the pipe and writes into the write end. The child closes the write end and reads
from the read end. Data can now flow from parent to child. For a pipe in the reverse
direction, the parent closes the write end, and the child closes the read end.
The following figure shows a half duplex pipe after a fork call:
Half duplex pipe after a fork


The following rules must be kept in mind when one end of a pipe has been closed:
● If a read is executed on a pipe whose write end has been closed, read will return
a 0 to indicate an EOF.
● If a write is executed on a pipe whose read end has been closed, the signal
SIGPIPE is generated. If the signal is either ignored or caught, write will
return an error with errno set to EPIPE.
Manipulating a Pipe - The popen() and pclose() Functions
The pipe() system call is a rather complicated call to implement. This is because of the
implementation issue involved - the pipe() call, the fork() call, the closing of the
appropriate file descriptors and so on. The standard library offers two front-end
functions that take care of these details - the popen() and pclose() functions which
create and delete pipes. The syntax of popen() and pclose() is:
FILE *popen(const char *cmdstring, const char *type);
This function is very similar to the fopen() function. It returns a pointer to type FILE.
type can be either "r" for read or "w" for write to indicate the direction of the pipe. The
function differs from fopen() mainly in that the first parameter, cmdstring, is an
executable command. Consider the following lines of code.
executable command. Consider the following lines of code.
FILE *fp;
fp = popen("ls *.c", "r");


The command string, ls *.c will be executed - through a fork() and exec() - by
popen(), and the output of this command will be made available as input to the
process that issues the call. Since popen() returns a FILE pointer, an fgets() can be
used to access this input. Any executable file can be specified in place of this command
string. If the type is "w", then the command string must be some program that expects
standard input.
The syntax of pclose() is:
int pclose(FILE *fp);
pclose() returns the termination status of the command, or -1 on error.
Let us look at where you can use a pipe. Assume that you have to write a program
called get_data that has to accept input in lower-case from the user. Since you can
never be sure that the user is going to key his input in lower-case, you have to take
care of this detail. A program called utol exists that converts string into lower-case.
Example: Case conversion program - utol
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int c;
while ((c = getchar()) != EOF)
{
if ((c >= 'A') && (c <= 'Z'))
c = c + 32;
if (putchar(c) == EOF)
{
perror("utol");
exit(1);
}
if (c == '\n')
fflush(stdout);
}
exit(0);
}
Example: Implementation of get_data using popen()
#include <stdio.h>
#include <stdlib.h>
#define BUFSIZE 256 /* assumed buffer size */
int main(void)
{
char line[BUFSIZE];
FILE *fpin;
if ((fpin = popen("utol", "r")) == NULL)
{
perror("popen");
exit(1);
}
while (1)
{
fputs("input:", stdout);
fflush(stdout);
if (fgets(line, BUFSIZE, fpin) == NULL)
break;
}
pclose(fpin);
exit(0);
}
In get_data, the call to popen() causes the program utol to be executed. The output
of utol is read in as input to get_data. The program utol, therefore acts as a filter
between the standard input and the program get_data. It takes input from the user,
converts all upper-case characters to lower-case, and passes on this input to get_data.
Limitations of Pipes:
● They are half-duplex in nature - data flows only in one direction at a time.

● They can only be used between directly related processes. That is, the processes
must have a parent-child relationship. A pipe is created by a process that then
calls fork(), and the pipe is used between parent and child.

FIFOs AND MESSAGE QUEUES, SEMAPHORES


FIFOs - Named Pipes

Creating a FIFO

Client-Server communication using a FIFO


Message Queues
Semaphores

Shared memory
FIFOs - Named Pipes

Unnamed pipes can only be used between processes with a common ancestor. It is not
possible for unrelated processes to communicate using pipes. In situations where
unrelated processes need to communicate, FIFOs (also called named pipes) are used.
When you use pipes, the pipe device created is not visible to you - it is created in the
Kernel, and the Kernel takes care of creating it and deleting it when it is no longer
needed.
With FIFOs, however, you will be actually creating the special file required, setting its
permissions, deleting it when it is no longer required, and so on. The file you create will
have a 'p' in the first column of ls -l output to indicate that it is a named pipe.
Creating a FIFO
The mknod() System Call
A FIFO is created using the mknod() system call. The syntax of this call is as follows:
int mknod(char *path, int mode, int dev);
The mknod() system call is similar to the open() system call in both function and
format. The difference is that while open() is used to create ordinary files, mknod()
is used to create special files.
path is the name of the file to be created, and mode specifies the type of the file to be
created; for a FIFO, the dev argument is not used and should be 0. The permissions are
set in the same way as in open(), but instead of specifying
O_RDONLY/O_WRONLY/O_RDWR, the symbolic constant S_IFIFO must be used to
specify that the file to be created is to be a FIFO. Another difference between the two
system calls is that while open() creates the file and opens it in the desired mode,
mknod() creates the file but does not open it.
Once the file has been created using mknod(), the usual file I/O calls can be used
to manipulate the file. If mknod() is successful, a value 0 is returned; otherwise -1 is
returned and errno is set to indicate the error. The associated symbolic constants that
errno may be set to are tabulated below.
Symbolic constant Interpretation
ENOTDIR A component of the path is not a directory.
ENOENT A directory named in the path does not exist.
EEXIST The named file already exists.
EFAULT Invalid path.
EINTR A signal interrupted the system call.

Manipulating a FIFO
Once a FIFO has been created using mknod(), it can be opened using open() and
manipulated like any ordinary file - except that the lseek() function cannot be used, as
pipes, both named and unnamed, do not support random access.
We conclude by examining how pipes, which are nothing but special files, are different
from regular files. We said that a process writes data into this file, which is read by
another process. Once this data is read, it is removed from the pipe. Unlike regular
files, which are usually used as a repository of data, a pipe is simply used as a conduit
through which data passes in transit.
Client-Server communication using a FIFO:

One important use of FIFOs is to send data between a client and a server. If there is a
server contacted by several clients, each client can write its request to a well-known
FIFO, i.e. the pathname of the FIFO is known to all the clients that need to contact it.
The following figure describes this concept:

Client sending requests to a server using a FIFO


Client - Server communication using FIFOs


Message Queues

The passing of messages between processes is possible in the UNIX system. In system
V implementation of messages, all messages are stored in the Kernel, and have an
associated message queue identifier. This identifier which is called an msqid,
identifies a particular queue of messages.
Every message on a queue has the following attributes:
● A long integer type.
● The length of the data portion of the message (can be zero).
● The data itself (if the length is greater than zero).
A new message queue is created, or an existing message queue is accessed, with the
msgget system call.
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
int msgget(key_t key, int msgflag);
The msgflag value is a combination of the constants shown in the following table:
Numeric Symbolic Description
0400 MSG_R Read by owner
0200 MSG_W Write by owner
0040 MSG_R>>3 Read by group
0020 MSG_W>>3 Write by group
0004 MSG_R>>6 Read by world
0002 MSG_W>>6 Write by world
IPC_CREAT Create the queue if the key does not already exist
IPC_EXCL With IPC_CREAT, fail if the key already exists
The value returned by msgget is a message queue identifier, msqid, or -1 if an error
occurred.
Message Queue Limits
There are certain limits on message queues. The system administrator can change
some of these by configuring a new Kernel. For example, if we write an application
that uses messages that are 20000 bytes long, it will not be portable. The magic
number of 8192 bytes for a maximum message size is similar to the magic number of
4096 bytes for a maximum write to a pipe or FIFO.
SEMAPHORES
Semaphores

Semaphores are a synchronization primitive. As a form of IPC, they are not used for
exchanging large amounts of data, as are pipes, FIFOs and message queues, but are
intended to let multiple processes synchronize their operations. This section
provides a general overview of the semaphore system calls. Consider a semaphore as
an integer-valued variable that is a resource counter. The value of the variable at any
point in time is the number of resource units available. If we have one resource, say a
file that is shared, then the valid semaphore values are zero and one. Since our use of
semaphores is to provide resource synchronization between different processes, the
actual semaphore value must be stored in the Kernel.
To obtain a resource that is controlled by a semaphore, a process needs to test its
current value, and if the current value is greater than zero, decrement the value by
one. If the current value is zero, the process must wait until the value is greater than
zero. To release a resource that is controlled by a semaphore, a process increments the
semaphore's value. If some other process has been waiting for the semaphore value to
become greater than zero, that other process can now obtain the semaphore.
In pseudocode, the wait operation looks like this (the test and the decrement must be
performed atomically by the Kernel):
for ( ; ; )
{
if (semaphore_value > 0)
{
semaphore_value--;
break;
}
}
What we have described is a single binary semaphore - a semaphore with a single
value that can be either zero or one. The System V implementation expands this in
two directions:
● A semaphore is not a single value but a set of non-negative integer values. The
number of values in the set can be from one to some system-defined maximum.
● Each value in the set is not restricted to zero and one. Instead, each value can
assume any non-negative value, up to a system-defined maximum value.
For every set of semaphores in the system, the Kernel maintains the following structure
of information.
#include <sys/types.h>
#include <sys/ipc.h> /* defines the ipc_perm structure */
struct semid_ds {
struct ipc_perm sem_perm; /* operation permission struct */
struct sem *sem_base; /* ptr to first semaphore in set */
ushort sem_nsems; /* # of semaphores in set */
time_t sem_otime; /* time of last semop */
time_t sem_ctime; /* time of last change */
};
The sem structure is the internal data structure used by the Kernel to maintain the set
of values for a given semaphore. Every member of a semaphore set is described by the
following structure:
struct sem {
ushort semval; /* semaphore value, nonnegative */
short sempid; /* pid of last operation */
ushort semncnt; /* # awaiting semval > cval */
ushort semzcnt; /* # awaiting semval = 0 */
};
A semaphore is created, or an existing semaphore is accessed with the semget system
call.
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
int semget(key_t key, int nsems, int semflag);
The value returned by the semget is the semaphore identifier, semid, or -1 if an error
occurred.
File Locking with Semaphore
We create a binary semaphore - a single semaphore value that is either zero or one. To
lock the semaphore we call semop to do two operations automatically. First, wait for
the semaphore value to become zero, then increment the value to one. This is an
example where multiple semaphore operations must be done automatically by the
Kernel. If we took two system calls to do this - one tests the value and waits for it to
become zero, and the other to increment the value - the operation would not work.
Shared memory

Consider the normal steps involved in the client-server file-copying program that we
have been using for the example.
The server reads from the input file. Typically, the data is read by the Kernel into one of
its internal block buffers and copied from there to the server’s buffer. It should be noted
that most UNIX systems detect sequential reading, as is done by the server, and try
to keep one block ahead of the read requests. This helps reduce the clock time required
to copy a file, but the data for every read is still copied by the Kernel from its block
buffer to the caller’s buffer.
The server writes the data in a message, using one of the techniques described in this
chapter - a pipe, FIFO or message queue. Any of these forms of IPC require the data to
be copied from the user’s buffer into the Kernel.
The client reads the data from the IPC channel, again requiring the data be copied from
the Kernel’s IPC buffer to the client’s buffer.


Finally, the data is copied from the client’s buffer, the second argument to the write
system call, to the output file. This might involve just copying the data into a Kernel
buffer and returning, with the Kernel doing the actual write operation to the device at
some later time.
A total of four copies of the data are required. Additionally, these four copies are done
between the Kernel and a user process - an intercontext copy. While most UNIX
implementations try to speed up these copies as much as possible, they can still be
expensive.
The problem with these forms of IPC - pipes, FIFOs and message queues - is that for
two processes to exchange information, the information has to go through the Kernel.
Shared memory provides a way around this by letting two or more processes share a
memory segment. There is, of course, a problem involved in multiple processes sharing
a piece of memory: the processes have to coordinate the use of the memory among
themselves. If one process is reading into some shared memory, for example, other
processes must wait for the read to finish before processing the data.
The steps for the client-server program now become:
● The server gets access to a shared memory segment using a semaphore.
● The server reads from the input file into the shared memory segment. The address
to read into, the second argument to the read system call, points into shared
memory.
● When the read is complete, the server notifies the client, again using a semaphore.
● The client writes the data from the shared memory segment to the output file.
A shared memory segment is created, or an existing one is accessed, with the
shmget system call.

FILE I/O
In general, the functions available for file I/O are those that open a file, read a file,
write a file and so on. Most of the UNIX file I/O can be performed using only five
functions: open, read, write, lseek and close. The functions described here perform
unbuffered I/O. The term unbuffered refers to the fact that each read or write invokes
a system call in the Kernel.
File Descriptors
To the Kernel, all open files are referred to by file descriptors. A file descriptor is a
non-negative integer. When an existing file is opened or a new file is created, the
Kernel returns a file descriptor to the process. When the file is to be read or written, it
is identified with the file descriptor that was returned by open or creat as an argument
to either read or write.
By convention, UNIX Shells associate file descriptor 0 with the standard input of a
process, file descriptor 1 with the standard output, and file descriptor 2 with the
standard error. These numbers are replaced in POSIX.1 applications with the symbolic
constants STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO. These are defined in
the <unistd.h> header.
open function
A file is opened or created by calling the open function.
Syntax:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *pathname, int oflag, …/*, mode_t mode */);
Returns file descriptor if OK, -1 on error.
The third argument is shown as …, which specifies that the number and types of the
remaining arguments may vary. For this function, the third argument is used only when
a new file is created.
● pathname: The name of the file to open or create.
● oflag: The options for this argument are defined in <fcntl.h>.
One or more of the following options can be included by OR’ing these:
O_RDONLY Open for reading only.
O_WRONLY Open for writing only.
O_RDWR Open for reading and writing.
Note: One and only one of the above three constants must be specified. The following
constants are optional:
O_APPEND Append to the end of file on each write.
O_CREAT Create the file if it doesn’t exist.
O_TRUNC If the file exists, and if the file is successfully opened, truncate its length to 0.
creat function
A new file can also be created by calling the creat function.
Syntax:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int creat(const char *pathname, mode_t mode);
Returns file descriptor opened for write-only if OK, -1 on error.
● Note: This function is equivalent to

open(pathname, O_WRONLY | O_CREAT | O_TRUNC, mode);


close function

An open file is closed by calling the close function.
Syntax:
#include <unistd.h>
int close(int filedes);
Returns 0 if OK, -1 on error.
Note: When a process terminates, all open files are automatically closed by the Kernel.
lseek function
An open file can be explicitly positioned by calling lseek.
Syntax:
#include <sys/types.h>
#include <unistd.h>
off_t lseek(int filedes, off_t offset, int whence);
Returns new file offset if OK, -1 on error.
The interpretation of the offset depends on the value of the whence argument.
● If whence is SEEK_SET, the file’s offset is set to offset bytes from the beginning of
the file.
● If whence is SEEK_CUR, the file’s offset is set to its current value plus the offset.
The offset can be positive or negative.
● If whence is SEEK_END, the file’s offset is set to the size of the file plus the offset.
The offset can be positive or negative.
● Example:
#include <fcntl.h>
main(argc, argv)
int argc;
char *argv[];
{
int fd, skval;
char c;
if (argc != 2)
exit(1);
fd = open(argv[1], O_RDONLY);
if (fd == -1)
exit(1);
while ((skval = read(fd, &c, 1)) == 1)
{
printf("char %c\n", c);
skval = lseek(fd, 1023L, 1); /* 1 = SEEK_CUR */
printf("new seek val %d\n", skval);
}
}
read function
Data is read from an open file with the read function.
Syntax:
#include <unistd.h>
ssize_t read(int filedes, void *buff, size_t nbytes);
Returns number of bytes read, 0 if end of file, -1 on error
The read operation starts at the file’s current offset. Before a successful return, the
offset is incremented by the number of bytes actually read.
● Example:

#include <fcntl.h>
main()
{
int fd;
char lilbuf[20];
fd = open("/etc/passwd", O_RDONLY);
read(fd, lilbuf, 20);
}
write function
Data is written to an open file with the write function.
Syntax:
#include <unistd.h>
ssize_t write(int filedes, void *buff, size_t nbytes);
Returns number of bytes written if OK, -1 on error.
For a regular file, the write starts at the file’s current offset. After a successful write, the
file’s offset is incremented by the number of bytes actually written.
● Example:

#include <fcntl.h>
main()
{
int fd, i;
char buf[512];
for (i = 0; i < sizeof(buf); i++)
buf[i] = 'a';
fd = open("/etc/passwd", O_WRONLY);
write(fd, buf, sizeof(buf));
}
dup and dup2 functions


An existing file descriptor is duplicated by either of the following functions:
Syntax:
#include <unistd.h>
int dup(int filedes);
int dup2(int filedes, int filedes2);
Both return new file descriptor if OK, -1 on error.
The new file descriptor returned by dup is guaranteed to be the lowest numbered
available file descriptor. With dup2 the new value of the descriptor is specified in the
filedes2 argument. If filedes2 is already open, it is first closed. If filedes equals filedes2,
then dup2 returns filedes2 without closing it.
fcntl function
The fcntl function can change the properties of a file that is already open.
Syntax:
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
int fcntl(int filedes, int cmd, …/* int arg */ );
Return depends on cmd if OK, -1 on error.
The fcntl function serves five different purposes:
● To duplicate an existing descriptor (cmd = F_DUPFD).

● To get/set file descriptor flags (cmd = F_GETFD or F_SETFD).

● To get/set file status flags (cmd = F_GETFL or F_SETFL).

● To get/set asynchronous I/O ownership (cmd = F_GETOWN or F_SETOWN).

● To get/set record locks (cmd = F_GETLK, F_SETLK, or F_SETLKW).

● Example:

#include <sys/types.h>
#include <fcntl.h>
#include "ourhdr.h"
int main(int argc, char *argv[])
{
int accmode, val;
if (argc != 2)
err_quit("usage: a.out <descriptor#>");
if ((val = fcntl(atoi(argv[1]), F_GETFL, 0)) < 0)
err_sys("fcntl error for fd %d", atoi(argv[1]));
accmode = val & O_ACCMODE;
if (accmode == O_RDONLY)
printf("read only");
else if (accmode == O_WRONLY)
printf("write only");
else if (accmode == O_RDWR)
printf("read write");
else
err_dump("unknown access mode");
if (val & O_APPEND)
printf(", append");
if (val & O_NONBLOCK)
printf(", nonblocking");
#if !defined(_POSIX_SOURCE) && defined(O_SYNC)
if (val & O_SYNC)
printf(", synchronous writes");
#endif
putchar('\n');
exit(0);
}
Sample input and output:
$ a.out 0 < /dev/tty
read only
$ a.out 1 > temp.foo
$ cat temp.foo
write only
$ a.out 2 2>>temp.foo
write only, append
$ a.out 5 5<>temp.foo
read write

SYSTEM CALLS AND LIBRARY FUNCTIONS

All operating systems provide service points through which programs request services
from the Kernel. All variants of UNIX provide a well-defined, limited number of entry
points directly into the Kernel called system calls.
The technique used on UNIX systems is for each system call to have a function of the
same name in the standard C library. The user process calls this function, using the
standard C calling sequence. This function then invokes the appropriate Kernel service,
using whatever technique is required on the system. For example, the function may put
one or more of the C arguments into general registers and then execute some machine
instruction that generates a software interrupt in the Kernel.
Note: Some functions such as the printf function may invoke the write system call to
perform the output, but the functions strcpy and atoi (converts ASCII to Integer) do
not involve the operating system at all.
In general, both system calls and library functions appear as normal C functions. Both
exist to provide services for application programmers. However, library functions can be
replaced if desired, whereas the system calls usually cannot be replaced. For example,
the memory allocation function malloc implements a particular type of allocation. If we
don’t like the way it allocates the memory, we can define our own malloc function,
which will probably use the sbrk system call. In this case the system call sbrk in the
kernel allocates an additional chunk of space to the process and the library function
malloc manages this space.
Another difference between system calls and library functions is that system calls
usually provide a minimal interface while library functions often provide more elaborate
functionality. This is clearly illustrated in the above example of malloc.
PROCESS CONTROL

Every process has a unique process ID, a non-negative integer. There are some special
processes. Process ID 0 is usually the scheduler process and is often known as the
swapper. It is part of the Kernel and is known as a system process. Process ID 1 is
usually the init process and is invoked by the Kernel at the end of the bootstrap
procedure. It is a normal user process, although it does run with superuser privileges.
In addition to the process ID, there are other identifiers for every process. The following
functions return these identifiers:
#include <sys/types.h>
#include <unistd.h>
pid_t getpid(void); Returns process ID of calling process.
pid_t getppid(void); Returns parent process ID of calling process.
uid_t getuid(void); Returns real user ID of calling process.
uid_t geteuid(void); Returns effective user ID of calling process.
gid_t getgid(void); Returns real group ID of calling process.
gid_t getegid(void); Returns effective group ID of calling process.
fork function

The only way a new process is created by the UNIX Kernel is when an existing process
calls the fork function.
Syntax:
#include <sys/types.h>
#include <unistd.h>
pid_t fork(void);
Returns 0 in child, process ID of child in parent, -1 on error.
The new process created by fork is called the child process. This function is called once
but returns twice. The only difference between the two returns is that the return value in the
child is 0 while the return value in the parent is the process ID of the new child. The
reason the child’s process ID is returned to the parent is that a process can have
more than one child, and there is no function that allows a process to obtain the process
IDs of its children. The reason fork returns 0 to the child is that a process can have
only a single parent, so the child can always call getppid to obtain the process ID of its
parent.

SYMBOLIC DEBUGGER

Addresses

Commands

Variable!value

Miscellaneous commands

Files

sdb(CP)

sdb - symbolic debugger

Syntax:

sdb [-w] [-W] [objfile [corefile]] [directory:directory:...]

Description

The sdb command calls a symbolic debugger that can be used to debug C programs. It
may be used to examine their object files and core files and to provide a controlled
environment for their execution.
The command-line options available are:
-w

While a process is running under sdb, all addresses refer to the executing
program; otherwise they refer to objfile or corefile. An initial argument of -w
permits overwriting locations in objfile.
-W

By default, warnings are provided if the source files used in producing objfile
cannot be found, or are newer than objfile. This checking feature and the
accompanying warnings may be disabled by the use of the -W flag.


objfile is a COFF format executable program file which has been compiled with the -g or
-Zi (debug) option. If it has not been compiled with a debug option specified, the
symbolic capabilities of sdb will be limited, but the file can still be examined and the
program debugged. The default for objfile is a.out. corefile is assumed to be a core
image file produced after executing objfile; the default for corefile is core. The core file
need not be present. A "-" in place of corefile will force sdb to ignore any core image
file.
It is useful to know that at any time there is a current line and current file. If corefile
exists, then they are initially set to the line and file containing the source statement at
which the process terminated. Otherwise, they are set to the first line in main(). The
current line and file may be changed with the source file examination commands.
Names of variables are written just as they are in C. sdb does not truncate names.
Variables local to a procedure may be accessed using the form procedure:variable. If no
procedure name is given, the procedure containing the current line is used by default.
It is also possible to refer to structure members as variable.member, pointers to structure
members as variable->member, and array elements as variable[number]. Pointers may
be dereferenced by using the form pointer[0]. Combinations of these forms may also be
used. A number may be used in place of a structure variable name, in which case the
number is viewed as the address of the structure, and the template used for the
structure is that of the last structure referenced by sdb. An unqualified structure
variable may also be used with various commands. Generally, sdb will interpret a
structure as a set of variables. Thus, sdb will display the values of all the elements of a
structure when it is requested to display a structure. An exception to this interpretation
occurs when displaying variable addresses. An entire structure does have an address,
and it is this value sdb displays, not the addresses of individual elements.
Elements of a multidimensional array may be referenced as
variable[number][number]..., or as variable[number,number,...]. In place of number,
the form number;number may be used to indicate a range of values, an asterisk (*)
may be used to indicate all legitimate values for that subscript, or subscripts may be
omitted entirely if they are the last subscripts and the full range of values is desired. As
with structures, sdb displays all the values of an array or of the section of an array if
trailing subscripts are omitted. It displays only the address of the array itself or of the
section specified by the user if subscripts are omitted.
A particular instance of a variable on the stack may be referenced by using the form
procedure:variable,number. All the variations mentioned in naming variables may be
used. number is the occurrence of the specified procedure on the stack, counting the
top, or most current, as the first. If no procedure is specified, the procedure currently
executing is used by default.
It is also possible to specify a variable by its address. All forms of integer constants
which are valid in C may be used, so that addresses may be input in decimal, octal, or
hexadecimal.
Line numbers in the source program are referred to as filename:number or
procedure:number. In either case, the number is relative to the beginning of the file. If
no procedure or filename is given, the current file is used by default. If no number is
given, the first line of the named procedure or file is used.

Addresses

The address in a file associated with a written address is determined by a mapping


associated with that file. Each mapping is represented by two triples (b1, e1, f1) and
(b2, e2, f2) and the file address corresponding to a written address is calculated as
follows:

If

b1 <= address < e1

then

file address = address + f1 - b1

otherwise, if

b2 <= address < e2

then

file address = address + f2 - b2

otherwise, the requested address is not legal. In some cases (for example, programs
with separated text and data space) the two segments for a file may overlap.
The initial setting of both mappings is suitable for normal a.out and core files. If either
file is not of the kind expected, then, for that file, b1 is set to 0, e1 is set to the
maximum file size, and f1 is set to 0; in this way the whole file can be examined with
no address translation.
In order for sdb to be used on large files, all appropriate values are kept as signed
32-bit integers.

Commands

The commands for examining data in the program are:


t

Print a stack trace of the terminated or halted program.

T

Print the top line of the stack trace.


variable/clm
Print the value of variable according to length l and format m. A numeric count c
indicates that a region of memory, beginning at the address implied by variable, is
to be displayed. The length specifiers are:
b

one byte
h

two bytes (half word)


l

four bytes (long word)

Legal values for m are:


c

character
d

decimal
u

unsigned decimal
o

octal
x

hexadecimal
f

32-bit single precision floating point


g

64-bit double precision floating point


s

Assume variable is a string pointer and print the characters starting at the address
pointed to by the variable.

a

Print characters starting at the variable's address. This format may not be used
with register variables.

p

Pointer to procedure.
i

Disassemble machine-language instruction with addresses printed numerically and


symbolically.
I

Disassemble machine-language instruction with addresses just printed numerically.

Length specifiers are only effective with the c, d, u, o, and x formats. Any of the
specifiers, c, l, and m, may be omitted. If all are omitted, sdb chooses a length
and a format suitable for the variable's type as declared in the program. If m is
specified, then this format is used for displaying the variable. A length specifier
determines the output length of the value to be displayed, sometimes resulting in
truncation. A count specifier c tells sdb to display that many units of memory,
beginning at the address of variable. The number of bytes in one such unit of
memory is determined by the length specifier l, or if no length is given, by the size
associated with the variable. If a count specifier is used for the s or a command,
then that many characters are printed. Otherwise successive characters are
printed until either a null byte is reached or 128 characters are printed. The last
variable may be redisplayed with the command "./".

The sh(C) metacharacters * and ? may be used within procedure and variable
names, providing a limited form of pattern matching. If no procedure name is
given, variables local to the current procedure and global variables are matched; if
a procedure name is specified, then only variables local to that procedure are
matched. To match only global variables, the form :pattern is used.
linenumber?lm
variable:?lm

Print the value at the address from a.out or text space given by linenumber or
variable (procedure name), according to the format lm. The default format is i.

variable=lm
linenumber=lm
number=lm

Print the address of variable or linenumber, or the value of number, in the format
specified by lm. If no format is given, then lx is used. The last variant of this
command provides a convenient way to convert between decimal, octal, and
hexadecimal.
variable!value

Set variable to the given value. The value may be a number, a character constant,
or a variable. The value must be well defined; expressions that produce more than
one value, such as structures, are not allowed. Character constants are denoted
‘character’. Numbers are viewed as integers unless a decimal point or exponent is
used. In this case, they are treated as having the type double. Registers are
viewed as integers. The variable may be an expression that indicates more than
one variable, such as an array or structure name. If the address of a variable is
given, it is regarded as the address of a variable of type int. C conventions are
used in any type conversions necessary to perform the indicated assignment.
x

Print the machine registers and the current machine-language instruction.

X

Print the current machine-language instruction.


The commands for examining source files are:
e [procedure]
e [filename]
e [directory/]
e [directory/filename]

The first two forms set the current file to the file containing procedure or to
filename. The current line is set to the first line in the named procedure or file.
Source files are assumed to be in directory (the default is the current working
directory). The latter two forms change the value of directory. If no procedure,
filename, or directory is given, the current procedure and file names are reported.
/regular expression[/]

Search forward from the current line for a line containing a string matching regular
expression as in ed(C). The trailing / may be omitted.
?regular expression[?]

Search backward from the current line for a line containing a string matching
regular expression as in ed(C). The trailing ? may be omitted.


p

Print the current line.

z
Print the current line followed by the next 9 lines. Set the current line to the last
line printed.
w

Window. Print the 10 lines around the current line.


number

Set the current line to the given line number. Print the new current line.
count[+-]

Increment (+) or decrement (-) the current line by count lines. Print the new
current line.
The commands for controlling the execution of the source program are:
[count]r args
[count]R

Run the program with the given arguments. The r command with no arguments
re-uses the previous arguments to the program while the R command runs the
program with no arguments. An argument beginning with < or > causes
redirection for standard input or output, respectively. If count is given, it specifies
the number of breakpoints to be ignored.
linenumber c [count]
linenumber C [count]

Continue after a breakpoint or interrupt. If count is given, the program will stop
when count breakpoints have been encountered. The signal which caused the
program to stop is reactivated with the C command and ignored with the c
command. If a line number is specified, a temporary breakpoint is placed at the
line and execution is continued. The breakpoint is deleted when the command
finishes.
linenumber g [count]

Continue after a breakpoint with execution resumed at the given line. If count is
given, it specifies the number of breakpoints to be ignored.
s[count]
S[count]

Single-step the program through count lines. If no count is given, then the
program is run for one line. S is equivalent to s except it steps through procedure
calls.
i
I

Single-step by one machine-language instruction. The signal which caused the


program to stop is reactivated with the I command and ignored with the i
command.

variable:m[count]
address:m[count]

Single-step (as with s) until the specified location is modified with a new value. If
count is omitted, it is effectively infinity. Variable must be accessible from the
current procedure. Since this command is done by software, it can be very slow.
[level] v

Toggle verbose mode, for use when single-stepping with the S, s, or m


commands. If level is omitted, then just the current source file and/or subroutine
name is printed when either changes. If level is 1 or greater, each C source line is
printed before it is executed; if level is 2 or greater, each assembler statement is
also printed. A v turns verbose mode off if it is on for any level.
k

Kill the program being debugged.


procedure(arg1,arg2,...)
procedure(arg1,arg2,...)/m

Execute the named procedure with the given arguments. Arguments can be
integer, character, or string constants or names of variables accessible from the
current procedure. The second form causes the value returned by the procedure to
be printed according to format m. If no format is given, it defaults to d. This
facility is only available if the program was compiled with the -g or -Zi options.
[linenumber] b [command;command;...]

Set a breakpoint at the given line. If a procedure name without a line number is
given (for example, "proc:"), a breakpoint is placed at the first line in the
procedure even if it was not compiled with the -g or -Zi options. If no linenumber
is given, a breakpoint is placed at the current line. If no commands are given,
execution stops just before the breakpoint and control is returned to sdb.
Otherwise the commands are executed when the breakpoint is encountered and
execution continues. Multiple commands are specified by separating them with


semicolons. If the k command is specified, control returns to sdb at the
breakpoint.

B

Print a list of the currently active breakpoints.


[linenumber] d

Delete a breakpoint at the given line. If no linenumber is given, then the


breakpoints are deleted interactively. Each breakpoint location is printed and a line
is read from standard input. If the line begins with a y or d, then the breakpoint is
deleted.
D

Delete all breakpoints.


l

Print the last executed line.


linenumber a

Announce. If linenumber is of the form proc:number, the command effectively


does a linenumber b l. If linenumber is of the form proc:, the command effectively
does a proc:b T.

Miscellaneous commands:

! command

The command is interpreted by sh(C).

new-line

If the previous command printed a source line, then advance the current line by
one and print the new current line. If the previous command displayed a memory
location, display the next memory location.
end-of-file character

Scroll. Print the next 10 lines of instructions, source or data depending on which
was printed last. The end-of-file character is usually <Ctrl>-D.
< filename

Read commands from filename until the end of file is reached, and then continue
to accept commands from standard input. When sdb is told to display a variable
by a command in such a file, the variable name is displayed along with the value.
This command may not be nested; "<" may not appear as a command in a file.
M

Print the address maps.


M [?/] [*] b e f

Record new values for the address map. The arguments "?" and "/" specify the
text and data maps, respectively. The first segment (b1, e1, f1) is changed unless
"*" is specified; in which case, the second segment (b2, e2, f2) of the mapping is
changed. If fewer than three values are given, the remaining map parameters are
left unchanged.

string

Print the given string. The C escape sequences of the form \character are
recognized, where character is non-numeric.
q

Exit the debugger.


The following commands also exist and are intended only for debugging the debugger:
V

Print the version number.


Q

Print a list of procedures and files being debugged.


Y

Toggle debug output.

Files

a.out
core
Warnings

When sdb prints the value of an external variable for which there is no debugging
information, a warning is printed before the value. The size is assumed to be int
(integer).
Data which are stored in text sections are indistinguishable from functions.
Line number information in optimized functions is unreliable, and some information may
be missing.
Notes: If a procedure is called when the program is not stopped at a breakpoint (such
as when a core image is being debugged), all variables are initialized before the
procedure is started. This makes it impossible to use a procedure which formats data
from a core image.
