
UNIX Operating Systems

Unit 2

Structure
2.1  Objectives
2.2  The UNIX File system
2.3  Typical UNIX Directory Structure
2.4  Directory and File Handling Commands
2.5  Making Hard and Soft (Symbolic) Links
2.6  Specifying multiple filenames
2.7  Quotes
2.8  File and Directory Permissions
2.9  Inspecting File Content
2.10 Finding Files
2.11 Finding Text in Files
2.12 Sorting files
2.13 Advanced Text File Processing
2.14 File Compression and Backup
2.15 Handling Removable Media (e.g. floppy disks)

2.1 Objectives
This chapter covers:
- The UNIX filesystem and directory structure.
- File and directory handling commands.
- How to make symbolic and hard links.
- How wildcard filename expansion works.
- What argument quoting is and when it should be used.
- File and directory permissions in more detail, and how these can be changed.
- Ways to examine the contents of files.
Page No.: 19

Sikkim Manipal University

- How to find files when you don't know their exact location.
- Ways of searching files for text patterns.
- How to sort files.
- Tools for compressing files and making backups.
- Accessing floppy disks and other removable media.

2.2 The UNIX File system


The UNIX operating system is built around the concept of a filesystem, which is used to store all of the information that constitutes the long-term state of the system. This state includes the operating system kernel itself, the executable files for the commands supported by the operating system, configuration information, temporary workfiles, user data, and various special files that are used to give controlled access to system hardware and operating system functions.

A file system is a group of files together with the relevant information about them. Your whole hard disk may comprise a single file system, or it may be partitioned to house several file systems. The reverse, however, is not true: no file system can be split over two different disks. Creation of file systems is dealt with in a later chapter. For the present, let us concentrate on understanding an already installed file system.

The disk space allotted to a Unix file system is made up of 'blocks', each of which is typically 512 bytes. Some file systems may have blocks of 1024 or 2048 bytes as well. The block size depends upon how the file system has been implemented on a particular installation, and may also change from one Unix version to another. Should you want to find out the block size on your file system, use the cmchk command, which reports the block size:
$ cmchk
BSIZE = 1024
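cmchk is specific to SCO-style systems. On a modern Linux machine an equivalent check is a sketch like the following, assuming GNU coreutils' stat is available:

```shell
# cmchk is SCO-specific; on Linux, stat -f reports filesystem parameters.
# -c %S prints the fundamental block size of the filesystem holding the path.
stat -f -c %S /
```

A typical answer on current Linux filesystems is 4096, illustrating the point above that the block size varies between implementations.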
The block size rarely exceeds 2048 bytes. Whenever a file is created, one block is made available for storing its contents. Thus, on a file system whose block size is 2048 bytes, if we create a small file of 1000 bytes, one whole block (2048 bytes) is still assigned for its storage, thereby wasting a precious 1048 bytes. Wouldn't it then be worthwhile to have as small a block size as possible? No, because if the block size is 512 bytes and we create a file of 2000 bytes, then four disk accesses would be necessary to store it on the disk. Remember that disk accesses are time consuming: the more disk accesses required, the more time is needed for reading or writing the file.

All the blocks belonging to the file system are logically divided into four parts. The first block of a file system is called the 'Boot Block', which is followed by the Super Block, the Inode Table and the Data Blocks. Let us understand these blocks one by one.

The Boot Block
This represents the beginning of the file system. It contains a program called the bootstrap loader, which is executed when we boot the host machine. Although only one boot block is needed to start up the system, all file systems contain one (possibly empty) boot block.

The Super Block
The super block describes the state of the file system - how large it is, the maximum number of files it can accommodate, how many more files can be created, and so on.

The Inode Table
We know that all entities in Unix are treated as files. The information related to all these files (not their contents) is stored in an Inode Table on the disk. For each file there is an inode entry in the table. Each entry is 64 bytes long and contains the relevant details for that file. These details are:
(a) Owner of the file
(b) Group to which the owner belongs
(c) Type of file
(d) File access permissions
(e) Date and time of last access
(f) Date and time of last modification
(g) Number of links to the file
(h) Size of the file
(i) Addresses of blocks where the file is physically present

Data Blocks
These contain the actual file contents. An allocated block can belong to only one file in the file system; it cannot be used for storing any other file's contents unless the file to which it originally belonged is deleted.

Surrogate Super Block and Inode Table
Judging by the information stored in the Inode Table, we can see that it must change whenever we use any file, change its permissions, etc. Making these changes on the disk every time would gobble up a lot of precious CPU time. To remedy this, a copy of the Super Block and Inode Table gets loaded into memory (RAM) at start-up time. Since memory access is faster than disk access, much less time is consumed in recording the changes in the RAM copies of the Super Block and Inode Table every time some modification occurs. The original Super Block and Inode Table on the disk are updated after a fixed interval of time, say every 30 seconds, by a command called sync. sync synchronises the inode table in memory with the one on disk by simply overwriting the memory copy onto the disk. Thus, the changes that may have been recorded in the in-memory copies during the last 30-second interval get duly registered on the disk.


How Does Unix Access Files
Internally, a file is identified by Unix by a unique inode number associated with it. We can obtain the inode number associated with a file by using the command ls -i:
$ ls -i reports
12324 reports
Here 12324 is the inode number. We know that a directory in Unix is nothing but a file. A directory file contains the names of the files/sub-directories present in that directory, along with an inode number for each. The inode number is nothing but an index into the inode table where the information about the file is stored. For example, amongst the several slots present in the inode table, slot number 12324 contains information about the file reports.

Suppose the file reports is present in a directory called mydir, and we attempt to cat the reports file. Let us see how Unix would handle this situation. Firstly, it would check whether we have read permission on the mydir directory file. If so, it would find out whether this directory file has an entry for reports in it. If such an entry is found, it would pick up the inode number for this file from mydir. This inode number, as we know, is an index into the in-core (memory) inode table. Using this inode number, the information about reports is accessed from the inode table. From this information it is ascertained whether we have read permission on the reports file. If so, the contents of the reports file are read from the disk addresses mentioned in the inode entry of reports and displayed on the screen.

Storage of Files
Amongst other information, each inode entry in the inode table contains 13 addresses, which specify completely where the contents of the file are stored on the disk. These addresses may be numbered 0 through 12. Of these, the first ten addresses, 0 through 9, point to 1 KB blocks on disk. For example, a file of size 3 KB may have its entries as shown in Figure 3.1.
The address 4970 signifies where the first 1 KB of the file is stored. The next 1 KB chunk is at 5231, and the next at 3401.
[Figure 3.1: A typical inode entry - owner, group, permissions, access time, modification time, inode modification time and file size, followed by the 13 block addresses numbered 0 to 12. For the 3 KB file, addresses 0, 1 and 2 hold 4970, 5231 and 3401 respectively; the remaining addresses are empty.]

Figure 3.1

These addresses may be scattered throughout the disk, as files are stored in chunks wherever empty blocks of disk are available. This is especially the case with large files, for which a very big contiguous chunk may be impossible to find. Thus, the addresses 0 to 9 can handle a file of a maximum size of 10 KB.


For files larger than this, Unix has a very interesting way of indicating their location. Have a look at Figure 3.2.
[Figure 3.2: An inode entry for a large file - the same owner, group, file type, permission, timestamp and size fields, with all 13 block addresses in use; addresses 0-9 point directly to 1 KB data blocks, while the later addresses point to blocks of further addresses.]

Figure 3.2

As can be seen from the figure, address 10 in the inode entry also holds the address of a 1 KB block. This block doesn't contain the file contents. Instead, it consists
of 256 four-byte slots which can store 256 more addresses. Each of these 256 addresses can point to a 1 KB block on disk. Thus, for a file which occupies 12 blocks on the disk, the first 10 addresses would be found in the inode entry itself, whereas the addresses of the 11th and 12th blocks would be present in a 1 KB block whose own address is stored as address 10 in the inode entry. The maximum file size that can be addressed through address 10 is therefore 256 KB. This is called Single Indirection.

For a still larger file, Double Indirection is used. Address 11 in the inode entry points to a block of 256 addresses, each of which in turn points to another set of 256 addresses. These are the addresses of 1 KB chunks, making the maximum file size accessible by Double Indirection equal to 256 x 256 KB, which is 64 MB.

For an even larger file - you guessed it - Unix uses Triple Indirection. This way, the last address (address 12) in the inode entry yields a massive 256 x 256 x 256 KB, i.e. 16 gigabytes! That means the maximum file size Unix provides for is the sum of the sizes accessible through the 13 addresses that occur in the inode entry. Together they yield 10 KB + 256 KB + 64 MB + 16 GB, which is more than sufficient for all practical purposes.

Let's now see what links have to do with inode numbers. If we choose to rename a file, all that Unix does is associate the new name with the same inode number and forget the old name. This association may be thought of as a link with the inode number, or in essence with the file. Unix provides for its files to have more than one such link. For instance, a


file called reports in the root directory may have a link called results in a sub-directory. The way to create a link is by saying:
$ ln reports impdir/results
Assuming that impdir is a directory present in root, it now contains the entry for results, whose inode number is, say, 12324. We can check what the inode number of reports was in the first place by using ls with the -i option:
$ ls -i reports
12324 reports
This is no coincidence - the inode numbers of reports and results have to be the same, as they refer to the same file. The file is physically present in only one location, but can be accessed by either of its two names. What if you were to delete reports?
$ rm reports
This tells Unix to discard the name reports as a link to the file with inode number 12324. However, the file is still very much present on the disk. The root directory no longer holds an entry for reports, but the other link, results in impdir, is intact. Hence the file is also intact, and will remain so until all links to it are deleted. Only if we use rm to delete results as well will the file be physically deleted.

Disk Related Commands
One of the major concerns of the System Administrator of a Unix installation is efficient hard disk management. Since the Unix file system is usually installed on a hard disk, its upkeep is of primary importance. The System Administrator has to regularly monitor the integrity of the file system and the amount of disk space available.


Neglecting this may eventually lead to a system crash. Let's see what commands are usually used for the upkeep of the hard disk.

Checking Disk Free Space
If we want to see how much of the disk is being used and what part of it lies free, Unix has for us a command called df (for disk free). This command reports the free as well as the used disk space for all the file systems installed on your machine.
$ df
/        (/dev/root ):   12970 blocks   27857 i-nodes

We have on our machine only one file system installed - the root file system, or simply /dev/root. df reports the number of free disk blocks and free inodes for this file system. If we want more detailed information about disk usage, we should say:
$ df -ivt
Mount Dir  Filesystem  blocks  used    free   %used  iused  ifree  %iused
/          /dev/root   282098  269146  12952  95%    7410   27854  21%

Now the available blocks and inodes are reported numerically as well as in percentages of the totals available. This possibly gives a better idea of how much disk space is free. One thing that you must note is that df counts blocks in units of 512 bytes irrespective of the actual block size reported by the cmchk command. Hence the actual amount of free disk space in bytes is the free space reported by df multiplied by 512. For our file system this comes out to be:
12952 * 512 = 6631424 bytes (approximately 6.32 MB)
When free space runs low, the System Administrator goes into action and cleans the file system of any unused files, empty files, empty directories, unreasonably big files, etc. In fact, a program can be written to identify and delete such files.
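The conversion df leaves to the reader can be scripted directly; this sketch reuses the free-block count from the example above:

```shell
# df reports free space in 512-byte blocks; convert to bytes and megabytes.
free_blocks=12952
bytes=$(( free_blocks * 512 ))
echo "$bytes bytes"                   # 6631424 bytes
echo "$(( bytes / 1024 / 1024 )) MB"  # about 6 MB (integer arithmetic)
```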
dfspace Makes More Sense
How much space do you think you have on your hard disk if it has 27857 free inodes or 12970 free blocks? Finding it difficult to comprehend? Naturally so, because we understand disk space better in terms of bytes and megabytes than in terms of inodes and blocks. And dfspace does exactly that: it reports the free disk space in megabytes and as a percentage of the total disk space.
$ dfspace
dfspace: not found
Don't be put off by that message. It appears because the dfspace command is present in the /etc directory, which doesn't get searched when we execute a command. So to execute it we need to say:
$ /etc/dfspace
/                : Disk space:   6.32 MB of 137.74 MB available ( 4.59%).
Total Disk Space: 6.32 MB of 137.74 MB available ( 4.59%).
Now dfspace does all the mathematics internally and reports the free disk space for the root file system. Had there been other file systems installed, their free space would also have been reported. Additionally, it reports the total disk space available. Note that the disk space available (6.32 MB) tallies with what we calculated earlier while discussing the df command; the only difference is that this time the calculations were done by dfspace.

Disk Usage: The du Command
du sounds similar to df but is different in its working. df and dfspace report the disk space available in the file system as a whole, whereas du reports
the disk space used by specified files and directories. For example:
$ du
226     ./backup
418     ./fa/backup
1182    ./fa
4       ./check
16      ./dbf
1662    .
Here du is reporting the number of blocks used by the current directory (denoted by .) and those used by the sub-directories within it. Thus, when invoked without any arguments, it assumes that the blocks occupied by the current directory and the directories lying within it are to be reported. If we specify a directory, then du descends down this directory, locating any sub-directories lying in it, and reports the blocks used by the directory and its sub-directories. For example:
$ du /dev
2       /dev/string
4       /dev/rdsk
4       /dev/dsk
2       /dev/mouse
20      /dev
Thus, the number of blocks occupied by each sub-directory within /dev, as well as those occupied by /dev itself, are displayed. If we want only the total blocks occupied by the directory, and not a breakdown for the sub-directories within it, we can say:
$ du -s /dev
20      /dev
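As a sketch of du in practice - all paths below are created just for the demonstration, and the -k flag (report in kilobytes) is assumed to be available, as it is on GNU and BSD systems:

```shell
# Build a small throwaway tree and measure it with du.
d=$(mktemp -d)
mkdir "$d/sub"
head -c 4096 /dev/zero > "$d/sub/data"   # a 4 KB file
du -sk "$d"        # total for the whole tree, in KB
du -sk "$d/sub"    # just the sub-directory
rm -r "$d"
```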
du is often used to single out directories that occupy large amounts of disk space. Unused and redundant files and directories can then be eliminated from them, thereby freeing valuable disk space.

The ulimit Command
Though most files in Unix occupy a few tens of blocks, in some odd case a program may go awry and create files which occupy huge amounts of disk space. Sometimes things might take such a bad turn that a file occupies several megabytes of disk space and ultimately harms the file system. To avoid the creation of such files, Unix uses a variable called ulimit. It stands for user limit and contains a value which signifies the size of the largest file that can be created by the user in the file system. Let's see the current value of the ulimit variable:
$ ulimit
2097152
This implies that the user cannot create a file bigger than 2097152 bytes, i.e. 2048 KB. If you happen to create a file which exceeds this size, it would be truncated to 2048 KB and the program creating it would be aborted. A user can reduce this value by saying:
$ ulimit 1
From here onwards no file bigger than 512 bytes can be created. Once reduced, this value remains effective until the user logs out. Thus the change is effective only for the current session, and the system returns to its default value when you log out. An ordinary user can only reduce the ulimit value and is never permitted to increase it. A super-user is an exception to this rule and can increase or decrease the value.


Every item stored in a UNIX filesystem belongs to one of four types:

1. Ordinary files
Ordinary files can contain text, data, or program information. Files cannot contain other files or directories. Unlike some other operating systems, UNIX filenames are not broken into a name part and an extension part (although extensions are still frequently used as a means to classify files). Instead they can contain any keyboard character except '/' and can be up to 256 characters long (note, however, that characters such as *, ?, # and & have special meaning in most shells and should therefore not be used in filenames). Putting spaces in filenames also makes them difficult to manipulate - use the underscore '_' instead.

2. Directories
Directories are containers or folders that hold files and other directories.

3. Devices
To provide applications with easy access to hardware devices, UNIX allows them to be used in much the same way as ordinary files. There are two types of devices in UNIX - block-oriented devices which transfer data in blocks (e.g. hard disks) and character-oriented devices that transfer data on a byte-by-byte basis (e.g. modems and dumb terminals).

4. Links
A link is a pointer to another file. There are two types of links - a hard link to a file is indistinguishable from the file itself, while a soft link (or symbolic link) provides an indirect pointer or shortcut to a file. A soft link is implemented as a directory file entry containing a pathname.
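The type of each entry is visible as the first character of a long listing; the paths below exist on virtually every UNIX system:

```shell
# The first character of each line gives the type:
# 'd' = directory, '-' = ordinary file, 'c' = character-oriented device.
ls -ld /tmp            # d...
ls -ld /etc/passwd     # -...
ls -ld /dev/null       # c...
```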


2.3 Typical UNIX Directory Structure


The UNIX filesystem is laid out as a hierarchical tree structure which is anchored at a special top-level directory known as the root (designated by a slash '/'). Because of the tree structure, a directory can have many child directories, but only one parent directory. Fig. 2.1 illustrates this layout.

Fig. 2.1: Part of a typical UNIX filesystem tree

The top-level directory is known as the root. Beneath the root are several system directories. The components of a pathname are separated by the / character. For example:
/bin
/etc
/usr
Historically, user directories were often kept in the directory /usr. However, it is often desirable to organise user directories in a different manner. For example:
/home/sunserv1    users on the Sun
/home/ecuserv1    users on the Engineering department's Suns
Users have their own directory in which they can create and delete files, and create their own sub-directories. For example:
/home/ecuserv1/men5jb    belongs to Jane Brown, a postgraduate student in Mechanical Engineering


Directory         Typical Contents
/                 The "root" directory
/bin              Essential low-level system utilities
/usr/bin          Higher-level system utilities and application programs
/sbin             Superuser system utilities (for performing system administration tasks)
/lib              Program libraries (collections of system calls that can be included in programs by a compiler) for low-level system utilities
/usr/lib          Program libraries for higher-level user programs
/tmp              Temporary file storage space (can be used by any user)
/home or /homes   User home directories containing personal file space for each user. Each directory is named after the login of the user.
/etc              UNIX system configuration and information files
/dev              Hardware devices
/proc             A pseudo-filesystem which is used as an interface to the kernel. Includes a sub-directory for each active program (or process).

Fig. 2.2: Typical UNIX directories

Fig. 2.2 shows some typical directories you will find on UNIX systems and briefly describes their contents. Note that although these subdirectories appear as part of a seamless logical filesystem, they do not need to be present on the same hard disk device; some may even be located on a remote machine and accessed across a network.

Pathnames
Files and directories may be referred to by their absolute pathname. For example:
/home/ecuserv1/men5jb/progs/prog1.f
The initial / signifies that you are starting at the root, and hence indicates an absolute pathname. Files and directories may also be referred to by a relative pathname. For example:
progs/prog1.f
is interpreted relative to the current location.

The Home Directory
Each user has a home directory, to which they are attached when they log in. Jane Brown's home directory is:
/home/ecuserv1/men5jb
The symbol ~ can be used to refer to the home directory. If Jane Brown wishes to refer to her file she can give:
~/progs/prog1.f
rather than typing the long form:
/home/ecuserv1/men5jb/progs/prog1.f
The symbol ~ can also refer to the home directory of other users: ~cen6js refers to John Smith's home directory. So Jane can refer to a file in John Smith's home directory using:
~cen6js/test.dat

The Current Directory
The current directory can be referred to by the . character (a dot). This refers to your actual location in the filestore hierarchy. When you log in, the current directory is set to your home directory.


The Parent Directory
The parent directory is the directory immediately above the current directory, and can be referred to by the .. characters (two dots). For example, to refer to the file test.dat in the parent directory:
../test.dat
Relative pathnames may also be constructed by progressively stepping back through parent directories using the .. construct. For example, if user men5jb is currently attached to the directory
/home/ecuserv1/men5jb/progs
and user cen6js has a file called example.dat in their home directory
/home/ecuserv1/cen6js
then user men5jb may refer to that file with the relative pathname
../../cen6js/example.dat
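The equivalence of relative and absolute pathnames can be checked in a throwaway directory tree (all names below are invented for the demonstration):

```shell
# The same file reached by a relative path and by an absolute path.
base=$(mktemp -d)
mkdir -p "$base/men5jb/progs" "$base/cen6js"
echo "sample" > "$base/cen6js/example.dat"
cd "$base/men5jb/progs"
cat ../../cen6js/example.dat       # relative: up two levels, then down
cat "$base/cen6js/example.dat"     # absolute: same file
```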

2.4 Directory and File Handling Commands


This section describes some of the more important directory and file handling commands.

pwd (print [current] working directory)
pwd displays the full absolute path to your current location in the filesystem. So
$ pwd
/usr/bin
implies that /usr/bin is the current working directory.

ls (list directory)
ls lists the contents of a directory. If no target directory is given, then the contents of the current working directory are displayed. So, if the current working directory is /:
$ ls
bin   dev   home  mnt   share  usr  var
boot  etc   lib   proc  sbin   tmp  vol


Actually, ls doesn't show you all the entries in a directory - files and directories that begin with a dot (.) are hidden (this includes the directories '.' and '..' which are always present). The reason for this is that files beginning with a . usually contain important configuration information and should not be changed under normal circumstances. If you want to see all files, ls supports the -a option:
$ ls -a
Even this listing is not that helpful - there are no hints to properties such as the size, type and ownership of files, just their names. To see more detailed information, use the -l option (long listing), which can be combined with the -a option as follows:
$ ls -a -l
(or, equivalently,)
$ ls -al
Each line of the output looks like this (a typical line, with illustrative names):
-rwxr-xr-x   1 will   finance   2864 Sep 27 08:52 hello.txt
showing, in order, type and permissions, links, owner, group, size, date and name, where:
- type is a single character which is either 'd' (directory), '-' (ordinary file), 'l' (symbolic link), 'b' (block-oriented device) or 'c' (character-oriented device).
- permissions is a set of characters describing access rights. There are 9 permission characters, describing 3 access types given to 3 user categories. The three access types are read ('r'), write ('w') and execute ('x'), and the three user categories are the user who owns the file, users in the group that the file belongs to, and other users (the general public). An 'r', 'w' or 'x' character means the corresponding permission is present; a '-' means it is absent.
- links refers to the number of filesystem links pointing to the file/directory (see the discussion on hard/soft links in the next section).
- owner is usually the user who created the file or directory.
- group denotes a collection of users who are allowed to access the file according to the group access rights specified in the permissions field.
- size is the length of a file, or the number of bytes used by the operating system to store the list of files in a directory.
- date is the date when the file or directory was last modified (written to). The -u option displays the time when the file was last accessed (read).
- name is the name of the file or directory.

ls supports many more options. To find out what they are, type:
$ man ls
man is the online UNIX user manual, and it can be used to get help with commands and find out about what options are supported. It has quite a terse style which is often not that helpful, so some users prefer to use the (non-standard) info utility if it is installed:
$ info ls

cd (change [current working] directory)
$ cd path


changes your current working directory to path (which can be an absolute or a relative path). One of the most common relative paths to use is '..' (i.e. the parent directory of the current directory). Used without any target directory,
$ cd
resets your current working directory to your home directory (useful if you get lost). If you change into a directory and subsequently want to return to your original directory, use
$ cd -

mkdir (make directory)
$ mkdir directory
creates a subdirectory called directory in the current working directory. You can only create subdirectories in a directory if you have write permission on that directory.

rmdir (remove directory)
$ rmdir directory
removes the subdirectory directory from the current working directory. You can only remove subdirectories if they are completely empty (i.e. free of all entries besides the '.' and '..' directories).

cp (copy)
cp is used to make copies of files or entire directories. To copy files, use:
$ cp source-file(s) destination
where source-file(s) and destination specify the source and destination of the copy respectively. The behaviour of cp depends on whether the destination is a file or a directory. If the destination is a file, only one source file is allowed and cp makes a new file called destination that has


the same contents as the source file. If the destination is a directory, many source files can be specified, each of which will be copied into the destination directory. Section 2.6 will discuss efficient specification of source files using wildcard characters. To copy entire directories (including their contents), use a recursive copy:
$ cp -rd source-directories destination-directory
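A minimal sketch of both forms of cp, using invented names (-r alone suffices here; the -d flag shown above, which preserves links, is a GNU extension):

```shell
# File copy, then a recursive directory copy.
cd "$(mktemp -d)"
mkdir -p src/sub
echo "payload" > src/sub/f.txt
cp src/sub/f.txt copy.txt       # destination is a file
cp -r src backup                # destination directory is created
cat backup/sub/f.txt            # same contents as the original
```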

mv (move/rename)
mv is used to rename files/directories and/or move them from one directory into another. Exactly one source and one destination must be specified:
$ mv source destination
If destination is an existing directory, the new name for source (whether it be a file or a directory) will be destination/source. If source and destination are both files, source is renamed destination. If destination is an existing file it will be destroyed and overwritten by source (you can use the -i option if you would like to be asked for confirmation before a file is overwritten in this way).

rm (remove/delete)
$ rm target-file(s)
removes the specified files. Unlike other operating systems, it is almost impossible to recover a deleted file unless you have a backup (there is no recycle bin!), so use this command with care. If you would like to be asked before files are deleted, use the -i option:
$ rm -i myfile
rm: remove 'myfile'?


rm can also be used to delete directories (along with all of their contents, including any subdirectories they contain). To do this, use the -r option. To stop rm from asking any questions or giving errors (e.g. if the file doesn't exist), use the -f (force) option. Extreme care needs to be taken when using this option - consider what would happen if a system administrator trying to delete user will's home directory accidentally typed:
$ rm -rf / home/will
(instead of rm -rf /home/will).

cat (catenate/type)
$ cat target-file(s)
displays the contents of target-file(s) on the screen, one after the other. You can also use it to create files from keyboard input as follows (> is the output redirection operator, which will be discussed in the next chapter):
$ cat > hello.txt
hello world!
[ctrl-d]
$ ls hello.txt
hello.txt
$ cat hello.txt
hello world!
$

more and less (catenate with pause)
$ more target-file(s)
displays the contents of target-file(s) on the screen, pausing at the end of each screenful and asking the user to press a key (useful for long


files). It also incorporates a searching facility (press '/' and then type a phrase that you want to look for). You can also use more to break up the output of commands that produce more than one screenful of output as follows (| is the pipe operator, which will be discussed in the next chapter):
$ ls -l | more
less is just like more, except that it has a few extra features (such as allowing users to scroll backwards and forwards through the displayed file). less is not a standard utility, however, and may not be present on all UNIX systems.

2.5 Making Hard and Soft (Symbolic) Links


Direct (hard) and indirect (soft or symbolic) links from one file or directory to another can be created using the ln command.
$ ln filename linkname
creates another directory entry for filename called linkname (i.e. linkname is a hard link). Both directory entries appear identical (and both now have a link count of 2). If either filename or linkname is modified, the change will be reflected in the other file (since they are in fact just two different directory entries pointing to the same file).
$ ln -s filename linkname
creates a shortcut called linkname (i.e. linkname is a soft link). The shortcut appears as an entry with a special type ('l'):
$ ln -s hello.txt bye.txt
$ ls -l bye.txt


lrwxrwxrwx 1 will finance 13 bye.txt -> hello.txt
$
The link count of the source file remains unaffected. Notice that the permission bits on a symbolic link are not used (always appearing as rwxrwxrwx). Instead, the permissions on the link are determined by the permissions on the target (hello.txt in this case). Note that you can create a symbolic link to a file that doesn't exist, but not a hard link. Another difference between the two is that you can create symbolic links across different physical disk devices or partitions, but hard links are restricted to the same disk partition. Finally, most current UNIX implementations do not allow hard links to point to directories.
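The difference between the two kinds of link can be observed directly with ls -i, which prints inode numbers. Here is a small illustrative session (filenames are arbitrary); hard links share an inode, while a symbolic link is a separate entry of type 'l':

```shell
cd "$(mktemp -d)"                 # work in a scratch directory
echo "hello world!" > hello.txt

ln hello.txt hard.txt             # hard link: a second name for the same inode
ln -s hello.txt soft.txt          # symbolic link: a pointer to the name

ls -li hello.txt hard.txt         # same inode number, link count 2 on both
ls -l soft.txt                    # shows: soft.txt -> hello.txt

# A change made through one hard link is visible through the other
echo "more text" >> hard.txt
cat hello.txt
```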

2.6 Specifying multiple filenames


Multiple filenames can be specified using special pattern-matching characters. The rules are:

'?' matches any single character in that position in the filename.
'*' matches zero or more characters in the filename. A '*' on its own will match all files. '*.*' matches all filenames containing a '.'.

Characters enclosed in square brackets ('[' and ']') will match any filename that has one of those characters in that position.

A list of comma separated strings enclosed in curly braces ("{" and "}") will be expanded as a Cartesian product with the surrounding characters.

For example:
1. ??? matches all three-character filenames.
2. ?ell? matches any five-character filename with 'ell' in the middle.
3. he* matches any filename beginning with 'he'.


4. [m-z]*[a-l] matches any filename that begins with a letter from 'm' to 'z' and ends in a letter from 'a' to 'l'.
5. {/usr,}{/bin,/lib}/file expands to /usr/bin/file /usr/lib/file /bin/file and /lib/file.
Note that the UNIX shell performs these expansions (including any filename matching) on a command's arguments before the command is executed.
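These expansions can be observed safely with echo, which simply prints its (already expanded) arguments. A small sketch, using throwaway filenames created just for the purpose:

```shell
cd "$(mktemp -d)"                 # empty scratch directory
touch hello help hat bell.txt

echo he*        # -> hello help   (both begin with 'he')
echo ???        # -> hat          (the only three-character name)
echo *.txt      # -> bell.txt     (the only name containing '.txt')
echo [bh]at     # -> hat          (no file 'bat' exists, so only 'hat' matches)
```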

2.7 Quotes
As we have seen, certain special characters (e.g. '*', '-', '{' etc.) are interpreted in a special way by the shell. In order to pass arguments that use these characters to commands directly (i.e. without filename expansion etc.), we need to use special quoting characters. There are three levels of quoting that you can try:
1. Insert a '\' in front of the special character.
2. Use double quotes (") around arguments to prevent most expansions.
3. Use single forward quotes (') around arguments to prevent all expansions.
There is a fourth type of quoting in UNIX. Single backward quotes (`) are used to pass the output of some command as an input argument to another. For example:
$ hostname
rose
$ echo this machine is called `hostname`
this machine is called rose
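The three levels of quoting can be compared side by side with echo. In the sketch below, the directory contains a single file star.txt, so an unquoted * expands while the quoted forms do not:

```shell
cd "$(mktemp -d)"
touch star.txt

echo *          # the shell expands * -> star.txt
echo \*         # backslash escapes one character -> *
echo "*"        # double quotes prevent filename expansion -> *
echo '*'        # single quotes prevent all expansions -> *

echo "this machine is called `hostname`"   # backquotes still run inside double quotes
```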


2.8 File and Directory Permissions


Permission | File                                        | Directory
Read       | User can look at the contents of the file   | User can list the files in the directory
Write      | User can modify the contents of the file    | User can create new files and remove existing files in the directory
Execute    | User can use the filename as a UNIX command | User can change into the directory, but cannot list the files unless (s)he has read permission; (s)he can read files within it given read permission on them

Fig 3.1: Interpretation of permissions for files and directories

As we have seen in the previous chapter, every file or directory on a UNIX system has three types of permissions, describing what operations can be performed on it by various categories of users. The permissions are read (r), write (w) and execute (x), and the three categories of users are user/owner (u), group (g) and others (o). Because files and directories are different entities, the interpretation of the permissions assigned to each differs slightly, as shown in Fig 3.1. File and directory permissions can only be modified by their owners, or by the superuser (root), by using the chmod system utility.

chmod (change [file or directory] mode)
$ chmod options files
chmod accepts options in two forms. Firstly, permissions may be specified as a sequence of 3 octal digits (octal is like decimal except that the digit range is 0 to 7 instead of 0 to 9). Each octal digit represents the access permissions for the user/owner, group and others respectively. The mappings of permissions onto their corresponding octal digits are as follows:


Octal digit | Permissions
0           | ---
1           | --x
2           | -w-
3           | -wx
4           | r--
5           | r-x
6           | rw-
7           | rwx

For example the command:
$ chmod 600 private.txt
sets the permissions on private.txt to rw------- (i.e. only the owner can read and write to the file). Secondly, permissions may be specified symbolically, using the symbols u (user), g (group), o (other), a (all), r (read), w (write), x (execute), + (add permission), - (take away permission) and = (assign permission). For example, the command:
$ chmod ug=rw,o-rw,a-x *.txt
sets the permissions on all files ending in .txt to rw-rw---- (i.e. the owner and users in the file's group can read and write to the file, while the general public do not have any sort of access). chmod also supports a -R option which can be used to recursively modify file permissions, e.g.
$ chmod -R go+r play
will grant group and other read rights to the directory play and all of the files and directories within play.
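The octal and symbolic forms can be checked against each other with ls -l. A short illustrative session (filenames invented for the example):

```shell
cd "$(mktemp -d)"
touch private.txt notes.txt

chmod 600 private.txt         # octal:    rw- --- ---
chmod ug=rw,o-rwx notes.txt   # symbolic: rw- rw- ---

ls -l private.txt notes.txt   # the first column shows the permission bits
```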


chgrp (change group)
$ chgrp group files
can be used to change the group that a file or directory belongs to. It also supports a -R option.

2.9 Inspecting File Content


Besides cat there are several other useful utilities for investigating the contents of files:

file filename(s)
file analyzes a file's contents for you and reports a high-level description of what type of file it appears to be:
$ file myprog.c letter.txt webpage.html
myprog.c:      C program text
letter.txt:    English text
webpage.html:  HTML document text
file can identify a wide range of files but sometimes gets understandably confused (e.g. when trying to automatically detect the difference between C++ and Java code).

head, tail filename
head and tail display the first and last few lines in a file respectively. You can specify the number of lines as an option, e.g.
$ tail -20 messages.txt
$ head -5 messages.txt
tail includes a useful -f option that can be used to continuously monitor the last few lines of a (possibly changing) file. This can be used to monitor log files, for example:
$ tail -f /var/log/messages
continuously outputs the latest additions to the system log file.
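head and tail can also be combined in a pipeline to extract a slice from the middle of a file. A small sketch using a hand-made five-line file:

```shell
cd "$(mktemp -d)"
printf 'line1\nline2\nline3\nline4\nline5\n' > messages.txt

head -2 messages.txt            # first two lines
tail -2 messages.txt            # last two lines

# Combining the two extracts a middle slice: lines 3 and 4
head -4 messages.txt | tail -2
```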


objdump options binaryfile
objdump can be used to disassemble binary files - that is, it can show the machine language instructions which make up compiled application programs and system utilities.

od options filename (octal dump)
od can be used to display the contents of a binary or text file in a variety of formats, e.g.
$ cat hello.txt
hello world
$ od -c hello.txt
0000000   h   e   l   l   o       w   o   r   l   d  \n
0000014
$ od -x hello.txt
0000000 6865 6c6c 6f20 776f 726c 640a
0000014
There are also several other useful content inspectors that are nonstandard (in terms of availability on UNIX systems) but are nevertheless in widespread use. They are summarised in Fig. 3.2.

File type                | Typical extension | Content viewer
Portable Document Format | .pdf              | acroread
Postscript Document      | .ps               | ghostview
DVI Document             | .dvi              | xdvi
JPEG Image               | .jpg              | xv
GIF Image                | .gif              | xv
MPEG movie               | .mpg              | mpeg_play
WAV sound file           | .wav              | realplayer
HTML document            | .html             | netscape

Fig 3.2: Other file types and appropriate content viewers.

UNIX Operating Systems

Unit 2

2.10 Finding Files


There are at least three ways to find files when you don't know their exact location:

find
If you have a rough idea of the directory tree the file might be in (or even if you don't and you're prepared to wait a while) you can use find:
$ find directory -name targetfile -print
find will look for a file called targetfile in any part of the directory tree rooted at directory. targetfile can include wildcard characters. For example:
$ find /home -name "*.txt" -print 2>/dev/null
will search all user directories for any file ending in ".txt" and output any matching files (with a full absolute or relative path). Here the quotes (") are necessary to avoid filename expansion, while the 2>/dev/null suppresses error messages (arising from errors such as not being able to read the contents of directories for which the user does not have the right permissions). find can in fact do a lot more than just find files by name. It can find files by type (e.g. -type f for files, -type d for directories), by permissions (e.g. -perm o=r for all files and directories that can be read by others), by size (-size) etc. You can also execute commands on the files you find. For example,
$ find . -name "*.txt" -exec wc -l '{}' ';'
counts the number of lines in every text file in and below the current directory. The '{}' is replaced by the name of each file found and the ';' ends the -exec clause.
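The following is a small runnable sketch of these find features, using a throwaway directory tree built just for the purpose:

```shell
cd "$(mktemp -d)"
mkdir docs
touch notes.txt docs/report.txt prog.c

# Find by name, anywhere under the current directory
find . -name "*.txt" -print

# Restrict matches to regular files only
find . -type f -name "*.txt" -print

# Run a command on each match: count lines in every .txt file
find . -name "*.txt" -exec wc -l '{}' ';'
```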


For more information about find and its abilities, use man find and/or info find.

which (sometimes also called whence) command
If you can execute an application program or system utility by typing its name at the shell prompt, you can use which to find out where it is stored on disk. For example:
$ which ls
/bin/ls

locate string
find can take a long time to execute if you are searching a large filespace (e.g. searching from / downwards). The locate command provides a much faster way of locating all files whose names match a particular search string. For example:
$ locate ".txt"
will find all filenames in the filesystem that contain ".txt" anywhere in their full paths. One disadvantage of locate is that it stores all filenames on the system in an index that is usually updated only once a day. This means locate will not find files that have been created very recently. It may also report filenames as being present even though the file has just been deleted. Unlike find, locate cannot track down files on the basis of their permissions, size and so on.

2.11 Finding Text in Files

grep (Global Regular Expression Print)
$ grep options pattern files


grep searches the named files (or standard input if no files are named) for lines that match a given pattern. The default behaviour of grep is to print out the matching lines. For example:
$ grep hello *.txt
searches all text files in the current directory for lines containing "hello". Some of the more useful options that grep provides are:

-c (print a count of the number of lines that match), -i (ignore case), -v (print out the lines that don't match the pattern) and -n (print out the line number before printing the matching line). So
$ grep -vi hello *.txt
searches all text files in the current directory for lines that do not contain any form of the word hello (e.g. Hello, HELLO, or hELlO). If you want to search all files in an entire directory tree for a particular pattern, you can combine grep with find using backward single quotes to pass the output from find into grep. So
$ grep hello `find . -name "*.txt" -print`
will search all text files in the directory tree rooted at the current directory for lines containing the word "hello". The patterns that grep uses are actually a special type of pattern known as regular expressions. Just like arithmetic expressions, regular expressions are made up of basic subexpressions combined by operators. The most fundamental expression is a regular expression that matches a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any other character with special meaning may be quoted by preceding it with a backslash (\). A


list of characters enclosed by '[' and ']' matches any single character in that list; if the first character of the list is the caret `^', then it matches any character not in the list. A range of characters can be specified using a dash (-) between the first and last items in the list. So [0-9] matches any digit and [^a-z] matches any character that is not a lowercase letter. The caret `^' and the dollar sign `$' are special characters that match the beginning and end of a line respectively. The dot '.' matches any character. So
$ grep '^..[l-z]$' hello.txt
matches any line in hello.txt that consists of exactly three characters, the last being a lowercase letter from 'l' to 'z'. egrep (extended grep) is a variant of grep that supports more sophisticated regular expressions. Here two regular expressions may be joined by the operator `|'; the resulting regular expression matches any string matching either subexpression. Brackets '(' and ')' may be used for grouping regular expressions. In addition, a regular expression may be followed by one of several repetition operators:
`?' means the preceding item is optional (matched at most once).
`*' means the preceding item will be matched zero or more times.
`+' means the preceding item will be matched one or more times.
`{N}' means the preceding item is matched exactly N times.

`{N,}' means the preceding item is matched N or more times.
`{N,M}' means the preceding item is matched at least N times, but not more than M times.
For example, if egrep was given the regular expression
'(^[0-9]{1,5}[a-zA-Z ]+$)|none'


it would match any line that either:
o begins with a number up to five digits long, followed by a sequence of one or more letters or spaces, or
o contains the word none
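That regular expression can be tried out on a small hand-made file (the data below is invented for illustration). Note that on some modern systems egrep is spelled grep -E:

```shell
cd "$(mktemp -d)"
printf '42 widgets\nnone\n123456 gadgets\nhello\n' > stock.txt

egrep '(^[0-9]{1,5}[a-zA-Z ]+$)|none' stock.txt
# matches "42 widgets" and "none"; "123456 gadgets" fails because more
# than five digits precede the letters, and "hello" matches neither branch
```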

You can read more about regular expressions on the grep and egrep manual pages. Note that UNIX systems also usually support another grep variant called fgrep (fixed grep) which simply looks for a fixed string inside a file (but this facility is largely redundant). The following table lists out the various options available with grep along with the meaning of each to serve as a quick reference for you.
Option | Meaning
-c     | Returns only the number of matches, without quoting the text.
-i     | Ignores case while searching.
-l     | Returns only filenames containing a match, without quoting the text.
-n     | Returns line number of matched text, as well as the text itself.
-s     | Suppresses error messages.
-v     | Returns lines that do not match the text.

Figure 4.3

2.12 Sorting files


There are several facilities that are useful for sorting and processing files in UNIX:
wc
A simple and useful command, it counts the number of lines, words and characters in the specified file or files. It comes with the options -l, -w and


-c, which allow the user to obtain the number of lines, words or characters individually or in any desired combination.
$ wc -lc file1 file2
  20   571 file1
  30   804 file2
Thus, the file file1 contains 20 lines and 571 characters. Similarly for the file file2. The wc command is capable of accepting input directly from the keyboard. By entering wc without any arguments, it waits for the user to type in the input. On terminating input (using Ctrl-d), the appropriate counts are displayed for the input that you supplied.
sort
As the name suggests, the sort command can be used for sorting the contents of a file. Apart from sorting files, sort has another trick up its sleeve. It can merge multiple sorted files and store the result in the specified output file. While sorting, the sort command bases its comparisons on the first character in each line in the file. If the first character of two lines is the same, then the second character in each line is compared, and so on. That's quite logical. To put it in more technical terms, the sorting is done according to the ASCII collating sequence. That is, it sorts the spaces and the tabs first, then the punctuation marks, followed by numbers, uppercase letters and lowercase letters, in that order. The simplest form of the sort command would be:
$ sort myfile
This would sort the contents of myfile and display the sorted output on the screen.


If we want, we can sort the contents of several files in one shot, as in:
$ sort file1 file2 file3
Instead of displaying the sorted output on the screen we can store it in a file by saying:
$ sort -o result file1 file2 file3
The above command sorts the three files file1, file2 and file3 and saves the result in a file called result. And if there are repeated lines in each of these files and we want such lines to occur only once in the output, we can ensure that too using the -u option, which outputs only unique lines:
$ sort -u -o result file1 file2 file3
If the files have already been sorted and we just want to merge them we can use:
$ sort -m file1 file2
Sometimes we may want to combine the contents of a file with the input from the keyboard and then carry out the sorting. This can be achieved by saying:
$ sort - file1
where - stands for the standard input, i.e. the keyboard. We can even sort only the input from standard input by just saying:
$ sort
Since no file has been specified here, it is assumed that the input is to come from the standard input device.


That is only part of the capability of sort. sort is used most fruitfully for files which are essentially databases, or which have their information organised in fields. Fields are groups of characters separated by a predetermined delimiter, or a newline. In most cases, the delimiter is a space or a tab, separating different chunks of information. Now that we know what fields are, we can specify the fields to be used for sorting. Such fields are often known as sort keys. The syntax of the sort command includes optional +pos1 and -pos2, signifying the starting and ending position of the sort key. If -pos2 is not included, then the key is assumed to extend till the end of the line. Assume that the file students has four fields, for roll number, names of the students, their marks, and their grades. These fields would be numbered 0, 1, 2 and 3.
$ sort -r +1 -2 students
Saying thus would sort the file students on the field containing the names of the students. The +1 indicates that the sort key begins at the second field and the -2 indicates that it ends before the third field. This yields the names of the students as the sort key. We can even have multiple sort keys using this feature of the sort command. The -r switch indicates a reverse sort. So, the records arranged in reverse alphabetical order of names would be displayed on the screen. If we want to sort the same file according to marks we must use the -n option, which specifies that the sorting is to be done on a numeric field. If not specified, the marks 100, 40, 50, 10 would be incorrectly sorted in the order 10, 100, 40, 50.
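Note that the +pos1/-pos2 syntax is obsolete; most current versions of sort express the same sort keys with the -k option, which numbers fields starting from 1. The sketch below (using an invented students file) shows the modern equivalents of the two sorts described above:

```shell
cd "$(mktemp -d)"
# roll number, name, marks, grade (sample records invented for illustration)
printf '1 carol 50 B\n2 alice 100 A\n3 bob 40 C\n' > students

# Old syntax: sort -r +1 -2 students
# Modern equivalent: -k2,2 sorts on the second field, -r reverses the order
sort -r -k2,2 students

# Numeric sort on the marks field (-n stops 100 sorting before 40)
sort -n -k3,3 students
```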


The following figure summarises all the options that we have used with sort along with a few more useful ones that you may try on your own.
Option | Meaning
-b     | Ignores leading spaces and tabs.
-c     | Checks if files are already sorted. If they are, sort does nothing.
-d     | Sorts in dictionary order (ignores punctuation).
-f     | Ignores case.
-m     | Merges files that have already been sorted.
-n     | Sorts in numeric order.
-ofile | Stores output in file. The default is to send output to standard output.
-r     | Reverses sort.
-tc    | Separates fields with character c (default is tab).
-u     | Unique output: if merge creates identical lines, uses only the first.
+n[-m] | Skips n fields before sorting and then sorts through field m.

Figure 4.2

cut
Like sort, cut is also a filter. It cuts or picks up a given number of characters or fields from the specified file. To view only a few selected fields, for instance name and division, cut is the answer.
$ cut -f 2,7 empinfo
to view fields 2 and 7, or
$ cut -f 2-7 empinfo
to view fields 2 through 7. The cut command assumes that the fields are separated by the tab character. If the fields are delimited by some character other than the default tab character, cut supports an option


-d, which allows us to set the delimiter. The file empinfo may have the information for each employee stored in the following format:
name:age:address:city:pin:division
Each piece of information is separated by a colon, hence we require the field delimiter to be recognised as ':'. The command for listing the name and division fields would now be:
$ cut -f 1,6 -d : empinfo
The cut command can also cut specified columns from a file and display them on the standard output. The switch used for this purpose is -c. For example:
$ cut -c1-15 empinfo
As a result, the first 15 columns from each line in the file empinfo would be displayed.
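Putting the -d and -f options together, the following sketch uses a small invented empinfo file in the colon-separated format described above:

```shell
cd "$(mktemp -d)"
# name:age:address:city:pin:division (sample records invented for illustration)
printf 'ravi:34:12 mg road:pune:411001:sales\n'    >  empinfo
printf 'meena:29:4 park st:delhi:110001:accounts\n' >> empinfo

cut -d: -f1,6 empinfo    # name and division fields -> ravi:sales, meena:accounts
cut -c1-5 empinfo        # first five columns of each line
```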

uniq filename
uniq removes duplicate adjacent lines from a file. This facility is most useful when combined with sort:
$ sort input.txt | uniq > output.txt
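A short illustrative session (input invented for the example) shows why sort comes first, along with uniq's -c option, which prefixes each line with the number of times it occurred:

```shell
cd "$(mktemp -d)"
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' > input.txt

# uniq only removes *adjacent* duplicates, so sort first
sort input.txt | uniq

# -c prefixes each line with how many times it occurred
sort input.txt | uniq -c
```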

2.13 Advanced Text File Processing

sed (stream editor)
sed allows you to perform basic text transformations on an input stream (i.e. a file or input from a pipeline). For example, you can delete lines containing a particular string of text, or you can substitute one pattern for another wherever it occurs in a file. Although sed is a mini-programming language all on its own and can execute entire scripts, its full language is obscure and probably best forgotten (being based on the old and


esoteric UNIX line editor ed). sed is probably at its most useful when used directly from the command line with simple parameters:
$ sed "s/pattern1/pattern2/" inputfile > outputfile
(substitutes pattern2 for pattern1 once per line)
$ sed "s/pattern1/pattern2/g" inputfile > outputfile
(substitutes pattern2 for pattern1 for every pattern1 per line)
$ sed "/pattern1/d" inputfile > outputfile
(deletes all lines containing pattern1)
$ sed "y/string1/string2/" inputfile > outputfile
(substitutes characters in string2 for those in string1)
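The first three forms can be tried on a small invented input file:

```shell
cd "$(mktemp -d)"
printf 'hello world\nhello hello\ngoodbye world\n' > inputfile

sed "s/hello/hi/"  inputfile   # first occurrence per line:  "hello hello" -> "hi hello"
sed "s/hello/hi/g" inputfile   # every occurrence:           "hello hello" -> "hi hi"
sed "/goodbye/d"   inputfile   # deletes the "goodbye world" line
```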

awk (Aho, Weinberger and Kernighan)
awk is useful for manipulating files that contain columns of data on a line by line basis. Like sed, you can either pass awk statements directly on the command line, or you can write a script file and let awk read the commands from the script. Say we have a file of cricket scores called cricket.dat containing columns for player number, name, runs and the way in which they were dismissed:
1 atherton  0 bowled
2 hussain  20 caught
3 stewart  47 stumped
4 thorpe   33 lbw
5 gough     6 run-out

To print out only the name and runs columns we can say:
$ awk '{ print $2 " " $3 }' cricket.dat
atherton 0


hussain 20
stewart 47
thorpe 33
gough 6
$
Here $n stands for the nth field or column of each line in the data file. $0 can be used to denote the whole line. We can do much more with awk. For example, we can write a script cricket.awk to calculate the team's batting average and to check if Mike Atherton got another duck:
$ cat > cricket.awk
BEGIN { players = 0; runs = 0 }
{ players++; runs += $3 }
/atherton/ { if (runs==0) print "atherton duck!" }
END { print "the batting average is " runs/players }
(ctrl-d)
$ awk -f cricket.awk cricket.dat
atherton duck!
the batting average is 21.2
$
The BEGIN clause is executed once at the start of the script, the main clause once for every line, the /atherton/ clause only if the word atherton occurs in the line and the END clause once at the end of the script. awk can do a lot more. See the manual pages for details (type man awk).


2.14 File Compression and Backup


UNIX systems usually support a number of utilities for backing up and compressing files. The most useful are:

tar (tape archiver)
tar backs up entire directories and files onto a tape device or (more commonly) into a single disk file known as an archive. An archive is a file that contains other files plus information about them, such as their filename, owner, timestamps, and access permissions. tar does not perform any compression by default. To create a disk file tar archive, use
$ tar -cvf archivename filenames
where archivename will usually have a .tar extension. Here the c option means create, v means verbose (output filenames as they are archived), and f means file. To list the contents of a tar archive, use
$ tar -tvf archivename
To restore files from a tar archive, use
$ tar -xvf archivename
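A complete create/list/restore round trip might look like this (directory and file names are invented for the example):

```shell
cd "$(mktemp -d)"
mkdir play
touch play/a.txt play/b.txt

tar -cvf play.tar play     # create an archive of the play directory
tar -tvf play.tar          # list its contents

rm -r play                 # simulate losing the original directory
tar -xvf play.tar          # restore it from the archive
ls play
```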

cpio
cpio is another facility for creating and reading archives. Unlike tar, cpio doesn't automatically archive the contents of directories, so it's common to combine cpio with find when creating an archive:
$ find . -print -depth | cpio -ov -Htar > archivename
This will take all the files in the current directory and the directories below and place them in an archive called archivename. The -depth option controls the order in which the filenames are produced and is recommended to prevent problems with directory permissions when


doing a restore. The -o option creates the archive, the -v option prints the names of the files archived as they are added and the -H option specifies an archive format type (in this case it creates a tar archive). Another common archive type is crc, a portable format with a checksum for error control. To list the contents of a cpio archive, use
$ cpio -tv < archivename
To restore files, use:
$ cpio -idv < archivename
Here the -d option will create directories as necessary. To force cpio to extract files on top of files of the same name that already exist (and have the same or later modification time), use the -u option.

compress, gzip
compress and gzip are utilities for compressing and decompressing individual files (which may or may not be archive files). To compress files, use:
$ compress filename
or
$ gzip filename
In each case, filename will be deleted and replaced by a compressed file called filename.Z or filename.gz. To reverse the compression process, use:
$ compress -d filename
or
$ gzip -d filename
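A small round-trip sketch with gzip (the file is generated just for the demonstration; a highly repetitive file compresses very well):

```shell
cd "$(mktemp -d)"
yes "the quick brown fox" | head -1000 > big.txt
ls -l big.txt              # note the original size

gzip big.txt               # replaces big.txt with big.txt.gz
ls -l big.txt.gz           # much smaller than the original

gzip -d big.txt.gz         # decompress, restoring big.txt
wc -l big.txt
```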


2.15 Handling Removable Media (e.g. floppy disks)


UNIX supports tools for accessing removable media such as CDROMs and floppy disks.

mount, umount
The mount command serves to attach the filesystem found on some device to the filesystem tree. Conversely, the umount command will detach it again (it is very important to remember to do this when removing the floppy or CDROM). The file /etc/fstab contains a list of devices and the points at which they will be attached to the main filesystem:
$ cat /etc/fstab
/dev/fd0 /mnt/floppy auto rw,user,noauto 0 0
/dev/hdc /mnt/cdrom iso9660 ro,user,noauto 0 0
In this case, the mount point for the floppy drive is /mnt/floppy and the mount point for the CDROM is /mnt/cdrom. To access a floppy we can use:
$ mount /mnt/floppy
$ cd /mnt/floppy
$ ls
To force all changed data to be written back to the floppy and to detach the floppy disk from the filesystem, we use:
$ umount /mnt/floppy

mtools
If they are installed, the (non-standard) mtools utilities provide a convenient way of accessing DOS-formatted floppies without having to mount and unmount filesystems. You can use DOS-type commands like "mdir a:", "mcopy a:*.* .", "mformat a:", etc. (see the mtools manual pages for more details).

