
File Handling

Sanjay Chaudhary DA-IICT http://intranet.daiict.ac.in/~sanjay

Reference
1. Unix System Programming, Keith Haviland, Dina Gray and Ben Salama, Addison-Wesley

UNIX file access primitives


A set of system calls that give direct access to the I/O facilities provided by the UNIX kernel. A typical UNIX program will:
- call open or creat to initialize a file
- call read, write, and lseek to manipulate data within that file
- call close to close the file
- call unlink or remove to remove the file completely

UNIX primitives

Name      Meaning
open      Opens a file for reading or writing, or creates an empty file
creat     Creates an empty file
close     Closes a previously opened file
read      Extracts information from a file
write     Places information into a file
lseek     Moves to a specified byte in a file
unlink    Removes a file
remove    Alternative method to remove a file
fcntl     Controls attributes associated with a file

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd;
    ssize_t nread;
    char buf[1024];

    /* open file for reading */
    fd = open("marks.dat", O_RDONLY);

    /* read the data */
    nread = read(fd, buf, 1024);

    /* close the file */
    close(fd);

    return 0;
}

UNIX file access primitives


- O_RDONLY is an integer constant defined in the header file <fcntl.h>. It opens the file in read-only mode.
- The return value from the open call is placed into the integer variable fd and contains the file descriptor.
- A file descriptor is a non-negative integer whose value is determined by the system.
- If open fails, it returns -1.
- fd identifies the open file and is passed as a parameter to the other file primitives, such as read, write, lseek, and close.

UNIX file access primitives


nread = read(fd, buf, 1024);

- Reads up to 1024 characters from the file identified by fd and places them into the character array buf.
- nread gives the number of characters actually read.
- Like open, if something goes wrong, read will return -1.
- nread is of type ssize_t, as defined in <sys/types.h>. Why is this header file not included? Some basic types, such as ssize_t, are also defined in <unistd.h>.

close(fd);

- Informs the system that the program has finished with the file associated with fd.

open system call


#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int open(const char *pathname, int flags, [mode_t mode]);

- pathname can be an absolute pathname, such as /home/sanjay/sample.dat, or a relative pathname, which is interpreted in relation to the present working directory.
- flags is of integer type and specifies the access method, using constants defined in <fcntl.h> (file control).

Sample programs
createfile.c readfile.c readfileeof.c readfileeof02.c copyfile01.c createmarksfile.c appendfile.c printmarksst.c

open system call


- O_RDONLY: open a file for reading only
- O_WRONLY: open a file for writing only
- O_RDWR: open a file for both reading and writing
- In case of error, open will return -1.
- To create a new file, use the open call with O_CREAT included in the flags.
- The third, optional parameter, mode, is used only with the O_CREAT flag. It is concerned with the file security permissions.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fd;
    ssize_t nread;
    char buf[1024];
    char *filename = "marks.dat";

    /* open file for reading */
    fd = open(filename, O_RDONLY);
    if (fd == -1) {
        printf("Couldn't open %s\n", filename);
        exit(1);
    }

    /* read the data */
    nread = read(fd, buf, 1024);

    /* Processing on data */

    /* close the file */
    close(fd);

    exit(0);
}

Creating a file with open


fd = open(filename, O_RDWR | O_CREAT, 0644);
if (fd == -1) {
    printf("Couldn't open %s\n", filename);
    exit(1);
}

Mode contains a number which specifies the file access permissions


#define PERMS 0644
fd = open(filename, O_RDWR | O_CREAT, PERMS);

What happens if filename already exists?
- If access permissions allow, the file will be opened for writing as if O_CREAT was not specified.
- Use the O_CREAT flag together with the O_EXCL flag to make open fail if the file already exists.

fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0644);
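The O_CREAT | O_EXCL combination can be sketched as a small helper. This is only an illustration: the name create_exclusive and the message text are assumptions, not part of the original slides. When the file already exists, open returns -1 and sets errno to EEXIST.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` only if it does not exist yet.
 * Returns a descriptor open for writing, or -1 with errno set. */
int create_exclusive(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);

    if (fd == -1 && errno == EEXIST) {
        int saved = errno;      /* fprintf may change errno */
        fprintf(stderr, "%s already exists, not overwriting\n", path);
        errno = saved;
    }
    return fd;
}
```

A caller would typically check the result with `if ((fd = create_exclusive("marks.dat")) == -1)` and decide whether to pick another name or give up.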

Creating a file with open


Another useful flag is O_TRUNC. Used with O_CREAT, it forces the file to be truncated to zero bytes if it exists and its access permissions allow.

fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);

The creat system call


To create a file:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int creat(const char *pathname, mode_t mode);

fd = creat(filename, 0644);

It is equivalent to:
fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);

The close system call


#include <unistd.h> int close(int filedes);

All open files are automatically closed when a program completes execution.

The read system call


- Copies an arbitrary number of characters or bytes from a file into a buffer.
- The buffer parameter is declared as a pointer to void, so it may hold items of any type.
- Although the buffer is normally an array of char, it can just as easily be an array of user-defined structs.

#include <unistd.h>

ssize_t read(int filedes, void *buffer, size_t n);

The read system call


int fd;
ssize_t nread;
char buf[1024];

/* open file for reading */
fd = open("marks.dat", O_RDONLY);

/* read the data */
nread = read(fd, buf, 1024);

- You can replace 1024 with a named constant such as SOMEVALUE.
- nread gives the number of characters actually read.
- Like open, if something goes wrong, read will return -1.

The read-write pointer


int fd;
ssize_t n1, n2;
char buf1[512], buf2[512];
. . .
if ((fd = open(filename, O_RDONLY)) == -1)
    return (-1);

n1 = read(fd, buf1, 512);   /* reads bytes 0 to 511 */
n2 = read(fd, buf2, 512);   /* continues at byte 512 */

The read-write pointer


- The system keeps track of a process's position in a file with an entity called the read-write pointer, sometimes referred to as the file pointer.
- The programmer should be able to detect the end of a file, so the return value from read is important.
- If the number of characters requested in a read call is greater than the number of characters left in the file, the system will transfer only the characters remaining.
- Any further call to read will return 0.
- Checking for a return value of 0 from read is, in fact, the normal way of testing for end of file within a program.
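The end-of-file test described above can be sketched as a read loop. The helper name count_bytes and the CHUNK size are assumptions chosen for illustration; the loop simply keeps reading until read returns 0 (end of file) or -1 (error).

```c
#include <fcntl.h>
#include <unistd.h>

#define CHUNK 512   /* illustrative buffer size */

/* Hypothetical helper: count the bytes in a file by reading
 * until read returns 0. Returns -1 on any error. */
long count_bytes(const char *path)
{
    char buf[CHUNK];
    ssize_t nread;
    long total = 0;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;

    /* Each read advances the read-write pointer by nread bytes. */
    while ((nread = read(fd, buf, CHUNK)) > 0)
        total += nread;

    close(fd);
    return (nread == -1) ? -1 : total;
}
```

The last successful read may return fewer than CHUNK bytes; only the following call returns 0.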

The write system call


#include <unistd.h>

ssize_t write(int filedes, const void *buffer, size_t n);

- filedes is a file descriptor.
- buffer is a pointer to the data to be written.
- n is a positive integer giving the number of bytes to be written.
- The value returned is either the number of characters write managed to output, or the error code -1.
- If it is not -1, the return value will almost always be equal to n. If it is any less than n, something has gone badly wrong.
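For regular files a short write usually signals a problem such as a full disk, but on pipes and similar descriptors write may legitimately return fewer than n bytes. A common defensive pattern is to loop until everything is written. The sketch below assumes a hypothetical helper named write_all; it is not from the original slides.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write until all n bytes are out.
 * Returns 0 on success, -1 on error (errno is set by write). */
int write_all(int fd, const void *buffer, size_t n)
{
    const char *p = buffer;

    while (n > 0) {
        ssize_t w = write(fd, p, n);

        if (w == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            return -1;
        }
        p += w;                 /* skip the bytes already written */
        n -= (size_t)w;
    }
    return 0;
}
```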

The write system call


int fd;
ssize_t w1, w2;
char buf1[512], buf2[512];

if ((fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0644)) == -1)
    return (-1);

w1 = write(fd, buf1, 512);   /* writes bytes 0 to 511 */
w2 = write(fd, buf2, 512);   /* continues at byte 512 */

What if a program opens an existing file for writing and then immediately writes to that file?
- The old data will be overwritten, starting from the beginning of the file.
- To add data at the end instead, the file can be opened with the O_APPEND flag.

fd = open(filename, O_WRONLY | O_APPEND);

read, write and efficiency


- Reading and writing one byte at a time gives poor performance.
- Performance improves when the buffer size is increased.
- Best performance: when BUFSIZE is a multiple of the system's natural disk blocking factor.
- To improve performance, reduce the number of system calls: switching mode between program and kernel when a system call is made can be relatively expensive. Ideally, minimize system calls.
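A block-at-a-time copy illustrates the point: one read and one write move a whole buffer, so far fewer system calls are made than with a one-byte buffer. This is a minimal sketch, assuming a hypothetical helper copy_file and using the BUFSIZ constant from <stdio.h> as the buffer size; the best size for a given system may differ.

```c
#include <fcntl.h>
#include <stdio.h>      /* BUFSIZ */
#include <unistd.h>

/* Hypothetical helper: copy src to dst one buffer at a time.
 * Returns 0 on success, -1 on error. */
int copy_file(const char *src, const char *dst)
{
    char buf[BUFSIZ];
    ssize_t nread;
    int in, out;

    if ((in = open(src, O_RDONLY)) == -1)
        return -1;
    if ((out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1) {
        close(in);
        return -1;
    }

    while ((nread = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)nread) != nread) {
            nread = -1;         /* treat a short or failed write as an error */
            break;
        }
    }

    close(in);
    close(out);
    return (nread == 0) ? 0 : -1;
}
```

Changing the size of buf and timing the copy of a large file is a simple way to observe the effect described above.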

lseek and random access


- lseek enables users to change the position of the read-write pointer.
- It enables random access into a file.

#include <sys/types.h>
#include <unistd.h>

off_t lseek(int filedes, off_t offset, int start_flag);

Second parameter, offset, actually determines the new position of the read-write pointer. It gives number of bytes to add to a starting position. The starting position will be determined by the third argument, the integer start_flag. It specifies where in the file the offset is to be measured from.

lseek and random access


start_flag can take:
- SEEK_SET: the offset is measured from the beginning of the file (usual integer value 0)
- SEEK_CUR: the offset is measured from the current position of the file pointer (usual value 1)
- SEEK_END: the offset is measured from the end of the file (usual value 2)

off_t newpos;
newpos = lseek(fd, (off_t)-16, SEEK_END);

It will give a position 16 bytes before the end of the file
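The same negative-offset idea can be wrapped in a helper that reads the tail of a file. The name read_tail is an assumption made for this sketch. If the file is shorter than n bytes, the seek would land before the start of the file, so lseek fails and the helper returns -1.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read the last n bytes of a file into buf.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_tail(const char *path, char *buf, size_t n)
{
    ssize_t nread;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;

    /* A negative offset from SEEK_END moves backwards from the end. */
    if (lseek(fd, -(off_t)n, SEEK_END) == (off_t)-1) {
        close(fd);
        return -1;
    }

    nread = read(fd, buf, n);
    close(fd);
    return nread;
}
```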

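[Figure: a file holding the bytes A to H. SEEK_SET marks the beginning of the file, SEEK_CUR marks the current position of the file pointer, and SEEK_END marks the end of the file.]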

lseek and random access


- In all cases the return value gives the new position in the file.
- In case of error, it contains the usual error code of -1.
- newpos and offset are of type off_t, as defined in <sys/types.h>, which is a type large enough to cope with movement through any file on the system.
- offset can be negative, so it is possible to move backwards from the starting point indicated by start_flag.

lseek and random access


- It is possible to specify a position beyond the end of a file. A subsequent write is meaningful, and the file will be extended.
- Any empty space between the old end of file and the starting position of the new data reads back as ASCII null characters.

filedes = open(filename, O_RDWR);
lseek(filedes, (off_t)0, SEEK_END);
write(filedes, outbuf, OBSIZE);
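Seeking past the end before writing can be sketched as follows. The helper name write_with_gap and the gap of 10 bytes are assumptions for illustration: the file ends up with one byte of data, a 10-byte gap that reads back as null characters, and one more byte of data.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: write 'A', skip 10 bytes past the end, write 'B'.
 * Returns the resulting file size, or -1 on error. */
off_t write_with_gap(const char *path)
{
    off_t size;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);

    if (fd == -1)
        return -1;

    if (write(fd, "A", 1) != 1 ||
        lseek(fd, (off_t)10, SEEK_END) == (off_t)-1 ||
        write(fd, "B", 1) != 1) {
        close(fd);
        return -1;
    }

    /* Offset 0 from SEEK_END gives the size: 1 + 10 + 1 bytes. */
    size = lseek(fd, (off_t)0, SEEK_END);
    close(fd);
    return size;
}
```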

lseek and random access


The following statements find the size of a file, since lseek returns the new position in the file.

off_t filesize;
int filedes;   /* descriptor of an already open file */

filesize = lseek(filedes, (off_t)0, SEEK_END);

Example: getoccupier.c

Appending data to a file


To write data at the end of a file:
lseek(filedes, (off_t)0, SEEK_END);
write(filedes, appbuf, BUFSIZE);

Use of O_APPEND:
filedes = open(filename, O_WRONLY | O_APPEND);
write(filedes, appbuf, BUFSIZE);
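A small helper shows the O_APPEND variant end to end. The name append_text is hypothetical, and adding O_CREAT (so that the file is created when missing) is an assumption that goes beyond the fragment above.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append a string to the end of a file.
 * O_CREAT is an added assumption so a missing file is created.
 * Returns 0 on success, -1 on error. */
int append_text(const char *path, const char *text)
{
    size_t len = strlen(text);
    ssize_t w;
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);

    if (fd == -1)
        return -1;

    /* With O_APPEND every write is positioned at the end of the file. */
    w = write(fd, text, len);
    close(fd);
    return (w == (ssize_t)len) ? 0 : -1;
}
```

Unlike the lseek followed by write approach, O_APPEND needs no separate positioning call before each write.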

Deleting a file

#include <unistd.h>
int unlink(const char *pathname);

#include <stdio.h>
int remove(const char *pathname);

unlink(filename);
remove(filename);

The fcntl system call


To provide a degree of control over already-open files.

#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>

int fcntl(int filedes, int cmd, ...);

- It acts on the open file identified by the file descriptor.
- The programmer selects a particular function by choosing a value for the integer cmd parameter from the header file <fcntl.h>.

The fcntl system call

- The third parameter is an integer or, if fcntl is used to lock a file, a pointer to a struct flock. In some cases the third parameter is not used at all.
- cmd values: F_GETFL and F_SETFL.
- F_GETFL instructs fcntl to return the current file status flags as set by open.
- Example: filestatus.c uses fcntl to display the current state of an open file.

The fcntl system call

- We test whether a particular bit is set in the file status flags held in the variable arg1 by using the bitwise AND operator, denoted by the single & symbol.
- The access mode is extracted with O_ACCMODE, a mask defined in <fcntl.h>.
- F_SETFL is used to reset the file status flags associated with a file. The new flags are given in the third argument to fcntl.
- Only certain flags can be set in this way. For example, a file that was opened read-only cannot be switched to reading and writing.

The fcntl system call

if (fcntl(filedes, F_SETFL, O_APPEND) == -1)
    printf("fcntl error\n");
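The two commands are often combined: read the current flags with F_GETFL, then pass a modified copy to F_SETFL so that other settable flags are not cleared by accident. The helper names access_mode and enable_append below are assumptions made for this sketch.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: return the access mode of an open file
 * (O_RDONLY, O_WRONLY or O_RDWR), or -1 on error. */
int access_mode(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return flags & O_ACCMODE;   /* mask off everything but the access mode */
}

/* Hypothetical helper: turn on O_APPEND while preserving the
 * other file status flags. Returns -1 on error. */
int enable_append(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_APPEND);
}
```

The result of access_mode is compared against the constants, for example `access_mode(fd) == O_RDWR`, rather than tested bit by bit.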

The standard I/O Library


Difference between standard I/O and the system call primitives: instead of integer file descriptors, the standard I/O routines work, implicitly or explicitly, with a structure called FILE, e.g.

FILE *stream;
if ((stream = fopen(filename, "r")) == NULL) { }

int getc(FILE *istream);
int putc(int c, FILE *ostream);

The standard I/O Library


- getc and putc process single characters, which would be inefficient if each call resulted in a system call.
- Standard I/O avoids this inefficiency with a buffering mechanism.
- The first call to getc results in BUFSIZ characters being read from the file via the system call read.
- The data is kept in a buffer set up by the library. Only the first character is returned by getc; later calls take characters from the buffer.
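Because of this buffering, a character-at-a-time loop with getc stays reasonably efficient. A minimal sketch, assuming a hypothetical helper named count_chars:

```c
#include <stdio.h>

/* Hypothetical helper: count the characters in a file using getc.
 * Returns -1 if the file cannot be opened. */
long count_chars(const char *path)
{
    FILE *stream = fopen(path, "r");
    long total = 0;
    int c;      /* int, not char, so EOF can be told apart from data */

    if (stream == NULL)
        return -1;

    /* Most calls are served from the library buffer, not by read. */
    while ((c = getc(stream)) != EOF)
        total++;

    fclose(stream);
    return total;
}
```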

Writing error messages with fprintf


Standard I/O routines provide formatting and conversion utilities
fprintf(stderr, "error number %d\n", errno);

The errno variable and system calls


- UNIX provides a globally accessible integer variable, errno, which contains an error code number.
- errno can be used in a C program by including the header file <errno.h>.
- It is safest to use errno immediately after a system call has been made and has failed.

The perror subroutine
- The standard way to report errors.
- When called, it produces a message on standard error consisting of the string argument passed to the routine, a colon, and a description of the error that corresponds to the current value of errno.

perror("error opening sample.dat file");
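The advice about using errno immediately can be sketched as follows. The helper name open_or_report is an assumption; it saves errno right after the failed call and uses strerror from <string.h> to build a message similar to the one perror would print.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open a file for reading and report any
 * failure on standard error. Returns the descriptor or -1. */
int open_or_report(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1) {
        int saved = errno;      /* later calls may overwrite errno */

        fprintf(stderr, "cannot open %s: %s (errno %d)\n",
                path, strerror(saved), saved);
        errno = saved;          /* leave errno intact for the caller */
    }
    return fd;
}
```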

Permissions and file modes


Octal value   Symbolic mode   Meaning
0400          S_IRUSR         Read allowed by owner
0200          S_IWUSR         Write allowed by owner
0100          S_IXUSR         Owner can execute file
0040          S_IRGRP         Read allowed by group
0020          S_IWGRP         Write allowed by group
0010          S_IXGRP         Group member can execute file
0004          S_IROTH         Other types of user can read file
0002          S_IWOTH         Other types of user can write file
0001          S_IXOTH         Other types of user can execute file
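The symbolic constants can be combined with bitwise OR, so S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH is the same value as octal 0644. As an illustration, the hypothetical helper below (not part of the original slides) converts a mode into the familiar rwx notation by testing each bit in turn.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: fill out[10] with an ls-style permission
 * string such as "rw-r--r--" for the nine basic permission bits. */
void mode_to_string(mode_t mode, char out[10])
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,
        S_IRGRP, S_IWGRP, S_IXGRP,
        S_IROTH, S_IWOTH, S_IXOTH
    };
    static const char letters[] = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++)
        out[i] = (mode & bits[i]) ? letters[i] : '-';
    out[9] = '\0';
}
```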

Extra permissions for executable files


There are three other types of file permissions, which specify special attributes and are usually only relevant when a file contains an executable program.

Octal value   Symbolic mode   Meaning
04000         S_ISUID         Set user-id on execution
02000         S_ISGID         Set group-id on execution
01000         S_ISVTX         Save-text-image (sticky bit)

Extra permissions for executable files


- If the S_ISUID permission is set, then when the program contained in the file is started, the system gives the resulting process an effective user-id taken from the file owner rather than that of the user who started the process.
- The process assumes the file system privileges of the file owner, not the user who started the process.
- Classic example: the passwd program. /etc/passwd is owned by the root user, and the S_ISUID bit is set for the passwd program.

Extra permissions for executable files


save-text-image (sticky bit): In earlier systems, if the save-text-image bit was set on a file, then when it was executed its program-text part would remain in the system's swap area until the system was halted. Now this bit is redundant.

Determining file accessibility with access


access is a useful system call that determines whether or not a process can access a file, according to the real user-id of the process rather than the current effective user-id.

#include <unistd.h>

int access(const char *pathname, int amode);

amode contains a value indicating the required method of access:
- R_OK: has the calling process read access?
- W_OK: has the calling process write access?
- X_OK: can the calling process execute the file?

Determining file accessibility with access


char *filename = "sample.dat";

if (access(filename, R_OK) == -1) {
    fprintf(stderr, "User cannot read file %s\n", filename);
    exit(1);
}

Other useful system calls


int chmod(const char *pathname, mode_t newmode);
int chown(const char *pathname, uid_t owner_id, gid_t group_id); int link(const char *original_path, const char *new_path);

int rename(const char *oldpathname, const char *newpathname);

Other useful system calls


int stat(const char *pathname, struct stat *buf);

buf will hold the information associated with the file after a successful invocation.

struct stat s;
int filedes, retval;

filedes = open("sample.dat", O_RDWR);
retval = stat("sample.dat", &s);

Obtaining file information


The stat and fstat system calls enable a process to discover the values of the properties of an existing file.

#include <sys/types.h>
#include <sys/stat.h>

int stat(const char *pathname, struct stat *buf);
int fstat(int filedes, struct stat *buf);

- pathname identifies the file.
- buf is a pointer to a stat structure, which contains the information associated with the file after a successful invocation.
- fstat expects a file descriptor, so it can be used only on an open file.
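The difference between the two calls can be sketched with a pair of hypothetical helpers, size_by_path and size_by_descriptor, which both read the st_size member of struct stat. The helper names are assumptions; st_size is a standard member giving the file size in bytes.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: file size via stat, given a pathname.
 * Returns -1 on error. */
off_t size_by_path(const char *path)
{
    struct stat s;

    if (stat(path, &s) == -1)
        return -1;
    return s.st_size;
}

/* Hypothetical helper: file size via fstat, given an open descriptor.
 * Returns -1 on error. */
off_t size_by_descriptor(int fd)
{
    struct stat s;

    if (fstat(fd, &s) == -1)
        return -1;
    return s.st_size;
}
```

Unlike the lseek approach to finding a file's size, neither call moves the read-write pointer.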
