
7. File management
Objectives
1. Long-term storage of data
2. Allow creation and deletion of files – automatic management of secondary storage
3. Allow for file reference using symbolic names
4. Protect against unauthorised access (access control) – allow sharing of files when required
5. Protect files against system failure

Files
A file is a uniform logical unit of information created by a process. It
also has an address space, but one mapped onto a mass storage unit instead
of RAM. Basically, it is a named collection of related information that is
recorded on secondary storage (e.g. the set of lines in a program, or the
set of words in a text document).
Used for storing large amounts of data in the long term
Allows processes to access the data concurrently

Naming
Motivation: the user does not need numerical addresses; files are accessed using user-friendly names.
Different OSs enforce different file naming conventions, but most follow a common pattern.
Many OSs, e.g. Windows/Unix, support names of up to approximately 255 characters (Windows traditionally limits a full path to 260 characters).
There are restrictions on the characters that can be used, e.g. “?” is invalid in Windows but valid in Unix.
Some OSs distinguish between upper and lower case. Windows is not case sensitive, while UNIX is.

Extensions can be useful to tell the user and OS what types of data the file contains.
In MS-DOS only 3 characters were allowed for extensions; in Unix the size is up to the user.

In Unix, extensions are not enforced by the OS, but some programs expect them: for example, a C compiler expects a “.c” extension on the files it compiles.

GUI-based OSs usually attach meanings to extensions and try to associate applications with them (e.g. .docx with MS Word).

Issue: it is easy to trick the OS, or make a file appear corrupt, by modifying its extension. MacOS uses a more sophisticated approach: it examines the file and tries to work out its type from the “look” of its contents.

Files can be structured in 3 ways:

1. Byte sequence
The OS considers a file to be unstructured, and it is up to the program that accesses the file to interpret the byte sequence (Windows/Unix).
2. Record sequence
A sequence of fixed-length structured records (each a collection of bytes) – useful for punch cards.
3. Tree
Each variable-length record has a key field used for record retrieval – useful in database systems.

File access – refers to the way in which stored files may be accessed:
1. Sequential access
2. Random (direct) access

Sequential access is still relevant today because of the locality principle. Random access is essential for most applications.
Sequential access – read next/write next
Random access – read n/write n (n = relative block number)
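The two access styles can be sketched as follows. This is a minimal illustration, not an OS interface: an in-memory byte stream stands in for a file, and `BLOCK_SIZE`, `read_next` and `read_n` are names made up for the example.

```python
import io

BLOCK_SIZE = 4  # assumed tiny block size, only to keep the example readable

def read_next(f):
    """Sequential access: return the block at the current position and advance."""
    return f.read(BLOCK_SIZE)

def read_n(f, n):
    """Random access: jump straight to relative block n and read it."""
    f.seek(n * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)

# An in-memory byte stream stands in for a file of four blocks.
f = io.BytesIO(b"AAAABBBBCCCCDDDD")

first = read_next(f)          # block 0
second = read_next(f)         # block 1, the position advanced automatically
last_direct = read_n(f, 3)    # block 3, without touching block 2
third_direct = read_n(f, 2)   # going backwards is just as cheap
```

With read next the caller never names a position; with read n the position is computed from the block number.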

Most OSs support several types of files, such as:


Regular – contain data
Executables – contain a program that can be executed
Directory – System file containing reference to other files
Character special – not true files but directory entries which refer to character devices – allow
communication with I/O devices, printers.
Block special – not true files but directory entries which refer to block devices – used for
communication with storage devices, e.g. disks, memory

Operating systems offer several system calls for file management:


Create – the file is created with no data. Some attributes are associated with the file
Delete – the file is deleted to free up disk space
Open – before using a file, a process must open it, so that the OS can fetch its attributes and disk addresses into RAM
Close – when all the accesses are finished, the attributes/disk addresses that were loaded are no longer needed
Read – data are read from file
Write – data are written to an existing file
Append – restricted form of write, adds data only at the end of file.
Seek – used for random access files, repositions the pointer to a specific location
Get attributes – retrieve the attributes of the file
Set attributes – some attributes can be set by the user (e.g. protection-mode)
Rename – change the name
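As a rough sketch, most of the calls listed above have direct counterparts in Python's `os` module, which wraps the underlying system calls. The example assumes a POSIX-like system; the file names `notes.txt` and `renamed.txt` are made up, and everything happens in a temporary directory.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "notes.txt")   # hypothetical file name

# Create + open: O_CREAT creates an empty file, open returns a descriptor.
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
os.write(fd, b"hello world")        # write
os.lseek(fd, 6, os.SEEK_SET)        # seek: reposition the file pointer
tail = os.read(fd, 5)               # read 5 bytes from the new position
os.close(fd)                        # close

# Append: a restricted write that always adds at the end of the file.
fd = os.open(path, os.O_WRONLY | os.O_APPEND)
os.write(fd, b"!")
os.close(fd)

size = os.stat(path).st_size        # get attributes
os.chmod(path, 0o400)               # set attributes (protection mode)

new_path = os.path.join(workdir, "renamed.txt")
os.rename(path, new_path)           # rename
renamed_exists = os.path.exists(new_path)

os.remove(new_path)                 # delete
deleted = not os.path.exists(new_path)
os.rmdir(workdir)
```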

Directories
Most filing systems allow files to be grouped together into directories (or folders), resulting in a more logical organisation
• Allows operations to be performed in bulk on groups of files, e.g. copy files or set one of their attributes
• Allows different files to have the same filename as long as they are in different directories
• Each directory is managed via a special file, which contains a file descriptor table with descriptors for each file under that directory, corresponding to specific entries in the global file table


Single-level directory systems:


One directory for all the files in the volume; used in early mainframes
Pros: simplicity, ability to quickly find files
Cons: naming and grouping problems (organisation)

Two-level directory systems:
Separate directory for each user
Letters indicate owners of the directories and files
Pros: can have the same file name for different users
Cons: limited grouping capability

Hierarchical directory systems:
Directories in a tree-like structure
Pros: grouping capabilities, can have same name for files in different directories
Cons: requires a method to browse and locate files

Two common methods:


1. Absolute path name, starting from the root: /NTU/sysoftwar/demo.c
2. Relative path name, resolved against the current directory: demo.c (“.” is the current directory, “..” is its parent)
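A small sketch of how a relative name is resolved against the current directory, using Python's `posixpath` module for Unix-style paths. The helper `resolve` and the directory name `other` are made up for the example, and `/NTU/sysoftwar` is assumed to be the current directory.

```python
import posixpath

cwd = "/NTU/sysoftwar"   # assumed current directory

def resolve(path, current=cwd):
    """Turn an absolute or relative path into a normalised absolute path."""
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(current, path))

a = resolve("/NTU/sysoftwar/demo.c")   # already absolute
b = resolve("demo.c")                  # relative to the current directory
c = resolve("./demo.c")                # "." is the current directory
d = resolve("../other/demo.c")         # ".." climbs to the parent, /NTU
```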

Depending on the OS, corresponding system calls are provided for directories:


1. Create
2. Delete
3. Opendir
4. Closedir
5. Rename
6. Link – a technique that allows a file to appear in more than one directory
7. Unlink – removes a directory entry; if the file is present in only one directory (the normal case), it is removed from the file system
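Link and unlink can be tried out with hard links, assuming a POSIX-like system and a file system that supports them. The directory names `a` and `b` and the file `demo.c` are made up; the link count reported by `os.stat` shows in how many directory entries the file currently appears.

```python
import os
import tempfile

root = tempfile.mkdtemp()
dir_a = os.path.join(root, "a")
dir_b = os.path.join(root, "b")
os.mkdir(dir_a)
os.mkdir(dir_b)

original = os.path.join(dir_a, "demo.c")
alias = os.path.join(dir_b, "demo.c")
with open(original, "w") as f:
    f.write("int main(void) { return 0; }\n")

# Link: the same file now appears in two directories.
os.link(original, alias)
links_after_link = os.stat(original).st_nlink

# Unlink one entry: the data is still reachable through the other one.
os.unlink(original)
with open(alias) as f:
    still_readable = f.read().startswith("int main")
links_after_unlink = os.stat(alias).st_nlink

# Unlinking the last entry removes the file itself.
os.unlink(alias)
os.rmdir(dir_a)
os.rmdir(dir_b)
os.rmdir(root)
```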
File Systems

A drive is divided into partitions/volumes, each holding an independent file system.


For historical reasons we often say “drive” or “disk” interchangeably, even when talking about other types of mass storage.

Sector 0 is the master boot record (MBR), used to boot the computer: it locates the boot block of a specified partition, from which the OS is loaded.
The super block contains information about the partition (e.g. the number of blocks).

File block allocation methods:


Objective: keep track of which blocks on the physical drive belong to which file
Different allocation schemes:
1. Contiguous allocation (CDs)
2. Linked list allocation – non-contiguous
3. Linked list with FAT (used in DOS/Windows)
4. I-nodes (used in Unix)

Contiguous allocation

Drives are split into blocks of fixed size; e.g. with 1 KB blocks, a file of 50 KB would occupy 50 blocks.
Contiguous blocks are assigned to each file.

Advantages:
1. Simple implementation: only the first block address and the file's length need to be stored
2. Good read performance: the whole file can be read from the drive in a single operation
3. Allows easy random access
4. Resilient to drive faults: damage to a single block results in only localised loss of data.
Disadvantages:
1. The final size of a file must be known when it is created
2. Files cannot grow
3. Fragmentation: as files are deleted, holes are left between the remaining files
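The scheme and its fragmentation problem can be simulated in a few lines. This is a toy model under stated assumptions: 1 KB blocks, a 10-block drive, first-fit placement, and made-up names (`ContiguousDisk`, files `A` to `D`).

```python
import math

BLOCK_SIZE_KB = 1  # assumed block size

def blocks_needed(size_kb):
    """Number of whole blocks a file of the given size occupies."""
    return math.ceil(size_kb / BLOCK_SIZE_KB)

class ContiguousDisk:
    def __init__(self, n_blocks):
        self.blocks = [None] * n_blocks   # None marks a free block
        self.table = {}                   # file name -> (first block, length)

    def create(self, name, size_kb):
        """First fit: use the first run of free blocks that is long enough."""
        length = blocks_needed(size_kb)
        run = 0
        for i, owner in enumerate(self.blocks):
            run = run + 1 if owner is None else 0
            if run == length:
                start = i - length + 1
                self.blocks[start:i + 1] = [name] * length
                self.table[name] = (start, length)
                return start
        return None                       # no hole is large enough

    def delete(self, name):
        start, length = self.table.pop(name)
        self.blocks[start:start + length] = [None] * length

    def block_of(self, name, n):
        """Random access is simple arithmetic: first block + n."""
        start, length = self.table[name]
        if not 0 <= n < length:
            raise IndexError(n)
        return start + n

disk = ContiguousDisk(10)
disk.create("A", 3)                      # blocks 0-2
disk.create("B", 4)                      # blocks 3-6
disk.create("C", 3)                      # blocks 7-9
b_third = disk.block_of("B", 2)          # 3 + 2
disk.delete("A")
disk.delete("C")
free_blocks = disk.blocks.count(None)    # six blocks are free in total...
placed = disk.create("D", 5)             # ...but no hole holds five blocks
```

After the two deletions there are six free blocks, yet the 5 KB file cannot be placed because the free space is split into two holes of three blocks each.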
Linked list allocation
Files are stored as linked lists of blocks. The first bytes of each block are used as a pointer: each block points to
the next block, and the final block contains a null pointer.

Advantages:
1. Every block can be used
2. File size does not have to be known beforehand
3. Files can grow
4. No external fragmentation
5. No internal fragmentation except for last block
Disadvantages:
1. Random access is very slow: to reach block n, all the blocks before it must be read first
2. Some space in each block is lost to the pointer instead of holding useful data
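A toy model of the linked list scheme makes the random access cost visible. The class `LinkedDisk`, the block numbers and the `reads` counter are illustrative assumptions, not part of any real file system.

```python
NULL = -1  # stands in for the null pointer in the final block

class LinkedDisk:
    def __init__(self, n_blocks):
        self.blocks = [None] * n_blocks   # each used block holds (next, data)
        self.first = {}                   # file name -> first block number
        self.reads = 0                    # counts simulated drive reads

    def write_file(self, name, chunks, placement):
        """Store the chunks in the given, possibly scattered, block numbers."""
        for i, (block_no, chunk) in enumerate(zip(placement, chunks)):
            nxt = placement[i + 1] if i + 1 < len(chunks) else NULL
            self.blocks[block_no] = (nxt, chunk)
        self.first[name] = placement[0]

    def _read(self, block_no):
        self.reads += 1
        return self.blocks[block_no]

    def read_block(self, name, n):
        """Reaching relative block n means following n pointers first."""
        block_no = self.first[name]
        for _ in range(n):
            nxt, _data = self._read(block_no)
            if nxt == NULL:
                raise IndexError(n)
            block_no = nxt
        return self._read(block_no)[1]

disk = LinkedDisk(8)
disk.write_file("f", ["p0", "p1", "p2", "p3"], [5, 2, 7, 0])
data = disk.read_block("f", 3)
reads = disk.reads   # three reads to follow the chain, one for the data
```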

Linked list with FAT


Used by MS-DOS and older versions of Windows. A major improvement on linked
list allocation: the pointers of all blocks are stored in a File Allocation
Table (FAT) in memory (different versions: FAT16, FAT32).
The FAT must be in memory all the time. To get a file we simply start at the
FAT entry representing its first block and follow the chain.
Advantages:
1. Does not “waste” space in the block
2. Random access is much faster as the FAT is in memory
3. Following the chain requires no drive accesses
Disadvantages:
1. The FAT may get too large, especially for physical drives with large
capacities
2. Damage to the FAT can cause serious data loss; solution = backup
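The table lookup can be sketched as below. The table size, the block numbers and the file names are made up; `fat[i]` holds the number of the block that follows block i in its file.

```python
EOF = -1  # marks the last block of a file in the table

fat = [EOF] * 12   # one entry per drive block, kept in memory
directory = {}     # file name -> first block number

def store(name, blocks):
    """Record a file's block chain in the FAT."""
    directory[name] = blocks[0]
    for cur, nxt in zip(blocks, blocks[1:]):
        fat[cur] = nxt
    fat[blocks[-1]] = EOF

def blocks_of(name):
    """Follow the chain in memory; no drive access is needed."""
    out = []
    b = directory[name]
    while b != EOF:
        out.append(b)
        b = fat[b]
    return out

def nth_block(name, n):
    """Random access: n table lookups instead of n drive reads."""
    b = directory[name]
    for _ in range(n):
        b = fat[b]
        if b == EOF:
            raise IndexError(n)
    return b

store("A", [4, 7, 2, 10])
store("B", [6, 3, 11])
chain_a = blocks_of("A")
third_of_a = nth_block("A", 2)
```

The data blocks themselves hold no pointers, so each block is fully available for data.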

I-Nodes
Used in UNIX-type OSs. Each file is associated with an i-node (index node) listing
all the attributes and drive/disk addresses of the file's blocks.
With the i-node it is possible to find all the blocks that correspond to the file.
Advantages:
1. Only the i-node of the file needs to be in memory
2. And only when the corresponding file is opened
Disadvantages:
1. What if a file grows beyond the limits of the i-node?
2. To handle this, the last disk addresses must point to an address block instead of a data
block.

The i-node contains a number of direct pointers to disk blocks.

Typically there are 10 direct pointers.
In addition, there are three indirect pointers. These point to
further address blocks which eventually lead to a disk data block.
The first of these pointers is a single indirect pointer,
the next is a double indirect pointer, and the third is a triple
indirect pointer.
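A short calculation shows how far the indirect pointers stretch an i-node. The block size of 1 KB and pointer size of 4 bytes are assumptions chosen for the example; only the 10 direct and three indirect pointers come from the notes.

```python
N_DIRECT = 10  # direct pointers per i-node, as stated above

def level_capacities(block_size, pointer_size):
    """Data blocks reachable through each kind of pointer."""
    per_block = block_size // pointer_size   # pointers held by one address block
    return [N_DIRECT, per_block, per_block ** 2, per_block ** 3]

def max_file_blocks(block_size, pointer_size):
    return sum(level_capacities(block_size, pointer_size))

def pointer_level(block_no, block_size, pointer_size):
    """Which kind of pointer leads to the given relative block of a file."""
    names = ["direct", "single indirect", "double indirect", "triple indirect"]
    for name, count in zip(names, level_capacities(block_size, pointer_size)):
        if block_no < count:
            return name
        block_no -= count
    raise ValueError("block number beyond the reach of the i-node")

BLOCK, PTR = 1024, 4   # assumed sizes in bytes
caps = level_capacities(BLOCK, PTR)
total_blocks = max_file_blocks(BLOCK, PTR)
max_bytes = total_blocks * BLOCK
```

Under these assumptions an address block holds 256 pointers, so the four levels reach 10, 256, 256² and 256³ data blocks respectively, roughly 16 GiB in total.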

To open a file, the path name is used to locate its directory entry.
The directory entry provides a mapping from a filename/file
descriptor to the disk blocks that contain the data.

The directory entry contains all the information needed to find the disk blocks for a given file:
Contiguous allocation – disk address of the entire file
Linked list allocation – address of the first disk block
I-nodes – the i-node number
It also allows access to the file's attributes.

DOS: a directory entry contains the attributes.

UNIX: a directory entry has an i-node number and filename


- All the file's attributes are stored in the i-node
- All i-nodes have a fixed location on the disk, so locating an i-node is simple and fast

Determining block size


- All allocation methods require the disk to be split into blocks
- Nearly all modern systems use fixed-size blocks
- Similar trade-off to choosing the page size in memory management
o Small block size: a file may occupy many blocks -> longer access time since more blocks
have to be located -> increased overhead
o Large block size: a small file of 1 KB still occupies a whole block -> space is wasted
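The trade-off can be put into numbers. The three file sizes and the two block sizes below are arbitrary values picked for illustration.

```python
import math

def usage(file_sizes, block_size):
    """Return (blocks allocated, bytes wasted) for files given in bytes."""
    blocks = sum(math.ceil(size / block_size) for size in file_sizes)
    wasted = blocks * block_size - sum(file_sizes)
    return blocks, wasted

files = [1024, 3000, 50 * 1024]   # assumed file sizes in bytes

small = usage(files, 512)    # many blocks to locate, little waste
large = usage(files, 8192)   # few blocks to locate, much more waste
```

With 512-byte blocks the three files need 108 blocks and waste 72 bytes; with 8 KB blocks they need only 9 blocks but waste 18504 bytes.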

Tracking free space: linked lists, bitmaps.
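A bitmap uses one bit per block. The sketch below assumes 0 means free and 1 means used, and hands out the lowest-numbered free block; the class name `BlockBitmap` is made up.

```python
class BlockBitmap:
    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)   # 1 bit per block, 0 = free

    def is_used(self, block_no):
        return bool(self.bits[block_no // 8] & (1 << (block_no % 8)))

    def allocate(self):
        """Mark the lowest-numbered free block as used and return its number."""
        for block_no in range(self.n_blocks):
            if not self.is_used(block_no):
                self.bits[block_no // 8] |= 1 << (block_no % 8)
                return block_no
        raise OSError("no free blocks")

    def release(self, block_no):
        """Clear the bit so the block can be handed out again."""
        self.bits[block_no // 8] &= ~(1 << (block_no % 8)) & 0xFF

bitmap = BlockBitmap(16)
a = bitmap.allocate()
b = bitmap.allocate()
c = bitmap.allocate()
bitmap.release(b)
reused = bitmap.allocate()        # the freed block is found again
bitmap_bytes = len(bitmap.bits)   # 16 blocks are tracked in 2 bytes
```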


Journaling
A single file operation might involve multiple actual writes to the disk.
For example, in UNIX, removing a file:
1. Remove the file from its directory
2. Release the i-node to the pool of free i-nodes
3. Return all the disk blocks to the pool of free blocks

In Windows, analogous steps are required.


Suppose that the first task is complete and the system crashes.
Result: the i-node and file blocks will not be accessible from any file, nor available for reassignment (time-consuming
repairs…)

A journaling file system uses a special disk area to make a log entry listing the actions to be completed, before carrying them out.
In the event of a system failure, the log is used to bring the disk back into a consistent state by completing all pending
actions.
Log entries are erased once the operations complete successfully.
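The crash scenario and the recovery step can be simulated with an in-memory toy. Everything here is an assumption made for illustration: the class `TinyFS`, the file `report.txt`, the i-node and block numbers, and the `crash_after` parameter that injects the failure. It relies on every logged action being safe to repeat.

```python
class TinyFS:
    def __init__(self):
        self.directory = {"report.txt": 7}   # file name -> i-node number
        self.inodes = {7: [20, 21]}          # i-node number -> data blocks
        self.free_inodes = set()
        self.free_blocks = set()
        self.journal = []                    # log entries not yet completed

    def _apply(self, action):
        """Carry out one logged action; repeating it does no harm."""
        kind, arg = action
        if kind == "remove_entry":
            self.directory.pop(arg, None)
        elif kind == "free_inode":
            self.inodes.pop(arg, None)
            self.free_inodes.add(arg)
        elif kind == "free_blocks":
            self.free_blocks.update(arg)

    def remove(self, name, crash_after=None):
        inode = self.directory[name]
        actions = [
            ("remove_entry", name),
            ("free_inode", inode),
            ("free_blocks", tuple(self.inodes[inode])),
        ]
        self.journal.append(actions)         # log first, act second
        for done, action in enumerate(actions):
            if crash_after is not None and done == crash_after:
                raise RuntimeError("simulated crash")
            self._apply(action)
        self.journal.remove(actions)         # erase the entry on success

    def recover(self):
        """After a crash, replay every pending log entry to completion."""
        for actions in list(self.journal):
            for action in actions:
                self._apply(action)
            self.journal.remove(actions)

fs = TinyFS()
try:
    fs.remove("report.txt", crash_after=1)   # crash after the first task
except RuntimeError:
    pass
leaked = "report.txt" not in fs.directory and 7 not in fs.free_inodes
fs.recover()
```

Before recovery the i-node and blocks are unreachable yet not free; after replaying the log they are back in the free pools.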

Backups – recover from disaster, recover from mistakes

Issues:
The process is slow; one remedy is an incremental backup (copying only the files that have changed)
They occupy a lot of space
Difficult to perform while the system is active (the system needs to stay inactive)
Must ensure physical security of backup media (CDs, disks…)

Drives and mount points


In Windows, multiple drives appear with distinct drive letters: C, D, E…
Attach a new drive and it gets a unique letter.
Unix-style systems present a uniform file system with a single root; drives are mounted at a directory inside it, e.g. under /media/.

RAID – Redundant array of inexpensive disks


Disks remain the bottleneck in computer systems (except SSDs).
RAID distributes data across several physical disks which look like a single logical disk.

RAID 0 (striping) – distributes data across several disks in a way which gives improved speed and full capacity.
No redundancy: if one disk fails, data is lost.
RAID 1 (mirroring) – uses more than one disk, each storing the same data. No speed gain on writes and not full capacity, but
the data survives a disk failure.
RAID 1+0 – combines both: data is striped across mirrored sets of disks.
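The three layouts can be sketched with lists standing in for disks. The function names and the six sample blocks are made up, and two disks per level are assumed.

```python
def stripe(blocks, n_disks):
    """RAID 0: deal the blocks out round-robin across the disks."""
    disks = [[] for _ in range(n_disks)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)
    return disks

def mirror(blocks, n_disks):
    """RAID 1: every disk holds a full copy."""
    return [list(blocks) for _ in range(n_disks)]

def read_striped(disks):
    """Reassemble the original order from a striped set."""
    n = len(disks)
    total = sum(len(d) for d in disks)
    return [disks[i % n][i // n] for i in range(total)]

data = ["b0", "b1", "b2", "b3", "b4", "b5"]

raid0 = stripe(data, 2)
raid1 = mirror(data, 2)
# RAID 1+0: stripe first, then mirror each stripe onto its own pair of disks.
raid10 = [mirror(part, 2) for part in stripe(data, 2)]

slots_raid0 = sum(len(d) for d in raid0)   # 6 slots for 6 blocks
slots_raid1 = sum(len(d) for d in raid1)   # 12 slots for 6 blocks

# One disk failing in each mirrored pair still leaves a readable stripe set.
survivors = [raid10[0][1], raid10[1][0]]
recovered = read_striped(survivors)
```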
