File Organization11

Introduction
File organization is the methodology applied to structuring computer files. Files contain
records, which can be documents or other information stored in a defined way for later
retrieval. File organization refers primarily to the logical arrangement of data in a file
system (the data can itself be organized as a system of records with correlated fields or
columns). It should not be confused with the physical storage of the file on a particular
storage medium. There are certain basic types of computer file, including files stored as blocks
of data and files stored as streams of data, where the information streams out of the file as it
is read until the end of the file is encountered.
We will look at two components of file organization here:
1. The way the internal file structure is arranged and
2. The external file as it is presented to the O/S or program that calls it. Here we
will also examine the concept of file extensions.
We will examine various ways that files can be stored and organized. Files are presented to the
application as a stream of bytes followed by an EOF (end of file) condition.
A program that uses a file needs to know the structure of the file in order to interpret its
contents.
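As a minimal sketch of the "stream of bytes, then EOF" view, the following Python reads a stream in small chunks until an empty read signals end of file. The function name read_stream, the chunk size, and the sample bytes are illustrative assumptions; io.BytesIO stands in for a file opened in binary mode.

```python
import io

def read_stream(stream, chunk_size=4):
    """Read a byte stream chunk by chunk until EOF is reached."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if chunk == b"":          # an empty read signals end of file
            break
        chunks.append(chunk)
    return chunks

# io.BytesIO stands in for a real file opened with open(path, "rb").
chunks = read_stream(io.BytesIO(b"ID01ID02ID"))
```

The bytes themselves carry no structure; it is the reading program that decides, for instance, that every four bytes form one record.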
Internal File Structure
Specifying a system of file organization for a software program, or for a computer system
designed for a particular purpose, is a high-level design decision. Performance is high on the
list of priorities in this design process, and what counts as good performance depends on how the
file is used. The design of the file organization usually depends mainly on the system
environment: for instance, whether the file will be used for transaction-oriented processing such
as OLTP or for data warehousing, and whether it is shared among various processes, as in a
typical distributed system, or used standalone. It must also be asked whether the file is on a
network and used by a number of users, whether it may be accessed locally or remotely, and how
often it is accessed.
However, all things considered, the most important considerations might be:
1. Rapid access to a record or to a number of related records.
2. The addition, modification, or deletion of records.
3. Efficiency of storage and retrieval of records.
4. Redundancy, as a method of ensuring data integrity.
A file should be organized in such a way that the records are available for processing with
minimal delay. This should be done in line with the activity and volatility of the information.
Types of File Organization
How a file is organized depends on what kind of file it is. In its simplest form a file can be
a text file, in other words a file composed of ASCII (American Standard Code for Information
Interchange) text. Files can also be created as binary or executable types, containing elements
other than plain text. In addition, files carry attributes which help determine their use by the
host operating system.
Techniques of File Organization
The three techniques of file organization are:
1. Heap (unordered)
2. Sorted
   1. Sequential (SAM)
   2. Line Sequential (LSAM)
   3. Indexed Sequential (ISAM)
3. Hashed or Direct
In addition to the three techniques, there are five methods of organizing files. They are
sequential, line-sequential, indexed-sequential, inverted list, and direct or hashed access
organization.
Sequential Organization
A sequential file contains records organized in the order they were entered. The order of the
records is fixed. The records are stored in physically contiguous blocks, and within each
block the records are in sequence.
Records in these files can only be read or written sequentially.
Once stored in the file, a record cannot be made shorter or longer, or deleted. However, a
record can be updated if its length does not change. (This is done by creating a new file in
which the record is replaced.) New records always appear at the end of the file.
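The rules above can be sketched in Python under some stated assumptions: records have a hypothetical fixed-length layout (a 4-byte id and a 12-byte ASCII name), io.BytesIO stands in for a file on disk, and the helper names are invented for illustration. New records are appended at the end, reads go front to back, and an update writes a new file of the same size.

```python
import io
import struct

# Hypothetical fixed-length layout: 4-byte integer id + 12-byte ASCII name.
RECORD = struct.Struct("<I12s")

def append_record(f, rec_id, name):
    """New records always go at the end of the file."""
    f.seek(0, io.SEEK_END)
    f.write(RECORD.pack(rec_id, name.encode("ascii")))

def read_all(f):
    """Read records strictly in the order they were entered."""
    f.seek(0)
    records = []
    while True:
        raw = f.read(RECORD.size)
        if len(raw) < RECORD.size:      # short read: end of file
            break
        rec_id, name = RECORD.unpack(raw)
        records.append((rec_id, name.rstrip(b"\x00").decode("ascii")))
    return records

def update_record(old, position, rec_id, name):
    """Return a new file in which the record at `position` is replaced."""
    new = io.BytesIO()
    for i, (old_id, old_name) in enumerate(read_all(old)):
        if i == position:
            append_record(new, rec_id, name)
        else:
            append_record(new, old_id, old_name)
    return new

f = io.BytesIO()
for rec_id, name in [(3, "carol"), (1, "alice"), (2, "bob")]:
    append_record(f, rec_id, name)
f = update_record(f, 1, 1, "alicia")
records = read_all(f)
```

Because every record occupies the same number of bytes, the replacement record fits exactly where the old one was, which is why the length restriction matters.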
If the order of the records in a file is not important, sequential organization will suffice, no
matter how many records you may have. Sequential output is also useful for report printing and
for programs that prefer to read their input sequentially.
Line-Sequential Organization
Line-sequential files are like sequential files, except that the records can contain only characters
as data. Line-sequential files are maintained by the native byte stream files of the operating
system.
In the COBOL environment, line-sequential files that are created with WRITE statements with
the ADVANCING phrase can be directed to a printer as well as to a disk.
Indexed-Sequential Organization (cylinder-surface indexing)
Key searches are improved by this system. The single-level indexing structure is the simplest
one: an index file whose records are (key, pointer) pairs, where the pointer is the position in
the data file of the record with the given key. Only a subset of the records, evenly spaced
along the data file, is indexed, so that the index entries mark intervals of data records.
This is how a key search is performed: the search key is compared with the index keys to find the
highest index key that precedes or equals the search key; a linear search is then performed from
the record that the index key points to, until the search key is matched or until the record
pointed to by the next index entry is reached. Despite the double file access (index + data)
required by this sort of search, the reduction in access time is significant compared with
sequential file searches.
Let's examine, for the sake of example, a simple linear search on a 1,000-record sequentially
organized file. An average of 500 key comparisons is needed (assuming the search keys are
uniformly distributed among the data keys). However, using an evenly spaced index with 100
entries, each index entry covers an interval of 10 data records, so the total number of
comparisons is reduced to about 50 in the index file plus about 5 in the data file: roughly a
nine to one reduction in the operations count!
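A small sketch that counts key comparisons for both strategies on a 1,000-record file with a 100-entry index may make the trade-off concrete. The function names and the exact way comparisons are counted (one per index entry examined, one per data record examined) are assumptions made for illustration, and both the index and the data are scanned linearly as in the description above.

```python
def build_index(keys, step):
    """Index every `step`-th record as a (key, position) pair."""
    return [(keys[pos], pos) for pos in range(0, len(keys), step)]

def linear_search(keys, target):
    """Scan the data file from the start; return (position, comparisons)."""
    for count, key in enumerate(keys, start=1):
        if key == target:
            return count - 1, count
    return None, len(keys)

def indexed_search(keys, index, target):
    """Scan the index, then scan the data file from the chosen entry."""
    comparisons = 0
    start = None
    for key, pos in index:
        comparisons += 1
        if key > target:          # passed the interval that could hold target
            break
        start = pos
    if start is None:             # target is smaller than every indexed key
        return None, comparisons
    for pos in range(start, len(keys)):
        comparisons += 1
        if keys[pos] == target:
            return pos, comparisons
        if keys[pos] > target:    # sorted file: target cannot appear later
            break
    return None, comparisons

keys = list(range(1, 1001))                 # 1,000 sorted record keys
index = build_index(keys, step=10)          # 100 evenly spaced entries
avg_linear = sum(linear_search(keys, k)[1] for k in keys) / len(keys)
avg_indexed = sum(indexed_search(keys, index, k)[1] for k in keys) / len(keys)
```

Comparing avg_linear with avg_indexed shows how much work the index saves on average; changing step shifts comparisons between the index scan and the data scan.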
Hierarchical extension of this scheme is possible, since an index is itself a sequential file
and can in turn be indexed by a second-level index, and so on. Exploiting this hierarchical
decomposition of the search further reduces the access time. There is, however, a point at which
this advantage is offset by the increased cost of storage, which in turn increases the index
access time.
Hardware for indexed-sequential organization is usually disk-based rather than tape-based.
Records are physically ordered by primary key, and the index gives the physical location of each
record. Records can be accessed sequentially or directly, via the index. The index is stored in a
file and read into memory when the file is opened. Indexes must also be maintained.
As in sequential organization, the data is stored in physically contiguous blocks. The
difference, however, is in the use of indexes. There are three areas in the disk storage:
• Primary Area: contains file records stored by key or ID number.
• Overflow Area: contains records that cannot be placed in the primary area.
• Index Area: contains the keys of records and their locations on the disk.
Inverted files
In file organization, this is a file that is indexed on many of the attributes of the data itself.
The inverted list method has a single index for each key type. The records are not necessarily
stored in sequence. They are placed in the data storage area, and the indexes are updated with
the record keys and locations.
For example, in a company file, one index could be maintained for all products and another for
product types. It is then faster to search the indexes than to search every record.
These types of file are also known as "inverted indexes." However, inverted list files use
more media space, so storage devices fill up more quickly with this type of organization. The
benefit is apparent immediately because searching is fast. However, updating is much slower.
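The company-file example can be sketched as follows, with one index per attribute. The sample records, the attribute names, and the helper build_inverted_index are hypothetical and chosen only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical company file: each record is a group of attributes.
records = [
    {"product": "widget", "type": "hardware"},
    {"product": "gadget", "type": "hardware"},
    {"product": "ledger", "type": "software"},
]

def build_inverted_index(records, attribute):
    """Map each value of `attribute` to the positions of matching records."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[attribute]].append(position)
    return dict(index)

product_index = build_inverted_index(records, "product")
type_index = build_inverted_index(records, "type")

# Look up all hardware products through the index instead of scanning records.
hardware = [records[pos]["product"] for pos in type_index["hardware"]]
```

Every insertion or deletion has to touch each of these indexes as well as the data area, which is where the extra space and the slower updates come from.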
Content-based queries in text retrieval systems use inverted indexes as their preferred
mechanism. Data items in these systems are usually stored compressed, which would normally
slow the retrieval process, but the compression algorithm is chosen to support this
technique.
When querying a file, there are circumstances in which the query is designed to be modal,
which means that rules are set which require different information to be held in the index.
Here's an example of this modality: when phrase querying is undertaken, the particular algorithm
requires that offsets to word classifications are held in addition to document numbers.
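A minimal sketch of an index that supports phrase queries is given below. It assumes that the offsets in question are word positions within each document, and the sample documents and function names are made up for illustration. For each word the index keeps the document numbers and, for each document, the offsets at which the word occurs.

```python
from collections import defaultdict

def build_positional_index(docs):
    """word -> {document number -> [word offsets within that document]}"""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for offset, word in enumerate(text.lower().split()):
            index[word][doc_id].append(offset)
    return index

def phrase_query(index, phrase):
    """Return the documents in which the words of `phrase` appear in a row."""
    words = phrase.lower().split()
    if not words or any(word not in index for word in words):
        return []
    hits = []
    for doc_id, offsets in index[words[0]].items():
        for start in offsets:
            # Word i of the phrase must occur at offset start + i.
            if all(start + i in index[word].get(doc_id, ())
                   for i, word in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)

docs = {
    1: "indexed sequential file",
    2: "sequential indexed file",
    3: "an indexed sequential access method",
}
index = build_positional_index(docs)
matches = phrase_query(index, "indexed sequential")
```

With document numbers alone, document 2 would also match, since it contains both words; the offsets are what allow the word order to be checked.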
Direct or Hashed Access
With direct or hashed access, a portion of disk space is reserved and a “hashing” algorithm
computes the record address, so additional space is required in the store for this kind of file.
Records are placed randomly throughout the file and are accessed by addresses that specify
their disk location. This type of file organization therefore requires disk storage rather than
tape. It has excellent search and retrieval performance, but care must be taken to maintain the
indexes. If the indexes become corrupt, what is left may as well go to the bit bucket, so it is as
well to have regular backups of this kind of file, just as for all valuable stored data!
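As a rough sketch of hashed access, the class below reserves a fixed number of slots and computes each record's address from its key. The hash function (sum of the key's bytes modulo the slot count) and the use of linear probing to resolve collisions are assumptions chosen for brevity, not something the text prescribes.

```python
class HashedFile:
    """Toy hashed file: a reserved area of slots addressed by a hash of the key."""

    def __init__(self, slots):
        self.slots = [None] * slots          # space is reserved up front

    def _address(self, key):
        # Hypothetical hashing algorithm: byte sum modulo the number of slots.
        return sum(key.encode("utf-8")) % len(self.slots)

    def put(self, key, value):
        start = self._address(key)
        for step in range(len(self.slots)):  # linear probing on collision
            addr = (start + step) % len(self.slots)
            if self.slots[addr] is None or self.slots[addr][0] == key:
                self.slots[addr] = (key, value)
                return addr
        raise RuntimeError("hashed file is full")

    def get(self, key):
        start = self._address(key)
        for step in range(len(self.slots)):
            addr = (start + step) % len(self.slots)
            if self.slots[addr] is None:     # empty slot: key was never stored
                return None
            if self.slots[addr][0] == key:
                return self.slots[addr][1]
        return None

hashed = HashedFile(slots=7)
addr_ab = hashed.put("ab", "first record")
addr_ba = hashed.put("ba", "second record")  # same byte sum, so it collides
```

The reserved slots that stay empty are the additional space mentioned above; they keep collisions, and therefore probing, rare.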
Cellular partition
In order to reduce file search time, the storage media may be divided into cells. A cell may be
an entire disk or it may be a cylinder. Lists are localized to lie within a cell. Thus, if we had
a multilist organization in which the list for KEY1 = PROG included records on several different
cylinders, we could break this list into several smaller lists, where each PROG list included
only those records which are in the same cylinder.
The index entry for PROG will now contain several entries of the type (addr, length), where
‘addr’ is a pointer to the start of a list of records with KEY1 = PROG and ‘length’ is the number
of records on this list. By implementing this, all records in the same cell may be accessed
without moving the read/write head.
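The splitting of one key's list into per-cell lists can be sketched as below. The sketch assumes, purely for illustration, that record addresses are integers and that the cylinder number is the address divided by a fixed number of records per cylinder; the function name and sample addresses are hypothetical.

```python
from itertools import groupby

def partition_by_cell(record_addresses, records_per_cylinder):
    """Split one key's record list into per-cylinder (addr, length) entries."""
    entries = []
    ordered = sorted(record_addresses)
    for _, group in groupby(ordered, key=lambda a: a // records_per_cylinder):
        cell = list(group)
        entries.append((cell[0], len(cell)))   # start of sublist, its length
    return entries

# Addresses of all records with KEY1 = PROG, spread over three cylinders.
prog_entries = partition_by_cell([3, 7, 104, 215, 109, 220, 230], 100)
```

Each resulting entry describes a list that stays inside one cylinder, so following it requires no head movement.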
1. Difference between random access and sequential access:
Random access refers to the ability to access data at random. The opposite of random access is
sequential access. To go from point A to point Z in a sequential-access system, you
must pass through all intervening points. In a random-access system, you can jump
directly to point Z. Disks are random access media, whereas tapes are sequential access
media.
The terms random access and sequential access are often used to describe data files. A
random-access data file enables you to read or write information anywhere in the file. In a
sequential-access file, you can only read and write information sequentially, starting from the
beginning of the file.
Both types of files have advantages and disadvantages. If you are always accessing
information in the same order, a sequential-access file is faster. If you tend to access
information randomly, random access is better.
Random access is sometimes called direct access.
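The difference can be sketched with fixed-size records, where io.BytesIO stands in for a file on disk. The record size, the record contents, and the function names are assumptions for illustration. Sequential access has to read every intervening record, while random access seeks straight to the computed offset.

```python
import io

RECORD_SIZE = 8   # hypothetical fixed record length in bytes

def read_sequential(f, n):
    """Reach record n by reading every record before it."""
    f.seek(0)
    reads = 0
    record = b""
    for _ in range(n + 1):
        record = f.read(RECORD_SIZE)
        reads += 1
    return record, reads

def read_random(f, n):
    """Jump directly to record n using its computed byte offset."""
    f.seek(n * RECORD_SIZE)
    return f.read(RECORD_SIZE), 1

# 26 records, "A" to "Z" in spirit: rec00000 ... rec00025.
data = io.BytesIO(b"".join(f"rec{i:05d}".encode("ascii") for i in range(26)))
seq_record, seq_reads = read_sequential(data, 25)
rnd_record, rnd_reads = read_random(data, 25)
```

Both calls return the same record; only the number of reads needed to get there differs.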
2. The difference between relative and absolute references
Relative references: When you create a formula, references to cells or ranges are usually based
on their position relative to the cell that contains the formula. For example, if cell B6
contains the formula =A5, Microsoft Excel finds the value one cell above and one cell to the left
of B6. This is known as a relative reference.
When you copy a formula that uses relative references, Excel automatically adjusts the references
in the pasted formula to refer to different cells relative to the position of the formula. For
example, suppose the formula in cell B6, =A5, which is one cell above and to the left of B6, is
copied to cell B7. Excel adjusts the formula in cell B7 to =A6, which refers to the cell
that is one cell above and to the left of cell B7.
Absolute references: If you don't want Excel to adjust references when you copy a formula to a
different cell, use an absolute reference. For example, if your formula multiplies cell A5 by cell
C1 (=A5*C1) and you copy the formula to another cell, Excel will adjust both references. You
can create an absolute reference to cell C1 by placing a dollar sign ($) before the parts of the
reference that do not change. To create an absolute reference to cell C1, for example, add dollar
signs to the formula as follows:
=A5*$C$1
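The adjustment rule can be imitated with a short sketch. This is not Excel's own implementation; the function copy_formula is hypothetical, handles only single-letter columns, and does no bounds checking. Parts of a reference marked with a dollar sign are left alone, and the others are shifted by the distance the formula was moved.

```python
import re

# Matches references such as A5, $C1, C$1 or $C$1 (single-letter columns only).
_REF = re.compile(r"(\$?)([A-Z])(\$?)(\d+)")

def copy_formula(formula, rows_down, cols_right):
    """Adjust the relative parts of each reference as if the formula were copied."""
    def shift(match):
        col_abs, col, row_abs, row = match.groups()
        if not col_abs:                       # relative column: shift it
            col = chr(ord(col) + cols_right)
        if not row_abs:                       # relative row: shift it
            row = str(int(row) + rows_down)
        return f"{col_abs}{col}{row_abs}{row}"
    return _REF.sub(shift, formula)

relative_copy = copy_formula("=A5", rows_down=1, cols_right=0)
mixed_copy = copy_formula("=A5*$C$1", rows_down=1, cols_right=0)
```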
Switching between relative and absolute references: If you created a formula and
want to change relative references to absolute (and vice versa), select the cell that
contains the formula. In the formula bar, select the reference you want to change and then
press F4. Each time you press F4, Excel toggles through the combinations: absolute
column and absolute row (for example, $C$1); relative column and absolute row (C$1);
absolute column and relative row ($C1); and relative column and relative row (C1). For
example, if you select the address $A$1 in a formula and press F4, the reference becomes
A$1. Press F4 again and the reference becomes $A1, and so on.
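The cycle described above can be written out as a small state table. The function toggle_reference is a hypothetical helper that imitates the order of the combinations listed in the text for a single cell reference; it is a sketch, not part of Excel.

```python
import re

_REF = re.compile(r"^(\$?)([A-Z]+)(\$?)(\d+)$")

# (column absolute?, row absolute?) -> next state in the cycle
_NEXT = {
    (True, True): (False, True),     # $C$1 -> C$1
    (False, True): (True, False),    # C$1  -> $C1
    (True, False): (False, False),   # $C1  -> C1
    (False, False): (True, True),    # C1   -> $C$1
}

def toggle_reference(ref):
    """Return the reference style that follows `ref` in the cycle."""
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    col_abs, col, row_abs, row = match.groups()
    next_col_abs, next_row_abs = _NEXT[(bool(col_abs), bool(row_abs))]
    return f"{'$' if next_col_abs else ''}{col}{'$' if next_row_abs else ''}{row}"

first = toggle_reference("$A$1")
second = toggle_reference(first)
```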
3. Difference between file and record:
A file is a collection or set of records.
Typically, in the database sense, a group of records makes a file.
A group of attributes makes a record.
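In code, the relationship might be pictured as follows. The Employee record and its attributes are invented for illustration, and a Python list stands in for the file.

```python
from typing import NamedTuple

class Employee(NamedTuple):
    """A record: a group of attributes."""
    emp_id: int
    name: str
    department: str

# A file: a group of records.
employee_file = [
    Employee(1, "Ada", "Research"),
    Employee(2, "Grace", "Operations"),
]

names = [record.name for record in employee_file]
```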
