
Cluster File Organization

In all the file organization methods described above, each file contains a single table, and each file is stored in its own way in memory. In real life, however, retrieving records from a single table is comparatively rare; in most cases we need to combine or join two or more related tables and retrieve the data together. For such requests, none of the above methods gives the result quickly: they have to traverse each table in turn and then combine the results of each traversal to produce the requested result, which obviously takes more time. So what can be done to overcome this situation?

Another file organization method, cluster file organization, is introduced to handle this situation. In this method, two or more tables that are frequently joined to retrieve results are stored in the same file, called a cluster. A cluster keeps the records of two or more tables in the same data blocks, and the key columns that map these tables to each other are stored only once. This method therefore reduces the cost of searching for related records across different files: all the records are found in one place, which makes the search efficient.

There are two types of cluster file organization:

Indexed Clusters: - Here records are grouped based on the cluster key and stored together. The STUDENT-COURSE cluster in our example above is an indexed cluster: the records are grouped on the cluster key COURSE_ID, and all the related records are stored together. This method is used when data is retrieved for a range of cluster key values, or when the clusters see large data growth. That is, it suits cases such as selecting the students attending the courses with COURSE_ID between 230 and 240, or a course attended by a large number of students, say 250; a sketch of this cluster is given below.

Hash Clusters: - This is similar to an indexed cluster, except that instead of storing the records based on the cluster key itself, a hash key value is generated from the cluster key, and records with the same hash key value are stored together on disk.
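A minimal sketch of how both kinds of cluster might be defined, assuming an Oracle-style SQL dialect (one of the few that supports explicit clusters) and hypothetical STUDENT and COURSE tables keyed by COURSE_ID:

    -- Indexed cluster: rows of COURSE and STUDENT that share a COURSE_ID
    -- are stored in the same data blocks and located through a cluster index.
    CREATE CLUSTER student_course (course_id NUMBER(5)) SIZE 1024;
    CREATE INDEX idx_student_course ON CLUSTER student_course;

    CREATE TABLE course (
        course_id   NUMBER(5) PRIMARY KEY,
        course_name VARCHAR2(50)
    ) CLUSTER student_course (course_id);

    CREATE TABLE student (
        student_id   NUMBER(10) PRIMARY KEY,
        student_name VARCHAR2(50),
        course_id    NUMBER(5)
    ) CLUSTER student_course (course_id);

    -- Hash cluster: the block where a row is stored is derived from a hash
    -- of COURSE_ID instead of being looked up in a cluster index.
    CREATE CLUSTER student_course_hash (course_id NUMBER(5)) SIZE 1024 HASHKEYS 300;
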
Rules for using indexes

An index is usually created to enhance the performance of record retrieval in huge tables. Sometimes, however, an index that is created or used improperly causes bad query performance, so proper management of indexes is required. A few rules for index management follow.

Create indexes on huge tables to enhance the performance of record retrieval. Queries on small tables are usually fast anyway, so small tables need not have any indexes.

Create indexes on frequently accessed tables, that is, tables on which record fetches are applied often. Usually the columns involved in fetching the records, i.e. the columns that appear in the condition clause (the WHERE clause) of a query, are the ones to index. Indexes should be created on columns that have unique values and a wide range of values. So the proper table and columns need to be chosen for indexing, as in the sketch below.
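For example, a minimal sketch assuming a hypothetical STUDENT table that is frequently filtered on COURSE_ID:

    -- COURSE_ID appears in the WHERE clause of frequent queries,
    -- so it is a good candidate column for an index.
    CREATE INDEX idx_student_course_id ON student (course_id);

    -- Queries like this can now locate rows through the index
    -- instead of scanning the whole (huge) table.
    SELECT student_id, student_name
    FROM   student
    WHERE  course_id = 230;
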

Create indexes on columns with non-null values. If we create an index on a column full of NULL values, it is of little use, because we cannot fetch the required records through it.

Columns that receive lots of updates should not be used in an index, because every update to the column also forces the index to be updated.

Drop indexes that are no longer required in the database. Unwanted and unused indexes can lead to bad query performance because they sometimes deviate the execution path of the query: even when the query would perform better with a full table scan or with some other index, the existence of the unwanted index may lead the DBMS to use it instead, hurting performance.
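Dropping such an index is straightforward; the index name below is hypothetical:

    -- Removing an unused index stops it from misleading the optimizer
    -- and removes the cost of maintaining it on every write.
    DROP INDEX idx_student_old_status;
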

The order of the columns in an index also matters for performance. When an index is created on a set of columns, it is built in the order in which the columns are listed. That means that when a fetch query runs, the execution path first looks up the address location based on the first column in the index, and only then on the subsequent columns. Hence, if a less frequently accessed column is placed first, the index will not boost performance. For indexes involving two or more columns, the order should run from the most frequently accessed column to the least frequently accessed one, as in the sketch below.
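A sketch of this, assuming queries on a hypothetical STUDENT table almost always filter on COURSE_ID and only sometimes on STUDENT_NAME:

    -- COURSE_ID is the most frequently used filter column, so it comes first.
    CREATE INDEX idx_student_course_name ON student (course_id, student_name);

    -- This query can use the leading column of the index...
    SELECT * FROM student WHERE course_id = 230;

    -- ...whereas a query filtering only on the second column
    -- generally cannot make good use of it.
    SELECT * FROM student WHERE student_name = 'Ravi';
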

A table can have any number of indexes, but each index should actually improve the performance of query retrieval. If there are lots of insertions, deletions, or updates on the table, having more indexes is not a good idea, because these transactions require the index to be updated accordingly. For example, if we insert records into a table, the corresponding entries must also be inserted into the index; if we update or delete records, the index entries must be updated or deleted as well. This is an overhead on the database and decreases the performance of insertion, deletion, and update operations.

New indexes should be created after the records have been inserted into the table; otherwise the insertions are slowed down, because the index has to be updated for each inserted record. A sketch of this order is shown below.
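A minimal sketch of the recommended order, assuming a hypothetical STUDENT_STAGING table that holds the bulk data to be loaded:

    -- Load the bulk of the data first, while the table carries no extra index...
    INSERT INTO student (student_id, student_name, course_id)
    SELECT student_id, student_name, course_id
    FROM   student_staging;

    -- ...then build the index once, instead of maintaining it row by row
    -- during the load.
    CREATE INDEX idx_student_course_id ON student (course_id);
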

Query optimization
Query optimization is a function of many relational database management systems.
The query optimizer attempts to determine the most efficient way to execute a given query
by considering the possible query plans.
Generally, the query optimizer cannot be accessed directly by users: once queries are submitted to the database server and parsed by the parser, they are passed to the query optimizer, where optimization occurs. However, some database engines allow guiding the query optimizer with hints.
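For example, Oracle-style hints are written as special comments inside the query text; a minimal sketch (the index name is hypothetical):

    -- Ask the optimizer to use a particular index for this query.
    SELECT /*+ INDEX(s idx_student_course_id) */ student_id, student_name
    FROM   student s
    WHERE  s.course_id = 230;
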
A query is a request for information from a database. It can be as simple as "find the address of the person with SS# 123-45-6789," or as complex as "find the average salary of all the employed married men in California between the ages of 30 and 39 who earn less than their wives." Query results are generated by accessing the relevant database data and manipulating it in a way that yields the requested information.
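For illustration, the two requests above might be written as the following queries; the PERSON and EMPLOYEE tables and all column names are assumptions made for the sake of the sketch:

    -- "Find the address of the person with SS# 123-45-6789."
    SELECT address
    FROM   person
    WHERE  ssn = '123-45-6789';

    -- "Find the average salary of employed married men in California,
    --  aged 30 to 39, who earn less than their wives."
    SELECT AVG(e.salary)
    FROM   employee e
    JOIN   employee w ON w.emp_id = e.spouse_id
    WHERE  e.sex = 'M'
      AND  e.marital_status = 'MARRIED'
      AND  e.state = 'CA'
      AND  e.age BETWEEN 30 AND 39
      AND  e.salary < w.salary;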
