ADITYA COLLEGE OF ENGINEERING


PUNGANUR ROAD, MADANAPALLE-517325
II-B.Tech(R15)-I-Semester II-Internal Examinations November -2016 (Descriptive)
15A05301- Database Management Systems (Computer Science Engineering)
Time : 90 min
Max Marks : 30
__________________________________________________________________________________

PART A
1 Answer all the questions

a) List out the desirable properties of a transaction.


Collections of operations that form a single logical unit of work are called transactions. A database system
must ensure proper execution of transactions despite failures: either the entire transaction executes, or none of it
does. A transaction is a unit of program execution that accesses and possibly updates various data items. To
ensure integrity of the data, we require that the database system maintain the following properties of a
transaction.
Atomicity: Either all operations of the transaction are reflected properly in the database, or none are.
Consistency: Execution of a transaction in isolation (that is, with no other transaction executing concurrently)
preserves the consistency of the database.
Isolation: Even though multiple transactions may execute concurrently, the system guarantees that, for every
pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started
execution after Ti finished. Thus, each transaction is unaware of other transactions executing concurrently in the
system.
Durability: After a transaction completes successfully, the changes it has made to the database persist, even if
there are system failures.
These properties are often called the ACID properties; the acronym is derived from the first letter of each of the
four properties.
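As a small illustration of atomicity and durability, the following sketch shows the all-or-nothing behaviour of a funds transfer. It is a minimal, illustrative example only: the in-memory dictionary, the log list and the transfer function are assumptions, not a real DBMS interface.

```python
# Illustrative only: a toy "database" with an undo snapshot taken before each transaction.
db = {"A": 500, "B": 200}
log = []                                   # snapshots used to undo an aborted transaction

def transfer(src, dst, amount):
    log.append(dict(db))                   # record the state needed to undo (atomicity)
    try:
        db[src] -= amount
        if db[src] < 0:
            raise ValueError("insufficient funds")   # would violate consistency
        db[dst] += amount
        # Commit point: a real system would also force its log to stable storage
        # here, so the effects survive crashes (durability).
        return True
    except ValueError:
        db.clear()
        db.update(log[-1])                 # roll back: none of the updates remain
        return False

print(transfer("A", "B", 600), db)   # False {'A': 500, 'B': 200}  (all-or-nothing)
print(transfer("A", "B", 100), db)   # True  {'A': 400, 'B': 300}
```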

b) Define conflict serializability and view serializability.


Conflict Equivalence
Two operations are said to conflict if they have the following properties: they belong to different transactions,
they access the same data item, and at least one of them is a write operation.
Two schedules having multiple transactions with conflicting operations are said to be conflict equivalent if and
only if both schedules contain the same set of transactions and the order of every conflicting pair of operations
is maintained in both schedules.
Conflict Serializability
A schedule is said to be conflict serializable if it is conflict equivalent to some serial schedule.
View Equivalence
Two schedules S1 and S2 are view equivalent if the transactions in both schedules perform similar actions in a
similar manner. For example: if a transaction T reads the initial value of a data item in S1, then it also reads the
initial value of that item in S2; if T reads the value written by another transaction Tj in S1, then it also reads the
value written by Tj in S2; and if T performs the final write on a data item in S1, then it also performs the final
write on that item in S2.
View Serializability
A schedule is said to be view serializable if it is view equivalent to some serial schedule.
Every conflict-serializable schedule is view serializable, but a view-serializable schedule may or may not be
conflict serializable.
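As a small illustration of the conflict test used above, the sketch below checks whether two operations conflict. The tuple encoding of operations is an assumption made only for the example.

```python
# Illustrative only: an operation is modelled as (transaction, action, item),
# e.g. ("T1", "R", "X") means transaction T1 reads data item X.

def conflicts(op1, op2):
    """Two operations conflict if they belong to different transactions,
    access the same data item, and at least one of them is a write."""
    (t1, a1, x1), (t2, a2, x2) = op1, op2
    return t1 != t2 and x1 == x2 and "W" in (a1, a2)

print(conflicts(("T1", "R", "X"), ("T2", "W", "X")))   # True
print(conflicts(("T1", "R", "X"), ("T2", "R", "X")))   # False: two reads never conflict
```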

c) What is the Write-Ahead Log (WAL) protocol?


Write-Ahead Logging (WAL) is a standard method for ensuring data integrity. WAL's central concept
is that changes to data files (where tables and indexes reside) must be written only after those changes
have been logged, that is, after log records describing the changes have been flushed to permanent
storage. If we follow this procedure, we do not need to flush data pages to disk on every transaction
commit, because we know that in the event of a crash we will be able to recover the database using the
log: any changes that have not been applied to the data pages can be redone from the log records. (This
is roll-forward recovery, also known as REDO.)
Using WAL results in a significantly reduced number of disk writes, because only the log file needs to
be flushed to disk to guarantee that a transaction is committed, rather than every data file changed by

the transaction. The log file is written sequentially, and so the cost of syncing the log is much less than
the cost of flushing the data pages. This is especially true for servers handling many small transactions
touching different parts of the data store. Furthermore, when the server is processing many small
concurrent transactions, one fsync of the log file may suffice to commit many transactions.
WAL also makes it possible to support on-line backup and point-in-time recovery. By archiving the
WAL data we can support reverting to any time instant covered by the available WAL data: we simply
install a prior physical backup of the database, and replay the WAL log just as far as the desired time.
The physical backup doesn't have to be an instantaneous snapshot of the database state; if it is made
over some period of time, then replaying the WAL log for that period will fix any internal
inconsistencies.
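The rule itself can be sketched in a few lines. The log format, file names and flushing below are illustrative assumptions; a real engine such as PostgreSQL manages WAL buffers, LSNs and page writes far more carefully.

```python
# Minimal write-ahead-logging sketch: the log record describing a change is forced
# to stable storage before the corresponding data page is ever modified on disk.
import json, os

LOG_FILE = "wal.log"        # illustrative file names, not a real DBMS layout
DATA_FILE = "data.json"

data_pages = {"page1": {"balance": 100}}   # in-memory buffer of data pages

def log_and_apply(page, field, new_value):
    record = {"page": page, "field": field,
              "old": data_pages[page][field], "new": new_value}
    with open(LOG_FILE, "a") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())             # WAL rule: the log record reaches disk first
    data_pages[page][field] = new_value    # only now is the buffered page changed
    # The dirty page may be written to DATA_FILE lazily; after a crash, any change
    # not yet in the data file can be redone (rolled forward) from the log records.

log_and_apply("page1", "balance", 150)
with open(DATA_FILE, "w") as f:
    json.dump(data_pages, f)               # lazy flush of data pages
```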

d) Explain various types of indices.


Indexing is a data structure technique to efficiently retrieve records from the database files based on some
attributes on which the indexing has been done. Indexing in database systems is similar to what we see in books.
Indexing is defined based on its indexing attributes. Indexing can be of the following types:
Primary Index: A primary index is defined on an ordered data file. The data file is ordered on a key field.
The key field is generally the primary key of the relation.
Secondary Index: A secondary index may be generated from a field which is a candidate key and has a
unique value in every record, or from a non-key field with duplicate values.
Clustering Index: A clustering index is defined on an ordered data file. The data file is ordered on a non-key field.
Ordered indexing is of two types:

Dense Index
Sparse Index
Dense Index
In a dense index, there is an index record for every search-key value in the database. This makes searching faster
but requires more space to store the index records themselves. Each index record contains the search-key value
and a pointer to the actual record on the disk.

Sparse Index
In a sparse index, index records are not created for every search key. An index record here contains a search key
and an actual pointer to the data on the disk. To search a record, we first locate the closest index record and
follow its pointer to the actual location of the data. If the data we are looking for is not where we directly reach
by following the index, the system performs a sequential search from that point until the desired data is found.

Multilevel Index
Index records comprise search-key values and data pointers. Multilevel index is stored on the disk along with
the actual database files. As the size of the database grows, so does the size of the indices. There is an immense
need to keep the index records in main memory so as to speed up search operations. If a single-level index
is used, then a large index cannot be kept in memory, which leads to multiple disk accesses.

Multi-level Index helps in breaking down the index into several smaller indices in order to make the outermost level so small
that it can be saved in a single disk block, which can easily be accommodated anywhere in the main memory.
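The difference between dense and sparse indexing can be sketched as follows; the sorted data file split into blocks and the index layouts are assumptions made only for the example.

```python
# Dense index: one entry per search-key value. Sparse index: one entry per block,
# holding the first (smallest) key stored in that block.
import bisect

blocks = [[(5, "r5"), (10, "r10")],        # a sorted data file, two records per block
          [(20, "r20"), (30, "r30")],
          [(40, "r40"), (50, "r50")]]

dense_index = {key: (b, slot) for b, block in enumerate(blocks)
               for slot, (key, _) in enumerate(block)}

sparse_index = [block[0][0] for block in blocks]    # first key of each block

def sparse_lookup(key):
    b = max(bisect.bisect_right(sparse_index, key) - 1, 0)  # last block whose first key <= key
    for k, rec in blocks[b]:                                # short sequential scan inside the block
        if k == key:
            return rec
    return None

print(dense_index[30])     # (1, 1): a direct hit, at the cost of a larger index
print(sparse_lookup(30))   # 'r30': one index probe, then a scan within a single block
```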

e) Distinguish between Linear hashing and extendible hashing.


Extendible hashing is good for a database that grows and shrinks in size. It allows the hash function to be
modified dynamically. Extendible hashing is one form of dynamic hashing in which:
o The hash function generates values over a large range, typically b-bit integers, with b = 32.
o At any time, only a prefix of the hash value is used to index into a table of bucket addresses.
o Let the length of the prefix be i bits, 0 <= i <= 32. The bucket address table size is 2^i; initially i = 0.
o The value of i grows and shrinks as the size of the database grows and shrinks.
o Multiple entries in the bucket address table may point to the same bucket, so the actual number of buckets
can be smaller than 2^i. The number of buckets also changes dynamically due to coalescing and splitting of
buckets.
Benefits of extendible hashing:
o Hash performance does not degrade with growth of the file.
o Minimal space overhead.
Disadvantages of extendible hashing:
o Extra level of indirection to find the desired record.
o The bucket address table may itself become very big (larger than memory), and very large contiguous areas
cannot be allocated on disk either; a possible solution is a B+-tree structure to locate the desired record in the
bucket address table.
o Changing the size of the bucket address table is an expensive operation.
Linear hashing is an alternative mechanism that allows incremental growth of its directory (equivalent to the
bucket address table), at the cost of more bucket overflows. Compared with extendible hashing, linear hashing
does not use a bucket directory, and when an overflow occurs it is not always the overflowing bucket
that is split. The name linear hashing is used because the number of buckets grows or shrinks in a linear fashion.
Overflows are handled by creating a chain of pages under the overflowing bucket. The hash function changes
dynamically, and at any given instant there can be at most two hash functions used by the scheme.
For uniform distributions, linear hashing has a lower average cost for equality selections
(because the directory level is eliminated). For skewed distributions, it could result in many
empty or nearly empty buckets, each of which is allocated at least one page, leading to poor performance relative
to extendible hashing, which is likely to have higher bucket occupancy.
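The prefix-based directory lookup of extendible hashing can be sketched as follows. The 32-bit hash via hashlib and the list-of-lists directory are assumptions made for the example, and bucket splitting/coalescing is omitted.

```python
# Minimal sketch: only the high-order i bits of a 32-bit hash value are used to
# index into the bucket address table, which has 2**i entries.
import hashlib

B = 32                                        # hash values are b-bit integers, b = 32

def hash32(key: str) -> int:
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")

def directory_index(key: str, i: int) -> int:
    return 0 if i == 0 else hash32(key) >> (B - i)

i = 2                                         # global depth: the directory has 2**2 = 4 entries
directory = [[] for _ in range(2 ** i)]       # here each entry gets its own bucket; in
                                              # general several entries may share one bucket

for k in ["apple", "kiwi", "mango", "pear"]:
    directory[directory_index(k, i)].append(k)

# A lookup recomputes the same prefix and probes exactly one bucket:
print("kiwi" in directory[directory_index("kiwi", i)])   # True
```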

PART B
2) a) Describe the testing of serializability.
There is a simple and efficient method for determining conflict serializability of a schedule. Consider a schedule
S. We construct a directed graph, called a precedence graph, from S. This graph consists of a pair G =
(V, E), where V is a set of vertices and E is a set of edges. The set of vertices consists of all the
transactions participating in the schedule. The set of edges consists of all edges Ti → Tj for which one
of three conditions holds:
1. Ti executes write(Q) before Tj executes read(Q).
2. Ti executes read(Q) before Tj executes write(Q).
3. Ti executes write(Q) before Tj executes write(Q).
If an edge Ti → Tj exists in the precedence graph, then, in any serial schedule S' equivalent to S, Ti must
appear before Tj.
If the precedence graph for S has a cycle, then schedule S is not conflict serializable. If the graph contains
no cycles, then the schedule S is conflict serializable. A serializability order of the transactions can be
obtained through topological sorting, which determines a linear order consistent with the partial order
of the precedence graph. There are, in general, several possible linear orders that can be obtained through
a topological sort. Thus, to test for conflict serializability, we need to construct the precedence graph
and to invoke a cycle-detection algorithm (a small sketch of this test is given below). Cycle-detection algorithms,
such as those based on depth-first search, require on the order of n^2 operations, where n is the number of
vertices in the graph (that is, the number of transactions).
Testing for view serializability is complicated: the problem of testing for view serializability is itself
NP-complete. Thus, almost certainly there exists no efficient algorithm to test for view serializability.
However, concurrency-control schemes can still use sufficient conditions for view serializability. That
is, if the sufficient conditions are satisfied, the schedule is view serializable, but there may be view-serializable
schedules that do not satisfy the sufficient conditions.
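A minimal sketch of this test is given below; the encoding of a schedule as (transaction, action, item) triples is an assumption made for the example.

```python
# Build the precedence graph of a schedule and check it for cycles (depth-first search).
from collections import defaultdict

def precedence_graph(schedule):
    """Add an edge Ti -> Tj for every pair of conflicting operations where Ti acts first."""
    edges = defaultdict(set)
    for k, (ti, act_i, item_i) in enumerate(schedule):
        for tj, act_j, item_j in schedule[k + 1:]:
            if ti != tj and item_i == item_j and "W" in (act_i, act_j):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def dfs(node):
        colour[node] = GREY
        for nxt in edges[node]:
            if colour[nxt] == GREY or (colour[nxt] == WHITE and dfs(nxt)):
                return True                  # a back edge means a cycle
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and dfs(n) for n in list(edges))

# T1 reads X before T2 writes X, but T2 writes Y before T1 reads Y: a cycle T1 -> T2 -> T1.
schedule = [("T1", "R", "X"), ("T2", "W", "X"), ("T2", "W", "Y"), ("T1", "R", "Y")]
graph = precedence_graph(schedule)
print("conflict serializable:", not has_cycle(graph))   # False
```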

b) Explain the deferred and immediate modification versions of the log based
recovery scheme.
Deferred Update vs Immediate Update
Deferred Update and Immediate Update are two techniques used to maintain transaction log files of Database
Management Systems (DBMS). Transaction log (also referred to as the journal log or the redo log) is a physical
file that stores the Transaction ID, the time stamp of the transaction, the old value and the new values of the data.
This allows the DBMS to keep track of the data before and after each transaction. When the transactions are
committed and the database is returned to a consistent state, the log might be truncated to remove the committed
transactions.
The difference between Deferred Update and Immediate Update:
Even though deferred update and immediate update are two methods for recovering after a system failure, the
process that each method uses is different. In the deferred update method, any changes made to the data by a
transaction are first recorded in a log file and applied to the database on commit. In the immediate update method,
changes made by a transaction are applied directly to the database, and the old and new values are recorded in
the log file; these records are used to restore the old values on rollback. In the deferred update method, records in
the log file are discarded on rollback and are never applied to the database. One disadvantage of the deferred
update method is the increased time taken to recover in case of a system failure. On the other hand, frequent I/O
operations while the transaction is active are a disadvantage of the immediate update method.
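The contrast can be sketched for a single write; the in-memory database and the log record formats below are illustrative assumptions, not a real recovery manager.

```python
db = {"X": 100}

# Deferred update: new values are only logged while the transaction runs; the
# database is touched only at commit time, so nothing needs to be undone on abort.
deferred_log = []                          # records of the form (txn, item, new_value)
def deferred_write(txn, item, new):
    deferred_log.append((txn, item, new))
def deferred_commit(txn):
    for t, item, new in deferred_log:
        if t == txn:
            db[item] = new                 # REDO-only: apply the logged new values

# Immediate update: the database is changed right away, so the log must also keep
# the old value to allow UNDO on rollback.
immediate_log = []                         # records of the form (txn, item, old, new)
def immediate_write(txn, item, new):
    immediate_log.append((txn, item, db[item], new))
    db[item] = new
def immediate_rollback(txn):
    for t, item, old, _new in reversed(immediate_log):
        if t == txn:
            db[item] = old                 # UNDO: restore the logged old values

deferred_write("T1", "X", 150); deferred_commit("T1")       # db["X"] becomes 150
immediate_write("T2", "X", 999); immediate_rollback("T2")   # db["X"] is back to 150
print(db)    # {'X': 150}
```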

(OR)
3) Write short notes on
a) Types of schedules.
A Schedule represents the order in which instructions of transactions are executed. Various types of schedules
are as follows:
Complete Schedule:
A schedule that contains either an abort or a commit for each transaction whose actions are listed in it is called a
complete schedule. A complete schedule must contain all the actions of every transaction that appears in it.


Serial Schedule:
If the actions of different transactions are not interleaved, that is, transactions are executed
from start to finish, one by one, then that schedule is called a serial schedule.

Recoverable Schedule:
A schedule in which, for each pair of transactions Ti and Tj such that Tj reads a data item that was previously
written by Ti, the commit operation of Ti appears before the commit operation of Tj.

Cascadeless Schedules:

A schedule in which, for each pair of transactions Ti and Tj such that Tj reads a data item that was previously
written by Ti, the commit operation of Ti appears before the read operation of Tj.

Strict Schedule:
A schedule is said to be strict if a value written by a transaction cannot be read or overwritten by other transactions
until the transaction is either committed or aborted.
Every strict schedule is recoverable and cascadeless.

b) Validation based Protocol.


In cases where a majority of transactions are read-only transactions, the rate of conflicts among transactions may
be low. A concurrency-control scheme imposes overhead of code execution and possible delay of transactions.
In the validation-based protocol, we assume that each transaction Ti executes in two or three different phases in its
lifetime, depending on whether it is a read-only or an update transaction. The phases are, in order:
1. Read phase. During this phase, the system executes transaction Ti. It reads the values of the various data items
and stores them in variables local to Ti. It performs all write operations on temporary local variables, without
updates of the actual database.
2. Validation phase. Transaction Ti performs a validation test to determine whether it can copy to the database
the temporary local variables that hold the results of write operations without causing a violation of serializability.
3. Write phase. If transaction Ti succeeds in validation (step 2), then the system applies the actual updates to the
database. Otherwise, the system rolls back Ti. Each transaction must go through the three phases in the order
shown.
However, all three phases of concurrently executing transactions can be interleaved. To perform the validation
test, we need to know when the various phases of transactions Ti took place. We shall, therefore, associate three
different timestamps with transaction Ti:
1. Start(Ti), the time when Ti started its execution.
2. Validation(Ti), the time when Ti finished its read phase and started its validation phase.
3. Finish(Ti), the time when Ti finished its write phase.
We determine the serializability order by the timestamp-ordering technique, using the value of the timestamp
Validation(Ti). Thus, the value TS(Ti) = Validation(Ti) and, if TS(Tj ) < TS(Tk), then any produced schedule
must be equivalent to a serial schedule in which transaction Tj appears before transaction Tk. The reason we have
chosen Validation(Ti), rather than Start(Ti), as the timestamp of transaction Ti is that we can expect faster
response time provided that conflict rates among transactions are indeed low.
The validation test for transaction Tj requires that, for all transactions Ti with TS(Ti) < TS(Tj ), one of the
following two conditions must hold:
1. Finish(Ti) < Start(Tj ). Since Ti completes its execution before Tj started, the
serializability order is indeed maintained.
2. The set of data items written by Ti does not intersect with the set of data items read by Tj, and Ti completes its
write phase before Tj starts its validation phase (Start(Tj ) < Finish(Ti) < Validation(Tj )). This condition ensures
that the writes of Ti and Tj do not overlap. Since the writes of Ti do not affect the read of Tj , and since Tj cannot
affect the read of Ti, the serializability order is indeed maintained.
The validation scheme automatically guards against cascading rollbacks, since the actual writes take place only
after the transaction issuing the write has committed. However, there is a possibility of starvation of long
transactions, due to a sequence of conflicting short transactions that cause repeated restarts of the long transaction.
To avoid starvation, conflicting transactions must be temporarily blocked, to enable the long transaction to finish.
This validation scheme is called the optimistic concurrency control scheme since transactions execute
optimistically, assuming they will be able to finish execution and validate at the end. In contrast, locking and
timestamp ordering are pessimistic in that they force a wait or a rollback whenever a conflict is detected, even
though there is a chance that the schedule may be conflict serializable.
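The validation test itself can be sketched as follows; the per-transaction dictionaries of timestamps and read/write sets are assumptions made only for the example.

```python
def validate(tj, earlier):
    """Return True if Tj may commit, checked against every validated Ti with TS(Ti) < TS(Tj).
    Each transaction is a dict with 'start', 'validation', 'finish' timestamps and
    'read_set'/'write_set' sets of data-item names."""
    for ti in earlier:
        if ti["validation"] >= tj["validation"]:
            continue                                     # only Ti with TS(Ti) < TS(Tj) matter
        # Condition 1: Ti finished before Tj started.
        if ti["finish"] < tj["start"]:
            continue
        # Condition 2: Ti's writes do not intersect Tj's reads, and Ti finished its
        # write phase before Tj started its validation phase.
        if (not (ti["write_set"] & tj["read_set"])
                and tj["start"] < ti["finish"] < tj["validation"]):
            continue
        return False                                     # neither condition holds: roll Tj back
    return True

t1 = {"start": 1, "validation": 3, "finish": 4, "read_set": {"A"}, "write_set": {"B"}}
t2 = {"start": 2, "validation": 5, "finish": None, "read_set": {"B"}, "write_set": {"C"}}
print(validate(t2, [t1]))   # False: T1 wrote B, which T2 read, and T1 finished after T2 started
```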

c) ARIES.
ARIES is a recovery algorithm that is designed to work with a steal, no-force approach. When the recovery
manager is invoked after a crash, restart proceeds in three phases:
1. Analysis: Identifies dirty pages in the buffer pool and active transactions at the time of the crash.
2. Redo: Repeats all actions, starting from an appropriate point in the log, and restores the database state to what
it was at the time of the crash.
3. Undo: Undoes the actions of transactions that did not commit, so that the database reflects only the actions of
committed transactions.
There are three main principles behind the ARIES recovery algorithm:
Write-ahead logging: Any change to a database object is first recorded in the log; the record in the log must be
written to stable storage before the change to the database object is written to disk.
Repeating history during Redo: Upon restart following a crash, ARIES retraces all actions of the DBMS before
the crash and brings the system back to the exact state that it was in at the time of the crash. Then, it undoes the
actions of transactions that were still active at the time of the crash.
Logging changes during Undo: Changes made to the database while undoing a transaction are logged in order to
ensure that such an action is not repeated in the event of repeated restarts.

d) Fuzzy Checkpointing
The checkpointing technique requires that all updates to the database be temporarily suspended while
the checkpoint is in progress. If the number of pages in the buffer is large, a checkpoint may take a long
time to finish, which can result in an unacceptable interruption in processing of transactions.
To avoid such interruptions, the checkpointing technique can be modified to permit updates to start once
the checkpoint record has been written, but before the modified buffer blocks are written to disk. The
checkpoint thus generated is a fuzzy checkpoint. Since pages are output to disk only after the checkpoint
record has been written, it is possible that the system could crash before all pages are written. Thus, a
checkpoint on disk may be incomplete. One way to deal with incomplete checkpoints is this: The
location in the log of the checkpoint record of the last completed checkpoint is stored in a fixed position,
last-checkpoint, on disk. The system does not update this information when it writes the checkpoint
record. Instead, before it writes the checkpoint record, it creates a list of all modified buffer blocks. The
last-checkpoint information is updated only after all buffer blocks in the list of modified buffer blocks
have been output to disk.
Even with fuzzy checkpointing, a buffer block must not be updated while it is being output to disk,
although other buffer blocks may be updated concurrently. The write-ahead log protocol must be
followed so that (undo) log records pertaining to a block are on stable storage before the block is output.
Note that, in our scheme, logical logging is used only for undo purposes, whereas
physical logging is used for redo and undo purposes. There are recovery schemes that use logical logging
for redo purposes. To perform logical redo, the database state on disk must be operation consistent, that
is, it should not have partial effects of any operation. It is difficult to guarantee operation consistency of
the database on disk if an operation can affect more than one page, since it is not possible to write two
or more pages atomically. Therefore, logical redo logging is usually restricted only to operations that
affect a single page. In contrast, logical undos are performed on an operation-consistent database state
achieved by repeating history, and then performing physical undo of partially completed operations.

e. Logical Undo Logging


For operations where locks are released early, we cannot perform the undo actions by simply writing
back the old value of the data items. Consider a transaction T that inserts an entry into a B+-tree, and,
following the B+-tree concurrency-control protocol, releases some locks after the insertion operation
completes, but before the transaction commits. After the locks are released, other transactions may
perform further insertions or deletions, thereby causing further changes to the B+-tree nodes.
Even though the operation releases some locks early, it must retain enough locks to ensure that no other
transaction is allowed to execute any conflicting operation (such as reading the inserted value or
deleting the inserted value), for instance by holding locks on the leaf level of the B+-tree until the end of the
transaction. The insertion operation has to be undone by a logical undo, that is, in this case, by the execution of a
delete operation. Therefore, when the insertion operation completes, before it releases any locks, it
writes a log record <Ti, Oj, operation-end, U>, where U denotes the undo information and Oj denotes
a unique identifier for (the instance of) the operation. The insertion and deletion operations are
examples of a class of operations that require logical undo operations since they release locks early;
we call such operations logical operations. Before a logical operation begins, it writes a log record
<Ti,Oj , operation-begin>, where Oj is the unique identifier for the operation. While the system is
executing the operation, it does physical logging in the normal fashion for all updates performed by the
operation. Thus, the usual old-value and new-value information is written out for each update. When
the operation finishes, it writes an operation-end log record.
4)

a) Explain concurrency control techniques 2PL and Timestamp based protocols.

Lock-Based Concurrency Control

A DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no
actions of committed transactions are lost while undoing aborted transactions. A DBMS typically uses
a locking protocol to achieve this. A locking protocol is a set of rules to be followed by each transaction,
in order to ensure that even though actions of several transactions might be interleaved, the net effect is
identical to executing all transactions in some serial order.
Strict Two-Phase Locking(Strict2PL):
The most widely used locking protocol, called Strict Two-Phase Locking, or Strict2PL,
has two rules:
1. If a transaction T wants to read (respectively, modify) an object, it first requests a shared (respectively,
exclusive) lock on the object. A transaction that requests a lock is suspended until the DBMS is able to grant it
the requested lock. The DBMS keeps track of the locks it has granted and ensures that if a transaction holds an
exclusive lock on an object, no other transaction holds a shared or exclusive lock on the same object.
2. All locks held by a transaction are released when the transaction is completed.
Multiple-Granularity Locking
Another specialized locking strategy is called multiple-granularity locking, and it allows us to efficiently
set locks on objects that contain other objects. For instance, a database contains several files, a file is a
collection of pages, and a page is a collection of records. A transaction that expects to access most of
the pages in a file should probably set a lock on the entire file, rather than locking individual pages as
and when it needs them. Doing so reduces the locking overhead considerably. On the other hand, other
transactions that require access to parts of the file, even parts that are not needed by this transaction,
are blocked. If a transaction accesses relatively few pages of the file, it is better to lock only those pages.
Similarly, if a transaction accesses several records on a page, it should lock the entire page, and if it
accesses just a few records, it should lock just those records.
The recovery manager of a DBMS is responsible for ensuring two important properties of transactions:
atomicity and durability. It ensures atomicity by undoing the actions of transactions that do not commit
and durability by making sure that all actions of committed transactions survive system crashes, (e.g., a
core dump caused by a bus error) and media failures (e.g., a disk is corrupted).
Timestamp Ordering Protocol

The timestamp-ordering protocol ensures serializability among transactions in their conflicting read and write
operations. It is the responsibility of the protocol to ensure that conflicting pairs of operations are executed
according to the timestamp values of the transactions.

The timestamp of transaction Ti is denoted as TS(Ti).


Read time-stamp of data-item X is denoted by R-timestamp(X).
Write time-stamp of data-item X is denoted by W-timestamp(X).
The timestamp-ordering protocol works as follows:
If a transaction Ti issues a read(X) operation:
o If TS(Ti) < W-timestamp(X), the operation is rejected and Ti is rolled back.
o If TS(Ti) >= W-timestamp(X), the operation is executed and R-timestamp(X) is updated to
max(R-timestamp(X), TS(Ti)).
If a transaction Ti issues a write(X) operation:
o If TS(Ti) < R-timestamp(X), the operation is rejected and Ti is rolled back.
o If TS(Ti) < W-timestamp(X), the operation is rejected and Ti is rolled back.
o Otherwise, the operation is executed and W-timestamp(X) is set to TS(Ti).
Thomas' Write Rule
The basic rule above rejects a write(X) and rolls Ti back whenever TS(Ti) < W-timestamp(X). Thomas' Write Rule
modifies this: instead of rolling Ti back, the obsolete 'write' operation itself is simply ignored. This modification
allows greater concurrency and can produce schedules that are view serializable but not conflict serializable.
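These checks can be sketched as follows; the per-item timestamp dictionaries and the boolean return convention are assumptions made only for the example.

```python
r_ts = {}   # R-timestamp(X) for each data item X
w_ts = {}   # W-timestamp(X) for each data item X

def read(item, ts):
    """Return True if Ti's read is allowed, False if Ti must be rolled back."""
    if ts < w_ts.get(item, 0):
        return False                              # Ti would read an already-overwritten value
    r_ts[item] = max(r_ts.get(item, 0), ts)
    return True

def write(item, ts, thomas=False):
    """Return True if the write proceeds (or is harmlessly ignored under Thomas' rule)."""
    if ts < r_ts.get(item, 0):
        return False                              # a younger transaction already read X
    if ts < w_ts.get(item, 0):
        return thomas                             # obsolete write: reject, or skip it under Thomas' rule
    w_ts[item] = ts
    return True

print(read("X", ts=5))                  # True, R-timestamp(X) becomes 5
print(write("X", ts=3))                 # False: TS(Ti)=3 < R-timestamp(X)=5, Ti rolls back
print(write("X", ts=7))                 # True, W-timestamp(X) becomes 7
print(write("X", ts=6, thomas=True))    # True: the obsolete write is simply ignored
```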

b) Write a short note on remote backups.


Remote backup systems provide high availability by allowing transaction processing to continue even if the
primary site is destroyed. Remote backup can be offline or real-time or online. In case it is offline, it is maintained
manually. Online backup systems are more real-time and lifesavers for database administrators and investors. An
online backup system is a mechanism where every bit of the real-time data is backed up simultaneously at two
distant places. One of them is directly connected to the system and the other one is kept at a remote place as
backup.
As soon as the primary database storage fails, the backup system senses the failure and switches the user system
to the remote storage. Sometimes this is so instant that the users can't even realize a failure.

Detection of failure: Backup site must detect when primary site has failed
To distinguish primary site failure from link failure, maintain several communication
links between the primary and the remote backup, and use heart-beat messages.
Transfer of control:
o To take over control, the backup site first performs recovery using its copy of the database and all the
log records it has received from the primary.
o Thus, completed transactions are redone and incomplete transactions are rolled back.
o When the backup site takes over processing, it becomes the new primary.
o To transfer control back to the old primary when it recovers, the old primary must receive redo logs from
the old backup and apply all updates locally.

Time to recover: To reduce delay in takeover, the backup site periodically processes the redo log records (in
effect, performing recovery from the previous database state), performs a checkpoint, and can then delete
earlier parts of the log.
Hot-Spare configuration permits very fast takeover:
o The backup continually processes redo log records as they arrive, applying the updates locally.
o When failure of the primary is detected the backup rolls back incomplete transactions, and is
ready to process new transactions.
Alternative to remote backup: distributed database with replicated data
o Remote backup is faster and cheaper, but less tolerant to failure
Ensure durability of updates by delaying transaction commit until update is logged at backup; avoid this
delay by permitting lower degrees of durability.
One-safe: commit as soon as the transaction's commit log record is written at the primary.
o Problem: updates may not arrive at the backup before it takes over.
Two-very-safe: commit when the transaction's commit log record is written at both the primary and the backup.
o Reduces availability, since transactions cannot commit if either site fails.
Two-safe: proceed as in two-very-safe if both primary and backup are active. If only the primary is active,
the transaction commits as soon as its commit log record is written at the primary.
o Better availability than two-very-safe; avoids the problem of lost transactions in one-safe.
(OR)

5) a) What is an Index? Elaborate on ISAM and B+-Tree Structures.


Indexing is a data structure technique to efficiently retrieve records from the database files based on some
attributes on which the indexing has been done. Indexing in database systems is similar to what we see in books.
INDEXED SEQUENTIAL ACCESS METHOD (ISAM)
The potentially large size of the index file motivates the ISAM idea: build an auxiliary index file on the index file,
and so on recursively, until the final auxiliary file fits on one page. This repeated construction of a one-level index
leads to a tree structure that is illustrated in the figure. The data entries of the ISAM index are in the leaf pages of
the tree and additional overflow pages that are chained to some leaf page. In addition, some systems carefully
organize the layout of pages so that page boundaries correspond closely to the physical characteristics of the
underlying storage device. The ISAM structure is completely static and facilitates such low-level optimizations.

Fig ISAM Index Structure


Each tree node is a disk page, and all the data resides in the leaf pages. This corresponds to an index that uses
Alternative (1) for data entries; we can create an index with Alternative (2) by storing the data records in a separate
file and storing <key, rid> pairs in the leaf pages of the ISAM index. When the file is created, all leaf pages are
allocated sequentially and sorted on the search-key value. The non-leaf level pages are then allocated. If there are
several inserts to the file subsequently, so that more entries are inserted into a leaf than will fit onto a single page,
additional pages are needed because the index structure is static. These additional pages are allocated from an
overflow area. The allocation of pages is illustrated in the figure below.

Fig: Page allocation in ISAM

B+ tree
A static structure such as the ISAM index suffers from the problem that long overflow chains can develop as
the file grows, leading to poor performance. This problem motivated the development of more flexible, dynamic
structures that adjust gracefully to inserts and deletes. The B+ tree search structure, which is widely used, is a
balanced tree in which the internal nodes direct the search and the leaf nodes contain the data entries. Since the
tree structure grows and shrinks dynamically, it is not feasible to allocate the leaf pages sequentially as in ISAM,
where the set of primary leaf pages was static. In order to retrieve all leaf pages efficiently, we have to link them
using page pointers. By organizing them into a doubly linked list, we can easily traverse the sequence of leaf
pages in either direction. This structure is illustrated in the figure.

The following are some of the main characteristics of a B+ tree:


Operations (insert, delete) on the tree keep it balanced. However, deletion is often implemented by simply locating
the data entry and removing it, without adjusting the tree as needed to guarantee 50 percent occupancy,
because files typically grow rather than shrink. Searching for a record requires just a traversal from the root to the
appropriate leaf. We will refer to the length of a path from the root to a leaf (any leaf, because the tree is
balanced) as the height of the tree.
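A search in such a tree can be sketched as follows; the node layout (separator keys, child pointers, and <key, rid> entries in leaves) is an assumption made only for the example.

```python
# Minimal B+-tree search: internal nodes only direct the search; all data entries
# (here <key, rid> pairs, i.e. Alternative (2)) live in the leaf nodes.
import bisect
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty for leaves
    entries: list = field(default_factory=list)    # (key, rid) pairs, leaves only

def search(node, key):
    """Descend from the root to the appropriate leaf, then scan its entries."""
    while node.children:                            # internal node: follow a child pointer
        node = node.children[bisect.bisect_right(node.keys, key)]
    for k, rid in node.entries:                     # leaf node: look for the data entry
        if k == key:
            return rid
    return None

# A tiny tree of height 1: a root with separator key 20 and two leaves.
leaf1 = Node(keys=[5, 10], entries=[(5, "rid5"), (10, "rid10")])
leaf2 = Node(keys=[20, 30], entries=[(20, "rid20"), (30, "rid30")])
root = Node(keys=[20], children=[leaf1, leaf2])
print(search(root, 30))   # 'rid30'
print(search(root, 7))    # None
```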

b) Describe the various types of file organizations.


A file organization is a way of arranging the records in a file when the file is stored on disk. A file of
records is likely to be accessed and modified in a variety of ways, and different ways of arranging the
records enable different operations over the file to be carried out efficiently.
A DBMS supports several file organization techniques, and an important task of a DBA is to choose a
good organization for each file, based on its expected pattern of use.
The different file organizations are:

Heap files: Files of randomly ordered records are called heap files. When a file is created using Heap File
Organization, the Operating System allocates memory area to that file without any further accounting
details. File records can be placed anywhere in that memory area. It is the responsibility of the software
to manage the records. Heap File does not support any ordering, sequencing, or indexing on its own.
Sorted files: These are files sorted on some field. Every file record contains a data field (attribute)
to uniquely identify that record. In sequential file organization, records are placed in the file in some
sequential order based on the unique key field or search key. Practically, it is not possible to store all the
records sequentially in physical form.
Hashed files : Files that are hashed on some fields are called hashed files. Hash File Organization
uses Hash function computation on some fields of the records. The output of the hash function
determines the location of disk block where the records are to be placed.

Clustered Files: Clustered file organization is not considered good for large databases. In this
mechanism, related records from one or more relations are kept in the same disk block, that is, the
ordering of records is not based on primary key or search key.
The choice of file organization can have a significant impact on performance. The choice of an appropriate file
organization depends on the operations like scan, insert, and delete.

ADITYA COLLEGE OF ENGINEERING


PUNGANUR ROAD, MADANAPALLE-517325
II-B.Tech(R15)-I-Semester II-Internal Examinations November -2016 (Objective)
15A05301- Database Management Systems (Computer Science Engineering)
Time :20 min
Max Marks : 10
Name :

Roll No:

__________________________________________________________________________________
I Answer all the questions each question carries 1 Mark
5*1=5
1) What are MVD and Join Dependencies?

Multi-Valued Dependency:

Let R be a relational schema and let X, Y be attribute sets over R. An MVD X →→ Y holds on a relation R if,
whenever two tuples t1 and t2 exist in R such that t1[X] = t2[X], then two tuples t3 and t4 should also exist in R
with the following properties, where Z = R − (X ∪ Y):
t3[X] = t4[X] = t1[X] = t2[X]
t3[Y] = t1[Y] and t4[Y] = t2[Y]
t3[Z] = t2[Z] and t4[Z] = t1[Z]
The tuples t1, t2, t3, t4 are not necessarily distinct.

Join Dependency
Let R be a relation. Let A, B, ..., Z be arbitrary subsets of R's attributes. R satisfies the JD
*(A, B, ..., Z) if and only if R is equal to the join of its projections on A, B, ..., Z. A join dependency
JD(R1, R2, ..., Rn) specified on relation schema R is a trivial JD if one of the relation schemas Ri in
JD(R1, R2, ..., Rn) is equal to R.
A join dependency is used in the following case: when there is no lossless-join decomposition of R into two
relation schemas, but there is a lossless-join decomposition of R into more than two relation schemas.
Join dependencies are difficult to work with in practice and hence are normally not used.
2) What is a secondary index?
Secondary index may be generated from a field which is a candidate key and has a unique value in every
record, or a non-key with duplicate values.
3) Define lock and list various types of lock modes.
Locking is a mechanism to ensure data integrity while allowing maximum concurrent access to data. It is used to
implement concurrency control when multiple users access a table to manipulate its data at the same time.
Lock types:
Binary Locks - A lock on a data item can be in two states; it is either locked or unlocked.

Shared Lock - This type is placed on a record when the record is being viewed.
Exclusive lock - This is placed when Insert, Update or Delete command is performed. There can be only one
exclusive lock on a record at a time.
4) What is the Multiversion Timestamp Ordering protocol?
Suppose that transaction Ti issues a read(Q) or write(Q) operation. Let Qk denote the version of Q whose write
timestamp is the largest write timestamp less than or equal to TS(Ti).

1. If transaction Ti issues a read(Q), then the value returned is the content of version Qk.
2. If transaction Ti issues write(Q), and if TS(Ti)<R-timestamp(Qk), then the system rolls back transaction Ti.
On the other hand, if TS(Ti) = W-timestamp(Qk), the system overwrites the contents of Qk; otherwise it creates
a new version of Q.
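The rules can be sketched for a single item Q; the list of (value, W-timestamp, R-timestamp) versions and the boolean return convention are assumptions made only for the example.

```python
versions = [("v0", 0, 0)]            # each version of Q is (value, w_ts, r_ts)

def latest_version(ts):
    """Index of Qk: the version whose write timestamp is the largest one <= TS(Ti)."""
    return max(i for i, (_, w, _) in enumerate(versions) if w <= ts)

def read(ts):
    k = latest_version(ts)
    value, w, r = versions[k]
    versions[k] = (value, w, max(r, ts))      # remember the youngest reader of Qk
    return value

def write(ts, value):
    """Return False if Ti must be rolled back, True otherwise."""
    k = latest_version(ts)
    _v, w, r = versions[k]
    if ts < r:
        return False                          # a younger transaction already read Qk
    if ts == w:
        versions[k] = (value, w, r)           # overwrite the contents of Qk
    else:
        versions.append((value, ts, ts))      # create a new version of Q
    return True

print(read(5))           # 'v0'; the R-timestamp of that version becomes 5
print(write(3, "v3"))    # False: TS(Ti)=3 < R-timestamp(Qk)=5, so Ti is rolled back
print(write(7, "v7"))    # True: a new version of Q with W-timestamp 7 is created
```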
5) What is a checkpoint.

Keeping and maintaining logs in real time and in a real environment may fill up all the memory space
available in the system. As time passes, the log file may grow too big to be handled at all. Checkpoint
is a mechanism where all the previous logs are removed from the system and stored permanently in a
storage disk. Checkpoint declares a point before which the DBMS was in consistent state, and all the
transactions were committed.

II Answer all the questions, each question carries ½ mark

5*½=2½

Fill in the blanks with appropriate answers given below.


(LSN, Steal and No force, BCNF, Precedence Graph, Tree Based Protocols)
6. One of the data structures used by ARIES is the LSN.
7. The desired approach for updating the databases is Steal and No force.
8. Testing for serializability can be easily achieved with the help of Precedence Graphs
9. Minimum required level of normalization for a table is BCNF
10. Example for lock based pessimistic concurrency control protocol is Tree Based Protocols
III Answer all the questions, each question carries ½ mark

5*½=2½

11.Deadlocks are possible only when one of the transactions wants to obtain a(n) ____ lock on a data
item.
[ B ]
a. Binary
b. Exclusive c. Shared
d. Complete
12. A DBMS uses a transaction _ to keep track of all transactions that update the database [ A ]
a. log
b. table
c. block
d. statement
13. Reading uncommitted data is addressed as
[ A ]
a. Dirty Read
b. Log Read
c. W-W conflict
d. Tree
14. Example of non dense index is
[ D ]
a. Ternary
b. Secondary
c. Primary
d. Clustered
15. Transitive Dependency is removed in
[ C ]
a. 2NF
b. Denormalization
c. 3NF
d. 5NF
