Database Management System
Semester I
Amity University
Chapters 6 and 7 focus on advanced concepts of database systems, including transaction management, concurrency control techniques, and backup and recovery methods.
Updated Syllabus
Course Contents:
Module I: Introduction to DBMS
Introduction to DBMS, Architecture of DBMS, Components of DBMS, Traditional data Models
(Network, Hierarchical and Relational), Database Users, Database Languages, Schemas and
Instances, Data Independence
Module II: Data Modeling
Entity sets, attributes and keys, Relationships (ER), Database modeling using entities, Weak and strong entity types, Enhanced entity-relationship (EER), Entity-Relationship Diagram, Design of an E-R database schema, Object modeling, Specialization and generalization
Module III: Relational Database Model
Basic Definitions, Properties of Relational Model, Keys, Constraints, Integrity rules, Relational
Algebra, Relational Calculus.
Module IV: Relational Database Design
Functional Dependencies, Normalization, Normal forms (1st, 2nd, 3rd, BCNF), Lossless
Basic security issues, Discretionary access control, Mandatory access control, Statistical
database security.
Module VIII: Transaction Management and Concurrency Control Techniques
Transaction concept, ACID properties, Schedules and recoverability, Serial and Non-serial
schedules, Serializability, Concurrency Techniques: Locking Protocols, Timestamping Protocol,
Multiversion Technique, Deadlock Concept - detection and resolution.
Module IX: Backup and Recovery
Database recovery techniques based on immediate and deferred update, ARIES recovery
algorithm, Shadow pages and Write-ahead Logging
Text & References:
Text:
References:
Chapter-1
INTRODUCTION TO DBMS AND DATA MODELING
1. Introductory Concepts
Data: Data is a collection of facts upon which a conclusion is based (information or knowledge has value; data has cost). Data can be represented in terms of numbers, characters, pictures, sounds and figures.
Data item: The smallest named unit of data that has meaning in the real world (examples: last name, locality, STD_Code).
Database: An interrelated collection of data that serves the needs of multiple users within one or more organizations, i.e. interrelated collections of records of potentially many types.
Database administrator (DBA): A person or group of persons responsible for the effective use of database technology in an organization or enterprise. The DBA is said to be the custodian or owner of the database.
Database Management System: A DBMS is a collection of software programs that facilitates large, structured sets of data to be stored, modified, extracted and manipulated in different ways. A DBMS also provides security features that protect against unauthorized users trying to gain access to confidential information, and that prevent data loss in case of a system crash. Depending on specific user requirements, users are allowed access either to the whole database or to specific subschemas, through the use of passwords. The DBMS is also responsible for the database's integrity, ensuring that no two users are able to update the same record at the same time, as well as preventing duplicate entries, such as two employees being given the same employee number.
The following are examples of database applications:
1. Computerized library systems.
2. Automated teller machines.
3. Airline reservation systems.
There are innumerable Database Management System (DBMS) software packages available in the market. Some of the most popular ones include Oracle, IBM's DB2, Microsoft Access, Microsoft SQL Server and MySQL. MySQL, one of the most popular database management systems used by online entrepreneurs, is an example of a relational DBMS, as is Microsoft Access.
Example: A database may contain detailed student information; certain users may only be allowed access to student names, addresses and phone numbers, while other users may be able to view students' payment or marks details. Access and change logs can be programmed to add even more security to a database, recording the date, time and details of any user making any alteration to the database.
Furthermore, the Database Management Systems employ the use of a query language and report
writers to interrogate the database and analyze its data. Queries allow users to search, sort, and
analyze specific data by granting users efficient access to the required information.
Example: one would use a query command to make the system retrieve data regarding all
courses of a particular department. The most common query language used to access database
systems is the Structured Query Language (SQL).
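Such a query can be sketched with Python's built-in sqlite3 module; the table and column names below (course, department) are illustrative, not taken from any particular system.

```python
import sqlite3

# Hypothetical COURSE table with a department column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (code TEXT, title TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [("CS101", "Intro to DBMS", "Computer Science"),
     ("CS201", "Data Structures", "Computer Science"),
     ("EC101", "Microeconomics", "Economics")],
)

# Query: retrieve data regarding all courses of a particular department.
rows = conn.execute(
    "SELECT code, title FROM course WHERE department = ? ORDER BY code",
    ("Computer Science",),
).fetchall()
```

The query text itself is plain SQL; only the surrounding driver calls are Python-specific.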
* Audit trail
* Privacy (the goal) and security (the means)
* Schema / sub-schema
* Passwords
* Management control (DBA): lifecycle control, training, maintenance
* Data independence (a relative term): avoids reprogramming of applications, allows easier conversion and reorganization of data (both logical and physical data independence)
3.1 Hierarchical model: Hierarchical databases organize data under the premise of a basic parent/child relationship. Each parent can have many children, but each child can only have one parent. In hierarchical databases, attributes of specific records are listed under an entity type, and entity types are connected to each other through one-to-many relationships, also known as 1:N mapping. Originally, hierarchical relationships were most commonly used in mainframe systems, but with the advent of increasingly complex relationships they have become too restrictive and are thus rarely used in modern databases. If any of the one-to-many relationships is compromised, e.g. an employee having more than one manager, the database structure switches from hierarchical to a network.
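The single-parent constraint can be sketched in a few lines of Python; the department/employee hierarchy below is a made-up example, not from the text.

```python
# A hierarchical database organizes records as a tree: each child record
# belongs to exactly one parent. Hypothetical department -> employee tree.
company = {
    "Sales": {"employees": ["Asha", "Ravi"]},
    "IT":    {"employees": ["Meena"]},
}

def parent_of(employee):
    """Return the single parent (department) of an employee record."""
    for dept, node in company.items():
        if employee in node["employees"]:
            return dept  # in a strict hierarchy there can be only one
    return None
```

An employee reporting to two departments could not be represented in this structure; that is exactly the restriction that pushes the design toward a network model.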
3.2 Network model: In the network model it is possible for a record to have multiple parents, making the system more flexible than the strict single-parent model of the hierarchical database. The model is made to accommodate many-to-many relationships, which allows for a more realistic representation of the relationships between entities. Even though the network database model enjoyed popularity for a short while, it never really lifted off the ground in terms of staging a revolution. It is now rarely used because of the availability of more competitive models that offer the higher flexibility demanded in today's ever-advancing age.
3.3 Relational databases (RDBMS) are completely unique when compared to the
aforementioned models as the design of the records is organized around a set of tables (with
unique identifiers) to represent both the data and their relationships. The fields to be used for
matching are often indexed in order to speed up the process and the data can be retrieved and
manipulated in a number of ways without the need to reorganize the original database tables.
Working under the assumption that file systems (which often use the hierarchical or network
models) are not considered databases, the relational database model is the most commonly used
system today. While the concepts behind hierarchical and network database models are older
than that of the relational model, the latter was in fact the first one to be formally defined.
After the relational DBMS soared to popularity, the most recent development in DBMS technology came in the form of the object-oriented database model, which offers more flexibility than the hierarchical, network and relational models put together. Under this model, data exists in the form of objects, which include both the data and the data's behavior. Certain modern information systems contain such convoluted combinations of information that traditional data models (including the RDBMS) remain too restrictive to adequately model this complex data. The object-oriented model also exhibits better cohesion and coupling than prior models, resulting in a database which is not only more flexible and more manageable but also the most able when it comes to modeling real-life processes. However, due to the immaturity of this model, certain problems are bound to arise, major ones being the lack of an SQL equivalent and a lack of standardization. Furthermore, the most common use of the object-oriented model is to have an object point to the child or parent OID (object ID) to be retrieved, leaving many programmers with the impression that the object-oriented model is simply a reincarnation of the network model. That is, however, an over-simplification of an innovative technology.
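The idea that an object bundles data with behavior, and points to related objects by reference, can be sketched with an ordinary Python class; the Employee attributes here are illustrative.

```python
# In the object-oriented model an object stores both data (attributes)
# and behavior (methods), and holds references to other objects (OIDs).
class Employee:
    def __init__(self, oid, name, annual_salary):
        self.oid = oid                # object identifier (OID)
        self.name = name
        self.annual_salary = annual_salary
        self.manager = None           # reference to another Employee object

    def monthly_salary(self):
        # Behavior stored together with the data it operates on.
        return self.annual_salary / 12

boss = Employee(1, "Asha", 1200000)
worker = Employee(2, "Ravi", 600000)
worker.manager = boss                 # navigation via object reference
```

Following `worker.manager` to reach another object is the pointer-style navigation that reminds some programmers of the network model.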
4. Components of a DBMS
The main components of a Database Management System (DBMS) are described below.
4.1. Database Engine: The database engine is the foundation for storing, processing and securing data. It provides controlled access and rapid transaction processing to meet the requirements of the most demanding data-consuming applications within an enterprise. The database engine is used to create relational databases for online transaction processing or online analytical processing. This includes creating tables for storing data, and database objects such as indexes, views and stored procedures for viewing, managing and securing data. Tools such as SQL Server Management Studio can be used to manage the database objects, and SQL Server Profiler to capture server events.
4.2. Data dictionary: A data dictionary is a reserved space within a database which is used to store information about the database itself. It is a set of tables and views which can only be read, never altered directly. Most data dictionaries contain information about the data used in the enterprise. In terms of the database representation of the data, the data dictionary defines all schema objects, including views, tables, clusters, indexes, sequences, synonyms, procedures, packages, functions, triggers and more, ensuring that all of these follow the one standard defined in the dictionary. The data dictionary also records how much space has been allocated for, and is currently used by, each schema object. A data dictionary is consulted when finding information about users, objects, schemas and storage structures. Every time a data definition language (DDL) statement is issued, the data dictionary is modified.
A data dictionary may contain information such as:
User permissions
User statistics
DDL interpreter
DML compiler
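This self-describing behavior can be observed directly in SQLite, whose data dictionary is the read-only sqlite_master table, updated automatically by every DDL statement; the student table below is illustrative.

```python
import sqlite3

# SQLite keeps its data dictionary in the reserved table sqlite_master.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_name ON student(name)")

# Reading the dictionary: every schema object appears with its type.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()
```

Both DDL statements above modified the dictionary without the application writing to it explicitly, which is exactly the behavior described in the text.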
4.3. Report generator: A report generator formats the results of queries into reports, charts and other diagrams. Once you have created a format for a report, you can save the format specifications in a file and continue reusing it for new data.
5. Database Languages
5.1 Data Definition Language (DDL): The Data Definition Language is used to define the structure of a database. The database structure definition (schema) typically includes the following:
Defining all data elements; defining data element fields and records; defining the name, field length and field type for each data element; defining controls for fields that can have only selective values.
Typical DDL operations (with their respective keywords in the structured query language SQL):
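The usual keywords are CREATE (define a structure), ALTER (change it) and DROP (remove it). A minimal sketch, run here through Python's sqlite3 module with an illustrative employee table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CREATE: define fields, their types, and a field limited to selective values.
conn.execute("""
    CREATE TABLE employee (
        emp_no   INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        job_type TEXT CHECK (job_type IN ('Secretary', 'Engineer', 'Technician'))
    )
""")
# ALTER: change the structure by adding a field.
conn.execute("ALTER TABLE employee ADD COLUMN salary REAL")
cols = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
# DROP: remove the structure entirely.
conn.execute("DROP TABLE employee")
```

Note that each statement changes only the schema, never the stored data values themselves; that is what distinguishes DDL from DML.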
5.3 Data Control Language (DCL): Data control commands in SQL control access privileges
and security issues of a database system or parts of it. These commands are closely related to the
DBMS (Database Management System) and can therefore vary in different SQL
implementations. Some typical commands are:
GRANT
REVOKE
Since these commands depend on the actual database management system (DBMS), we will not
cover DCL in this module.
6. Database Users
6.1 Database Administrator (DBA): The DBA is a person or a group of persons responsible for the management of the database. The DBA is responsible for authorizing access to the database by granting and revoking permissions to users, for coordinating and monitoring its use, for managing backups and repairing damage due to hardware and/or software failures, and for acquiring hardware and software resources as needed. In small organizations the role of DBA is performed by a single person; in large organizations there is a group of DBAs who share the responsibilities.
6.2 Database Designers: They are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store the data. It is the responsibility of database designers to communicate with all prospective database users in order to understand their requirements, so that they can create a design that meets those requirements.
6.3 End Users: End Users are the people who interact with the database through applications or
utilities. The various categories of end users are:
Casual End Users - These users occasionally access the database but may need different information each time. They use a sophisticated database query language to specify their requests. For example: high-level managers who access the data weekly or biweekly.
Naive End Users - These users frequently query and update the database using standard types of queries. The operations that can be performed by this class of users are very limited and affect a precise portion of the database.
For example: reservation clerks for airlines/hotels check availability for a given request and make reservations. Persons using Automated Teller Machines (ATMs) also fall under this category, as they have access to a limited portion of the database.
Standalone End Users / Online End Users - Those end users who interact with the database directly via an online terminal, or indirectly through menu- or graphics-based interfaces. For example: a user of library management software that stores a variety of library data, such as issues and returns of books for fine purposes.
6.4 Application Programmers
Application Programmers are responsible for writing application programs that use the database. These programs could be written in general-purpose programming languages such as Visual Basic, Developer, C, FORTRAN or COBOL to manipulate the database. These application programs operate on the data to perform various operations such as retrieving information and creating new records.
7. ADVANTAGES OF DBMS
The DBMS (Database Management System) is preferred over the conventional file
processing system due to the following advantages:
Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to duplication of the same data in multiple files.
Flexibility of the System is improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes. This also maintains the consistency and integrity of the data in the database.
Integrity can be improved - Since the data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints. In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
For example: consider the result system already discussed. Since multiple files are maintained, you may sometimes enter a value for a course that does not exist. Suppose a course can have the values (Computer, Accounts, Economics, Arts) but we enter the value 'Hindi' for it; this leads to inconsistent data and a lack of integrity. Even if we centralize the database, it may still contain incorrect data. For example:
The salary of a full-time employee may be entered as Rs. 500 rather than Rs. 5000.
A student may be shown to have borrowed books but have no enrollment.
A list of employee numbers for a given department may include a number of nonexistent employees.
These problems can be avoided by defining validation procedures that run whenever any update operation is attempted.
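One way a DBMS expresses such validation procedures is a CHECK constraint, so the engine itself rejects invalid updates. A sketch of the course example above, using SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint encodes the validation rule from the text:
# only these four course values are legal.
conn.execute("""
    CREATE TABLE result (
        roll   INTEGER PRIMARY KEY,
        course TEXT CHECK (course IN ('Computer', 'Accounts', 'Economics', 'Arts'))
    )
""")
conn.execute("INSERT INTO result VALUES (1, 'Computer')")    # valid
try:
    conn.execute("INSERT INTO result VALUES (2, 'Hindi')")   # invalid value
    rejected = False
except sqlite3.IntegrityError:
    rejected = True                                          # DBMS refused it
count = conn.execute("SELECT COUNT(*) FROM result").fetchone()[0]
```

The salary example would be handled the same way, e.g. with CHECK (salary >= 1000) or a similar business rule.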
Standards can be enforced - Since all access to the database must be through DBMS, so
standards are easier to enforce. Standards may relate to the naming of data, format of data,
structure of the data etc. Standardizing stored data formats is usually desirable for the purpose of
data interchange or migration between systems.
Security can be enforced - Checks can be established for each type of access (retrieve, modify, delete etc.) to each piece of information in the database.
Consider an example from banking, in which employees at different levels may be given access to different types of data in the database. A clerk may be given the authority to know only the names of all the customers who have a loan in the bank, but not the details of each loan a customer may have. This can be accomplished by giving appropriate privileges to each employee.
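In most RDBMSs this is done with GRANT on a restricted view. SQLite, used for the sketch below, has no GRANT/REVOKE, so only the view half of the idea is shown; the loan table and its columns are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (customer TEXT, amount REAL, rate REAL)")
conn.execute("INSERT INTO loan VALUES ('Asha', 250000, 8.5)")

# The clerk's subschema: customer names only, no loan details.
# In a full RDBMS one would then GRANT SELECT on this view to the clerk.
conn.execute("CREATE VIEW clerk_view AS SELECT customer FROM loan")
clerk_rows = conn.execute("SELECT * FROM clerk_view").fetchall()
```

The clerk sees only the name column; amount and rate are simply not part of the view the clerk is allowed to query.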
Organization's requirements can be identified - Organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, as the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of the different units. It may become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization. It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
For example: a DBA must choose the best file structure and access method to give fast response for highly critical applications as compared to less critical ones.
Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for a similar service using conventional systems, since the productivity of programmers can be higher when using the non-procedural languages that have been developed alongside DBMSs than when using procedural languages.
Data Model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed as the needs of particular applications demand, and the overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.
Provides backup and recovery - Centralizing a database allows schemes for backup and recovery from failures, including disk crashes, power failures and software errors, which help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods involved are quite complex.
8. Three-Schema Architecture
The objective of the three-schema architecture is to separate the user application programs from the physical database. The three-schema architecture is an effective tool with which the user can visualize the schema levels in a database system. The three-level ANSI architecture has an important place in database technology development because it clearly separates the user's external level, the system's conceptual level and the internal storage level for designing a database. In the three-schema architecture, schemas can be defined at three different levels.
8.1 External Schema:
An external schema describes the view of data of a specific user group, together with the specific methods and constraints connected with this information. Each external schema describes the part of the database that a particular user group is interested in, and hides the rest of the database from that user group.
8.2 Internal Schema:
The internal schema describes the physical storage structure of the database. It describes the data from a view very close to the computer or system in general, completing the logical schema with technical aspects such as storage methods and helper structures for greater efficiency.
8.3 Conceptual Schema: The conceptual schema describes the structure of the whole database for the entire user community. It hides the details of the physical storage structure and concentrates on describing entities, data types, relationships and constraints. The conceptual schema is typically implemented from a conceptual schema design in a high-level data model.
9. Data Independence:
With knowledge of the three-schema architecture, the term data independence can be explained as follows: each higher level of the data architecture is immune to changes at the next lower level of the architecture.
Data independence is normally thought of in terms of two levels or types. Logical data independence makes it possible to change the structure of the data without modifying the application programs that make use of the data. There is no need to rewrite current applications as part of the process of adding data to, or removing data from, the system.
The second type or level of data independence is known as physical data independence. This
approach has to do with altering the organization or storage procedures related to the data, rather
than modifying the data itself. Accomplishing this shift in file organization or the indexing
strategy used for the data does not require any modification to the external structure of the
applications, meaning that users of the applications are not likely to notice any difference at all in
the function of their programs.
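Physical data independence can be demonstrated concretely: below, the physical organization changes (an index is added) while the application's query text and its result stay identical. The student table is illustrative; SQLite stands in for the RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll INTEGER, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ravi")])

query = "SELECT name FROM student WHERE roll = ?"
before = conn.execute(query, (2,)).fetchone()

# Change the physical organization: add an index on the search key.
conn.execute("CREATE INDEX idx_roll ON student(roll)")

# The application issues the exact same query text and gets the same answer;
# only the access path the engine chooses may differ.
after = conn.execute(query, (2,)).fetchone()
```

The application never references the index, so dropping or rebuilding it later would be equally invisible to users of the program.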
Database Instance: The term instance is typically used to describe a complete database environment, including the RDBMS software, table structure, stored procedures and other functionality. It is most commonly used when administrators describe multiple instances of the same database.
Database Schema: A database schema describes the structures of the database and the relationships between those elements, independent of any particular DBMS and implementation details.
An ER Diagram
Entity
An entity is a real-world object (living or non-living) or a concept about which you want to store information.
Weak Entity
A weak entity is an entity that must be defined by a foreign key relationship with another entity, as it cannot be uniquely identified by its own attributes alone.
Key attribute
A key attribute is the unique, distinguishing characteristic of the entity, which can uniquely identify the instances of the entity set. For example, an employee's social security number might be the employee's key attribute.
Derived attribute
A derived attribute is based on another attribute. For example, an employee's monthly salary is
based on the employee's annual salary.
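A derived attribute is typically computed at query time rather than stored. A sketch of the monthly-salary example, with an illustrative employee table in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, annual_salary REAL)")
conn.execute("INSERT INTO employee VALUES ('Asha', 600000)")

# monthly_salary is not a stored column; it is derived from annual_salary.
row = conn.execute(
    "SELECT name, annual_salary / 12 AS monthly_salary FROM employee"
).fetchone()
```

Because the value is derived on demand, it can never go out of sync with the base attribute it depends on.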
Relationships
Relationships illustrate how two entities share information in the database structure.
Cardinality
Cardinality specifies how many instances of an entity relate to one instance of another entity. Ordinality is closely linked to cardinality: while cardinality specifies the occurrences of a relationship, ordinality describes the relationship as either mandatory or optional. In other words, cardinality specifies the maximum number of relationships and ordinality specifies the absolute minimum number of relationships.
Recursive relationship
In some cases, entities can be self-linked. For example, employees can supervise other
employees.
The ER model is very simple and easy to understand by various types of users and designers, because specific standards are used for its representation.
For example, members of entity Employee can be grouped further into Secretary, Engineer,
Manager, Technician, Salaried_Employee.
The set listed is a subset of the entities that belong to the Employee entity, which means that
every entity that belongs to one of the sub sets is also an Employee.
Each of these sub-groupings is called a subclass, and the Employee entity is called the superclass.
An entity cannot be a member of a subclass only; it must also be a member of the superclass.
An entity can be included as a member of a number of sub classes, for example, a Secretary
may also be a salaried employee, however not every member of the super class must be a
member of a sub class.
Type Inheritance
The type of an entity is defined by the attributes it possesses, and the relationship types it
participates in.
Because an entity in a subclass represents the same real-world entity as in the superclass, it possesses values for its subclass-specific attributes as well as for its attributes as a member of the superclass.
This means that an entity that is a member of a subclass inherits all the attributes of the entity as a member of the superclass; likewise, it inherits all the relationships in which the superclass participates.
[Figure: EER diagram - Employee works for Department; subclasses: Secretary, Engineer, Technician]
Specialization
The process of defining a set of subclasses of a super class.
The set of sub classes is based on some distinguishing characteristic of the super class.
For example, the set of sub classes for Employee, Secretary, Engineer, Technician,
differentiates among employee based on job type.
To represent a specialization, the subclasses that define it are attached by lines to a circle that represents the specialization, which in turn is connected to the superclass.
The subset symbol (half-circle) shown on each line connecting a subclass to the superclass indicates the direction of the superclass/subclass relationship.
Attributes that apply only to the subclass are attached to the rectangle representing the subclass. They are called specific attributes.
A sub class can also participate in specific relationship types. See Example.
[Figure: EER diagram - Employee works for Department; subclasses Secretary, Engineer, Technician; Engineer belongs to Professional Organization]
Certain attributes may apply to some but not all entities of a super class. A subclass is
defined in order to group the entities to which the attributes apply.
The second reason for using subclasses is that some relationship types may be participated in
only by entities that are members of the subclass.
Summary of Specialization
Allows for:
Defining a set of subclasses of an entity type
Creating additional specific relationship types between each subclass and other entity types or other subclasses
Generalization
Several classes with common features are generalized into a super class.
For example, the entity types Car and Truck share common attributes License_PlateNo,
VehicleID and Price, therefore they can be generalized into the super class Vehicle.
The specialization may also consist of a single subclass, such as the manager specialization;
in this case we dont use the circle notation.
Types of Specializations
Predicate-defined or Condition-defined specialization
Occurs in cases where we can determine exactly the entities of each subclass by placing a condition on the value of an attribute of the superclass.
An example is where the Employee entity has an attribute, JobType. We can specify the condition of membership in the Secretary subclass by the condition JobType = 'Secretary'.
Example:
The condition is a constraint specifying that exactly those entities of the Employee entity type whose attribute value for JobType is 'Secretary' belong to the subclass.
Predicate defined subclasses are displayed by writing the predicate condition next to the line
that connects the subclass to the specialization circle.
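In a relational implementation, a predicate-defined subclass can be sketched as a view whose WHERE clause is exactly the membership predicate; the employee table below is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER, name TEXT, job_type TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", "Secretary"), (2, "Ravi", "Engineer")])

# Membership in the Secretary subclass follows from the predicate alone:
# no user decides who belongs; the condition does.
conn.execute("""
    CREATE VIEW secretary AS
    SELECT * FROM employee WHERE job_type = 'Secretary'
""")
secretaries = conn.execute("SELECT name FROM secretary").fetchall()
```

Changing an employee's job_type automatically moves them in or out of the subclass, which is what distinguishes predicate-defined from user-defined specialization.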
Attribute-defined specialization
If all subclasses in a specialization have their membership condition on the same attribute of
the super class, the specialization is called an attribute-defined specialization, and the
attribute is called the defining attribute.
Attribute-defined specializations are displayed by placing the defining attribute name next to
the arc from the circle to the super class.
User-defined specialization
When we do not have a condition for determining membership in a subclass the subclass is
called user-defined.
Membership to a subclass is determined by the database users when they add an entity to the
subclass.
Disjointness Constraint
A disjointness constraint specifies that the subclasses of the specialization must be disjoint, which means that an entity can be a member of, at most, one subclass of the specialization.
Overlap means that an entity can be a member of more than one subclass of the specialization.
Completeness Constraint
A total specialization constraint specifies that every entity in the super class must be a
member of at least one subclass of the specialization.
Total specialization is shown by using a double line to connect the super class to the circle.
A single line is used to display a partial specialization, meaning that an entity does not have
to belong to any of the subclasses.
Disjointness vs. Completeness
Disjointness constraints and completeness constraints are independent. The following combinations of constraints on specializations are possible:
Disjoint, total: e.g. Department specialized into Academic and Administrative.
Disjoint, partial: e.g. Employee specialized into Secretary, Analyst and Engineer.
Overlapping, total: e.g. Part specialized into Manufactured and Purchased.
Overlapping, partial: e.g. Movie specialized into Children, Comedy and Drama.
End Chapter quizzes:
(a) Circle
(b) Ellipse
(c) Rectangle
(d) Square
Q2. A relationship is
(a) an item in an application
(b) a meaningful dependency between entities
(c) a collection of related entities
(d) related data
Chapter-2
RELATIONAL DATABASE MODEL
2. Introductory Concepts
Relational Database Management System
A Relational Database Management System (RDBMS) provides a complete and integrated approach to information management. The relational model has three major aspects:
Structures
Operations
Integrity rules
Structures consist of a collection of objects or relations that store data. An example of a relation is a table. You can store information in a table and use the table to retrieve and modify data.
Operations are used to manipulate data and structures in a database. When using operations, you must adhere to a predefined set of integrity rules.
Integrity rules are laws that govern the operations allowed on the data in a database. They ensure data accuracy and consistency.
Relational database components include:
Table
Row
Column
Field
Primary key
Foreign key
A Column is a collection of one type of data in a table. Columns represent the attributes of an
object. Each column has a column name and contains values that are bound by the same type and
size. For example, a column in the table S_DEPT specifies the names of the departments in the
organization.
A Field is an intersection of a row and a column. A field contains one data value. If there is no
data in the field, the field is said to contain a NULL value.
A Foreign key is a column or set of columns that refers to a primary key in the same table or another table. You use foreign keys to establish connections between, or within, tables. A foreign key must either match a primary key or else be NULL. Rows are connected logically when required; the logical connections are based upon conditions that define a relationship between corresponding values, typically between a primary key and a matching foreign key. This relational method of linking provides great flexibility, as it is independent of physical links between records.
RDBMS Properties
An RDBMS is easily accessible. You execute commands in the Structured Query Language (SQL) to manipulate data. SQL is the International Organization for Standardization (ISO) standard language for interacting with an RDBMS.
An RDBMS provides full data independence. The organization of the data is independent of the applications that use it. You do not need to specify the access routes to tables or know how data is physically arranged in a database.
A relational database is a collection of individual, named objects. The basic unit of data storage
in a relational database is called a table. A table consists of rows and columns used to store
values. For access purpose, the order of rows and columns is insignificant. You can control the
access order as required.
An RDBMS enables data sharing between users. At the same time, you can ensure consistency
of data across multiple tables by using integrity constraints. An RDBMS uses various types of
data integrity constraints. These types include entity, column, referential and user-defined
constraints.
The constraint, entity, ensures uniqueness of rows, and the constraint column ensures
consistency of the type of data within a column. The other type, referential, ensures validity of
foreign keys, and user-defined constraints are used to enforce specific business rules.
An RDBMS minimizes the redundancy of data. This means that similar data is not repeated in
more than one table.
3. Codd's 12 rules
Codd's 12 rules are a set of twelve rules proposed by E. F. Codd, a pioneer of the relational
model for databases, designed to define what is required from a database management system in
order for it to be considered relational, i.e., an RDBMS. Codd produced these rules as part of a
personal campaign to prevent his vision of the relational database being diluted.
Rule 4: The dynamic online catalog rule:
The system must support an online, relational catalog that is accessible to authorized users
by means of their regular query language. That is, users must be able to access the database's
structure (catalog) using the same query language that they use to access the database's data.
Rule 5: The comprehensive data sublanguage rule:
The system must support at least one relational language that
o has a linear syntax,
o can be used both interactively and within application programs, and
o supports data definition operations, data manipulation operations (update as well as
retrieval), security and integrity constraints, and transaction management operations.
Rule 11: The distribution independence rule:
The distribution of portions of the database to various locations should be invisible to users of
the database. Existing applications should continue to operate successfully:
o when a distributed version of the DBMS is first introduced, and
o when existing distributed data are redistributed around the system.
A referential integrity constraint is an example of a constraint specified on more than one
relation. This ensures that consistency is maintained across the relations, as in the following
example:
Table A
DeptID   DeptName    DeptManager
F-1001   Financial   Nathan
S-2012   Software    Martin
H-0001   HR          Jason

Table B
EmpNo   DeptID   EmpName
1001    F-1001   Tommy
1002    S-2012   Will
1003    H-0001   Jonathan
4. Relational algebra
Relational algebra is a procedural query language, which consists of a set of operations that take
one or two relations as input and produce a new relation as their result. The fundamental
operations that will be discussed in this section are: select, project, union, and set difference.
Besides the fundamental operations, the following additional operations will be discussed: set-intersection
and Cartesian product.
Each operation will be applied to tables of a sample database. Each table is otherwise known as a
relation and each row within the table is referred to as a tuple. The sample database consists of
tables one might see in a bank. The sample database consists of the following 6
relations:
Account
branch-name   account-number   balance
Downtown      A-101            500
Mianus        A-215            700
Perryridge    A-102            400
Round Hill    A-305            350
Brighton      A-201            900
Redwood       A-222            700
Brighton      A-217            750
Branch
branch-name   branch-city   assets
Downtown      Brooklyn      9000000
Redwood       Palo Alto     2100000
Perryridge    Horseneck     1700000
Mianus        Horseneck     400000
Round Hill    Horseneck     8000000
Pownal        Bennington    300000
North Town    Rye           3700000
Brighton      Brooklyn      7100000
Customer
customer-name   customer-street   customer-city
Jones           Main              Harrison
Smith           North             Rye
Hayes           Main              Harrison
Curry           North             Rye
Lindsay         Park              Pittsfield
Turner          Putnam            Stamford
Williams        Nassau            Princeton
Adams           Spring            Pittsfield
Johnson         Alma              Palo Alto
Glenn           Sand Hill         Woodside
Brooks          Senator           Brooklyn
Green           Walnut            Stamford
Depositor
customer-name   account-number
Johnson         A-101
Smith           A-215
Hayes           A-102
Turner          A-305
Johnson         A-201
Jones           A-217
Lindsay         A-222
Loan
branch-name   loan-number   amount
Downtown      L-17          1000
Redwood       L-23          2000
Perryridge    L-15          1500
Downtown      L-14          1500
Mianus        L-93          500
Round Hill    L-11          900
Perryridge    L-16          1300
Borrower
customer-name   loan-number
Jones           L-17
Smith           L-23
Hayes           L-15
Jackson         L-14
Curry           L-93
Smith           L-11
Williams        L-17
Adams           L-16
The Select operation is a unary operation, which means it operates on one relation. Its function is
to select tuples that satisfy a given predicate. To denote selection, the lowercase Greek letter
sigma (σ) is used. The predicate appears as a subscript to σ, and the argument relation is given
in parentheses. For example, the query to select those tuples of the loan relation where the
branch is "Perryridge" is written:

σ branch-name = "Perryridge" (loan)

The result of the query is the following:

branch-name   loan-number   amount
Perryridge    L-15          1500
Perryridge    L-16          1300
Comparisons like =, ≠, <, ≤, >, ≥ can also be used in the selection predicate. For example, the
query to find all tuples in which the amount lent is more than $1200 would be written:

σ amount > 1200 (loan)
The project operation is a unary operation that returns its argument relation with certain
attributes left out. Since a relation is a set, any duplicate rows are eliminated. Projection is
denoted by the Greek letter pi (π). The attributes that are to appear in the result are listed as
a subscript to π. The argument relation follows in parentheses. For example, the query to list all
loan numbers and the amount of the loan is written as:

π loan-number, amount (loan)
The result of the query is the following:
loan-number   amount
L-17          1000
L-23          2000
L-15          1500
L-14          1500
L-93          500
L-11          900
L-16          1300
Another, more complicated, example query, to find those customers who live in Harrison, is
written as:

π customer-name (σ customer-city = "Harrison" (customer))
The union operation yields the results that appear in either or both of two relations. It is a binary
operation denoted by the symbol ∪.
An example query would be to find the name of all bank customers who have either an account
or a loan or both. To find this result we will need the information in the depositor relation and in
the borrower relation. To find the names of all customers with a loan in the bank we would write:
π customer-name (borrower)
and to find the names of all customers with an account in the bank, we would write:
π customer-name (depositor)
Then by using the union operation on these two queries we have the query we need to obtain the
wanted results. The final query is written as:
π customer-name (borrower) ∪ π customer-name (depositor)

The result of the query is the following:
customer-name
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Adams
The set intersection operation is denoted by the symbol ∩. It is not a fundamental operation;
however, it is a more convenient way to write r − (r − s).
An example query of the operation, to find all customers who have both a loan and an account,
can be written as:

π customer-name (borrower) ∩ π customer-name (depositor)
Set Difference Operation: Set difference is denoted by the minus sign (−). It finds tuples that are
in one relation but not in another. Thus r − s results in a relation containing those tuples that are
in r but not in s.
Cartesian Product Operation: The Cartesian product of two relations is denoted by a cross (×),
written r × s. The result of r × s is a new relation with a tuple for each possible pairing of tuples
from r and s.
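The operations above can be sketched in a few lines of Python, modeling each relation as a set of tuples with a separate header list. This is an illustrative model only (the function names and the tuple representation are choices of this sketch, not how a real DBMS works internally):

```python
# Each relation is a set of tuples; attribute names are kept in a header list.
loan_header = ["branch-name", "loan-number", "amount"]
loan = {
    ("Downtown",   "L-17", 1000), ("Redwood",    "L-23", 2000),
    ("Perryridge", "L-15", 1500), ("Downtown",   "L-14", 1500),
    ("Mianus",     "L-93",  500), ("Round Hill", "L-11",  900),
    ("Perryridge", "L-16", 1300),
}

def select(predicate, relation):
    """sigma: keep only the tuples satisfying the predicate."""
    return {t for t in relation if predicate(t)}

def project(attrs, header, relation):
    """pi: keep only the named attributes; the set removes duplicates."""
    idx = [header.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in relation}

# sigma amount > 1200 (loan); amount is the third attribute.
big = select(lambda t: t[2] > 1200, loan)

# pi loan-number, amount (loan)
pairs = project(["loan-number", "amount"], loan_header, loan)

# Union, intersection and difference of union-compatible relations
# are ordinary set operations.
r = {("Jones",), ("Smith",)}
s = {("Smith",), ("Hayes",)}
print(r | s, r & s, r - s)
```

Because relations are sets, duplicate elimination after projection comes for free, matching the definition of π above.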
Chapter-2
RELATIONAL DATABASE MODEL
End Chapter quizzes:
Q1. Which of the following are characteristics of an RDBMS?
a) Data are organized in a series of two-dimensional tables each of which contains records for one
entity.
b) Queries are possible on individual or groups of tables.
c) It cannot use SQL.
d) Tables are linked by common data known as keys.
Unary operation
Ternary operation
binary operation
None of the above
a) True
b) False
Q10 Union operation in relational algebra is performed on
a) Single Relation
b) Two relation
c) Both a and b
d) None
Q11. As per Codd's rules, a NULL value is the same as
a) blank space
b) Zero
c) Character string
d) None of the above.
Q12 Relational Algebra is a non-procedural query language
a) True
b) False
Chapter: 3
FUNCTIONAL DEPENDENCY AND NORMALIZATION
1. Functional Dependency
Consider a relation R that has two attributes A and B. The attribute B of the relation is
functionally dependent on the attribute A if and only if for each value of A no more than one
value of B is associated. In other words, the value of attribute A uniquely determines the value of
B and if there were several tuples that had the same value of A then all these tuples will have an
identical value of attribute B. That is, if t1 and t2 are two tuples in the relation R and t1(A) =
t2(A) then we must have t1(B) = t2(B).
A and B need not be single attributes. They could be any subsets of the attributes of a relation R
(possibly single attributes). We may then write

R.A -> R.B

if B is functionally dependent on A (or A functionally determines B). Note that functional
dependency does not imply a one-to-one relationship between A and B, although a one-to-one
relationship may exist between A and B.
A simple example of the above functional dependency is when A is a primary key of an entity
(e.g. student number) and B is some single-valued property or attribute of the entity (e.g. date of
birth). A -> B must then always hold.
Functional dependencies also arise in relationships. Let C be the primary key of an entity and D
be the primary key of another entity. Let the two entities have a relationship. If the relationship is
one-to-one, we must have C -> D and D -> C. If the relationship is many-to-one, we would have
C -> D but not D -> C. For many-to-many relationships, no functional dependencies hold. For
example, if C is student number and D is subject number, there is no functional dependency
between them. If, however, we were storing marks and grades in the database as well, we would
have (C, D) -> (mark, grade).
Functional dependencies arise from the nature of the real world that the database models. Often
A and B are facts about an entity where A might be some identifier for the entity and B some
characteristic. Functional dependencies cannot be automatically determined by studying one or
more instances of a database. They can be determined only by a careful study of the real world
and a clear understanding of what each attribute means.
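As the paragraph above notes, FDs cannot be derived from data alone: an instance can refute an FD but never prove it holds in the real world. That refutation test is easy to mechanize; a small illustrative checker (the function name, dict-based rows and sample data are choices of this sketch):

```python
def violates_fd(rows, lhs, rhs):
    """Return True if some two rows agree on the lhs attributes
    but disagree on the rhs attributes (rows are dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return True          # the FD is refuted by this instance
        seen[key] = val
    return False

students = [
    {"sno": "S1", "dob": "1990-01-01"},
    {"sno": "S2", "dob": "1991-05-20"},
    {"sno": "S1", "dob": "1990-01-01"},   # same sno, same dob: consistent
]
print(violates_fd(students, ["sno"], ["dob"]))
```

A False result only means the instance does not contradict sno -> dob; whether the dependency really holds is still a statement about the real world being modeled.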
We have noted above that the definition of functional dependency does not require that A and B
be single attributes. In fact, A and B may be collections of attributes. For example
(sno, cno) -> (mark, date)
When dealing with a collection of attributes, the concept of full functional dependence is an
important one. Let A and B be distinct collections of attributes from a relation R, and let R.A ->
R.B. B is then fully functionally dependent on A if B is not functionally dependent on any subset
of A. The above example of students and subjects would show full functional dependence if mark
and date are not functionally dependent on either student number (sno) or subject number (cno)
alone. This implies that we are assuming that a student may take more than one subject and a
subject would be taken by many different students. Furthermore, it has been assumed that there is
at most one enrolment of each student in the same subject.
The above example illustrates full functional dependence. However the following dependence
(sno, cno) -> instructor is not full functional dependence because cno -> instructor holds.
As noted earlier, the concept of functional dependency is related to the concept of candidate key
of a relation, since a candidate key of a relation is an identifier which uniquely identifies a tuple
and therefore determines the values of all other attributes in the relation. Therefore, any subset X
of the attributes of a relation R that satisfies the property that all remaining attributes of the
relation are functionally dependent on it (that is, on X) is a candidate key, as long as no attribute
can be removed from X while still satisfying the property of functional dependence. In the
example above, the attributes (sno, cno) form a candidate key (and the only one) since they
functionally determine all the remaining attributes.
Functional dependence is an important concept and a large body of formal theory has been
developed about it. We discuss the concept of closure that helps us derive all functional
dependencies that are implied by a given set of dependencies. Once a complete set of functional
dependencies has been obtained, we will study how these may be used to build normalised
relations.
Rules about Functional Dependencies
Let F be a set of FDs specified on R.
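The closure mentioned above is usually computed with the standard attribute-closure algorithm: start from a set of attributes and repeatedly apply any FD whose left side is already covered. A minimal sketch (the representation of an FD as a pair of attribute sets is a choice of this illustration):

```python
def closure(attrs, fds):
    """Attribute closure: all attributes functionally determined by attrs.
    fds is a list of (lhs, rhs) pairs, each a set of attribute names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, the right side is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# The enrolment example from the text: (sno, cno) -> (mark, date)
fds = [({"sno", "cno"}, {"mark", "date"})]
print(closure({"sno", "cno"}, fds))
```

Since the closure of {sno, cno} contains every attribute of the relation, {sno, cno} is a candidate key, matching the discussion above.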
2. Normalization
When designing a database, a data model is usually translated into a relational schema. The
important question is whether there is a design methodology or whether the process is arbitrary.
A simple answer to this question is affirmative. There are certain properties that a good database
design must possess, as dictated by Codd's rules. There are many different ways of designing a
good database. One such methodology is the method involving Normalization. Normalization
theory is built around the concept of normal forms. Normalization reduces redundancy.
Redundancy is unnecessary repetition of data. It can cause problems with storage and retrieval of
data. During the process of normalization, dependencies can be identified, which can cause
problems during deletion and updating. Normalization theory is based on the fundamental notion
of Dependency. Normalization helps in simplifying the structure of schemas and tables.
To illustrate the normal forms, we will take an example of a database of the following logical
design:
Relation S {S#, SUPPLIERNAME, SUPPLYSTATUS, SUPPLYCITY}, Primary Key {S#}
Relation P {P#, PARTNAME, PARTCOLOR, PARTWEIGHT, SUPPLYCITY}, Primary Key {P#}
Relation SP {S#, SUPPLYCITY, P#, PARTQTY}, Primary Key {S#, P#}
SP
S#   SUPPLYCITY   P#   PARTQTY
S1   Bombay       P1   3000
S1   Bombay       P2   2000
S1   Bombay       P3   4000
S1   Bombay       P4   2000
S1   Bombay       P5   1000
S1   Bombay       P6   1000
S2   Mumbai       P1   3000
S2   Mumbai       P2   4000
S3   Mumbai       P2   2000
S4   Madras       P2   2000
S4   Madras       P4   3000
S4   Madras       P5   4000
Let us examine the table above to find any design discrepancy. A quick glance reveals that some
of the data are being repeated. That is data redundancy, which is of course undesirable. The
fact that a particular supplier is located in a city has been repeated many times. This redundancy
causes many other related problems. For instance, after an update a supplier may be displayed to
be from Madras in one entry while from Mumbai in another. This further gives rise to many
other problems.
Therefore, for the above reasons, the tables need to be refined. This process of refinement of a
given schema into another schema or a set of schemas possessing the qualities of a good database
is known as Normalization. Database experts have defined a series of normal forms, each
conforming to some specified design criteria.
Decomposition: Decomposition is the process of splitting a relation into two or more relations.
This is nothing but the projection process. Decompositions may or may not lose information. As
you will learn shortly, the normalization process involves breaking a given relation into one
or more relations, and these decompositions should be reversible as well, so that no
information is lost in the process. Thus, we will be more interested in the decompositions that
incur no loss of information than in the ones in which information is lost.
Lossless decomposition: The decomposition which results in relations without losing any
information is known as lossless decomposition or nonloss decomposition. The decomposition
that results in loss of information is known as lossy decomposition.
Consider the relation S and two of its decompositions, shown below.

S
S#   SUPPLYSTATUS   SUPPLYCITY
S3   100            Madras
S5   100            Mumbai

(1)
SX                     SY
S#   SUPPLYSTATUS      S#   SUPPLYCITY
S3   100               S3   Madras
S5   100               S5   Mumbai

(2)
SX                     SY
S#   SUPPLYSTATUS      SUPPLYSTATUS   SUPPLYCITY
S3   100               100            Madras
S5   100               100            Mumbai
Let us examine these decompositions. In decomposition (1) no information is lost. We can still
say that S3's status is 100 and its location is Madras, and also that supplier S5 has 100 as its status
and Mumbai as its location. This decomposition is therefore lossless.
In decomposition (2), however, while we can still say that the status of both S3 and S5 is 100, the
location of the suppliers cannot be determined from these two tables. The information regarding
the location of the suppliers has been lost in this case. This is a lossy decomposition. Certainly,
lossless decomposition is more desirable, because otherwise the decomposition is irreversible.
The decomposition process is in fact projection, where some attributes are selected from a table.
A natural question arises here: why is the first decomposition lossless while the second one is
lossy? How should a given relation be decomposed so that the resulting projections are nonlossy?
The answer to these questions lies in functional dependencies and may be given by the following
theorem.
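The difference between the two decompositions can also be checked mechanically: project S onto each pair of schemas, join the projections back, and compare with the original. A sketch in Python, with S modeled as a set of tuples (the helper names are choices of this illustration):

```python
# Relation S from the text, as a set of (S#, SUPPLYSTATUS, SUPPLYCITY) tuples.
S = {("S3", 100, "Madras"), ("S5", 100, "Mumbai")}

def join_on_first(r1, r2):
    """Natural join of two binary relations on their first attribute."""
    return {(a, b, c) for (a, b) in r1 for (a2, c) in r2 if a == a2}

# Decomposition (1): project on {S#, STATUS} and {S#, CITY}, join on S#.
sx1 = {(s, st) for (s, st, _) in S}
sy1 = {(s, ci) for (s, _, ci) in S}
back1 = join_on_first(sx1, sy1)

# Decomposition (2): project on {S#, STATUS} and {STATUS, CITY}, join on STATUS.
sx2 = {(s, st) for (s, st, _) in S}
sy2 = {(st, ci) for (_, st, ci) in S}
back2 = {(s, st, ci) for (s, st) in sx2 for (st2, ci) in sy2 if st == st2}

print(back1 == S)   # True: lossless, the join recovers S exactly
print(back2 == S)   # False: the join produces spurious tuples
```

Joining decomposition (2) back pairs every supplier with every city sharing status 100, so S5 appears to be in Madras as well: exactly the lost-information effect described above.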
Heath's theorem: Let R{A, B, C} be a relation, where A, B and C are sets of attributes. If R
satisfies the FD A -> B, then R is equal to the join of its projections on {A, B} and {A, C}.
Let us apply this theorem to the decompositions described above. We observe that relation S
satisfies two irreducible FDs:

S# -> SUPPLYSTATUS
S# -> SUPPLYCITY

By the theorem, S is equal to the join of its projections on {S#, SUPPLYSTATUS} and
{S#, SUPPLYCITY}, which is exactly decomposition (1); hence no information is lost.
Decomposition (2) does not correspond to either FD in this way, which is why the location
information is lost.
An alternative criterion for lossless decomposition is as follows. Let R be a relation schema, and
let F be a set of functional dependencies on R. Let R1 and R2 form a decomposition of R. This
decomposition is a lossless-join decomposition of R if at least one of the following functional
dependencies is in F+:

R1 ∩ R2 -> R1
R1 ∩ R2 -> R2
Note that a supplier's status is determined by the location of that supplier; e.g. all suppliers
from Madras must have the same status. The primary key of the relation REL1 is {S#, P#}.
Let us discuss some of the problems with this 1NF relation. For the purpose of
illustration, let us insert some sample tuples into this relation
REL1
S#   SUPPLYSTATUS   SUPPLYCITY   P#   PARTQTY
S1   200            Madras       P1   3000
S1   200            Madras       P2   2000
S1   200            Madras       P3   4000
S1   200            Madras       P4   2000
S1   200            Madras       P5   1000
S1   200            Madras       P6   1000
S2   100            Mumbai       P1   3000
S2   100            Mumbai       P2   4000
S3   100            Mumbai       P2   2000
S4   200            Madras       P2   2000
S4   200            Madras       P4   3000
S4   200            Madras       P5   4000
The redundancies in the above relation cause many problems, usually known as update
anomalies, that is, problems in INSERT, DELETE and UPDATE operations. Let us see these
problems due to the supplier-city redundancy corresponding to the FD S# -> SUPPLYCITY.
INSERT: In this relation, unless a supplier supplies at least one part, we cannot insert the
information regarding a supplier. Thus, a supplier located in Kolkata is missing from the relation
because he has not supplied any part so far.
DELETE: Let us see what problem we may face during deletion of a tuple. If we delete the
tuple of a supplier (if there is a single entry for that supplier), we not only delete the fact that the
supplier supplied a particular part but also the fact that the supplier is located in a particular city.
In our case, if we delete the entries corresponding to S#=S2, we lose the information that the
supplier is located at Mumbai. This is definitely undesirable. The problem here is that too much
information is attached to each tuple; therefore deletion forces us to lose too much information.
UPDATE: If we modify the city of a supplier S1 to Mumbai from Madras, we have to make sure
that all the entries corresponding to S#=S1 are updated otherwise inconsistency will be
introduced. As a result some entries will suggest that the supplier is located at Madras while
others will contradict this fact.
Since S# -> SUPPLYCITY and SUPPLYCITY -> SUPPLYSTATUS, the dependency of
SUPPLYSTATUS on S# is a transitive dependency. We will see that this transitive dependency
gives rise to another set of anomalies.
INSERT: We are unable to insert the fact that a particular city has a particular status until we
have some supplier actually located in that city.
DELETE: If we delete sole REL2 tuple for a particular city, we delete the information that that
city has that particular status.
UPDATE: The status for a given city still has redundancy. This causes usual redundancy
problem related to update.
RELATION 4
S#   SUPPLYCITY
S1   Madras
S2   Mumbai
S3   Mumbai
S4   Madras
S5   Kolkata

RELATION 5
SUPPLYCITY   SUPPLYSTATUS
Madras       200
Mumbai       100
Kolkata      300
Evidently, the above relations RELATION 4 and RELATION 5 are in 3NF, because there are no
transitive dependencies. Every 2NF relation can be reduced into 3NF by decomposing it further
and removing any transitive dependency.
2.4 Boyce-Codd Normal Form
The previous normal forms assumed that there was just one candidate key in the relation and that
key was also the primary key. Another class of problems arises when this is not the case. Very
often there will be more than one candidate key in practical database design situations. To be
precise, 1NF, 2NF and 3NF did not deal adequately with the case of relations that had two or
more candidate keys, where the candidate keys were composite and overlapped
(i.e. had at least one attribute in common).
A relation is in BCNF (Boyce-Codd Normal Form) if and only if every nontrivial,
left-irreducible FD has a candidate key as its determinant; or, equivalently:
A relation is in BCNF if and only if all the determinants are candidate keys.
It should be noted that the BCNF definition is conceptually simpler than the old 3NF definition,
in that it makes no explicit reference to first and second normal forms as such, nor to the concept
of transitive dependence. Furthermore, although BCNF is strictly stronger than 3NF, it is still the
case that any given relation can be nonloss-decomposed into an equivalent collection of BCNF
relations. Thus, relations REL1 and REL2, which were not in 3NF, are not in BCNF either; and
relations REL3, REL4, and REL5, which were in 3NF, are also in BCNF. Relation REL1
contains three determinants, namely {S#}, {SUPPLYCITY}, and {S#, P#}; of these, only {S#,
P#} is a candidate key, so REL1 is not in BCNF. Similarly, REL2 is not in BCNF either, because
the determinant {SUPPLYCITY} is not a candidate key. Relations REL 3, REL 4, and REL 5,
on the other hand, are each in BCNF, because in each case the sole candidate key is the only
determinant in the respective relations.
2.5 Comparison of BCNF and 3NF
We have seen two normal forms for relational-database schemas: 3NF and BCNF. There is an
advantage to 3NF in that we know that it is always possible to obtain a 3NF design without
sacrificing a lossless join or dependency preservation. Nevertheless, there is a disadvantage to
3NF. If we do not eliminate all transitive dependencies, we may have to use null values to
represent some of the possible meaningful relationships among data items, and there is the
problem of repetition of information.
If we are forced to choose between BCNF and dependency preservation with 3NF, it is generally
preferable to opt for 3NF. If we cannot test for dependency preservation efficiently, we either
pay a high penalty in system performance or risk the integrity of the data in our database. Neither
of these alternatives is attractive.
With such alternatives, the limited amount of redundancy imposed by transitive dependencies
allowed under 3NF is the lesser evil.
Thus, we normally choose to retain dependency preservation and to sacrifice BCNF.
2.6 Multi-valued dependency
Multi-valued dependency may be formally defined as:
Let R be a relation, and let A, B, and C be subsets of the attributes of R. Then we say that B is
multi-dependent on A, in symbols

A ->> B

(read "A multi-determines B," or simply "A double arrow B"), if and only if, in every possible
legal value of R, the set of B values matching a given (A value, C value) pair depends only on the
A value and is independent of the C value.
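This definition can be tested directly on a relation instance: for every pair of tuples agreeing on A, swapping their B parts must again yield a tuple of the relation. A sketch in Python, using a hypothetical course/teacher/book relation in which course ->> teacher holds (all names and data are invented for this illustration):

```python
from itertools import product

def satisfies_mvd(rows, A, B):
    """Check A ->> B on a list of dict rows; C is every other attribute."""
    attrs = set(rows[0])
    C = attrs - set(A) - set(B)
    tup = lambda r, names: tuple(r[a] for a in sorted(names))
    present = {(tup(r, A), tup(r, B), tup(r, C)) for r in rows}
    for r1, r2 in product(rows, rows):
        if tup(r1, A) == tup(r2, A):
            # r1's B-part combined with r2's C-part must be a real tuple.
            if (tup(r1, A), tup(r1, B), tup(r2, C)) not in present:
                return False
    return True

courses = [  # course ->> teacher: teachers and books vary independently
    {"course": "DB", "teacher": "Rao", "book": "Date"},
    {"course": "DB", "teacher": "Rao", "book": "Korth"},
    {"course": "DB", "teacher": "Sen", "book": "Date"},
    {"course": "DB", "teacher": "Sen", "book": "Korth"},
]
print(satisfies_mvd(courses, ["course"], ["teacher"]))
```

Dropping any one of the four rows breaks the independence of teacher and book, and the check returns False.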
It seems that the sole operation necessary or available in the further normalization process is the
replacement of a relation in a nonloss way by exactly two of its projections. This assumption has
successfully carried us as far as 4NF. It comes perhaps as a surprise, therefore, to discover that
there exist relations that cannot be nonloss-decomposed into two projections but can be
nonloss-decomposed into three (or more). Using an unpleasant but convenient term, we will
describe such a relation as "n-decomposable" (for some n > 2), meaning that the relation in
question can be nonloss-decomposed into n projections but not into m for any m < n.
A relation that can be nonloss-decomposed into two projections we will call "2-decomposable",
and the term n-decomposable may be defined similarly.
2.8 Join Dependency:
Let R be a relation, and let A, B, ..., Z be subsets of the attributes of R. Then we say that R
satisfies the Join Dependency (JD)

*{A, B, ..., Z} (read "star A, B, ..., Z")

if and only if every possible legal value of R is equal to the join of its projections on A, B, ..., Z.
Fifth normal form: A relation R is in 5NF, also called projection-join normal form (PJ/NF), if
and only if every nontrivial join dependency that holds for R is implied by the candidate keys of
R. Let us understand what it means for a JD to be "implied by candidate keys."
Relation REL12 is not in 5NF: it satisfies a certain join dependency, namely Constraint 3D, that
is certainly not implied by its sole candidate key (that key being the combination of all of its
attributes).
Now let us understand through an example, what it means for a JD to be implied by candidate
keys. Suppose that the familiar suppliers relation REL1 has two candidate keys, {S#} and
{SUPPLIERNAME}. Then that relation satisfies several join dependencies - for example, it
satisfies the JD
*{ {S#, SUPPLIERNAME, SUPPLYSTATUS}, {S#, SUPPLYCITY} }

That is, relation REL1 is equal to the join of its projections on {S#, SUPPLIERNAME,
SUPPLYSTATUS} and {S#, SUPPLYCITY}, and hence can be nonloss-decomposed into those
projections. (This fact does not mean that it should be so decomposed, of course, only that it
could be.) This JD is implied by the fact that {S#} is a candidate key (in fact it is implied by
Heath's theorem). Likewise, relation REL1 also satisfies the JD
To conclude, we note that it follows from the definition that 5NF is the ultimate normal form
with respect to projection and join (which accounts for its alternative name, projection-join
normal form). That is, a relation in 5NF is guaranteed to be free of anomalies that can be
eliminated by taking projections. For a relation in 5NF, the only join dependencies are those
that are implied by candidate keys, and so the only valid decompositions are ones that are based
on those candidate keys.
Chapter-3
FUNCTIONAL DEPENDENCY AND NORMALIZATION
End Chapter quizzes:
Q1 Normalization is a step by step process of decomposing:
(a) Table
(b) Database
(c) Group Data item
(d) All of the above
Q2 A relation is said to be in 2 NF if
(i) it is in 1 NF
(ii) non-key attributes dependent on key attribute
(iii) non-key attributes are independent of one another
(iv) if it has a composite key, no non-key attribute should be dependent on
part of the composite key.
(a) i and ii
(b) i, iv
(c) i and iv
(d) ii and iv
Chapter: 4
STRUCTURED QUERY LANGUAGE
1. INTRODUCTORY CONCEPTS
1.1 What is SQL?
SQL stands for Structured Query Language
SQL is an ANSI (American National Standards Institute) standard computer language for
accessing and manipulating database systems. SQL statements are used to retrieve and update
data in a database. SQL works with database programs like MS Access, DB2, Informix, MS
SQL Server, Oracle, Sybase, etc.
1.2 SQL Database Tables:
A database most often contains one or more tables. Each table is identified by a name (e.g.
"Customers" or "Orders"). Tables contain records (rows) with data.
Below is an example of a table called "Persons":
LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Svendson    Tove        Borgvn 23      Sandnes
Pettersen   Kari        Storgt 20      Stavanger
The table above contains three records (one for each person) and four columns (LastName,
FirstName, Address, and City).
2. DATABASE LANGUAGE
2.1 SQL Data Definition Language (DDL)
The Data Definition Language (DDL) part of SQL permits database tables to be created or
deleted. We can also define indexes (keys), specify links between tables, and impose constraints
between database tables.
Create a Table
To create a table in a database:
CREATE TABLE table_name
(
column_name1 data_type,
column_name2 data_type,
.......
)
Example
This example demonstrates how you can create a table named "Person", with four columns. The
column names will be "LastName", "FirstName", "Address", and "Age":

CREATE TABLE Person
(
LastName varchar,
FirstName varchar,
Address varchar,
Age int
)
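Such a CREATE TABLE statement can be tried from Python's built-in sqlite3 module; a sketch (the column types follow SQLite's conventions and may differ in other systems, and the sample row is invented for this illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Create the Person table described above, using SQLite type names.
con.execute("""CREATE TABLE Person (
    LastName  TEXT,
    FirstName TEXT,
    Address   TEXT,
    Age       INTEGER)""")

# Insert one sample row and read it back.
con.execute("INSERT INTO Person VALUES ('Hansen', 'Ola', 'Timoteivn 10', 34)")
for row in con.execute("SELECT LastName, Age FROM Person"):
    print(row)
```

The same DDL statement, with vendor-specific type names, works in MS Access, Oracle, DB2 and the other systems listed earlier.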
ALTER TABLE
The ALTER TABLE statement is used to add, drop and modify columns in an existing table.
You can also specify the columns for which you want to insert data:
Operator   Description
=          Equal
<>         Not equal
>          Greater than
<          Less than
>=         Greater than or equal
<=         Less than or equal
BETWEEN    Between an inclusive range
LIKE       Search for a pattern
"Persons" table
LastName
FirstName
Address
City
Year
Hansen
Ola
Timoteivn 10
Sandnes
1951
Svendson
Tove
Borgvn 23
Sandnes
1978
Svendson
Stale
Kaivn 18
Sandnes
1980
Pettersen
Kari
Storgt 20
Stavanger
1960
Result
LastName
FirstName
Address
City
Year
Hansen
Ola
Timoteivn 10
Sandnes
1951
Svendson
Tove
Borgvn 23
Sandnes
1978
Svendson
Stale
Kaivn 18
Sandnes
1980
A "%" sign can be used to define wildcards (missing letters in the pattern) both before and after
the pattern.
Using LIKE
The following SQL statement will return persons with first names that start with an 'O':
SELECT *
FROM Persons
WHERE FirstName LIKE 'O%'
"Orders" table:
Company     OrderNumber
Sega        3412
ABC Shop    5678
W3Schools   2312
W3Schools   6798
Example
To display the companies in alphabetical order:
SELECT Company, OrderNumber FROM Orders
ORDER BY Company
Result:
Company     OrderNumber
ABC Shop    5678
Sega        3412
W3Schools   6798
W3Schools   2312
Example
To display the companies in alphabetical order AND the order numbers in numerical order:

SELECT Company, OrderNumber FROM Orders
ORDER BY Company, OrderNumber

Result:
Company     OrderNumber
ABC Shop    5678
Sega        3412
W3Schools   2312
W3Schools   6798
GROUP BY...
Aggregate functions (like SUM) often need an added GROUP BY functionality.
GROUP BY... was added to SQL because aggregate functions (like SUM) return the aggregate
of all column values every time they are called, and without the GROUP BY function it was
impossible to find the sum for each individual group of column values.
The syntax for the GROUP BY function is:

SELECT column, SUM(column) FROM table
GROUP BY column
GROUP BY Example
This "Sales" Table:
Company
Amount
W3Schools
5500
IBM
4500
W3Schools
7100
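The effect of GROUP BY on this Sales table can be reproduced with sqlite3 (table and column names follow the example above; using sqlite3 here is a choice of this sketch):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales (Company TEXT, Amount INTEGER)")
con.executemany("INSERT INTO Sales VALUES (?, ?)",
                [("W3Schools", 5500), ("IBM", 4500), ("W3Schools", 7100)])

# Without GROUP BY, SUM collapses the whole table into a single total;
# with GROUP BY, we get one sum per company.
rows = con.execute("""SELECT Company, SUM(Amount)
                      FROM Sales
                      GROUP BY Company
                      ORDER BY Company""").fetchall()
print(rows)   # [('IBM', 4500), ('W3Schools', 12600)]
```

The two W3Schools rows are combined into a single group whose amounts are summed, which is exactly what the paragraph above describes.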
3. What is a View?
In SQL, a VIEW is a virtual table based on the result-set of a SELECT statement.
A view contains rows and columns, just like a real table. The fields in a view are fields from one
or more real tables in the database. You can add SQL functions, WHERE, and JOIN statements
to a view and present the data as if the data were coming from a single table.
Syntax
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition
Views are of two types: updateable views and non-updateable views. Through an updateable
view the values of the base table can be modified, whereas through a non-updateable view the
base table cannot be updated.
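A view can be tried out in the same way; a sketch using sqlite3 (the view name BigSales, its predicate, and the sample data are invented for this illustration; note that SQLite views are always non-updateable):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales (Company TEXT, Amount INTEGER)")
con.executemany("INSERT INTO Sales VALUES (?, ?)",
                [("W3Schools", 5500), ("IBM", 4500), ("W3Schools", 7100)])

# A view is a stored SELECT; querying it looks just like querying a table.
con.execute("""CREATE VIEW BigSales AS
               SELECT Company, Amount FROM Sales WHERE Amount > 5000""")
print(con.execute("SELECT * FROM BigSales ORDER BY Amount").fetchall())
```

The view holds no data of its own: rows inserted into Sales later would appear in BigSales automatically whenever they satisfy the WHERE condition.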
In addition to renaming tables and indexes, Oracle9i Release 2 allows the renaming of columns
and constraints on tables. In this example, once the TEST1 table is created it is renamed along
with its columns, primary key constraint and the index that supports the primary key:
SQL> CREATE TABLE test1 (
  2    col1 NUMBER(10)   NOT NULL,
  3    col2 VARCHAR2(50) NOT NULL);

Table created.

SQL> ALTER TABLE test1 ADD (
  2    CONSTRAINT test1_pk PRIMARY KEY (col1));

Table altered.

SQL> DESC test1
 Name      Null?    Type
 --------- -------- -------------
 COL1      NOT NULL NUMBER(10)
 COL2      NOT NULL VARCHAR2(50)

SQL> SELECT constraint_name
  2  FROM   user_constraints
  3  WHERE  table_name = 'TEST1'
  4  AND    constraint_type = 'P';

CONSTRAINT_NAME
------------------------------
TEST1_PK

1 row selected.
SQL> SELECT index_name, column_name
2 FROM user_ind_columns
3 WHERE table_name = 'TEST1';
INDEX_NAME           COLUMN_NAME
-------------------- --------------------
TEST1_PK             COL1

1 row selected.
SQL> -- Rename the table, columns, primary key
SQL> -- and supporting index.
SQL> ALTER TABLE test1 RENAME TO test;
Table altered.
SQL> ALTER TABLE test RENAME COLUMN col1 TO id;
Table altered.
SQL> ALTER TABLE test RENAME COLUMN col2 TO description;
Table altered.
SQL> ALTER TABLE test RENAME CONSTRAINT test1_pk TO test_pk;
Table altered.
SQL> ALTER INDEX test1_pk RENAME TO test_pk;
Index altered.
SQL> DESC test
 Name                 Null?    Type
 -------------------- -------- ---------------
 ID                   NOT NULL NUMBER(10)
 DESCRIPTION                   VARCHAR2(50)

SQL> SELECT constraint_name
  2  FROM   user_constraints
  3  WHERE  table_name = 'TEST'
  4  AND    constraint_type = 'P';

CONSTRAINT_NAME
------------------------------
TEST_PK

1 row selected.

SQL> SELECT index_name, column_name
  2  FROM   user_ind_columns
  3  WHERE  table_name = 'TEST';

INDEX_NAME           COLUMN_NAME
-------------------- --------------------
TEST_PK              ID

1 row selected.
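The table and column renames in the session above have close analogues outside Oracle. A minimal sketch with sqlite3 (SQLite 3.25+ is assumed for RENAME COLUMN; constraint and index renames are Oracle-specific and are omitted here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test1 (col1 INTEGER PRIMARY KEY, col2 TEXT)")

# Rename the table, then its columns, mirroring the Oracle session above.
conn.execute("ALTER TABLE test1 RENAME TO test")
conn.execute("ALTER TABLE test RENAME COLUMN col1 TO id")
conn.execute("ALTER TABLE test RENAME COLUMN col2 TO description")

# Inspect the renamed table's columns (PRAGMA table_info lists one row per column).
cols = [row[1] for row in conn.execute("PRAGMA table_info(test)")]
print(cols)  # ['id', 'description']
```

Note that existing data and constraints survive the renames; only the identifiers change.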
Chapter: 5
PROCEDURAL QUERY LANGUAGE
1. Introduction to PL/SQL
PL/SQL is a procedural extension of Oracle's Structured Query Language. PL/SQL is not a separate language but a technology: there is no separate place or prompt for executing PL/SQL programs. The PL/SQL technology is like an engine that executes PL/SQL blocks and subprograms. This engine can run in the Oracle server or in application development tools such as Oracle Forms, Oracle Reports etc.
The PL/SQL engine executes the procedural statements and sends the SQL part of the statements to the SQL statement processor in the Oracle server. PL/SQL thus combines the data manipulation power of SQL with the data processing power of procedural languages.
A PL/SQL block has three sections: a declarative section (DECLARE), an executable section (BEGIN ... END) and an optional exception-handling section (EXCEPTION). If an error is not handled, the block terminates abruptly with errors. Every statement in the above three sections must end with a semicolon (;). PL/SQL blocks can be nested within other PL/SQL blocks. Comments can be used to document code.
PL/SQL requires that a variable be declared before it is used: you must first declare the variable and then use it. Variables can have any SQL data type, such as CHAR, DATE or NUMBER, or any PL/SQL data type like BOOLEAN or BINARY_INTEGER.
Declaring Variables: Variables are declared in the DECLARE section of a PL/SQL block.
DECLARE
   SNO NUMBER (3);
   SNAME VARCHAR2 (15);
BEGIN
Assigning values to variables:
   SNO := 1001;
or
   SNAME := 'JOHN';
A variable can also be given an initial value in its declaration, e.g. SNO NUMBER := 1001;
The following screen shot explains how to write a simple PL/SQL program and execute it.
SET SERVEROUTPUT ON is a command used to display results from the Oracle server.
A PL/SQL program is terminated by a /. DBMS_OUTPUT is a package and PUT_LINE is a procedure in it.
You will learn more about procedures, functions and packages in the following sections of this tutorial.
The above program can also be written as a text file in the Notepad editor and then executed as explained in the following screen shot.
4. Control Statements
This section explains how to structure the flow of control through a PL/SQL program. The control structures of PL/SQL are simple yet powerful. They can be divided into three categories:
Conditional,
Iterative and
Sequential.
4.1 Conditional Control (Selection): This structure tests a condition and, depending on whether the condition is true or false, decides the sequence of statements to be executed.
Syntax for IF-THEN:
IF condition THEN
   Statements;
END IF;
Example:
5. CURSOR
For every SQL statement execution certain area in memory is allocated. PL/SQL allows you to
name this area. This private SQL area is called context area or cursor. A cursor acts as a handle
or pointer into the context area. A PL/SQL program controls the context area using the cursor.
A cursor represents a structure in memory and is different from a cursor variable. When you declare a cursor, you get a pointer variable, which does not point to anything. When the cursor is opened, memory is allocated and the cursor structure is created. The cursor variable now points to the cursor. When the cursor is closed, the memory allocated for the cursor is released.
Cursors allow the programmer to retrieve data from a table and perform actions on that data one row at a time. There are two types of cursors: implicit cursors and explicit cursors.
5.1 Implicit cursors
For SQL queries returning single row PL/SQL declares implicit cursors. Implicit cursors are
simple SELECT statements and are written in the BEGIN block (executable section) of the
PL/SQL. Implicit cursors are easy to code, and they retrieve exactly one row. PL/SQL implicitly
declares cursors for all DML statements.
Syntax:
SELECT ename, sal INTO ena, esa
FROM EMP
WHERE condition;
Note: ename and sal are columns of the table EMP, and ena and esa are the variables used to store the ename and sal fetched by the query.
5.2 Explicit cursors
Explicit cursors are declared in the DECLARE section, typically for queries that return more than one row:
DECLARE
   CURSOR emp_cur IS
      SELECT ename FROM EMP;
BEGIN
   ------
END;
Processing multiple rows is similar to file processing. To process a file you need to open it, process its records and then close it. Similarly, a user-defined explicit cursor needs to be opened before reading the rows, after which it is closed. Just as a file pointer marks the current position in file processing, the cursor marks the current position in the active set.
The FETCH statement retrieves one row at a time. The BULK COLLECT clause needs to be used to fetch more than one row at a time.
Closing the cursor: After retrieving all the rows from the active set, the cursor should be closed. The resources allocated for the cursor are then freed. Once the cursor is closed, executing a FETCH statement leads to errors.
CLOSE <cursor-name>;
1. %NOTFOUND: A Boolean attribute, which evaluates to true if the last fetch failed.
2. %FOUND: A Boolean attribute, which evaluates to true if the last fetch succeeded.
3. %ROWCOUNT: A numeric attribute, which returns the number of rows fetched by the cursor so far.
4. %ISOPEN: A Boolean attribute, which evaluates to true if the cursor is open, otherwise false.
In the above example a separate fetch was written for each row; instead, a loop statement could be used. The following example explains the usage of LOOP.
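The open-fetch-close pattern described above can be sketched outside PL/SQL as well. A rough Python sqlite3 analogue (the emp table and its rows are hypothetical), where fetchone() returning None plays the role of %NOTFOUND becoming true:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT)")
conn.executemany("INSERT INTO emp VALUES (?)",
                 [("SMITH",), ("JONES",), ("KING",)])

cur = conn.cursor()
cur.execute("SELECT ename FROM emp")   # "open" the cursor

names = []
while True:
    row = cur.fetchone()               # FETCH one row at a time
    if row is None:                    # analogous to %NOTFOUND = true
        break
    names.append(row[0])

cur.close()                            # CLOSE frees the cursor's resources
print(names)  # ['SMITH', 'JONES', 'KING']
```

Fetching after cur.close() raises an error, matching the behaviour of a closed PL/SQL cursor.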
6. Exceptions
An exception is an error situation which arises during program execution. When an error occurs, an exception is raised; normal execution is stopped and control transfers to the exception-handling part. Exception handlers are routines written to handle the exception. Exceptions can be internally defined (system-defined or pre-defined) or user-defined.
6.1 Predefined exception is raised automatically whenever there is a violation of Oracle coding
rules. Predefined exceptions are those like ZERO_DIVIDE, which is raised automatically when
we try to divide a number by zero. Other built-in exceptions are given below. You can handle
unexpected Oracle errors using OTHERS handler. It can handle all raised exceptions that are not
handled by any other handler. It must always be written as the last handler in exception block.
DUP_VAL_ON_INDEX: Raised when you try to insert a duplicate value into a unique column.
TOO_MANY_ROWS: Raised when a SELECT ... INTO query returns more than one row.
Predefined exception handlers are declared globally in the package STANDARD. Hence we need not define them; we just use them.
The biggest advantage of exception handling is that it improves the readability and reliability of the code. Errors from many statements of code can be handled with a single handler. Instead of checking for an error at every point, we can just add an exception handler, and if any exception is raised it is handled there.
For checking errors at a specific spot it is always better to have those statements in a separate
begin end block.
The DUP_VAL_ON_INDEX is raised when a SQL statement tries to create a duplicate value in
a column on which a primary key or unique constraints are defined.
Example: To demonstrate the exception DUP_VAL_ON_INDEX.
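The same situation can be sketched in Python with sqlite3, where a duplicate key raises sqlite3.IntegrityError, the counterpart of PL/SQL's DUP_VAL_ON_INDEX (table and values are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO emp VALUES (100)")

# Inserting a duplicate primary-key value raises an exception that we trap
# in a handler, just as a PL/SQL block would trap DUP_VAL_ON_INDEX.
try:
    conn.execute("INSERT INTO emp VALUES (100)")
    outcome = "inserted"
except sqlite3.IntegrityError:
    outcome = "duplicate value on index"
print(outcome)  # duplicate value on index
```

Without the handler the error would propagate and abort the program, which mirrors a PL/SQL block terminating abruptly when no matching handler exists.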
More than one Exception can be written in a single handler as shown below.
EXCEPTION
   WHEN NO_DATA_FOUND OR TOO_MANY_ROWS THEN
      Statements;
END;
Raising the exception:
BEGIN
   -------
   RAISE myexception;
   -------
END;
Handling the exception:
BEGIN
   ---------
EXCEPTION
   WHEN myexception THEN
      Statements;
END;
Points To Ponder:
Exceptions declared in a block are considered local to that block and global to its sub-blocks.
RAISE_APPLICATION_ERROR
To display your own error messages, you can use the built-in procedure RAISE_APPLICATION_ERROR. It displays the error message in the same way as Oracle errors. You should use an error number between -20000 and -20999, and the error message should not exceed 512 characters. The syntax to call RAISE_APPLICATION_ERROR is:
RAISE_APPLICATION_ERROR (error_number, error_message [, {TRUE | FALSE}]);
Example:
emp_rec is an automatically created variable of %ROWTYPE. We have not used OPEN, FETCH and CLOSE in the above example, as the cursor FOR loop does this automatically. The above example can be rewritten with fewer lines of code, as shown in the figure. This is called an implicit FOR loop.
7. PL/SQL subprograms
A subprogram is a named block of PL/SQL. There are two types of subprograms in PL/SQL
namely Procedures and Functions. Every subprogram will have a declarative part, an executable
part or body, and an exception handling part, which is optional.
When a client executes a procedure or function, the processing is done on the server. This reduces network traffic. The subprograms are compiled and stored in the Oracle database as stored programs and can be invoked whenever required. As they are stored in compiled form, they only need to be executed when called. Hence they save the time needed for compilation.
Subprograms provide the following advantages
1. They allow you to write PL/SQL programs that meet our needs.
2. They allow you to break the program into manageable modules.
3. They provide reusability and maintainability for the code.
7.1 Procedures
A procedure is a subprogram used to perform a specific action. A procedure contains two parts: the specification and the body. The procedure specification begins with CREATE and ends with the procedure name or parameter list. Procedures that do not take parameters are written without parentheses. The body of the procedure starts after the keyword IS or AS and ends with the keyword END.
In the above syntax, items enclosed between angular brackets (< >) are user-defined and those enclosed in square brackets ([ ]) are optional.
OR REPLACE is used to overwrite an existing procedure with the same name, if there is one.
The AUTHID clause decides whether the procedure executes with invoker rights (the current user, who executes it) or with definer rights (the owner, who created it).
Example
CREATE PROCEDURE MyProc (ENO NUMBER)
AUTHID DEFINER AS
BEGIN
   DELETE FROM EMP
   WHERE EMPNO = ENO;
EXCEPTION
   WHEN NO_DATA_FOUND THEN
      DBMS_OUTPUT.PUT_LINE ('No employee with this number');
END;
Let us assume that the above procedure is created in the SCOTT schema (SCOTT's user area) and is executed by the user SEENU. It will delete rows from the table EMP owned by SCOTT, but not from the EMP owned by SEENU. It is possible to use a procedure owned by one user on tables owned by other users; this is done by setting invoker rights:
AUTHID CURRENT_USER
Parameters are used to pass values to the procedure being called. There are three parameter modes, based on their usage: IN, OUT and IN OUT. An IN parameter is used to pass values to the called procedure; inside the procedure the IN parameter acts like a constant, i.e. it cannot be modified. An OUT parameter allows you to return a value from the procedure; inside the procedure the OUT parameter acts like an uninitialized variable, so its value cannot be assigned to another variable.
IN OUT mode parameter allows you to both pass to and return values from the subprogram.
Default mode of an argument is IN.
In positional notation, actual parameters are passed in the same order as the formal parameters are declared; here PROC1 is the name of the procedure.
Functions:
A function is a PL/SQL subprogram which is used to compute a value. A function is the same as a procedure except that it has a RETURN clause.
Syntax for Function
Examples
Function without arguments
Chapter-5
PROCEDURAL QUERY LANGUAGE
End Chapter quizzes
Q1 Select the correct statement
Chapter: 6
TRANSACTION MANAGEMENT & CONCURRENCY CONTROL TECHNIQUES
If no errors occurred during the execution of the transaction then the system commits the
transaction. A transaction commit operation applies all data manipulations within the scope of
the transaction and persists the results to the database. If an error occurs during the transaction,
or if the user specifies a rollback operation, the data manipulations within the transaction are not
persisted to the database. In no case can a partial transaction be committed to the database since
that would leave the database in an inconsistent state.
Internally, multi-user databases store and process transactions, often by using a transaction ID or
XID.
2. ACID properties
When a transaction processing system creates a transaction, it will ensure that the transaction
will have certain characteristics. The developers of the components that comprise the transaction
are assured that these characteristics are in place. They do not need to manage these
characteristics themselves. These characteristics are known as the ACID properties. ACID is an
acronym for atomicity, consistency, isolation, and durability.
2.1 Atomicity
The atomicity property identifies that the transaction is atomic. An atomic transaction is either fully completed or not begun at all. Any updates that a transaction makes to a system are completed in their entirety. If for any reason an error occurs and the transaction is unable to complete all of its steps, then the system is returned to the state it was in before the transaction was started. An example of an atomic transaction is an account transfer transaction. The money is removed from account A and then placed into account B. If the system fails after removing the money from account A, then the transaction processing system will put the money back into account A, thus returning the system to its original state. This is known as a rollback, as we said at the beginning of this chapter.
2.2 Consistency
A transaction enforces consistency in the system state by ensuring that at the end of any
transaction the system is in a valid state. If the transaction completes successfully, then all
changes to the system will have been properly made, and the system will be in a valid state. If
any error occurs in a transaction, then any changes already made will be automatically rolled
back. This will return the system to its state before the transaction was started. Since the system
was in a consistent state when the transaction was started, it will once again be in a consistent
state.
Looking again at the account transfer system, the system is consistent if the total of all accounts
is constant. If an error occurs and the money is removed from account A and not added to
account B, then the total in all accounts would have changed. The system would no longer be
consistent. By rolling back the removal from account A, the total will again be what it should be,
and the system back in a consistent state.
2.3 Isolation
When a transaction runs in isolation, it appears to be the only action that the system is carrying
out at one time. If there are two transactions that are both performing the same function and are
running at the same time, transaction isolation will ensure that each transaction thinks it has
exclusive use of the system. This is important in that as the transaction is being executed, the
state of the system may not be consistent. The transaction ensures that the system remains
consistent after the transaction ends, but during an individual transaction, this may not be the
case. If a transaction was not running in isolation, it could access data from the system that may
not be consistent. By providing transaction isolation, this is prevented from happening.
2.4 Durability
A transaction is durable in that once it has been successfully completed, all of the changes it
made to the system are permanent. There are safeguards that will prevent the loss of information,
even in the case of system failure. By logging the steps that the transaction performs, the state of
the system can be recreated even if the hardware itself has failed. The concept of durability
allows the developer to know that a completed transaction is a permanent part of the system,
regardless of what happens to the system later on.
An example schedule (operations listed in the order they occur):
T1: read_tr(Y)
T2: write_tr(Y)
T2: read_tr(Y)
T1: write_tr(X)
T1: abort
3.2 Conflicting operations: Two operations in a schedule are said to be in conflict if they satisfy these conditions:
i) They belong to two different transactions.
ii) They access the same data item X.
iii) At least one of them is a write_tr(X) operation.
A schedule S of transactions T1, T2 ... Tn is called a complete schedule if it satisfies these conditions:
i) The operations listed in S are exactly the same operations as in T1, T2 ... Tn, including the commit or abort operations. Each transaction is terminated by either a commit or an abort operation.
ii) The operations in any transaction Ti appear in the schedule in the same order in which they appear in the transaction.
iii) Whenever there are conflicting operations, one of the two will occur before the other in the schedule.
A partial order of the schedule is said to occur if the first two conditions of the complete schedule are satisfied, but whenever there are non-conflicting operations in the schedule, they can occur without indicating which should appear first.
This can happen because non-conflicting operations can anyway be executed in any order without affecting the actual outcome.
However, in a practical situation, it is very difficult to come across complete schedules. This is
because new transactions keep getting included in the schedule. Hence, one often works with a committed projection C(S) of a schedule S. This set includes only those operations in S that belong to committed transactions, i.e. transactions Ti whose commit operation Ci is in S.
Put in simpler terms, since non committed operations do not get reflected in the actual outcome
of the schedule, only those transactions, who have completed their commit operations, contribute
to the set and this schedule is good enough in most cases.
3.4 Schedules and Recoverability :
Recoverability is the ability to recover from transaction failures. The success or otherwise of recovery depends on the schedule of transactions. If fairly straightforward operations without much interleaving of transactions are involved, error recovery is a straightforward process. On the other hand, if a lot of interleaving of different transactions has taken place, then recovering from the failure of any one of these transactions could be an involved affair. In certain cases, it may not be possible to recover at all. Thus, it is desirable to characterize schedules based on their recovery capabilities.
To do this, we observe certain features of recoverability and of schedules. To begin with, we note that any recovery process most often involves a rollback operation, wherein the operations of the failed transaction have to be undone. However, we also note that the rollback needs to go back only as long as the transaction T has not committed. Once the transaction T has committed, it need not be rolled back. The schedules that satisfy this criterion are called recoverable schedules, and those that do not are called non-recoverable schedules. As a rule, such non-recoverable schedules should not be permitted.
Formally, a schedule S is recoverable if no transaction T which appears in S commits until all transactions T1 that have written an item which is read by T have committed. The concept is a simple one. Suppose the transaction T reads an item X from the database, completes its operations (based on this and other values) and commits; i.e. the output values of T become permanent values of the database.
But suppose this value X was written by another transaction T1 (before it was read by T), and T1 aborts after T has committed. What happens? The values committed by T are no longer valid, because the basis of those values (namely X) itself has changed. Obviously T also needs to be rolled back (if possible), leading to other rollbacks and so on.
The other aspect to note is that in a recoverable schedule, no committed transaction needs to be
rolled back. But, it is possible that a cascading roll back scheme may have to be effected, in
which an uncommitted transaction has to be rolled back, because it read from a value contributed
by a transaction which later aborted. But such cascading rollbacks can be very time consuming
because at any instant of time, a large number of uncommitted transactions may be operating.
Thus, it is desirable to have cascadeless schedules, which avoid cascading rollbacks.
This can be ensured by ensuring that transactions read only those values which are written by
committed transactions i.e. there is no fear of any aborted or failed transactions later on. If the
schedule has a sequence wherein a transaction T1 has to read a value X by an uncommitted
transaction T2, then the sequence is altered, so that the reading is postponed, till T2 either
commits or aborts.
It may be noted that recoverable schedules, cascadeless schedules and strict schedules are each more stringent than the one before. Greater stringency facilitates the recovery process, but sometimes the process may get delayed, or scheduling may even become impossible.
4 Serializability
Given two transactions T1 and T2 to be scheduled, they can be scheduled in a number of ways. The simplest way is to schedule them without bothering about interleaving them at all, i.e. schedule all operations of transaction T1 followed by all operations of T2, or alternatively schedule all operations of T2 followed by all operations of T1.
Serial schedule A (T1 followed by T2):
T1: read_tr(X)
T1: X = X + N
T1: write_tr(X)
T1: read_tr(Y)
T1: Y = Y + N
T1: write_tr(Y)
T2: read_tr(X)
T2: X = X + P
T2: write_tr(X)

Serial schedule B (T2 followed by T1):
T2: read_tr(X)
T2: X = X + P
T2: write_tr(X)
T1: read_tr(X)
T1: X = X + N
T1: write_tr(X)
T1: read_tr(Y)
T1: Y = Y + N
T1: write_tr(Y)
These can now be termed serial schedules, since the entire sequence of operations of one transaction is completed before the next transaction is started.
In the interleaved mode, the operations of T1 are mixed with the operations of T2. This can be
done in a number of ways. Two such sequences are given below:
Interleaved (non-serial) schedule C:
T1: read_tr(X)
T1: X = X + N
T2: read_tr(X)
T2: X = X + P
T1: write_tr(X)
T1: read_tr(Y)
T2: write_tr(X)
T1: Y = Y + N
T1: write_tr(Y)

Interleaved (non-serial) schedule D:
T1: read_tr(X)
T1: X = X + N
T1: write_tr(X)
T2: read_tr(X)
T2: X = X + P
T2: write_tr(X)
T1: read_tr(Y)
T1: Y = Y + N
T1: write_tr(Y)
Formally, a schedule S is serial if, for every transaction T in the schedule, all operations of T are executed consecutively; otherwise it is called non-serial. In such a non-interleaved schedule, if the transactions are independent, one can also presume that the schedule will be correct, since each transaction commits or aborts before the next transaction begins. As long as the transactions individually are error-free, such sequences of events are guaranteed to give correct results.
The problem with such a situation is the wastage of resources. If in a serial schedule one of the transactions is waiting for an I/O, the other transactions cannot use the system resources either, and hence the entire arrangement is wasteful of resources. If some transaction T is very long, the other transactions have to keep waiting till it is completed. Moreover, an environment wherein hundreds of users operate concurrently becomes unthinkable under serial scheduling. Hence, in general, the serial scheduling concept is unacceptable in practice.
However, once the operations are interleaved so that the above-cited problems are overcome, then unless the interleaving sequence is well thought out, all the problems that we encountered at the beginning of this block can reappear. Hence, a methodology is to be adopted to find out which of the interleaved schedules give correct results and which do not.
A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions. Note that there are n! different serial schedules possible for n transactions, and if one goes about interleaving them, the number of possible combinations becomes unmanageably high. To ease our operations, we form two disjoint groups of non-serial schedules: those non-serial schedules that are equivalent to one or more serial schedules, which we call serializable schedules, and those that are not equivalent to any serial schedule and hence are not serializable. Once a non-serial schedule is shown to be serializable, it becomes equivalent to a serial schedule and, by our previous definition of a serial schedule, will be a correct schedule. But how can one prove the equivalence of a non-serial schedule to a serial schedule?
The simplest and most obvious method to conclude that two such schedules are equivalent is to compare their results. If they produce the same results, then they can be considered equivalent; i.e. if two schedules are result equivalent, then they can be considered equivalent. But such an oversimplification is full of problems. Two sequences may produce the same results for one or even a large number of initial values, but still may not be equivalent.
Consider the following two sequences:
S1:              S2:
read_tr(X)       read_tr(X)
X = X + X        X = X * X
write_tr(X)      write_tr(X)
For a value X = 2, both produce the same result. Can we conclude that they are equivalent? Though this may look like a simplistic example, with some imagination one can always come up with more sophisticated examples wherein the bugs of treating them as equivalent are less obvious. But the concept still holds: result equivalence cannot mean schedule equivalence. A more refined method of finding equivalence is available. It is called conflict equivalence.
Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules. (Note that conflicting operations essentially belong to two different transactions, access the same data item, and at least one of them is a write_tr(X) operation.) If two such conflicting operations appear in different orders in different schedules, then it is obvious that they can produce two different databases in the end, and hence they are not equivalent.
4.1 Testing for conflict serializability of a schedule:
We suggest an algorithm that tests a schedule for conflict serializability.
1. For each transaction Ti participating in the schedule S, create a node labeled Ti in the precedence graph.
2. For each case where Tj executes a read_tr(X) after Ti executes a write_tr(X), create an edge from Ti to Tj in the precedence graph.
3. For each case where Tj executes a write_tr(X) after Ti executes a read_tr(X), create an edge from Ti to Tj in the graph.
4. For each case where Tj executes a write_tr(X) after Ti executes a write_tr(X), create an edge from Ti to Tj in the graph.
5. The schedule S is serializable if and only if there are no cycles in the graph.
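The five steps above can be sketched in a few lines of Python. This is a rough illustration; the schedule encoding as (transaction, operation, item) triples and all names are my own, and the two sample schedules follow the interleaved schedules C and D discussed earlier:

```python
# Conflict-serializability test via a precedence graph.
# A schedule is a list of (transaction, op, item) triples in execution order.

def precedence_edges(schedule):
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            # Conflicting operations: different transactions, same item,
            # at least one write -> edge from the earlier transaction.
            if ti != tj and x == y and "write" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges, nodes):
    # Depth-first search for a cycle: 0=unvisited, 1=in progress, 2=done.
    colour = {n: 0 for n in nodes}
    def visit(n):
        colour[n] = 1
        for a, b in edges:
            if a == n and (colour[b] == 1 or (colour[b] == 0 and visit(b))):
                return True
        colour[n] = 2
        return False
    return any(colour[n] == 0 and visit(n) for n in nodes)

# Schedule C: both reads of X precede the other transaction's write of X.
sched_C = [("T1", "read", "X"), ("T2", "read", "X"),
           ("T1", "write", "X"), ("T2", "write", "X"),
           ("T1", "read", "Y"), ("T1", "write", "Y")]
# Schedule D: T1 finishes with X before T2 touches it.
sched_D = [("T1", "read", "X"), ("T1", "write", "X"),
           ("T2", "read", "X"), ("T2", "write", "X"),
           ("T1", "read", "Y"), ("T1", "write", "Y")]

for name, s in [("C", sched_C), ("D", sched_D)]:
    nodes = {t for t, _, _ in s}
    ok = not has_cycle(precedence_edges(s), nodes)
    print(name, "serializable" if ok else "not serializable")
```

Schedule C produces both the edge T1 → T2 and the edge T2 → T1 (a cycle, hence not serializable), while schedule D produces only T1 → T2 and is serializable.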
If we apply this method to draw the precedence graphs for the four schedules of section 4, we get the following:
Schedule A: edge T1 → T2 (on item X); no cycle, so serializable.
Schedule B: edge T2 → T1 (on item X); no cycle, so serializable.
Schedule C: edges T1 → T2 and T2 → T1 (on item X); a cycle, so not serializable.
Schedule D: edge T1 → T2 (on item X); no cycle, so serializable.
Two schedules S and S1 are said to be view equivalent if the following three conditions hold:
i) The same set of transactions participates in S and S1, and S and S1 include the same operations of those transactions.
ii) For any operation ri(X) of Ti in S, if the value of X read by the operation was written by an operation wj(X) of Tj (or if it is the original value of X before the schedule started), the same condition must hold for the value of X read by the operation ri(X) of Ti in S1.
iii) If the operation Wk(Y) of Tk is the last operation to write the item Y in S, then Wk(Y) of Tk must also be the last operation to write the item Y in S1.
The idea behind view equivalence is that, as long as each read operation of a transaction reads the result of the same write operation in both schedules, the write operations of each transaction will produce the same results. Hence, the read operations are said to see the same view in both schedules. It can easily be verified that when S or S1 operate independently on a database with the same initial state, they produce the same end states. A schedule is view serializable if it is view equivalent to some serial schedule.
But the main problem with view serializability is that it is extremely complex
computationally and there is no efficient algorithm to do the same.
But all is not well yet. The scheduling process is done by operating system routines after taking into account various factors like system load, time of transaction submission and priority of the process with reference to other processes, among many others. Also, since a very large number of interleaving combinations are possible, it is extremely difficult to determine beforehand the manner in which the transactions will be interleaved. In other words, getting the various schedules itself is difficult, let alone testing them for serializability.
Hence, instead of generating the schedules, checking them for serializability and then using them, most DBMS protocols use a more practical method: they impose restrictions on the transactions themselves. These restrictions, when followed by every participating transaction, automatically ensure serializability in all schedules that are created by these participating transactions.
Also, since transactions are submitted at different times, it is difficult to determine when a schedule begins and when it ends. Serializability theory can deal with this problem by considering only the committed projection C(S) of the schedule. Hence, as an approximation, we can define a schedule S as serializable if its committed projection C(S) is equivalent to some serial schedule.
To make the problem a little more specific, suppose we are considering the number of reservations in a particular train on a particular date. Two persons at two different places are trying to reserve seats on this train. By the very definition of concurrency, each of them should be able to perform the operations irrespective of the fact that the other person is also doing the same. In fact, they will not even know that the other person is also booking for the same train. The only way of ensuring this is to make available to each of these users their own copy of the data to operate upon, and finally update the master database at the end of their operations.
Now suppose 10 seats are available. Both persons, say A and B, want to get this information and book their seats. Since they are to be accommodated concurrently, the system provides each of them a copy of the data. The simple way is to perform a read_tr(X), so that the value of X is copied onto the variable X of person A (call it XA) and of person B (XB). So each of them knows that there are 10 seats available.
Suppose A wants to book 8 seats. Since the number of seats he wants (say Y) is less than the available seats, the program can allot him the seats, change the number of available seats (X) to X - Y, and even give him the seat numbers that have been booked for him.
The problem is that a similar operation can be performed by B also. Suppose he needs 7 seats. So he gets his seven seats, replaces the value of X with 3 (10 - 7) and gets his reservation. The problem is noticed only when these local copies are written back to the main database.
5.1 The lost update problem: This problem occurs when two transactions that access the same database items have their operations interleaved in such a way as to make the value of some database item incorrect. Suppose the transactions T1 and T2 are submitted at (approximately) the same time. Because of interleaving, each operation is executed for some period of time, then control is passed to the other transaction, and this sequence continues. Because of the delay in updating, this creates a problem. This is what happened in the previous example. Let the transactions be called TA and TB.
(time flows downward)
TA: read_tr(X)
TB: read_tr(X)
TA: X = X - NA
TB: X = X - NB
TA: write_tr(X)
TB: write_tr(X)
Note that the problem occurred because transaction TB failed to see the update made by TA: TB read X before TA wrote it, and wrote X after TA did, so the update of TA is lost. By the same reasoning, had TA written last, the update of TB would have been lost instead.
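The interleaving above can be replayed with a plain-Python sketch of the reservation example: both "transactions" read the shared seat count X before either writes it back, so the second write silently destroys the first (the seat numbers 10, 8 and 7 follow the running example; the variable names are my own):

```python
# A sequential simulation of the lost update problem.
X = 10        # seats available in the shared "database"

xa = X        # TA: read_tr(X)
xb = X        # TB: read_tr(X)  -- reads before TA has written back

xa = xa - 8   # TA books 8 seats (NA = 8)
xb = xb - 7   # TB books 7 seats (NB = 7)

X = xa        # TA: write_tr(X) -> X is now 2
X = xb        # TB: write_tr(X) -> overwrites TA's write; TA's update is lost
print(X)      # 3, although A and B together booked 15 of the 10 seats
```

Had TB read X only after TA's write (a serializable order), it would have seen 2 seats left and the overbooking could not have happened.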
5.2 The dirty read problem: This happens when a transaction TA updates a data item, but later on (for some reason) the transaction fails. It could be due to a system failure or any other operational reason, or the system may later notice that the operation should not have been done and cancel it. To be fair, it also ensures that the original value is restored.
But in the meanwhile, another transaction TB has accessed the data and since it has no indication
as to what happened later on, it makes use of this data and goes ahead. Once the original value is
restored by TA, the values generated by TB are obviously invalid.
TA                          TB
read_tr(X)
X = X - N
write_tr(X)
                            read_tr(X)
                            X = X - N
                            write_tr(X)
Failure
X = X + N
write_tr(X)
(time runs downward)
The value generated by TA out of a transaction that does not survive is dirty data; when read by
TB, it produces an illegal value. Hence the problem is called the dirty read problem.
5.3 The incorrect summary problem: Consider two concurrent transactions, again called TA and
TB. TB is calculating a summary (average, standard deviation or some such operation) by
accessing all elements of a database. (Note that it is not updating any of them; it only reads
them and uses the resultant data to calculate some values.) In the meanwhile, TA is updating
these values. Since the operations are interleaved, TB will be using the not-yet-updated data
for some of its operations and the updated data for the others. This is called the incorrect
summary problem.
TA                          TB
                            Sum = 0
                            read_tr(A)
                            Sum = Sum + A
read_tr(X)
X = X - N
write_tr(X)
                            read_tr(X)
                            Sum = Sum + X
                            read_tr(Y)
                            Sum = Sum + Y
read_tr(Y)
Y = Y - N
write_tr(Y)
(time runs downward)
In the above example, TA updates both X and Y. But since it first updates X
and then Y, and the operations are so interleaved that TB uses both of them in
between TA's operations, TB ends up using the old value of Y with the new value of X. In the
process, the sum obtained refers neither to the old set of values nor to the new set of
values.
Many of the important techniques for concurrency control make use of the concept of the lock. A
lock is a variable associated with a data item that describes the status of the item with respect to
the possible operations that can be done on it. Normally every data item is associated with a
unique lock. They are used as a method of synchronizing the access of database items by the
transactions that are operating concurrently. Such controls, when implemented properly can
overcome many of the problems of concurrent operations listed earlier. However, the locks
themselves may create a few problems, which we shall be seeing in some detail in subsequent
sections.
A simple binary lock, however, can become a disadvantage. It is obvious that more than one
transaction should not go on writing into X, and that while one transaction is writing into it, no
other transaction should be reading it; but no harm is done if several transactions are allowed
to simultaneously read the item. This would save the time of all these transactions, without in
any way affecting the correctness of the result.
This concept gave rise to the idea of shared/exclusive locks. When only read operations are
being performed, the data item can be shared by several transactions; it is only when a transaction
wants to write into it that the lock should be exclusive. Hence the shared/exclusive lock is also
sometimes called a multiple-mode lock. A read lock is a shared lock (which can be held by
several transactions), whereas a write lock is an exclusive lock. So, we need three
operations: a read lock, a write lock and unlock. The algorithms can be as follows:
read_lock(X):
    B: if lock(X) = "unlocked"
           then { lock(X) = "read_locked";
                  no_of_reads(X) = 1 }
       else if lock(X) = "read_locked"
           then no_of_reads(X) = no_of_reads(X) + 1
       else { wait until lock(X) = "unlocked";
              go to B }

write_lock(X):
    B: if lock(X) = "unlocked"
           then lock(X) = "write_locked"
       else { wait until lock(X) = "unlocked";
              go to B }

unlock(X):
    if lock(X) = "write_locked"
        then { lock(X) = "unlocked";
               wake up one of the waiting transactions, if any }
    else if lock(X) = "read_locked"
        then { no_of_reads(X) = no_of_reads(X) - 1;
               if no_of_reads(X) = 0
                   then { lock(X) = "unlocked";
                          wake up one of the waiting transactions, if any } }
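The three operations above can be sketched as a small state machine. This is an illustrative sketch, not a full DBMS lock manager: instead of blocking and waking waiters, the request methods simply return True (granted) or False (the caller would have to wait), which keeps the example single-threaded and testable; the class and method names are hypothetical:

```python
# Sketch of the shared/exclusive lock bookkeeping described above.
# True = lock granted; False = the requesting transaction must wait.

class RWLock:
    def __init__(self):
        self.state = "unlocked"   # "unlocked" | "read_locked" | "write_locked"
        self.readers = 0          # no_of_reads(X)

    def read_lock(self):
        if self.state == "unlocked":
            self.state, self.readers = "read_locked", 1
            return True
        if self.state == "read_locked":
            self.readers += 1     # shared mode: further readers may join
            return True
        return False              # write-locked: caller must wait

    def write_lock(self):
        if self.state == "unlocked":
            self.state = "write_locked"
            return True
        return False              # exclusive mode: caller must wait

    def unlock(self):
        if self.state == "write_locked":
            self.state = "unlocked"
        elif self.state == "read_locked":
            self.readers -= 1
            if self.readers == 0:
                self.state = "unlocked"
```

Note how several read locks coexist, but a write lock is refused as long as even one reader holds the item.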
3. A transaction T must issue the operation unlock(X) after all read_tr(X) operations are
completed in T.
4. A transaction T will not issue a read_lock(X) operation if it already holds a read lock
or write lock on X.
5. A transaction T will not issue a write_lock(X) operation if it already holds a read lock
or write lock on X.
6.1.3 Two-phase locking:
A transaction is said to follow two-phase locking if its operations can be divided into
two distinct phases. In the first phase, all items that are needed by the transaction are
acquired by locking them; in this phase, no item is unlocked even if its operations on it
are over. In the second phase, the items are unlocked one after the other, and no new
lock is acquired. The first phase can be thought of as a growing phase, wherein the
store of locks held by the transaction keeps growing. The second phase is called the
shrinking phase, wherein the number of locks held by the transaction keeps shrinking.
read_lock(Y)
read_tr(Y)                  Phase I (growing)
write_lock(X)
------------------------------------------
unlock(Y)
read_tr(X)
X = X + Y                   Phase II (shrinking)
write_tr(X)
unlock(X)

Example: a two-phase locking transaction
Two-phase locking, though it provides serializability, has a disadvantage. Since the
locks are not released immediately after the use of the item is over, but are retained till all
the other needed locks are also acquired, the desired amount of interleaving may not be
achieved. Worse, while a transaction T may be holding an item X, though it is not using
it, just to satisfy the two-phase locking protocol, another transaction T1 may
genuinely need the item but will be unable to get it till T releases it. This is the price
that is to be paid for the guaranteed serializability provided by the two-phase locking
system.
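The two-phase rule itself is easy to check mechanically: once a transaction has issued its first unlock, it must acquire no further locks. A minimal checker (function name and operation encoding are hypothetical):

```python
# Checks the two-phase property of one transaction's lock/unlock sequence:
# after the first "unlock" (start of the shrinking phase), no "lock" may occur.

def is_two_phase(ops):
    """ops: sequence of ("lock", item) / ("unlock", item) tuples, in order."""
    shrinking = False
    for op, _item in ops:
        if op == "unlock":
            shrinking = True          # shrinking phase has begun
        elif shrinking:               # a lock after an unlock violates 2PL
            return False
    return True
```

The example transaction above passes this check, since unlock(Y) and unlock(X) both come after write_lock(X).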
that. It can be easily seen that this is an infinite wait and the deadlock will never get resolved.

T11                         T21
read_lock(Y)
read_tr(Y)
                            read_lock(X)
                            read_tr(X)
write_lock(X)  (waits for T21 to unlock X)
                            write_lock(Y)  (waits for T11 to unlock Y)
(time runs downward)

Each transaction is blocked waiting for an item held by the other
and is not releasing other items held by it. One solution is to develop a protocol wherein a
transaction first ascertains that all the items it needs are available and only then locks them;
i.e. if it cannot get any one or more of the items, it does not hold the other items either, so that
those items can be useful to any other transaction that may need them. This method, though it
prevents deadlocks, further limits the prospects of concurrency.
A better way to deal with deadlocks is to identify the deadlock when it occurs and then
take some decision. The transaction involved in the deadlock may be blocked or aborted, or the
transaction may preempt and abort the other transaction involved. In a typical case, the concept
of the transaction timestamp TS(T) is used. Based on when the transaction was started (given by
the timestamp: the larger the value of TS, the younger the transaction), two methods of deadlock
recovery are devised.
1. Wait-die method: Suppose a transaction Ti tries to lock an item X, but is unable to do
so because X is locked by Tj with a conflicting lock. Then if TS(Ti) < TS(Tj) (Ti is older than
Tj), Ti waits. Otherwise (if Ti is younger than Tj), Ti is aborted and restarted later with
the same timestamp. The policy is that the older of the transactions will have already spent
sufficient effort and hence should not be aborted.
2. Wound-wait method: If TS(Ti) < TS(Tj) (Ti is older than Tj), abort and restart Tj
with the same timestamp later. On the other hand, if Ti is younger, then Ti is allowed to wait.
It may be noted that in both cases, it is the younger transaction that gets aborted, but the
actual circumstances of aborting differ. Both these methods can be proved to be deadlock-free,
because no cycles of waiting, as seen earlier, are possible with these arrangements.
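The two schemes can be captured in one decision function. A sketch, assuming smaller timestamps mean older transactions; the function name and the returned strings are hypothetical:

```python
# Wait-die vs wound-wait in one place. ts_i / ts_j are the timestamps of the
# requesting transaction Ti and the lock holder Tj (smaller = older).

def resolve(scheme, ts_i, ts_j):
    """Return what happens when Ti requests an item held by Tj."""
    if scheme == "wait-die":
        # older requester waits; younger requester dies (is aborted)
        return "wait" if ts_i < ts_j else "abort Ti"
    if scheme == "wound-wait":
        # older requester wounds (aborts) the holder; younger requester waits
        return "abort Tj" if ts_i < ts_j else "wait"
    raise ValueError(f"unknown scheme: {scheme}")
```

In both branches the aborted transaction is always the younger one, which is exactly why no wait cycle can form.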
There is another class of protocols that do not require any timestamps. They include the
no-waiting algorithm and the cautious-waiting algorithm. In the no-waiting algorithm, if a
transaction cannot get a lock, it is aborted immediately (no waiting) and restarted at a
later time. But since there is no guarantee that the new situation is deadlock-free, it may have to
be aborted again. This may lead to a situation where a transaction ends up getting aborted
repeatedly.
To overcome this problem, the cautious-waiting algorithm was proposed. Here, suppose the
transaction Ti tries to lock an item X, but cannot get X since X is already locked by another
transaction Tj. Then the solution is as follows: if Tj is not blocked (not waiting for some other
locked item), then Ti is blocked and allowed to wait; otherwise Ti is aborted. This method not
only reduces repeated aborting, but can also be proved to be deadlock-free, since of Ti and Tj,
only one is blocked, after ensuring that the other is not blocked.
6.2.2 Deadlock detection:
The second method of dealing with deadlocks is to detect deadlocks as and when they happen.
The basic problem with the earlier suggested protocols is that they assume that we know what is
happening in the system: which transaction is waiting for which item, and so on. But in a
typical case of concurrent operations, the situation is fairly complex and it may not be possible to
predict the behavior of transactions.
In such cases, the easier method is to take on deadlocks as and when they happen and try
to solve them. A simple way to detect a deadlock is to maintain a wait-for graph. One node in
the graph is created for each executing transaction. Whenever a transaction Ti is waiting to lock
an item X which is currently held by Tj, an edge (Ti -> Tj) is created in the graph. When Tj
releases X, this edge is dropped. It is easy to see that whenever there is a deadlock situation,
there will be cycles in the wait-for graph, so that suitable corrective action can be taken.
Again, once a deadlock has been detected, the transaction to be aborted is to be chosen. This is
called the victim selection and generally newer transactions are selected for victimization.
Another easy method of dealing with deadlocks is the use of timeouts. Whenever a
transaction is made to wait for longer than a predefined period, the system assumes that a
deadlock has occurred and aborts the transaction. This method is simple and has low overheads,
but may end up aborting a transaction even when there is no deadlock.
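Detection on the wait-for graph reduces to finding a cycle. A minimal sketch (function name hypothetical), with edges given as (waiting transaction, holding transaction) pairs:

```python
# Deadlock detection on a wait-for graph: transactions are nodes, an edge
# Ti -> Tj means Ti waits for an item held by Tj; any cycle means a deadlock.

def has_deadlock(edges):
    graph = {}
    for ti, tj in edges:
        graph.setdefault(ti, []).append(tj)

    def reachable(start, target, seen):
        """Depth-first search: can we get from start back to target?"""
        for nxt in graph.get(start, []):
            if nxt == target or (nxt not in seen and
                                 reachable(nxt, target, seen | {nxt})):
                return True
        return False

    # a node that can reach itself lies on a cycle
    return any(reachable(t, t, {t}) for t in graph)
```

For the T11/T21 example above, the two edges T11 -> T21 and T21 -> T11 form a cycle, so the deadlock is detected; either transaction may then be chosen as the victim.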
6.3 Starvation:
The other side effect of locking is starvation, which happens when a transaction cannot proceed
for indefinitely long periods, though the other transactions in the system continue
normally. This may happen if the waiting scheme for locked items is unfair, i.e. some
transactions may never be able to get the items, since one or another of the high-priority
transactions may continuously be using them. Then the low-priority transaction will be forced to
starve for want of resources.
The solution to starvation lies in choosing proper priority algorithms like first-come-first-served.
If this is not possible, then the priority of a transaction may be increased every time it
is made to wait or is aborted, so that eventually it becomes a high-priority transaction and gets
the required services.
6.4.2 An algorithm for ordering the timestamp: The basic concept is to order the transactions
based on their timestamps. A schedule made of such transactions is then serializable. This
concept is called timestamp ordering (TO). The algorithm should ensure that whenever a
data item is accessed by conflicting operations in the schedule, they access it in
serializability order. To achieve this, the algorithm uses two timestamp values.
1. read_TS(X): This indicates the largest timestamp among the transactions that have
successfully read the item X. Note that the largest timestamp actually refers to the
youngest of the transactions in the set (that have read X).
2. write_TS(X): This indicates the largest timestamp among all the transactions that have
successfully written the item X. Note that the largest timestamp actually refers to the
youngest transaction that has written X.
The above two values are often referred to as the read timestamp and the write timestamp of the
item X.
6.4.3 The concept of basic timestamp ordering: Whenever a transaction T tries to read or write
an item X, the algorithm compares the timestamp of T with the read timestamp or the write
timestamp of the item X, as the case may be. This is done to ensure that T does not violate the
order of timestamps. The violation can come in the following ways.
1. Transaction T is trying to write X:
a) If read_TS(X) > TS(T) or write_TS(X) > TS(T), then abort and roll back T and
reject the operation. In plain words, if a transaction younger than T has already
read or written X, the timestamp ordering is violated and hence T is to be aborted,
and all the values written by T so far need to be rolled back, which may also
involve cascaded rolling back.
b) If read_TS(X) <= TS(T) and write_TS(X) <= TS(T), then execute the write_tr(X)
operation and set write_TS(X) to TS(T); i.e. allow the operation and set the write
timestamp of X to that of T, since T is the latest transaction to have accessed X.
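Rules (a) and (b) for the write case can be sketched directly. This assumes the item's current read and write timestamps are passed in; the function name and return values are hypothetical:

```python
# Basic timestamp ordering, write case, following rules (a) and (b) above.
# read_ts / write_ts are the current timestamps of the item X.

def write_item(ts_t, read_ts, write_ts):
    """Return (decision, new write_TS of X) when transaction T writes X."""
    if read_ts > ts_t or write_ts > ts_t:
        # rule (a): a younger transaction already read or wrote X; abort T
        return ("abort", write_ts)
    # rule (b): allow the write; X's write timestamp becomes TS(T)
    return ("write", ts_t)
```

For example, a transaction with TS(T) = 5 is aborted if some transaction with timestamp 7 has already read X, but is allowed to write when both of X's timestamps are below 5.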
A stricter variation of this algorithm ensures that the schedules are also strict (so that
recoverability is enhanced) and serializable. In this case, any transaction T that tries to
read or write X such that write_TS(X) < TS(T) is made to wait until the transaction T' that
originally wrote into X (hence whose timestamp matches the write timestamp of X,
i.e. TS(T') = write_TS(X)) is committed or aborted. This algorithm also does not cause any
deadlock, since T waits for T' only if TS(T) > TS(T').
i) read_TS(Xi): the read timestamp of Xi indicates the largest of all timestamps of
transactions that have read Xi. (This, in plain language, means the youngest of the
transactions that have read it.)
ii) write_TS(Xi): the write timestamp of Xi indicates the timestamp of the transaction
that wrote the value of the version Xi.
Whenever a transaction T writes into X, a new version Xk+1 is created, with both
write_TS(Xk+1) and read_TS(Xk+1) being set to TS(T). Whenever a transaction T reads X, the
value of read_TS(Xi) is set to the larger of the two values, namely read_TS(Xi) and TS(T).
To ensure serializability, the following rules are adopted.
i) If T issues a write_tr(X) operation, and the version Xi with the highest write_TS(Xi) less
than or equal to TS(T) has read_TS(Xi) > TS(T), then abort and roll back T; else create a new
version of X, say Xk, with read_TS(Xk) = write_TS(Xk) = TS(T).
In plain words, if the version with the highest write timestamp not exceeding that of T has
been read by a transaction younger than T, then we have no option but to abort T and roll
back all its effects; otherwise a new version of X is created with its read and write
timestamps initialized to that of T.
ii) If a transaction T issues a read_tr(X) operation, find the version Xi with the highest
write_TS(Xi) that is less than or equal to TS(T); return the value of Xi to T and set
read_TS(Xi) to the larger of TS(T) and the current read_TS(Xi).
This only means: find the latest version of X that T is eligible to read, and return its value
to T. Since T has now read the value, find out whether it is the youngest transaction to have
read X by comparing its timestamp with the current read timestamp of Xi. If T is
younger (its timestamp is higher), store its timestamp as the new read timestamp; else retain
the earlier value.
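The read rule (ii) can be sketched over a list of versions. The dictionary layout for a version is a hypothetical representation; the logic itself follows the rule above:

```python
# Multiversion read rule (ii): pick the version with the largest write
# timestamp not exceeding TS(T), then push that version's read timestamp up.
# Each version is a dict with "value", "write_ts" and "read_ts" keys
# (a hypothetical in-memory layout for illustration).

def read_item(ts_t, versions):
    # latest version that transaction T is eligible to read
    xi = max((v for v in versions if v["write_ts"] <= ts_t),
             key=lambda v: v["write_ts"])
    # T may now be the youngest reader of this version
    xi["read_ts"] = max(xi["read_ts"], ts_t)
    return xi["value"]
```

A reader with timestamp 3 thus sees the version written at timestamp 1 even after a writer with timestamp 4 has created a newer version, which is exactly how readers avoid blocking writers.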
6.5.2 Multiversion two-phase locking: The aim here is to allow the read operations of other
transactions to proceed even while a transaction is writing, so that the operations can be
interleaved. This concept is extended to the multiversion locking system by using what
are known as multiple-mode locking schemes. In this, there are three locking modes
for an item: read, write and certify; i.e. a unit can be locked for read(X), write(X) or
certify(X), and it can also remain unlocked. To see how the scheme works, we first see
how the normal read/write system works by means of a lock compatibility table.
Lock compatibility table (read/write):

              Read     Write
    Read      Yes      No
    Write     No       No

With the certify mode added, the table becomes:

              Read     Write    Certify
    Read      Yes      Yes      No
    Write     Yes      No       No
    Certify   No       No       No
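The read/write/certify compatibility described above can be expressed as a simple lookup (the table constant and function name are hypothetical):

```python
# The certify-mode compatibility table as a lookup: compatible(held, requested)
# is True when a new lock of mode `requested` can coexist with an existing
# lock of mode `held` on the same item.

COMPATIBLE = {
    ("read", "read"): True,     ("read", "write"): True,
    ("read", "certify"): False, ("write", "read"): True,
    ("write", "write"): False,  ("write", "certify"): False,
    ("certify", "read"): False, ("certify", "write"): False,
    ("certify", "certify"): False,
}

def compatible(held, requested):
    return COMPATIBLE[(held, requested)]
```

Note the key difference from the plain read/write table: a write lock now coexists with read locks, because readers keep using X(old) while the writer works on X(new); only the certify mode is fully exclusive.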
The multimode locking system works on the following lines. When one of the transactions has
obtained a write lock for a data item, the other transactions may still be provided with read
locks for the item. To ensure this, two versions of X are maintained. X(old) is a version
which has been written and committed by a previous transaction. When a transaction T wants a
write lock, a new version X(new) is created and handed over to T for writing.
While T continues to hold the lock on X(new), other transactions can continue to use X(old)
under read locks.
Once T is ready to commit, it should get exclusive certify locks on all the items it wants to
commit by writing. Note that the write lock is no longer an exclusive lock under this scheme,
since while one transaction is holding a write lock on X, one or more other transactions
may be holding read locks on the same X. To provide the certify lock, the system waits till all
other read locks on the item are cleared. Note that this process has to be repeated for every item
that T wants to commit.
Once all these items are under the certify locks of the transaction, it can commit its
values. From then on, X(new) becomes X(old), and new versions will be created only when
another transaction wants a write lock on X. This scheme avoids cascading rollbacks. But since a
transaction has to get exclusive certify locks on all items before it can commit, a delay in
the commit operation is inevitable. This may also lead to complexities like deadlocks and
starvation.
Chapter: 6
TRANSACTION MANAGEMENT & CONCURRENCY CONTROL
TECHNIQUES
End Chapter quizzes
Q1. The sequence of operations on the database is called
a) Schedule
b) Database Recovery
c) Locking
d) View
Q2. Two operations in a schedule are said to be in conflict if they satisfy the conditions
a) The operations belong to different transactions
b) They access the same item x
c) At least one of the operations is a write operation.
d) All of the above.
Q3.
Chapter: 7
DATABASE RECOVERY, BACKUP & SECURITY
i) A system crash: a hardware, software or network error can make the completion
of the transaction impossible.
ii) User interruption: some programs provide for the user to interrupt during execution.
If the user changes his mind during execution (but before the transactions are
complete), he may opt out of the operation.
iii) Local exceptions: certain conditions during operation may force the system to
raise what are known as exceptions. For example, a bank account holder may
not have sufficient balance for some transaction to be done, or special instructions
might have been given in a bank transaction that prevent further continuation of
the process. In all such cases, the transactions are terminated.
iv) Physical and system problems: theft, fire etc., or disk failure, viruses etc.
In all such cases of failure, a recovery mechanism is to be in place.
1.2 Database Recovery
Recovery most often means bringing the database back to the most recent consistent state, in the
case of transaction failures. This obviously demands that status information about the previous
consistent states be made available in the form of a log (which has been discussed in one of the
previous sections in some detail).
A typical algorithm for recovery should proceed on the following lines.
1. If the database has been physically damaged or there are catastrophic crashes like disk
crash etc, the database has to be recovered from the archives.
In many cases, a recovery scheme based on deferred update is used: since nothing is written
to the database until the transaction commits, no undoing is needed, and committed transactions
are simply redone from the log. Hence this is called the NO-UNDO/REDO algorithm. The whole
concept works only when the system is working in a deferred update mode.
However, this may not be the case always. In certain situations, where the system is
working in the immediate update mode, the transactions keep updating the database
without waiting for the commit operation. In such cases, the updating will
normally be on the disk also. Hence, if a system fails while the immediate updates are
being made, it becomes necessary to undo the operations using the disk entries. This
will help us to reach the previous consistent state. From there onwards, the transactions
will have to be redone. Hence, this method of recovery is often termed the UNDO/REDO
algorithm.
2. Role of checkpoints in recovery:
A checkpoint, as the name suggests, indicates that everything is fine up to that point.
In a log, when a checkpoint is encountered, it indicates that all values up to it have been
written back to the DBMS on the disk. Any further crash or system failure will have to take care
of the data appearing beyond this point only. Put another way, all transactions that have their
commit entries in the log before this point need no rolling back.
The recovery manager of the DBMS decides at what intervals checkpoints need to be
inserted (in turn, at what intervals data is to be written back to the disk). It can be either after
specific periods of time (say m minutes) or after a specific number of transactions (t
transactions), etc. When the protocol decides to checkpoint, it does the following: temporarily
suspend the executing transactions, force-write the modified main memory buffers to the disk,
write a checkpoint record into the log and force-write the log to the disk, and then resume the
suspended transactions.
The force writing need not only refer to the modified data items, but can include the various lists
and other auxiliary information indicated previously.
However, the force writing of all the data pages may take some time, and it would be wasteful to
halt all transactions until then. A better way is to make use of fuzzy checkpointing, wherein
the checkpoint is inserted and, while the buffers are being written back (beginning from the
previous checkpoint), the transactions are allowed to restart. This way the I/O time is saved.
Until all data up to the new checkpoint is written back, the previous checkpoint is held valid for
recovery purposes.
3. Write-ahead logging:
When immediate updating is being used, it is necessary to maintain a log for recovery purposes.
Normally, before the updated value is written on to the disk, the earlier value (called the before
image (BFIM)) is noted down elsewhere on the disk for recovery purposes. This process of
recording entries is called write-ahead logging (writing the log ahead of the update). It is to be
noted that the type of logging also depends on the type of recovery. If the NO-UNDO/REDO
type of recovery is being used, then only the new values which could not be written back before
the crash need to be logged. But in the UNDO/REDO type, the before images as well as the
values that were computed but could not be written back need to be logged.
Two other update mechanisms need brief mention. The cache pages updated by a transaction
cannot be written back to the disk by the DBMS manager until and unless the transaction
commits. If the system strictly follows this approach, it is called a no-steal approach.
However, in some cases, the protocol allows the writing of the updated buffer back to the disk
even before the transaction commits; this is called a steal approach.
Secondly, if all pages are written back to the disk as soon as the transaction commits, then it is
a force approach; otherwise it is called a no-force approach.
Most protocols make use of steal/no-force strategies, so that there is no urgency of writing back
to the disk the moment the transaction commits.
However, just the before image (BFIM) and after image (AFIM) values may not be sufficient for
successful recovery. A number of lists, including the list of active transactions (those that have
started operating but have not committed yet), committed transactions, as also aborted
transactions, need to be maintained, to avoid a brute-force method of recovery.
However, in practice, many transactions are very long, and it is dangerous to hold all their
updates in the buffer, since the buffers can run out of space and may need a page replacement.
To avoid such situations, wherein a page is removed inadvertently, a simple two-pronged
protocol is used.
1. A transaction cannot change the DBMS values on the disk until it commits.
2. A transaction does not reach the commit stage until all its update values are written on to the
log and the log itself is force-written on to the disk.
Notice that in case of failures, recovery is by the NO-UNDO/REDO technique, since all data
will be in the log if a transaction fails after committing.
4.1 An algorithm for recovery using deferred update in a single-user environment:
In a single-user environment, the algorithm is a straight application of the REDO
procedure; i.e. it uses two lists of transactions: the committed transactions since the last
checkpoint and the currently active transaction. When the crash occurs, apply REDO to all
write_tr operations of the committed transactions from the log, and let the active transaction
run again.
The assumption is that the REDO operations are idempotent, i.e. the operations produce the
same results irrespective of the number of times they are redone, provided they start from the
same initial state. This is essential to ensure that the recovery operation does not produce a
result that is different from the case where no crash was there to begin with.
(Though this may look like a trivial constraint, students may verify for themselves that not all
DBMS applications satisfy this condition.)
Also, since there was only one transaction active (because it was a single-user system)
and it had not updated the buffer yet, all that remains to be done is to restart this
transaction.
To simplify matters, we presume that we are talking of strict and serializable
schedules; i.e. there is strict two-phase locking, and the locks remain effective till the
transactions commit themselves. In such a scenario, an algorithm for recovery could
be as follows:
Use two lists: the list of committed transactions T since the last checkpoint and the list of active
transactions T1. REDO all the write operations of the committed transactions in the order in
which they were written into the log. The active transactions are simply cancelled and
resubmitted.
Note that once we put in the strict serializability conditions, the recovery process does not
vary much from the single-user system.
Note that in the actual process, a given item X may be updated a number of times, either
by the same transaction or by different transactions at different times. What is important to the
user is its final value. However, the above algorithm simply updates the value whenever an
update appears in the log. This can be made more efficient in the following manner: instead
of starting from the checkpoint and proceeding towards the time of the crash, traverse the log
backwards from the time of the crash. When an item's value is updated for the first time in this
backward scan, update it and note that its value has been restored; any further (older) updates
of the same item can be ignored.
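The backward-scan optimization can be sketched as follows; the log layout (transaction, item, value) is a hypothetical simplification of real log records:

```python
# Backward-scan REDO: walk the log from the crash backwards and apply only
# the most recent committed write to each item, skipping all older ones.

def redo_backward(log, committed):
    """log: [(txn, item, value), ...] in forward order.
    committed: set of committed transaction names. Returns final item values."""
    db, done = {}, set()
    for txn, item, value in reversed(log):
        if txn in committed and item not in done:
            db[item] = value    # first hit going backwards = final value
            done.add(item)      # older writes to the same item are ignored
    return db
```

With a forward scan, X would have been written twice; the backward scan touches each item exactly once, which is the whole point of the optimization.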
This method, though it guarantees correct recovery, has some drawbacks. Since the items
remain locked with the transactions until the transactions commit, the concurrent execution
efficiency comes down. Also, a lot of buffer space is wasted to hold the values till the
transactions commit. The number of such values can be large; when long transactions work in
concurrent mode, they delay the commit operations of one another.
5.1 A typical UNDO/REDO algorithm for an immediate-update single-user environment:
Here, at the time of failure, the changes envisaged by the transaction may have
already been recorded in the database. These must be undone. A typical procedure
for recovery should follow these lines:
a) The system maintains two lists: the list of committed transactions since the last
checkpoint and the list of active transactions (only one active transaction, in fact,
because it is a single-user system).
b) In case of failure, undo all the write_tr operations of the active transaction, using
the information in the log, by means of the UNDO procedure.
c) Redo the write_tr operations of the committed transactions from the log, in the
order in which they were written into the log.
6. Shadow paging
It is not always necessary that the original database is updated by overwriting the
previous values. As discussed in an earlier section, we can make multiple versions of
the data items, whenever a new update is made. The concept of shadow paging
illustrates this:
[Figure: Shadow paging. The shadow directory (entries 1 to 7) points to the original pages on
the disk (page 2, page 5, page 7, ...). When pages 2, 5 and 7 are updated, new blocks (page 7
(new), page 5 (new), page 2 (new)) are created, and the current directory entries for those
pages point to the new blocks, while its remaining entries point to the same pages as the
shadow directory.]
In a typical case, the database is divided into pages and only those pages that need
updating are brought to the main memory (or cache, as the case may be). A shadow directory
holds pointers to these pages. Whenever an update is done, a new block of the page is created
(indicated by the suffix (new) in the figure) and the updated values are included there. Note that
the new pages are created in the order of updating and not in the serial order of the pages. A
current directory holds pointers to these new pages. For all practical purposes, these are the
valid pages, and they are written back to the database at regular intervals.
Now, if any roll back is to be done, the only operation to be done is to discard the current
directory and treat the shadow directory as the valid directory.
One difficulty is that the new, updated pages are kept at unrelated spaces, and hence the
concept of a contiguous database is lost. More importantly, what happens when the new
pages are discarded as part of an UNDO strategy? These blocks form garbage in the system.
(The same thing happens when a transaction commits: the new pages become the valid pages,
while the old pages become garbage.) A mechanism to systematically identify all these pages
and reclaim them becomes essential.
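Shadow paging in miniature: the sketch below (all names hypothetical) keeps a page store and the two directories; rollback simply reads through the shadow directory, while commit reads through the current one:

```python
# Shadow paging sketch: the shadow directory keeps pointing at the old pages
# while updates go to freshly appended copies referenced by the current
# directory. Rollback = drop the current directory and use the shadow one.

def update_page(pages, current_dir, page_no, value):
    pages.append(value)                  # new block, placed wherever free
    current_dir[page_no] = len(pages) - 1

pages = ["p1", "p2", "p3"]               # page store (indices = "disk" slots)
shadow = {1: 0, 2: 1, 3: 2}              # committed (shadow) directory
current = dict(shadow)                   # working copy for the transaction

update_page(pages, current, 2, "p2-new") # transaction updates page 2

rolled_back = {n: pages[i] for n, i in shadow.items()}   # UNDO view
committed = {n: pages[i] for n, i in current.items()}    # commit view
```

Note that no before image is ever overwritten, which is why shadow paging needs no log for undoing; the price is the garbage block ("p2" or "p2-new", depending on the outcome) that must later be reclaimed.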
i) Discretionary security mechanisms: these are used to grant privileges to users,
including the capability to access specific data files, records or fields in a specified
mode.
ii) Mandatory security mechanisms: these are standard security mechanisms that
enforce multilevel security by classifying the data and the users into various security
classes and implementing the appropriate security policy.
It may be noted that in all these cases, the role of the DBA becomes critical. He
normally logs into the system under a DBA account or a superuser account, which
provides full capabilities to manage the database, ordinarily not available to the other
users. Under the superuser account, he can manage the following aspects regarding
security:
i) Account creation
ii) Privilege granting
iii) Privilege revocation
iv) Security level assignment
Another concept is the creation of views. While the database record may have a
large number of fields, a particular user may be authorized to have information only
about certain fields. In such cases, whenever he requests the data item, a view is
created for him of the data item, which includes only those fields which he is authorized
to access. He may not even know that there are many other fields in the
records.
The concept of views becomes very important when large databases, which
cater to the needs of various types of users are being maintained. Every user can have
and operate upon his view of the database, without being bogged down by the details.
It also makes the security maintenance operations convenient.
Chapter: 7
DATABASE RECOVERY, BACKUP & SECURITY
End Chapter quizzes
a) Data hiding
b) Encryption
c) Data Mining
d) Both a and c

a) Geographical Database
b) Statistical Database
c) Web Database
d) Time Database