
DATABASE MANAGEMENT SYSTEM: A DBMS is a collection of interrelated data and a set of programs to access that data.

A DBMS allows users and other software to store and retrieve data in a structured way. It helps to specify the logical organization of a database and to access and use the information within it. It provides facilities for controlling data access, enforcing data integrity, controlling concurrent access, and recovering the database after failures.

FILE SYSTEM: A file system is one approach to maintaining data, but a file management system has the following disadvantages.
1) Data redundancy and inconsistency: Redundancy is the unnecessary duplication of data. When one of several copies of the same data is modified and the others are not, the data becomes inconsistent. For example, suppose an employee's address is stored in two tables. If the employee's address changes, it must be updated in both tables; if it is updated in only one of them, the data becomes inconsistent.
2) Concurrent access anomaly: When the same data is accessed by two or more programs simultaneously, anomalies are possible. Ex: A and B share a joint account with a balance of Rs.50000. A withdraws Rs.10000 and B withdraws Rs.20000 at the same time. Both transactions read the balance as Rs.50000, but the final write is made by only one of them, so the final balance is either Rs.30000 or Rs.40000 instead of the correct Rs.20000.
3) Atomicity problem: The atomicity rule is "all or none": either every instruction of a transaction is executed, up to the last one, or none of them is. Ex: Transfer Rs.10000 from account A to account B. The transaction includes the following instructions.
- read account A
- deduct Rs.10000 from A
- read account B
- add Rs.10000 to B
If the program fails after the second instruction, the amount is deducted from A but not added to B. This leads to data inconsistency.
4) Security problem: A file system does not identify the roles of the different users of the system, nor does it authenticate them.
5) Integrity problem: A file system has no provision for enforcing integrity constraints on the data, for example "the account balance should be greater than Rs.1000" or "the employee id should be unique".

ADVANTAGES OF A DBMS
Using a DBMS to manage data has many advantages:
1) Data independence: Application programs are independent of how the data is represented and stored, i.e. changes in the data representation do not require the applications to be rewritten. The DBMS provides an abstract view of the data that separates application code from such details.
2) Efficient data access: A DBMS uses several techniques to store and retrieve data efficiently, and its query language allows data to be retrieved efficiently.
----------------------------------------------------------------------------------------------------------- 1 A. Inna Reddy, Associate Professor, St. Josephs Degree & PG College, Hyd-29

3) Data integrity and security: When data is accessed through the DBMS, the DBMS can enforce integrity constraints on it. For example, before performing a withdrawal from an account, the DBMS can check that the minimum balance is maintained. The DBMS also provides access control on the data.
4) Data administration: When several users share the data, it is necessary to centralize the data to minimize redundancy and so avoid inconsistency. These activities are monitored through data administration.
5) Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the data in such a manner that users can think of the data as being accessed by only one user at a time. Further, the DBMS protects users from the effects of system failures.
6) Reduced application development time: The DBMS performs many tasks that applications would otherwise have to implement, so applications need not be developed from scratch and development time is reduced.

LEVELS OF ABSTRACTION IN A DBMS
The DBMS provides three levels of abstraction:
- Physical Level
- Conceptual Level
- External (View) Level

Physical Level
The physical schema specifies how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes. Data structures called indexes are used to speed up data retrieval operations. For example, a sample physical schema for the university database follows:
- Store all relations as unsorted files of records (in a DBMS, a file is a collection of records or pages, rather than a string of characters as in an operating system).
- Create indexes on the RollNo column of the Student table, and on the Fac_id and Salary columns of the Faculty table.
The process of arriving at a good physical schema is called physical database design.

Conceptual Level
The conceptual schema (sometimes called the logical schema) describes what data is stored, in terms of the data model of the DBMS. In a relational DBMS, the conceptual schema describes all relations that are stored in the database. In our sample university database, these relations contain information about entities, such as students and faculty, and about relationships, such as students' enrollment in courses. All student entities can be described using records in a Students relation; similarly for Faculty, Courses, etc. This leads to the following conceptual schema:
Students (sid: string, name: string, login: string, age: integer, gpa: real)
Faculty (Fac_id: string, fname: string, sal: real)
Courses (cid: string, cname: string, credits: integer)

External Level
The database has exactly one conceptual schema and one physical schema, because it has just one set of stored relations, but it may have several external schemas, each tailored to a particular group of users. Each external schema consists of a collection of one or more views and relations from the conceptual schema. A view is conceptually a relation, but the records in a view are not stored in the DBMS. Rather, they are computed using a definition for the view, in terms of relations stored in the DBMS. External schema design is guided by end-user requirements. For example, suppose we want to allow students to find out the names of faculty members teaching courses, as well as course enrollments. This can be done by defining the following view:
Courseinfo (cid: string, fname: string, enrollment: integer)

DATA INDEPENDENCE
Data independence is the ability to isolate application programs from changes in the way the data is structured and stored. There are two main types of data independence:
- Logical data independence
- Physical data independence
Logical data independence is the ability to isolate applications from changes in the logical structure of the data, or changes in the choice of relations to be stored. For example, if the structure of the Students and Faculty relations is changed without affecting the applications, this is logical data independence.
Physical data independence is the ability to isolate applications from changes in the storage structure of the data.
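Both kinds of data independence can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in DBMS (an assumption; any relational DBMS would do) and hypothetical sample rows for the Students relation. The "application" is one fixed query. The sketch first adds an index (a physical change) and then splits Students into two hypothetical relations, Students_public and Students_private, hiding the split behind a view that keeps the old name Students (a logical change). The query text never changes.

```python
import sqlite3

# The "application": a fixed query that only knows the Students relation.
APP_QUERY = "SELECT name FROM Students WHERE gpa > 3.3 ORDER BY name"


def run_app(con):
    """Run the application's query and return the matching names."""
    return [row[0] for row in con.execute(APP_QUERY)]


con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT,
                           age INTEGER, gpa REAL);
    -- Hypothetical sample rows.
    INSERT INTO Students VALUES ('s1', 'Asha',  'asha@cs',  19, 3.6),
                                ('s2', 'Ravi',  'ravi@cs',  20, 3.1),
                                ('s3', 'Meena', 'meena@ee', 18, 3.9);
""")
before = run_app(con)

# Physical change: add an index on gpa. The query is not rewritten.
con.execute("CREATE INDEX idx_students_gpa ON Students(gpa)")
after_index = run_app(con)

# Logical change: split Students into two relations, then define a view
# with the old name and columns so the same query still works.
con.executescript("""
    CREATE TABLE Students_public  (sid TEXT PRIMARY KEY, name TEXT, login TEXT);
    CREATE TABLE Students_private (sid TEXT PRIMARY KEY, age INTEGER, gpa REAL);
    INSERT INTO Students_public  SELECT sid, name, login FROM Students;
    INSERT INTO Students_private SELECT sid, age, gpa FROM Students;
    DROP TABLE Students;
    CREATE VIEW Students AS
        SELECT p.sid, p.name, p.login, q.age, q.gpa
        FROM Students_public AS p JOIN Students_private AS q ON p.sid = q.sid;
""")
after_split = run_app(con)

print(before, after_index, after_split)  # each is ['Asha', 'Meena']
```

The view is what absorbs the logical change: applications written against the old Students relation keep working as long as the view reproduces its columns.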


ENTITY-RELATIONSHIP MODEL
The entity-relationship (ER) data model allows us to describe the data involved in a real-world enterprise in terms of objects and their relationships, and is widely used to develop an initial database design. It provides useful concepts that allow us to move from an informal description of what users want from their database to a more detailed and precise description that can be implemented in a DBMS.

OVERVIEW OF DATABASE DESIGN
The database design process can be divided into six steps.
(1) Requirements Analysis: The first step in designing a database application is to understand what data is to be stored in the database, what applications must be built on top of it, and what operations are most frequent and subject to performance requirements, i.e. we must find out what the users want from the database. This is usually an informal process that involves discussions with user groups, a study of the current operating environment, etc.
(2) Conceptual Database Design: The information gathered in the requirements analysis step is used to develop a high-level description of the data to be stored in the database, along with the constraints that are known to hold over this data. This step is carried out using the ER model.
(3) Logical Database Design: We choose a DBMS to implement our database design and convert the conceptual database design into a database schema in the data model of the chosen DBMS. We will only consider relational DBMSs. If the relational model is selected, the task in the logical design step is to convert an ER schema into a relational database schema. The result is a conceptual schema, sometimes called the logical schema.
(4) Schema Refinement: Schema refinement analyzes the collection of relations in the relational database schema to identify potential problems and to refine it. Schema refinement can be guided by normalization, i.e. restructuring relations to ensure some desirable properties.
(5) Physical Database Design: This step considers the expected workloads on the database and further refines the database design to ensure that it meets the desired performance criteria. This step may simply involve building indexes on some tables and clustering some tables.
(6) Security Design: In this step, we identify the different user groups and the different roles played by various users. For each role and user group, we must identify the parts of the database that they must be able to access and the parts that they should not be allowed to access. Necessary steps are taken to prevent unauthorized access to data.

ENTITIES, ATTRIBUTES, AND ENTITY SETS
An entity is an object in the real world that is distinguishable from other objects, for example, the head of the Computer Science department. It is often useful to identify a collection of similar entities. Such a collection is called an entity set, for example Student, Course, or Employee.


An entity is described using a set of attributes. For each attribute associated with an entity set, we must identify a domain of possible values. For example, the domain associated with the attribute name of Employees might be the set of 20-character strings.

RELATIONSHIPS AND RELATIONSHIP SETS
A relationship is an association among two or more entities. For example, we may have the relationship that Inna Reddy works in the Computer Science department. A collection of similar relationships is called a relationship set. A relationship set can be thought of as a set of n-tuples:
$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$
where each $e_i$ is an entity in the entity set $E_i$.
A relationship can also have descriptive attributes. Descriptive attributes are used to record information about the relationship, rather than about any one of the participating entities. For example, to record that Mr. Jaj has worked in the Computer Science department since January 2007, we add a descriptive attribute since to the relationship.

FEATURES OF ER MODEL
1) Key Constraints
A key constraint restricts how many instances of one entity set can participate in relationship instances with an instance of another entity set. There are four types of key constraints, as given below.


- One-to-one
- One-to-many
- Many-to-one
- Many-to-many

For example, the constraint "There should be at most one manager for a department" says that a department can have either zero or one manager, while a manager can manage any number of departments. Therefore there is a one-to-many relationship between Employees and Departments.

[Figure: ER diagram: Employees (name, ssn, lot) related to Departments (did, dname, budget) through Manages (since)]

2) Participation Constraints
A participation constraint specifies whether every entity (instance of an entity set) participates in the relationship. There are two types of participation constraint:
- Total participation
- Partial participation
If every entity of an entity set participates in the relationship, the participation of that entity set is said to be total; it is represented by a thick line. Otherwise, the participation is said to be partial, represented by a normal line. For example, the constraint "There should be at least one manager for every department" makes the participation of the Departments entity set in Manages total. This can be shown as follows.

[Figure: ER diagram: Employees (name, ssn, lot) related to Departments (did, dname, budget) through Manages (since), with total participation of Departments]

3) Weak Entities
An entity that cannot be uniquely identified by its own attributes, and instead depends on an attribute of another entity (the owner entity) to form a unique identifier, is called a weak entity. A weak entity set must satisfy the following restrictions:
i) The owner entity set and the weak entity set must participate in a one-to-many relationship set. This relationship set is called the identifying relationship set of the weak entity set.
ii) The weak entity set must have total participation in the identifying relationship set.
For example, Transactions can be defined as a weak entity set, since a transaction is identified by the Actno of its Account together with its own TrNo. Transactions satisfies both restrictions above.

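[Figure: ER diagram: Account and the weak entity set Transactions linked by the identifying relationship has; attributes shown: Actno, cname, Bal, TrNo, Amount]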

4) Class Hierarchies
A class hierarchy identifies the relationship between a generalized entity and its specialized entities. The generalized entity contains the attributes that are common to all its specialized classes, and each specialized entity contains the attributes that are specific to it. The relationship between these two types of entities is given by the ISA relationship. For example, consider the relationship between Employees and the different types of employees, i.e. Hourly_Emps and Contract_Emps. The Employees entity contains the common attributes ssn, name, and lot. The Hourly_Emps entity contains the specialized attributes hourly_wages and hours_worked. Similarly, Contract_Emps contains the attribute contractid. This relationship can be shown as follows.

5) Aggregation
A relationship set is an association between entity sets. Sometimes we have to model a relationship between a collection of entities and relationships. Aggregation allows us to indicate that a relationship set (identified through a dashed box) participates in another relationship set. For example, suppose we have an entity set called Projects, and each Projects entity is sponsored by one or more departments. The Sponsors relationship set captures this information. A department that sponsors a project might assign employees to monitor the sponsorship. Naturally, Monitors should be a relationship set that associates a Sponsors relationship (rather than a Projects or Departments entity) with an Employees entity. So far, relationships have associated only entities; aggregation is used to define a relationship set such as Monitors, which associates an entity with a relationship.


THE RELATIONAL MODEL


The main element used to represent data in the relational model is a relation (table). A relation consists of a relation schema and a relation instance. A relation instance is the table itself, and a relation schema describes the fields of the table. Each field (also called a column or attribute) has a name and an associated domain. A domain is referred to in a relation schema by the domain name and has a set of associated values. The degree, also called arity, of a relation is the number of fields. The cardinality of a relation instance is the number of tuples (records, or rows) in it. A relational database is a collection of relations with distinct relation names. The relational database schema is the collection of schemas for the relations in the database.

LOGICAL DATABASE DESIGN: ER TO RELATIONAL
Logical database design deals with representing high-level ER diagrams in the form of relations (tables). The steps involved in this design are as follows.

1) Entity Sets to Tables
An entity set is mapped to a relation in a straightforward way: each attribute of the entity set becomes an attribute of the table, and the identifier of the entity set becomes the primary key of the table. Consider the Employees entity set with attributes ssn, name, and lot.
[Figure: Employees entity set with attributes ssn, name, lot]
The following SQL statement captures the preceding information, including the domain constraints and key information:
    CREATE TABLE Employees (
        ssn  varchar2(11),
        name varchar2(30),
        lot  number(3),
        PRIMARY KEY (ssn)
    );

2) Relationship Sets (without Constraints) to Tables
A relationship set, like an entity set, is mapped to a relation in the relational model. To represent a relationship without constraints, the participating entity sets and the descriptive attributes of the relationship must be identified. Thus, the attributes of the relation include:
- The primary key attributes of each participating entity set, as foreign key fields.
- The descriptive attributes of the relationship set.
The set of non-descriptive attributes is a superkey for the relation. If there are no key constraints, this set of attributes is a candidate key. Consider the Works_In2 relationship set shown below. Each department has offices in several locations, and we want to record the locations at which each employee works.

The table for this relationship set can be created as follows.
    CREATE TABLE Works_In2 (
        ssn     varchar2(11),
        did     number(3),
        address varchar2(20),
        since   DATE,
        PRIMARY KEY (ssn, did, address),
        FOREIGN KEY (ssn) REFERENCES Employees,
        FOREIGN KEY (address) REFERENCES Locations,
        FOREIGN KEY (did) REFERENCES Departments
    );

3) Translating Relationship Sets with Key Constraints
If a relationship set involves several entity sets and some of them are linked via arrows in the ER diagram, there are two alternatives for translating it into relations. Consider the following relationship set with a key constraint.
[Figure: ER diagram: Employees (name, ssn, lot) related to Departments (did, dname, budget) through Manages (since)]

Alternative I
A relation can be created for the relationship set Manages by including the key attributes of the participating entity sets and the descriptive attribute of the relationship set. Because did alone is the primary key, each department can appear in at most one Manages tuple, which enforces the key constraint:
    CREATE TABLE Manages (
        ssn   varchar2(11),
        did   number(3),
        since DATE,
        PRIMARY KEY (did),
        FOREIGN KEY (ssn) REFERENCES Employees,
        FOREIGN KEY (did) REFERENCES Departments
    );
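The effect of making did the key of Manages (Alternative I) can be checked with a sketch that uses Python's sqlite3 module in place of the Oracle-style DBMS assumed in these notes, so varchar2/number become TEXT/INTEGER/REAL. The employee and department rows and the helper add_manager are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
con.executescript("""
    CREATE TABLE Employees   (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, budget REAL);
    -- Alternative I: did alone is the key of Manages.
    CREATE TABLE Manages (
        ssn   TEXT,
        did   INTEGER,
        since TEXT,
        PRIMARY KEY (did),
        FOREIGN KEY (ssn) REFERENCES Employees(ssn),
        FOREIGN KEY (did) REFERENCES Departments(did)
    );
    -- Hypothetical sample rows.
    INSERT INTO Employees   VALUES ('111', 'Ravi', 10), ('222', 'Asha', 20);
    INSERT INTO Departments VALUES (1, 'CS', 50000), (2, 'Maths', 30000);
""")


def add_manager(ssn, did, since):
    """Try to record that employee ssn manages department did."""
    try:
        with con:  # commits on success, rolls back on error
            con.execute("INSERT INTO Manages VALUES (?, ?, ?)", (ssn, did, since))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    add_manager('111', 1, '2007-01-01'),  # accepted: first manager of dept 1
    add_manager('111', 2, '2010-06-01'),  # accepted: one employee, many depts
    add_manager('222', 1, '2012-03-01'),  # rejected: dept 1 already managed
    add_manager('222', 3, '2012-03-01'),  # rejected: dept 3 does not exist
]
print(results)  # [True, True, False, False]
```

The third insert fails on the primary key (the key constraint), the fourth on the foreign key to Departments.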

Alternative II
Because each department has at most one manager, the relationship set Manages can be combined with the entity set Departments into a single relation:
    CREATE TABLE Dept_Mgr (
        did    number(3),
        dname  varchar2(20),
        budget number(6),
        ssn    varchar2(11),
        since  DATE,
        PRIMARY KEY (did),
        FOREIGN KEY (ssn) REFERENCES Employees
    );

4) Translating Relationship Sets with Participation Constraints
Every department is required to have a manager, due to the participation constraint, and at most one manager, due to the key constraint. The following SQL statement reflects the second translation approach.

[Figure: ER diagram: Employees (name, ssn, lot) related to Departments (did, dname, budget) through Manages (since), with total participation of Departments]

    CREATE TABLE Dept_Mgr (
        did    number(3),
        dname  varchar2(20),
        budget number(6),
        ssn    varchar2(11) NOT NULL,
        since  DATE,
        PRIMARY KEY (did),
        FOREIGN KEY (ssn) REFERENCES Employees ON DELETE NO ACTION
    );
The NO ACTION specification, which is the default and need not be specified explicitly, ensures that an Employees tuple cannot be deleted while a Dept_Mgr tuple points to it. The NOT NULL constraint ensures that each department has a manager.

5) Translating Weak Entity Sets
A weak entity set always participates in a one-to-many binary relationship and has a key constraint and total participation. Consider the relationship between the Transactions and Account entity sets. Transactions can be defined as a weak entity set, since it depends on Actno of the Account entity set for unique identification, along with its own TrNo; it satisfies both weak-entity restrictions. The following command translates the weak entity set into a table.
    CREATE TABLE Transaction (
        Trno   number(4),
        AcctNo number(5),
        Amt    number(7,2),
        PRIMARY KEY (Trno, AcctNo),
        FOREIGN KEY (AcctNo) REFERENCES Account ON DELETE CASCADE
    );
The CASCADE option ensures that when an Account tuple is deleted, all Transaction tuples that refer to it are deleted as well.

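[Figure: ER diagram: Account and the weak entity set Transactions linked by the identifying relationship has; attributes shown: Actno, cname, Bal, TrNo, Amount]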
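How NOT NULL, ON DELETE NO ACTION and ON DELETE CASCADE behave in sections 4 and 5 can be seen in a sketch that again uses sqlite3 as a stand-in DBMS with hypothetical sample rows. Two assumptions: the weak entity table is named Transactions here, to avoid the SQL keyword TRANSACTION, and Amt is a plain REAL. The helper rejected is hypothetical; it only reports whether the DBMS refused a statement.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
con.executescript("""
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    -- Section 4: did is the key (at most one manager), ssn is NOT NULL
    -- (at least one manager), NO ACTION blocks deleting a referenced manager.
    CREATE TABLE Dept_Mgr (
        did    INTEGER PRIMARY KEY,
        dname  TEXT,
        budget REAL,
        ssn    TEXT NOT NULL,
        since  TEXT,
        FOREIGN KEY (ssn) REFERENCES Employees(ssn) ON DELETE NO ACTION
    );
    CREATE TABLE Account (AcctNo INTEGER PRIMARY KEY, cname TEXT, Bal REAL);
    -- Section 5: weak entity keyed by (TrNo, AcctNo), deleted with its owner.
    CREATE TABLE Transactions (
        TrNo   INTEGER,
        AcctNo INTEGER,
        Amt    REAL,
        PRIMARY KEY (TrNo, AcctNo),
        FOREIGN KEY (AcctNo) REFERENCES Account(AcctNo) ON DELETE CASCADE
    );
    -- Hypothetical sample rows.
    INSERT INTO Employees VALUES ('111', 'Ravi', 10);
    INSERT INTO Dept_Mgr VALUES (1, 'CS', 50000, '111', '2007-01-01');
    INSERT INTO Account VALUES (501, 'Asha', 12000), (502, 'Ravi', 8000);
    INSERT INTO Transactions VALUES (1, 501, 500), (2, 501, -200), (1, 502, 1000);
""")


def rejected(sql):
    """Run one statement in its own transaction; True if the DBMS refuses it."""
    try:
        with con:
            con.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# NOT NULL: a department cannot be stored without a manager.
no_manager = rejected("INSERT INTO Dept_Mgr VALUES (2, 'Maths', 30000, NULL, NULL)")
# NO ACTION: an employee who still manages a department cannot be deleted.
delete_manager = rejected("DELETE FROM Employees WHERE ssn = '111'")
# CASCADE: deleting account 501 also deletes both of its transactions.
cascade_ok = not rejected("DELETE FROM Account WHERE AcctNo = 501")
left = con.execute("SELECT TrNo, AcctNo FROM Transactions").fetchall()
print(no_manager, delete_manager, cascade_ok, left)  # True True True [(1, 502)]
```

Note that TrNo 1 occurs for both accounts; only the pair (TrNo, AcctNo) identifies a transaction, which is exactly why Transactions is weak.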

6) Translating Class Hierarchies
There are two basic approaches to handling ISA hierarchies; we apply them to the Employees hierarchy (refer to the Class Hierarchy diagram).
1. Map each of the entity sets Employees, Hourly_Emps, and Contract_Emps to a different relation. The Employees relation is created in the same way as for any other entity set. The relation for Hourly_Emps includes the hourly_wages and hours_worked attributes of Hourly_Emps. It also contains the key attributes of the superclass (ssn, in this example), which serve as the primary key for Hourly_Emps. Contract_Emps is handled similarly.
2. Alternatively, create only two relations, corresponding to Hourly_Emps and Contract_Emps. The relation for Hourly_Emps includes all the attributes of Hourly_Emps as well as all the attributes of Employees (i.e., ssn, name, lot, hourly_wages, hours_worked).

7) Translating ER Diagrams with Aggregation
Translating aggregation into the relational model is easy because there is no real distinction between entities and relationships in the relational model. Consider the ER diagram for aggregation (refer to Aggregation for the diagram).
- The Employees, Projects, and Departments entity sets and the Sponsors relationship set are mapped as described in the previous sections.
- For the Sponsors relationship set, the relation includes the key attribute of Projects (pid), the key attribute of Departments (did), and the descriptive attribute of Sponsors (since). The primary key is (pid, did).
- For the Monitors relationship set, we create a relation with the following attributes: the key attribute of Employees (ssn), the key attributes of Sponsors (did, pid), and the descriptive attribute of Monitors (until).
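The first ISA approach and the aggregation translation can be sketched in sqlite3 as well. The Projects columns (pid, pbudget), the helper monitor and all sample rows are hypothetical; the attribute lists for Employees, Hourly_Emps, Contract_Emps, Sponsors and Monitors follow the text. The composite foreign key from Monitors to Sponsors is what ties each monitoring assignment to an existing sponsorship.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
con.executescript("""
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    -- Approach 1: each subclass relation stores its own attributes plus the
    -- superclass key ssn, which is both its primary key and a foreign key.
    CREATE TABLE Hourly_Emps (
        ssn          TEXT PRIMARY KEY REFERENCES Employees(ssn) ON DELETE CASCADE,
        hourly_wages REAL,
        hours_worked INTEGER
    );
    CREATE TABLE Contract_Emps (
        ssn        TEXT PRIMARY KEY REFERENCES Employees(ssn) ON DELETE CASCADE,
        contractid TEXT
    );
    CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, budget REAL);
    CREATE TABLE Projects    (pid INTEGER PRIMARY KEY, pbudget REAL);
    CREATE TABLE Sponsors (
        pid   INTEGER REFERENCES Projects(pid),
        did   INTEGER REFERENCES Departments(did),
        since TEXT,
        PRIMARY KEY (pid, did)
    );
    -- Aggregation: Monitors refers to a Sponsors row as a whole.
    CREATE TABLE Monitors (
        ssn   TEXT REFERENCES Employees(ssn),
        pid   INTEGER,
        did   INTEGER,
        until TEXT,
        PRIMARY KEY (ssn, pid, did),
        FOREIGN KEY (pid, did) REFERENCES Sponsors(pid, did)
    );
    -- Hypothetical sample rows.
    INSERT INTO Employees   VALUES ('111', 'Ravi', 10), ('222', 'Asha', 20);
    INSERT INTO Hourly_Emps VALUES ('111', 250.0, 40);
    INSERT INTO Departments VALUES (1, 'CS', 50000), (2, 'Maths', 30000);
    INSERT INTO Projects    VALUES (7, 90000);
    INSERT INTO Sponsors    VALUES (7, 1, '2020-01-01');
""")

# Approach 1 needs a join to see all attributes of an hourly employee.
hourly = con.execute("""
    SELECT e.ssn, e.name, e.lot, h.hourly_wages, h.hours_worked
    FROM Employees AS e JOIN Hourly_Emps AS h ON e.ssn = h.ssn
""").fetchall()


def monitor(ssn, pid, did, until):
    """Try to assign employee ssn to monitor the sponsorship (pid, did)."""
    try:
        with con:
            con.execute("INSERT INTO Monitors VALUES (?, ?, ?, ?)",
                        (ssn, pid, did, until))
        return True
    except sqlite3.IntegrityError:
        return False


ok = monitor('222', 7, 1, '2025-12-31')   # department 1 sponsors project 7
bad = monitor('222', 7, 2, '2025-12-31')  # department 2 does not
print(hourly, ok, bad)  # [('111', 'Ravi', 10, 250.0, 40)] True False
```

The join in the first query is the usual cost of approach 1; approach 2 would avoid it by repeating ssn, name and lot in each subclass relation.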


VIEWS
A view is a logical table whose rows are not explicitly stored in the database but are computed as needed from a view definition. The SQL CREATE VIEW statement adds a new view to a database. A view can be queried with the SQL SELECT statement just like a table. A view is built by selecting data from one or more tables. The syntax for creating a view is:
    CREATE [OR REPLACE] VIEW <view name> [(<column names>)] AS <select statement>
For example:
    CREATE VIEW Emp_view AS SELECT empno, ename, sal FROM emp;
If we insert a record through Emp_view, the values for empno, ename, and sal are inserted into the base table emp. The remaining columns, which are not in the view definition, will contain NULL values. Changes made to the base tables are reflected in the view and vice versa.
SQL views are used because they provide the following benefits:
- Database queries are simplified.
- Database complexity is hidden.
- Flexibility is increased: queries on views may not need to change when the underlying tables change.
- Security is increased: sensitive information can be excluded from a view.
Some views can also support the SQL INSERT, UPDATE, and DELETE statements. In that case, the view must include all NOT NULL columns of the underlying table. If the view definition does not include a field such as the primary key of the table, an insertion through the view would force the table to store a record with a NULL primary key. The DBMS does not allow this type of insertion.
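A small sketch of Emp_view, using sqlite3 with a hypothetical emp table (empno, ename, sal, deptno, comm) and sample rows. Two SQLite differences matter here: it has no CREATE OR REPLACE VIEW, and its views are read-only unless INSTEAD OF triggers are defined, so the INSERT through the view is refused. A DBMS with updatable views, as described above, would instead insert into emp.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical base table and sample rows.
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, sal REAL,
                      deptno INTEGER, comm REAL);
    INSERT INTO emp VALUES (7369, 'SMITH', 800, 20, NULL),
                           (7499, 'ALLEN', 1600, 30, 300);
    -- The view exposes three columns; deptno and comm stay hidden.
    CREATE VIEW Emp_view AS SELECT empno, ename, sal FROM emp;
""")

# A view is queried like a table; its rows are computed from emp on demand.
rows = con.execute("SELECT * FROM Emp_view ORDER BY empno").fetchall()

# A change made to the base table is visible through the view at once.
con.execute("UPDATE emp SET sal = 900 WHERE empno = 7369")
new_sal = con.execute("SELECT sal FROM Emp_view WHERE empno = 7369").fetchone()[0]

# SQLite views are read-only unless INSTEAD OF triggers are defined,
# so this INSERT through the view is refused.
try:
    con.execute("INSERT INTO Emp_view VALUES (7521, 'WARD', 1250)")
    view_insert_ok = True
except sqlite3.DatabaseError:
    view_insert_ok = False

print(rows, new_sal, view_insert_ok)
# [(7369, 'SMITH', 800.0), (7499, 'ALLEN', 1600.0)] 900.0 False
```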


SCHEMA REFINEMENT AND NORMAL FORMS


PROBLEMS CAUSED BY REDUNDANCY
Storing the same information redundantly, that is, in more than one place within a database, can lead to several problems:
i) Redundant storage: Some information is stored repeatedly.
ii) Update anomalies: If one copy of such repeated data is updated, an inconsistency is created unless all copies are similarly updated.
iii) Insertion anomalies: It may not be possible to store some information unless some other information is stored as well.
iv) Deletion anomalies: It may not be possible to delete some information without losing some other information.
Consider the following schema:
Hourly_Emps (ssn, name, lot, rating, hourly_wages, hours_worked)
The key for Hourly_Emps is ssn. Suppose that the hourly_wages attribute is determined by the rating attribute. This functional dependency leads to possible redundancy: if the same rating value appears in more than one tuple, the same hourly_wages value must appear in each of those tuples as well. Updating one of the repeated values may lead to inconsistency, so the update has to be applied to every repeated value. The rating and hourly_wages values cannot be inserted unless there is an employee with that rating, because ssn is the primary key and does not allow null values. Deleting the last employee with a given rating also deletes the information about the hourly_wages for that rating.

Use of Decomposition
We can use the concept of functional dependency to refine the above schema. We can deal with the redundancy in Hourly_Emps by decomposing it into two relations:
Hourly_Emps2 (ssn, name, lot, rating, hours_worked)
Wages (rating, hourly_wages)
These two schemas avoid all of the anomalies listed above.

FUNCTIONAL DEPENDENCIES
A functional dependency (FD) is a kind of integrity constraint (IC) that generalizes the concept of a key. Let R be a relation schema and let X and Y be nonempty sets of attributes in R.
We say that an instance r of R satisfies the FD $X \to Y$ if the following holds for every pair of tuples $t_1$ and $t_2$ in r: if $t_1.X = t_2.X$, then $t_1.Y = t_2.Y$. Consider the following example.

    A    B    C    D
    a1   b1   c1   d1
    a1   b1   c1   d2
    a1   b2   c2   d1
    a2   b1   c3   d1
The above instance satisfies $AB \to C$: the two tuples that agree on A and B (a1, b1) also agree on C (c1).

EXAMPLES MOTIVATING SCHEMA REFINEMENT
Schemas derived from ER diagrams can still have redundancy problems, because ER design is a complex process and certain constraints cannot be expressed in ER diagrams. The following examples explain the need for refinement.
1) Constraints on an Entity Set
Consider the Hourly_Emps relation. The constraint that attribute ssn is a key can be expressed as an FD:
$\{ssn\} \to \{ssn, name, lot, rating, hourly\_wages, hours\_worked\}$
Using a single letter to denote each attribute, we write this FD as $S \to SNLRWH$. Suppose there is an additional constraint that the hourly_wages attribute is determined by rating, i.e. the FD $R \to W$. This dependency leads to redundant storage of hourly_wages for each rating value. It cannot be expressed in terms of the ER model: only FDs that determine all attributes of a relation (i.e., key constraints) can be expressed in the ER model. This constraint identifies the need to decompose the relation into two relations, with the dependencies $S \to SNLRH$ and $R \to W$.
2) Constraints on a Relationship Set
Suppose that we have entity sets Parts, Suppliers, and Departments, as well as a relationship set Contracts that involves all of them. We refer to the schema for Contracts as CQPSD. A contract with contract id C specifies that a supplier S will supply some quantity Q of a part P to a department D. Suppose there is a constraint that a department purchases at most one part from any given supplier. Then, if there are several contracts between the same supplier and department, the same part must be involved in all of them. This constraint is the FD $DS \to P$. We can address this situation by decomposing Contracts into two relations with attributes CQSD and SDP.
3) Identifying Attributes of Entities
Consider the relationship Works_In between Employees and Departments.


The above ER diagram results in two schemas:
Workers (ssn, name, lot, did, since)
Departments (did, dname, budget)
Suppose that employees are assigned parking lots based on their department, and that all employees in a given department are assigned the same lot. This constraint cannot be expressed in the ER diagram. The functional dependency $did \to lot$ leads us to remove the lot attribute from the Employees entity set and add it as an attribute of the Departments entity set. This can be shown in the following diagram.

REASONING ABOUT FUNCTIONAL DEPENDENCIES
1) Closure of a Set of FDs
The set of all FDs implied by a given set F of FDs is called the closure of F and is denoted $F^+$. The following three rules, called Armstrong's Axioms, can be applied repeatedly to infer all FDs implied by a set F of FDs. We use X, Y, and Z to denote sets of attributes over a relation R.
- Reflexivity: If $X \supseteq Y$, then $X \to Y$.
- Augmentation: If $X \to Y$, then $XZ \to YZ$ for any Z.
- Transitivity: If $X \to Y$ and $Y \to Z$, then $X \to Z$.
It is convenient to use some additional rules while reasoning about $F^+$:
- Union: If $X \to Y$ and $X \to Z$, then $X \to YZ$.
- Decomposition: If $X \to YZ$, then $X \to Y$ and $X \to Z$.
2) Attribute Closure
The attribute closure $X^+$ of a set X of attributes with respect to F is the set of attributes A such that $X \to A$ can be inferred using Armstrong's Axioms. The algorithm for computing the attribute closure of X is given below.
    closure = X;
    repeat until there is no change: {
        if there is an FD U → V in F such that U ⊆ closure,
        then set closure = closure ∪ V
    }
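The attribute closure algorithm translates almost line for line into Python. In this sketch (an illustration, not a library API; all function names are hypothetical), attribute sets are strings of one-letter names as in the Hourly_Emps example ($S \to SNLRWH$, $R \to W$), and each FD is a (left side, right side) pair. It also includes a direct check of the FD definition against a concrete instance, using the A B C D instance given earlier.

```python
def attribute_closure(x, fds):
    """Return X+: all attributes functionally determined by X under fds."""
    closure = set(x)
    changed = True
    while changed:                      # repeat until there is no change
        changed = False
        for lhs, rhs in fds:
            # An FD U -> V fires when U is already inside the closure.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(fds, x, y):
    """F implies X -> Y exactly when Y is a subset of X+."""
    return set(y) <= attribute_closure(x, fds)


def satisfies(rows, x, y):
    """Check the FD definition on an instance: equal X-values force equal Y-values."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hourly_Emps: S -> SNLRWH (ssn is the key) and R -> W (rating fixes wages).
F = [("S", "SNLRWH"), ("R", "W")]
print(sorted(attribute_closure("R", F)))                 # ['R', 'W']
print(implies(F, "S", "SNLRWH"), implies(F, "R", "S"))   # True False

# The four-tuple instance over A, B, C, D from the FD example.
r = [dict(zip("ABCD", t)) for t in [("a1", "b1", "c1", "d1"), ("a1", "b1", "c1", "d2"),
                                     ("a1", "b2", "c2", "d1"), ("a2", "b1", "c3", "d1")]]
print(satisfies(r, "AB", "C"), satisfies(r, "AB", "D"))  # True False
```

Since $S^+$ contains every attribute, S is a key; $R^+ = \{R, W\}$ does not, which is why $R \to W$ causes redundancy. Note the difference between the two checks: implies reasons about all instances allowed by F, while satisfies only inspects one instance.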


DECOMPOSITION
A decomposition of a relation schema R replaces R by two or more relation schemas, each containing a subset of the attributes of R, that together include all attributes of R. A decomposition should satisfy two properties:
1) Lossless-Join Decomposition.
2) Dependency-Preserving Decomposition.

1) Lossless-Join Decomposition. Let R be a relation schema and let F be a set of FDs over R. A decomposition of R into two schemas with attribute sets X and Y is said to be a lossless-join decomposition with respect to F if, for every instance r of R that satisfies the dependencies in F, $\pi_X(r) \bowtie \pi_Y(r) = r$. Consider the following lossy decomposition of an instance r of R(S, P, D) into SP and PD.

Instance r
S    P    D
s1   p1   d1
s2   p2   d2
s3   p1   d3

$\pi_{SP}(r)$
S    P
s1   p1
s2   p2
s3   p1

$\pi_{PD}(r)$
P    D
p1   d1
p2   d2
p1   d3

$\pi_{SP}(r) \bowtie \pi_{PD}(r)$
S    P    D
s1   p1   d1
s2   p2   d2
s3   p1   d3
s1   p1   d3
s3   p1   d1
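The lossy example can be reproduced with a few lines of Python. The sketch below represents tuples as dicts, implements projection and a simple nested-loop natural join (the helper names are illustrative, not from any library), and reports the tuples of the join that are not in r.

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, str]
ATTRS = ["S", "P", "D"]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Projection: keep only attrs and drop duplicate tuples (first occurrence wins)."""
    seen: Set[Tuple[str, ...]] = set()
    out: List[Row] = []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Nested-loop natural join on the attributes the two relations share."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [{**lt, **rt} for lt in left for rt in right
            if all(lt[a] == rt[a] for a in common)]


def as_set(rows: List[Row]) -> Set[Tuple[str, ...]]:
    return {tuple(row[a] for a in ATTRS) for row in rows}


r = [
    {"S": "s1", "P": "p1", "D": "d1"},
    {"S": "s2", "P": "p2", "D": "d2"},
    {"S": "s3", "P": "p1", "D": "d3"},
]
joined = natural_join(project(r, ["S", "P"]), project(r, ["P", "D"]))
spurious = as_set(joined) - as_set(r)
print(len(joined), sorted(spurious))
# 5 [('s1', 'p1', 'd3'), ('s3', 'p1', 'd1')]
```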

Replacing r by these two projections loses information: the join contains the tuples (s1, p1, d3) and (s3, p1, d1), which are not in r, so we can no longer tell which tuples r actually contained. Every decomposition used to eliminate redundancy must therefore be lossless.

2) Dependency-Preserving Decomposition. A dependency-preserving decomposition allows us to enforce all FDs by examining a single relation instance on each insertion or modification. Let $F_X$ (the projection of F on X) denote the set of FDs $U \to V$ in $F^+$ such that U and V both contain only attributes of X, and define $F_Y$ similarly. The decomposition of relation schema R with FDs F into schemas with attribute sets X and Y is dependency-preserving if $(F_X \cup F_Y)^+ = F^+$.

That is, if we take the dependencies in $F_X$ and $F_Y$ and compute the closure of their union, we get back all dependencies in the closure of F. Therefore, we need to enforce only the dependencies in $F_X$ and $F_Y$; all FDs in $F^+$ are then sure to be satisfied.

NORMALIZATION
Normalization is the process of efficiently organizing data in a database. The normalization process has two goals: eliminating redundant data (for example, storing the same data in more than one table) and ensuring that data dependencies make sense (only storing related data in a table).

First Normal Form (1NF): A relation is in first normal form (1NF) if the values in the relation are atomic for every attribute in the relation. That means no attribute value can be a set of values or, as it is sometimes expressed, a repeating group. Consider the following table WORKER. It is not in 1NF because BLDG-ID is not atomic: for a given tuple, BLDG-ID can have multiple values.

WORKER
WORKER-ID  NAME     SKILL-TYPE  SUPV-ID  BLDG-ID
1234       Faraday  Electric    1322     {355, 766}
1456       James    Plumbing             {355, 908, 666}
7600       Sania    Electric             455

The above table, which is not in 1NF, can be converted into the following table, which satisfies 1NF.

WORKER
WORKER-ID  NAME     SKILL-TYPE  SUPV-ID  BLDG-ID
1234       Faraday  Electric    1322     355
1234       Faraday  Electric    1322     766
1456       James    Plumbing             355
1456       James    Plumbing             908
1456       James    Plumbing             666
7600       Sania    Electric             455
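Converting the WORKER table to 1NF amounts to emitting one row per value of the set-valued attribute. In this sketch, the repeating group is modelled as a Python list, the missing SUPV-ID values are assumed to be `None`, and `to_1nf` is a hypothetical helper name.

```python
from typing import Dict, List, Union

Cell = Union[int, str, None, List[int]]


def to_1nf(rows: List[Dict[str, Cell]], multi_attr: str) -> List[Dict[str, Cell]]:
    """Return one row per value of multi_attr, copying every other column."""
    flat: List[Dict[str, Cell]] = []
    for row in rows:
        values = row[multi_attr]
        # a single atomic value is treated as a one-element group
        for v in (values if isinstance(values, list) else [values]):
            flat.append({**row, multi_attr: v})
    return flat


worker: List[Dict[str, Cell]] = [
    {"WORKER-ID": 1234, "NAME": "Faraday", "SKILL-TYPE": "Electric",
     "SUPV-ID": 1322, "BLDG-ID": [355, 766]},
    {"WORKER-ID": 1456, "NAME": "James", "SKILL-TYPE": "Plumbing",
     "SUPV-ID": None, "BLDG-ID": [355, 908, 666]},
    {"WORKER-ID": 7600, "NAME": "Sania", "SKILL-TYPE": "Electric",
     "SUPV-ID": None, "BLDG-ID": 455},
]

for row in to_1nf(worker, "BLDG-ID"):
    print(row["WORKER-ID"], row["NAME"], row["SKILL-TYPE"], row["SUPV-ID"], row["BLDG-ID"])
```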

Second Normal Form (2NF): A relation is in second normal form (2NF) if and only if it is in 1NF and no nonkey attribute is functionally dependent on just a part of the key. Thus, 2NF can be violated only when the key is a composite key. Consider the following relation, whose key is (WORKER-ID, BLDG-ID).

ASSIGNMENT (WORKER-ID, BLDG-ID, START-DATE, NAME)
WORKER-ID  BLDG-ID  START-DATE  NAME
1235       312      10/10/04    Faraday
1412       312      10/01/04    James
1235       515      10/07/04    Faraday
1412       460      12/08/05    James
1412       435      01/02/04    James

The above relation is not in 2NF, since the nonkey attribute NAME depends on only part of the key (WORKER-ID). This leads to the following problems.
1. The worker's name is repeated in every row that records an assignment for that worker.
2. If the name of a worker changes, every row recording an assignment of that worker must be updated; otherwise, the data becomes inconsistent. This is an update anomaly.
3. If a worker has no assignment, there may be no row in which to store the worker's name. This is an insertion anomaly.
To solve these problems, the relation can be decomposed into the following two relation schemas, both of which are in 2NF.
ASSIGNMENT(WORKER-ID, BLDG-ID, START-DATE)  Foreign key: WORKER-ID REFERENCES WORKER
WORKER(WORKER-ID, NAME)

ASSIGNMENT
WORKER-ID  BLDG-ID  START-DATE
1235       312      10/10/04
1412       312      10/01/04
1235       515      10/07/04
1412       460      12/08/05
1412       435      01/02/04

WORKER
WORKER-ID  NAME
1235       Faraday
1412       James
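The 2NF decomposition of ASSIGNMENT can be checked on the sample data: projecting onto (WORKER-ID, BLDG-ID, START-DATE) and (WORKER-ID, NAME) stores each name once, and joining the two projections on WORKER-ID gives back the original rows. This is a sketch over the five rows shown above; rows are plain Python tuples and `project` is an illustrative helper.

```python
from typing import List, Sequence, Tuple

Row = Tuple[object, ...]

# ASSIGNMENT(WORKER-ID, BLDG-ID, START-DATE, NAME), copied from the table above
assignment: List[Row] = [
    (1235, 312, "10/10/04", "Faraday"),
    (1412, 312, "10/01/04", "James"),
    (1235, 515, "10/07/04", "Faraday"),
    (1412, 460, "12/08/05", "James"),
    (1412, 435, "01/02/04", "James"),
]


def project(rows: List[Row], cols: Sequence[int]) -> List[Row]:
    """Projection by column position; dict.fromkeys drops duplicates but keeps order."""
    return list(dict.fromkeys(tuple(row[c] for c in cols) for row in rows))


new_assignment = project(assignment, [0, 1, 2])  # (WORKER-ID, BLDG-ID, START-DATE)
worker = project(assignment, [0, 3])             # (WORKER-ID, NAME)
print(worker)  # [(1235, 'Faraday'), (1412, 'James')]

# Join back on WORKER-ID: WORKER-ID is the key of WORKER, so a dict lookup suffices
name_of = dict(worker)
rejoined = {(w, b, d, name_of[w]) for (w, b, d) in new_assignment}
print(rejoined == set(assignment))  # True
```

Joining through a dictionary works here only because WORKER-ID is the key of WORKER, so each WORKER-ID maps to exactly one NAME.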

Third Normal Form (3NF): A relation is in third normal form (3NF) if and only if it is in 2NF and has no transitive dependency. A transitive dependency occurs when a nonkey attribute is functionally dependent on one or more other nonkey attributes. Consider the following relation.

WORKER
WORKER-ID  SKILL-TYPE  BONUS-RATE
1235       Electric    3.50
1412       Plumbing    3.00
1311       Electric    3.50
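The FDs of this WORKER relation can be tested against its key with the attribute-closure algorithm from the section on reasoning about FDs. The sketch below uses the usual textbook formulation of the 3NF test: an FD $X \to A$ is allowed if X is a superkey (its closure contains every attribute) or A belongs to some candidate key. The caller supplies the attributes that belong to a key; `violations_3nf` and the other names are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (X, Y) stands for X -> Y


def closure(attrs: Set[str], fds: List[FD]) -> Set[str]:
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(schema: Set[str], fds: List[FD], key_attrs: Set[str]) -> List[FD]:
    """FDs X -> Y where X is not a superkey and Y has an attribute outside every key."""
    bad: List[FD] = []
    for lhs, rhs in fds:
        if closure(set(lhs), fds) >= schema:
            continue  # X is a superkey
        if rhs - lhs <= key_attrs:
            continue  # every non-trivial attribute on the right is prime
        bad.append((lhs, rhs))
    return bad


def fd(lhs: str, rhs: str) -> FD:
    return frozenset([lhs]), frozenset([rhs])


F = [fd("WORKER-ID", "SKILL-TYPE"),
     fd("WORKER-ID", "BONUS-RATE"),
     fd("SKILL-TYPE", "BONUS-RATE")]
worker = {"WORKER-ID", "SKILL-TYPE", "BONUS-RATE"}

for lhs, rhs in violations_3nf(worker, F, key_attrs={"WORKER-ID"}):
    print(sorted(lhs), "->", sorted(rhs))  # ['SKILL-TYPE'] -> ['BONUS-RATE']

# R1 and R2 with their FDs written out by hand: no violations remain
print(violations_3nf({"WORKER-ID", "SKILL-TYPE"}, F[:1], {"WORKER-ID"}))   # []
print(violations_3nf({"SKILL-TYPE", "BONUS-RATE"}, F[2:], {"SKILL-TYPE"}))  # []
```

Dropping the prime-attribute exception, so that every determinant must be a superkey, turns the same function into a BCNF test.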

Since WORKER-ID is the key, $\text{WORKER-ID} \to \text{SKILL-TYPE}$ and $\text{WORKER-ID} \to \text{BONUS-RATE}$ are functional dependencies of this relation. However, $\text{SKILL-TYPE} \to \text{BONUS-RATE}$ is also a functional dependency, so the nonkey attribute BONUS-RATE depends on the key only transitively, through the nonkey attribute SKILL-TYPE. This violates 3NF and leads to update, insertion and deletion anomalies. The relation can be converted into two 3NF relations as follows.
R1(WORKER-ID, SKILL-TYPE)
R2(SKILL-TYPE, BONUS-RATE)

Boyce-Codd Normal Form (BCNF):

A relation is in BCNF if and only if every determinant is a candidate key. The 3NF criterion looks only for transitive dependencies, in which a nonkey attribute is functionally dependent on one or more other nonkey attributes, so it does not handle two cases:
1. A nonkey attribute is dependent on a key attribute that is part of a composite key (the criterion for non-2NF relations).
2. A key attribute that is part of a composite key is dependent on a nonkey attribute.
BCNF handles both of these cases. Thus, if a relation is in BCNF, then it is also in 3NF.

Fourth Normal Form (4NF): A relation is in fourth normal form (4NF) if and only if it is in BCNF and has no nontrivial multivalued dependencies. A multivalued dependency (MVD) is a condition that enforces attribute independence by requiring values to be duplicated. In the FACULTY relation below, $\text{FNAME} \twoheadrightarrow \text{COMMITTEE}$ and $\text{FNAME} \twoheadrightarrow \text{COURSE}$: each committee of a faculty member must appear together with every course that member teaches.

FACULTY
FNAME  COMMITTEE    COURSE
Jones  Admissions   MCA
Jones  Scholarship  MCA
Jones  Admissions   MBA
Jones  Scholarship  MBA

The FNAME value is repeated in every row, and each committee is repeated once for every course (and vice versa), because COMMITTEE and COURSE are independent of each other. To remove this redundancy, each multivalued attribute is placed in a relation by itself: FAC-COMM, with key (FNAME, COMMITTEE), and FAC-COURSE, with key (FNAME, COURSE).

FAC-COMM
FNAME  COMMITTEE
Jones  Admissions
Jones  Scholarship

FAC-COURSE
FNAME  COURSE
Jones  MCA
Jones  MBA
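An MVD $X \twoheadrightarrow Y$ holds in a relation R exactly when R equals the join of its projection onto XY with its projection onto X together with the remaining attributes. The sketch below applies this test to the FACULTY rows (as Python tuples); it also shows that deleting one row breaks the MVD, which is why the duplication is forced. `mvd_holds` is a hypothetical helper written for this three-column relation only.

```python
from itertools import product
from typing import Set, Tuple

Row = Tuple[str, str, str]  # (FNAME, COMMITTEE, COURSE)

faculty: Set[Row] = {
    ("Jones", "Admissions", "MCA"),
    ("Jones", "Scholarship", "MCA"),
    ("Jones", "Admissions", "MBA"),
    ("Jones", "Scholarship", "MBA"),
}


def mvd_holds(rows: Set[Row]) -> bool:
    """FNAME ->> COMMITTEE holds iff rows equal FAC-COMM joined with FAC-COURSE on FNAME."""
    fac_comm = {(f, c) for f, c, _ in rows}    # projection on (FNAME, COMMITTEE)
    fac_course = {(f, k) for f, _, k in rows}  # projection on (FNAME, COURSE)
    joined = {(f, c, k)
              for (f, c), (g, k) in product(fac_comm, fac_course) if f == g}
    return joined == rows


print(mvd_holds(faculty))  # True: the 4NF decomposition is lossless

# Dropping one row breaks the MVD: the join would bring the row back
print(mvd_holds(faculty - {("Jones", "Scholarship", "MBA")}))  # False
```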

Fifth Normal Form (5NF): Fifth normal form (5NF) eliminates anomalies that result from a special type of constraint called a join dependency (JD). A JD $\bowtie\{R_1, R_2, \ldots, R_n\}$ holds over R if every legal instance r of R equals $\pi_{R_1}(r) \bowtie \pi_{R_2}(r) \bowtie \cdots \bowtie \pi_{R_n}(r)$. A relation schema R is said to be in fifth normal form (5NF) if, for every JD $\bowtie\{R_1, R_2, \ldots, R_n\}$ that holds over R, one of the following statements is true:
- $R_i = R$ for some i, or
- the JD is implied by the set of those FDs over R in which the left side is a key for R.
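On a given instance, a JD can be tested by projecting onto each component and joining the projections back together. The sketch below does this for a hypothetical relation R(A, B, C) with made-up values; the instance satisfies the three-way JD $\bowtie\{AB, BC, AC\}$ even though the binary decomposition into AB and BC is lossy. Tuples are modelled as frozensets of (attribute, value) pairs so that they can be stored in Python sets; all helper names are illustrative.

```python
from functools import reduce
from typing import FrozenSet, Iterable, List, Sequence, Set, Tuple

Row = FrozenSet[Tuple[str, str]]  # a tuple as a set of (attribute, value) pairs


def make_rows(attrs: Sequence[str], tuples: Iterable[Sequence[str]]) -> Set[Row]:
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(rows: Set[Row], attrs: Set[str]) -> Set[Row]:
    return {frozenset(p for p in row if p[0] in attrs) for row in rows}


def natural_join(left: Set[Row], right: Set[Row]) -> Set[Row]:
    out: Set[Row] = set()
    for lt in left:
        for rt in right:
            merged = dict(lt)
            # shared attributes must agree; attributes only in rt are always accepted
            if all(merged.get(a, v) == v for a, v in rt):
                merged.update(rt)
                out.add(frozenset(merged.items()))
    return out


def jd_holds(rows: Set[Row], components: List[Set[str]]) -> bool:
    """True if rows equal the join of their projections onto the components."""
    return reduce(natural_join, [project(rows, c) for c in components]) == rows


ATTRS = ("A", "B", "C")
r = make_rows(ATTRS, [("a1", "b1", "c2"), ("a1", "b2", "c1"),
                      ("a2", "b1", "c1"), ("a1", "b1", "c1")])
jd = [{"A", "B"}, {"B", "C"}, {"A", "C"}]

print(jd_holds(r, jd))        # True
print(jd_holds(r, jd[:2]))    # False: joining only AB and BC adds (a2, b1, c2)
r_minus = r - make_rows(ATTRS, [("a1", "b1", "c1")])
print(jd_holds(r_minus, jd))  # False: the JD forces (a1, b1, c1) back in
```

As with FDs, an instance can only confirm or refute a JD for that data; deciding whether R is in 5NF requires knowing which JDs hold over the schema.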
