
Chapter VI Logical Design and Relational Data Model

Logical Database Schema


The relational model gives us a single way to represent data: as a two-dimensional table called a relation.

1. Schema

Figure 1

The name of a relation together with the set of attributes of that relation is called the schema for that relation. To show the schema, write the relation name followed by a parenthesized list of its attributes. Using Figure 1 above, we can form the schema:

Movies (Title, Year, Length, Film Type)

The attributes in a relation schema are a set, not a list. However, to talk about a relation we usually need to fix a standard order for its attributes, so the order in which a schema lists them is taken as the standard order whenever we display the relation or any of its rows.

2. Tuples
The rows of a relation, other than the header row containing the attribute names, are called tuples. A tuple has one component for each attribute. When we want to display a tuple alone, not as part of the relation, we separate the components with commas and surround the tuple with parentheses. For example, the first row of the given relation is written:

(Star Wars, 1977, 124, color)

Because the attribute names are not displayed with a stand-alone tuple, we should always list its components in the order in which the attributes appear in the relation schema.

3. Domains
The relational data model requires that each component of each tuple be atomic, meaning that its value cannot be broken into smaller components. In addition, each component of a tuple must hold a value that belongs to the domain of the corresponding column. For example, tuples of the Movies relation of Fig. 1 must have a first component that is a string, second and third components that are integers, and a fourth component whose value is one of the constants color and blackAndWhite.

1|Page Team Crescendo

4. Relation Instances
A relation about movies is not static; rather, relations change over time. We expect these changes to involve the tuples of the relation: adding new tuples, editing the components of existing tuples, and deleting tuples.

A set of tuples for a given relation is called an instance of that relation. For example, the first three tuples in Figure 1 form an instance of the relation Movies. Presumably, the relation Movies has changed over time and will continue to change. For example, in 1980, Movies did not contain the tuples for Mighty Ducks or Wayne's World. However, a conventional database system maintains only one version of any relation: the set of tuples that are in the relation "now." This instance of the relation is called the current instance.
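The ideas in this section (schema, tuples, domains, and instances) can be tied together in a few lines of Python. This is a minimal sketch and not a DBMS feature: the names MOVIES_SCHEMA, in_domains, and insert are hypothetical, the attribute spelling filmType is assumed, and only tuples that appear in the text are used.

```python
# Hypothetical encoding of the Movies schema: attribute name -> domain test.
FILM_TYPES = {"color", "blackAndWhite"}
MOVIES_SCHEMA = {
    "title": lambda v: isinstance(v, str),
    "year": lambda v: isinstance(v, int),
    "length": lambda v: isinstance(v, int),
    "filmType": lambda v: v in FILM_TYPES,
}

def in_domains(row, schema=MOVIES_SCHEMA):
    """True if the tuple has one component per attribute and each
    component belongs to the domain of the corresponding attribute."""
    if len(row) != len(schema):
        return False
    return all(check(value) for value, check in zip(row, schema.values()))

def insert(instance, row):
    """Add a tuple to an instance, rejecting components outside their domains."""
    if not in_domains(row):
        raise ValueError(f"component outside its domain: {row!r}")
    instance.add(row)

# An instance is a set of tuples; the current instance is whatever the set holds now.
movies = set()
insert(movies, ("Star Wars", 1977, 124, "color"))
earlier_instance = frozenset(movies)   # snapshot taken before the next change
insert(movies, ("Mighty Ducks", 1991, 104, "color"))

print(len(earlier_instance), len(movies))               # prints: 1 2
print(in_domains(("Star Wars", "1977", 124, "sepia")))  # prints: False
```

The snapshot is kept here only to show that instances differ over time; as the text notes, a conventional database system keeps just the current instance.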


Relational Data Model


Model
- a representation of real-world objects and events, and their associations
- concentrates on the essential, inherent aspects of an organization and ignores the accidental properties

Data Model
- an integrated collection of concepts for describing data, relationships between data, and constraints on the data used by an organization
- attempts to represent the data requirements of the organization, or of the part of the organization that you wish to model
- provides the basic concepts and notations that allow database designers and end users to communicate their understanding of the organization's data unambiguously and accurately
- consists of three components:
(1) a structural part: a set of rules that define how the database is to be constructed
(2) a manipulative part: the types of operations/transactions that are allowed on the data (including operations for updating or retrieving data and for changing the structure of the database)
(3) a set of integrity rules, which ensure that the data is accurate
- the purpose of a data model is to represent data and to make the data understandable

The relational data model is based on the mathematical concept of a relation, which is physically represented as a table. Codd, a trained mathematician, used terminology taken from mathematics, principally set theory and predicate logic.

Relation
- a table with columns and rows
- a relational DBMS requires only that the database be perceived by the user as tables

Attribute
- a named column of a relation
- in the relational model, we use relations to hold information about the objects that we want to represent in the database; the rows of the table correspond to individual records and the columns correspond to the attributes
- attributes can appear in any order and the relation will still be the same relation and convey the same meaning

Domain
- the set of allowable values for one or more attributes
- an important feature of the relational model: every attribute in a relational database is associated with a domain
- domains may be distinct for each attribute, or two or more attributes may be associated with the same domain

Note that, at any given time, there will typically be values in a domain that do not currently appear as values in the corresponding attribute. In other words, a domain describes the possible values for an attribute. The domain concept allows us to define the meaning and source of the values that attributes can hold. As a result, more information is available to the system, and it can (theoretically) reject operations that do not make sense.


Tuple
- a record of a relation
- tuples are the fundamental elements of a relation

Relational database
- a collection of normalized tables
- consists of tables that are appropriately structured

Properties of relational tables
- The table has a name that is distinct from all other tables in the database.
- Each cell of the table contains exactly one value; tables do not contain repeating groups of data. A relational table that satisfies this property is said to be normalized (in first normal form).
- Each column has a distinct name.
- The values of a column are all from the same domain.
- The order of columns has no significance.
- Each record is distinct; there are no duplicate records.
- The order of records has no significance, theoretically.

Relational keys
Each record in a table must be unique, so we must be able to identify a column or combination of columns (a relational key) that provides uniqueness.

Superkey
- a column or set of columns that uniquely identifies a record within a table

Candidate key
- a superkey that contains only the minimum number of columns necessary for unique identification
- has two properties:
(1) Uniqueness: the values of the candidate key uniquely identify each record
(2) Irreducibility: no proper subset of the candidate key has the uniqueness property

Primary key
- the candidate key that is selected to identify records uniquely within the table

Foreign key
- a column or set of columns within one table that matches the candidate key of some table (possibly the same table)

Representing Relational Databases
A relational database consists of one or more tables. The common convention for representing a description of a relational database is to give the name of each table, followed by the column names in parentheses. Normally, the primary key is underlined. The relational database for the StayHome video rental company is described using this convention.
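The superkey and candidate key definitions above can be checked mechanically against a table instance. The sketch below is illustrative only: the staff rows and column names are hypothetical, and uniqueness in one instance does not prove that a column set is a key, since a key must hold for every legal instance of the table.

```python
from itertools import combinations

def is_superkey(rows, columns):
    """True if no two rows agree on all of the given columns."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, all_columns):
    """Superkeys of this instance with no proper subset that is also a superkey."""
    found = []
    for size in range(1, len(all_columns) + 1):
        for cols in combinations(all_columns, size):
            if any(set(k) <= set(cols) for k in found):
                continue  # contains a smaller key, so it is reducible
            if is_superkey(rows, cols):
                found.append(cols)
    return found

# Hypothetical staff records.
staff = [
    {"staff_no": "S1", "email": "a@example.com", "branch": "B1"},
    {"staff_no": "S2", "email": "b@example.com", "branch": "B1"},
    {"staff_no": "S3", "email": "c@example.com", "branch": "B2"},
]

print(is_superkey(staff, ["staff_no", "branch"]))               # prints: True
print(candidate_keys(staff, ["staff_no", "email", "branch"]))   # prints: [('staff_no',), ('email',)]
```

Here (staff_no, branch) is a superkey but not a candidate key, because staff_no alone is already unique. A designer would then pick one candidate key, for example staff_no, as the primary key.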

Relational Integrity
Since every column has an associated domain, there are constraints (called domain constraints) in the form of restrictions on the set of values allowed for the columns of tables.

There are two important integrity rules, which are constraints that apply to all instances of the database.

1. Entity Integrity
2. Referential Integrity

Nulls
- represent a value for a column that is currently unknown or is not applicable for this record
- a way to deal with incomplete or exceptional data
- a null is not the same as a zero numeric value or a text string filled with spaces; it represents the absence of a value
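The difference between a null and a zero or blank value, and the way the two integrity rules restrict nulls and key values, can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the branch and staff tables and their columns are hypothetical, and SQLite needs two explicit settings (NOT NULL on the primary key and PRAGMA foreign_keys = ON) before it enforces these rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.execute("CREATE TABLE branch (branch_no TEXT PRIMARY KEY NOT NULL, city TEXT)")
conn.execute(
    "CREATE TABLE staff ("
    " staff_no TEXT PRIMARY KEY NOT NULL,"
    " name TEXT,"
    " branch_no TEXT REFERENCES branch(branch_no))"
)
conn.execute("INSERT INTO branch VALUES ('B1', 'Seattle')")
conn.execute("INSERT INTO staff VALUES ('S1', 'Ann', 'B1')")  # foreign key matches a branch
conn.execute("INSERT INTO staff VALUES ('S2', 'Bob', NULL)")  # wholly null foreign key is allowed

def rejected(sql):
    """Run one statement and report whether an integrity constraint rejected it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Entity integrity: a primary key column may not be null.
null_pk_rejected = rejected("INSERT INTO staff VALUES (NULL, 'Cy', 'B1')")
# Referential integrity: a non-null foreign key must match an existing branch.
dangling_fk_rejected = rejected("INSERT INTO staff VALUES ('S3', 'Di', 'B9')")

# A null is not zero and not an empty string: comparing with NULL yields NULL, not true.
null_vs_zero, null_vs_blank = conn.execute("SELECT NULL = 0, NULL = ''").fetchone()

print(null_pk_rejected, dangling_fk_rejected)  # prints: True True
print(null_vs_zero, null_vs_blank)             # prints: None None
```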

Entity Integrity
- In a base table, no column of a primary key can be null.
- A base table is a named table whose records are physically stored in the database.

Referential Integrity
- If a foreign key exists in a table, either the foreign key value must match a candidate key value of some record in its home table or the foreign key must be wholly null.

Advantages:
1. Ease of use: Representing information as tables consisting of rows and columns makes it much easier to understand.
2. Flexibility: Tables from which information has to be linked and extracted can be easily manipulated by operators such as project and join to give information in the desired form.
3. Precision: The use of relational algebra and relational calculus to manipulate the relations between tables ensures that there is no ambiguity, which might otherwise arise in establishing the linkages in a complicated network-type database.
4. Security: Security control and authorization can be implemented more easily by moving sensitive attributes of a given table into a separate relation with its own authorization controls. If the authorization requirements permit, a particular attribute can be joined back with the others to enable full information retrieval.


5. Data Independence: Data independence is achieved more easily with the normalized structure used in a relational database than with the more complicated tree or network structures.
6. Data Manipulation Language: Responding to queries by means of a language based on relational algebra and relational calculus, e.g. SQL, is easy in the relational approach. For data organized in other structures, the query language either becomes complex or is extremely limited in its capabilities.

Disadvantages:
1. Performance: A major constraint, and therefore disadvantage, of relational database systems is machine performance. If the number of tables between which relationships must be established is large, and the tables themselves are large, the response to SQL queries slows down.
2. Physical Storage Consumption: In an interactive system, the cost of an operation such as a join also depends on the physical storage. It is therefore common to tune relational databases, choosing the physical data layout that gives good performance for the most frequently run operations. As a natural result, the less frequently run operations tend to become even slower.
3. Slow extraction of meaning from data: If the data is naturally organized in a hierarchical manner and stored as such, the hierarchical approach may give quicker meaning for that data.


Concept of Normalization
Normalization of Database
Normalization is a systematic approach to decomposing tables in order to eliminate data redundancy and undesirable characteristics such as insertion, update, and deletion anomalies. It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables.

Uses of Normalization
1. Eliminating redundant (useless) data
2. Ensuring data dependencies make sense, i.e. data is logically stored

Without normalization, it becomes difficult to handle and update the database without facing data loss. Insertion, update, and deletion anomalies are very frequent if the database is not normalized. To understand these anomalies, let us take the example of a Student table.

Student table:

Update Anomaly: To update the address of a student who occurs twice or more in the table, we have to update the S_Address column in all of those rows; otherwise the data becomes inconsistent.

Insertion Anomaly: Suppose that for a new admission we have the student id (S_id), name, and address of a student, but the student has not opted for any subjects yet. We then have to insert NULL for the subject, leading to an insertion anomaly.

Deletion Anomaly: If the student with S_id 401 has only one subject and temporarily drops it, then deleting that row deletes the entire student record along with it.

Normalization Rule
Normalization rules are divided into the following normal forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF

First Normal Form (1NF)


A row of data cannot contain a repeating group of data, i.e. each column must hold a single value. Each row of data must have a unique identifier, i.e. a primary key. For example, consider a table which is not in First Normal Form.

You can see here that the student name Adam is used twice in the table and the subject Math is also repeated. This violates First Normal Form. To reduce the above table to First Normal Form, break it into two different tables.

In the Student table, a concatenated key that includes subject_id is the primary key. Now both the Student table and the Subject table are normalized to First Normal Form.

Second Normal Form (2NF)


For a table to be in Second Normal Form, it must meet all the requirements of First Normal Form, and there must not be any partial dependency of any column on the primary key. This means that for a table with a concatenated primary key, each column that is not part of the primary key must depend on the entire concatenated key for its existence. If any column depends on only one part of the concatenated key, the table fails Second Normal Form. For example, consider a table which is not in Second Normal Form.

In the Customer table, the concatenation of Customer_id and Order_id is the primary key. This table is in First Normal Form but not in Second Normal Form, because there are partial dependencies of columns on the primary key: Customer_Name depends only on Customer_id, Order_name depends only on Order_id, and there is no link between sale_detail and Customer_Name. To reduce the Customer table to Second Normal Form, break it into the following three tables.
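One plausible reading of that three-way split is sketched below in Python, using relational projection. The split itself (customer details, order details, and sale details keyed by both ids) is an assumption based on the dependencies named in the text, and the row values and the helper name project are hypothetical.

```python
# Hypothetical rows; column names follow the text, values are made up.
COLUMNS = ("customer_id", "customer_name", "order_id", "order_name", "sale_detail")
customer_rows = [
    (101, "Adam", 10, "Keyboard", "sale1"),
    (101, "Adam", 11, "Mouse", "sale2"),
    (102, "Alex", 10, "Keyboard", "sale3"),
]

def project(rows, columns):
    """Relational projection: keep the named columns and drop duplicate rows."""
    idx = [COLUMNS.index(c) for c in columns]
    return sorted({tuple(row[i] for i in idx) for row in rows})

# Each non-key column now sits in a table whose whole key determines it.
customer_detail = project(customer_rows, ["customer_id", "customer_name"])
order_detail = project(customer_rows, ["order_id", "order_name"])
sale_detail = project(customer_rows, ["customer_id", "order_id", "sale_detail"])

print(customer_detail)  # prints: [(101, 'Adam'), (102, 'Alex')]
print(order_detail)     # prints: [(10, 'Keyboard'), (11, 'Mouse')]
print(len(sale_detail)) # prints: 3
```

After the split, the name Adam is stored once instead of once per order, so a name change touches a single row.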


Denormalization
Databases intended for online transaction processing (OLTP) are typically more normalized than databases intended for online analytical processing (OLAP). OLTP applications are characterized by a high volume of small transactions, such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly" databases. Denormalization is also used to improve performance on smaller computers, as in computerized cash registers and mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or when no changes are to be made to the data and a swift response is crucial.

Some Good Reasons Not To Normalize
That said, there are some good reasons not to normalize your database. Let's look at a few:

1. Joins are expensive. Normalizing your database often involves creating lots of tables. In fact, you can easily wind up with what might seem like a simple query spanning five or ten tables. If you've ever tried doing a five-table join, you know that it works in principle, but it can be painfully slow in practice. If you're building a web application that relies upon multiple-join queries against large tables, you might find yourself thinking: "If only this database wasn't normalized!" When you hear that thought in your head, it's a good time to consider denormalizing. If you can stick all of the data used by that query into a single table without really jeopardizing your data integrity, go for it! Be a rebel and denormalize your database. You won't look back!

2. Normalized design is difficult. If you're working with a complex database schema, you'll probably find yourself banging your head against the table over the complexity of normalization. As a simple rule of thumb, if you've been banging your head against the table for an hour or two trying to figure out how to move to the fourth normal form, you might be taking normalization too far. Step back and ask yourself if it's really worth continuing.

3. Quick and dirty should be quick and dirty. If you're just developing a prototype, just do whatever works quickly. Really. It's OK. Rapid application development is sometimes more important than elegant design. Just remember to go back and take a careful look at your design once you're ready to move beyond the prototyping phase. The price you pay for a quick and dirty database design is that you might need to throw it away and start over when it's time to build for production.


Five Basic Normal Forms


I. First Normal Form
An entity is in first normal form if it contains no repeating groups. In relational terms, a table is in first normal form if it contains no repeating columns. Repeating columns make your data less flexible, waste disk space, and make it more difficult to search for data.

Example: In the telephone directory, the name table contains the repeating columns child1, child2, and child3.

You can see some problems in the current table:
- The table always reserves space on the disk for three child records, whether the person has children or not.
- The maximum number of children that you can record is three, but some of your acquaintances might have four or more children.
- To look for a particular child, you have to search all three columns in every row.

To eliminate the repeating columns and bring the table to first normal form, separate the table into two tables, and put the repeating columns into one of them. The association between the two tables is established with a primary-key and foreign-key combination. Because a child cannot exist without an association in the name table, the child table references the name table with a foreign key, rec_num.
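The two-table design can be sketched with Python's sqlite3 module. Only the table name (name), the foreign key rec_num, and the idea of a separate child table come from the text; the other column names (lname, fname, child_name) and all row values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE name (rec_num INTEGER PRIMARY KEY, lname TEXT, fname TEXT)")
conn.execute(
    "CREATE TABLE child ("
    " rec_num INTEGER NOT NULL REFERENCES name(rec_num),"
    " child_name TEXT NOT NULL,"
    " PRIMARY KEY (rec_num, child_name))"
)
conn.execute("INSERT INTO name VALUES (1, 'Rivera', 'Ana')")
conn.execute("INSERT INTO name VALUES (2, 'Okafor', 'Ben')")  # no children: no child rows stored
conn.executemany(
    "INSERT INTO child VALUES (?, ?)",
    [(1, "Luz"), (1, "Mateo"), (1, "Ines"), (1, "Paz")],  # four children, no column limit
)

# A query on one column replaces searching child1, child2, and child3 in every row.
parents_of_paz = conn.execute(
    "SELECT n.fname, n.lname FROM name n JOIN child c ON c.rec_num = n.rec_num"
    " WHERE c.child_name = ?",
    ("Paz",),
).fetchall()

child_counts = conn.execute(
    "SELECT n.rec_num, COUNT(c.child_name) FROM name n"
    " LEFT JOIN child c ON c.rec_num = n.rec_num"
    " GROUP BY n.rec_num ORDER BY n.rec_num"
).fetchall()

print(parents_of_paz)  # prints: [('Ana', 'Rivera')]
print(child_counts)    # prints: [(1, 4), (2, 0)]
```

Each of the three problems listed above disappears: no space is reserved for absent children, the number of children is unbounded, and a child is found by querying a single column.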


II. Second Normal Form
An entity is in second normal form if each attribute that is not in the primary key provides a fact that depends on the entire key. A violation of second normal form occurs when a non-primary key attribute is a fact about a subset of a composite key.

Example: An inventory entity records quantities of specific parts that are stored at particular warehouses.

Here, the primary key consists of the PART and the WAREHOUSE attributes together. Because the attribute WAREHOUSE_ADDRESS depends only on the value of WAREHOUSE, the entity violates the rule for second normal form. This design causes several problems:

- Each instance for a part that this warehouse stores repeats the address of the warehouse.
- If the address of the warehouse changes, every instance referring to a part that is stored in that warehouse must be updated.
- Because of the redundancy, the data might become inconsistent: different instances could show different addresses for the same warehouse.
- If at any time the warehouse has no stored parts, the address of the warehouse might not exist in any instance in the entity.

To satisfy second normal form, the information in the figure above would be in two entities.
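The partial dependency behind this violation can be tested directly. The following Python sketch checks whether a functional dependency holds in a set of rows; the function name fd_holds, the QUANTITY column name, and all row values are assumptions made for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on the lhs columns always agree on the rhs columns."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True

# Hypothetical inventory instances; the key is (PART, WAREHOUSE).
inventory = [
    {"PART": "P1", "WAREHOUSE": "W1", "QUANTITY": 10, "WAREHOUSE_ADDRESS": "1 Dock Rd"},
    {"PART": "P2", "WAREHOUSE": "W1", "QUANTITY": 5, "WAREHOUSE_ADDRESS": "1 Dock Rd"},
    {"PART": "P1", "WAREHOUSE": "W2", "QUANTITY": 7, "WAREHOUSE_ADDRESS": "9 Mill St"},
]

# The address depends on WAREHOUSE alone, which is only part of the key.
address_on_part_of_key = fd_holds(inventory, ["WAREHOUSE"], ["WAREHOUSE_ADDRESS"])
# The quantity needs the whole key: neither half determines it.
quantity_on_part = fd_holds(inventory, ["PART"], ["QUANTITY"])
quantity_on_warehouse = fd_holds(inventory, ["WAREHOUSE"], ["QUANTITY"])

# Updating the address in only one instance makes the data inconsistent.
partly_updated = [dict(row) for row in inventory]
partly_updated[0]["WAREHOUSE_ADDRESS"] = "22 New Rd"
still_consistent = fd_holds(partly_updated, ["WAREHOUSE"], ["WAREHOUSE_ADDRESS"])

print(address_on_part_of_key, quantity_on_part, quantity_on_warehouse)  # prints: True False False
print(still_consistent)                                                 # prints: False
```

Moving WAREHOUSE_ADDRESS into its own entity keyed by WAREHOUSE stores each address once, so the partial update shown at the end can no longer occur.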


III. Third Normal Form
An entity is in third normal form if each non-primary key attribute provides a fact that is independent of other non-key attributes and depends only on the key. A violation of third normal form occurs when a non-primary key attribute is a fact about another non-key attribute.

Example: The first entity in the following figure contains the attributes EMPLOYEE_NUMBER and DEPARTMENT_NUMBER. Suppose that a program or user adds an attribute, DEPARTMENT_NAME, to the entity. The new attribute depends on DEPARTMENT_NUMBER, whereas the primary key is the EMPLOYEE_NUMBER attribute. The entity now violates third normal form.

Changing the DEPARTMENT_NAME value as part of an update to a single employee, David Brown, does not change the DEPARTMENT_NAME value for other employees in that department. The updated version of the entity illustrates the resulting inconsistency. Additionally, updating DEPARTMENT_NAME in this table does not update it in any other table that might contain a DEPARTMENT_NAME column.

You can normalize the entity by modifying the EMPLOYEE_DEPARTMENT entity and creating two new entities: EMPLOYEE and DEPARTMENT. The DEPARTMENT entity contains attributes for DEPARTMENT_NUMBER and DEPARTMENT_NAME. Now, an update such as changing a department name is much easier: you need to make the update only to the DEPARTMENT entity.


IV. Fourth Normal Form
An entity is in fourth normal form if no instance contains two or more independent, multi-valued facts about an entity.

Example: Consider the EMPLOYEE entity. Each instance of EMPLOYEE could have both SKILL_CODE and LANGUAGE_CODE. An employee can have several skills and know several languages. Two relationships exist, one between employees and skills, and one between employees and languages. An entity is not in fourth normal form if it represents both relationships.

Instead, you can avoid this violation by creating two entities, one for each relationship.
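A small Python sketch shows why keeping both relationships in one entity is costly. The employee id and the skill and language values are hypothetical; the point is only the row counts and the fact that the two separate entities carry the same information.

```python
# Two independent multi-valued facts about a hypothetical employee E1.
employee_skill = {("E1", "SQL"), ("E1", "COBOL")}
employee_language = {("E1", "English"), ("E1", "French"), ("E1", "Tagalog")}

# A single entity holding both facts must pair every skill with every language;
# otherwise it would suggest a link between a skill and a language that does not exist.
combined = {
    (emp, skill, lang)
    for (emp, skill) in employee_skill
    for (emp2, lang) in employee_language
    if emp == emp2
}

# Projecting the combined entity gives back the two separate entities unchanged.
skills_again = {(emp, skill) for (emp, skill, _lang) in combined}
languages_again = {(emp, lang) for (emp, _skill, lang) in combined}

print(len(combined))                                  # prints: 6
print(len(employee_skill) + len(employee_language))   # prints: 5
print(skills_again == employee_skill, languages_again == employee_language)  # prints: True True
```

With 2 skills and 3 languages the combined entity needs 2 x 3 = 6 instances, while the two-entity design needs 2 + 3 = 5, and the gap widens as either list grows.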

V. Fifth Normal Form
Fifth normal form deals with cases where information can be reconstructed from smaller pieces of information that can be maintained with less redundancy. Second, third, and fourth normal forms also serve this purpose, but fifth normal form generalizes to cases not covered by the others.

Example: If agents represent companies, companies make products, and agents sell products, then we might want to keep a record of which agent sells which product for which company. This information could be kept in one record type with three fields:

Suppose that a rule is in effect: if an agent sells a certain product and represents a company that makes that product, then the agent sells that product for that company. In this case, it turns out that we can reconstruct all the true facts from a normalized form consisting of three separate record types, each containing two fields:

Roughly speaking, we may say that a record type is in fifth normal form when its information content cannot be reconstructed from several smaller record types. If a record type can only be decomposed into smaller records which all have the same key, then the record type is considered to be in fifth normal form without decomposition. A record type in fifth normal form is also in fourth, third, second, and first normal forms.
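The reconstruction idea can be checked in Python by splitting a three-field record type into three two-field record types and joining them back. The agent, company, and product values below are illustrative assumptions, as are the function names decompose and rejoin.

```python
def decompose(rows):
    """Split (agent, company, product) records into three two-field record types."""
    agent_company = {(a, c) for (a, c, p) in rows}
    company_product = {(c, p) for (a, c, p) in rows}
    agent_product = {(a, p) for (a, c, p) in rows}
    return agent_company, company_product, agent_product

def rejoin(agent_company, company_product, agent_product):
    """Join the three two-field record types back into three-field records."""
    return {
        (a, c, p)
        for (a, c) in agent_company
        for (c2, p) in company_product
        if c == c2 and (a, p) in agent_product
    }

# Hypothetical data that obeys the rule "an agent who sells a product and
# represents a company making it sells that product for that company".
rule_holds = {
    ("Smith", "Ford", "car"), ("Smith", "Ford", "truck"),
    ("Smith", "GM", "car"), ("Smith", "GM", "truck"),
    ("Jones", "Ford", "car"),
}
# Hypothetical data that does not obey the rule.
rule_broken = {
    ("Smith", "Ford", "car"), ("Smith", "GM", "truck"), ("Jones", "Ford", "truck"),
}

lossless = rejoin(*decompose(rule_holds)) == rule_holds
spurious = rejoin(*decompose(rule_broken)) - rule_broken

print(lossless)   # prints: True
print(spurious)   # prints: {('Smith', 'Ford', 'truck')}
```

In the second data set the join invents a record that was never true, so that record type cannot be rebuilt from the smaller ones and is left as a single three-field record type.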


Transforming E-R Diagrams into Relations


It is useful to transform the conceptual data model into a set of normalized relations.

Steps:
1. Represent entities
2. Represent relationships
3. Normalize the relations
4. Merge the relations

In translating a relationship set to a relation, the attributes of the relation must include:
- The primary key for each participating entity set
- All descriptive attributes of the relationship set

From E/R Diagrams to Relational Designs

From Entity Sets to Relations


For each non-weak entity set, create a relation of the same name and with the same set of attributes.

Example: (Entity = Stars)

Name            Address
Carrie Fisher   123 Maple St., Hollywood
Mark Hamill     456 Oak Rd., Brentwood


From E/R Relationships to Relations


For each entity set involved in relationship R, we take its key attribute or attributes as part of the schema of the relation for R. If the relationship has attributes, then these are also attributes of relation R.

Example: (Relationship = Owns)

Title          Year   studioName
Star Wars      1977   Fox
Mighty Ducks   1991   Disney

Combining Relations
Example: (Combining relation Movies with relation Owns)

Title          Year   Length   filmType   studioName
Star Wars      1977   124      color      Fox
Mighty Ducks   1991   104      color      Disney
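The combination above is a join of Movies and Owns on their shared key attributes, Title and Year. A minimal Python sketch follows, using only the tuples shown in the text; the variable names are hypothetical, and the approach assumes each movie is owned by at most one studio, which is what lets the studio name be folded into the movie's tuple.

```python
movies = [
    ("Star Wars", 1977, 124, "color"),
    ("Mighty Ducks", 1991, 104, "color"),
]
owns = [
    ("Star Wars", 1977, "Fox"),
    ("Mighty Ducks", 1991, "Disney"),
]

# Index Owns by the movie key (title, year).
studio_by_movie = {(title, year): studio for (title, year, studio) in owns}

# Extend each Movies tuple with the studio that owns it.
movies_with_studio = [
    (title, year, length, film_type, studio_by_movie[(title, year)])
    for (title, year, length, film_type) in movies
    if (title, year) in studio_by_movie
]

for row in movies_with_studio:
    print(row)
# prints: ('Star Wars', 1977, 124, 'color', 'Fox')
# prints: ('Mighty Ducks', 1991, 104, 'color', 'Disney')
```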

Handling Weak Entity Sets


If W is a weak entity set, construct for W a relation whose schema consists of:
1. All attributes of the weak entity set W.
2. All attributes of the supporting relationships for W.
3. For each supporting relationship for W, all the key attributes of the entity set E that it connects W to.

Rename attributes, if necessary, to avoid name conflicts. Do not construct a relation for any supporting relationship for W.

Example:


Schema for Relation Contracts: Contracts (starName, studioName, title, year, salary)

Converting Subclass Structures to Relations


Principal conversion strategies:
1. Follow the E/R viewpoint. For each entity set E in the hierarchy, create a relation that includes the key attributes from the root and any attributes belonging to E.
2. Treat entities as objects belonging to a single class. For each possible subtree including the root, create one relation, whose schema includes all the attributes of all the entity sets in the subtree.
3. Use null values. Create one relation with all the attributes of all the entity sets in the hierarchy. Each entity is represented by one tuple, and that tuple has a null value for whatever attributes the entity does not have.

E/R Style Conversion


Create a relation for each entity set. If the entity set E is not the root of the hierarchy, then the relation for E will include the key attributes at the root, to identify the entity represented by each tuple, plus all the attributes of E.

Example:

1. Movies (title, year, length, filmType)
2. MurderMysteries (title, year, weapon)
3. Cartoons (title, year)


An Object-Oriented Approach
An alternate strategy for converting isa-hierarchies to relations is to enumerate all the possible subtrees of the hierarchy. For each, create one relation that represents entities that have components in exactly those subtrees; the schema for this relation has all the attributes of any entity set in the subtree.

Example: (refer to the image in the previous example)
Four possible subtrees including the root:
1. Movies alone. Movies (title, year, length, filmType)
2. Movies and Cartoons only. MoviesC (title, year, length, filmType)
3. Movies and Murder-Mysteries only. MoviesMM (title, year, length, filmType, weapon)
4. All three entity sets. MoviesCMM (title, year, length, filmType, weapon)

We can combine Movies with MoviesC and MoviesMM with MoviesCMM, although doing so loses some information (namely, which movies are cartoons).

Using Null Values to Combine Relations


If we are allowed to use NULL as a value in tuples, we can handle a hierarchy of entity sets with a single relation. This relation has all the attributes belonging to any entity set of the hierarchy.

Example: (based on the previous examples) Movie (title, year, length, filmType, weapon)
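The single-relation approach can be pictured in Python with None standing in for NULL. This is only a sketch: the second tuple is a hypothetical murder mystery made up for illustration, and the variable names are assumptions.

```python
# Movie (title, year, length, filmType, weapon); None plays the role of NULL.
movie = [
    ("Star Wars", 1977, 124, "color", None),           # weapon does not apply
    ("Example Mystery", 1990, 100, "color", "knife"),  # hypothetical murder mystery
]

# The MurderMysteries relation of the E/R-style conversion can be recovered
# by keeping the tuples whose weapon component is not null.
murder_mysteries = [
    (title, year, weapon)
    for (title, year, _length, _film_type, weapon) in movie
    if weapon is not None
]

print(murder_mysteries)  # prints: [('Example Mystery', 1990, 'knife')]
```

Because Cartoons contributes no attributes of its own, nothing in this single relation marks a movie as a cartoon; recording that would need an extra attribute.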


Combining Relations:
Relations are sets. Therefore, set operations (∪, ∩, −) can be applied to relations with respect to the underlying sets to form a new relation.

Let A = {1, 2, 3} and B = {1, 2, 3, 4}. The relation R1 is on A and the relation R2 is on B:

R1 = {(1, 1), (2, 2), (3, 3)} and R2 = {(1, 1), (1, 2), (1, 3), (1, 4)}
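Since R1 and R2 are ordinary sets of pairs, the usual set operations can be applied to them directly. The Python sketch below uses the built-in set type; the variable names for the results are hypothetical.

```python
R1 = {(1, 1), (2, 2), (3, 3)}
R2 = {(1, 1), (1, 2), (1, 3), (1, 4)}

union = R1 | R2          # pairs in R1 or R2
intersection = R1 & R2   # pairs in both R1 and R2
difference = R1 - R2     # pairs in R1 but not in R2

print(sorted(union))         # prints: [(1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (3, 3)]
print(sorted(intersection))  # prints: [(1, 1)]
print(sorted(difference))    # prints: [(2, 2), (3, 3)]
```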

Because a movie has several stars, we are forced to repeat all the information about the movie once for each star. For example, the length of Star Wars is repeated three times, once for each star, as is the fact that the movie is owned by Fox. This redundancy is undesirable, and the remedy is to split such a combined relation and thereby remove the redundancy.
