
Chapter 1

File Systems and Databases


Database
• Persistent collection of data and
• Metadata (data about the characteristics of the data and relationships of the data)
Database Management System (DBMS)
• Collection of programs to manage the database
• Manages and enforces database structure
• Includes interface to manipulate the data
• Allows data to be shared among multiple users

Data vs Information
• Data:
o Raw facts; building blocks of information
o Unprocessed information
• Information:
o Data processed to reveal meaning
• Accurate, relevant, and timely information is key to good decision making
• Good decision making is key to survival in global environment

Importance of DBMS
• Makes data management more efficient and effective
• Query language allows quick answers to ad hoc queries
• Provides easier access to more and better-managed data
• Promotes an integrated view of organization’s operations
• Reduces the chance of inconsistent data
• Helps protect against loss of data

DBMS Manages Interface Between Data and Users


Database Design: Why Design is important
• The database is the foundation of the information system.
• Design should reflect the expected use
• Poor design results in unwanted redundancy
• Poor design leads to inconsistent data
• Poor design leads to poor performance
• Poor design leads to improper information systems operation

Historical Roots
• First business computer applications focused on clerical tasks
• Requests for information quickly followed
• File systems developed to address needs
• Data organized according to expected use
• Data Processing (DP) specialists computerized manual file systems

File Systems
File System Data Management
• Requires extensive programming, typically in Third Generation Language (3GL)
• Leads to islands of information and data redundancy
• Difficult to make ad hoc queries to obtain information
• Difficult to maintain data integrity

Data and Structural Dependence


• Data characteristics are embodied in programs, not stored with the data.
• Changes in data characteristics require modifying programs
• Changes in file structures require modification of related programs

Data Redundancy
• Different and possibly conflicting versions of same data
• Results in problems during data:
o Modification (e.g. address changes)
o Insertion
o Deletion
• Data inconsistency: Lack of integrity

Database Systems
• Database consists of logically related data stored in a single repository
• Advantages over file system management approach:
o Eliminates most of the file system's inconsistency, data anomaly, data
dependency, and structural dependency problems
o Stores data structures, relationships, and access paths

Database vs. File Systems

Database System Environment


Database System Types
• Scale
o Single User (desktop)
o Workgroup
o Enterprise
o Distributed or Federated
• Use
o Production/Transaction
o Decision Support/Data Warehouse

Uses of Databases
• Transactional (or production):
o Supports a company’s day-to-day operations
• Data warehouse:
o Stores data used to generate information required to make tactical or
strategic decisions
o Such decisions typically require “data massaging”
o Often used to store historical data
o Structure is quite different

DBMS Functions
• Metadata/Data Dictionary Management
• Data storage management
• Data transformation and presentation
• Security management and Multiuser access control
• Backup and recovery management
• Data integrity management
• Database language and application programming interfaces
• Database communication interfaces

Database Models
Collection of logical constructs used to represent data structure and relationships

• Conceptual Models: logical nature of data representation


• Implementation Models: how data are represented

Database Models
The hierarchical and network models are of historical interest only.

Database Models:

• Relational
• Entity-Relationship
• Object oriented

Relational Model
• Most common model
• Perceived by user as collection of tables containing data
• Actually has a formal definition based on set theory
• Tables are a series of row/column intersections
• Tables related by sharing common entity characteristic(s)

Relational Database
Relational Database Model Advantages
• Structural independence
• Improved conceptual simplicity
• Easier database design, implementation, management, and use
• Ad hoc query capability with SQL (standard interface)
• Powerful database management system

Relational Database Model Disadvantages


• Substantial hardware and system software overhead
• Poor design and implementation is made easy
• Not a cure all: May promote "islands of information" problems
• SQL is not completely standardized. One DBMS is not a "drop in" replacement
for another.
• May have problems storing some types of data

Entity Relationship Database Model


• Primarily a database design tool.
• Complements the relational data model concepts
• Represented in an entity relationship diagram (ERD)
• Based on entities, attributes, and relationships

ER-Diagram

ER Model Advantages
• Conceptual simplicity
• Visual representation
• Effective communication tool
• Integrated with the relational database model

ER Model Disadvantages
• Limited constraint representation
• Limited relationship representation
• No data manipulation language
• Loss of information content
• May be overly complex for end users

Object-Oriented Model
• Objects or abstractions of real-world entities are stored
• Attributes describe properties
• Collection of similar objects is a class
• Methods represent real world actions of classes
• Classes are organized in a class hierarchy
• Objects inherit attributes and methods of classes above.

Object Oriented Model

OO Model Advantages
• Adds semantic context
• Structural and data independence
• May mesh well with Object Oriented Programming

OO Model Disadvantages
• Lack of standards in model
• Lack of standard manipulation languages
• Complex navigational data access
• Steep learning curve
• Poor performance


Chapter 4: Entity Relationship


Modeling
• How relationships between entities are defined and refined, and how such
relationships are incorporated into the database design process
• How ERD components affect database design and implementation
• How to interpret the modeling symbols for the four most popular ER
modeling tools
• That real-world database design often requires that you reconcile
conflicting goals

Entity Relationship Model and Diagram


• ER model forms the basis of an ER diagram
• ERD represents the conceptual database as viewed by end user
• ERDs depict the ER model’s three main components:
o Entities
o Attributes
o Relationships
• Several different diagramming conventions

Entities
• Refers to the entity set and not to a single entity occurrence
• Corresponds to a table and not to a row in the relational environment
• In both the Chen and Crow’s Foot models, an entity is represented by a
rectangle containing the entity’s name
• Entity name, a noun, is usually written in capital letters
Attributes
• Characteristics of entities
• Domain is set of possible values
• Primary keys underlined

Attributes (cont)
• Simple
o Cannot be subdivided
o Age, sex, GPA
• Composite
o Can be subdivided
o Address: street, city, state, zip
• Single-valued
o Has only a single value
o Social security number
• Multi-valued
o Can have many values
o Person may have several college degrees
• Derived
o Can be calculated from other information
o Age can be derived from D.O.B.
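
A derived attribute is computed when needed rather than stored. The sketch below derives age from a stored date of birth; the function name age_on and the sample dates are made up for illustration.

```python
from datetime import date

def age_on(dob: date, today: date) -> int:
    """Derive age in whole years from a stored date of birth."""
    years = today.year - dob.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

day_before_birthday = age_on(date(1990, 6, 15), date(2004, 6, 14))
on_birthday = age_on(date(1990, 6, 15), date(2004, 6, 15))
```

Because the value is recomputed from D.O.B. each time, it never goes stale the way a stored AGE column would.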

Multivalued Attributes
Resolving Multivalued Attribute Problems
Although the conceptual model can handle multivalued attributes, you should not
implement them in the relational DBMS

• Within original entity, create several new attributes, one for each of the
original multivalued attribute’s components
o Can lead to major structural problems in the table
• Create a new entity composed of original multivalued attribute’s
components
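
The second option (a new entity for the multivalued attribute) might look like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for DB2 so that it runs anywhere; the PERSON and DEGREE tables and their data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PERSON (
        PERSON_ID   INTEGER PRIMARY KEY,
        PERSON_NAME VARCHAR(30)
    );
    -- One row per degree instead of DEGREE1, DEGREE2, ... columns.
    CREATE TABLE DEGREE (
        PERSON_ID   INTEGER NOT NULL REFERENCES PERSON(PERSON_ID),
        DEGREE_NAME VARCHAR(10) NOT NULL,
        PRIMARY KEY (PERSON_ID, DEGREE_NAME)
    );
    INSERT INTO PERSON VALUES (1, 'Lee');
    INSERT INTO DEGREE VALUES (1, 'BS'), (1, 'MS'), (1, 'PhD');
""")

# A person can now hold any number of degrees without changing the table structure.
degrees = [row[0] for row in conn.execute(
    "SELECT DEGREE_NAME FROM DEGREE WHERE PERSON_ID = 1 ORDER BY DEGREE_NAME")]
```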

Creating New Attributes

Creating New Entity Set


Relationships
• Associations between entities
• Established by Business Rules
• Connected entities termed participants
• Connectivity describes relationship classification:
o 1:1, 1:M, M:N
• Cardinality
o Number of entity occurrences associated with one occurrence of
related entity

Connectivity and Cardinality in an ERD


Relationship Strength
• Existence Dependent
o Entity's existence depends on existence of another related entity
o Existence-independent entities can exist apart from related entities
o Example: EMPLOYEE claims CHILD; CHILD is existence-dependent on EMPLOYEE
• Weak (non-identifying)
o One entity is existence-independent of the other
o PK of related entity doesn't contain PK component of parent
entity
o Book is somewhat confused on this
• Strong (identifying)
o One entity is existence-dependent on another
o PK of related entity contains PK component of parent entity

Relationship Participation
• Optional
o Entity occurrence does not require a corresponding occurrence in
related entity
o Shown by drawing a small circle on side of optional entity on ERD
• Mandatory
o Entity occurrence requires corresponding occurrence in related
entity
o If no optionality symbol is shown on ERD, it is mandatory

Weak Entity
• Existence-dependent on another entity
• Has primary key that is partially or totally derived from parent entity
Mandatory Class Course relationship
Optional Class Entity in Professor Teaches Class

Degree of Relationship
A relationship's degree indicates the number of associated entities.

Implementation of a Ternary Relationship


Composite Entity
• Used to replace M:N relationships with 1:M relationships
• Bridge entities composed of primary keys of each entity needing
connection
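
A minimal sketch of a bridge entity, again using sqlite3 as a stand-in for DB2. The STUDENT, CLASS, and ENROLL tables and their rows are hypothetical; the point is that the M:N relationship becomes two 1:M relationships through ENROLL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (STU_NUM INTEGER PRIMARY KEY, STU_NAME VARCHAR(20));
    CREATE TABLE CLASS   (CLASS_CODE INTEGER PRIMARY KEY, CLASS_NAME VARCHAR(20));
    -- Bridge (composite) entity: its PK is made of the two parent PKs.
    CREATE TABLE ENROLL (
        STU_NUM    INTEGER NOT NULL REFERENCES STUDENT(STU_NUM),
        CLASS_CODE INTEGER NOT NULL REFERENCES CLASS(CLASS_CODE),
        PRIMARY KEY (STU_NUM, CLASS_CODE)
    );
    INSERT INTO STUDENT VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO CLASS   VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO ENROLL  VALUES (1, 10), (1, 20), (2, 10);
""")

# Each ENROLL row links exactly one student to exactly one class.
roster = conn.execute("""
    SELECT S.STU_NAME, C.CLASS_NAME
    FROM ENROLL E
    JOIN STUDENT S ON S.STU_NUM = E.STU_NUM
    JOIN CLASS C   ON C.CLASS_CODE = E.CLASS_CODE
    ORDER BY S.STU_NAME, C.CLASS_NAME
""").fetchall()
```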
Entity Subtypes and Supertypes
Generalization Hierarchy
• Depicts relationships between higher-level supertype and lower-level
subtype entities
• Supertype has shared attributes
• Subtypes have unique attributes
• Disjoint relationships
o Unique subtypes
o Non-overlapping
o Indicated with a 'G'
• Overlapping subtypes use 'Gs' symbol

Nulls Created by Unique Attributes


Generalization Hierarchy: Disjoint

Generalization Hierarchy: Overlapping and Disjoint


Supertype/Subtype relationship in an ERD

Comparison of ER Modeling Symbols


Developing an E-R Diagram
• Iterative Process
1. Develop general narrative of organizational operations
2. Draw Basic E-R Model
3. Modify E-R model to incorporate newly discovered
components/relationships
• Repeat until designers and users agree E-R model is complete

Dealing with Conflicting Goals in Database Design


• Database must be designed to conform to design standards
• High-speed processing may require design compromises
• Quest for timely information may be the focus of database design
Other concerns:
o Security
o Performance
o Shared access
o Integrity
o Capabilities of actual DBMS


Chapter 6 Notes
• The basic commands and functions of SQL
• How to use SQL for data administration (to create tables, indexes, and
views)
• How to use SQL for data manipulation (to add, modify, delete, and
retrieve data)
• How to use SQL to query a database to extract useful information

Introduction to SQL Part I


• The relational DBMS is the standard for database management.
• The Structured Query Language, SQL, is the standard for working with
relational databases
• This chapter is an introduction to essential SQL.
SQL strengths
Covers both

• Data definition
• Data manipulation

SQL is relatively easy to learn.

ANSI prescribes a standard SQL.

SQL Weaknesses
• Some eccentric notation
o use of ' marks in strings, rather than "
o Wildcards: % instead of *
• Some things are hard to do
• Different Vendors implement different dialects
• Not a good conceptual match to most programming languages
• Strictly DDL and DML; no standard procedural language.

DB2 concepts
• DB2 consists of multiple "instances" on each server (we have one)
• Within each instance there are databases: we have two, SAMPLE and
DBMS, and will be using DBMS
• Within each database are schemas, one for each user.
• Authorization for users is via the system (cs1 account). Your schema name
and user name are the same as your username on CS1

Setup for demonstrations


1. Use the winsql program to connect to the datasource DBMS using your
CS1 username and password
2. Perform the following commands to create the tables:

drop table vendor; drop table product; drop table customer;

create table vendor like CH06_SALESCO.vendor;
insert into vendor select * from CH06_SALESCO.vendor;
alter table vendor add primary key (v_code);

create table product like CH06_SALESCO.product;
insert into product select * from CH06_SALESCO.product;
alter table product add primary key (p_code)
add foreign key (v_code) references vendor on delete set null on update restrict;

Data Definition Commands
The Database Model
Simple Database -- PRODUCT and VENDOR tables
Each product is supplied by only a single vendor.
A vendor may supply many products.

Data Definition Commands


The Tables and Their Components

• The VENDOR table contains vendors who are not referenced in the
PRODUCT table. PRODUCT is optional to VENDOR.
• Existing V_CODE values in the PRODUCT table must have a match in
the VENDOR table.
• A few products are supplied factory-direct, a few are made in-house, and a
few may have been bought in a special warehouse sale. That is, a product is not
necessarily supplied by a vendor. VENDOR is optional to PRODUCT.

Common SQL Datatypes

Data Type    SQL
Numeric      NUMBER(L,D)
             DECIMAL(L,D)
             INTEGER
             SMALLINT
Character    CHAR(L)
             VARCHAR(L)
Date         DATE

Data Definition Commands


Creating the Database Structure

This varies among databases. In DB2 there are instances; within each instance there are
databases; within each database there are schemas (one for each user). DB2 users are the
same as operating system users.

Statements in DB2 that reference a table can qualify it with the schema
(SELECT * FROM schema.tablename); the current schema is implicit.

Data Definition Commands


Creating Table Structures
CREATE TABLE <table name>(
<attribute1 name and attribute1 characteristics,
attribute2 name and attribute2 characteristics,
attribute3 name and attribute3 characteristics,
primary key designation,
foreign key designation and foreign key requirements>);

Data Definition Commands


CREATE TABLE VENDOR(
V_CODE INTEGER NOT NULL PRIMARY KEY DEFAULT 0,
V_NAME VARCHAR(15),
V_CONTACT VARCHAR(50),
V_AREACODE VARCHAR(3),
V_PHONE VARCHAR(8),
V_STATE VARCHAR(2),
V_ORDER VARCHAR(1)
);

Data Definition Commands


CREATE TABLE PRODUCT(
P_CODE VARCHAR(10) NOT NULL PRIMARY KEY,
P_DESCRIPT VARCHAR(35),
P_INDATE DATE,
P_ONHAND SMALLINT DEFAULT 0,
P_MIN SMALLINT DEFAULT 0,
P_PRICE DECIMAL(15, 2) DEFAULT 0,
P_DISCOUNT DOUBLE DEFAULT 0,
V_CODE INTEGER DEFAULT 0 REFERENCES VENDOR(V_CODE)
);

Data Definition Commands


SQL Integrity Constraints
Entity Integrity
PRIMARY KEY
NOT NULL and UNIQUE
Referential Integrity
FOREIGN KEY
ON DELETE
ON UPDATE
Check Constraint
Validates data when an attribute value is entered

Basic Data Management


Data Entry
INSERT INTO <table name> VALUES (attribute 1 value, attribute 2
value, ... etc.);

INSERT INTO VENDOR VALUES(26000, 'Quality Tools', 'Johnson',
'915','555-3234', 'TX', 'N');
INSERT INTO PRODUCT VALUES('14 ABC12', 'Concrete Saw', '09/02/1996', 2,
1, 510.99, 0.00, 26000);

Basic Data Management


Committing Changes

Changes are not made permanent until they are committed, assuming autocommit is off. Many
end-user query environments (including ours) do not support turning autocommit off.

COMMIT ;

Listing the Table Contents


SELECT * FROM PRODUCT;
SELECT P_CODE, P_DESCRIPT, P_INDATE, P_ONHAND,
P_MIN, P_PRICE, P_DISCOUNT, V_CODE
FROM PRODUCT;

Basic Data Management


Making a Correction
UPDATE PRODUCT SET P_INDATE = '2003-11-15'
WHERE P_CODE = '13-Q2/P2';

UPDATE PRODUCT SET P_INDATE = '2003-11-15', P_PRICE = 15.99, P_MIN = 10


WHERE P_CODE = '13-Q2/P2';

Restoring the Table Contents (assumes autocommit is off; ROLLBACK undoes the changes made
since the last COMMIT).

ROLLBACK;
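
A minimal sketch of COMMIT and ROLLBACK, using Python's sqlite3 module in place of DB2. By default that module opens a transaction before an UPDATE and leaves it open until commit() or rollback(), which behaves like autocommit being off. The starting price is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (P_CODE VARCHAR(10) PRIMARY KEY, P_PRICE DECIMAL(15,2))")
conn.execute("INSERT INTO PRODUCT VALUES ('13-Q2/P2', 14.99)")
conn.commit()  # make the insert permanent

conn.execute("UPDATE PRODUCT SET P_PRICE = 15.99 WHERE P_CODE = '13-Q2/P2'")
price_before_rollback = conn.execute("SELECT P_PRICE FROM PRODUCT").fetchone()[0]

conn.rollback()  # undo everything since the last COMMIT
price_after_rollback = conn.execute("SELECT P_PRICE FROM PRODUCT").fetchone()[0]
```

The session that made the change sees the new price until it rolls back; afterwards the committed value is restored.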

Basic Data Management


Deleting Table Rows
DELETE FROM PRODUCT WHERE P_CODE = '2238/QPD';

DELETE FROM PRODUCT WHERE P_MIN = 5;

Delete is a dangerous command. Typing:


DELETE FROM <table>
will neatly delete all the records in the table!

Queries
Partial Listing of Table Contents
SELECT <column(s)> FROM <table name> WHERE <conditions>;

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE = 21344;

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE <> 21344;

SELECT P_DESCRIPT, P_ONHAND, P_MIN, P_PRICE
FROM PRODUCT
WHERE P_PRICE <= 10;

Queries
Using Mathematical Operators on Character Attributes
SELECT P_DESCRIPT, P_ONHAND, P_MIN, P_PRICE
FROM PRODUCT
WHERE P_CODE < '1558-QWI';

Using Mathematical Operators on Dates


SELECT P_DESCRIPT, P_ONHAND, P_MIN, P_PRICE, P_INDATE
FROM PRODUCT
WHERE P_INDATE >= '01/01/2004';

Queries
Logical Operators: AND, OR, and NOT
SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE = 21344 OR V_CODE = 21225;

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE P_PRICE < 50 AND P_INDATE > '01/01/2004';

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE (P_PRICE < 50 AND P_INDATE > '01/01/2004')
OR V_CODE = 24288;

Queries: Special Operators


• BETWEEN - used to define range limits.
• IS NULL - used to check whether an attribute value is null
• LIKE - used to check for similar character strings.
• IN - used to check whether an attribute value matches a value contained
within a (sub)set of listed values.
• EXISTS - used to check whether a subquery returns any rows. (To test
that an attribute has a value, use IS NOT NULL.)

Queries: Special Operators


BETWEEN is used to define range limits.
SELECT *
FROM PRODUCT
WHERE P_PRICE BETWEEN 10.00 AND 100.00;

BETWEEN includes both range limits, so the equivalent comparison is:

SELECT * FROM PRODUCT
WHERE P_PRICE >= 10.00 AND P_PRICE <= 100.00;
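
BETWEEN includes both endpoints, so the comparison form that matches it uses >= and <=. The sketch below checks this on a few made-up prices, with sqlite3 standing in for DB2; the helper codes() is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (P_CODE VARCHAR(10), P_PRICE DECIMAL(15,2))")
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?)",
                 [("A", 5), ("B", 10), ("C", 50), ("D", 100), ("E", 150)])

def codes(where):
    """Return the product codes that satisfy the given WHERE condition."""
    sql = "SELECT P_CODE FROM PRODUCT WHERE " + where + " ORDER BY P_CODE"
    return [row[0] for row in conn.execute(sql)]

between = codes("P_PRICE BETWEEN 10.00 AND 100.00")
inclusive = codes("P_PRICE >= 10.00 AND P_PRICE <= 100.00")
exclusive = codes("P_PRICE > 10.00 AND P_PRICE < 100.00")
```

Products B and D sit exactly on the limits: BETWEEN and the inclusive form return them, the strict form does not.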

Queries: Special Operators


IS NULL is used to check whether an attribute value is null.
SELECT P_CODE, P_DESCRIPT
FROM PRODUCT
WHERE V_CODE IS NULL;
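
A common mistake is to write V_CODE = NULL. A comparison with null is never true, so that query returns no rows. A small sketch with made-up rows, using sqlite3 in place of DB2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (P_CODE VARCHAR(10), V_CODE INTEGER)")
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?)",
                 [("23114-AA", None), ("2232/QTY", 24288)])

# IS NULL finds the row with no vendor; "= NULL" matches nothing at all.
with_is_null = conn.execute(
    "SELECT P_CODE FROM PRODUCT WHERE V_CODE IS NULL").fetchall()
with_equals = conn.execute(
    "SELECT P_CODE FROM PRODUCT WHERE V_CODE = NULL").fetchall()
```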
Special Operators
LIKE is used to check for similar character strings.

Note the difference between these queries. Are SQL Strings case sensitive?

SELECT *
FROM VENDOR
WHERE V_CONTACT LIKE 'Smith%';

SELECT *
FROM VENDOR
WHERE V_CONTACT LIKE 'SMITH%';

Special Operators
IN is used to check whether an attribute value matches a value
contained within a (sub)set of listed values.
SELECT * FROM PRODUCT WHERE V_CODE IN (21344, 24288);

EXISTS is used to check whether a subquery returns any rows.
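
EXISTS takes a subquery and is true when that subquery returns at least one row. The sketch below lists vendors that do, and do not, supply a product. It runs on sqlite3 as a stand-in for DB2; the vendor names and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE VENDOR  (V_CODE INTEGER PRIMARY KEY, V_NAME VARCHAR(15));
    CREATE TABLE PRODUCT (P_CODE VARCHAR(10) PRIMARY KEY, V_CODE INTEGER);
    INSERT INTO VENDOR  VALUES (21344, 'Vendor A'), (22567, 'Vendor B');
    INSERT INTO PRODUCT VALUES ('13-Q2/P2', 21344), ('23114-AA', NULL);
""")

# Vendors with at least one matching PRODUCT row.
with_products = [r[0] for r in conn.execute("""
    SELECT V_NAME FROM VENDOR V
    WHERE EXISTS (SELECT * FROM PRODUCT P WHERE P.V_CODE = V.V_CODE)
""")]

# Vendors with no matching PRODUCT row.
without_products = [r[0] for r in conn.execute("""
    SELECT V_NAME FROM VENDOR V
    WHERE NOT EXISTS (SELECT * FROM PRODUCT P WHERE P.V_CODE = V.V_CODE)
""")]
```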

Advanced Data Management Commands


Changing Table Structures

Note: DB2 allows few column modifications

ALTER TABLE <table name> MODIFY (<column name> <new column characteristics>);

ALTER TABLE <table name> ADD (<column name> <new column characteristics>);

Changing a Column's Data Type


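Probably illegal in DB2 (MODIFY is Oracle-style syntax; DB2 changes a column with
ALTER TABLE ... ALTER COLUMN ... SET DATA TYPE and restricts which changes are allowed)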

ALTER TABLE PRODUCT MODIFY (V_CODE CHAR(5));


ALTER TABLE PRODUCT MODIFY (P_PRICE DECIMAL(9,2));

Adding a New Column to the Table


ALTER TABLE PRODUCT ADD column P_SALECODE CHAR(1) ;

Updating Data
UPDATE PRODUCT SET P_SALECODE = '2' WHERE P_CODE = '1546-QQ2';
UPDATE PRODUCT SET P_SALECODE = '1' WHERE P_CODE IN ('13-Q2/P2',
'2232/QTY');
UPDATE PRODUCT SET P_SALECODE = '2' WHERE P_INDATE < '01/01/2004';
UPDATE PRODUCT SET P_SALECODE = '1' WHERE P_INDATE >= '01/01/2004' AND
P_INDATE < '10/20/2004';

Copying Tables
Copying table definitions and data
CREATE TABLE NEWPRODUCT LIKE PRODUCT;

INSERT INTO NEWPRODUCT SELECT * FROM PRODUCT;

Copying Parts of Tables


CREATE TABLE PART
(PART_CODE CHAR(8) NOT NULL UNIQUE,
PART_DESCRIPT CHAR(35),
PART_PRICE DECIMAL(8,2),
PRIMARY KEY(PART_CODE));

INSERT INTO PART (PART_CODE, PART_DESCRIPT, PART_PRICE)


SELECT P_CODE, P_DESCRIPT, P_PRICE
FROM PRODUCT;

Deleting a Table from the Database


DROP TABLE <table name>;

DROP TABLE PART;

Primary and Foreign Key Designation


(Note we did these when we created the table)

ALTER TABLE PRODUCT ADD PRIMARY KEY (P_CODE);

ALTER TABLE PRODUCT ADD FOREIGN KEY (V_CODE) REFERENCES VENDOR;

ALTER TABLE PRODUCT ADD PRIMARY KEY (P_CODE) ADD FOREIGN KEY (V_CODE)
REFERENCES VENDOR;


Chapter 6 Structured Query Language (SQL)

More Complex Queries and SQL Functions
Ordering a Listing
ORDER BY <attributes>

SELECT P_CODE, P_DESCRIPT, P_INDATE, P_PRICE
FROM PRODUCT
ORDER BY P_PRICE;

SELECT P_CODE, P_DESCRIPT, P_INDATE, P_PRICE
FROM PRODUCT
WHERE P_INDATE < '08/21/2002' AND P_PRICE <= 50.00
ORDER BY V_CODE, P_PRICE DESC;

Listing Unique Values


SELECT DISTINCT V_CODE
FROM PRODUCT;

SELECT V_CODE FROM PRODUCT;

More Complex Queries and SQL Functions

Function   Output
COUNT      Number of rows with a non-null value for the specified attribute
MIN        Minimum attribute value encountered
MAX        Maximum attribute value encountered
AVG        Arithmetic mean of attribute values
SUM        Sum of attribute values

COUNT
How many products are there?

SELECT COUNT(*) FROM PRODUCT ;

How many different vendors are represented in the PRODUCT table?

SELECT COUNT(DISTINCT V_CODE)
FROM PRODUCT
WHERE V_CODE IS NOT NULL;

SUM
Calculate inventory value
SELECT SUM(P_ONHAND*P_PRICE)
FROM PRODUCT;

Average
AVG

What is the average product price?

SELECT AVG(P_PRICE)
FROM PRODUCT;

Min and Max


SELECT MIN(P_PRICE)
FROM PRODUCT ;

SELECT MAX(P_PRICE)
FROM PRODUCT ;

Nested query or Subquery


What products have an above average price?

SELECT P_DESCRIPT, P_ONHAND, P_PRICE, V_CODE
FROM PRODUCT
WHERE P_PRICE > (SELECT AVG(P_PRICE)
FROM PRODUCT)
ORDER BY P_PRICE DESC;

Grouping Data
GROUP BY
SELECT V_CODE, COUNT (P_CODE), AVG(P_PRICE)
FROM PRODUCT
GROUP BY V_CODE;

Note that only the attribute(s) being grouped by can appear by themselves in the select.
All other attributes need to be in an aggregate function.

HAVING

HAVING is analogous to WHERE, but it filters the groups produced by GROUP BY (usually on
an aggregate value) rather than individual rows.

SELECT V_CODE, COUNT(DISTINCT P_CODE), AVG(P_PRICE)
FROM PRODUCT
GROUP BY V_CODE
HAVING AVG(P_PRICE) <= 10;
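
To see what HAVING removes, the sketch below runs the same grouping with and without it. It uses sqlite3 in place of DB2, and the five PRODUCT rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCT (P_CODE VARCHAR(10) PRIMARY KEY, "
    "P_PRICE DECIMAL(15,2), V_CODE INTEGER)")
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?, ?)", [
    ("A", 4.0, 21344), ("B", 8.0, 21344),
    ("C", 40.0, 24288), ("D", 60.0, 24288),
    ("E", 9.0, 23119),
])

# One output row per vendor: product count and average price.
all_groups = conn.execute("""
    SELECT V_CODE, COUNT(P_CODE), AVG(P_PRICE)
    FROM PRODUCT GROUP BY V_CODE ORDER BY V_CODE
""").fetchall()

# HAVING keeps only the groups whose average price is at most 10.
cheap_groups = conn.execute("""
    SELECT V_CODE, COUNT(P_CODE), AVG(P_PRICE)
    FROM PRODUCT GROUP BY V_CODE
    HAVING AVG(P_PRICE) <= 10 ORDER BY V_CODE
""").fetchall()
```

Vendor 24288 drops out because its average (50) exceeds 10, even though WHERE was never used.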
Virtual Tables: Creating a View
CREATE VIEW PRODUCT_3 AS
SELECT P_DESCRIPT, P_ONHAND, P_PRICE
FROM PRODUCT
WHERE P_PRICE > 50.00;

SELECT * FROM PRODUCT_3;
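
A view stores the query, not the rows, so it always reflects the current contents of the base table. A small sketch with sqlite3 standing in for DB2 and made-up products:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCT (P_DESCRIPT VARCHAR(35), P_ONHAND SMALLINT, P_PRICE DECIMAL(15,2));
    INSERT INTO PRODUCT VALUES ('Concrete Saw', 2, 510.99), ('Claw hammer', 23, 9.95);
    CREATE VIEW PRODUCT_3 AS
        SELECT P_DESCRIPT, P_ONHAND, P_PRICE
        FROM PRODUCT
        WHERE P_PRICE > 50.00;
""")

view_sql = "SELECT P_DESCRIPT FROM PRODUCT_3 ORDER BY P_DESCRIPT"
before = [r[0] for r in conn.execute(view_sql)]

# A new expensive product shows up in the view without redefining it.
conn.execute("INSERT INTO PRODUCT VALUES ('Power drill', 8, 109.92)")
after = [r[0] for r in conn.execute(view_sql)]
```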

SQL Indexes
Indexes speed up searches and joins on the indexed columns, and unique indexes help enforce
data integrity.

CREATE INDEX V_CODEX ON PRODUCT(V_CODE);

It is not usually necessary to create indexes for primary keys (the DBMS normally creates
one automatically), but they are useful for alternate keys.

CREATE UNIQUE INDEX V_NAME ON VENDOR(V_NAME);

Joining Database Tables


SELECT PRODUCT.P_DESCRIPT, PRODUCT.P_PRICE, VENDOR.V_NAME,
VENDOR.V_CONTACT, VENDOR.V_AREACODE, VENDOR.V_PHONE
FROM PRODUCT, VENDOR
WHERE PRODUCT.V_CODE = VENDOR.V_CODE;

Aliases

Aliases give us a shorter name for a table.

SELECT P.P_DESCRIPT, P.P_PRICE, V.V_NAME, V.V_CONTACT, V.V_AREACODE,
V.V_PHONE
FROM PRODUCT P, VENDOR V
WHERE P.V_CODE = V.V_CODE AND P_INDATE > '08/15/2002';

Renaming Columns
SELECT P.P_DESCRIPT as "Description", P.P_PRICE as "Price",
V.V_NAME as "Vendor", V.V_CONTACT as "Contact", V.V_AREACODE
as "Area", V.V_PHONE as "Phone"
FROM PRODUCT P, VENDOR V
WHERE P.V_CODE = V.V_CODE AND P_INDATE > '08/15/2002';

Casting values

CAST converts a value to another data type. Here it tidies up the output by limiting the
average price to two decimal places:


SELECT V_CODE, COUNT (P_CODE) as "Product Count", cast(AVG(P_PRICE) as
decimal(5,2)) as "Product price"
FROM PRODUCT
GROUP BY V_CODE;

Outer Joins
These allow selection of rows where there are no matching rows in the table joined.

SELECT P_CODE, VENDOR.V_CODE, V_NAME
FROM VENDOR
LEFT JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE;

SELECT P_CODE, VENDOR.V_CODE, V_NAME
FROM VENDOR
RIGHT JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE;

SELECT P_CODE, VENDOR.V_CODE, V_NAME
FROM VENDOR
FULL OUTER JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE;

Note we can get the conventional (inner) join as well:

SELECT P_CODE, VENDOR.V_CODE, V_NAME
FROM VENDOR
JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE;

SELECT P_CODE, VENDOR.V_CODE, V_NAME
FROM VENDOR, PRODUCT
WHERE VENDOR.V_CODE = PRODUCT.V_CODE;
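
The difference between the inner join and the left outer join shows up as soon as one vendor has no products. The sketch below uses sqlite3 as a stand-in for DB2 with made-up rows. It sticks to LEFT JOIN because older SQLite versions do not support RIGHT or FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE VENDOR  (V_CODE INTEGER PRIMARY KEY, V_NAME VARCHAR(15));
    CREATE TABLE PRODUCT (P_CODE VARCHAR(10) PRIMARY KEY, V_CODE INTEGER);
    INSERT INTO VENDOR  VALUES (21344, 'Vendor A'), (22567, 'Vendor B');
    INSERT INTO PRODUCT VALUES ('P1', 21344), ('P2', NULL);
""")

# Inner join: only vendor/product pairs that match.
inner = conn.execute("""
    SELECT P_CODE, VENDOR.V_CODE, V_NAME
    FROM VENDOR
    JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE
    ORDER BY VENDOR.V_CODE
""").fetchall()

# Left outer join: every vendor, with NULL product columns where nothing matches.
left = conn.execute("""
    SELECT P_CODE, VENDOR.V_CODE, V_NAME
    FROM VENDOR
    LEFT JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE
    ORDER BY VENDOR.V_CODE
""").fetchall()
```

Product P2, which has no vendor, appears in neither result; a RIGHT or FULL OUTER JOIN would be needed to keep it.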

Converting an ER Model to a
Database Structure
• Requires following specific rules that govern such a conversion
• Decisions made by the designer to govern data integrity are reflected in
the foreign key rules
• Implementation decisions vary according to the problem being addressed

Foreign Key Rules


Chapter 5: Normalization
• What is Normalization?
• Why is it done?
• The normal forms 1NF, 2NF, 3NF, BCNF, 4NF
• Transforming normal forms
• E-R modeling and normalization
• Denormalization

Database Tables and Normalization


Normalization is a process for assigning attributes to entities to:

• Reduce data redundancies
o Help eliminate data anomalies
• Produce controlled redundancies to link tables

No information is lost in normalization: the result will be a database that can
produce the same information as the original.

Normalization Process
Normalization works through a series of stages called normal forms:

• First Normal form (1NF)


• Second normal form (2NF)
• Third normal form (3NF)
• etc (4th and 5th)

• 2NF is better than 1NF; 3NF is better than 2NF


• For most business database design purposes, 3NF is highest we need to go
in the normalization process
• Highest level of normalization is not always most desirable

The Need for Normalization


Example: company that manages building projects

• Charges its clients by billing hours spent on each contract


• Hourly billing rate is dependent on employee’s position
• Periodically, a report is generated that contains information as follows

Sample Report
Table Derived from Above
The Need for Normalization
• Structure of data set in Figure 5.1 does not handle data very well
• The table structure appears to work; report is generated with ease
• Unfortunately, the report may yield different results, depending on what
data anomaly has occurred

Issues
• Table entries invite data inconsistencies
• Table displays potential data anomalies
o Update: Modifying JOB_CLASS
o Insertion: New Employee must be assigned project
o Deletion: If an employee is deleted, other vital data is lost; if employee 103
leaves, the information on Elect. Engineers is lost too
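
The update and deletion anomalies can be reproduced on a small unnormalized table. The sketch below uses sqlite3; the REPORT table, its columns, and every value in it are made up for illustration and are not the figures from the textbook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE REPORT (PROJ_NUM INTEGER, EMP_NUM INTEGER, "
    "JOB_CLASS VARCHAR(25), CHG_HOUR DECIMAL(6,2))")
conn.executemany("INSERT INTO REPORT VALUES (?, ?, ?, ?)", [
    (15, 103, "Elect. Engineer", 84.5),
    (22, 117, "Elect. Engineer", 84.5),
    (15, 101, "Database Designer", 105.0),
])

# Update anomaly: changing the rate on one row leaves two rates for one job class.
conn.execute("UPDATE REPORT SET CHG_HOUR = 90.0 WHERE EMP_NUM = 103")
rates_for_engineers = conn.execute(
    "SELECT COUNT(DISTINCT CHG_HOUR) FROM REPORT "
    "WHERE JOB_CLASS = 'Elect. Engineer'").fetchone()[0]

# Deletion anomaly: removing the only Database Designer also removes that job's rate.
conn.execute("DELETE FROM REPORT WHERE EMP_NUM = 101")
designer_rows = conn.execute(
    "SELECT COUNT(*) FROM REPORT "
    "WHERE JOB_CLASS = 'Database Designer'").fetchone()[0]
```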

Repeating Group
Derives its name from the fact that a group of multiple (related) entries can exist
for any single key attribute occurrence

• Relational table must not contain repeating groups


• Normalizing the table structure will reduce these data redundancies
• Normalization is a three-step procedure

Converting to First Normal Form


A table in a relational database must be in 1NF.

• Repeating groups must be eliminated


• Primary key determined
o Uniquely identifies attribute values (rows)
o All attributes dependent on primary key
o In example: Combination of PROJ_NUM and EMP_NUM

Dependencies
• Dependencies can be depicted with the help of a diagram
• Dependency diagram:
o Depicts all dependencies found within a given table structure
o Helpful in getting bird’s-eye view of all relationships among a
table’s attributes
o Use makes it much less likely that an important dependency will
be overlooked

• Desirable dependencies based on entire primary key


• Less desirable dependencies

Partial:
Based on part of composite primary key
Transitive:
One nonprime attribute depends on another nonprime attribute

Dependency Diagram
1NF: Definition
• Tabular format in which:
o All key attributes are defined
o There are no repeating groups in the table
o All attributes are dependent on primary key
• All relational tables must satisfy 1NF requirements
• Some tables contain partial dependencies
o Dependencies based on only part of the primary key
o Sometimes used for performance reasons, but should be used with
caution
• Still subject to data redundancies

Second Normal Form


1. Identify all key components
• Write each key component on separate line
• Write original key on last line
• Write dependent attributes after each key.
2. Each line will become a new table

Second Normal Form Conversion Results


Second Normal Form Defined
Table is in second normal form (2NF) if:

• It is in 1NF and
• It includes no partial dependencies:
• No attribute is dependent on only a portion of the primary key

Converting to Third Normal Form


• Resolve transitive dependencies (attributes dependent on non-key
attributes)
• Create separate table for each transitive dependency
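
As a rough check that no information is lost, the sketch below splits a flat table into JOB, EMPLOYEE, and ASSIGN tables and then joins them back together. It uses sqlite3; the table layout is a simplified guess at the contracting-company example and all values are made up.

```python
import sqlite3

# Flat rows: PROJ_NUM, EMP_NUM, JOB_CLASS, CHG_HOUR, HOURS
flat = [
    (15, 103, "Elect. Engineer", 84.5, 23.8),
    (15, 101, "Database Designer", 105.25, 19.4),
    (18, 103, "Elect. Engineer", 84.5, 12.6),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE JOB (JOB_CLASS VARCHAR(25) PRIMARY KEY, CHG_HOUR REAL);
    CREATE TABLE EMPLOYEE (EMP_NUM INTEGER PRIMARY KEY,
                           JOB_CLASS VARCHAR(25) REFERENCES JOB(JOB_CLASS));
    CREATE TABLE ASSIGN (PROJ_NUM INTEGER,
                         EMP_NUM INTEGER REFERENCES EMPLOYEE(EMP_NUM),
                         HOURS REAL,
                         PRIMARY KEY (PROJ_NUM, EMP_NUM));
""")
for proj, emp, job, rate, hours in flat:
    # Each job class and each employee is stored once, however often it appears.
    conn.execute("INSERT OR IGNORE INTO JOB VALUES (?, ?)", (job, rate))
    conn.execute("INSERT OR IGNORE INTO EMPLOYEE VALUES (?, ?)", (emp, job))
    conn.execute("INSERT INTO ASSIGN VALUES (?, ?, ?)", (proj, emp, hours))

# Joining the three tables reproduces the original flat rows.
rebuilt = conn.execute("""
    SELECT A.PROJ_NUM, A.EMP_NUM, E.JOB_CLASS, J.CHG_HOUR, A.HOURS
    FROM ASSIGN A
    JOIN EMPLOYEE E ON E.EMP_NUM = A.EMP_NUM
    JOIN JOB J ON J.JOB_CLASS = E.JOB_CLASS
""").fetchall()
job_rows = conn.execute("SELECT COUNT(*) FROM JOB").fetchone()[0]
```

The hourly rate for each job class now lives in exactly one JOB row, so changing it cannot produce conflicting values.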

3NF Results
Boyce-Codd Normal Form
• Every determinant in the table is a candidate key
o A candidate key has the same characteristics as the primary key but,
for some reason, was not chosen to be the primary key
• If a table contains only one candidate key, the 3NF and the BCNF are
equivalent
• BCNF can be violated only if the table contains more than one candidate
key

BCNF (cont)
• Most designers consider the Boyce-Codd normal form (BCNF) as a
special case of 3NF
• A table is in 3NF if it is in 2NF and there are no transitive dependencies
o A transitive dependency exists when one nonprime attribute is
dependent on another nonprime attribute
o A table can be in 3NF and not be in BCNF
o This happens when a nonkey attribute is the determinant of a key attribute

Table in 3nf but not BCNF

Decomposition to BCNF
Fourth Normal Form
• Table is in 3NF
• Has no multiple sets of multivalued dependencies
Conversion to 4NF
• 4NF is largely academic if tables conform to the following two rules:
o All attributes are dependent on primary key but independent of
each other
o No row contains two or more multivalued facts about an entity

Improving the Design


• Table structures are cleaned up to eliminate the troublesome initial partial
and transitive dependencies
• Normalization cannot, by itself, be relied on to make good designs
• It is valuable because its use helps eliminate data redundancies
Improving the Design (cont)
The following changes were made:

• PK assignment
• Naming conventions
• Attribute atomicity
• Adding attributes
• Adding relationships
• Refining PKs
• Maintaining historical accuracy
• Dealing with derived attributes

Completed Database

Completed Database: Assign Table


Completed Database: Employee
Final ERD for contracting company
Limitations on System Assigned Keys
• System-assigned primary key may not prevent confusing entries
• Data entries in Table 5.2 are inappropriate because they duplicate existing
records
• Yet there has been no violation of either entity integrity or referential
integrity
• Perhaps Job Description needs to be unique

Normalization and Database Design


• Normalization should be part of design process
• Make sure that proposed entities meet required normal form before table
structures are created
• Many real-world databases have been improperly designed, or have become
burdened with anomalies after being improperly modified over time
• You may be asked to redesign and modify existing databases
Normalization and Database Design (cont)
• E-R Diagram provides macro view, determines entities
• Normalization provides micro view of entities
o Focuses on characteristics of specific entities
o May yield additional entities
• Difficult to separate Normalization and ER diagramming
• Extra check: No attribute that is not a (primary/foreign) key should be
repeated in the database (except to record historical data)

Denormalization
• Normalization is one of many database design goals
• Normalized tables carry costs:
o Additional processing
o Loss of system speed
• Normalization purity is difficult to sustain due to conflict in
o Design efficiency
o Information requirements
o Processing
• Do not be too quick to denormalize

Unnormalized Table Defects


• Data updates less efficient
• Indexing more cumbersome
• No simple strategies for creating views

Overnormalization
• This is frequently done for performance reasons in distributed and clustered
database systems
• Splits tables beyond the point required for normal forms.
• In horizontal partitioning rows of a single logical table are split among
several physical tables (e.g. geographically)
• In vertical partitioning a table is split vertically with commonly accessed
columns in separate physical tables
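
One way to picture horizontal partitioning is two physical tables presented as a single logical table through a UNION ALL view. The sketch below uses sqlite3; the CUSTOMER_EAST and CUSTOMER_WEST tables, the REGION column, and the rows are all hypothetical, and a real distributed DBMS would usually handle this with its own partitioning features.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Rows of one logical CUSTOMER table split by geography.
    CREATE TABLE CUSTOMER_EAST (CUS_CODE INTEGER PRIMARY KEY, CUS_NAME VARCHAR(20));
    CREATE TABLE CUSTOMER_WEST (CUS_CODE INTEGER PRIMARY KEY, CUS_NAME VARCHAR(20));
    INSERT INTO CUSTOMER_EAST VALUES (10010, 'Ramas'), (10012, 'Smith');
    INSERT INTO CUSTOMER_WEST VALUES (10011, 'Dunne');

    -- The view reassembles the logical table for queries that need all rows.
    CREATE VIEW CUSTOMER AS
        SELECT CUS_CODE, CUS_NAME, 'EAST' AS REGION FROM CUSTOMER_EAST
        UNION ALL
        SELECT CUS_CODE, CUS_NAME, 'WEST' AS REGION FROM CUSTOMER_WEST;
""")

all_customers = conn.execute(
    "SELECT CUS_CODE, REGION FROM CUSTOMER ORDER BY CUS_CODE").fetchall()
```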

