Chapter 1
File Systems and Databases

Database
• Persistent collection of data and metadata (data about the characteristics of the data and the relationships among the data)
Database Management System
• Collection of programs to manage the database
• Manages and enforces database structure
• Includes interface to manipulate the data
• Allows data to be shared among multiple users
Data vs Information
• Data:
o Raw facts; building blocks of information
o Unprocessed information
• Information:
o Data processed to reveal meaning
• Accurate, relevant, and timely information is key to good decision making
• Good decision making is key to survival in a global environment
Importance of DBMS
• Makes data management more efficient and effective
• Query language allows quick answers to ad hoc queries
• Provides easier access to more and better-managed data
• Promotes an integrated view of organization’s operations
• Reduces the chance of inconsistent data
• Helps protect against loss of data
DBMS Manages Interface Between Data and Users

Database Design: Why Design is important
• The database is the foundation of the information system.
• Design should reflect the expected use
• Poor design results in unwanted redundancy
• Poor design leads to inconsistent data
• Poor design leads to poor performance
• Poor design leads to improper information systems operation
Historical Roots
• First business computer applications focused on clerical tasks
• Requests for information quickly followed
• File systems developed to address needs
• Data organized according to expected use
• Data Processing (DP) specialists computerized manual file systems
File Systems
File System Data Management
• Requires extensive programming, typically in Third Generation Language (3GL)
• Leads to islands of information and data redundancy
• Difficult to make ad hoc queries to obtain information
• Difficult to maintain data integrity
Data and Structural Dependence

• Data characteristics are embodied in programs not stored with the data.
• Changes in data characteristics require modifying programs
• Changes in file structures require modification of related programs
Data Redundancy
• Different and possibly conflicting versions of same data
• Results in problems during data:
o Modification (e.g. address changes)
o Insertion
o Deletion
• Data inconsistency: Lack of integrity
Database Systems
• Database consists of logically related data stored in a single repository
• Advantages over file system management approach:
o Eliminates most of the file system's data inconsistency, data anomaly, data dependency, and structural dependency problems
o Stores data structures, relationships, and access paths
Database vs. File Systems
Database System Environment

Database System Types
• Scale
o Single User (desktop)
o Workgroup
o Enterprise
o Distributed or Federated
• Use
o Production/Transaction
o Decision Support/Data Warehouse
Uses of Databases
• Transactional (or production):
o Supports a company’s day-to-day operations
• Data warehouse:
o Stores data used to generate information required to make tactical or
strategic decisions
o Such decisions typically require “data massaging”
o Often used to store historical data
o Structure is quite different
DBMS Functions
• Metadata/Data Dictionary Management
• Data storage management
• Data transformation and presentation
• Security management and Multiuser access control
• Backup and recovery management
• Data integrity management
• Database language and application programming interfaces
• Database communication interfaces
Database Models
Collection of logical constructs used to represent data structure and relationships
• Conceptual Models: logical nature of data representation

• Implementation Models: how data are represented
Database Models
The hierarchical and network models are of historical interest only.
Database Models:
• Relational
• Entity-Relationship
• Object oriented
Relational Model
• Most common model
• Perceived by user as collection of tables containing data
• Actually has a formal definition based on set theory
• Tables are a series of row/column intersections
• Tables related by sharing common entity characteristic(s)
Relational Database
Relational Database Model Advantages
• Structural independence
• Improved conceptual simplicity
• Easier database design, implementation, management, and use
• Ad hoc query capability with SQL (standard interface)
• Powerful database management system
Relational Database Model Disadvantages

• Substantial hardware and system software overhead
• Poor design and implementation is made easy
• Not a cure all: May promote "islands of information" problems
• SQL is not completely standardized. One DBMS is not a "drop in" replacement
for another.
• May have problems storing some types of data
Entity Relationship Database Model

• Primarily a database design tool.
• Complements the relational data model concepts
• Represented in an entity relationship diagram (ERD)
• Based on entities, attributes, and relationships
ER-Diagram
ER Model Advantages
• Conceptual simplicity
• Visual representation
• Effective communication tool
• Integrated with the relational database model
ER Model Disadvantages
• Limited constraint representation
• Limited relationship representation
• No data manipulation language
• Loss of information content
• May be overly complex for end users
Object-Oriented Model
• Objects or abstractions of real-world entities are stored
• Attributes describe properties
• Collection of similar objects is a class
• Methods represent real world actions of classes
• Classes are organized in a class hierarchy
• Objects inherit attributes and methods of classes above.
Object Oriented Model
OO Model Advantages
• Adds semantic context
• Structural and data independence
• May mesh well with Object Oriented Programming
OO Model Disadvantages
• Lack of standards in model
• Lack of standard manipulation languages
• Complex navigational data access
• Steep learning curve
• Poor performance
Notes
Chapter 4: Entity Relationship Modeling
• How relationships between entities are defined and refined, and how such
relationships are incorporated into the database design process
• How ERD components affect database design and implementation
• How to interpret the modeling symbols for the four most popular ER
modeling tools
• That real-world database design often requires that you reconcile
conflicting goals
Entity Relationship Model and Diagram

• ER model forms the basis of an ER diagram
• ERD represents the conceptual database as viewed by end user
• ERDs depict the ER model’s three main components:
o Entities
o Attributes
o Relationships
• Several different diagramming conventions
Entities
• Refers to the entity set and not to a single entity occurrence
• Corresponds to a table and not to a row in the relational environment
• In both the Chen and Crow’s Foot models, an entity is represented by a
rectangle containing the entity’s name
• Entity name, a noun, is usually written in capital letters
Attributes
• Characteristics of entities
• Domain is set of possible values
• Primary keys underlined
Attributes (cont)
• Simple: cannot be subdivided
o Age, sex, GPA
• Composite: can be subdivided
o Address: street, city, state, zip
• Single-valued: has only a single value
o Social security number
• Multi-valued: can have many values
o Person may have several college degrees
• Derived: can be calculated from other information
o Age can be derived from D.O.B.
Multivalued Attributes
Resolving Multivalued Attribute Problems
Although the conceptual model can handle multivalued attributes, you should not
implement them in the relational DBMS
• Within original entity, create several new attributes, one for each of the
original multivalued attribute’s components
o Can lead to major structural problems in the table
• Create a new entity composed of original multivalued attribute’s
components
Creating New Attributes
Creating New Entity Set

Relationships
• Associations between entities
• Established by Business Rules
• Connected entities termed participants
• Connectivity describes relationship classification:
o 1:1, 1:M, M:N
• Cardinality
o Number of entity occurrences associated with one occurrence of
related entity
Connectivity and Cardinality in an ERD

Relationship Strength
• Existence Dependent
o Entity's existence depends on existence of one or more related entities
o Existence-independent entities can exist apart from related entities
o Employee claims Child
Child is dependent on employee
• Weak (non-identifying)
o One entity is existence-independent of another
o PK of dependent entity doesn't contain PK component of parent
entity
o Book is somewhat confused on this
• Strong (identifying)
o One entity is existence-dependent on another
o PK of related entity contains PK component of parent entity
Relationship Participation
• Optional
o Entity occurrence does not require a corresponding occurrence in
related entity
o Shown by drawing a small circle on side of optional entity on ERD
• Mandatory
o Entity occurrence requires corresponding occurrence in related
entity
o If no optionality symbol is shown on ERD, it is mandatory
Weak Entity
• Existence-dependent on another entity
• Has primary key that is partially or totally derived from parent entity
Mandatory Class Course relationship
Optional Class Entity in Professor Teaches Class
Degree of Relationship
A relationship's degree indicates the number of associated entities.
Implementation of a Ternary Relationship

Composite Entity
• Used to replace M:N relationships with 1:N relationships
• Bridge entities composed of primary keys of each entity needing
connection
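The bridge-entity idea can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module rather than DB2; the STUDENT, CLASS, and ENROLL tables and their rows are hypothetical and do not come from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (STU_NUM INTEGER PRIMARY KEY, STU_NAME VARCHAR(30));
CREATE TABLE CLASS (CLASS_CODE VARCHAR(8) PRIMARY KEY, CLASS_TITLE VARCHAR(30));

-- Bridge (composite) entity: its primary key is made of the two parents' keys,
-- so the M:N relationship becomes two 1:N relationships.
CREATE TABLE ENROLL (
    STU_NUM    INTEGER    NOT NULL REFERENCES STUDENT (STU_NUM),
    CLASS_CODE VARCHAR(8) NOT NULL REFERENCES CLASS (CLASS_CODE),
    PRIMARY KEY (STU_NUM, CLASS_CODE)
);

INSERT INTO STUDENT VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO CLASS VALUES ('DB101', 'Databases'), ('OS201', 'Operating Systems');
INSERT INTO ENROLL VALUES (1, 'DB101'), (1, 'OS201'), (2, 'DB101');
""")

# One student, many classes.
classes_for_ann = [r[0] for r in conn.execute("""
    SELECT C.CLASS_CODE
    FROM ENROLL E JOIN CLASS C ON C.CLASS_CODE = E.CLASS_CODE
    WHERE E.STU_NUM = 1
    ORDER BY C.CLASS_CODE""")]

# One class, many students.
students_in_db101 = [r[0] for r in conn.execute("""
    SELECT S.STU_NAME
    FROM ENROLL E JOIN STUDENT S ON S.STU_NUM = E.STU_NUM
    WHERE E.CLASS_CODE = 'DB101'
    ORDER BY S.STU_NAME""")]
```

Each ENROLL row records one student-class pairing, so neither parent table needs a repeating or multivalued column.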
Entity Subtypes and Supertypes
Generalization Hierarchy
• Depicts relationships between higher-level supertype and lower-level
subtype entities
• Supertype has shared attributes
• Subtypes have unique attributes
• Disjoint relationships
o Unique subtypes
o Non-overlapping
o Indicated with a `G'
• Overlapping subtypes use `Gs' Symbol
Nulls Created by Unique Attributes

Generalization Hierarchy: Disjoint
Generalization Hierarchy: Overlapping and Disjoint

Supertype/Subtype relationship in an ERD
Comparison of ER Modeling Symbols

Developing an E-R Diagram
• Iterative Process
1. Develop general narrative of organizational operations
2. Draw Basic E-R Model
3. Modify E-R model to incorporate newly discovered
components/relationships
• Repeat until designers and users agree E-R model is complete
Dealing with Conflicting Goals in Database Design

• Database must be designed to conform to design standards
• High-speed processing may require design compromises
• Quest for timely information may be the focus of database design
Other concerns:
o Security
o Performance
o Shared access
o Integrity
o Capabilities of actual DBMS
Notes
Chapter 6 Notes
• The basic commands and functions of SQL
• How to use SQL for data administration (to create tables, indexes, and
views)
• How to use SQL for data manipulation (to add, modify, delete, and
retrieve data)
• How to use SQL to query a database to extract useful information
Introduction to SQL Part I

• The relational DBMS is the standard for database management.
• The Structured Query Language, SQL, is the standard for working with
them
• This chapter is an introduction to essential SQL.
SQL strengths
Covers both
• Data definition
• Data manipulation
SQL is relatively easy to learn.
ANSI prescribes a standard SQL.
SQL Weaknesses
• Some eccentric notation
o Use of ' marks, rather than ", to delimit strings
o Wildcards: % instead of *
• Some things are hard to do
• Different Vendors implement different dialects
• Not a good conceptual match to most programming languages
• Strictly DDL and DML; no standard procedural language.
DB2 concepts
• DB2 consists of multiple "instances" on each server (we have one)
• Within each instance there are databases: we have two, SAMPLE and
DBMS, and will be using DBMS
• Within the databases are schemas, one for each user.
• Authorization for users is via the system (cs1 account). Your schema name
and user name are the same as your username on CS1
Setup for demonstrations

1. Use the winsql program to connect to the datasource DBMS using your
CS1 username and password
2. Perform the following commands to create the tables:
drop table vendor; drop table product; drop table customer;
create table vendor like CH06_SALESCO.vendor;
insert into vendor select * from CH06_SALESCO.vendor;
alter table vendor add primary key (v_code);

create table product like CH06_SALESCO.product;
insert into product select * from CH06_SALESCO.product;
alter table product add primary key (p_code)
add foreign key (v_code) references vendor on delete set null on update restrict;
Data Definition Commands
The Database Model
Simple Database -- PRODUCT and VENDOR tables
Each product is supplied by only a single vendor.
A vendor may supply many products.

The Tables and Their Components
• The VENDOR table contains vendors who are not referenced in the
PRODUCT table. PRODUCT is optional to VENDOR.
• Existing V_CODE values in the PRODUCT table must have a match in
the VENDOR table.
• A few products are supplied factory-direct, a few are made in-house, and a
few may have been bought in a special warehouse sale. That is, a product is not
necessarily supplied by a vendor. VENDOR is optional to PRODUCT.
Common SQL Datatypes
Data Type    SQL
Numeric      NUMBER(L,D)
             DECIMAL(L,D)
             INTEGER
             SMALLINT
Character    CHAR(L)
             VARCHAR(L)
Date         DATE

Creating the Database Structure
This varies among databases. In DB2:
• There are instances
• Within each instance there are databases
• Within each database there are schemas (one for each user). DB2 users are the same as operating system users.
Statements in DB2 referencing a table include the schema (SELECT * FROM schema.tablename); the current schema is implicit.

Creating Table Structures
CREATE TABLE <table name>(
<attribute1 name and attribute1 characteristics,
attribute2 name and attribute2 characteristics,
attribute3 name and attribute3 characteristics,
primary key designation,
foreign key designation and foreign key requirements>);

CREATE TABLE VENDOR(
V_CODE INTEGER NOT NULL PRIMARY KEY DEFAULT 0,
V_NAME VARCHAR(15),
V_CONTACT VARCHAR(50),
V_AREACODE VARCHAR(3),
V_PHONE VARCHAR(8),
V_STATE VARCHAR(2),
V_ORDER VARCHAR(1)
);

CREATE TABLE PRODUCT(
P_CODE VARCHAR(10) NOT NULL PRIMARY KEY,
P_DESCRIPT VARCHAR(35),
P_INDATE DATE,
P_ONHAND SMALLINT DEFAULT 0,
P_MIN SMALLINT DEFAULT 0,
P_PRICE DECIMAL(15, 2) DEFAULT 0,
P_DISCOUNT DOUBLE DEFAULT 0,
V_CODE INTEGER DEFAULT 0 REFERENCES VENDOR(V_CODE)
);

SQL Integrity Constraints
Entity Integrity
PRIMARY KEY
NOT NULL and UNIQUE
Referential Integrity
FOREIGN KEY
ON DELETE
ON UPDATE
Check Constraint
Validates data when an attribute value is entered
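The constraint types listed above can be watched in action with a small sketch. It runs a cut-down version of the VENDOR and PRODUCT tables in Python's built-in sqlite3 module, not DB2. The CHECK on P_PRICE and the `rejected` helper are assumptions added for illustration, and SQLite only enforces foreign keys after the PRAGMA shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE VENDOR (
    V_CODE INTEGER NOT NULL PRIMARY KEY,
    V_NAME VARCHAR(15) NOT NULL
);
CREATE TABLE PRODUCT (
    P_CODE  VARCHAR(10) NOT NULL PRIMARY KEY,              -- entity integrity
    P_PRICE DECIMAL(15, 2) DEFAULT 0 CHECK (P_PRICE >= 0), -- check constraint (assumed rule)
    V_CODE  INTEGER REFERENCES VENDOR (V_CODE)             -- referential integrity
            ON DELETE SET NULL ON UPDATE RESTRICT
);
INSERT INTO VENDOR VALUES (26000, 'Quality Tools');
INSERT INTO PRODUCT VALUES ('14 ABC12', 510.99, 26000);
""")

def rejected(sql):
    """Hypothetical helper: True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_key = rejected("INSERT INTO PRODUCT VALUES ('14 ABC12', 1.00, 26000)")  # duplicate PK
bad_vendor = rejected("INSERT INTO PRODUCT VALUES ('X1', 1.00, 99999)")     # no such vendor
bad_price = rejected("INSERT INTO PRODUCT VALUES ('X2', -5.00, 26000)")     # fails CHECK

# ON DELETE SET NULL: deleting the vendor leaves the product with a null V_CODE.
conn.execute("DELETE FROM VENDOR WHERE V_CODE = 26000")
orphan_vendor = conn.execute(
    "SELECT V_CODE FROM PRODUCT WHERE P_CODE = '14 ABC12'").fetchone()[0]
```

All three bad inserts are refused, and after the vendor is deleted the product row survives with V_CODE set to null.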
Basic Data Management

Data Entry
INSERT INTO <table name> VALUES (attribute 1 value, attribute 2 value, ... etc.);
INSERT INTO VENDOR VALUES (26000, 'Quality Tools', 'Johnson', '915', '555-3234', 'TX', 'N');
INSERT INTO PRODUCT VALUES ('14 ABC12', 'Concrete Saw', '09/02/1996', 2, 1, 510.99, 0.00, 26000);

Committing Changes
Changes are not made permanent until they are committed, assuming autocommit is off. Many
end-user query environments (including ours) do not support turning autocommit off.
COMMIT ;
Listing the Table Contents

SELECT * FROM PRODUCT;
SELECT P_CODE, P_DESCRIPT, P_INDATE, P_ONHAND,
P_MIN, P_PRICE, P_DISCOUNT, V_CODE
FROM PRODUCT;

Making a Correction
UPDATE PRODUCT SET P_INDATE = '2003-11-15'
WHERE P_CODE = '13-Q2/P2';
UPDATE PRODUCT SET P_INDATE = '2003-11-15', P_PRICE = 15.99, P_MIN = 10
WHERE P_CODE = '13-Q2/P2';
Restoring the Table Contents (assumes autocommit is off).

ROLLBACK;
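The interplay of COMMIT and ROLLBACK can be tried out in a small sketch. It uses Python's built-in sqlite3 module, whose default mode opens a transaction before each data change (that is, autocommit is effectively off); the starting price of 14.99 is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCT (P_CODE VARCHAR(10) PRIMARY KEY, P_PRICE DECIMAL(15, 2))")
conn.execute("INSERT INTO PRODUCT VALUES ('13-Q2/P2', 14.99)")
conn.commit()  # COMMIT: the inserted row is now permanent

# This change stays pending until it is committed or rolled back.
conn.execute("UPDATE PRODUCT SET P_PRICE = 15.99 WHERE P_CODE = '13-Q2/P2'")
price_before_rollback = conn.execute("SELECT P_PRICE FROM PRODUCT").fetchone()[0]

conn.rollback()  # ROLLBACK: undo everything since the last COMMIT
price_after_rollback = conn.execute("SELECT P_PRICE FROM PRODUCT").fetchone()[0]
```

The session that made the update sees 15.99 until the rollback, after which the committed value 14.99 is back.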

Deleting Table Rows
DELETE FROM PRODUCT WHERE P_CODE = '2238/QPD';
DELETE FROM PRODUCT WHERE P_MIN = 5;
Delete is a dangerous command. Typing:

DELETE FROM <table>
will neatly delete all the records in the table!
Queries
Partial Listing of Table Contents
SELECT <column(s)> FROM <table name> WHERE <conditions>;

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE = 21344;

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE <> 21344;

SELECT P_DESCRIPT, P_ONHAND, P_MIN, P_PRICE
FROM PRODUCT
WHERE P_PRICE <= 10;
Queries
Using Mathematical Operators on Character Attributes
SELECT P_DESCRIPT, P_ONHAND, P_MIN, P_PRICE
FROM PRODUCT
WHERE P_CODE < '1558-QWI';
Using Mathematical Operators on Dates

SELECT P_DESCRIPT, P_ONHAND, P_MIN, P_PRICE, P_INDATE
FROM PRODUCT
WHERE P_INDATE >= '01/01/2004';
Queries
Logical Operators: AND, OR, and NOT
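SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE = 21344 OR V_CODE = 21225;

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE P_PRICE < 50 AND P_INDATE > '01/01/2004';

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE (P_PRICE < 50 AND P_INDATE > '01/01/2004')
OR V_CODE = 24288;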
Queries: Special Operators

• BETWEEN - used to define range limits.
• IS NULL - used to check whether an attribute value is null
• LIKE - used to check for similar character strings.
• IN - used to check whether an attribute value matches a value contained
within a (sub)set of listed values.
• EXISTS - used to check whether a subquery returns any rows. It tests for
matching rows, not for null attribute values, so it is not the opposite of IS NULL.

BETWEEN is used to define range limits.
SELECT *
FROM PRODUCT
WHERE P_PRICE BETWEEN 10.00 AND 100.00;
BETWEEN includes both range limits, so the equivalent query is:
SELECT * FROM PRODUCT
WHERE P_PRICE >= 10.00 AND P_PRICE <= 100.00;
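Whether BETWEEN includes its endpoints is easy to confirm with a small sketch. It uses Python's built-in sqlite3 module rather than DB2; the five product rows and the `codes` helper are hypothetical, chosen so that two prices sit exactly on the range limits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCT (P_CODE VARCHAR(10) PRIMARY KEY, P_PRICE DECIMAL(15, 2))")
conn.executemany(
    "INSERT INTO PRODUCT VALUES (?, ?)",
    [("A", 5.00), ("B", 10.00), ("C", 55.50), ("D", 100.00), ("E", 250.00)],
)

def codes(where):
    """Hypothetical helper: product codes matching a WHERE condition."""
    rows = conn.execute(
        f"SELECT P_CODE FROM PRODUCT WHERE {where} ORDER BY P_CODE")
    return [r[0] for r in rows]

between = codes("P_PRICE BETWEEN 10.00 AND 100.00")
inclusive = codes("P_PRICE >= 10.00 AND P_PRICE <= 100.00")
exclusive = codes("P_PRICE > 10.00 AND P_PRICE < 100.00")
```

Here `between` and `inclusive` both contain B, C, and D, while `exclusive` contains only C, so a rewrite that uses > and < silently drops the rows priced exactly at the limits.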

IS NULL is used to check whether an attribute value is null.
SELECT P_CODE, P_DESCRIPT
FROM PRODUCT
WHERE V_CODE IS NULL;
Special Operators
LIKE is used to check for similar character strings.
Note the difference between these queries. Are SQL Strings case sensitive?
SELECT *
FROM VENDOR
WHERE V_CONTACT LIKE 'Smith%';
SELECT *
FROM VENDOR
WHERE V_CONTACT LIKE 'SMITH%';
Special Operators
IN is used to check whether an attribute value matches a value
contained within a (sub)set of listed values.
SELECT * FROM PRODUCT WHERE V_CODE IN (21344, 24288);
EXISTS is used to check whether a subquery returns any rows.
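A short sketch may help separate IN from EXISTS. It uses Python's built-in sqlite3 module rather than DB2, and the vendor names and product rows are invented for illustration. In standard SQL, EXISTS takes a subquery and is true when that subquery returns at least one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VENDOR (V_CODE INTEGER PRIMARY KEY, V_NAME VARCHAR(15));
CREATE TABLE PRODUCT (P_CODE VARCHAR(10) PRIMARY KEY, V_CODE INTEGER);
INSERT INTO VENDOR VALUES (21344, 'Vendor A'), (24288, 'Vendor B'), (25501, 'Vendor C');
INSERT INTO PRODUCT VALUES ('P1', 21344), ('P2', 24288), ('P3', NULL);
""")

# IN: the attribute value must match one of the listed values.
in_list = [r[0] for r in conn.execute(
    "SELECT P_CODE FROM PRODUCT WHERE V_CODE IN (21344, 24288) ORDER BY P_CODE")]

# EXISTS: true for a vendor when the correlated subquery finds at least one product.
with_products = [r[0] for r in conn.execute("""
    SELECT V_CODE FROM VENDOR V
    WHERE EXISTS (SELECT * FROM PRODUCT P WHERE P.V_CODE = V.V_CODE)
    ORDER BY V_CODE""")]

# NOT EXISTS: vendors that supply no products at all.
without_products = [r[0] for r in conn.execute("""
    SELECT V_CODE FROM VENDOR V
    WHERE NOT EXISTS (SELECT * FROM PRODUCT P WHERE P.V_CODE = V.V_CODE)
    ORDER BY V_CODE""")]
```

Vendor 25501 has a perfectly good, non-null V_CODE yet fails the EXISTS test, because no PRODUCT row refers to it.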
Advanced Data Management Commands

Changing Table Structures
Note: DB2 allows few column modifications
ALTER TABLE <table name> MODIFY (<column name> <new column characteristics>);
ALTER TABLE <table name> ADD (<column name> <new column characteristics>);
Changing a Column's Data Type

Probably illegal in DB2
ALTER TABLE PRODUCT MODIFY (V_CODE CHAR(5));

ALTER TABLE PRODUCT MODIFY (P_PRICE DECIMAL(9,2));
Adding a New Column to the Table

ALTER TABLE PRODUCT ADD column P_SALECODE CHAR(1) ;
Updating Data
UPDATE PRODUCT SET P_SALECODE = '2' WHERE P_CODE = '1546-QQ2';
UPDATE PRODUCT SET P_SALECODE = '1' WHERE P_CODE IN ('13-Q2/P2',
'2232/QTY');
UPDATE PRODUCT SET P_SALECODE = '2' WHERE P_INDATE < '01/01/2004';
UPDATE PRODUCT SET P_SALECODE = '1' WHERE P_INDATE >= '01/01/2004' AND
P_INDATE < '10/20/2004';
Copying Tables
Copying table definitions and data
CREATE TABLE NEWPRODUCT LIKE PRODUCT;
INSERT INTO NEWPRODUCT SELECT * FROM PRODUCT;
Copying Parts of Tables

CREATE TABLE PART
(PART_CODE CHAR(8) NOT NULL UNIQUE,
PART_DESCRIPT CHAR(35),
PART_PRICE DECIMAL(8,2),
PRIMARY KEY(PART_CODE));
INSERT INTO PART (PART_CODE, PART_DESCRIPT, PART_PRICE)
SELECT P_CODE, P_DESCRIPT, P_PRICE
FROM PRODUCT;
Deleting a Table from the Database

DROP TABLE <table name>;
DROP TABLE PART;
Primary and Foreign Key Designation

(Note we did these when we created the table)
ALTER TABLE PRODUCT ADD PRIMARY KEY (P_CODE);
ALTER TABLE PRODUCT ADD FOREIGN KEY (V_CODE) REFERENCES VENDOR;
ALTER TABLE PRODUCT ADD PRIMARY KEY (P_CODE) ADD FOREIGN KEY (V_CODE)
REFERENCES VENDOR;
Notes
Chapter 6: Structured Query Language (SQL)
More Complex Queries and SQL Functions
Ordering a Listing
ORDER BY <attributes>
SELECT P_CODE, P_DESCRIPT, P_INDATE, P_PRICE
FROM PRODUCT
ORDER BY P_PRICE;

SELECT P_CODE, P_DESCRIPT, P_INDATE, P_PRICE
FROM PRODUCT
WHERE P_INDATE < '08/21/2002' AND P_PRICE <= 50.00
ORDER BY V_CODE, P_PRICE DESC;
Listing Unique Values

SELECT DISTINCT V_CODE
FROM PRODUCT;
SELECT V_CODE FROM PRODUCT;
More Complex Queries and SQL Functions
Function Output
COUNT Number of rows containing non-null values for the specified attribute
MIN Minimum attribute value encountered
MAX Maximum attribute value encountered
AVG Arithmetic mean of attribute values
SUM Sum of attribute values
COUNT
How many products are there?
SELECT COUNT(*) FROM PRODUCT ;
How many different vendors are represented in the PRODUCT table?
SELECT COUNT(DISTINCT V_CODE)
FROM PRODUCT
WHERE V_CODE IS NOT NULL;
SUM
Calculate inventory value
SELECT SUM(P_ONHAND*P_PRICE)
FROM PRODUCT;
AVG
What is the average product price?
SELECT AVG(P_PRICE)
FROM PRODUCT;
Min and Max

SELECT MIN(P_PRICE)
FROM PRODUCT ;
SELECT MAX(P_PRICE)
FROM PRODUCT ;
Nested query or Subquery

What products have an above average price?
SELECT P_DESCRIPT, P_ONHAND, P_PRICE, V_CODE
FROM PRODUCT
WHERE P_PRICE > (SELECT AVG(P_PRICE)
FROM PRODUCT)
ORDER BY P_PRICE DESC;
Grouping Data
GROUP BY
SELECT V_CODE, COUNT (P_CODE), AVG(P_PRICE)
FROM PRODUCT
GROUP BY V_CODE;
Note that only the attribute(s) being grouped by can appear by themselves in the select.
All other attributes need to be in an aggregate function.
HAVING
HAVING is analogous to WHERE, but it filters the groups produced by GROUP BY, so its predicates may use aggregate functions.
SELECT V_CODE, COUNT(DISTINCT P_CODE), AVG(P_PRICE)
FROM PRODUCT
GROUP BY V_CODE
HAVING AVG(P_PRICE) <= 10;
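The difference between grouping and filtering groups can be seen in a small sketch. It uses Python's built-in sqlite3 module rather than DB2, with five invented product rows spread over three vendor codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCT ("
    "P_CODE VARCHAR(10) PRIMARY KEY, P_PRICE DECIMAL(15, 2), V_CODE INTEGER)")
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?, ?)", [
    ("P1", 4.00, 21344), ("P2", 8.00, 21344),
    ("P3", 40.00, 24288), ("P4", 60.00, 24288),
    ("P5", 9.00, 25595),
])

# GROUP BY: one output row per vendor code, with aggregates computed per group.
all_groups = conn.execute("""
    SELECT V_CODE, COUNT(P_CODE), AVG(P_PRICE)
    FROM PRODUCT
    GROUP BY V_CODE
    ORDER BY V_CODE""").fetchall()

# HAVING: keep only the groups whose average price is at most 10.
cheap_groups = conn.execute("""
    SELECT V_CODE, COUNT(P_CODE), AVG(P_PRICE)
    FROM PRODUCT
    GROUP BY V_CODE
    HAVING AVG(P_PRICE) <= 10
    ORDER BY V_CODE""").fetchall()
```

WHERE could not express the second query, because AVG(P_PRICE) does not exist until the rows have been grouped.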
Virtual Tables: Creating a View
CREATE VIEW PRODUCT_3 AS
SELECT P_DESCRIPT, P_ONHAND, P_PRICE
FROM PRODUCT
WHERE P_PRICE > 50.00;
SELECT * FROM PRODUCT_3;
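A view stores a query, not a copy of the data, which a short sketch can make concrete. It uses Python's built-in sqlite3 module rather than DB2; the three product rows are sample values chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (
    P_DESCRIPT VARCHAR(35), P_ONHAND SMALLINT, P_PRICE DECIMAL(15, 2));
INSERT INTO PRODUCT VALUES ('Concrete Saw', 2, 510.99), ('Claw Hammer', 23, 9.95);

CREATE VIEW PRODUCT_3 AS
    SELECT P_DESCRIPT, P_ONHAND, P_PRICE
    FROM PRODUCT
    WHERE P_PRICE > 50.00;
""")

def view_rows():
    """Hypothetical helper: descriptions currently visible through the view."""
    return [r[0] for r in conn.execute(
        "SELECT P_DESCRIPT FROM PRODUCT_3 ORDER BY P_DESCRIPT")]

before = view_rows()

# Insert into the base table only; the view is re-evaluated on every query.
conn.execute("INSERT INTO PRODUCT VALUES ('Power Drill', 8, 109.92)")
after = view_rows()
```

The new row shows up in the view without any change to the view itself, because PRODUCT_3 is evaluated against PRODUCT each time it is queried.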
SQL Indexes
Indexes can speed up data retrieval, and unique indexes help enforce data integrity.
CREATE INDEX V_CODEX ON PRODUCT(V_CODE);
It is not usually necessary to create indexes for primary keys, because the DBMS normally
indexes them automatically, but indexes are useful for alternate keys
CREATE UNIQUE INDEX V_NAME ON VENDOR(V_NAME);
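Both roles of an index can be sketched briefly. The example uses Python's built-in sqlite3 module rather than DB2, and the index name V_NAME_NDX is a hypothetical choice made here to keep it distinct from the column name. EXPLAIN QUERY PLAN is SQLite-specific, and its output text varies between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VENDOR (V_CODE INTEGER PRIMARY KEY, V_NAME VARCHAR(15));
CREATE UNIQUE INDEX V_NAME_NDX ON VENDOR (V_NAME);
INSERT INTO VENDOR VALUES (26000, 'Quality Tools');
""")

# Integrity: the unique index refuses a second vendor with the same name.
try:
    conn.execute("INSERT INTO VENDOR VALUES (26001, 'Quality Tools')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Performance: ask the planner how it would run a lookup by name.
# The last column of each plan row is a text description that names the
# index if the planner decided to use it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM VENDOR WHERE V_NAME = 'Quality Tools'"
).fetchall()
uses_index = any("V_NAME_NDX" in str(row[-1]) for row in plan)
```

Printing `plan` shows whether the lookup is planned as an index search or as a full table scan.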
Joining Database Tables

SELECT PRODUCT.P_DESCRIPT, PRODUCT.P_PRICE, VENDOR.V_NAME,
VENDOR.V_CONTACT, VENDOR.V_AREACODE, VENDOR.V_PHONE
FROM PRODUCT, VENDOR
WHERE PRODUCT.V_CODE = VENDOR.V_CODE;
Aliases
Aliases give us a shorter name for a table.
SELECT P.P_DESCRIPT, P.P_PRICE, V.V_NAME, V.V_CONTACT, V.V_AREACODE,
V.V_PHONE
FROM PRODUCT P, VENDOR V
WHERE P.V_CODE = V.V_CODE AND P_INDATE > '08/15/2002';
Renaming Columns
SELECT P.P_DESCRIPT as "Description", P.P_PRICE as "Price",
V.V_NAME as "Vendor", V.V_CONTACT as "Contact", V.V_AREACODE
as "Area", V.V_PHONE as "Phone"
FROM PRODUCT P, VENDOR V
WHERE P.V_CODE = V.V_CODE AND P_INDATE > '08/15/2002';
Casting values
CAST converts a value to another data type; here it pretties up the output:

SELECT V_CODE, COUNT (P_CODE) as "Product Count", cast(AVG(P_PRICE) as
decimal(5,2)) as "Product price"
FROM PRODUCT
GROUP BY V_CODE;
Outer Joins
These allow selection of rows that have no matching rows in the joined table.
SELECT P_CODE, VENDOR.V_CODE, V_NAME
FROM VENDOR
LEFT JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE;

SELECT P_CODE, VENDOR.V_CODE, V_NAME
FROM VENDOR
RIGHT JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE;

SELECT P_CODE, VENDOR.V_CODE, V_NAME
FROM VENDOR
FULL OUTER JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE;
Note we can get the conventional join as well:

SELECT P_CODE, VENDOR.V_CODE, V_NAME
FROM VENDOR
JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE;

SELECT P_CODE, VENDOR.V_CODE, V_NAME
FROM VENDOR, PRODUCT
WHERE VENDOR.V_CODE = PRODUCT.V_CODE;
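The effect of each join type on unmatched rows can be sketched as follows. The example uses Python's built-in sqlite3 module rather than DB2, with two invented vendors and two invented products. Older SQLite versions lack RIGHT and FULL OUTER JOIN, so the sketch gets the product-side result by swapping the table order in a LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VENDOR (V_CODE INTEGER PRIMARY KEY, V_NAME VARCHAR(15));
CREATE TABLE PRODUCT (
    P_CODE VARCHAR(10) PRIMARY KEY,
    V_CODE INTEGER REFERENCES VENDOR (V_CODE));
-- Vendor B supplies nothing; product P2 has no vendor.
INSERT INTO VENDOR VALUES (21344, 'Vendor A'), (24288, 'Vendor B');
INSERT INTO PRODUCT VALUES ('P1', 21344), ('P2', NULL);
""")

# Conventional (inner) join: only rows with a match on both sides.
inner = conn.execute("""
    SELECT P_CODE, VENDOR.V_CODE, V_NAME
    FROM VENDOR
    JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE
    ORDER BY VENDOR.V_CODE""").fetchall()

# Left outer join: every vendor, with nulls where no product matches.
vendor_side = conn.execute("""
    SELECT P_CODE, VENDOR.V_CODE, V_NAME
    FROM VENDOR
    LEFT JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE
    ORDER BY VENDOR.V_CODE""").fetchall()

# Same rows as VENDOR RIGHT JOIN PRODUCT: every product, matched or not.
product_side = conn.execute("""
    SELECT P_CODE, VENDOR.V_CODE, V_NAME
    FROM PRODUCT
    LEFT JOIN VENDOR ON VENDOR.V_CODE = PRODUCT.V_CODE
    ORDER BY P_CODE""").fetchall()
```

The unmatched vendor appears only in `vendor_side` and the unmatched product only in `product_side`; a full outer join would return both.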
Converting an ER Model to a Database Structure
• Requires following specific rules that govern such a conversion
• Decisions made by the designer to govern data integrity are reflected in
the foreign key rules
• Implementation decisions vary according to the problem being addressed
Foreign Key Rules

Chapter 5: Normalization
• What is Normalization?
• Why is it done?
• The normal forms 1NF, 2NF, 3NF, BCNF, 4NF
• Transforming normal forms
• E-R modeling and normalization
• Denormalization
Database Tables and Normalization

Normalization is a process for assigning attributes to entities to:
• Reduce data redundancies
• Help eliminate data anomalies
• Produce controlled redundancies to link tables
No information is lost in normalization:
• Result will be a database that can produce the same information as the
original
Normalization Process
Normalization works through a series of stages called normal forms:
• First Normal form (1NF)

• Second normal form (2NF)
• Third normal form (3NF)
• etc (4th and 5th)
• 2NF is better than 1NF; 3NF is better than 2NF

• For most business database design purposes, 3NF is highest we need to go
in the normalization process
• Highest level of normalization is not always most desirable
The Need for Normalization

Example: company that manages building projects
• Charges its clients by billing hours spent on each contract

• Hourly billing rate is dependent on employee’s position
• Periodically, a report is generated that contains information as follows
Sample Report
Table derived from above
The Need for Normalization
• Structure of data set in Figure 5.1 does not handle data very well
• The table structure appears to work; report is generated with ease
• Unfortunately, the report may yield different results, depending on what
data anomaly has occurred
Issues
• Table entries invite data inconsistencies
• Table displays potential data anomalies
o Update: Modifying JOB_CLASS
o Insertion: New Employee must be assigned project
o Deletion: If an employee is deleted, other vital data are lost: if employee 103
leaves, information on Electrical Engineers is lost
Repeating Group
Repeating group
Derives its name from the fact that a group of multiple (related) entries can exist
for any single key attribute occurrence
• Relational table must not contain repeating groups

• Normalizing the table structure will reduce these data redundancies
• Normalization is a three-step procedure
Converting to First Normal Form

A table in a relational database must be in 1NF.
• Repeating groups must be eliminated

• Primary key determined
o Uniquely identify attribute values (rows)
o All attributes dependent on primary key
o In example: Combination of PROJ_NUM and EMP_NUM
Dependencies
• Dependencies can be depicted with the help of a diagram
• Dependency diagram:
o Depicts all dependencies found within a given table structure
o Helpful in getting bird’s-eye view of all relationships among a
table’s attributes
o Use makes it much less likely that an important dependency will
be overlooked
• Desirable dependencies based on entire primary key

• Less desirable dependencies
o Partial: based on part of composite primary key
o Transitive: one nonprime attribute depends on another nonprime attribute
Dependency Diagram
1NF: Definition
• Tabular format in which:
o All key attributes are defined
o There are no repeating groups in the table
o All attributes are dependent on primary key
• All relational tables must satisfy 1NF requirements
• Some tables contain partial dependencies
o Dependencies based on only part of the primary key
o Sometimes used for performance reasons, but should be used with
caution
• Still subject to data redundancies
Second Normal Form

1. Identify all key components
• Write each key component on separate line
• Write original key on last line
• Write dependent attributes after each key.
2. Each line will become a new table
Second Normal Form Conversion Results

Second Normal Form Defined
Table is in second normal form (2NF) if:
• It is in 1NF and
• It includes no partial dependencies:
• No attribute is dependent on only a portion of the primary key
Converting to Third Normal Form

• Resolve transitive dependencies (attributes dependent on non-key
attributes)
• Create separate table for each transitive dependency
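The 2NF and 3NF conversions can be sketched end to end. The example uses Python's built-in sqlite3 module rather than DB2. The column names follow the contracting-company example (PROJ_NUM and EMP_NUM form the key), but the four data rows, the REPORT_1NF table name, and the exact set of non-key columns are assumptions made for illustration and are not taken from Figure 5.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE REPORT_1NF (
    PROJ_NUM INTEGER, PROJ_NAME VARCHAR(20),
    EMP_NUM INTEGER, EMP_NAME VARCHAR(20),
    JOB_CLASS VARCHAR(20), CHG_HOUR DECIMAL(6, 2), HOURS DECIMAL(5, 1),
    PRIMARY KEY (PROJ_NUM, EMP_NUM))""")
conn.executemany("INSERT INTO REPORT_1NF VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (15, "Alpha", 101, "Lee", "Designer", 105.00, 19.4),
    (15, "Alpha", 103, "Kim", "Engineer", 84.50, 23.8),
    (18, "Beta", 103, "Kim", "Engineer", 84.50, 12.6),
    (18, "Beta", 104, "Park", "Designer", 105.00, 8.0),
])

conn.executescript("""
-- 2NF: attributes that depend on only part of the key get their own tables.
CREATE TABLE PROJECT AS SELECT DISTINCT PROJ_NUM, PROJ_NAME FROM REPORT_1NF;
CREATE TABLE ASSIGN AS SELECT PROJ_NUM, EMP_NUM, HOURS FROM REPORT_1NF;
-- 3NF: CHG_HOUR depends on JOB_CLASS, a non-key attribute, so it moves to JOB.
CREATE TABLE EMPLOYEE AS SELECT DISTINCT EMP_NUM, EMP_NAME, JOB_CLASS FROM REPORT_1NF;
CREATE TABLE JOB AS SELECT DISTINCT JOB_CLASS, CHG_HOUR FROM REPORT_1NF;
""")

original = conn.execute(
    "SELECT * FROM REPORT_1NF ORDER BY PROJ_NUM, EMP_NUM").fetchall()

# Joining the normalized tables reproduces the original report.
rebuilt = conn.execute("""
    SELECT A.PROJ_NUM, P.PROJ_NAME, A.EMP_NUM, E.EMP_NAME,
           E.JOB_CLASS, J.CHG_HOUR, A.HOURS
    FROM ASSIGN A
    JOIN PROJECT P ON P.PROJ_NUM = A.PROJ_NUM
    JOIN EMPLOYEE E ON E.EMP_NUM = A.EMP_NUM
    JOIN JOB J ON J.JOB_CLASS = E.JOB_CLASS
    ORDER BY A.PROJ_NUM, A.EMP_NUM""").fetchall()

job_rows = conn.execute("SELECT COUNT(*) FROM JOB").fetchone()[0]
```

Each hourly rate is now stored once per job class instead of once per assignment, and the join shows that no information was lost.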
3NF Results
Boyce-Codd Normal Form
• Every determinant in the table is a candidate key
o Has same characteristics as primary key, but for some reason, not
chosen to be primary key
• If a table contains only one candidate key, the 3NF and the BCNF are
equivalent
• BCNF can be violated only if the table contains more than one candidate
key
BCNF (cont)
• Most designers consider the Boyce-Codd normal form (BCNF) as a
special case of 3NF
• A table is in 3NF if it is in 2NF and there are no transitive dependencies
o A transitive dependency exists when one nonprime attribute is
dependent on another nonprime attribute
• A table can be in 3NF and not be in BCNF
o This happens when a nonkey attribute is the determinant of a key attribute
Table in 3nf but not BCNF
Decomposition to BCNF
Fourth Normal Form
• Table is in 3NF
• Has no multiple sets of multivalued dependencies
Conversion to 4NF
• 4NF is largely academic if tables conform to the following two rules:
o All attributes are dependent on primary key but independent of
each other
o No row contains two or more multivalued facts about an entity
Improving the Design

• Table structures are cleaned up to eliminate the troublesome initial partial
and transitive dependencies
• Normalization cannot, by itself, be relied on to make good designs
• It is valuable because its use helps eliminate data redundancies
Improving the Design (cont)
The following changes were made:
• PK assignment
• Naming conventions
• Attribute atomicity
• Adding attributes
• Adding relationships
• Refining PKs
• Maintaining historical accuracy
• Dealing with derived attributes
Completed Database
Completed Database: Assign Table

Completed Database: Employee
Final ERD for contracting company
Limitations on System Assigned Keys
• System-assigned primary key may not prevent confusing entries
• Data entries in Table 5.2 are inappropriate because they duplicate existing
records
• Yet there has been no violation of either entity integrity or referential
integrity
• Perhaps Job Description needs to be unique
Normalization and Database Design

• Normalization should be part of design process
• Make sure that proposed entities meet required normal form before table
structures are created
• Many real-world databases have been improperly designed or burdened
with anomalies if improperly modified during course of time
• You may be asked to redesign and modify existing databases
Normalization and Database Design (cont)
• E-R Diagram provides macro view, determines entities
• Normalization provides micro view of entities
o Focuses on characteristics of specific entities
o May yield additional entities
• Difficult to separate Normalization and ER diagramming
• Extra check: No attribute that is not a (primary/foreign) key should be
repeated in the database (except to record historical data)
Denormalization
• Normalization is one of many database design goals
• Normalized table requirements
o Additional processing
o Loss of system speed
• Normalization purity is difficult to sustain due to conflict in
o Design efficiency
o Information requirements
o Processing
• Do not be too quick to denormalize
Unnormalized Table Defects

• Data updates less efficient
• Indexing more cumbersome
• No simple strategies for creating views
Overnormalization
• This is frequently done for performance reasons in distributed and clustered
database systems
• Splits tables beyond the point required for normal forms.
• In horizontal partitioning rows of a single logical table are split among
several physical tables (e.g. geographically)
• In vertical partitioning a table is split vertically with commonly accessed
columns in separate physical tables
Notes
