Data vs Information
• Data:
o Raw facts; building blocks of information
o Unprocessed information
• Information:
o Data processed to reveal meaning
• Accurate, relevant, and timely information is key to good decision making
• Good decision making is key to survival in global environment
Importance of DBMS
• Makes data management more efficient and effective
• Query language allows quick answers to ad hoc queries
• Provides easier access to more and better-managed data
• Promotes an integrated view of organization’s operations
• Reduces the chance of inconsistent data
• Helps protect against loss of data
Historical Roots
• First business computer applications focused on clerical tasks
• Requests for information quickly followed
• File systems developed to address needs
• Data organized according to expected use
• Data Processing (DP) specialists computerized manual file systems
File Systems
File System Data Management
• Requires extensive programming, typically in Third Generation Language (3GL)
• Leads to islands of information and data redundancy
• Difficult to make ad hoc queries to obtain information
• Difficult to maintain data integrity
Data Redundancy
• Different and possibly conflicting versions of same data
• Results in problems during data:
o Modification (e.g. address changes)
o Insertion
o Deletion
• Data inconsistency: Lack of integrity
Database Systems
• Database consists of logically related data stored in a single repository
• Advantages over file system management approach:
o Eliminates most of the file system’s data inconsistency, data anomaly, data
dependency, and structural dependency problems
o Stores data structures, relationships, and access paths
Uses of Databases
• Transactional (or production):
o Supports a company’s day-to-day operations
• Data warehouse:
o Stores data used to generate information required to make tactical or
strategic decisions
o Such decisions typically require “data massaging”
o Often used to store historical data
o Structure is quite different
DBMS Functions
• Metadata/Data Dictionary Management
• Data storage management
• Data transformation and presentation
• Security management and Multiuser access control
• Backup and recovery management
• Data integrity management
• Database language and application programming interfaces
• Database communication interfaces
Database Models
Collection of logical constructs used to represent data structure and relationships
The hierarchical and network models are of historical interest only.
Database Models:
• Relational
• Entity-Relationship
• Object oriented
Relational Model
• Most common model
• Perceived by user as collection of tables containing data
• Actually has a formal definition based on set theory
• Tables are a series of row/column intersections
• Tables related by sharing common entity characteristic(s)
Relational Database
Relational Database Model Advantages
• Structural independence
• Improved conceptual simplicity
• Easier database design, implementation, management, and use
• Ad hoc query capability with SQL (standard interface)
• Powerful database management system
ER-Diagram
ER Model Advantages
• Conceptual simplicity
• Visual representation
• Effective communication tool
• Integrated with the relational database model
ER Model Disadvantages
• Limited constraint representation
• Limited relationship representation
• No data manipulation language
• Loss of information content
• May be overly complex for end users
Object-Oriented Model
• Objects or abstractions of real-world entities are stored
• Attributes describe properties
• Collection of similar objects is a class
• Methods represent real world actions of classes
• Classes are organized in a class hierarchy
• Objects inherit attributes and methods of classes above.
OO Model Advantages
• Adds semantic context
• Structural and data independence
• May mesh well with Object Oriented Programming
OO Model Disadvantages
• Lack of standards in model
• Lack of standard manipulation languages
• Complex navigational data access
• Steep learning curve
• Poor performance
Notes
Entities
• Refers to the entity set and not to a single entity occurrence
• Corresponds to a table and not to a row in the relational environment
• In both the Chen and Crow’s Foot models, an entity is represented by a
rectangle containing the entity’s name
• Entity name, a noun, is usually written in capital letters
Attributes
• Characteristics of entities
• Domain is set of possible values
• Primary keys underlined
• Simple
o Cannot be subdivided
o Examples: age, sex, GPA
• Composite
o Can be subdivided
o Example: address (street, city, state, zip)
• Single-valued
o Has only a single value
o Example: Social Security number
• Multi-valued
o Can have many values
o Example: a person may have several college degrees
• Derived
o Can be calculated from other information
o Example: age can be derived from date of birth
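As a small illustration of a derived attribute, the sketch below computes age from a stored date of birth instead of storing age itself. The function name `age_on` and the sample dates are hypothetical, chosen only for this example.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive age in whole years from a stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# The day before the 34th birthday the derived age is still 33.
example_age = age_on(date(1990, 6, 15), date(2024, 6, 14))
```

Because the value is recomputed on demand, it cannot drift out of date the way a stored AGE column could.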
Multivalued Attributes
Resolving Multivalued Attribute Problems
Although the conceptual model can handle multivalued attributes, you should not
implement them in the relational DBMS
• Within original entity, create several new attributes, one for each of the
original multivalued attribute’s components
o Can lead to major structural problems in the table
• Create a new entity composed of original multivalued attribute’s
components
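The second option above (a new entity for the multivalued attribute) might look like the following sketch. It uses Python's built-in sqlite3 module as a stand-in for the course's DB2 system, and the table names PERSON and PERSON_DEGREE and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PERSON (
        PERSON_ID   INTEGER PRIMARY KEY,
        PERSON_NAME TEXT NOT NULL
    );
    -- One row per degree, instead of DEGREE1, DEGREE2, ... columns in PERSON.
    CREATE TABLE PERSON_DEGREE (
        PERSON_ID INTEGER NOT NULL REFERENCES PERSON (PERSON_ID),
        DEGREE    TEXT NOT NULL,
        PRIMARY KEY (PERSON_ID, DEGREE)
    );
    INSERT INTO PERSON VALUES (1, 'Ada');
    INSERT INTO PERSON_DEGREE VALUES (1, 'BSc'), (1, 'MSc'), (1, 'PhD');
""")

# Any number of degrees can be stored without changing the table structure.
degrees = [row[0] for row in conn.execute(
    "SELECT DEGREE FROM PERSON_DEGREE WHERE PERSON_ID = 1 ORDER BY DEGREE")]
```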
Relationship Participation
• Optional
o Entity occurrence does not require a corresponding occurrence in
related entity
o Shown by drawing a small circle on side of optional entity on ERD
• Mandatory
o Entity occurrence requires corresponding occurrence in related
entity
o If no optionality symbol is shown on ERD, it is mandatory
Weak Entity
• Existence-dependent on another entity
• Has primary key that is partially or totally derived from parent entity
Mandatory Class Course relationship
Optional Class Entity in Professor Teaches Class
Degree of Relationship
A relationship’s degree indicates the number of associated entities.
Notes
Chapter 6 Notes
• The basic commands and functions of SQL
• How to use SQL for data administration (to create tables, indexes, and
views)
• How to use SQL for data manipulation (to add, modify, delete, and
retrieve data)
• How to use SQL to query a database to extract useful information
• Data definition
• Data manipulation
SQL Weaknesses
• Some eccentric notation
o Strings are delimited by single quotes (') rather than double quotes (")
o Wildcards: % instead of *
• Some things are hard to do
• Different vendors implement different dialects
• Not a good conceptual match to most programming languages
• Strictly DDL and DML; no standard procedural language
DB2 concepts
• DB2 consists of multiple "instances" on each server (we have one)
• Within each instance there are databases: we have two, SAMPLE and
DBMS, and will be using DBMS
• Within the databases are schemas one for each user.
• Authorization for users is via the system (cs1 account). Your schema name
and user name are the same as your username on CS1
• The VENDOR table contains vendors who are not referenced in the
PRODUCT table. PRODUCT is optional to VENDOR.
• Existing V_CODE values in the PRODUCT table must have a match in
the VENDOR table.
• A few products are supplied factory-direct, a few are made in-house, and a
few may have been bought in a special warehouse sale. That is, a product is not
necessarily supplied by a vendor. VENDOR is optional to PRODUCT.
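The three rules above can be sketched with a nullable foreign key. This is a minimal illustration using Python's sqlite3 module rather than DB2; the vendor names, product codes, and descriptions are made up, and only the V_CODE values 21344 and 21225 come from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE VENDOR (V_CODE INTEGER PRIMARY KEY, V_NAME TEXT NOT NULL);
    CREATE TABLE PRODUCT (
        P_CODE     TEXT PRIMARY KEY,
        P_DESCRIPT TEXT NOT NULL,
        V_CODE     INTEGER REFERENCES VENDOR (V_CODE)  -- nullable: VENDOR is optional
    );
    INSERT INTO VENDOR VALUES (21344, 'Vendor A'), (21225, 'Vendor B');
    INSERT INTO PRODUCT VALUES ('P1', 'Saw blade', 21344);
    INSERT INTO PRODUCT VALUES ('P2', 'In-house item', NULL);
""")

# A V_CODE with no match in VENDOR violates the foreign key and is rejected.
try:
    conn.execute("INSERT INTO PRODUCT VALUES ('P3', 'Bad row', 99999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

product_count = conn.execute("SELECT COUNT(*) FROM PRODUCT").fetchone()[0]
```

Vendor B has no products (PRODUCT is optional to VENDOR), and product P2 has no vendor (VENDOR is optional to PRODUCT).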
Within each database there are schemas (one for each user). DB2 users are the same as
operating system users.
Changes do not take effect until they are committed, assuming autocommit is off. Many
end-user query environments (including ours) do not support turning autocommit off.
COMMIT ;
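The effect of COMMIT can be sketched with Python's sqlite3 module, which (in its default mode) opens a transaction before data-modifying statements and so behaves like an environment with autocommit off. This is an illustration only, not the DB2 environment used in class; the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_PRICE REAL)")
conn.execute("INSERT INTO PRODUCT VALUES ('P1', 10.0)")
conn.commit()  # make the inserted row permanent

# An uncommitted change can still be undone.
conn.execute("UPDATE PRODUCT SET P_PRICE = 99.0 WHERE P_CODE = 'P1'")
conn.rollback()
price_after_rollback = conn.execute(
    "SELECT P_PRICE FROM PRODUCT WHERE P_CODE = 'P1'").fetchone()[0]

# Once committed, the change is permanent.
conn.execute("UPDATE PRODUCT SET P_PRICE = 12.5 WHERE P_CODE = 'P1'")
conn.commit()
price_after_commit = conn.execute(
    "SELECT P_PRICE FROM PRODUCT WHERE P_CODE = 'P1'").fetchone()[0]
```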
Queries
Partial Listing of Table Contents
SELECT <column(s)> FROM <table name> WHERE <conditions>;
Queries
Using Comparison Operators on Character Attributes
SELECT P_DESCRIPT, P_ONHAND, P_MIN, P_PRICE
FROM PRODUCT
WHERE P_CODE < '1558-QWI';
Queries
Logical Operators: AND, OR, and NOT
SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE = 21344 OR V_CODE = 21225;
Note the difference between the following two queries. Are SQL strings case sensitive?
SELECT *
FROM VENDOR
WHERE V_CONTACT LIKE 'Smith%';
SELECT *
FROM VENDOR
WHERE V_CONTACT LIKE 'SMITH%';
Special Operators
IN is used to check whether an attribute value matches a value
contained within a (sub)set of listed values.
SELECT * FROM PRODUCT WHERE V_CODE IN (21344, 24288);
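As a quick check of what IN means, the sketch below runs an IN query and the equivalent chain of OR conditions and compares the results. It uses Python's sqlite3 module as a stand-in for DB2, with a cut-down PRODUCT table and made-up product codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, V_CODE INTEGER);
    INSERT INTO PRODUCT VALUES ('A', 21344), ('B', 24288), ('C', 21225), ('D', NULL);
""")

in_rows = conn.execute(
    "SELECT P_CODE FROM PRODUCT WHERE V_CODE IN (21344, 24288) ORDER BY P_CODE"
).fetchall()

# The same condition written out with OR.
or_rows = conn.execute(
    "SELECT P_CODE FROM PRODUCT WHERE V_CODE = 21344 OR V_CODE = 24288 ORDER BY P_CODE"
).fetchall()
```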
Updating Data
UPDATE PRODUCT SET P_SALECODE = '2' WHERE P_CODE = '1546-QQ2';
UPDATE PRODUCT SET P_SALECODE = '1' WHERE P_CODE IN ('13-Q2/P2',
'2232/QTY');
UPDATE PRODUCT SET P_SALECODE = '2' WHERE P_INDATE < '01/01/2004';
UPDATE PRODUCT SET P_SALECODE = '1' WHERE P_INDATE >= '01/01/2004' AND
P_INDATE < '10/20/2004';
Copying Tables
Copying table definitions and data
CREATE TABLE NEWPRODUCT LIKE PRODUCT;
ALTER TABLE PRODUCT ADD PRIMARY KEY (P_CODE) ADD FOREIGN KEY (V_CODE)
REFERENCES VENDOR;
Notes
Function  Output
COUNT     Number of rows containing the specific attribute
MIN       Minimum attribute value encountered
MAX       Maximum attribute value encountered
AVG       Arithmetic mean of attribute values
SUM       Sum of attribute values
COUNT
How many products are there?
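One way to answer this question is sketched below with Python's sqlite3 module standing in for DB2; the three sample rows are made up. It also shows the difference between counting rows and counting rows that actually contain a given attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_PRICE REAL, V_CODE INTEGER);
    INSERT INTO PRODUCT VALUES ('A', 10.0, 21344), ('B', 20.0, 21344), ('C', 5.0, NULL);
""")

# COUNT(*) counts every row in the table.
total_rows = conn.execute("SELECT COUNT(*) FROM PRODUCT").fetchone()[0]

# COUNT(V_CODE) counts only rows where V_CODE is not null.
with_vendor = conn.execute("SELECT COUNT(V_CODE) FROM PRODUCT").fetchone()[0]
```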
SUM
Calculate inventory value
SELECT SUM(P_ONHAND*P_PRICE)
FROM PRODUCT;
Average
AVG
SELECT AVG(P_PRICE)
FROM PRODUCT;
SELECT MAX(P_PRICE)
FROM PRODUCT ;
Grouping Data
GROUP BY
SELECT V_CODE, COUNT (P_CODE), AVG(P_PRICE)
FROM PRODUCT
GROUP BY V_CODE;
Note that only the attribute(s) named in the GROUP BY clause can appear by themselves in
the SELECT list. All other attributes must appear inside an aggregate function.
HAVING
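HAVING filters groups after aggregation, much as WHERE filters individual rows. The sketch below extends the GROUP BY query above with a HAVING condition; it runs on Python's sqlite3 module rather than DB2, and the prices and the threshold of 10 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_PRICE REAL, V_CODE INTEGER);
    INSERT INTO PRODUCT VALUES
        ('A', 10.0, 21344), ('B', 20.0, 21344),
        ('C', 5.0, 21225),  ('D', 7.0, 21225),
        ('E', 50.0, 24288);
""")

# Keep only the vendors whose average product price is below 10.
cheap_vendors = conn.execute("""
    SELECT V_CODE, COUNT(P_CODE), AVG(P_PRICE)
    FROM PRODUCT
    GROUP BY V_CODE
    HAVING AVG(P_PRICE) < 10
    ORDER BY V_CODE
""").fetchall()
```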
SQL Indexes
Indexes speed up data retrieval, and unique indexes help enforce data integrity.
It is not usually necessary to create indexes for primary keys, because the DBMS
typically creates them automatically, but they are useful for alternate keys.
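A unique index on an alternate key might look like the sketch below, run here through Python's sqlite3 module as a stand-in for DB2. The index name VENDOR_NAME_IDX, the choice of V_NAME as the alternate key, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE VENDOR (
        V_CODE  INTEGER PRIMARY KEY,
        V_NAME  TEXT NOT NULL,
        V_PHONE TEXT
    );
    -- Treat V_NAME as an alternate key: index it and forbid duplicates.
    CREATE UNIQUE INDEX VENDOR_NAME_IDX ON VENDOR (V_NAME);
    INSERT INTO VENDOR VALUES (1, 'Acme', '555-0100');
""")

# A second vendor with the same name violates the unique index.
try:
    conn.execute("INSERT INTO VENDOR VALUES (2, 'Acme', '555-0101')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```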
Aliases
Renaming Columns
SELECT P.P_DESCRIPT as "Description", P.P_PRICE as "Price",
V.V_NAME as "Vendor", V.V_CONTACT as "Contact", V.V_AREACODE
as "Area", V.V_PHONE as "Phone"
FROM PRODUCT P, VENDOR V
WHERE P.V_CODE = V.V_CODE AND P_INDATE > '08/15/2002';
Casting values
Outer Joins
These allow selection of rows where there are no matching rows in the table joined.
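A left outer join is sketched below: every vendor is listed, and vendors with no matching product get a null in the product column. The example uses Python's sqlite3 module rather than DB2, with cut-down VENDOR and PRODUCT tables and a made-up product code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE VENDOR (V_CODE INTEGER PRIMARY KEY);
    CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, V_CODE INTEGER);
    INSERT INTO VENDOR VALUES (21225), (21344);
    INSERT INTO PRODUCT VALUES ('P1', 21344);
""")

# Vendor 21225 supplies no products but still appears in the result.
vendor_products = conn.execute("""
    SELECT V.V_CODE, P.P_CODE
    FROM VENDOR V LEFT OUTER JOIN PRODUCT P ON V.V_CODE = P.V_CODE
    ORDER BY V.V_CODE
""").fetchall()
```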
Converting an ER Model to a Database Structure
• Requires following specific rules that govern such a conversion
• Decisions made by the designer to govern data integrity are reflected in
the foreign key rules
• Implementation decisions vary according to the problem being addressed
Normalization Process
Normalization works through a series of stages called normal forms:
Sample Report
Table Derived from Above
The Need for Normalization
• Structure of data set in Figure 5.1 does not handle data very well
• The table structure appears to work; report is generated with ease
• Unfortunately, the report may yield different results, depending on what
data anomaly has occurred
Issues
• Table entries invite data inconsistencies
• Table displays potential data anomalies
o Update: Modifying JOB_CLASS
o Insertion: New Employee must be assigned project
o Deletion: If an employee is deleted, other vital data is lost (e.g. if
employee 103 leaves, information on Elect. Engineers is lost)
Repeating Group
A repeating group derives its name from the fact that a group of multiple (related)
entries can exist for any single key attribute occurrence
Dependencies
• Dependencies can be depicted with the help of a diagram
• Dependency diagram:
o Depicts all dependencies found within a given table structure
o Helpful in getting bird’s-eye view of all relationships among a
table’s attributes
o Use makes it much less likely that an important dependency will
be overlooked
Partial:
Based on part of composite primary key
Transitive:
One nonprime attribute depends on another nonprime attribute
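To make the two kinds of dependency concrete, the sketch below tests whether one set of attributes determines another in some sample rows. The helper `determines` and the sample data are hypothetical, loosely modeled on the project/employee report discussed in these notes. Sample data can only refute a dependency, never prove that it holds in general.

```python
def determines(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Assumed composite primary key: (PROJ_NUM, EMP_NUM).
rows = [
    {"PROJ_NUM": 15, "EMP_NUM": 103, "EMP_NAME": "June",
     "JOB_CLASS": "Elect. Engineer", "CHG_HOUR": 84.50, "HOURS": 23.8},
    {"PROJ_NUM": 15, "EMP_NUM": 101, "EMP_NAME": "John",
     "JOB_CLASS": "Database Designer", "CHG_HOUR": 105.00, "HOURS": 19.4},
    {"PROJ_NUM": 18, "EMP_NUM": 103, "EMP_NAME": "June",
     "JOB_CLASS": "Elect. Engineer", "CHG_HOUR": 84.50, "HOURS": 12.6},
]

# Partial: EMP_NAME depends on EMP_NUM, which is only part of the key.
partial = determines(rows, ["EMP_NUM"], ["EMP_NAME"])
# Transitive: one nonprime attribute depends on another nonprime attribute.
transitive = determines(rows, ["JOB_CLASS"], ["CHG_HOUR"])
# HOURS needs the whole key: EMP_NUM alone does not determine it.
hours_by_emp = determines(rows, ["EMP_NUM"], ["HOURS"])
```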
Dependency Diagram
1NF: Definition
• Tabular format in which:
o All key attributes are defined
o There are no repeating groups in the table
o All attributes are dependent on primary key
• All relational tables must satisfy 1NF requirements
• Some tables contain partial dependencies
o Dependencies based on only part of the primary key
o Sometimes used for performance reasons, but should be used with
caution
• Still subject to data redundancies
2NF: Definition
A table is in second normal form (2NF) if:
• It is in 1NF and
• It includes no partial dependencies:
o No attribute is dependent on only a portion of the primary key
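Removing a partial dependency can be sketched as a decomposition into two tables. The example below uses Python's sqlite3 module instead of DB2; the table names ASSIGN_1NF, EMPLOYEE, and ASSIGNMENT and the sample rows are assumptions for illustration. Joining the new tables reproduces the original rows, so no information is lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1NF table: EMP_NAME depends on EMP_NUM alone (a partial dependency).
    CREATE TABLE ASSIGN_1NF (
        PROJ_NUM INTEGER, EMP_NUM INTEGER, EMP_NAME TEXT, HOURS REAL,
        PRIMARY KEY (PROJ_NUM, EMP_NUM)
    );
    INSERT INTO ASSIGN_1NF VALUES
        (15, 103, 'June', 23.8), (18, 103, 'June', 12.6), (15, 101, 'John', 19.4);

    -- 2NF: move EMP_NAME to a table keyed by EMP_NUM only.
    CREATE TABLE EMPLOYEE (EMP_NUM INTEGER PRIMARY KEY, EMP_NAME TEXT);
    CREATE TABLE ASSIGNMENT (
        PROJ_NUM INTEGER,
        EMP_NUM  INTEGER REFERENCES EMPLOYEE (EMP_NUM),
        HOURS    REAL,
        PRIMARY KEY (PROJ_NUM, EMP_NUM)
    );
    INSERT INTO EMPLOYEE SELECT DISTINCT EMP_NUM, EMP_NAME FROM ASSIGN_1NF;
    INSERT INTO ASSIGNMENT SELECT PROJ_NUM, EMP_NUM, HOURS FROM ASSIGN_1NF;
""")

original = conn.execute("""
    SELECT PROJ_NUM, EMP_NUM, EMP_NAME, HOURS
    FROM ASSIGN_1NF ORDER BY PROJ_NUM, EMP_NUM
""").fetchall()

rejoined = conn.execute("""
    SELECT A.PROJ_NUM, A.EMP_NUM, E.EMP_NAME, A.HOURS
    FROM ASSIGNMENT A JOIN EMPLOYEE E ON A.EMP_NUM = E.EMP_NUM
    ORDER BY A.PROJ_NUM, A.EMP_NUM
""").fetchall()

# Each employee name is now stored once, however many projects they work on.
employee_count = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
```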
3NF Results
Boyce-Codd Normal Form
• Every determinant in the table is a candidate key
o Has same characteristics as primary key, but for some reason, not
chosen to be primary key
• If a table contains only one candidate key, the 3NF and the BCNF are
equivalent
• BCNF can be violated only if the table contains more than one candidate
key
BCNF (cont)
• Most designers consider the Boyce-Codd normal form (BCNF) as a
special case of 3NF
• A table is in 3NF if it is in 2NF and there are no transitive dependencies
o A transitive dependency exists when one nonprime attribute is
dependent on another nonprime attribute
o A table can be in 3NF and not be in BCNF
o This happens when a nonkey attribute is the determinant of a key attribute
Decomposition to BCNF
Fourth Normal Form
• Table is in 3NF
• Has no multiple sets of multivalued dependencies
Conversion to 4NF
• 4NF is largely academic if tables conform to the following two rules:
o All attributes are dependent on primary key but independent of
each other
o No row contains two or more multivalued facts about an entity
• PK assignment
• Naming conventions
• Attribute atomicity
• Adding attributes
• Adding relationships
• Refining PKs
• Maintaining historical accuracy
• Dealing with derived attributes
Completed Database
Denormalization
• Normalization is one of many database design goals
• Costs of normalized tables:
o Additional processing
o Loss of system speed
• Normalization purity is difficult to sustain due to conflict in
o Design efficiency
o Information requirements
o Processing
• Do not be too quick to denormalize
Overnormalization
• Splits tables beyond the point required by the normal forms
• Frequently done for performance reasons in distributed and clustered
database systems
• In horizontal partitioning, rows of a single logical table are split among
several physical tables (e.g. geographically)
• In vertical partitioning, a table is split by column, with commonly accessed
columns in separate physical tables
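Horizontal partitioning can be sketched as two physical tables behind one logical view. This is a minimal illustration using Python's sqlite3 module in a single in-memory database, not a distributed system; the table names, the EAST/WEST split, and the customer rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Rows of one logical CUSTOMER table, split by region.
    CREATE TABLE CUSTOMER_EAST (CUS_CODE INTEGER PRIMARY KEY, CUS_NAME TEXT NOT NULL);
    CREATE TABLE CUSTOMER_WEST (CUS_CODE INTEGER PRIMARY KEY, CUS_NAME TEXT NOT NULL);
    INSERT INTO CUSTOMER_EAST VALUES (1, 'Ramas'), (2, 'Dunne');
    INSERT INTO CUSTOMER_WEST VALUES (3, 'Smith');

    -- The view presents the partitions as a single logical table again.
    CREATE VIEW CUSTOMER AS
        SELECT CUS_CODE, CUS_NAME, 'EAST' AS REGION FROM CUSTOMER_EAST
        UNION ALL
        SELECT CUS_CODE, CUS_NAME, 'WEST' AS REGION FROM CUSTOMER_WEST;
""")

all_customers = conn.execute(
    "SELECT CUS_CODE, REGION FROM CUSTOMER ORDER BY CUS_CODE").fetchall()
```

Queries that only need one region can read the smaller physical table directly, which is where the performance benefit comes from.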
Notes