Databases: Combined Lecture Slides

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 1 (Wednesday)
Initial Orientation
Shereen Fouad
Teaching Fellow in the School of Computer Science
Overview
Motivation
What is a database?
Applications of database technology.
Initial orientation about the course.
Motivation
Consider a supermarket business.
What do you want to keep track of?
What is the size of the data?
How can the business:
store and manage such data?
retrieve, manipulate and disseminate data?
take critical business decisions?
How can we monitor the performance of the business?
The answer is database technology!
What is a Database?
The broad interpretation of a database:
A collection of logically coherent, interrelated data (raw facts of interest to the end user), together with
a description of the data's characteristics and relationships (metadata: data about data).
Data vs. Information

To understand what drives database design, you need to
differentiate between data and information.
Data are raw facts
Information is produced by processing raw data to reveal its meaning.
Data are the foundation of information, which is the base of knowledge.
Raw data must be structured for storage, processing, and presentation.
Database technology provides efficient data management.
Database technology is crucial for good decision making.
Applications of Database Technology

Storage and retrieval of numerical and
alphanumerical data.
E.g. data about a company's employees,
products, projects, customers, suppliers,
orders, sales, assets, etc.
Applications of Database Technology (cont.)

Storage and retrieval of multimedia data.
E.g. YouTube.
Storage and retrieval of Web content (HTML, PDF, images, etc.).
E.g. Google.
Storage of huge amounts of data for analysis.
E.g. a data warehouse.
Monitoring data to take action when required.
E.g. accurately keeping track of employee pay and tax, or of the status of the items that a given customer has ordered.
Why This Course?

Database systems are at the core of Computer Science.
It integrates various computer science concepts.
Languages, data structures, concurrency
The (digital) world runs on data.

The topic is intellectually rich.
It provides valued job skills
Teaching Staff
What We'll Mainly Study
Database design: techniques to design and model a conceptual/physical database.
SQL: SQL statements to define, query and control a relational database.
Application development.
Database internals.
What We'll Mainly Study (cont.)

Key aspects of how to develop the conceptual/logical design of relational
databases.
The nature of relational databases, the central modern type of database.
Some basic mathematical concepts underpinning relational databases, and
useful also in many other branches of CS.
In particular, how to achieve certain types of good structuring, to help
achieve certain types of correctness and efficiency.
How to create and manipulate databases using a particular database
language: the version of SQL provided by PostgreSQL (SQL is very widely used, in various forms).
Note about SQL Coverage

The main coverage of SQL will be via the very detailed weekly Additional Notes
and SQL exercises starting in Week 2 of the term.
Lectures will cover some basic concepts of SQL.
Your learning of SQL is best done by
Reading the notes
Doing the exercises
Seeking help from the demonstrators, whether in the lab or in their office hours.
The lecture material on concepts, theory, and design issues is essential for
designing good databases and writing good SQL.
Lectures and Practical Sessions

There are two lectures a week:
Every Wednesday (Weeks 1-10) from 12:00 pm to 1:00 pm in WG5, Aston Webb, and in Week 11 in LT2, Gisbert Kapp.
Every Friday (from Week 1 onwards) from 1:00 pm to 2:00 pm in LT1, Law.
One practical session a week:
There is a PRACTICAL SESSION (LAB SESSION) every Thursday from Week 2
onwards, 2:00-5:00 pm, in the Lower Ground floor lab (LG04) in the CS building.
For the practical work you will be using a database management system
called PostgreSQL.
Database Management System - PostgreSQL

PostgreSQL is the relational database management system (RDBMS)
that we will be using for practical exercises in the module.
It contains a database definition/manipulation language that is one of
many versions of Structured Query Language "SQL".
It has a simple command-line interface and works on the School Unix
system (Linux system).
Many database systems exist with fancy interfaces, but in this module I want to
concentrate on the core technical detail.
Course Text
C. Coronel, S. Morris, P. Rob & K. Crockett,
Database Principles: Fundamentals of Design, Implementation and
Management, 2nd Edition (or 10th Edition), 2013.
The CHAPTERS you need are published on the module website.
You can find the book in the CS and University libraries.
Exercises
Every week I will give you some exercises to do in the lab session.
You need to submit the exercises electronically via Canvas.
The ones up to and including Week 8 will be UNASSESSED.
You will get feedback from demonstrators via Canvas.
The ones in Week 9 will be ASSESSED and will be due for submission in
Week 11, accounting for 10% of the module mark.
Late submission of the assessed exercise will lead to penalties.
Grading
Your final grade will be based on:
ONE CLASS TEST,
in Week 8, Friday 21/11/2014, in the lecture hall (LT1, Law),
accounting for 10% of the module mark.
One ASSESSED exercise,
announced in Week 9 of the term: Friday 28/11/2014,
submission due in Week 11: Friday 12/12/2014,
accounting for 10% of the module mark.
Final Examination,
accounting for 80% of the module mark.
Assessment Differentiation between CS Master's Students and Year-in-CS Students
The Master's students, but NOT the Year-in-CS students, have the
following Learning Outcome (LO5):
Apply relational algebra and the mathematical theory of relations
to describe databases, queries, and consistency conditions.
Some lectures (starting from Week 8) will be partly or wholly on LO5
topics.
Year-in-CS students will be expected to come to these lectures in full.
Note that in these lectures I may make occasional additional
comments that are not on LO5.
Assessment Differentiation between CS Master's Students and Year-in-CS Students (cont.)
The class test will not cover LO5, so all of its questions will be mandatory for everyone.
Unassessed exercise sets (from Week 9 to Week 11) will contain work
on LO5 topics.
If assessed work items (Class Test, Week 9-11 Exercises, or
Examination) contain questions on LO5 material, then these questions
will be optional for Year-in-CS students even when compulsory for
Master's students.
Summary
Database technology is used almost everywhere.
A database is a collection of logically coherent, interrelated data as well
as a description of this data.
Information is the result of processing data to reveal its meaning.
Databases
2014/15
Week 2 (Friday)
Introduction to Tables
Shereen Fouad
Teaching Fellow
Reminder of previous lecture

File-system data management suffers from data redundancy,
structural and data dependence, and inadequate security features.
Difference between databases, database management systems and
database systems:
A database (DB) consists of a DB schema and a DB state.
A database management system (DBMS) is a collection of programs that
manage the database structure and control access to database.
A database system (DBS) consists of a DBMS and a database.
Overview
Table Structure Example
Cross-references between places in a data repository (Referential
integrity)
Associative linking versus pointing

Restrictions on Database tables
Student Table Example

Imagine that this table (or relation) has been defined to help keep track of student details.
The name of the table (relation): STUDENT
A table is composed of rows and columns.
A table contains a group of related entities, i.e. an entity set.
STUDENT ID | F NAME | L NAME | AGE | STUDENT ADDRESS | COURSE NO. | COURSE NAME | DEPARTMENT LOCATION
E12345 | John | Chopples | 37 | 42 George St. | | Finance | Building b1
E12367 | Kent | Danial | 42 | 56 Malcom St. | 24 | CS | Building c2
E54321 | Michal | Blurp | 21 | 5 Bristol St. | 12 | Marketing | Building b2
E5099 | Amber | Rumpel | 60 | 45 Lime St. | 24 | CS | Building c2
E54344 | Lea | John | 34 | 6 Dan St. | | Finance | Building b1

Each column represents an attribute and is identified by a distinct name.
Tables must have an attribute to uniquely identify each row.
The number of columns is known as the table's degree.

The schema (structure) of the table STUDENT is given by its column headings.

Each row of the table is called a tuple.
Sometimes a row of the table is called a data record.
The number of tuples in a table is called its cardinality.
Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.


STUDENT
STUDENT ID | F NAME | L NAME | AGE | STUDENT ADDRESS | COURSE NO. | COURSE NAME | DEPARTMENT LOCATION
E12345 | John | Chopples | 15 | 42 George St. | | Finance | Building b1
E12367 | Kent | Danial | 18 | 56 Malcom St. | 24 | CS | Building c2
E54321 | Michal | Blurp | 19 | 5 Bristol St. | 12 | Marketing | Building b2
E5099 | Amber | Rumpel | 22 | 45 Lime St. | 24 | CS | Building c2
E54344 | Lea | John | 32 | 6 Dan St. | | Finance | Building b1

STUDENT
STUDENT ID | F NAME | L NAME | AGE | STUDENT ADDRESS | COURSE NO.
E12345 | John | Chopples | 15 | 42 George St. |
E12367 | Kent | Danial | 18 | 56 Malcom St. | 24
E54321 | Michal | Blurp | 19 | 5 Bristol St. | 12
E5099 | Amber | Rumpel | 22 | 45 Lime St. | 24
E54344 | Lea | John | 32 | 6 Dan St. |

COURSE
COURSE NO. | COURSE NAME | COURSE LOCATION
 | Finance | Building b1
24 | CS | Building c2
12 | Marketing | Building b2
Referential Integrity
Referential integrity is relevant when one place in a data repository
needs to refer to something in another place: cross-references.
Referential integrity is achieved when every such referring place
contains a successful reference to another place or place-occupant
(or no reference at all).
Successful there just means that the reference succeeds in
specifying some other place(-occupant).
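A minimal sketch of how a DBMS can enforce referential integrity, using Python's built-in sqlite3 module purely as a convenient stand-in for PostgreSQL (an assumption; the lower-case table and column names are a simplified version of the STUDENT/COURSE tables). The REFERENCES clause declares the cross-reference; the DBMS then refuses a reference that does not succeed, while still allowing no reference at all (NULL). SQLite only checks such references after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks references only when this is on

conn.execute("CREATE TABLE course (course_no INTEGER PRIMARY KEY, course_name TEXT)")
conn.execute("""CREATE TABLE student (
    student_id TEXT PRIMARY KEY,
    l_name     TEXT,
    course_no  INTEGER REFERENCES course (course_no)  -- the cross-reference
)""")
conn.executemany("INSERT INTO course VALUES (?, ?)", [(12, "Marketing"), (24, "CS")])

# A successful reference: course 24 exists
conn.execute("INSERT INTO student VALUES ('E12367', 'Danial', 24)")
# No reference at all (NULL) is also acceptable
conn.execute("INSERT INTO student VALUES ('E54344', 'John', NULL)")

# A mistyped course number (21 instead of 12) refers to nothing and is rejected
try:
    conn.execute("INSERT INTO student VALUES ('E54321', 'Blurp', 21)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```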
Ways of Doing Cross-Reference
Notice the distinction above between referring to places and referring to place-occupants, i.e., where versus what.
These correspond to pointing and associative linking, respectively.
Your party-attendance plan for the month would use
pointing if it referred to the party-givers by position, e.g. by page and line
number in your address book,
associative linking if it referred to them by means of the party-givers' names.
Labels in a diagram are a means for associative linking between the diagram
and the legend (= explanation of the labels, etc.) or other text.
Associative Linking
The notion of relational database rests heavily on associative
linking.
Notice that associative linkages between different places constitute a
specialized sort of needed redundancy.

STUDENT
STUDENT ID | F NAME | L NAME | AGE | STUDENT ADDRESS | COURSE NAME
E12345 | John | Chopples | 15 | 42 George St. | Finance
E12367 | Kent | Danial | 18 | 56 Malcom St. | CS
E54321 | Michal | Blurp | 19 | 5 Bristol St. | Marketing
E5099 | Amber | Rumpel | 22 | 45 Lime St. | CS
E54344 | Lea | John | 32 | 6 Dan St. | Finance

COURSE
COURSE NAME | COURSE LOCATION
Finance | Building b1
CS | Building c2
Marketing | Building b2
What are the disadvantages of using character strings like COURSE NAME as linking values?
Disadvantages of using character strings as linking values
When entering values, you have to ensure exactly the same string of
characters on each occasion:
avoid typos, e.g. Finance vs. Finace.
Comparing such complex values is inefficient.
Reduce such problems by:
using artificial linking values that are simpler in form and easier to
make distinct.
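To see why exact matching matters, here is a small sketch using Python's sqlite3 module as a stand-in DBMS (an assumption; the lower-case names are simplified from the STUDENT and COURSE tables). A single typo in a linking value makes the associative link silently fail: no error is raised, the student just drops out of the combined result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_name TEXT PRIMARY KEY, course_location TEXT)")
conn.execute("CREATE TABLE student (student_id TEXT PRIMARY KEY, l_name TEXT, course_name TEXT)")
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [("Finance", "Building b1"), ("CS", "Building c2"), ("Marketing", "Building b2")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [("E12345", "Chopples", "Finance"),
                  ("E54344", "John", "Finace")])  # typo in the linking value

# Associative linking: pair each student with the course row whose name matches exactly
rows = conn.execute("""SELECT s.l_name, c.course_location
                       FROM student s, course c
                       WHERE s.course_name = c.course_name""").fetchall()
print(rows)  # [('Chopples', 'Building b1')]: the student with 'Finace' has vanished
```

With short artificial linking values such as course numbers, plus a declared reference to the COURSE table, the typo would instead be rejected when the row is entered.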
An Analogy with Programming

Analogous redundancy/anomaly issues arise in program text. E.g.:
If a constant numerical value such as π or g (gravitational acceleration) needs
to be used in several places, it is best to give it a name and replicate the name,
not the value. This aids consistency and maintainability.
If a sequence of operations needs to be invoked in many different places in
the program, package it as a named procedure (function, method, etc.).
A relational database developer refers to a data record as
(A) a criteria.
(B) a relation.
(C) a tuple.
(D) an attribute.
Specifying the location of a particular piece of information in a book, e.g. by page and line number, is
considered as
(A) associative linking.
(B) pointing.
(C) degree.
(D) tuple.
The number of tuples in a table is called its

(A) cardinality
(B) degree
(C) attribute
(D) relation
Tables must have ------------------- to uniquely identify each row

(A) a cardinality
(B) a tuple
(C) an attribute
(D) a relation
Problems with that Table

NAME
ADDRESS
PHONES
BIRTHDAY
Babloop Porkypasta
107 Worm Drive,

Hedgebarton, Birmngham,
B15 9ZZ
0121-944-5677
07979-888777
11 January 1969
Coriolanus
Zebedee
O'Crackpotham
The Wellyboots,
Boring-under-Mosswood,
Berks, HP11 1XX
016789-997710
Johnny
Next to the Tesco's in Upper Street
H: 020-7111-2222
W: 020-7111-2255
M: 07887-842657
Full Monty chip shop
Harborne
Hilary R. Clinton (grr!)
The Old Black House, 15768

Aplanalp St.,
Las Cruces, NM 880011,
USA
???
Oct 05
ex-dir
16 Sep?
(refused to tell me
how old she was)
Problems with that Table

Although that table illustrates the sort of table used in databases in some sense, it has many
tricky features:
Empty entries: what's the interpretation?
Spelling error (Birmngham)
Names/addresses of different forms (perhaps unavoidably)
Different numbers of alternatives in different cells
Different interpretations of the birthday field
(per year, or when born, or when the shop opened)
Vague entries (next to the Tesco's in Upper St.; Harborne)
Expressed uncertainty (the question marks, alone or attached)
Additional comments (grr!, refused to tell me how old she was)
Exceptional entry types (ex-dir, and the contents of the chip-shop row)
Restrictions on Database Tables: Overall Structure
Regular overall shape: rows all same length, similarly columns.
No division into different regions (with a certain exception).
No labels for rows, as opposed to columns.
Mostly no significance to the order of rows.
No additional comments, footnotes, etc.
Restrictions on Database Tables: Nature of Entries
All cells in any one column are given the same intuitive interpretation.
Each cell's item is restricted to a pre-specified, usually fairly simple
value range (data type), and all cells in any given column are restricted to
the same data type.
No exceptional entries, with one exception: empty entries.
One data item per cell (but it can be a variable-length character
string, containing anything).
Uncertainty and vagueness markers are not supported.
Extra, Crucial Restriction (on the main tables)
No row can be repeated in a table. (I.e., no two rows can contain
exactly the same values.)
This is equivalent to saying:
Rows are uniquely determined (picked out) by the values in some set
of columns (possibly the whole set, but could be fewer).
That is, if you imagine some values for those columns, there is at
most one row that has exactly those values in those columns.
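In SQL this restriction is usually stated by declaring such a set of columns as the table's key. A minimal sketch, using Python's sqlite3 module as a stand-in for PostgreSQL (an assumption; the three-column table is a cut-down, hypothetical version of STUDENT), where STUDENT ID alone picks out each row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# STUDENT ID alone determines the row, so it is declared as the primary key
conn.execute("""CREATE TABLE student (
    student_id TEXT NOT NULL PRIMARY KEY,
    f_name     TEXT,
    l_name     TEXT
)""")
conn.execute("INSERT INTO student VALUES ('E12345', 'John', 'Chopples')")

try:
    # Repeating the row (in fact, repeating just its STUDENT ID value) violates the key
    conn.execute("INSERT INTO student VALUES ('E12345', 'John', 'Chopples')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

print(duplicate_accepted)  # False
```

NOT NULL is spelled out because SQLite, unlike PostgreSQL, otherwise permits NULL in most primary-key columns.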
The table below is closer to what might be in a database:
LAST NAME | FIRST NAME | MI | ADDRESS | HOME PHONE | MOBILE | B YEAR | B DAY
Porkypasta | Babloop | | 107 Worm Drive, Hedgebarton, Birmngham, B15 9ZZ | 0121-9445677 | 07979888777 | 1969 | Jan 11
O'Crackpotham | Coriolanus | | The Wellyboots, Boring-under-Mosswood, Berks, HP11 1XX | 016-789997710 | | 1999 | May 20
Delfino | Johnny | | Next to the Tesco's in Upper Street | 020-71112222 | 07887842657 | 1957 | June 1
Clinton | Hilary | | The Old Black House, 15768 Aplanalp St., Las Cruces, NM 880011, USA | 0121-9545646 | | 1997 | Sep 16
Summary
Database design defines the database structure.
A DBMS enforces data integrity and minimizes redundancy.
Relational databases rest heavily on associative linking rather than
pointing.
A DBMS imposes some restrictions on database tables.
Databases
2014/15
Week 2 (Wednesday)
Introduction to Database and Database Management System
Shereen Fouad
Teaching Fellow in School of Computer Science

Database technology has several practical applications.
A database is a collection of logically coherent, interrelated data as well
as a description of this data.
Information is the result of processing data to reveal its meaning.
Overview
More about Databases.
Database Management Systems.
Database Systems.
Types of Databases.
Problems with File System Data Management
A Closer Look at a Database Definition
A database is a structured body of information about entities of various
specific, precisely defined types.
Generally there are many entities of at least some of the types.
The entities are generally in various specific types of relationship to each
other.
Each entity has a specific set of (intrinsic) attributes of interest. Their
values are generally of fairly basic, simple sorts (e.g., numbers, dates,
names).
The entities of a given type are typically not in any special order other than
an order arising naturally from their attributes.
A Closer Look at a Database Definition (cont.)
The individual data elements held are directly meaningful and interesting to
the database's users.
The data held and retrieved is generally of exact form (no vagueness
expressed) and of definite form (no uncertainty expressed or expected).
The operations provided to users for extracting, inserting and updating
data are of conceptually straightforward sorts, not requiring elaborate
reasoning, problem-solving or analysis.
However, aggregate/overview/statistical information (counts, averages,
maxima, etc.) often needs to be computed from the data.
Types of Databases
Databases can be classified according to various aspects, for
example:
1. Number of users
Single-user database: supports only one user at a time
Desktop database
Multi-user database: supports multiple users at the same time
Workgroup database
Enterprise database
2. Database location(s)
Centralized database: data located at a single site
Distributed database: data distributed across several different sites
Centralized vs. Distributed database
[Figure: in a centralized database, all clients access a single database at one site; in a distributed database, clients access several databases located at different sites.]
Types of Databases (cont.)

3. Time sensitivity
Operational database: supports a company's day-to-day operations
Online Transaction Processing (OLTP)
Analytical database: stores data used for tactical or strategic decisions
Data warehouse
4. Type of data stored
General-purpose database
Discipline-specific database
Can you think of examples here?
Database Management System (DBMS)

A Database Management System
(DBMS) is a software system
designed to:
Define and create the database
structure
Manage and manipulate data
Control access to database
The DBMS is the intermediary between the user and the database.
DBMS Languages
The Data Definition Language (DDL)
used by the Database Administrator (DBA)
used to describe/create the external and logical schemas
The Data Manipulation Language (DML)
used to retrieve, insert, delete and modify data
used interactively or embedded in a programming language
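The split between the two languages can be sketched with Python's sqlite3 module, which also shows DML embedded in a host programming language (SQLite stands in for PostgreSQL here as an assumption; the employee IDs 'E1' and 'E2' are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the SQL below is embedded in a Python program

# DDL: define the structure (schema)
conn.execute("CREATE TABLE employee (empl_id TEXT PRIMARY KEY, f_name TEXT, age INTEGER)")

# DML: insert, modify, delete and retrieve data
conn.execute("INSERT INTO employee VALUES ('E1', 'John', 37)")
conn.execute("INSERT INTO employee VALUES ('E2', 'Alex', 21)")
conn.execute("UPDATE employee SET age = 38 WHERE empl_id = 'E1'")
conn.execute("DELETE FROM employee WHERE empl_id = 'E2'")
rows = conn.execute("SELECT f_name, age FROM employee").fetchall()

print(rows)  # [('John', 38)]
```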
Database System
Organization of components
that control the collection,
storage, management and
use of data.
Five major parts of a database
system:
Hardware
Software
People
Procedures
Data
The DBMS acts as an interface between

(A) Data and Databases
(B) Database Application and Database
(C) Database and SQL
(D) Database and Users
DML is provided for
(A) Description of logical structure of database.

(B) Addition of new structures in the database system.
(C) Manipulation & processing of database.
(D) Definition of physical structure of database system.

Which of the following are the properties of entities?
(A) Groups
(B) Table
(C) Attributes
(D) Switchboards
A step back in time: Files and File Systems
University File System (each department keeps its own file):
Academic department file: Student ID, Student Name, Courses
Finance department file: Student ID, Student Name, Student fees
Student Services department file: Student ID, Student Name, Accommodation No.

1. Data Redundancy.
Replicating data in different places in a data repository
(e.g. student data is replicated in several department files).
Data inconsistency: different and conflicting versions of the same data
occur in different places.
Data anomalies: abnormalities that arise when not all changes to
redundant data are made correctly:
Update anomalies
Insertion anomalies
Deletion anomalies
Redundancy implies that if you want to modify/delete a student name, you need to:
know whether there is replication, or check for possible replications
go to the effort of repeating changes when the student name is replicated
avoid errors in such repeated changes.
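A toy sketch of an update anomaly, using plain Python dictionaries as hypothetical stand-ins for the three department files (the field names and the new spelling of the name are invented for illustration). Changing only one copy leaves two conflicting versions of the same student's name.

```python
# Hypothetical department "files": each keeps its own copy of the student's name
academic = {"E12345": {"name": "John Chopples", "courses": ["Finance"]}}
finance = {"E12345": {"name": "John Chopples", "fees_paid": True}}
services = {"E12345": {"name": "John Chopples", "accommodation_no": None}}
all_files = [academic, finance, services]


def rename(files, student_id, new_name):
    """Change the student's name in each of the given files."""
    for f in files:
        f[student_id]["name"] = new_name


def versions_of_name(student_id):
    """Collect every distinct version of the name held across all files."""
    return {f[student_id]["name"] for f in all_files}


# Update anomaly: only the Academic file is changed
rename([academic], "E12345", "Jon Chopples")
print(sorted(versions_of_name("E12345")))  # ['John Chopples', 'Jon Chopples']

# Avoiding it means finding, and repeating the change in, every copy
rename(all_files, "E12345", "Jon Chopples")
print(sorted(versions_of_name("E12345")))  # ['Jon Chopples']
```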

2. Structural and data dependence.
Unlike databases, which store data as well as metadata (in a catalog), file systems
store data only.
The structure of the data is stored in the application that accesses the file.
Structural dependence: changing the file structure requires changing the
applications that access that file.
E.g. adding a student DoB field.
Data dependence: data access changes when data storage characteristics change.
E.g. changing a data field from integer to character.
Structural and data dependence make file systems very difficult to manage (high
maintenance).
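The DoB example can be sketched in Python: a file-style application that hard-codes the record layout breaks when a field is added, while a query against a DBMS (here sqlite3, standing in for PostgreSQL as an assumption) names its columns and keeps working after ALTER TABLE. The record layout, the function read_name and the DoB value are all hypothetical.

```python
import sqlite3

# File-system style: the application itself knows the record layout
def read_name(line):
    student_id, name, course = line.split(",")  # hard-coded: exactly 3 fields
    return name

old_record = "E12345,John Chopples,Finance"
new_record = "E12345,John Chopples,2001-01-01,Finance"  # hypothetical DoB field added

try:
    read_name(new_record)
    file_app_still_works = True
except ValueError:  # too many values to unpack
    file_app_still_works = False

# Database style: the structure is held by the DBMS, and queries name their columns
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT, course TEXT)")
conn.execute("INSERT INTO student VALUES ('E12345', 'John Chopples', 'Finance')")
query = "SELECT name FROM student WHERE student_id = 'E12345'"
before = conn.execute(query).fetchone()
conn.execute("ALTER TABLE student ADD COLUMN dob TEXT")  # add the DoB field
after = conn.execute(query).fetchone()                  # same query, same answer

print(file_app_still_works, before, after)  # False ('John Chopples',) ('John Chopples',)
```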
Other problems
Poor design and lack of standardized data modeling
Security features difficult to program
Requires extensive programming to perform ad hoc queries
System administration complex and difficult
Difficult and expensive to integrate various applications
Very difficult for multiple people or applications to work on the same file at the same time
Alternative solution: Database System Application
[Figure: the Academic, Finance and Student Services departments all access one database, holding both data and metadata, through the DBMS.]
Advantages of the DBMS

Improved data sharing
Improved data security
Better data integration
Minimized data inconsistency
Improved data access
Improved decision making
Increased end-user productivity
When would it make sense not to use a database system?
It depends on the data application at hand: if you are designing a small-scale data application and you won't really suffer from the limitations above, then using a collection of files may be a better solution, because of the increased cost and overhead of purchasing and maintaining a DBMS.
Data independence means

(A) data is defined separately and not included in programs.
(B) programs are not dependent on the physical attributes of data.

(C) programs are not dependent on the logical attributes of data.
(D) both (B) and (C).
An advantage of the database management approach is

(A) data is dependent on programs.
(B) data redundancy increases.
(C) data is integrated and can be accessed by multiple programs.

(D) none of the above.
The language used in application programs to request data from the DBMS is referred to as the
(A) DML
(B) DDL
(C) VDL
(D) SDL
Summary
Data are usually stored in a database.
Databases can be classified into different types according to various
aspects.
A DBMS implements a database and manages its contents.
A database system is the combination of a database and a DBMS.
File-system data management suffers from several limitations when
compared to database systems.
Announcement
The Week 2 exercise (non-technical, non-assessed) is now available on the module Canvas website.
Hand in an electronic copy via Canvas (submission is optional; submit if you are
seeking feedback).
A hand-out for getting started with PostgreSQL is now available on Canvas.
Lab sessions start this week (Thursday, 2 pm to 5 pm, in LG04).
Documents are now available on the module Canvas website.
Databases
2014/15
Week 3 (Friday)
Advanced SQL
Shereen Fouad
Teaching Fellow
School of Computer Science

University of Birmingham, UK
Simple Queries Review

READING A TABLE
SELECT * FROM EMPLOYEE;
SELECT FNAME, LNAME FROM EMPLOYEE;
Source of Image: Fundamentals of Database Systems (6th Edition) by Ramez Elmasri and Shamkant B. Navathe
DISTINCT OUTPUT VALUES
SELECT DISTINCT SALARY FROM EMPLOYEE;
RENAMING ATTRIBUTES
SELECT DISTINCT SALARY AS "MONTHLY PAYMENT" FROM EMPLOYEE;
COMPUTED ATTRIBUTES
SELECT SALARY AS "USD", (SALARY*0.78) AS "EUROS" FROM EMPLOYEE;
SIMPLE AND COMPLEX CONDITIONS (AND, OR, NOT)
SELECT FNAME, LNAME, SUPER_SSN FROM EMPLOYEE
WHERE DNO = 4 AND SEX = 'F' AND NOT (SUPER_SSN = '123456789');
Simple Queries Review (cont.)
PARTIAL MATCHING (LIKE, %, _)
SELECT FNAME, LNAME, ADDRESS FROM EMPLOYEE
WHERE ADDRESS LIKE '%TX%' AND FNAME LIKE '_A%';
SELECT FNAME, LNAME FROM EMPLOYEE WHERE LNAME LIKE 'W%';
ASC/DESC ORDERING COMBINATIONS
SELECT FNAME, SALARY, DNO FROM EMPLOYEE ORDER BY SALARY;
SELECT FNAME, SALARY, DNO FROM EMPLOYEE ORDER BY SALARY DESC;
SELECT FNAME, SALARY, DNO FROM EMPLOYEE ORDER BY DNO ASC, SALARY DESC;
CHECKING FOR NULLS
SELECT FNAME, LNAME FROM EMPLOYEE WHERE SUPER_SSN IS NULL;
BETWEEN
SELECT * FROM EMPLOYEE WHERE (SALARY BETWEEN 30000 AND 40000);
Overview
Aggregate Functions
GROUP BY and HAVING
Nested Queries
ANY and ALL
EXISTS and NOT EXISTS
Aggregate Functions
Summary information can easily be extracted from a table using one of the
aggregate operators COUNT, MAX, MIN, AVG, SUM, STDDEV.
Example:
Find the sum of the salaries of all employees, the maximum salary, the
minimum salary, and the average salary.
SELECT SUM(Salary), MAX(Salary), MIN(Salary), AVG(Salary)
FROM EMPLOYEE;
Retrieve the total number of employees working in department number 5.
SELECT COUNT(*)
FROM EMPLOYEE
WHERE DNO = 5;
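The two queries above can be tried on a small scale with Python's sqlite3 module, used as a stand-in for PostgreSQL (an assumption; the five EMPLOYEE rows are a hypothetical sample, and STDDEV is left out because SQLite has no built-in standard-deviation aggregate):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fname TEXT, dno INTEGER, salary INTEGER)")
# Hypothetical sample rows, for illustration only
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("John", 5, 30000), ("Franklin", 5, 40000), ("Alicia", 4, 25000),
    ("Jennifer", 4, 43000), ("Ramesh", 5, 38000),
])

# Each aggregate collapses the whole table into a single value
totals = conn.execute(
    "SELECT SUM(salary), MAX(salary), MIN(salary), AVG(salary) FROM employee").fetchone()
dept5 = conn.execute("SELECT COUNT(*) FROM employee WHERE dno = 5").fetchone()[0]

print(totals)  # (176000, 43000, 25000, 35200.0)
print(dept5)   # 3
```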
Count the number of unique salary values in the database.
Can you work out the answer?
GROUP BY
Allows for categorical output.
Applies aggregate operators to each of several groups of tuples.
The rows are selected first (WHERE) and then grouped (GROUP BY); the groups can then be filtered (HAVING).
Syntax:
SELECT columnlist
FROM tablelist
[WHERE conditionlist]
[GROUP BY columnlist]
[HAVING conditionlist];
GROUP BY
For each department, retrieve the department number, the number
of employees in the department, and their average salary.
SELECT Dno, COUNT (*), AVG (Salary)
FROM EMPLOYEE
GROUP BY Dno;
Note that the GROUP BY clause specifies the grouping attributes, which should also appear in the SELECT clause, so that the value resulting from applying each aggregate function to a group of tuples appears along with the value of the grouping attribute(s).
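A minimal sketch of the department query on a hypothetical five-row EMPLOYEE sample, run through Python's sqlite3 module as a stand-in for PostgreSQL (an assumption). The ORDER BY is an addition: without it, SQL does not guarantee the order in which the groups come out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fname TEXT, dno INTEGER, salary INTEGER)")
# Hypothetical sample rows, for illustration only
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("John", 5, 30000), ("Franklin", 5, 40000), ("Alicia", 4, 25000),
    ("Jennifer", 4, 43000), ("Ramesh", 5, 38000),
])

# One output row per department: the grouping attribute plus its aggregates
per_dept = conn.execute("""SELECT dno, COUNT(*), AVG(salary)
                           FROM employee
                           GROUP BY dno
                           ORDER BY dno""").fetchall()
print(per_dept)  # [(4, 2, 34000.0), (5, 3, 36000.0)]
```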

Grouping Data on the fundamentals database
Table: allmarks03
What is the average mark for students in each individual course?
SELECT bc AS "Course Code", AVG(mark) AS "Average mark"
FROM allmarks03
GROUP BY bc;
GROUP BY and HAVING

What if we want to exclude all those courses from our summary table
which had fewer than 5 students enrolled in them?
The SQL HAVING Clause is used in combination with the GROUP BY
Clause to restrict the groups of returned rows to only those whose
the condition is TRUE.
SELECT bc AS "Course Code", AVG(mark) AS "Average
mark"
FROM allmarks03
GROUP BY bc
HAVING COUNT(*) >= 5;
Exclude all those courses from our summary table which had fewer than 5 students enrolled in them.
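The effect of HAVING can be checked on a tiny, hypothetical stand-in for allmarks03 (course codes 'X1' and 'X2' and all marks are invented), again using Python's sqlite3 module in place of PostgreSQL as an assumption. Only the bc and mark columns used in the queries above are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for allmarks03: one row per (student, course) mark
conn.execute("CREATE TABLE allmarks03 (bc TEXT, mark INTEGER)")
conn.executemany("INSERT INTO allmarks03 VALUES (?, ?)",
                 [("X1", m) for m in (60, 70, 80, 50, 40)] +  # 5 students
                 [("X2", m) for m in (90, 70)])               # only 2 students

all_groups = conn.execute(
    "SELECT bc, AVG(mark) FROM allmarks03 GROUP BY bc ORDER BY bc").fetchall()
# HAVING filters whole groups, after they have been formed
big_groups = conn.execute(
    "SELECT bc, AVG(mark) FROM allmarks03 GROUP BY bc HAVING COUNT(*) >= 5 ORDER BY bc").fetchall()

print(all_groups)  # [('X1', 60.0), ('X2', 80.0)]
print(big_groups)  # [('X1', 60.0)]
```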
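STUDENT
STUDENT ID | FNAME | LNAME | AGE | STUDENT ADDRESS | COURSE_NO
E12345 | John | Chopples | 24 | 42 George St. |
E12367 | Kent | Danial | 16 | 56 Malcom St. | 24
E54321 | Michal | Blurp | 21 | 5 Bristol St. | 12
E5099 | Amber | Rumpel | 25 | 45 Lime St. | 24
E54344 | Lea | John | 20 | 6 Dan St. |
Find the names and age of the youngest student aged 20 or over, for each course with at least 2 such students.
SELECT COURSE_NO, MIN(AGE)
FROM STUDENT
WHERE AGE >= 20
GROUP BY COURSE_NO
HAVING COUNT(*) > 1;
FNAME and LNAME cannot simply be added to this SELECT clause: they are neither grouping attributes nor arguments of an aggregate function, so PostgreSQL rejects such a query. Retrieving the names as well requires a nested query.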
WHERE and HAVING
WHERE refers to the rows of tables, and so cannot use aggregate
functions.
HAVING refers to groups of rows, and so cannot use columns
that are not in the GROUP BY clause (except inside aggregate functions).
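One way to also retrieve the names for the youngest-student question above is a correlated nested query. This is a sketch, not the lecture's solution: it runs on Python's sqlite3 module as a stand-in for PostgreSQL (an assumption), and because the table leaves the course numbers of E12345 and E54344 blank, it assumes hypothetical values (24 and 12).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id TEXT PRIMARY KEY, fname TEXT, lname TEXT, "
             "age INTEGER, course_no INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", [
    ("E12345", "John", "Chopples", 24, 24),  # course number assumed
    ("E12367", "Kent", "Danial", 16, 24),
    ("E54321", "Michal", "Blurp", 21, 12),
    ("E5099", "Amber", "Rumpel", 25, 24),
    ("E54344", "Lea", "John", 20, 12),       # course number assumed
])

rows = conn.execute("""
    SELECT s.fname, s.lname, s.age, s.course_no
    FROM student s
    WHERE s.age = (SELECT MIN(t.age)              -- youngest aged >= 20 in s's course
                   FROM student t
                   WHERE t.course_no = s.course_no AND t.age >= 20)
      AND s.course_no IN (SELECT course_no         -- courses with >= 2 such students
                          FROM student
                          WHERE age >= 20
                          GROUP BY course_no
                          HAVING COUNT(*) > 1)
    ORDER BY s.course_no""").fetchall()

print(rows)  # [('Lea', 'John', 20, 12), ('John', 'Chopples', 24, 24)]
```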
Nested Query (Subquery)
Find the first name and age of the oldest employee.
EMPLOYEE
F_NAME | L_NAME | PHONE | EMPL. ID | AGE | SALARY
John | Chopples | 0121-414-3816 | E22561 | 37 | 23,000
Alex | Blurp | 01600-719975 | E85704 | 21 | 21,000
Anbreen | Rumpel | 07970-852657 | E22561 | 88 | 40,000

SELECT F_NAME, MAX(AGE)
FROM EMPLOYEE;
What will the result be?
One might expect a result like this:
F_NAME | AGE
John | 88
Alex | 88
Anbreen | 88
But this is not legal syntax: without a GROUP BY clause, no other columns are allowed in the SELECT clause alongside an aggregate function.
Remember that aggregate functions can be used in the SELECT and HAVING clauses, but not in the WHERE clause.

SELECT F_NAME, AGE
FROM EMPLOYEE
WHERE AGE =
  (SELECT MAX(AGE)
   FROM EMPLOYEE);
The inner query is executed first, to find the maximum age; the outer query then finds the employee(s) of that age.
In nested queries, the WHERE clause can itself contain an SQL query!
The FROM and HAVING clauses can contain one too.
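The nested query can be run on the EMPLOYEE rows above with Python's sqlite3 module (used as a stand-in for PostgreSQL, an assumption; the fourth employee 'Sam' is hypothetical and only shows that every employee of the maximum age is returned). Be aware that SQLite, unlike PostgreSQL, does accept the illegal SELECT F_NAME, MAX(AGE) query above, so it is no substitute for checking such syntax in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (f_name TEXT, l_name TEXT, age INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    ("John", "Chopples", 37, 23000),
    ("Alex", "Blurp", 21, 21000),
    ("Anbreen", "Rumpel", 88, 40000),
])

oldest_query = """SELECT f_name, age
                  FROM employee
                  WHERE age = (SELECT MAX(age) FROM employee)  -- inner query: one value
                  ORDER BY f_name"""
print(conn.execute(oldest_query).fetchall())  # [('Anbreen', 88)]

# With a (hypothetical) second 88-year-old, both employees of that age are returned
conn.execute("INSERT INTO employee VALUES ('Sam', 'Hypothetical', 88, 30000)")
oldest = conn.execute(oldest_query).fetchall()
print(oldest)  # [('Anbreen', 88), ('Sam', 88)]
```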
The above subquery returns a single value.
Often a subquery will return a set of values rather than a single
value.
You can't directly compare a single value to a set.
Options:
IN / NOT IN: checks whether a value is (or is not) in the set
ALL / ANY: checks whether a relationship holds for every / at least one member of the set
EXISTS / NOT EXISTS: checks whether the set is non-empty / empty

Find the first and last names of employees who have a registered phone number.
EMPLOYEE
Empl_ID | F_NAME | L_NAME | PHONE_ID | EMPLOYER_ID | AGE
798687 | John | Chopples | | E22561 | 37
668768 | Alex | Blurp | | E85704 | 21
978098 | Anbreen | Rumpel | | E22561 | 70
PHONE_NUMBERS
PHONE_ID | PHONE | TYPE | STATUS
 | 0121-414-3816 | office | OK
 | 01600-719975 | home | FAULT
 | 0121-440-5677 | home | OK
 | 07970-852657 | mobile | UNPAID
Can you work out the answer?
Find the first and last names of employees who have a registered phone number.
SELECT F_NAME, L_NAME
FROM EMPLOYEE
WHERE PHONE_ID IN
  (SELECT PHONE_ID
   FROM PHONE_NUMBERS);
Result
F_NAME | L_NAME
John | Chopples
Alex | Blurp
Anbreen | Rumpel
Find the first and last names of employees who don't have a registered phone number.
SELECT F_NAME, L_NAME
FROM EMPLOYEE
WHERE PHONE_ID NOT IN
  (SELECT PHONE_ID
   FROM PHONE_NUMBERS);
Result
Empty (no rows): every employee has a registered phone number.
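Both queries can be reproduced with Python's sqlite3 module as a stand-in for PostgreSQL (an assumption). The slide's PHONE_ID values are not legible, so this sketch invents hypothetical IDs P1 to P4 and links each employee to the phone number shown next to them in the earlier EMPLOYEE table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phone_numbers (phone_id TEXT PRIMARY KEY, phone TEXT, type TEXT, status TEXT)")
conn.execute("CREATE TABLE employee (empl_id INTEGER PRIMARY KEY, f_name TEXT, l_name TEXT, phone_id TEXT)")
# PHONE_ID values P1..P4 are hypothetical
conn.executemany("INSERT INTO phone_numbers VALUES (?, ?, ?, ?)", [
    ("P1", "0121-414-3816", "office", "OK"),
    ("P2", "01600-719975", "home", "FAULT"),
    ("P3", "0121-440-5677", "home", "OK"),
    ("P4", "07970-852657", "mobile", "UNPAID"),
])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (798687, "John", "Chopples", "P1"),
    (668768, "Alex", "Blurp", "P2"),
    (978098, "Anbreen", "Rumpel", "P4"),
])

# The subquery yields a set of PHONE_ID values; IN / NOT IN test membership in it
with_phone = conn.execute("""SELECT f_name, l_name FROM employee
                             WHERE phone_id IN (SELECT phone_id FROM phone_numbers)
                             ORDER BY l_name""").fetchall()
without_phone = conn.execute("""SELECT f_name, l_name FROM employee
                                WHERE phone_id NOT IN (SELECT phone_id FROM phone_numbers)""").fetchall()

print(with_phone)     # [('Alex', 'Blurp'), ('John', 'Chopples'), ('Anbreen', 'Rumpel')]
print(without_phone)  # []
```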
ANY and ALL
ANY and ALL compare a single value v to a set of values V, using a comparison operator such as =, >, >=, <, <=, or <>.
v = ALL V returns TRUE if v is equal to every value in the set V.
v = ANY V returns TRUE if v is equal to some value in the set V, and is hence equivalent to v IN V.
Similarly, v > ALL V is TRUE if v is greater than every value in V, and v > ANY V is TRUE if v is greater than at least one value in V.
ALL
List the names of employees whose salary is greater than the salary of
all the employees in department 4:
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ALL
( SELECT Salary
FROM EMPLOYEE
WHERE Dno=4 );
Source of Image: Fundamentals of Database Systems (6th Edition) by Ramez Elmasri and Shamkant B. Navathe
ANY
Find the names of employees who earn more than at least one other employee.
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ANY
( SELECT Salary
FROM EMPLOYEE);
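SQLite, used here as an example engine through Python's `sqlite3` module, does not support ALL or ANY, so this sketch rewrites the two queries above with MAX and MIN. The rewrite is only equivalent when the subquery returns at least one row and no NULL salaries. The EMPLOYEE rows are hypothetical, not the textbook's data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Lname TEXT, Fname TEXT, Salary INTEGER, Dno INTEGER)")
# Hypothetical rows for illustration only.
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", [
    ("Lee", "Ann", 30000, 5),
    ("Khan", "Omar", 45000, 5),
    ("Ng", "Mia", 25000, 4),
    ("Roy", "Sam", 43000, 4),
    ("Poe", "Kit", 55000, 1),
])

# "Salary > ALL (...)" rewritten as "Salary > MAX(...)".
above_all_dept4 = con.execute(
    "SELECT Lname, Fname FROM EMPLOYEE "
    "WHERE Salary > (SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno = 4) ORDER BY Lname"
).fetchall()

# "Salary > ANY (...)" rewritten as "Salary > MIN(...)".
above_someone = con.execute(
    "SELECT Lname, Fname FROM EMPLOYEE "
    "WHERE Salary > (SELECT MIN(Salary) FROM EMPLOYEE) ORDER BY Lname"
).fetchall()

print(above_all_dept4)  # [('Khan', 'Omar'), ('Poe', 'Kit')]
print(above_someone)    # [('Khan', 'Omar'), ('Lee', 'Ann'), ('Poe', 'Kit'), ('Roy', 'Sam')]
```

On a DBMS that does support quantified comparisons, the original ALL and ANY forms can be used directly.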
EXISTS and NOT EXISTS

Used to check whether the result of a nested query is empty (contains
no tuples) or not.
The result of EXISTS is a Boolean value: TRUE if the nested query result
contains at least one tuple, or FALSE if the nested query result
contains no tuples.
SELECT <columns>
FROM <tables>
WHERE EXISTS <set>
SELECT <columns>
FROM <tables>
WHERE NOT EXISTS <set>
EXISTS and NOT EXISTS

Retrieve the names of employees who have no dependents.
SELECT Fname, Lname
FROM EMPLOYEE
WHERE NOT EXISTS
( SELECT *
FROM DEPENDENT
WHERE Ssn=Essn );
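The inner query here is correlated: it refers to Ssn from the outer EMPLOYEE row, so it is conceptually re-evaluated for each employee. A small sketch with hypothetical EMPLOYEE and DEPENDENT rows, assuming Python's `sqlite3` module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical rows: employee 111 has two dependents, 333 has one, 222 has none.
con.executescript("""
CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Ssn TEXT PRIMARY KEY);
CREATE TABLE DEPENDENT (Essn TEXT, Dependent_name TEXT);
INSERT INTO EMPLOYEE VALUES ('Ann', 'Lee', '111'), ('Omar', 'Khan', '222'), ('Mia', 'Ng', '333');
INSERT INTO DEPENDENT VALUES ('111', 'Tom'), ('111', 'Joy'), ('333', 'Eve');
""")

# Ssn resolves to the outer EMPLOYEE row; Essn belongs to DEPENDENT.
no_dependents = con.execute("""
    SELECT Fname, Lname
    FROM EMPLOYEE
    WHERE NOT EXISTS (SELECT * FROM DEPENDENT WHERE Ssn = Essn)
""").fetchall()
print(no_dependents)  # [('Omar', 'Khan')]
```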
References
Some of the SQL examples presented in this lecture have been
obtained from the following textbook:
Fundamentals of Database Systems (6th Edition), Ramez Elmasri and Shamkant B. Navathe
Databases
2014/15
Week 3 (Wednesday)
Introduction to SQL
Shereen Fouad
Teaching Fellow

The importance of database Table Design.
Cross-references between places in a data repository (referential
integrity).
Associative linking versus pointing.
Remember:
Associative Linking
This is how the tables are linked.
Coordination between Tables

EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John    | Chopples |          | E22561      | 37
668768  | Alex    | Blurp    |          | E85704      | 21
978098  | Anbreen | Rumpel   |          | E22561      | 70
PHONE_NUMBERS
PHONE_ID | PHONE         | TYPE   | STATUS
         | 0121-414-3816 | office | OK
         | 01600-719975  | home   | FAULT
         | 0121-440-5677 | home   | OK
         | 07970-852657  | mobile | UNPAID
EMPLOYER
EMPLOYER_ID | EMPLOYER                  | ADDRESS               | NUM. EMPLS | SECTOR
E48693      | BT                        | BT House, London      | 1,234,5678 | Private TCOM
E85704      | Monmouth School for Girls | Hereford Rd, Monmouth | 245        | Private 2E
E22561      | University of Birmingham  | Edgbaston Park Rd     | 4023       | Public HE
Overview
The main categories of SQL commands
The basic DDL and DML commands
How to use SQL to query a database for useful information
Introduction to Structured Query Language (SQL)

SQL is a specially designed programming language for managing data
stored in a Relational Database Management System (RDBMS)
SQL functions fit into two broad categories:
The Data Definition Language (DDL):
used to describe/create database schema
The Data Manipulation Language (DML):
used for selecting, inserting, deleting and updating data items in a database
Data Definition Commands

CREATE
Creating a new database object. E.g. empty table of a particular shape (mainly, particular
column names and value-types for the columns)
DROP
Deleting an existing database object.
ALTER
Changing the shape of an existing database object (e.g., adding/deleting a column in table,
or changing the type of a column)
RENAME
Giving a new name for an existing database object.
Referential integrity statements
Need to ensure consistency between related tables. E.g.:
Deletion of something in one table may require deletions from or other modifications to
other tables.
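The four definition commands can be tried in sequence in a sketch like this one, assuming Python's `sqlite3` module. SQLite has no standalone RENAME statement, so renaming is written as `ALTER TABLE ... RENAME TO`; the new table name ORGANISATION is hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# CREATE: a new, empty table of a particular shape (column names and types).
con.execute("CREATE TABLE EMPLOYER (EMPLOYER_ID TEXT PRIMARY KEY, EMPLOYER TEXT)")

# ALTER: change the shape of the table, here by adding a column.
con.execute("ALTER TABLE EMPLOYER ADD COLUMN SECTOR TEXT")

# RENAME: give the table a new (hypothetical) name.
con.execute("ALTER TABLE EMPLOYER RENAME TO ORGANISATION")
columns = [row[1] for row in con.execute("PRAGMA table_info(ORGANISATION)")]

# DROP: delete the table object altogether.
con.execute("DROP TABLE ORGANISATION")
tables = con.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

print(columns)  # ['EMPLOYER_ID', 'EMPLOYER', 'SECTOR']
print(tables)   # []
```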
Data Manipulation Commands

INSERT
Adding a row or rows to a table
DELETE
Deleting a row or rows (question: how identified?)
UPDATE
Updating values in an individual cell (the column is specified by name; but how is the row identified?)
SELECT
Retrieving values from an individual cell; doing calculations on them
Retrieving the values in the cells in some or all columns for some or all rows
Calculating statistics concerning values in particular columns across all rows, a subset of rows,
or several subsets of rows (count, max, min, average, standard deviation, ...)
Ordering rows in different ways in displays of a table.
COMMIT
Save a database transaction.
ROLLBACK
Undo (roll back) a database transaction.
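The sketch below, assuming Python's `sqlite3` module, runs the manipulation commands inside transactions. It also answers the questions above about identifying rows: UPDATE and DELETE pick their target rows with a WHERE condition, here on the key Empl_ID. The AGE change to 38 is a hypothetical update:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Empl_ID INTEGER PRIMARY KEY, F_NAME TEXT, AGE INTEGER)")

# INSERT adds rows; COMMIT makes the transaction permanent.
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                [(798687, "John", 37), (668768, "Alex", 21)])
con.commit()

# UPDATE and DELETE identify their target rows with a WHERE condition.
con.execute("UPDATE EMPLOYEE SET AGE = 38 WHERE Empl_ID = 798687")  # hypothetical change
con.execute("DELETE FROM EMPLOYEE WHERE Empl_ID = 668768")

# ROLLBACK undoes everything since the last COMMIT.
con.rollback()

rows = con.execute("SELECT Empl_ID, F_NAME, AGE FROM EMPLOYEE ORDER BY Empl_ID").fetchall()
print(rows)  # [(668768, 'Alex', 21), (798687, 'John', 37)]
```

Python's `sqlite3` opens a transaction implicitly before INSERT, UPDATE and DELETE, which is why `rollback()` restores both rows here.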
Example Database
EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John    | Chopples |          | E22561      | 37
668768  | Alex    | Blurp    | 4        | E85704      | 21
978098  | Anbreen | Rumpel   | 2        | E22561      | 70
PHONE_NUMBERS
PHONE_ID | PHONE         | TYPE   | STATUS
         | 0121-414-3816 | office | OK
         | 01600-719975  | home   | FAULT
         | 0121-440-5677 | home   | OK
         | 07970-852657  | mobile | UNPAID
EMPLOYER
EMPLOYER_ID | EMPLOYER                  | ADDRESS               | NUM. EMPLS | SECTOR
E48693      | BT                        | BT House, London      | 1,234,5678 | Private TCOM
E85704      | Monmouth School for Girls | Hereford Rd, Monmouth | 245        | Private 2E
E22561      | University of Birmingham  | Edgbaston Park Rd     | 4023       | Public HE
SELECT Queries
EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John    | Chopples |          | E22561      | 37
668768  | Alex    | Blurp    |          | E85704      | 21
978098  | Anbreen | Rumpel   |          | E22561      | 88
SELECT
Used to list contents of table
Syntax:
SELECT column_list
FROM table_name;
column_list: one or more attributes, separated by commas (projection)
table_name: one or more tables, separated by commas (several tables are combined, i.e. joined)
Listing table rows:

Asterisk can be used as wildcard character to list all attributes (columns)
Example: find all employees:
SELECT *
FROM EMPLOYEE
or
SELECT *
FROM EMPLOYEE E
EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John    | Chopples |          | E22561      | 37
668768  | Alex    | Blurp    |          | E85704      | 21
978098  | Anbreen | Rumpel   |          | E22561      | 88
What if we want the database to give us the number of records in the EMPLOYEE table?
SELECT count(*)
FROM EMPLOYEE;
Result
3
Listing Unique Values
PHONE_NUMBERS
PHONE         | TYPE   | STATUS
0121-414-3816 | office | OK
01600-719975  | home   | FAULT
0121-440-5677 | home   | OK
07970-852657  | mobile | UNPAID
The DISTINCT clause produces a list of only the unique values in a table
Example:
SELECT DISTINCT P.STATUS
FROM PHONE_NUMBERS P;
Result
STATUS
OK
FAULT
UNPAID
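A quick way to see the effect of DISTINCT is to compare the values returned with and without it. This sketch assumes Python's `sqlite3` module and adds ORDER BY only so that the output order is predictable:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE PHONE_NUMBERS (PHONE TEXT, TYPE TEXT, STATUS TEXT)")
con.executemany("INSERT INTO PHONE_NUMBERS VALUES (?, ?, ?)", [
    ("0121-414-3816", "office", "OK"),
    ("01600-719975", "home", "FAULT"),
    ("0121-440-5677", "home", "OK"),
    ("07970-852657", "mobile", "UNPAID"),
])

# Without DISTINCT every row's STATUS is returned, duplicates included.
all_statuses = [r[0] for r in con.execute("SELECT P.STATUS FROM PHONE_NUMBERS P")]
# With DISTINCT each value appears once.
distinct_statuses = [r[0] for r in con.execute(
    "SELECT DISTINCT P.STATUS FROM PHONE_NUMBERS P ORDER BY P.STATUS")]

print(len(all_statuses), distinct_statuses)  # 4 ['FAULT', 'OK', 'UNPAID']
```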
Ordering a Listing
ORDER BY clause is used when listing order is important
ORDER BY clause
Used to sort output of SELECT statement

Can sort by one or more columns
Ascending (ASC) or descending order (DESC)
ASC is the default
Example:
SELECT *
FROM EMPLOYEE
ORDER BY AGE DESC;
Result
F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE | SALARY
Anbreen | Rumpel   |          | E22561      | 70  | 40,000
John    | Chopples |          | E22561      | 37  | 23,000
Alex    | Blurp    |          | E85704      | 21  | 21,000
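A minimal sketch, assuming Python's `sqlite3` module and only the F_NAME, L_NAME, AGE and SALARY values from the result above, comparing DESC with the default ascending order:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (F_NAME TEXT, L_NAME TEXT, AGE INTEGER, SALARY INTEGER)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", [
    ("John", "Chopples", 37, 23000),
    ("Alex", "Blurp", 21, 21000),
    ("Anbreen", "Rumpel", 70, 40000),
])

by_age_desc = [r[0] for r in con.execute("SELECT F_NAME FROM EMPLOYEE ORDER BY AGE DESC")]
# No keyword means ASC, the default.
by_age_asc = [r[0] for r in con.execute("SELECT F_NAME FROM EMPLOYEE ORDER BY AGE")]

print(by_age_desc)  # ['Anbreen', 'John', 'Alex']
print(by_age_asc)   # ['Alex', 'John', 'Anbreen']
```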
SELECT Queries
Fine-tune the SELECT command by adding restrictions to the search
criteria using:
Conditional restrictions
e.g. =, <, >, <=, >=, <>, etc.
Arithmetic operators
e.g. power operations, multiplications, divisions, additions and
subtractions
Logical operators
Used when searching data involves multiple conditions
e.g. AND, OR and NOT
Special operators
e.g. BETWEEN, IS NULL, LIKE, IN and EXISTS
Conditional Restrictions
Add conditional restrictions to SELECT statement, using WHERE clause
Syntax:
SELECT columnlist
FROM tablelist
[ WHERE conditionlist ] ;
The WHERE clause is evaluated for each row in the table
Conditional Restrictions
EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John    | Chopples |          | E22561      | 37
668768  | Alex    | Blurp    |          | E85704      | 21
978098  | Anbreen | Rumpel   |          | E22561      | 70
Example: find all 70-year-old employees.
SELECT *
FROM EMPLOYEE
WHERE AGE=70;
or
SELECT *
FROM EMPLOYEE E
WHERE E.AGE=70;
(Aliases rename tables: E is an alias for EMPLOYEE)
Result
EMPLOYEE
Empl_ID | F_NAME  | L_NAME | PHONE_ID | EMPLOYER_ID | AGE | SALARY
978098  | Anbreen | Rumpel |          | E22561      | 70  |
How does a DBMS evaluate a query?

The system goes through the stored table line by line, checking
each time whether the age field matches exactly the value 70.
If it does, then the employee's fields are printed to the output;
if it doesn't, then that line is ignored and the search continues.
On the technical level, DBMSs employ all kinds of clever tricks
(such as indexes) to speed up the search, but the end result will be the same.
Since there could be more than one member of staff whose
age is 70, it is possible that the system has to output many
employees.
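The row-by-row process described above amounts to a full table scan. The following Python sketch models it with a list of dictionaries and a hypothetical helper `select_where`; it deliberately ignores the indexes and other optimisations a real DBMS would use:

```python
# A naive model of evaluating:  SELECT * FROM EMPLOYEE WHERE AGE = 70
EMPLOYEE = [
    {"Empl_ID": 798687, "F_NAME": "John", "L_NAME": "Chopples", "EMPLOYER_ID": "E22561", "AGE": 37},
    {"Empl_ID": 668768, "F_NAME": "Alex", "L_NAME": "Blurp", "EMPLOYER_ID": "E85704", "AGE": 21},
    {"Empl_ID": 978098, "F_NAME": "Anbreen", "L_NAME": "Rumpel", "EMPLOYER_ID": "E22561", "AGE": 70},
]

def select_where(table, predicate):
    """Return every row for which the WHERE predicate is true."""
    result = []
    for row in table:           # go through the stored table line by line
        if predicate(row):      # does the row satisfy the condition?
            result.append(row)  # yes: send the row to the output
        # no: ignore the row and continue the scan
    return result

seventy = select_where(EMPLOYEE, lambda row: row["AGE"] == 70)
print([r["F_NAME"] for r in seventy])  # ['Anbreen']
```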
Conditional Restrictions (cont.)

To find just names and phones, replace the first line:
SELECT F_NAME,L_NAME,PHONE_ID
FROM EMPLOYEE
WHERE AGE=70;
Result
F_NAME  | L_NAME | PHONE_ID
Anbreen | Rumpel |
or
SELECT F_NAME AS "FIRST NAME",L_NAME AS "LAST NAME",PHONE_ID
FROM EMPLOYEE
WHERE AGE=70;
Result
The AS keyword is used to give aliases
(rename columns) in the result set
FIRST NAME | LAST NAME | PHONE_ID
Anbreen    | Rumpel    |
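Because an alias such as FIRST NAME contains a space, it has to be written as a quoted identifier ("FIRST NAME"). A sketch assuming Python's `sqlite3` module, reading the result-set column names from `cursor.description`; PHONE_ID is left NULL because the slide does not show its value:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (F_NAME TEXT, L_NAME TEXT, PHONE_ID TEXT, AGE INTEGER)")
con.execute("INSERT INTO EMPLOYEE VALUES ('Anbreen', 'Rumpel', NULL, 70)")

# Double quotes let an alias contain a space.
cur = con.execute(
    'SELECT F_NAME AS "FIRST NAME", L_NAME AS "LAST NAME", PHONE_ID '
    "FROM EMPLOYEE WHERE AGE = 70"
)
headers = [col[0] for col in cur.description]  # column names of the result set
rows = cur.fetchall()

print(headers)  # ['FIRST NAME', 'LAST NAME', 'PHONE_ID']
print(rows)     # [('Anbreen', 'Rumpel', None)]
```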
Arithmetic Operators
EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE | SALARY
798687  | John    | Chopples |          | E22561      | 37  | 25000
668768  | Alex    | Blurp    |          | E85704      | 21  | 21000
978098  | Anbreen | Rumpel   |          | E22561      | 70  | 50000
Source of Image: Database Principles: Fundamentals of
Design, Implementation and Management, 2nd Ed.
SELECT E.L_NAME AS "LAST NAME", E.AGE-5 AS "NEW AGE"
FROM EMPLOYEE E
WHERE E.SALARY=E.AGE*1000;
Result
LAST NAME | NEW AGE
Blurp     | 16
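To check the arithmetic, here is a sketch assuming Python's `sqlite3` module and the three rows above: only Blurp satisfies SALARY = AGE * 1000 (21 * 1000 = 21000), and 21 - 5 gives the NEW AGE of 16:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (L_NAME TEXT, AGE INTEGER, SALARY INTEGER)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    ("Chopples", 37, 25000), ("Blurp", 21, 21000), ("Rumpel", 70, 50000),
])

# Arithmetic appears both in the select list (AGE - 5) and in the WHERE clause (AGE * 1000).
rows = con.execute(
    'SELECT E.L_NAME AS "LAST NAME", E.AGE - 5 AS "NEW AGE" '
    "FROM EMPLOYEE E WHERE E.SALARY = E.AGE * 1000"
).fetchall()
print(rows)  # [('Blurp', 16)]
```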
Logical Operators: AND, OR, and NOT

PHONE_NUMBERS
PHONE         | TYPE   | STATUS
0121-414-3816 | office | OK
01600-719975  | home   | FAULT
0121-440-5677 | home   | OK
07970-852657  | mobile | UNPAID
SELECT P.PHONE,P.TYPE
FROM PHONE_NUMBERS P
WHERE P.TYPE='office' AND P.STATUS='OK';
Result
PHONE         | TYPE
0121-414-3816 | office
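The sketch below, assuming Python's `sqlite3` module, runs the AND query from the slide and adds an OR and a NOT condition on the same four rows for comparison:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE PHONE_NUMBERS (PHONE TEXT, TYPE TEXT, STATUS TEXT)")
con.executemany("INSERT INTO PHONE_NUMBERS VALUES (?, ?, ?)", [
    ("0121-414-3816", "office", "OK"),
    ("01600-719975", "home", "FAULT"),
    ("0121-440-5677", "home", "OK"),
    ("07970-852657", "mobile", "UNPAID"),
])

# AND: both conditions must hold.
office_ok = con.execute(
    "SELECT P.PHONE, P.TYPE FROM PHONE_NUMBERS P "
    "WHERE P.TYPE = 'office' AND P.STATUS = 'OK'").fetchall()
# OR: at least one condition must hold.
home_or_mobile = con.execute(
    "SELECT COUNT(*) FROM PHONE_NUMBERS WHERE TYPE = 'home' OR TYPE = 'mobile'").fetchone()[0]
# NOT: negates a condition.
not_ok = con.execute(
    "SELECT COUNT(*) FROM PHONE_NUMBERS WHERE NOT STATUS = 'OK'").fetchone()[0]

print(office_ok)       # [('0121-414-3816', 'office')]
print(home_or_mobile)  # 3
print(not_ok)          # 2
```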
Which of the following is correct?
(A) a SQL query automatically eliminates duplicates.
(B) SQL permits attribute names to be repeated in the same relation.
(C) a SQL query will not work if there are no indexes on the relations
(D) None of these
AS clause is used in SQL for
(A) Selection operation.
(B) Rename operation.
(C) Join operation.
(D) Projection operation.
Which of the following operations is used if we are interested in only certain columns of a table?
(A) PROJECTION
(B) SELECTION
(C) UNION
(D) JOIN
A file manipulation command that extracts some of the records from a file is called
(A) SELECT
(B) PROJECT
(C) JOIN
(D) PRODUCT
Special Operators
BETWEEN: checks whether attribute value is within a range
LIKE: checks whether attribute value matches given string pattern
IS NULL: checks whether attribute value is null
IN: checks whether attribute value matches any value within a value list
EXISTS: checks if subquery returns any rows
BETWEEN
EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John    | Chopples |          | E22561      | 37
668768  | Alex    | Blurp    |          | E85704      | 21
978098  | Anbreen | Rumpel   |          | E22561      | 70
SELECT *
FROM EMPLOYEE
WHERE AGE BETWEEN 20 AND 40;
Result
Empl_ID | F_NAME | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John   | Chopples |          | E22561      | 37
668768  | Alex   | Blurp    |          | E85704      | 21
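BETWEEN includes both end values, so it is equivalent to a pair of >= and <= conditions. A sketch assuming Python's `sqlite3` module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Empl_ID INTEGER, F_NAME TEXT, AGE INTEGER)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                [(798687, "John", 37), (668768, "Alex", 21), (978098, "Anbreen", 70)])

between = con.execute(
    "SELECT F_NAME FROM EMPLOYEE WHERE AGE BETWEEN 20 AND 40 ORDER BY AGE").fetchall()
# The same condition written explicitly with >= and <=.
explicit = con.execute(
    "SELECT F_NAME FROM EMPLOYEE WHERE AGE >= 20 AND AGE <= 40 ORDER BY AGE").fetchall()
# Both boundary values (21 and 37) are included.
boundary = con.execute(
    "SELECT COUNT(*) FROM EMPLOYEE WHERE AGE BETWEEN 21 AND 37").fetchone()[0]

print(between)   # [('Alex',), ('John',)]
print(boundary)  # 2
```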
LIKE
EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John    | Chopples |          | E22561      | 37
668768  | Alex    | Blurp    |          | E85704      | 21
978098  | Anbreen | Rumpel   |          | E22561      | 70
SELECT F_NAME,L_NAME,AGE
FROM EMPLOYEE
WHERE F_NAME LIKE 'A%';
'%' is a wildcard for any substring (including the empty substring).
Result
F_NAME  | L_NAME | AGE
Alex    | Blurp  | 21
Anbreen | Rumpel | 70
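A sketch of the LIKE query, assuming Python's `sqlite3` module. SQLite's LIKE ignores the case of ASCII letters by default, which may differ from other DBMSs; the second query shows '%' matching the empty substring:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (F_NAME TEXT, L_NAME TEXT, AGE INTEGER)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                [("John", "Chopples", 37), ("Alex", "Blurp", 21), ("Anbreen", "Rumpel", 70)])

starts_with_a = con.execute(
    "SELECT F_NAME, L_NAME, AGE FROM EMPLOYEE WHERE F_NAME LIKE 'A%' ORDER BY AGE").fetchall()
# '%' may match nothing at all, so 'Alex%' still matches 'Alex'.
exact_prefix = con.execute(
    "SELECT F_NAME FROM EMPLOYEE WHERE F_NAME LIKE 'Alex%'").fetchall()

print(starts_with_a)  # [('Alex', 'Blurp', 21), ('Anbreen', 'Rumpel', 70)]
print(exact_prefix)   # [('Alex',)]
```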
IS NULL
EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | SALARY | AGE
798687  | John    | Chopples |          | E22561      | 20000  | 37
668768  | Alex    | Blurp    |          | E85704      |        | 21
978098  | Anbreen | Rumpel   |          | E22561      | 25000  | 70
SELECT E.L_NAME AS "LAST NAME"
FROM EMPLOYEE E
WHERE E.SALARY IS NULL;
Result
LAST NAME
Blurp
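Testing for NULL needs IS NULL: a plain comparison such as SALARY = NULL evaluates to unknown for every row, so it returns nothing. A sketch assuming Python's `sqlite3` module (where Python's `None` is stored as NULL) and the salaries shown above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (L_NAME TEXT, SALARY INTEGER, AGE INTEGER)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                [("Chopples", 20000, 37), ("Blurp", None, 21), ("Rumpel", 25000, 70)])

is_null = con.execute(
    'SELECT E.L_NAME AS "LAST NAME" FROM EMPLOYEE E WHERE E.SALARY IS NULL').fetchall()
# "= NULL" is never TRUE, so no row qualifies.
eq_null = con.execute("SELECT L_NAME FROM EMPLOYEE WHERE SALARY = NULL").fetchall()

print(is_null)  # [('Blurp',)]
print(eq_null)  # []
```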
The remaining two special operators, IN and EXISTS, will be discussed next lecture
Summary
SQL commands can be divided into two overall categories:
Data definition language commands
Data manipulation language commands
The basic DML commands:
SELECT, INSERT, UPDATE, DELETE, COMMIT, and ROLLBACK
The SELECT statement is the main data retrieval command in SQL
Databases
2014/15
Week 4 (Friday)
Relational Model, Keys and Integrity Rules
Shereen Fouad
Teaching Fellow

Database development comprises three main stages
Data modeling improves the understanding of the organization for which the
database design is developed
Good design begins by identifying entities, attributes, and relationships
Entities are the main objects about which data are collected and stored in a table.
Attribute: describes a characteristic of an entity.
A relationship is an association between entities
Relationship
Connectivity (1:1, 1:M, M:N),
Cardinality (In a relationship from entity type A to entity type B, a minimum and a maximum
can be specified for the number of B entities for each A entity) and
Participation (optional or mandatory)
Overview
What business rules are and how they influence database design
The Evolution of Data Models
The relational database model offers a logical view of data
Database Keys
Superkey
Candidate key
Primary key
Foreign key
Database Integrity Rules
Business Rules
From business rules, the DB designer gains the main information about the organization,
which forms the main building blocks of a data model.
Business Rules allow designer to:
understand the nature, role, and scope of data
understand business processes
develop appropriate relationship participation rules and constraints
Translating Business Rules into Data Model
Nouns translate into entities
Verbs into relationships among entities
Identify the relationship type and connectivity
The translation step should use comprehensive and unique object names.
The Development
of Data Models
Relational Model
Implemented through the Relational
Database Management System (RDBMS)
Relational database model offers a
logical view of data.
Hides the complexity of the
hierarchical and network models from
the user
An entity set is mapped to a relational table
A relational table stores a collection of
related entities
Each attribute is mapped to a table column
The Entity Relationship Model

The Entity Relationship (ER)
model offers a conceptual
view of data.
Graphical representations are used
to model database components
Keys
Each row in a table must be uniquely identifiable by a key
A superkey for a table is a collection of one or more attributes that determines all
the other attributes in the table, i.e. determines a whole row.
Trivially, the collection of all the attributes is a superkey.
A set of attributes in a relation is called a candidate key if, and only if,
Every tuple has a unique value for the set of attributes (uniqueness)
No proper subset of the set has the uniqueness property (minimality)
To determine what is a candidate key, use knowledge of the real world (what is
going to stay unique!)
Superkeys & Candidate Keys: Example

Candidate key: {STUDENT ID}; {FNAME, LNAME} looks acceptable but we may get
people with the same name
{STUDENT ID, FNAME}, {STUDENT ID, LNAME} and {STUDENT ID, FNAME, LNAME}
satisfy uniqueness, but are not minimal.
{FNAME} and {LNAME} do not give a unique identifier for each row
{STUDENT ID} will be the best candidate key.
STUDENT
STUDENT ID | F NAME | L NAME
E12345     | John   | Chopples
E12367     | Kent   | Danial
E54321     | Michal | Blurp
E5099      | Amber  | Rumpel
E54344     | Lea    | John
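The uniqueness and minimality tests can be written as a small Python sketch over the STUDENT rows above. The helper names are hypothetical, and the check only describes the rows currently stored: on these five rows even {F NAME} happens to be unique, which is exactly why real-world knowledge must decide the candidate keys:

```python
from itertools import combinations

# The STUDENT rows from the slide.
STUDENT = [
    {"STUDENT ID": "E12345", "F NAME": "John", "L NAME": "Chopples"},
    {"STUDENT ID": "E12367", "F NAME": "Kent", "L NAME": "Danial"},
    {"STUDENT ID": "E54321", "F NAME": "Michal", "L NAME": "Blurp"},
    {"STUDENT ID": "E5099", "F NAME": "Amber", "L NAME": "Rumpel"},
    {"STUDENT ID": "E54344", "F NAME": "Lea", "L NAME": "John"},
]

def is_unique(rows, attrs):
    """Uniqueness: no two rows share the same values for attrs."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))

def is_candidate_key(rows, attrs):
    """Uniqueness plus minimality: no proper subset of attrs is also unique."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

print(is_candidate_key(STUDENT, ("STUDENT ID",)))           # True
print(is_candidate_key(STUDENT, ("STUDENT ID", "F NAME")))  # False: not minimal

# A hypothetical extra student who shares a name with an existing one.
WITH_NEW = STUDENT + [{"STUDENT ID": "E99999", "F NAME": "John", "L NAME": "Chopples"}]
print(is_unique(WITH_NEW, ("F NAME", "L NAME")))            # False
print(is_candidate_key(WITH_NEW, ("STUDENT ID",)))          # True
```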
Primary Keys
A primary key for a table (entity type) is a candidate key that the DB designer has
chosen as being the main way of uniquely identifying a row (entity).
Primary keys are the main way of identifying target entities in entity relationships,
e.g., the way to identify someone's employing organization.
Cannot have null values (A null value is no value, it is NOT equal to a zero or a
blank space).
For efficiency (and correctness) reasons, the simpler that primary keys are, the
better.
Typical primary keys examples are Identity numbers (of people, companies,
products, courses, etc.), or combinations of them with one or two other
attributes.
Composite key: Composed of more than one attribute
Superkeys, Candidate Keys & Primary Keys
(Figure: nested sets showing that every primary key is a candidate key, and every candidate key is a superkey)
Functional dependence
Attribute B is functionally dependent on A if all rows in the table that agree in value for A
also agree in value for B
The role of keys is based on determination
If you know the value of attribute A, you can determine the value of attribute B
E.g., the collection DAY-NUMBER, MONTH and YEAR specifying a birthdate in a table about people could determine DAY-NAME.
We alternatively say that DAY-NAME is functionally dependent on DAY-NUMBER, MONTH and YEAR.
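Functional dependence can likewise be checked on a set of rows. This is a hedged Python sketch: `holds_fd` is a hypothetical helper, the birthdates are made up, and DAY-NAME is computed with the standard `datetime` module. A check on sample rows can only refute a dependency, never prove it:

```python
import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def person(day, month, year):
    # DAY-NAME is derived from the full date, so the full date determines it.
    name = DAY_NAMES[datetime.date(year, month, day).weekday()]
    return {"DAY-NUMBER": day, "MONTH": month, "YEAR": year, "DAY-NAME": name}

def holds_fd(rows, lhs, rhs):
    """True if every pair of rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:  # same lhs, different rhs
            return False
    return True

# Hypothetical birthdates: the same day and month in two different years.
PEOPLE = [person(7, 11, 2014), person(7, 11, 2015), person(7, 11, 2014), person(25, 12, 2013)]

full_date_fd = holds_fd(PEOPLE, ("DAY-NUMBER", "MONTH", "YEAR"), ("DAY-NAME",))
day_month_fd = holds_fd(PEOPLE, ("DAY-NUMBER", "MONTH"), ("DAY-NAME",))
print(full_date_fd)  # True
print(day_month_fd)  # False: 7 Nov falls on different weekdays in 2014 and 2015
```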
Foreign Keys
Remember!! Relationships are represented by associative linking by means of shared
attributes
Standardly, a relationship is represented by means of Foreign keys.
Foreign key: an attribute whose values match primary key values in the related table
Referential integrity: a set of attributes in the first (referencing) relation is a Foreign
Key if its value always either
matches a Candidate Key value in the second (referenced) relation, or
is NULL
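A minimal sketch of this rule, assuming Python's `sqlite3` module (SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`). The employee Kim, the ID 555555 and the employer ID E99999 are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off unless asked
con.executescript("""
CREATE TABLE EMPLOYER (EMPLOYER_ID TEXT PRIMARY KEY, EMPLOYER TEXT);
CREATE TABLE EMPLOYEE (
    Empl_ID INTEGER PRIMARY KEY,
    F_NAME TEXT,
    EMPLOYER_ID TEXT REFERENCES EMPLOYER (EMPLOYER_ID)  -- foreign key
);
INSERT INTO EMPLOYER VALUES ('E22561', 'University of Birmingham'),
                            ('E85704', 'Monmouth School for Girls');
INSERT INTO EMPLOYEE VALUES (798687, 'John', 'E22561');  -- matches a primary key value
INSERT INTO EMPLOYEE VALUES (555555, 'Kim', NULL);       -- NULL is also allowed
""")

# A value that matches no primary key in EMPLOYER violates referential integrity.
try:
    con.execute("INSERT INTO EMPLOYEE VALUES (111111, 'Sam', 'E99999')")
    violated = False
except sqlite3.IntegrityError:
    violated = True

count = con.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
print(violated, count)  # True 2
```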
Primary & Foreign Keys Example
Primary keys are underlined
Foreign keys are in a blue border
A key that is composed of more than one attribute is known as a Composite Key
Primary & Foreign Keys Example
Is this redundancy?
Multiple occurrences of values are not redundant when they are needed to
make the relationship work
Redundancy occurs only when there is unnecessary duplication
of attribute values
Foreign keys control typical data redundancies by using common
attributes shared by tables
In case of entity integrity, the primary key may be
(A) not Null
(B) Null
(C) both Null & not Null
(D) any value
The key used to represent a relationship between tables is called
(A) Primary key
(B) Secondary Key
(C) Foreign Key
(D) None of these
An instance of relational schema R (A, B, C) has distinct values of A
including NULL values. Which one of the following is true?
(A) A is a candidate key
(B) A is not a candidate key
(C) A is a primary Key
(D) Both (A) and (C)
Referential Integrity
When relations are updated, referential integrity can be violated
This usually occurs when a referenced tuple is updated or deleted
There are a number of options:
RESTRICT - stop the user from doing it
CASCADE - let the changes flow on
NULLIFY - make values NULL
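Standard SQL declares these options on the foreign key as ON DELETE (or ON UPDATE) RESTRICT, CASCADE or SET NULL; NULLIFY corresponds to SET NULL. The sketch below, assuming Python's `sqlite3` module and hypothetical department and employee values, deletes a referenced department under each option:

```python
import sqlite3

def delete_department(action):
    """Delete department 5 under the given ON DELETE action and report the outcome."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.execute("CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT)")
    con.execute(
        "CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, "
        f"Dno INTEGER REFERENCES DEPARTMENT (Dnumber) ON DELETE {action})"
    )
    con.execute("INSERT INTO DEPARTMENT VALUES (5, 'Research')")  # hypothetical values
    con.execute("INSERT INTO EMPLOYEE VALUES ('111', 5)")
    try:
        con.execute("DELETE FROM DEPARTMENT WHERE Dnumber = 5")
    except sqlite3.IntegrityError:
        return "rejected"  # RESTRICT: the user is stopped
    return con.execute("SELECT Ssn, Dno FROM EMPLOYEE").fetchall()

restrict_result = delete_department("RESTRICT")
cascade_result = delete_department("CASCADE")   # the employee row is deleted too
set_null_result = delete_department("SET NULL")  # the employee's Dno becomes NULL

print(restrict_result)  # rejected
print(cascade_result)   # []
print(set_null_result)  # [('111', None)]
```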
Referential Integrity - Example

What happens if the Administration Dnumber is changed to 3 in DEPARTMENT?
What happens if the entry for Research is deleted from DEPARTMENT?
Integrity Rules
Many RDBMSs enforce integrity rules automatically
Summary
Relational database model offers a logical view of data.
Keys are central to the use of relational tables
Keys define functional dependencies
Each table row must have a primary key that uniquely identifies all
attributes
Tables linked by common attributes (foreign keys)
Databases
2014/15
Week 4 (Wednesday)
Database Modeling
Shereen Fouad
Teaching Fellow
Overview
Stages of Database development
Understand the basic data modeling concepts and importance
What are the basic data-modeling building blocks?
Entities and Entity sets

Attribute, Attributes Domains and Attributes Determination
Relationship Connectivity, cardinality and participation
Constraint
Stages of Database Development

1. Requirement Analysis Stage
Understand the problems of organization in order to provide solutions
Sources of requirements include forms, interviews, reports, use case,
observations and business rules
2. Design Stage
Requirements information is processed into a data model (database design)
3. Implementation Stage
Physical implementation of the developed database design into a real world
database application
Data Model
Data model is the collection of concepts that can be used to describe
the structure/design of the database.
Designers, programmers, and end users see data in different ways
Different views of the same data can lead to designs that do not correctly
represent an organization's operations
Data modeling reduces the complexity of database design and organizes
data for various users
It improves the understanding of the organization for which the
database design is developed
The Entity Relationship Model is the most widely used conceptual data model
Data Model (cont.)

Data modeling is an iterative and progressive process
Serves as a communications tool to facilitate the interaction among:
designer
application programmer
end user
Data Model Basic Building Blocks

Entity
Attribute
Relationship
Entities
Entities: are real-world objects, distinct from other objects, for which
we intend to collect data (e.g. person, place, event)
Entities are just things about which data are collected and stored in a
table.
A row in a table corresponds to an entity instance.
Entity Set: a group of entities of the same type, e.g., all employees.
Examples of database entities in a company business environment.
Employee
Department
What else??
Attributes
Attribute: describes a characteristic of an entity.
Each attribute has a data type and other properties
Attributes of entities of a given type are the names of the different
pieces of information that need to be stored for entities of that type.
Attributes are just the column names of the table for the entity type.
E.g., entities of the type Employee could have the following attributes:
Employee ID number, last name, first name, phone number, age, etc.
Attributes have a domain: the attribute's set of possible values.
Each tuple assigns a value to each attribute from its domain
The ______ operator is used to compare a value to a list of literal values that have been specified.
(A) BETWEEN
(B) ANY
(C) IN
(D) ALL
A set of possible data values is called

(A) attribute.
(B) degree.
(C) tuple.
(D) domain.
Which of the following is a legal expression in SQL?

(A) SELECT NULL FROM EMPLOYEE;
(B) SELECT NAME FROM EMPLOYEE;
(C) SELECT NAME FROM EMPLOYEE WHERE SALARY = NULL;

(D) None of the above
Which of the following are the properties of entities?
(A) Groups
(B) Table
(C) Attributes
(D) Switchboards
Relationship
A relationship is an association between entities, e.g.:
An employee works in a single department
A department employs several employees
Note that relationships are mostly described by verbs.
Relationship Set: Collection of similar relationships.
Same entity set can participate in different relationship sets.
Relationship Connectivity
Relationships are categorized by the uniqueness or multiplicity of the
entities at either end: this is connectivity.
It has a big effect on DB design.
Many-to-Many relationship: Student - Enrolls - Class
(M:N) A student may be enrolled in more than one class (or none) and a class enrols more than one student.
One-to-Many relationship: Professor - Teaches - Class
(1:M) A professor teaches more than one class (or none) and a class is taught by at most one professor.
One-to-One relationship: Student - Has - Graduation Report
(1:1) Each student has at most one graduation report and each graduation report is provided to at most one student.
Relationship Cardinality
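Relationships can be further specified as to how many entities are allowed or
required at either end: this is cardinality.
It has a significant effect on DB design.
This is determined by an organization's business policy.
In a relationship from entity type A to entity type B, a minimum and a
maximum can be specified for the number of B entities for each A entity.
Example:
One-to-Many relationship: Professor (0,3) - Teaches - (1,1) Class
(a professor teaches between 0 and 3 classes; each class is taught by exactly one professor)
Many-to-Many relationship: Student (1,6) - Enrolls in - (5,35) Class
(a student enrolls in between 1 and 6 classes; a class enrols between 5 and 35 students)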
Relationship Participation
Optional [in a particular direction, X to Y]:
an X entity does not require a corresponding Y entity occurrence

i.e. the minimum number of Ys per X is 0
E.g. Class is optional to Professor, every Professor may or may not teach a
course
Mandatory [in a particular direction, X to Y]:
an X entity requires a corresponding Y entity occurrence
i.e. the minimum number of Ys per X is 1 or more
E.g. Professor is mandatory to Class, every Class must have a Professor
assigned to it.
Relationship participation depends on the business rule of the organization.
Employee Department Relationship Example
Each employee works in a single department and each
department employs several employees.
Relationships are represented by associative linking by means
of shared attributes
Summary
Database development comprises three main stages
Data modeling improves the understanding of the organization for
which the database design is developed
Entity, Attribute and Relationship are the main blocks for generating a
database model
Databases
2014/15
Week 5 (Friday)
Conceptual Data Model (Part 2)
Entity Relationship Diagrams (ERD)
Shereen Fouad
Teaching Fellow
Reminder of Previous Lecture

The Conceptual Database Model
The Entity Relationship Diagram (ERD) model
The main characteristics and notations of entity
relationship components
Classes of Attributes
Identifier attributes
Simple versus Composite Attribute
Single-Valued versus Multivalued Attribute
Derived Attribute
Phases of Database Design

Data Requirements (specification of requirements and results) -> Conceptual Design -> Conceptual Schema -> Logical Design -> Logical Schema -> Physical Design -> Physical Schema
Conceptual design begins with the collection of requirements and results needed from the
database. It is a high-level description (often done with the Entity Relationship (ER) model)
Logical schema is a description of the structure of the database (Relational, Network, etc.)
Translate the ERD into the DBMS data model
Schema refinement: consistency, normalization
Physical schema is a description of the implementation (programs, tables, dictionaries
and catalogs)
Overview
How relationships between entities are defined and
graphically presented
Relationship Connectivity in ERD
Relationships Cardinality in an ERD
Relationship Participation in ERD
Relationship Degree in ERD
Weak Entities in ERD
Associative (Composite) Entities
Example of ERD that represents a business situation
Types of Relationship Connectivity

A relationship is an association between entities
How many entities are allowed or required at either end
(cardinality)
Established by business rules
Many-to-Many
1-to-Many
1-to-1
Relationships Connectivity in an ERD
Relationships Cardinality in an ERD
Cardinality means count, and is expressed as a pair of numbers (Min, Max)
Maximum cardinality is the maximum number of entity instances that
can participate in a relationship. [1 or M]
Minimum cardinality is the minimum number of entity instances that
must participate in a relationship. [1 or 0]
Established by business rules
Relationship Participation in ERD
Optional participation
One entity occurrence does not require a corresponding entity occurrence in a
particular relationship
As shown in the examples below, a minimum cardinality of zero [0], indicating
optional participation, is shown by placing an oval next to the optional entity.
Mandatory participation
One entity occurrence requires a corresponding entity occurrence in a particular
relationship
As shown in the examples below, a minimum cardinality of one [1] indicates
mandatory (required) participation; it is not marked with any extra symbol in the Chen model.
Relationship Degree in ERD
Indicates the number of entities or participants associated with a
relationship
Unary relationship (degree =1)
Association is maintained within single entity
Binary relationship (degree =2)
Two entities are associated
Ternary relationship (degree =3)
Three entities are associated
Weak Entities in ERD
A weak entity meets two conditions
Existence-dependent, i.e. the entity exists in the database only when it is
associated with another related entity occurrence
Its primary key is partially or totally derived from the parent entity in the
relationship
The database designer determines whether an entity is weak based on business rules
Associative (Composite) Entities

Also known as bridge entities
Used to implement M:N relationships
Composed of primary keys of each of the entities to be
connected
May also contain additional attributes that play no role in
connective process
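As a sketch of how a bridge entity could be implemented, assuming Python's `sqlite3` module, the hypothetical ASSIGN table below links EMPLOYEE and PROJECT in an M:N relationship. Its primary key combines the two parents' keys, and Working_Hours is an extra attribute that belongs to the relationship itself; all names and values are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE EMPLOYEE (Emp_ID INTEGER PRIMARY KEY, Lname TEXT);
CREATE TABLE PROJECT (Proj_ID INTEGER PRIMARY KEY, Name TEXT, Budget INTEGER);
-- Bridge (associative) entity: its primary key combines the two parents' keys.
CREATE TABLE ASSIGN (
    Emp_ID  INTEGER REFERENCES EMPLOYEE (Emp_ID),
    Proj_ID INTEGER REFERENCES PROJECT (Proj_ID),
    Working_Hours INTEGER,  -- attribute of the relationship itself
    PRIMARY KEY (Emp_ID, Proj_ID)
);
INSERT INTO EMPLOYEE VALUES (1, 'Lee'), (2, 'Khan');
INSERT INTO PROJECT VALUES (10, 'Payroll', 5000), (20, 'Website', 8000);
INSERT INTO ASSIGN VALUES (1, 10, 12), (1, 20, 5), (2, 10, 30);
""")

# One employee on many projects ...
projects_of_lee = con.execute(
    "SELECT P.Name FROM ASSIGN A JOIN PROJECT P ON A.Proj_ID = P.Proj_ID "
    "WHERE A.Emp_ID = 1 ORDER BY P.Name").fetchall()
# ... and one project with many employees, each with their own hours.
staff_on_payroll = con.execute(
    "SELECT E.Lname, A.Working_Hours FROM ASSIGN A JOIN EMPLOYEE E ON A.Emp_ID = E.Emp_ID "
    "WHERE A.Proj_ID = 10 ORDER BY E.Lname").fetchall()

print(projects_of_lee)   # [('Payroll',), ('Website',)]
print(staff_on_payroll)  # [('Khan', 30), ('Lee', 12)]
```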
The Chen Representation of the Invoicing Problem
Bridging entity types are weak, but this is not normally shown
Create an ERD that represents this business situation
Consider a database that is to represent a large business. In this
typical business, there is a Division that operates several Departments.
A Division is described by the name of its business sector. The Division
is run by one Employee and each Department is managed by one
Employee. The database needs to keep track of Employee (ID, First
name, Last name, Salary, title and date of birth). It also wants to keep
track of Department name and location. Of course the Department
employs many Employees who work on projects that are assigned to
them. Each Project has a certain budget. Everyone needs to be busy,
so it is not uncommon for an Employee to be assigned many Projects
and a Project may have many Employees assigned to it. However, we
need to keep track of the employee working hours in each project.
There is a special case of Employees that are not assigned to any
Department; they roam around looking for work from the various
Departments.
Steps to Complete an ERD

Step 1) Business Rules
Step 2) Listing Entities and Attributes (considering
the attribute class)
Step 3) Simple ERDs with relationships (considering
connectivities, cardinalities and participation)
Step 4) The Complete ERD
Can You Spot Entities and their Attributes?
STEP 1) Identify the Business Rules

A department employs many employees, but each employee
is employed by one department.
Some employees, known as "rovers," are not assigned to any
department.
A division operates many departments, but each department
is operated by one division
An employee may be assigned to many projects and a project
may have many employees assigned to it.
A project must have at least one employee assigned to it.
One of the employees manages each department.
One of the employees runs each division.
Step 2) Make a list of the Entities and their Attributes
Entity: EMPLOYEE
Attributes: ID, First name, Last name, Salary, Title and Date of birth
Entity: DIVISION
Attributes: DIVISION ID, business sector name
Entity: DEPARTMENT
Attributes: Department ID, Department name and location
Entity: PROJECT
Attributes: Project ID, Project name and Project budget
Step 3) List ALL simple Relations

[DIVISION] 1 <operates> M [DEPARTMENT]
[EMPLOYEE] 1 <runs> 1 [DIVISION]
[EMPLOYEE] 1 <manages> 1 [DEPARTMENT]
[EMPLOYEE] N <assigned> M [PROJECT]
[DEPARTMENT] 1 <employs> M [EMPLOYEE]
Connectivities and Cardinalities and Participation
My procedure for determining the cardinality:
A DIVISION will operate a minimum of ____1____ DEPARTMENT
A DIVISION will operate a maximum of ____N____ DEPARTMENTs
Then reverse the order:
A DEPARTMENT is operated by a minimum of ___1_____ DIVISIONs
A DEPARTMENT is operated by a maximum of ____1____ DIVISIONs
Putting this information together you get:
ERD
(Figure: ERD of the business situation, showing DIVISION, DEPARTMENT, EMPLOYEE and PROJECT with their attributes and the relationships operates, runs, manages, employs and ASSIGN, annotated with connectivities and (min,max) cardinalities)
One thing is missing!
Where do I put the working hours?
ERD
(Figure: the same ERD with the attribute Working_Hours added)
Announcements
Next Friday's lecture (7th of Nov) will be 13:00-14:00 in Main Lecture Theatre Arts.
Next week there will be no lab session as you
will have a non-technical exercise.
Databases
2014/15
Week 5 (Wednesday)
Conceptual Data Model
Entity Relationship Diagrams (ERD)
Shereen Fouad
Teaching Fellow

Superkey
Any key (set of attributes) that uniquely identifies each row
Candidate key
A superkey without unnecessary attributes
Primary Key
A candidate key selected to uniquely identify all other attributes; it
can't contain Null entries.
Foreign key (FK)
An attribute whose values match primary key values in the related
table
Composite key
Composed of more than one attribute
Overview
The Conceptual Database Model
The Entity Relationship Diagram (ERD) model
The main characteristics and notations of entity
relationship components
Classes of Attributes
Derived Attribute

Data Requirements and results -> Conceptual Design (often done with an Entity Relationship Diagram (ERD)) -> Conceptual Schema -> Logical Design -> Logical Schema -> Physical Design -> Physical Schema

The Entity Relationship Model

Introduced by Chen in 1976
The most widely used conceptual model of databases.
A graphical representation of entities, attributes and the relationships among entities in a database structure, plus (depending on the diagram style) varying amounts of other information such as connectivities, cardinalities, keys and weakness.
An ER model of an environment forms the basis of an ER diagram (ERD) or of several ERDs.
Diagrams based on the model are a widely accepted and adopted graphical approach to database design.
Quick Flavour of Two Styles of Entity Relationship Diagram (ERD)
There are several markedly different styles of ERD, and for each main style there
are several variants.
In this module we will focus only on the Chen Model Style.
Entities and Attributes Notation in the Chen Model
An entity is represented by a rectangle containing the entity's name
The entity name, a noun, is written in capital letters
Attributes are represented by ovals connected to the entity rectangle with a line
Each oval contains the name of the attribute it represents
Identifier Attributes
Identifier attributes (primary key) are underlined
In the example below, Ssn (Social Security number) is underlined because it represents the identifier attribute (primary key)

A simple attribute cannot be subdivided:
e.g. an employee has simple attributes like Salary, Gender, and Department.
A composite attribute can be subdivided into further attributes.
e.g. Name: First name, Middle initials, Last name
Figures: simple and single-valued attribute; composite attribute
A single-valued attribute can have only a single value.
e.g. a car can have only one car year.
A multivalued attribute can have many values.
e.g. a car may have several colors for different body parts (top color, body color, etc.)
Multivalued attributes are shown in an ER diagram by a double line connecting the attribute to the entity.
Multivalued Attribute
One (usually poor) possibility: use a variable-length string for the attribute, and list all the values within the string.
Disadvantage: little support is supplied by the DBMS; insertions and deletions require special extra programming.
Another possibility: within the original entity type, split the attribute into several different attributes corresponding to different natural components of the entity.
Disadvantages: the attribute may in reality need to be split differently for different entities in the entity type (e.g. different cars).
The attribute may not have naturally namable aspects at all. E.g., imagine blotches of color in random places on a car.
Another possibility: within the original entity type, split the attribute into several different attributes not corresponding to specific components of the entity.
E.g., have attributes called Colour1, Colour2, ..., Colour6.
Advantage: copes with the no-identifiable-components problem and the different-split problem.
Disadvantages:
You have to set aside enough columns to accommodate the maximum conceivable number of values; if this maximum is large and rarely approached, a lot of space is wasted.
Searching for a colour, or doing insertions and deletions, can be very cumbersome.
Often better: replace the attribute by a new 1:M relationship to a new entity type holding the original attribute's data.
If the components of the original attribute are conceptually distinguishable in a natural way, the new entity can have an attribute whose values identify those components.
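The "often better" option can be sketched with Python's built-in sqlite3 module, used here as a stand-in for a full DBMS such as PostgreSQL. The tables CAR and CAR_COLOUR and their columns are hypothetical; COMPONENT plays the role of the attribute whose values identify naturally distinguishable components.

```python
import sqlite3

# In-memory SQLite database (a stand-in for the course DBMS).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

conn.executescript("""
CREATE TABLE CAR (
    CAR_VIN  TEXT    PRIMARY KEY,
    CAR_YEAR INTEGER NOT NULL              -- single-valued attribute stays in CAR
);
-- New entity type on the "many" side of a 1:M relationship from CAR.
CREATE TABLE CAR_COLOUR (
    CAR_VIN   TEXT NOT NULL REFERENCES CAR (CAR_VIN),
    COMPONENT TEXT NOT NULL,               -- identifies the component, e.g. 'top', 'body'
    COLOUR    TEXT NOT NULL,
    PRIMARY KEY (CAR_VIN, COMPONENT)
);
""")

conn.executemany("INSERT INTO CAR VALUES (?, ?)", [("V1", 2012), ("V2", 2014)])
conn.executemany("INSERT INTO CAR_COLOUR VALUES (?, ?, ?)", [
    ("V1", "top", "black"),
    ("V1", "body", "red"),
    ("V2", "body", "silver"),
])

# Searching for a colour is an ordinary query, not a scan over Colour1..Colour6.
red_cars = [vin for (vin,) in conn.execute(
    "SELECT DISTINCT CAR_VIN FROM CAR_COLOUR WHERE COLOUR = ?", ("red",))]
print(red_cars)  # ['V1']

# Adding a colour is a single-row INSERT; removing one is a single-row DELETE.
conn.execute("INSERT INTO CAR_COLOUR VALUES ('V2', 'top', 'red')")
```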
Derived Attribute
A derived attribute is one whose value is computed from other attributes.
In an ER diagram it is indicated by a dotted line connecting the attribute with the entity.
e.g. an employee's age can be calculated from the date of birth and the current date.
What do you recommend?
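If age is derived rather than stored, it has to be computed each time it is needed. A minimal Python sketch of that computation, assuming whole-year ages; the function name age_on is hypothetical.

```python
from datetime import date


def age_on(dob: date, on: date) -> int:
    """Derive age in whole years from a date of birth and a reference date."""
    years = on.year - dob.year
    # This year's birthday has not happened yet: subtract one.
    if (on.month, on.day) < (dob.month, dob.day):
        years -= 1
    return years


print(age_on(date(1990, 6, 15), date(2014, 11, 5)))  # 24
print(age_on(date(1990, 12, 1), date(2014, 11, 5)))  # 23
```

Computing the value on demand means it cannot go out of date, at the cost of recomputing it in every query that needs it; a stored AGE column would avoid the computation but would need updating over time.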
Announcements
Next lecture, Friday 31st of October (ONLY in week 5), will be 13:00 to 14:00 in 101, Haworth (Y2 on the Edgbaston Campus Map).
The following Friday lectures (weeks 6-11) will be 13:00 to 14:00 in the Main Lecture Theatre, Arts.
This week's (week 5) hand-out and exercise have been released on canvas.
Summary
The ER model uses ERDs to represent the conceptual database as viewed by the end user
The ERM's main components:
Entities
Relationships
Attributes
Classes of Attributes include
Derived Attribute
Databases
2014/15
Week 6 (Friday)
Logical Data Model (Mapping E/R design to relational schema) Part 2
Shereen Fouad
Teaching Fellow
Announcements
So far we have covered most of Chapters 1, 3, 4, 5 and 7 of the reference textbook:
C. Coronel, S. Morris, P. Rob & K. Crockett, Database Principles: Fundamentals of Design, Implementation and Management, 10th Edition, 2013.
Next week we will start Chapter 8.

Mapping E/R design to relational schema
Mapping entity sets
Mapping weak entity sets
Mapping Multivalued Attribute
Mapping relationship sets into the database relational schema
1:M Relationships
1:1 Relationships
N:M Relationships
Strong versus weak Relationships
Strong versus weak Entities

Figure: the database design process. Data requirements and results → Conceptual design → Conceptual schema → Logical design → Logical schema → Physical design → Physical schema
Conceptual design: begins with the collection of requirements and results needed from the database. It is a high-level description (often produced with the Entity Relationship (ER) model) and is DBMS independent.
Logical design: describes the structure of the relational database by mapping the ERD into the relational data model. It is closer to the actual implementation and DBMS specific.
Physical schema: a description of the implementation (programs, tables, dictionaries and catalogs)
Relations, Entities, Tables
E/R Diagram | Relational model | SQL
Entity | Relation | Table
Instance | Tuple | Row
Attribute | Attribute | Column or Field
Relationship (1:M, 1:1, M:N) | Foreign Key | Foreign Key
Identifying Attribute | Primary Key | Primary Key
Overview
Relationship Degree (Revised)
What are symmetric recursive relationships?
Implementation of the non-symmetric 1:M recursive relationship
Implementation of the non-symmetric N:M recursive relationship
Implementation of the symmetric 1:1 recursive relationship & non-redundant implementations
The problem of Symmetry
Redundant Relationships

The number of entity types participating in a relationship indicates the relationship's degree.
A unary (recursive) relationship: an association involving a single entity type
A binary relationship: an association between two entity types (the most common)
A ternary relationship: an association between three entity types
Figure: example relationships of each degree, using the entities Employee, Department, Customer and Item and the relationships Manages, Works and Issue invoice.
Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.
Mapping of Ternary Relationship
Mapping Ternary (and n-ary) Relationships
One relation for each entity and one for the associative entity
The associative entity has foreign keys to each entity in the relationship
Tables for a Ternary Relationship
CFR is just like the bridging entity types you've seen before, but has 3 links to other types instead of 2
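A minimal sketch of the ternary mapping using Python's sqlite3 module. The entities DOCTOR, PATIENT and DRUG and the associative entity PRESCRIPTION are hypothetical stand-ins (the CFR tables are not reproduced here); including PRE_DATE in the key is an assumption that lets the same three participants be linked more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- One relation for each participating entity ...
CREATE TABLE DOCTOR  (DOC_ID  INTEGER PRIMARY KEY, DOC_NAME  TEXT NOT NULL);
CREATE TABLE PATIENT (PAT_ID  INTEGER PRIMARY KEY, PAT_NAME  TEXT NOT NULL);
CREATE TABLE DRUG    (DRUG_ID INTEGER PRIMARY KEY, DRUG_NAME TEXT NOT NULL);
-- ... and one for the associative entity, with a foreign key to each of the three.
CREATE TABLE PRESCRIPTION (
    DOC_ID   INTEGER NOT NULL REFERENCES DOCTOR (DOC_ID),
    PAT_ID   INTEGER NOT NULL REFERENCES PATIENT (PAT_ID),
    DRUG_ID  INTEGER NOT NULL REFERENCES DRUG (DRUG_ID),
    PRE_DATE TEXT    NOT NULL,
    PRIMARY KEY (DOC_ID, PAT_ID, DRUG_ID, PRE_DATE)
);
INSERT INTO DOCTOR  VALUES (1, 'Khan');
INSERT INTO PATIENT VALUES (10, 'Lee');
INSERT INTO DRUG    VALUES (100, 'Aspirin');
INSERT INTO PRESCRIPTION VALUES (1, 10, 100, '2014-11-05');
""")


def rejected(sql):
    """True if SQLite refuses the statement because it breaks a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Each of the three links is checked: a non-existent drug is refused.
print(rejected("INSERT INTO PRESCRIPTION VALUES (1, 10, 999, '2014-11-06')"))  # True

# Joining through the associative table recovers all three participants.
print(conn.execute("""
    SELECT D.DOC_NAME, P.PAT_NAME, G.DRUG_NAME
    FROM PRESCRIPTION R
    JOIN DOCTOR  D ON D.DOC_ID  = R.DOC_ID
    JOIN PATIENT P ON P.PAT_ID  = R.PAT_ID
    JOIN DRUG    G ON G.DRUG_ID = R.DRUG_ID
""").fetchall())  # [('Khan', 'Lee', 'Aspirin')]
```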
Unary (Recursive) Relationships
A recursive relationship links entities of the same type.
E.g.: marriage, management, parthood, etc.
There can also be partial recursion: just some of the entity types involved in a relationship could be the same.
Recursive Relationships: Symmetry
A relationship R between entity types E and F (possibly the same) is symmetric iff: if eRf then fRe (i.e., IF R relates entity e of type E to entity f of type F, then it must ALSO relate f to e).
E.g.: marriage, being-sibling-of.
Recursive relationships cause major redundancy problems when they are ALSO symmetric.
Symmetry only makes sense in the 1:1 and M:N cases.
(The points can be generalized to partly recursive cases.)
1:M recursive (necessarily non-symmetric): EMPLOYEE Manages EMPLOYEE
Mapping Unary (1:M) Relationships - Recursive
Foreign key in the same relation
Just a standard 1:M implementation, except that the table is linked to itself.
No redundancy problem.
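A minimal sketch of the recursive 1:M mapping with Python's sqlite3 module; the column name EMP_MGR and the sample rows are hypothetical. Each management fact is stored once, in the managed employee's row, and a self-join with two table aliases reads it back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_LNAME TEXT NOT NULL,
    -- Foreign key in the same relation: the manager's EMP_NUM (NULL = no manager).
    EMP_MGR   INTEGER REFERENCES EMPLOYEE (EMP_NUM)
);
INSERT INTO EMPLOYEE VALUES (1, 'Adams', NULL);
INSERT INTO EMPLOYEE VALUES (2, 'Baker', 1);
INSERT INTO EMPLOYEE VALUES (3, 'Clark', 1);
INSERT INTO EMPLOYEE VALUES (4, 'Dixon', 2);
""")

# Self-join: alias E plays the employee, alias M plays the manager.
pairs = conn.execute("""
    SELECT E.EMP_LNAME, M.EMP_LNAME
    FROM EMPLOYEE E
    JOIN EMPLOYEE M ON E.EMP_MGR = M.EMP_NUM
    ORDER BY E.EMP_NUM
""").fetchall()
print(pairs)  # [('Baker', 'Adams'), ('Clark', 'Adams'), ('Dixon', 'Baker')]
```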
Non-symmetric M:N recursive: PART Contains PART
Mapping Unary (M:N) Relationships
Two relations:
- One for the entity type
- One for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity
The COMPONENT entity type is just a bridging type, linking PART to itself. NB: its first two columns both refer to PART's PK but must be differently named.
No redundancy problem.
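A sketch of the PART/COMPONENT bridging table using Python's sqlite3 module. The column names COMP_CODE, PART_CODE and COMP_QTY and the sample parts are hypothetical; the point is that both key columns reference PART but carry different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE PART (
    PART_CODE TEXT PRIMARY KEY,
    PART_DESC TEXT NOT NULL
);
-- Bridging table linking PART to itself. Both key columns refer to
-- PART (PART_CODE), so they must have different names.
CREATE TABLE COMPONENT (
    COMP_CODE TEXT    NOT NULL REFERENCES PART (PART_CODE),  -- the part used
    PART_CODE TEXT    NOT NULL REFERENCES PART (PART_CODE),  -- the part it is used in
    COMP_QTY  INTEGER NOT NULL CHECK (COMP_QTY > 0),
    PRIMARY KEY (COMP_CODE, PART_CODE)
);
INSERT INTO PART VALUES ('BIKE', 'Bicycle'), ('TRIKE', 'Tricycle'),
                        ('WHEEL', 'Wheel'), ('SPOKE', 'Spoke');
INSERT INTO COMPONENT VALUES ('WHEEL', 'BIKE', 2), ('WHEEL', 'TRIKE', 3),
                             ('SPOKE', 'WHEEL', 32);
""")


def used_in(code):
    """Parts that directly contain the given part."""
    return [p for (p,) in conn.execute(
        "SELECT PART_CODE FROM COMPONENT WHERE COMP_CODE = ? ORDER BY PART_CODE",
        (code,))]


def made_of(code):
    """Parts directly contained in the given part."""
    return [c for (c,) in conn.execute(
        "SELECT COMP_CODE FROM COMPONENT WHERE PART_CODE = ? ORDER BY COMP_CODE",
        (code,))]


print(used_in("WHEEL"))  # ['BIKE', 'TRIKE']
print(made_of("WHEEL"))  # ['SPOKE']
```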
A primary key, if combined with a foreign key, creates
(A) Parent-Child relationship between the tables that connect them.
(B) Many to many relationship between the tables that connect them.
(C) Network model between the tables that connect them.
(D) None of the above
Mapping Unary (1:M) Relationships
(A) foreign key in the same relation
(B) foreign key in an associative relation
(C) foreign key in both (the same and an associative relation)
(D) no foreign key is required
Mapping Unary (N:M) Relationships
(A) foreign key in the same relation
(B) foreign key in the associative relation
(C) foreign key in both (the same and an associative relation)
(D) None of the above.
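Symmetric (1:1) recursive relationship: EMPLOYEE Married to EMPLOYEE
Suppose you tried the following:
Mapping Unary (1:1) Relationships - Recursive
Foreign key in the same relation
Redundancy problem! Each marriage is recorded twice, once in each partner's row, so the two copies can become inconsistent.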
Symmetry is the Problem
A non-symmetric 1:1 relationship would not have the problem shown on the previous slide.
A symmetric M:N relationship would have a redundancy problem, whether implemented as in the 1:1 case or by a bridging table.
E.g.: being-sibling-of.
Symmetric (1:1) recursive relationship: redundant & non-redundant implementations
1) As previously: redundant.
2) MARRIED_V1 is just a bridging entity type: still redundant.
3) MARRIAGE together with MARPART act as a sort of bridge: non-redundant.
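A sketch of implementation 3 using Python's sqlite3 module. MARRIAGE and MARPART come from the slide; their columns (MAR_NUM, MAR_DATE, EMP_NUM) and the sample rows are assumptions. Each marriage is stored once, and UNIQUE (EMP_NUM) keeps the relationship 1:1. These constraints alone would not stop a third partner being added to the same MARRIAGE; that would need an extra check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (EMP_NUM INTEGER PRIMARY KEY, EMP_LNAME TEXT NOT NULL);
-- One row per marriage: the symmetric fact is stored exactly once.
CREATE TABLE MARRIAGE (MAR_NUM INTEGER PRIMARY KEY, MAR_DATE TEXT);
-- One row per partner; UNIQUE (EMP_NUM) keeps each employee in at most one marriage.
CREATE TABLE MARPART (
    MAR_NUM INTEGER NOT NULL REFERENCES MARRIAGE (MAR_NUM),
    EMP_NUM INTEGER NOT NULL UNIQUE REFERENCES EMPLOYEE (EMP_NUM),
    PRIMARY KEY (MAR_NUM, EMP_NUM)
);
INSERT INTO EMPLOYEE VALUES (1, 'Jones'), (2, 'Smith'), (3, 'Brown');
INSERT INTO MARRIAGE VALUES (500, '2010-06-12');
INSERT INTO MARPART  VALUES (500, 1), (500, 2);
""")


def spouse_of(emp_num):
    """The other partner in the employee's marriage, or None if unmarried."""
    row = conn.execute("""
        SELECT P2.EMP_NUM
        FROM MARPART P1 JOIN MARPART P2
          ON P1.MAR_NUM = P2.MAR_NUM AND P1.EMP_NUM <> P2.EMP_NUM
        WHERE P1.EMP_NUM = ?""", (emp_num,)).fetchone()
    return row[0] if row else None


# The same stored row answers the question from either side.
print(spouse_of(1), spouse_of(2), spouse_of(3))  # 2 1 None
```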
Symmetric M:N, etc.
Method 3 on the previous slide can straightforwardly be generalized to symmetric recursive M:N relationships.
Redundant Relationships
Redundant relationships occur when there are multiple relationship paths between related entities
The main concern is that redundant relationships remain consistent across the model
Summary: Creating ERMs/ERDs

Designing an ER model for a database is an iterative process, because, e.g.:
As you proceed, you think of new ways of conceiving what's going on (much as in ordinary programming)
Multivalued attributes need to be re-represented eventually
M:N relationships can be included as such at an early stage, but usually need to be replaced by bridging entity types later
The implementation of 1:1 relationships varies depending on the relationship participation
Symmetric recursive relationships (1:1 or N:M) usually need special handling.
Weak entities usually need special handling.
Databases
2014/15
Week 6 (Wednesday)
Logical Data Model (Mapping E/R design to relational schema) Part 1
Shereen Fouad
Teaching Fellow
Announcements
This week there will be no lab session.
Assignment 5 will be conceptual (designing an ERD) and it won't
involve any practical work.
Assignment 5 (unassessed) has been released on canvas.
In week 8 you will have an online test on canvas.
It will become available on canvas on Friday 21/11/2014 and accounts for 10% of the module mark (the exact time will be announced on canvas soon).
Once you start, you have 60 minutes to complete it.

How relationships between entities (in both directions) are defined and graphically presented in ERDs
Relationship Connectivity in ERD
Associative (Composite) Entities in ERD
Example of ERD that represents a business situation



Overview
Mapping E/R design to relational schema
Mapping entity sets
Mapping relationship sets
1:M Relationships
1:1 Relationships
N:M Relationships
Strong versus weak Relationships
Strong versus weak Entities
Logical Design
Logical design translates the conceptual design (ER model) into the internal model (relational schema) for a selected DBMS.
E/R Diagram | Relational model | SQL
Entity | Relation | Table
Instance | Tuple | Row
Attribute | Attribute | Column or Field
Relationship (1:M, 1:1, M:N) | Foreign Key | Foreign Key
Identifying Attribute | Primary Key | Primary Key
Mapping entity sets

An entity set translates directly to a table
Attributes → columns
Key attributes → key columns

Weak Entities: each becomes a separate relation with a foreign key taken from the strong entity
Primary key composed of:
Partial identifier of the weak entity
Primary key of the identifying (strong) entity
Primary keys are underlined
Foreign Keys are circled in red
Multi-valued Attribute: becomes a separate relation with a foreign key taken from the superior entity
Foreign Keys are circled in red
EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, ..)
DEPENDANT (EMP_NUM, DEP_NUM, DEP_FNAME, DEP_DOB)
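The EMPLOYEE/DEPENDANT mapping can be sketched with Python's sqlite3 module (a stand-in for the course DBMS; the sample rows are hypothetical). The composite primary key lets the partial identifier DEP_NUM repeat across employees while still rejecting a duplicate for the same employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_LNAME TEXT NOT NULL,
    EMP_FNAME TEXT NOT NULL
);
-- Weak entity: PK = owner's PK (also a FK) + partial identifier DEP_NUM.
CREATE TABLE DEPENDANT (
    EMP_NUM   INTEGER NOT NULL REFERENCES EMPLOYEE (EMP_NUM),
    DEP_NUM   INTEGER NOT NULL,
    DEP_FNAME TEXT    NOT NULL,
    DEP_DOB   TEXT,
    PRIMARY KEY (EMP_NUM, DEP_NUM)
);
INSERT INTO EMPLOYEE  VALUES (1, 'Smith', 'Ann'), (2, 'Jones', 'Bob');
-- DEP_NUM 1 is reused for different employees: it only identifies
-- a dependant together with EMP_NUM.
INSERT INTO DEPENDANT VALUES (1, 1, 'Mary', '2005-03-01'),
                             (1, 2, 'David', '2008-07-19'),
                             (2, 1, 'John', NULL);
""")


def rejected(sql):
    """True if SQLite refuses the statement because it breaks a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


print(rejected("INSERT INTO DEPENDANT VALUES (1, 1, 'Zoe', NULL)"))  # True: duplicate (1, 1)
print(rejected("INSERT INTO DEPENDANT VALUES (9, 1, 'Sam', NULL)"))  # True: no employee 9
```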
Mapping 1:M Relationships

The connectivity of the relationship set determines the key of the table
The primary key on the "one" side (the parent) becomes a foreign key on the "many" side (the child)
Example 1:M Relationship
ORGANIZATION employs EMPLOYEE
Each Organization employs many Employees.
More than one employee allowed per organization, but no more than one employer per person.
EMPLOYEE
Emp-ID | NAME | PHONE | ORG_ID | AGE
9568876 | Chopples | 0121-414-3816 | E22561 | 37
2544799 | Blurp | 01600-719975 | E85704 | 21
1698674 | Rumpel | 07970-852657 | E22561 | 88
1800748 | Dunston | 0121-414-3886 | E22561 | 29
ORGANIZATION
ORG_ID | EMPL NAME | ADDRESS | NUM EMPLS | SECTOR
E48693 | BT | BT House, London, ... | 1,234,5678 | Private TCOM
E85704 | Monmouth School | Hereford Rd, Monmouth, ... | 245 | Private 2E
E22561 | University of Birmingham | Edgbaston Park Rd, ... | 3023 | Public HE
Foreign Keys are denoted in red (here, ORG_ID in EMPLOYEE)
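The ORGANIZATION/EMPLOYEE example can be loaded into SQLite through Python's sqlite3 module as a quick check of the mapping. The rows come from the tables above (only some columns are kept); underscores replace the spaces and hyphens in column names, which is an assumption for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ORGANIZATION (            -- the "one" (parent) side
    ORG_ID    TEXT PRIMARY KEY,
    EMPL_NAME TEXT NOT NULL
);
CREATE TABLE EMPLOYEE (                -- the "many" (child) side
    EMP_ID TEXT PRIMARY KEY,
    NAME   TEXT NOT NULL,
    PHONE  TEXT,
    ORG_ID TEXT REFERENCES ORGANIZATION (ORG_ID),  -- parent's PK copied as a FK
    AGE    INTEGER
);
""")
conn.executemany("INSERT INTO ORGANIZATION VALUES (?, ?)", [
    ("E48693", "BT"),
    ("E85704", "Monmouth School"),
    ("E22561", "University of Birmingham"),
])
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)", [
    ("9568876", "Chopples", "0121-414-3816", "E22561", 37),
    ("2544799", "Blurp", "01600-719975", "E85704", 21),
    ("1698674", "Rumpel", "07970-852657", "E22561", 88),
    ("1800748", "Dunston", "0121-414-3886", "E22561", 29),
])

# One organization, many employees: count them through the foreign key.
headcount = dict(conn.execute("""
    SELECT O.EMPL_NAME, COUNT(E.EMP_ID)
    FROM ORGANIZATION O LEFT JOIN EMPLOYEE E ON E.ORG_ID = O.ORG_ID
    GROUP BY O.ORG_ID
""").fetchall())
print(headcount)
```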
Mapping 1:1 Relationships

Primary key on the mandatory side becomes a foreign key on the optional side.
If both sides of the relationship are optional, it doesn't matter which table receives the foreign key.
PEOPLE Has PHONES (1:1): that is, no more than one phone allowed per person, and vice versa.
PEOPLE
PERS-ID | NAME | EMPL ID | AGE
9568876 | Chopples | E22561 | 37
2544799 | Blurp | E85704 | 21
1698674 | Rumpel | E22561 | 88
5099235 | Biggles | E22561 | 29
PHONES
PHONE | TYPE | PERS-ID | STATUS
0121-414-3816 | office | 9568876 | OK
01600-719975 | home | 5099235 | FAULT
0121-440-5677 | home | 1698674 | OK
07970-852657 | mobile | 2544799 | UNPAID
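A sketch of the PEOPLE/PHONES mapping in SQLite via Python's sqlite3 module, using the rows above (PERS-ID is renamed PERS_ID and EMPL ID is renamed EMPL_ID). One common way to keep the foreign key side 1:1, assumed here, is to declare the foreign key column UNIQUE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE PEOPLE (
    PERS_ID TEXT PRIMARY KEY,
    NAME    TEXT NOT NULL,
    EMPL_ID TEXT,
    AGE     INTEGER
);
-- PERS_ID is the foreign key; UNIQUE stops one person having two phones,
-- and PHONE as primary key stops one phone having two owners: 1:1.
CREATE TABLE PHONES (
    PHONE   TEXT PRIMARY KEY,
    TYPE    TEXT,
    PERS_ID TEXT UNIQUE REFERENCES PEOPLE (PERS_ID),
    STATUS  TEXT
);
INSERT INTO PEOPLE VALUES ('9568876', 'Chopples', 'E22561', 37),
                          ('2544799', 'Blurp',    'E85704', 21),
                          ('1698674', 'Rumpel',   'E22561', 88),
                          ('5099235', 'Biggles',  'E22561', 29);
INSERT INTO PHONES VALUES ('0121-414-3816', 'office', '9568876', 'OK'),
                          ('01600-719975',  'home',   '5099235', 'FAULT'),
                          ('0121-440-5677', 'home',   '1698674', 'OK'),
                          ('07970-852657',  'mobile', '2544799', 'UNPAID');
""")


def rejected(sql):
    """True if SQLite refuses the statement because it breaks a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A second phone for Chopples would break the 1:1 rule.
print(rejected(
    "INSERT INTO PHONES VALUES ('0121-000-0000', 'mobile', '9568876', 'OK')"))  # True

# Each person's single phone, found through the foreign key.
print(conn.execute("""
    SELECT P.NAME, H.PHONE FROM PEOPLE P JOIN PHONES H ON H.PERS_ID = P.PERS_ID
    WHERE P.NAME = 'Biggles'
""").fetchone())  # ('Biggles', '01600-719975')
```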
Mapping M:N Relationships
Consider the relationship shown.
IF we represent M:N connectivity in a similar way to 1:M, then we can expect that
in the STUDENT table, some students will each have several classes listed,
or in the CLASS table, some classes will each have several students listed,
or both.
This is a problem. Why?
The Problem with M:N Relationship
Example M:N Relationship
Because of this problem, an M:N relationship is usually broken up into two 1:M relationships.
This means introducing an extra bridging (linking, or composite) entity type (hence table) to stand between the two original ones.
The composite entity ENROLL has a primary key composed of the primary keys of the two entities STUDENT and CLASS.
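A minimal sketch of the STUDENT/ENROLL/CLASS decomposition using Python's sqlite3 module; the sample students and classes, and the column names other than STU_NUM and CLASS_CODE, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE STUDENT (STU_NUM    TEXT    PRIMARY KEY, STU_LNAME  TEXT NOT NULL);
CREATE TABLE CLASS   (CLASS_CODE INTEGER PRIMARY KEY, CLASS_NAME TEXT NOT NULL);
-- Composite (bridging) entity: each row pairs one student with one class,
-- so STUDENT 1:M ENROLL and CLASS 1:M ENROLL replace the M:N relationship.
-- NOT NULL is spelled out because SQLite, unlike standard SQL, would
-- otherwise accept NULLs in these composite primary key columns.
CREATE TABLE ENROLL (
    STU_NUM      TEXT    NOT NULL REFERENCES STUDENT (STU_NUM),
    CLASS_CODE   INTEGER NOT NULL REFERENCES CLASS (CLASS_CODE),
    ENROLL_GRADE TEXT,
    PRIMARY KEY (STU_NUM, CLASS_CODE)
);
INSERT INTO STUDENT VALUES ('S1', 'Smith'), ('S2', 'Jones');
INSERT INTO CLASS   VALUES (10014, 'Databases'), (10018, 'Networks');
INSERT INTO ENROLL  VALUES ('S1', 10014, 'B'), ('S1', 10018, 'A'), ('S2', 10014, NULL);
""")


def classes_of(stu_num):
    """All classes one student is enrolled in (many classes per student)."""
    return [c for (c,) in conn.execute(
        "SELECT CLASS_CODE FROM ENROLL WHERE STU_NUM = ? ORDER BY CLASS_CODE",
        (stu_num,))]


def students_in(class_code):
    """All students in one class (many students per class)."""
    return [s for (s,) in conn.execute(
        "SELECT STU_NUM FROM ENROLL WHERE CLASS_CODE = ? ORDER BY STU_NUM",
        (class_code,))]


print(classes_of("S1"))    # [10014, 10018]
print(students_in(10014))  # ['S1', 'S2']
```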
Example M:N Relationship
An entity set that does not have sufficient attributes to form a primary key is a
(A) strong entity set.
(B) weak entity set.
(C) simple entity set.
(D) primary entity set.
A logical schema
(A) is the entire database.
(B) is a standard way of organizing information into accessible parts.
(C) describes how data is actually stored on disk.
(D) both (A) and (C)
Which symbol does the E-R model use to represent a weak entity set?
(A) Dotted rectangle.
(B) Diamond
(C) Doubly outlined rectangle
(D) None of these
The conceptual model is
(A) dependent on hardware.
(B) dependent on software.
(C) dependent on both hardware and software.
(D) independent of both hardware and software.
Strong Relationships
Strong (identifying) relationships
Exist when the PK of the related entity contains a PK component of the parent entity
A relationship from entity type A to entity type B, mediated by having A's primary key (PK) as a foreign key in B, is strong when B's PK contains A's PK.
This includes the case of B's PK being just the same as A's PK.
E.g., A = Customers, B = Dependants, where
A's PK is: CUST_ID
B's PK is: CUST_ID, FIRST_NAME, CONNECTION.
So a PK value in B could be (1698674, Mary, child), meaning that this entity is the child called Mary of person 1698674 in the Customer table.
Dependants is a weak entity type, because there is a strong relationship to it from Customers, and Dependants is existence-dependent on Customers via this relationship.
Strong Relationship
CUSTOMERS (the A type)
CUST-ID | NAME | PHONE | EMPL ID | AGE
9568876 | Chopples | 0121-414-3816 | E22561 | 37
2544799 | Blurp | 01600-719975 | E85704 | 21
1698674 | Rumpel | 07970-852657 | E22561 | 88
1800748 | Dunston | 0121-414-3886 | E22561 | 29
Strong relationship going from A to B (we could say: B is strongly dependent on A)
DEPENDANTS (the B type)
CUST-ID | FIRST NAME | CONNECTION | LIVES_WITH
2544799 | John | civil partner | TRUE
1698674 | Mary | child | FALSE
1698674 | Mary | spouse | FALSE
1698674 | David | child | TRUE
Weak (or Non-Identifying) Relationships
Exist if the PK of the related entity does not contain a PK component of the parent entity
A relationship is weak when it isn't strong!
So, most relationships are weak.
Note that strength/weakness is directional: the Customers to Dependants relationship (above) is strong, but the Dependants to Customers relationship is weak.
Strong Entity Types
A strong entity type is one that is not weak!
So, in particular, any entity type that receives only weak relationships from other entity types is strong.
So the usual case is for an entity type to be strong.
And any entity type that is not existence-dependent on anything is strong.
Databases
2014/15
Week 7 (Wednesday)
SQL Data Definition
Shereen Fouad
Teaching Fellow

About the extended entity relationship (EER) model's main constructs
Supertype and subtype relationships
Why and When to Consider Supertypes and Subtypes?
Relationships and Subtypes
Generalization and specialization
Completeness Constraint
Disjoint and Overlapping Constraints
Mapping Supertype/Subtype Relationships to Relational Data Model
Overview
How to use SQL for data administration to create databases and tables.
SQL Data types
SQL Constraints
NOT NULL constraint
UNIQUE constraint
DEFAULT constraint
CHECK constraint
Primary Key
Foreign Key
DROP TABLE
ALTER TABLE
INSERT, UPDATE, and DELETE

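SQL functions fit into two broad categories:
Data definition language (DDL): used to describe/create the database schema
Data manipulation language (DML): used to insert, update, delete and retrieve data
The basic command set has a vocabulary of fewer than 100 words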
Creating the Database

It involves the following:
Create the Database
CREATE DATABASE dbname;
Create DB Schema (Group of database objects that are related to each other)
Creating a Table
CREATE TABLE <name> (
<col-def-1>,
<col-def-2>,
:
<col-def-n>,
<constraint-1>,
:
<constraint-k>);
You supply:
A name for the table
A list of column definitions (each with its name and data type, plus optional [NOT] NULL and DEFAULT values), e.g.
column_name1 data_type(size),
A list of constraints (primary keys, unique columns, foreign keys)
For Better Table Structures

Use one line per column (attribute) definition
Use spaces to line up attribute characteristics and constraints
Table and attribute names are capitalized
NOT NULL specification
UNIQUE specification
Primary key attributes contain both a NOT NULL and a UNIQUE
specification
RDBMS will automatically enforce referential integrity for foreign keys
Command sequence ends with semicolon
Data Types
Data type selection is usually dictated by the nature of the data and by its intended use
Supported data types:
Number(L,D), Integer, Smallint, Decimal(L,D)

Char(L), Varchar(L), Varchar2(L)
Date, Time, Timestamp
Real, Double, Float
Interval day to hour
Many other types
Some of the supported data types in PostgreSQL (table): numeric, alphanumeric, and date/time data types
SQL Constraints
Each constraint is given a name. Access requires a name, but some other DBMSs don't.
Constraints which refer to single columns can be included in their definition

NOT NULL constraint
Ensures that column does not accept nulls
UNIQUE constraint
Ensures that all values in column are unique
DEFAULT constraint
Assigns value to attribute when a new row is added to table
CHECK constraint
Validates data when attribute value is entered
Primary Keys
Primary Keys are defined through constraints
A PRIMARY KEY constraint also includes a UNIQUE constraint and makes the
columns involved NOT NULL
The <details> for a primary key are a list of the columns that make up the key
CONSTRAINT <name>
PRIMARY KEY
(col1, col2, ...)
Example
CREATE TABLE distributors (
did integer,
name varchar(40),
PRIMARY KEY(did) );
or, equivalently, with a column constraint:
CREATE TABLE distributors (
did integer PRIMARY KEY,
name varchar(40) );
Unique Constraints
As well as a single primary key, any set of columns can be specified as UNIQUE
This has the effect of making candidate keys in the table

The <details> for a unique constraint are a list of the columns that make up the candidate key
CONSTRAINT <name>
UNIQUE
(col1, col2, ...)
Example
CREATE TABLE films (
code char(5) CONSTRAINT firstkey PRIMARY KEY,
title varchar(40) NOT NULL,
did integer NOT NULL,
date_prod date,
kind varchar(10),
CONSTRAINT production UNIQUE(date_prod));
CREATE TABLE distributors (
did integer PRIMARY KEY DEFAULT nextval('serial'),
name varchar(40) NOT NULL CHECK (name <> '') );
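The constraint kinds listed earlier can be exercised in SQLite through Python's sqlite3 module. This is a sketch with some differences from PostgreSQL: SQLite has no nextval('serial'), but an INTEGER PRIMARY KEY given no value receives an automatically generated one, which plays a similar role; SQLite also does not enforce VARCHAR lengths. The region and email columns are hypothetical additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE distributors (
    did    INTEGER PRIMARY KEY,                      -- no value given: SQLite generates one
    name   VARCHAR(40) NOT NULL CHECK (name <> ''),  -- NOT NULL and CHECK
    region VARCHAR(20) DEFAULT 'UK',                 -- hypothetical column: DEFAULT
    email  VARCHAR(80) UNIQUE                        -- hypothetical column: UNIQUE
);
""")


def accepted(sql):
    """True if the row is stored; False if a constraint rejects it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return False
    return True


print(accepted("INSERT INTO distributors (did, name, email) VALUES (1, 'Acme', 'a@acme.test')"))  # True
print(conn.execute("SELECT region FROM distributors WHERE did = 1").fetchone()[0])  # UK
print(accepted("INSERT INTO distributors (did, name) VALUES (2, '')"))    # False: CHECK
print(accepted("INSERT INTO distributors (did, name) VALUES (3, NULL)"))  # False: NOT NULL
print(accepted("INSERT INTO distributors (did, name, email) VALUES (4, 'Beta', 'a@acme.test')"))  # False: UNIQUE
print(accepted("INSERT INTO distributors (did, name) VALUES (1, 'Gamma')"))  # False: PRIMARY KEY

accepted("INSERT INTO distributors (name) VALUES ('Delta')")
print(conn.execute("SELECT did FROM distributors WHERE name = 'Delta'").fetchone()[0])  # 2
```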
Foreign Keys
Foreign Keys are also defined as constraints
You need to give:
The columns which make up the FK
The referenced table
The columns which are referenced by the FK
CONSTRAINT <name>
FOREIGN KEY
(col1, col2, ...)
REFERENCES
<table>
[(ref1, ref2, ...)]
[ON DELETE action] [ON UPDATE action] (table constraint)
Example
CREATE TABLE cities
( city varchar(80) primary key, location point );
CREATE TABLE weather

( city varchar(80) references cities(city),
temp_lo int, temp_hi int, prcp real, date date );
Example
CREATE TABLE Enrolment (
STU_NUM char(10),
CLASS_CODE integer,
ENROLL_GRADE char(6) NOT NULL,
PRIMARY KEY (STU_NUM, CLASS_CODE),
FOREIGN KEY (STU_NUM) REFERENCES STUDENT (STU_NUM),
FOREIGN KEY (CLASS_CODE) REFERENCES CLASS (CLASS_CODE) );
ON DELETE/ ON UPDATE
When the data in the referenced columns is changed, certain actions
are performed on the data in this table's columns.
The ON DELETE clause specifies the action to perform when a
referenced row in the referenced table is being deleted.
Likewise, the ON UPDATE clause specifies the action to perform when
a referenced column in the referenced table is being updated to a
new value.
Possible actions for each clause

NO ACTION
Produce an error indicating that the deletion or update would create a
foreign key constraint violation.
CASCADE
Delete any rows referencing the deleted row, or update the value of the
referencing column to the new value of the referenced column,
respectively.
SET NULL
Set the referencing column(s) to null.
SET DEFAULT
Set the referencing column(s) to their default values.
Example
CREATE TABLE Dept_Mgr (
did INTEGER,
dname CHAR(20),
budget REAL,
ssn CHAR(11) NOT NULL,
since DATE,
PRIMARY KEY (did),
FOREIGN KEY (ssn) REFERENCES Employees
ON DELETE NO ACTION );
Example of Weak Entity Sets

When the owner entity is deleted, all owned weak entities must also be deleted.
CREATE TABLE Dep_Policy (
pname CHAR(20),
age INTEGER,
cost REAL,
ssn CHAR(11) NOT NULL,
PRIMARY KEY (pname, ssn),
FOREIGN KEY (ssn) REFERENCES Employees
ON DELETE CASCADE );
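The ON DELETE actions can be observed in SQLite via Python's sqlite3 module, which supports NO ACTION and CASCADE but only enforces foreign keys after PRAGMA foreign_keys = ON. The Employees columns and all sample rows are hypothetical; REFERENCES Employees without a column list refers to the primary key of Employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # without this, SQLite ignores foreign keys
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(30));
CREATE TABLE Dept_Mgr (
    did INTEGER PRIMARY KEY, dname CHAR(20), budget REAL,
    ssn CHAR(11) NOT NULL, since DATE,
    FOREIGN KEY (ssn) REFERENCES Employees ON DELETE NO ACTION
);
CREATE TABLE Dep_Policy (
    pname CHAR(20), age INTEGER, cost REAL,
    ssn CHAR(11) NOT NULL,
    PRIMARY KEY (pname, ssn),
    FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE
);
INSERT INTO Employees  VALUES ('111-11-1111', 'Smith'), ('222-22-2222', 'Jones');
INSERT INTO Dept_Mgr   VALUES (10, 'Sales', 5000.0, '111-11-1111', '2014-01-01');
INSERT INTO Dep_Policy VALUES ('Ann', 8, 120.0, '222-22-2222'),
                              ('Tom', 5, 90.0, '222-22-2222');
""")

# CASCADE: deleting Jones also deletes the policies owned by Jones.
conn.execute("DELETE FROM Employees WHERE ssn = '222-22-2222'")
policies_left = conn.execute("SELECT COUNT(*) FROM Dep_Policy").fetchone()[0]
print(policies_left)  # 0

# NO ACTION: deleting Smith, who still manages a department, is refused.
try:
    conn.execute("DELETE FROM Employees WHERE ssn = '111-11-1111'")
    manager_deleted = True
except sqlite3.IntegrityError:
    manager_deleted = False
print(manager_deleted)  # False
```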
CREATE TABLE AS
Define a new table from the results of a query
CREATE TABLE films_recent AS
SELECT * FROM films WHERE date_prod >= '2002-01-01';
CREATE TABLE films2 AS
TABLE films;
Deleting Tables
To delete a table use
DROP TABLE
[IF EXISTS]
<name>
Example:
DROP TABLE Module

BE CAREFUL with any SQL statement with DROP in it
You will delete any information in the table as well
You won't normally be asked to confirm
There is no easy way to undo the changes
Changing Tables
Sometimes you want to change the structure of an existing table
One way is to DROP it then rebuild it
This is dangerous, so there is the ALTER TABLE command instead
ALTER TABLE can
Add a new column

Remove an existing column
Add a new constraint
Remove an existing constraint
ALTERing Columns
To add or remove columns use
ALTER TABLE <table>
ADD COLUMN <col>
ALTER TABLE <table>
DROP COLUMN <name>
Examples
ALTER TABLE Student
ADD COLUMN
Degree VARCHAR(50)
ALTER TABLE Student
DROP COLUMN Degree
ALTERing Constraints
To add or remove constraints use
ALTER TABLE <table>
ADD CONSTRAINT
<definition>
ALTER TABLE <table>
DROP CONSTRAINT
<name>
Examples
ALTER TABLE Module
ADD CONSTRAINT
ck UNIQUE (title)
ALTER TABLE Module
DROP CONSTRAINT ck
The basic data type char(n) is a _____ length character string and varchar(n) is _____
length character.
A) Fixed, equal
B) Equal, variable
C) Fixed, variable
D) Variable, equal
Updates that violate __________ are disallowed .
A) Integrity constraints
B) Transaction control
C) Authorization
D) DDL constraints
Which of the following SQL commands can be used to modify the basic storage characteristics of a database table?
A) MODIFY
B) UPDATE
C) CHANGE
D) ALTER
INSERT, UPDATE, DELETE
INSERT - add a row to a table
UPDATE - change row(s) in a table
DELETE - remove row(s) from a table
UPDATE and DELETE use WHERE clauses to specify which rows to change or remove
BE CAREFUL with these - an incorrect WHERE clause can destroy lots of data
INSERT
INSERT INTO
<table>
(col1, col2, ...)
VALUES
(val1, val2, ...)
The number of columns and values must be the same
If you are adding a value to every column, you don't have to list them
SQL doesn't require that all rows are different (unless a constraint says so)
INSERT
Student (before)
ID | Name | Year
1 | John | 1
INSERT INTO Student
(ID, Name, Year)
VALUES (2, 'Mary', 3)
Student (after)
ID | Name | Year
1 | John | 1
2 | Mary | 3
INSERT INTO Student
(Name, ID)
VALUES ('Mary', 2)
Student (after)
ID | Name | Year
1 | John | 1
2 | Mary | (null)
INSERT INTO Student
VALUES (2, 'Mary', 3)
Student (after)
ID | Name | Year
1 | John | 1
2 | Mary | 3
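The three INSERT forms can be run in SQLite through Python's sqlite3 module. The extra students (Mark, Anne, Jane) are hypothetical, and the ? placeholders are the sqlite3 module's parameter syntax rather than part of the SQL statements above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (ID INTEGER, Name TEXT, Year INTEGER)")  # no key constraint
conn.execute("INSERT INTO Student VALUES (1, 'John', 1)")

# Columns listed: values are matched to them position by position.
conn.execute("INSERT INTO Student (ID, Name, Year) VALUES (2, 'Mary', 3)")
# Only some columns, in any order: the missing Year becomes NULL.
conn.execute("INSERT INTO Student (Name, ID) VALUES ('Mark', 3)")
# No column list: one value per column, in the table's column order.
conn.execute("INSERT INTO Student VALUES (4, 'Anne', 2)")
# From a program, pass values as parameters instead of pasting them into the SQL text.
conn.execute("INSERT INTO Student VALUES (?, ?, ?)", (5, "Jane", 1))
# No constraint forbids it, so an exact duplicate row is accepted.
conn.execute("INSERT INTO Student VALUES (2, 'Mary', 3)")

rows = conn.execute("SELECT ID, Name, Year FROM Student ORDER BY ID").fetchall()
print(rows)
# [(1, 'John', 1), (2, 'Mary', 3), (2, 'Mary', 3), (3, 'Mark', None), (4, 'Anne', 2), (5, 'Jane', 1)]
```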
UPDATE
UPDATE <table>
SET col1 = val1
[,col2 = val2]
[WHERE
<condition>]
All rows where the condition is true have the columns set to the given values
If no condition is given all rows are
changed so BE CAREFUL
Values are constants or can be computed
from columns
UPDATE
Student (before)
ID | Name | Year
1 | John | 1
2 | Mark | 3
3 | Anne | 2
4 | Mary | 2
UPDATE Student
SET Year = 1,
Name = 'Jane'
WHERE ID = 4
Student (after)
ID | Name | Year
1 | John | 1
2 | Mark | 3
3 | Anne | 2
4 | Jane | 1
UPDATE Student
SET Year = Year + 1
Student (after, applied to the original table)
ID | Name | Year
1 | John | 2
2 | Mark | 4
3 | Anne | 3
4 | Mary | 3
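Both UPDATE statements can be checked in SQLite via Python's sqlite3 module. The helper run_update (hypothetical) applies each statement to a fresh copy of the table, as the examples do, and reports how many rows changed.

```python
import sqlite3

STUDENTS = [(1, "John", 1), (2, "Mark", 3), (3, "Anne", 2), (4, "Mary", 2)]


def run_update(sql):
    """Apply one UPDATE to a fresh Student table; return (rows changed, final rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT, Year INTEGER)")
    conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", STUDENTS)
    changed = conn.execute(sql).rowcount
    rows = conn.execute("SELECT ID, Name, Year FROM Student ORDER BY ID").fetchall()
    return changed, rows


# WHERE limits the change to one row.
changed1, rows1 = run_update("UPDATE Student SET Year = 1, Name = 'Jane' WHERE ID = 4")
# No WHERE: every row changes, each new value computed from the old one.
changed2, rows2 = run_update("UPDATE Student SET Year = Year + 1")

print(changed1, rows1)  # 1 [(1, 'John', 1), (2, 'Mark', 3), (3, 'Anne', 2), (4, 'Jane', 1)]
print(changed2, rows2)  # 4 [(1, 'John', 2), (2, 'Mark', 4), (3, 'Anne', 3), (4, 'Mary', 3)]
```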
DELETE
Removes all rows which satisfy the condition
DELETE FROM
<table>
[WHERE
<condition>]
If no condition is given then ALL rows are deleted - BE CAREFUL
DELETE
Student (before)
ID | Name | Year
1 | John | 1
2 | Mark | 3
3 | Anne | 2
4 | Mary | 2
DELETE FROM Student
WHERE Year = 2
Student (after)
ID | Name | Year
1 | John | 1
2 | Mark | 3
DELETE FROM Student
Student (after)
ID | Name | Year
(no rows)
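The same check for DELETE: a hypothetical helper run_delete applies each statement to a fresh copy of the Student table using Python's sqlite3 module and returns the rows that remain.

```python
import sqlite3

STUDENTS = [(1, "John", 1), (2, "Mark", 3), (3, "Anne", 2), (4, "Mary", 2)]


def run_delete(sql):
    """Apply one DELETE to a fresh Student table; return the rows left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT, Year INTEGER)")
    conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", STUDENTS)
    conn.execute(sql)
    return conn.execute("SELECT ID, Name, Year FROM Student ORDER BY ID").fetchall()


print(run_delete("DELETE FROM Student WHERE Year = 2"))  # [(1, 'John', 1), (2, 'Mark', 3)]
print(run_delete("DELETE FROM Student"))                 # [] (no WHERE: every row goes)
```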
Databases
2014/15
Week 7 (Wednesday)
The Extended Entity Relationship (EER) Model
Shereen Fouad
Teaching Fellow
Announcements
The test will be online on canvas
You need to make sure that you are registered on the correct module (Fundamentals/ICY) on canvas.
The test will be Multiple Choice Questions (20 questions)
It will cover all of the concepts that we have discussed so far, including this week's lectures (week 7) but not next week's (week 8).
The ONLINE TEST accounts for 10% of the module mark
The test will be available on Friday 21 November from 3 pm and will close the same day at 10 pm
The last opportunity for you to take the test is from 9 pm to 10 pm
Once you start the test it should take you only 60 minutes to complete.
Announcements
The test is marked automatically and you are going to see your mark
right away after you finish the test
Correct answers will be released on Saturday on canvas.
Last years class test and answers are released on canvas.
If you are entitled to extra time you need to contact welfare ASAP
Today we are starting chapter 8 in the book:
C. Coronel, S. Morris, P. Rob & K. Crockett, Database Principles: Fundamentals of Design,
Implementation and Management, 10th Edition, 2013.

What are symmetric recursive relationships?
Implementation of the non-symmetric 1:M recursive relationship
Implementation of the non-symmetric N:M recursive relationship
Implementation of the symmetric 1:1 recursive relationship & non-redundant implementations
The problem of Symmetry



Overview
About the extended entity relationship (EER) models main constructs
Supertype and subtype relationships
Why and When to Consider Supertypes and Subtypes?
Generalization and specialization
Mapping Supertype/Subtype Relationships to Relational Data Model
Summary
The Extended Entity Relationship Model

It aims at adding more semantic constructs to the original entity relationship (ER) model
A diagram using this model is called an Extended Entity Relationship Diagram (EERD)
It is based on the idea of entity supertypes and entity subtypes
Entity Supertypes and Subtypes

Entity supertype
Generic entity type related to one or more entity subtypes
Contains common characteristics (attributes shared by all its subtypes)
Entity subtypes
Contains unique characteristics (special attributes) of each entity subtype
May participate in unique relationships
Primary key of a subtype is normally that of the supertype
Subtype exists only within context of supertype
Every subtype has only one supertype to which it is directly related
Can have many levels of supertype/subtype relationships
Why Consider Supertypes and Subtypes?
The grouping of Employees into various types provides two benefits:
It avoids unnecessary null values in attributes that are not shared
It enables a certain employee type to participate in relationships that are unique to that employee type.
When to Consider Supertypes and Subtypes?

If you have different kinds or types of the entity in the user's environment.
The different kinds or types of instance should each have one or more attributes that are unique to that particular type.
Source of Image: Modern Database Management 6th Edition, Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden

Relationships and Subtypes
Relationships at the supertype level indicate that all subtypes will participate in the relationship.
The instances of a subtype may participate in a relationship unique to that subtype.
In this situation, the relationship is shown at the subtype level.
Specialization and Generalization

Specialization
Identifies more specific entity subtypes from higher-level entity supertype
Top-down process
Based on grouping unique characteristics and relationships of the subtypes
Generalization
Identifies more generic/general entity supertype from lower-level entity
subtypes
Bottom-up process
Based on grouping common characteristics and relationships of the subtypes
Inheritance
Enables entity subtype to inherit attributes and relationships of
supertype
All entity subtypes inherit their primary key attribute from their
supertype
At implementation level, supertype and its subtype(s) maintain a 1:1
relationship
Completeness Constraint
Specifies whether an entity supertype occurrence must be a member of at least one subtype
Partial completeness
Symbolized by a single line
Some supertype occurrences are not members of any subtype
Total completeness
Symbolized by a double line
Every supertype occurrence must be member of at least one subtype
Examples of completeness constraints

Partial completeness
A vehicle could be a car, a truck,

or neither
Examples of completeness constraints

Total completeness
A patient must be either an

outpatient or a resident patient

Disjoint and Overlapping Constraints
Specify whether an instance of a supertype may simultaneously be a member of two (or more) subtypes.
Disjoint subtypes
An instance of the supertype can be only ONE of the subtypes
Symbolized by the letter d
Overlapping subtypes
An instance of the supertype could be more than one of the subtypes
Symbolized by the letter o
Example of Disjoint constraints
A patient can either be outpatient or resident,

but not both
Example of Overlap constraint
A part may be both

purchased and
manufactured
Example of supertype/subtype hierarchy
When an entity instance must be a member of only one subtype, it is which of the following?
A) Disjoint with total specialization
B) Disjoint with partial specialization
C) Overlap with total specialization
D) Overlap with partial specialization
When an entity instance may be a member of multiple subtypes or does not have to be a member of a subtype, it is which of the following?
A) Disjoint with total specialization
B) Disjoint with partial specialization
C) Overlap with total specialization
D) Overlap with partial specialization
Use of a supertype/subtype relationship is necessary when which of the following exists?
A) An instance of a subtype participates in a relationship that is unique to that subtype.
B) An instance of a subtype participates in a relationship that is the same as the other subtypes.
C) Attributes apply to all of the instances of an entity type.
D) No attributes apply to any of the instances of an entity type.
Mapping Supertype/Subtype Relationships
One relation for the supertype and one for each subtype
Supertype attributes (including the identifier) go into the supertype relation
Subtype attributes go into each subtype relation
The primary key of the supertype relation also becomes the primary key and a foreign key of each subtype relation
The table definitions themselves cannot enforce the completeness constraint or the disjointness constraint (disjoint/overlap)
These must be enforced through application programming
You may consider it as a 1:1 relationship established between the supertype and each subtype, with the supertype as the primary table
EMP_NUM is considered here as:
Primary key for the table PILOT
Foreign key referring to the EMPLOYEE table
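The mapping rule can be sketched in SQL and run through Python's built-in `sqlite3` module. This is a minimal sketch, not the course's reference schema. Only EMPLOYEE, PILOT and EMP_NUM come from the slide. The columns EMP_LNAME and PIL_LICENSE and all row values are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been executed.

```python
import sqlite3

# EMPLOYEE is the supertype relation; PILOT is one subtype relation.
# EMP_LNAME, PIL_LICENSE and the rows inserted below are hypothetical.
SCHEMA = """
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,  -- supertype identifier
    EMP_LNAME TEXT NOT NULL         -- attribute shared by every employee
);
CREATE TABLE PILOT (
    EMP_NUM     INTEGER PRIMARY KEY              -- PK of the subtype ...
                REFERENCES EMPLOYEE (EMP_NUM),   -- ... and FK to the supertype
    PIL_LICENSE TEXT NOT NULL                    -- pilot-only attribute
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK checks are off by default in SQLite
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [(100, "Smith"), (101, "Jones")])
conn.execute("INSERT INTO PILOT VALUES (100, 'ATP')")

# The FK rejects a PILOT row that has no matching EMPLOYEE row.
try:
    conn.execute("INSERT INTO PILOT VALUES (999, 'ATP')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Joining on EMP_NUM gives each pilot the attributes it inherits from EMPLOYEE.
pilots = conn.execute(
    "SELECT e.EMP_NUM, e.EMP_LNAME, p.PIL_LICENSE "
    "FROM EMPLOYEE e JOIN PILOT p ON p.EMP_NUM = e.EMP_NUM"
).fetchall()

# Nothing in these tables forces employee 101 into any subtype table, so the
# completeness constraint is left to the application.
no_subtype = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE WHERE EMP_NUM NOT IN (SELECT EMP_NUM FROM PILOT)"
).fetchone()[0]

print(orphan_rejected, pilots, no_subtype)  # True [(100, 'Smith', 'ATP')] 1
```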
What do you suggest here?
Summary
Extended entity relationship (EER) model adds semantics to the ER model via entity supertypes and subtypes
Entity supertype is a generic entity type related to one or more entity subtypes
Specialization hierarchy depicts the arrangement and relationships between entity supertypes and entity subtypes
Inheritance means an entity subtype inherits the attributes and relationships of its supertype
Disjoint subtypes: an instance of the supertype can be only ONE of the subtypes
Overlapping subtypes: an instance of the supertype could be more than one of the subtypes
Partial completeness: some supertype occurrences are not members of any subtype
Total completeness: every supertype occurrence must be a member of at least one subtype
Databases
2014/15
Week 8 (Friday)
Functional Dependencies and Normalization for Relational Databases (part 2)
Shereen Fouad
Teaching Fellow

Normalization is the process of evaluating and correcting table structures to minimize data redundancies and reduce data anomalies
An un-normalized relation (table) stores redundant data, which can cause:
Insertion anomalies
Deletion anomalies
Update anomalies
Normal Forms
Normalization can be divided into a series of stages called normal
forms, giving more and more protection:
1NF Relations
2NF Relations
3NF Relations
BCNF Relations
4NF Relations
First Normal Form

Disallows multivalued attributes and nested relations, i.e., attributes whose values for an individual tuple are non-atomic
Considered to be part of the definition of a relation
Normalization into 1NF
Composite Keys
Overview
Prime vs. Nonprime Attribute
If a relation schema has more than one key, each is called a candidate key.
One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
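The prime/nonprime distinction can be made concrete with a small brute-force sketch. It finds candidate keys from a list of functional dependencies using the attribute closure X+, which is every attribute functionally determined by X. It then splits the attributes into prime and nonprime. The relation STUDENT(StudentID, Email, Name, Tutor) and its two dependencies are hypothetical, chosen so that the relation has two candidate keys.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole determinant is already covered, add what it determines.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (exponential; fine for small examples)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(k <= cand for k in keys):  # contains a smaller key, so not minimal
                continue
            if closure(cand, fds) == schema:
                keys.append(cand)
    return keys

# Hypothetical relation STUDENT(StudentID, Email, Name, Tutor).
schema = {"StudentID", "Email", "Name", "Tutor"}
fds = [
    ({"StudentID"}, {"Email", "Name", "Tutor"}),
    ({"Email"}, {"StudentID", "Name", "Tutor"}),
]

keys = candidate_keys(schema, fds)
prime = set().union(*keys)
nonprime = schema - prime
print([sorted(k) for k in keys])        # [['Email'], ['StudentID']]
print(sorted(prime), sorted(nonprime))  # ['Email', 'StudentID'] ['Name', 'Tutor']
```

Either key could be designated the primary key, and the other then becomes a secondary key. Both attributes are prime either way.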
Second Normal Form (2NF)
An entity type is in second normal form (2NF) if:
It is in 1NF, and
Every nonprime attribute A in R is fully functionally dependent on the primary key
That is, it includes no partial dependencies (no attribute is dependent on only a portion of the primary key)
1NF but not in 2NF because of a partial dependency
Conversion to Second Normal Form
Step 1:
For each determinant D involved in a partial dependency in the original entity type T, also use D as the PK of a new entity type NT(D)
Step 2:
Move the attributes X determined by D out into NT(D).
D itself stays in T as well as being copied into NT(D).
At this point, most anomalies have been eliminated, and the table is in 2NF:
It is in 1NF
It includes no partial dependencies
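The two conversion steps can be sketched in Python. A table is treated as a set of attribute names, and each FD as a pair (determinant, dependents). This is a minimal sketch with two assumptions: the primary key is the table's only candidate key, and the FDs are listed explicitly. The example is modelled on the EMP_PROJ relation from Elmasri and Navathe. Its attribute names and FDs are assumptions made for illustration, not read from the slide's figure.

```python
def partial_dependencies(pk, fds):
    """FDs whose determinant is a proper subset of the composite primary key."""
    return [(lhs, rhs) for lhs, rhs in fds if lhs < pk]

def to_2nf(table, pk, fds):
    """Split every partial dependency D -> X out into a new table NT(D).

    Assumes pk is the table's only candidate key.
    Returns (attributes left in T, {D: attributes of NT(D)}).
    """
    remaining = set(table)
    new_tables = {}
    for lhs, rhs in partial_dependencies(pk, fds):
        moved = rhs - pk  # key attributes never leave T
        # Step 1 and Step 2: NT(D) has D as its key and receives the attributes D determines.
        new_tables.setdefault(frozenset(lhs), set(lhs)).update(moved)
        remaining -= moved  # D itself stays in T
    return remaining, new_tables

# Hypothetical EMP_PROJ-style table with composite key (Ssn, Pnumber).
table = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
pk = {"Ssn", "Pnumber"}
fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),        # full dependency
    ({"Ssn"}, {"Ename"}),                   # partial dependency
    ({"Pnumber"}, {"Pname", "Plocation"}),  # partial dependency
]

remaining, new_tables = to_2nf(table, pk, fds)
print(sorted(remaining))  # ['Hours', 'Pnumber', 'Ssn']
for det, attrs in new_tables.items():
    print(sorted(det), sorted(attrs))
# ['Ssn'] ['Ename', 'Ssn']
# ['Pnumber'] ['Plocation', 'Pname', 'Pnumber']
```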
Third Normal Form (3NF)
A table is in third normal form (3NF) if:
It is in 2NF
It contains no transitive dependencies (no nonprime attribute A in R is transitively dependent on the primary key)
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key.
When Y is a candidate key, there is no problem with the transitive dependency.
2NF but not in 3NF because of a transitive dependency:
A nonprime attribute is determined by another nonprime attribute
Conversion to Third Normal Form
Step 1: Identify Each New Determinant
For each determinant D involved in a transitive dependency in the original entity type T, also use D as the PK of a new entity type NT(D)
Step 2: Identify the Dependent Attributes
Move the attributes X transitively determined by D out into NT(D).
NB: the determinants themselves stay in T as well.
Name each table to reflect its contents and function
The result is in 3NF:
It is in 2NF
It contains no transitive dependencies
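The same approach extends to 3NF, as the minimal sketch below shows. It assumes the table is already in 2NF, that the primary key is its only candidate key, and that pk -> D holds for every determinant D considered. Under those assumptions, an FD D -> X whose determinant shares no attribute with the key is transitive. The EMP_DEPT-style attributes and FDs are hypothetical, modelled on the Elmasri and Navathe example.

```python
def transitive_dependencies(pk, fds):
    """FDs D -> X where D shares no attribute with the primary key.

    Assumes the table is in 2NF, pk is its only candidate key and pk -> D holds,
    so X depends on pk only via D.
    """
    return [(lhs, rhs) for lhs, rhs in fds if not (lhs & pk)]

def to_3nf(table, pk, fds):
    """Move each transitively dependent group into NT(D); D stays in T."""
    remaining = set(table)
    new_tables = {}
    for lhs, rhs in transitive_dependencies(pk, fds):
        moved = rhs - lhs
        new_tables.setdefault(frozenset(lhs), set(lhs)).update(moved)  # NT(D) keyed by D
        remaining -= moved  # the determinant D itself stays in T
    return remaining, new_tables

# Hypothetical EMP_DEPT-style table with primary key Ssn.
table = {"Ssn", "Ename", "Bdate", "Address", "Dnumber", "Dname", "Dmgr_ssn"}
pk = {"Ssn"}
fds = [
    ({"Ssn"}, {"Ename", "Bdate", "Address", "Dnumber"}),
    ({"Dnumber"}, {"Dname", "Dmgr_ssn"}),  # Ssn -> Dnumber -> Dname, Dmgr_ssn
]

remaining, new_tables = to_3nf(table, pk, fds)
print(sorted(remaining))                             # ['Address', 'Bdate', 'Dnumber', 'Ename', 'Ssn']
print(sorted(new_tables[frozenset({"Dnumber"})]))    # ['Dmgr_ssn', 'Dname', 'Dnumber']
```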
Figure 10.11: Normalizing the following relation to 3NF
Figure 10.11: Normalization into 2NF
Figure 10.11: Normalization into 2NF and 3NF
The Boyce-Codd Normal Form (BCNF)
Every determinant in the table is a candidate key
A candidate key has the same characteristics as a primary key but, for some reason, was not chosen to be the primary key
When a table contains only one candidate key, 3NF and BCNF are equivalent
BCNF can be violated only when the table contains more than one candidate key
The Boyce-Codd Normal Form (BCNF) (continued)
Most designers consider BCNF a special case of 3NF
A table is in 3NF when it is in 2NF and there are no transitive dependencies
A table can be in 3NF and still fail to meet BCNF:
It has no partial dependencies and no transitive dependencies
But a nonkey attribute is the determinant of a key attribute
[Figure: dependency diagrams for the decomposition of a table with attributes A, B, C and D]
This change is appropriate because the dependency C -> B means that C is effectively a superset of B.
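A BCNF test can be automated with attribute closure. An attribute set X breaks BCNF when, within a table, it determines some other attribute of that table but not all of it. In that case X is a determinant but not a superkey. The FDs below are assumptions chosen to match the text: a table (A, B, C, D) with key (A, B), the FD A, B -> C, D, and the FD C -> B, in which a nonkey attribute determines a key attribute. The decomposition into (A, C, D) and (B, C) is also assumed, because the original figure is not recoverable.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(table, fds):
    """Attribute sets X inside table that determine other attributes of the
    table without determining all of them (X is a determinant, not a superkey)."""
    bad = []
    attrs = sorted(table)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            x = set(combo)
            determined = closure(x, fds) & table  # only attributes of this table count
            if determined != x and determined != table:
                bad.append((x, determined - x))
    return bad

# Assumed FDs: (A, B) is the primary key, and C -> B.
fds = [({"A", "B"}, {"C", "D"}), ({"C"}, {"B"})]

original = bcnf_violations({"A", "B", "C", "D"}, fds)
print([(sorted(x), sorted(y)) for x, y in original])  # [(['C'], ['B']), (['C', 'D'], ['B'])]
print(bcnf_violations({"A", "C", "D"}, fds), bcnf_violations({"B", "C"}, fds))  # [] []
```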
[Figure: BCNF decomposition example over Student ID, Staff ID, Class Code and Enroll_Grade, decomposed into (Student ID, Class Code, Enroll_Grade) and (Class Code, Staff ID)]
Databases
2014/15
Week 8 (Wednesday)
(part 1)
Shereen Fouad
Teaching Fellow

How to use SQL for data administration to create databases and tables.
SQL Data types
SQL Constraints
NOT NULL constraint
UNIQUE constraint
DEFAULT constraint
CHECK constraint
Primary Key
Foreign Key
DROP TABLE
ALTER TABLE
INSERT, UPDATE, and DELETE
Overview
Motivation
What normalization is and what role it plays in the database design process
How to identify possible insertion, deletion, and update anomalies in a relation
How normalization and ER modeling are used concurrently to produce a good database design
The normal forms 1NF, 2NF, 3NF, BCNF, and 4NF
How to identify functional dependencies, determinants, and dependent attributes
First Normal Form

[Figure: database design stages]
Data requirements and results → Conceptual design → Conceptual schema → Logical design → Logical schema → Physical design → Physical schema
Partially recovered annotations:
… requirements and results needed from the database. It is a high-level description (often done with Entity …
… Relational database: map the ERD into the relational data model. Closer to the actual implementation. DBMS specific.
… implementation (programs, tables, dictionaries and catalogs)
Motivation
Suppose you are asked to design a database from existing spreadsheet data, as given in the table below.
SKU stands for Stock Keeping Unit.
What is the best table design?
Should this data be stored as two separate tables?
Or should the tables be joined together and the database designed with just one table?
What are the criteria for "good" base relations?
How can we convert a bad relation into a better-designed relation?
The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations is known as normalization.
Database Tables and Normalization

Normalization is the process of evaluating and correcting table structures to minimize data redundancies and reduce data anomalies
It is often used within ER modeling, to help produce a good database design.
It can be considered as an alternative approach to database modeling.
It evaluates entity types and, when appropriate, creates new entity types and adjusts attributes in existing ones
Normalization generally increases the number of tables and makes many queries more elaborate.
The Need for Normalization

An un-normalized relation (table) stores redundant data, which can cause:
Insertion anomalies
Deletion anomalies
Update anomalies
Deletion Anomaly
Suppose we delete the data for repair number 2100.
When we delete this row (the second one), we remove not only data about the repair, but also data about the machine itself.
We will no longer know, for example, that the machine was a Lathe and that its AcquisitionCost was 4750.00.
When we delete one row, the structure of this table forces us to lose facts about two different things, a machine and a repair.
Insertion Anomaly
Now suppose we want to enter the first repair for a piece of equipment.
To enter repair data, we need to know not just RepairNumber, RepairDate, and RepairCost, but also ItemNumber, EquipmentType, and AcquisitionCost.
If we work in the repair department, this is a problem, because we are unlikely to know the value of AcquisitionCost.
The structure of this table forces us to enter facts about two entities when we just want to enter facts about one.
Update Anomaly
Suppose we update the last row of the following table using the data (100, Drill Press, 5500, 2500, 08/17/09, 275).
The drill press now has two different AcquisitionCosts (data inconsistency).
Equipment cannot be acquired at two different costs. If there were, say, 10,000 rows in the table, however, it might be very difficult to detect this error.
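An update anomaly shows up as a broken functional dependency, because ItemNumber should determine AcquisitionCost. The sketch below checks that FD on a small table. Only the update tuple (100, Drill Press, 5500, 2500, 08/17/09, 275) comes from the slide. The other rows are hypothetical, and the column order is assumed from the attribute names used on these slides.

```python
COLS = ("ItemNumber", "EquipmentType", "AcquisitionCost",
        "RepairNumber", "RepairDate", "RepairCost")

def violates_fd(rows, lhs, rhs):
    """True if two rows agree on the lhs columns but differ on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return True
    return False

# Hypothetical rows: drill press 100 appears twice with the same cost.
rows = [dict(zip(COLS, r)) for r in [
    (100, "Drill Press", 3500, 2000, "05/05/09", 375),
    (200, "Lathe", 4750, 2100, "05/07/09", 255),
    (100, "Drill Press", 3500, 2200, "06/19/09", 178),
]]
before = violates_fd(rows, ["ItemNumber"], ["AcquisitionCost"])

# Update the last row with the data from the slide.
rows[-1] = dict(zip(COLS, (100, "Drill Press", 5500, 2500, "08/17/09", 275)))
after = violates_fd(rows, ["ItemNumber"], ["AcquisitionCost"])

print(before, after)  # False True
```

In a decomposed design, AcquisitionCost would be stored once per item in an equipment table, so this inconsistency could not arise.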
Normal Forms
Normalization can be divided into a series of stages called normal
forms, giving more and more protection:
1NF Relations
2NF Relations
3NF Relations
BCNF Relations
4NF Relations
The Normalization Process
2NF is better than 1NF; 3NF is better than 2NF
The objective of normalization is to ensure that all tables are in at least 3NF:
Each table represents a single subject
No data item will be unnecessarily stored in more than one table
All attributes in a table are dependent on the primary key
Each table is free of insertion, update, and deletion anomalies
Normalization works one relation at a time
It progressively breaks a table into a new set of relations based on the identified dependencies
The Normalization Process

For most business database design purposes, 3NF is as high as normalization needs to go
The highest level of normalization is not always the most desirable
The price paid for increased performance is greater data redundancy
Some situations require non-normalization or denormalization for efficiency reasons.
Denormalization produces a lower normal form
Normal Forms
Functional dependencies are used to specify formal measures of the "goodness" of relational designs
Functional Dependency (FD)
In general, a functional dependency exists when the value of one or more attributes determines the value of another attribute.
Suppose you are buying boxes of cookies and each box costs 5.00.
Then the cost of several boxes is given by the formula:
CookieCost = NumberOfBoxes * 5
We can then say that CookieCost is functionally dependent on NumberOfBoxes and the UnitPrice (i.e., 5).
This can be read as "NumberOfBoxes and UnitPrice determine CookieCost":
(NumberOfBoxes, UnitPrice) -> CookieCost
The attributes on the left, here NumberOfBoxes and UnitPrice, are called the determinant.
FDs are derived from the real-world constraints on the attributes.
They can be displayed graphically on a relation schema, as in the following slide.
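A candidate FD can be tested against a particular relation instance. FDs come from real-world constraints, so such a check can only refute an FD, because one counterexample is enough. Passing on one instance does not prove that the FD always holds. The sketch uses the cookie example. The purchase rows are hypothetical, and the last one assumes a different unit price of 6.

```python
def fd_holds(rows, lhs, rhs):
    """Check lhs -> rhs on one relation instance (rows are dicts)."""
    mapping = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if mapping.setdefault(key, value) != value:
            return False  # same determinant value, different dependent value
    return True

# Hypothetical purchases; the last one assumes a box priced at 6 instead of 5.
purchases = [
    {"NumberOfBoxes": 1, "UnitPrice": 5, "CookieCost": 5},
    {"NumberOfBoxes": 2, "UnitPrice": 5, "CookieCost": 10},
    {"NumberOfBoxes": 3, "UnitPrice": 5, "CookieCost": 15},
    {"NumberOfBoxes": 2, "UnitPrice": 6, "CookieCost": 12},
]

print(fd_holds(purchases, ["NumberOfBoxes", "UnitPrice"], ["CookieCost"]))  # True
print(fd_holds(purchases, ["NumberOfBoxes"], ["CookieCost"]))               # False
```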

Consider the relation
Student (ID, Name, Soc Sec Nbr, Major, Deptmt)
Assume a department offers several majors, e.g., the INSY department offers the INSY, MASI, and POMA majors.
How many determinants can you identify in Student?
(Soc Sec Nbr) -> (ID, Name, Major, Deptmt)
(ID) -> (Name, Soc Sec Nbr, Major, Deptmt)
(Major) -> (Deptmt)
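These determinants can be checked mechanically with the attribute closure X+, which is the set of every attribute that X determines. The sketch encodes the FDs listed for Student. SSN is a hypothetical short name for Soc Sec Nbr, and Dept stands for Deptmt. The output shows that ID and SSN each determine every attribute, while Major determines only Dept. Major is therefore a determinant but not a candidate key.

```python
def closure(attrs, fds):
    """Attribute closure X+: repeatedly apply FDs whose determinant is covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Student (ID, Name, SSN, Major, Dept); SSN stands in for Soc Sec Nbr.
student = {"ID", "Name", "SSN", "Major", "Dept"}
fds = [
    ({"SSN"}, {"ID", "Name", "Major", "Dept"}),
    ({"ID"}, {"Name", "SSN", "Major", "Dept"}),
    ({"Major"}, {"Dept"}),
]

for det in ({"ID"}, {"SSN"}, {"Major"}):
    plus = closure(det, fds)
    print(sorted(det), sorted(plus), "is a key" if plus == student else "is not a key")
# ['ID'] ['Dept', 'ID', 'Major', 'Name', 'SSN'] is a key
# ['SSN'] ['Dept', 'ID', 'Major', 'Name', 'SSN'] is a key
# ['Major'] ['Dept', 'Major'] is not a key
```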
A Dependency diagram
[Figure: dependency diagram over the attributes ID, Name, Soc_Sec_Nbr, Major and Dept]

Full functional dependency
Attribute B is fully functionally dependent on an attribute set A if it is functionally dependent on A and not functionally dependent on any proper subset of A (otherwise the dependency is a partial dependency).
This becomes an issue only with composite keys.
Transitive dependency
If A, B and C are attributes of a relation such that A -> B and B -> C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
What are the functional dependencies in this table?
Can you spot any partial dependency?
Can you spot any transitive dependency?
Note that primary keys are underlined.
Source of image: Fundamentals of Database Systems by Ramez Elmasri and Shamkant B. Navathe
[Figure: the same table, annotated to show the partial dependency and the transitive dependency]
First Normal Form (1NF)

Just insists on some restrictions we have already explicitly or implicitly imposed on entity types and tables:
A relation is in 1NF if all underlying domains contain atomic values only, i.e., there are no repeating groups. (The relation must not contain multivalued attributes.)
In the entity type there is a candidate key whose attributes never have NULL values, and one such key has been chosen as the primary key.
Normalizing the table structure will reduce data redundancies
A Sample Report Layout with repeated groups
Conversion to First Normal Form
Step 1: Eliminate the Repeating Groups
Eliminate nulls: each repeating group attribute contains an appropriate data value
Step 2: Identify the Primary Key
It must uniquely identify each attribute value (row)
The new key must be composed of more than one attribute (a composite key)
Step 3: Identify All Dependencies
Dependencies are depicted with a diagram
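Steps 1 and 2 can be sketched on a report in which a project's number and name appear only on the first line of each group. The report contents and the column names PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME and HOURS are hypothetical. Step 1 fills the blanks by repeating the group's values. Step 2 then checks which key is actually unique.

```python
# Hypothetical report: project number and name appear only on the first line
# of each repeating group; continuation lines leave them blank (None).
# Columns: PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, HOURS
report = [
    (15, "Alpha", 101, "Ann", 23.8),
    (None, None, 102, "Bob", 19.4),
    (18, "Beta", 101, "Ann", 24.6),
    (None, None, 103, "Cat", 45.3),
]

def to_1nf(report):
    """Step 1: repeat the group's project values on every row, so no blanks remain."""
    rows = []
    current = (None, None)
    for proj_num, proj_name, emp_num, emp_name, hours in report:
        if proj_num is not None:  # a new repeating group starts here
            current = (proj_num, proj_name)
        rows.append((*current, emp_num, emp_name, hours))
    return rows

rows = to_1nf(report)

# Step 2: neither PROJ_NUM nor EMP_NUM alone is unique, so the key is composite.
proj_unique = len({r[0] for r in rows}) == len(rows)
emp_unique = len({r[2] for r in rows}) == len(rows)
key_unique = len({(r[0], r[2]) for r in rows}) == len(rows)
print(proj_unique, emp_unique, key_unique)  # False False True
```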
The table put into 1NF (assuming there is a PK)
There are no repeating groups in the table.
Conversion to First Normal Form (continued)
All key attributes are defined
Dependency diagram:
No repeated groups
Example of First Normal Form
Databases
2014/15
Week 9 (Friday)
Mathematical background to tables
Shereen Fouad
Teaching Fellow
Motivation
Manipulation of data (query and update operations) corresponds to operations on relations
A query is applied to relation instances, and the result of a query is also a relation instance.
Relational algebra describes those operations
Data is represented in a Relational Model
The relational model supports simple, powerful query languages
Relational algebra
First described by Codd at IBM
It is a family of algebras with well-founded semantics, used for modelling the data stored in relational databases and defining queries on it.
Relational algebra contains two kinds of operators:
common set operations (such as union, intersection, and Cartesian product)
operators specific to relations, for example projection (keeping only some of the columns of a table) and selection (keeping only some rows of a table)
Relational Model
Emp is a relation, and it is a set with eight members
A mathematical relation is a set of tuples: sequences of values.
Each tuple corresponds to a row in a table.
Each tuple can be considered as a member (element) of the set.
Mathematical sets: Basics

A set is an unordered collection of items of any sort (people, numbers, numerals, shoes, atoms, strings of characters, databases, sets, blades of grass, …) without any duplication of items.
The items are called elements or members.
S = {34, SHF, 59, UoB}, where SHF is a name for me and UoB is a name for this university, means that S is the set consisting of (exactly) the following four items: the abstract number 34, me, the abstract number 59, this university.
Basics, cont'd
{34, SHF, 59, UoB} = {UoB, 59, 34, SHF, SHF, 34}
The order of writing the members doesn't matter; duplication in the writing doesn't duplicate the member.
A set can be infinite (e.g., the set of all whole numbers).
A set can contain just one member: a singleton set.
There's a set with no members at all: the empty set, usually notated as ∅, but it can also be written { }.
Somewhat analogous to zero, or to a new committee which has no members yet.
Another Notation
{n | n is an integer, n > 301} denotes the set of n such that n is an integer and n > 301.
(Actually, this notation is a slight simplification.)
The set is the same as that denoted by, for instance, {n | n is an integer, n ≥ 302}.
Some More Examples
{SHF, "SHF"} has 2 members: me, and a 3-character string.
{3, {4,5}, 4, 6} has 4 members, one of which is a set.
{3, {5,4}, 4, 6} is that same set.
{ {4,5} } has 1 member, which is a set.
{4,5} has 2 members, both numbers.
{∅} is a singleton set. Its only member is the empty set.
{{∅}} is a different singleton set.
Membership Relationship
a ∈ A means that a is a member of A.
5 ∈ {4,5}
{5,4} ∈ {3, {4,5}, 4, 6}
a ∉ A means that a is not a member of A.
5 ∉ {3, {4,5}, 4, 6}
{5} ∉ {3, {4,5}, 4, 6}
{4,6} ∉ {3, {4,5}, 4, 6}
{3,4,5} ∉ {3, {4,5}, 4, 6}
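Python's built-in sets behave like the mathematical sets above, so they offer a quick way to check the examples. One assumption of this sketch: set members must be hashable, so a set that is itself a member is written as a `frozenset`.

```python
# Set members must be hashable, so a set inside a set is written as a frozenset.
S = {3, frozenset({4, 5}), 4, 6}

print(len(S))                             # 4 members, one of them a set
print(S == {3, frozenset({5, 4}), 4, 6})  # True: order of writing is irrelevant
print({34, 59, 59, 34} == {59, 34})       # True: writing a member twice adds nothing

print(5 in frozenset({4, 5}))             # True:  5 ∈ {4,5}
print(frozenset({5, 4}) in S)             # True:  {5,4} ∈ {3, {4,5}, 4, 6}
print(5 in S, frozenset({5}) in S, frozenset({4, 6}) in S, frozenset({3, 4, 5}) in S)
# False False False False

empty = frozenset()
print(len({empty}), {empty} == {frozenset({empty})})  # 1 False: {∅} differs from {{∅}}
```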
Subsets and Supersets
A ⊆ B means that A is a subset of B (and that B is a superset of A), i.e., every member of A is also a member of B.
Carefully distinguish between subset-of and member-of!
The symbol ⊂ means the same as ⊆; it does NOT mean that there cannot be equality.
Examples:
∅ ⊆ {4,5}
{5} ⊆ {4,5,6}, {6,4} ⊆ {4,5,6,7}, {6,4,7,5} ⊆ {4,5,6,7}
{n | n is an EVEN whole number} ⊆ {n | n is a whole number}
Subsets and Supersets
∅ ⊆ A for any set A.
A ⊆ A for any set A. (Reflexivity)
If A ⊆ B and B ⊆ A then A = B. (Antisymmetry)
If A ⊆ B and B ⊆ C then A ⊆ C. (Transitivity)
Some Operations on Sets
Union of sets A and B:
A ∪ B = the set of things that are in A or B (or both).
NB: no repetitions created.
Intersection of sets A and B:
A ∩ B = the set of things that are in both A and B.
Difference of sets A and B:
A − B = the set of things that are in A but not B.
Note: also notated by a backslash (A \ B) instead of a minus sign.
The minus sign is also used, as in A − a, to mean "remove member a from A" (if it's a member of A at all).
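The subset relation and the three operations map directly onto Python's set operators: `<=` for ⊆, `|` for ∪, `&` for ∩, and `-` for difference. One difference from the notation above is worth knowing: Python's `<` means proper subset, so it excludes equality. The example sets are arbitrary.

```python
A = {4, 5}
B = {4, 5, 6, 7}

print(A <= B, B <= A)  # True False : A ⊆ B but not B ⊆ A
print(A <= A, A < A)   # True False : <= allows equality; < is proper subset
print(set() <= A)      # True       : ∅ ⊆ A

# Subset-of is not member-of: {4} ⊆ {4, 5}, but {4} ∉ {4, 5}.
print(frozenset({4}) <= {4, 5}, frozenset({4}) in {4, 5})  # True False

print(sorted(A | {6, 7}))  # [4, 5, 6, 7] : union, no repetitions created
print(sorted(B & {5, 9}))  # [5]          : intersection
print(sorted(B - A))       # [6, 7]       : difference B − A
print(sorted(B - {6}))     # [4, 5, 7]    : "remove member 6 from B"
```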
Some Properties of those Operations
Union and intersection are commutative (can switch):
A ∪ B = B ∪ A
A ∩ B = B ∩ A
Union and intersection are associative (can group differently):
A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
Because of associativity, we can omit parentheses for union-only or intersection-only cases:
A ∪ B ∪ C ∪ D
A ∩ B ∩ C ∩ D
Bad Associations
Caution: if an operation is not associative, the position of parentheses is normally important.
In arithmetic, division is non-associative.
(x/y)/z is usually a different value from x/(y/z).
Two Other Properties
Union distributes over intersection:
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Intersection distributes over union:
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
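Random testing cannot prove these identities, but it is a quick sanity check. The sketch tries the commutative, associative and distributive laws on 1000 random small sets, using a fixed seed so the run is repeatable. It then shows that set difference, like division, is not associative.

```python
import random

def random_set(rng):
    """A random subset of {0, ..., 9} with at most 5 members."""
    return {rng.randrange(10) for _ in range(rng.randrange(6))}

rng = random.Random(0)  # fixed seed so the check is repeatable
for _ in range(1000):
    A, B, C = random_set(rng), random_set(rng), random_set(rng)
    assert A | B == B | A and A & B == B & A                          # commutativity
    assert A | (B | C) == (A | B) | C and A & (B & C) == (A & B) & C  # associativity
    assert A | (B & C) == (A | B) & (A | C)                           # ∪ distributes over ∩
    assert A & (B | C) == (A & B) | (A & C)                           # ∩ distributes over ∪

# Difference, like division, is not associative:
A, B, C = {1, 2, 3}, {2, 3}, {3}
left, right = (A - B) - C, A - (B - C)
print(sorted(left), sorted(right))  # [1] [1, 3]
```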
Tuples in a Table
People
PERS-ID    NAME      AGE
9568876A   Chopples  37
2544799Z   Blurp
1698674F   Rumpel    88
The tuples are just lists representing the rows:
<9568876A, Chopples, 37>
<2544799Z, Blurp, NULL>
<1698674F, Rumpel, 88>
Table Rows are Tuples

In a table, each attribute has a value domain: the set of values that the attribute can have. E.g., the set of integers, the set of all character strings of any length, or the set of character strings of a specific format and length.
If the attribute allows NULL values, we include NULL in the value domain as well.
The values in a row form a tuple of values from the respective value domains: just a list of the values, one for each attribute.
Tuples in General
A tuple in general is an ordered sequence of items of any sort. We will only deal with finite tuples. Items CAN be duplicated.
It can also be called a vector in other CS terminology.
Notation by angle brackets and commas:
<6, JAB, 5, JAB, 5, ∅, 9>
Singleton and empty tuples: <6>, <>
The concatenation of two tuples is just the result of putting them end to end to get one tuple:
<6, JAB, 5> concatenated with <5, 6> = <6, JAB, 5, 5, 6>
<6, JAB, 5> concatenated with <> = <6, JAB, 5>
Cartesian Products
The set of all possible tuples formed from some list of sets (taking one item from each set, in order) is called the Cartesian product of the sets.
Notation, e.g.:
D × E × F × G × H
if D, E, F, G, H are the sets (not necessarily different).
The tuples are all possible tuples of the form <d, e, …, h> where d ∈ D, e ∈ E, …, h ∈ H
Examples
Let A = {3, 8, 2} and B = {jjj, bb}.
Then A × B = { <3, jjj>, <3, bb>, <8, jjj>, <8, bb>, <2, jjj>, <2, bb> }.
B × B = { <jjj, jjj>, <jjj, bb>, <bb, jjj>, <bb, bb> }.
A × ∅ = ∅ = ∅ × A
A × {TRUE} = { <3, TRUE>, <8, TRUE>, <2, TRUE> }
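Python's `itertools.product` builds exactly these tuples, so it can reproduce the examples. Python tuples are ordered and may repeat items, and `+` concatenates two tuples end to end, which matches the tuple concatenation described earlier. The sketch reuses the sets A and B from the examples.

```python
from itertools import product

A = {3, 8, 2}
B = {"jjj", "bb"}

AxB = set(product(A, B))           # every tuple (a, b) with a in A and b in B
print(len(AxB))                    # 6 = |A| * |B|
print((8, "bb") in AxB)            # True
print(len(set(product(B, B))))     # 4
print(set(product(A, set())))      # set() : A × ∅ = ∅
print(sorted(product(A, {True})))  # [(2, True), (3, True), (8, True)]

# Tuples are ordered and may repeat items; + concatenates them end to end.
print((6, "JAB", 5) + (5, 6))      # (6, 'JAB', 5, 5, 6)
print((6, "JAB", 5) + ())          # (6, 'JAB', 5)
```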
Relations
Any subset at all of a Cartesian product is called a relation on the sets in question (D, E, …, H above), even the whole of the product (even if infinite), and even the empty set.
I.e., a relation on D, E, …, H is just some set of tuples that are each of the form <d, e, …, h> where d ∈ D, e ∈ E, …, h ∈ H.
Examples
Let A = {3, 8, 2} and B = {jjj, bb}.
The Cartesian product A × B = { <3, jjj>, <3, bb>, <8, jjj>, <8, bb>, <2, jjj>, <2, bb> }.
Some relations on A and B:
{ <3, jjj>, <3, bb>, <2, jjj> }
{ <2, bb> }
A × B
Rows as forming a Relation

So, for a given table, the tuples corresponding to all possible rows that you could create, using whatever values you like from the value domains, form the Cartesian product of the value domains of the table.
And, provided the table does not have repeated rows:
AT ANY MOMENT the actual set of rows, considered as tuples, is a relation on the table's value domains.
NB: it is crucial here that no row is exactly repeated, because a mathematical set cannot have repeated elements.
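The definition of a relation as any subset of a Cartesian product can be checked directly for small finite domains. The sketch also shows why repeated rows matter: when a table's rows are turned into a Python set, a repeated row simply disappears. The People rows come from the table above. Using `None` for NULL is an assumption of this sketch.

```python
from itertools import product

A = {3, 8, 2}
B = {"jjj", "bb"}

def is_relation_on(r, *domains):
    """A relation on the domains is any subset of their Cartesian product."""
    return set(r) <= set(product(*domains))

print(is_relation_on({(3, "jjj"), (3, "bb"), (2, "jjj")}, A, B))  # True
print(is_relation_on(set(), A, B))                                 # True: ∅ counts
print(is_relation_on({("bb", 3)}, A, B))                           # False: wrong order
print(2 ** len(set(product(A, B))))  # 64 possible relations on A and B

# The People rows as a set of tuples; None stands in for NULL (an assumption).
people = [("9568876A", "Chopples", 37), ("2544799Z", "Blurp", None), ("1698674F", "Rumpel", 88)]
print(len(set(people)), len(set(people + [people[0]])))  # 3 3: a repeated row vanishes
```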
Databases
2014/15
Week 9 (Wednesday)
(part 3)
Shereen Fouad
Teaching Fellow
Announcements
Assignment 9 (assessed; it accounts for 10% of the module mark) will be available on Canvas today.
There is no lab session this week because you will be starting the conceptual phase of Assignment 9. However, you are very welcome to attend the lab, but no demonstrators will be available.
The syllabus for ICY Databases students will be finished today.
The Master's students (but NOT the ICY students) still have the following Learning Outcome (LO 5):
Apply relational algebra and the mathematical theory of relations to describe databases, queries, and consistency conditions.
However, ICY students will be expected to come to lectures in full, as I may make occasional additional comments that are not on LO5.
Overview
Fourth Normal Form (4NF)
Normalization and Database Design
Denormalization
Summary
Assignment 9 specifications.

4NF is about a different sort of issue from 2NF/3NF/BCNF.
Those NFs are concerned with the redundancy resulting from functional dependencies (FDs).
4NF is concerned with redundancy resulting from multivalued dependencies (MVDs).

A relation is in 4NF if it is in BCNF and either:
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes that are multivalued dependent on a specific attribute are dependent on each other
What is a multivalued dependency (MVD)?
Definition of MVD
A multivalued dependency of some attribute X on an attribute set D (written D --->> X) is like a functional dependency, except that X is allowed to have several values for a given value of D.
The crucial point is that once the D value is specified, the X values are independent of the other attributes.
So, we can think of X as a multivalued attribute implemented by putting different values in different rows, where the set of X values is fully determined by just the value of D.
Not 4 NF Example
Assume the following relation with a multivalued dependency:
Employee (Eid:pk1, Languages:pk2, Skills:pk3)
Recall that a relation is in BCNF if all its determinants are candidate keys.
Relation Employee has only one determinant, (Eid, Language, Skill), which is the composite primary key.
Since the primary key is a candidate key, Employee is in BCNF.
However, this relation has the MVDs:
Eid --->> Languages
Eid --->> Skills
Languages and Skills are independent.
Not 4 NF Example (continued)
Eid   Language  Skill
100   English   Teaching
100   Kurdish   Politics
100   English   Politics
100   Kurdish   Teaching
200   Arabic    Singing
Insertion anomaly: to insert the row (200, English, Cooking), we also have to insert two extra rows, (200, Arabic, Cooking) and (200, English, Singing); otherwise the database will be inconsistent.
Not 4 NF Example (continued)
Here is the table after the insertion:
Eid   Language  Skill
100   English   Teaching
100   Kurdish   Politics
100   English   Politics
100   Kurdish   Teaching
200   Arabic    Singing
200   English   Cooking
200   Arabic    Cooking
200   English   Singing
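The consistency rule behind this insertion anomaly can be tested directly. In a three-column relation (Eid, Language, Skill), Eid --->> Languages holds exactly when every language of an employee appears with every skill of that employee. The sketch uses the rows from the tables above. It shows that inserting only (200, English, Cooking) breaks the MVD, and that adding the two extra rows restores it.

```python
from collections import defaultdict
from itertools import product

def mvd_holds(rows):
    """Check Eid ->> Language (equivalently Eid ->> Skill) on (Eid, Language, Skill) rows.

    It holds when, for every Eid, each of its languages appears with each of its skills.
    """
    langs, skills, pairs = defaultdict(set), defaultdict(set), defaultdict(set)
    for eid, lang, skill in rows:
        langs[eid].add(lang)
        skills[eid].add(skill)
        pairs[eid].add((lang, skill))
    return all(pairs[e] == set(product(langs[e], skills[e])) for e in pairs)

employee = {
    (100, "English", "Teaching"), (100, "Kurdish", "Politics"),
    (100, "English", "Politics"), (100, "Kurdish", "Teaching"),
    (200, "Arabic", "Singing"),
}
print(mvd_holds(employee))  # True

only_one = employee | {(200, "English", "Cooking")}
print(mvd_holds(only_one))  # False: the database is now inconsistent

all_three = only_one | {(200, "Arabic", "Cooking"), (200, "English", "Singing")}
print(mvd_holds(all_three), len(all_three))  # True 8
```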
Change to 4NF
By placing the multivalued attributes in tables by themselves, we can convert the table to the following:
Eid, Language    (Eid --->> Languages)
Eid, Skill       (Eid --->> Skills)
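The decomposition can be checked for losslessness. The sketch projects the eight-row table onto (Eid, Language) and (Eid, Skill), then joins the two projections back on Eid, recovering exactly the original rows. The join is written as a plain set comprehension rather than with any database library.

```python
# The 8-row Employee table after the insertion (from the slide above).
employee = {
    (100, "English", "Teaching"), (100, "Kurdish", "Politics"),
    (100, "English", "Politics"), (100, "Kurdish", "Teaching"),
    (200, "Arabic", "Singing"), (200, "English", "Cooking"),
    (200, "Arabic", "Cooking"), (200, "English", "Singing"),
}

emp_language = {(eid, lang) for eid, lang, _ in employee}  # Eid ->> Languages
emp_skill = {(eid, skill) for eid, _, skill in employee}   # Eid ->> Skills

# Natural join on Eid recombines the two tables.
rejoined = {(e1, lang, skill)
            for e1, lang in emp_language
            for e2, skill in emp_skill
            if e1 == e2}

print(len(emp_language), len(emp_skill), rejoined == employee)  # 4 4 True
```

After the split, recording a new skill for employee 200 takes exactly one row in the skill table, with no extra rows needed.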
4 NF Example
Assume the following relation:
Employee (Eid:pk1, Language:pk2, Skill:pk3)
Eid   Language  Skill
100   English   Teaching
100   Kurdish   Politics
100   French    Cooking
200   English   Cooking
200   Arabic    Singing
4 NF Example (continued)
Assume the following relation with multivalued dependencies:
Employee (Eid:pk1, Languages:pk2, Skills:pk3)
Eid --->> Languages
Eid --->> Skills
Languages and Skills are dependent.
This says an employee speaks several languages and has several skills. However, for each skill, a specific language is used when that skill is practiced.
4 NF Example (continued)
Thus, when employee 100 teaches, she uses English; but when she cooks, she uses French. This relation is in fourth normal form.
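The contrast with the previous example can be checked. Here Language and Skill are linked, so splitting the table into (Eid, Language) and (Eid, Skill) would be lossy. Joining the two projections back on Eid produces spurious rows that were never facts in the data. The sketch uses the five rows of this example.

```python
employee = {
    (100, "English", "Teaching"), (100, "Kurdish", "Politics"),
    (100, "French", "Cooking"), (200, "English", "Cooking"),
    (200, "Arabic", "Singing"),
}

emp_language = {(eid, lang) for eid, lang, _ in employee}
emp_skill = {(eid, skill) for eid, _, skill in employee}

# Natural join on Eid of the two projections.
rejoined = {(e1, lang, skill)
            for e1, lang in emp_language
            for e2, skill in emp_skill
            if e1 == e2}

spurious = rejoined - employee
print(len(rejoined), len(spurious))             # 13 8
print((100, "French", "Teaching") in spurious)  # True: not a fact in the data
```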
Normal Forms Overall

Normalization helps eliminate data redundancies and some other aspects of poor structure.
Normalization focuses on problems in individual entity types.
Make sure that proposed entities meet the required normal form before table structures are created
It is difficult to separate normalization from the overall ER modelling process.
Normalization cannot, by itself, guarantee good designs.
Non-normalized entity types may be desirable in some cases, to increase processing speed and/or reduce the conceptual complexity of operations.
Normal Forms Overall

Let < mean "provides less protection than". Then:
1NF < 2NF < 3NF < BCNF (and 3NF < 4NF)
(Also BCNF < 4NF under the second definition of 4NF.)
BCNF and 4NF guard against relatively unusual situations. BCNF is more disruptive to achieve than 2NF or 3NF.
3NF is a reasonable target, but BCNF, 4NF etc. may also need to be considered.
Non-Normalization/Denormalization
If tables are decomposed to conform to normalization requirements:
The number of database tables expands
Joining a larger number of tables takes additional disk input/output (I/O) operations, additional manipulation complexity, and possibly substantial communication delays.
Processing requirements should also be a goal
Conflicts among design principles, information requirements, and processing speed are often resolved through compromises that may include ending up with some non-normalized tables.
Summary
Normalization is used to minimize data redundancies
The first three normal forms (1NF, 2NF, and 3NF) are the most commonly encountered
A table is in 1NF when it doesn't contain a repeating group attribute and all key attributes are defined
A table is in 2NF when it is in 1NF and contains no partial dependencies
A table is in 3NF when it is in 2NF and contains no transitive dependencies
A table that is not in 3NF may be split into new tables until all of the tables meet 3NF requirements
Summary (continued)
Normalization is an important part, but only a part, of the design process
A table in 3NF may contain multivalued dependencies, which produce numerous null values or redundant data
Convert a 3NF table to 4NF by splitting the table to remove multivalued dependencies
Tables are sometimes denormalized to yield less I/O, which increases processing speed
Databases
2014/15
Week 10 (Wednesday)
Relational Algebra (part 1)
Shereen Fouad
Teaching Fellow
Relational Query Languages

Languages for describing queries on a relational database
Structured Query Language (SQL)
Predominant application-level query language
Declarative
Relational Algebra
Intermediate language used within DBMS
The basic set of operations for the relational model
Procedural
Relational Algebra
A formal language (based on operators and a domain of values) that aims to perform queries in relational databases
It is often considered to be an integral part of the relational data model.
Why is it important?
It provides a formal foundation for relational model operations.
It is used as a basis for implementing and optimizing queries in the query processing and optimization modules that are integral parts of relational database management systems (RDBMSs).
Relational Algebra in a DBMS

[Figure: query processing inside the DBMS. SQL query → parser → relational algebra expression → query optimizer → optimized relational algebra expression (query execution plan) → code generator → executable code]
Relational Algebra Operations

Set operations from mathematical set theory (each relation is considered as a set of tuples):
Set-difference ( − ): tuples in r1 but not in r2.
Union ( ∪ ): tuples in r1 or in r2.
Intersection ( ∩ ): tuples in r1 and in r2.
Cross-product ( × ): allows us to combine two relations.
Operations developed specifically for relational databases:
Selection ( σ ): selects a subset of rows from a relation (horizontal).
Projection ( π ): retains only the wanted columns from a relation (vertical).
Join ( ⋈ ): joins two relations.
Use of relational algebra operators on existing tables produces new tables
Select Operator
Produces a table containing the subset of rows of the argument table that satisfy a condition
Select Operator
SQL:
SELECT * FROM <relation> WHERE <condition>
Note: it's the WHERE part that actually does the selection according to a criterion.
Relational algebra notation:
σ_condition(relation)
More compact than SQL notation. Avoids notation private to particular versions of particular programming languages.
Select Operator
SQL:
SELECT *
FROM Person
WHERE Hobby = 'stamps'
Relational Algebra: σ_Hobby='stamps'(Person)
Person
Id    Name  Address    Hobby
1123  John  123 Main   stamps
1123  John  123 Main   coins
5556  Mary  7 Lake Dr  hiking
9876  Bart  5 Pine St  stamps
Result
Id    Name  Address    Hobby
1123  John  123 Main   stamps
9876  Bart  5 Pine St  stamps
Selection Condition - Examples
σ_Id>3000 OR Hobby='hiking'(Person)
σ_Id>3000 AND Id<3999(Person)
σ_NOT(Hobby='hiking')(Person)
σ_Hobby≠'hiking'(Person)
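Selection can be sketched in Python by modelling a relation as a set of tuples and the condition as a function of a row. This is a teaching sketch, not how a DBMS evaluates σ. The Person rows are those from the example table, and the helper name `select` is hypothetical. The last line confirms that the NOT(...) and ≠ forms of the condition pick the same rows.

```python
COLS = ("Id", "Name", "Address", "Hobby")
PERSON = {
    (1123, "John", "123 Main", "stamps"),
    (1123, "John", "123 Main", "coins"),
    (5556, "Mary", "7 Lake Dr", "hiking"),
    (9876, "Bart", "5 Pine St", "stamps"),
}

def select(relation, condition):
    """σ_condition(relation): keep the tuples whose column->value dict satisfies condition."""
    return {t for t in relation if condition(dict(zip(COLS, t)))}

stamps = select(PERSON, lambda r: r["Hobby"] == "stamps")
print(sorted(stamps))
# [(1123, 'John', '123 Main', 'stamps'), (9876, 'Bart', '5 Pine St', 'stamps')]

print(len(select(PERSON, lambda r: r["Id"] > 3000 or r["Hobby"] == "hiking")))  # 2
print(len(select(PERSON, lambda r: 3000 < r["Id"] < 3999)))                     # 0
print(select(PERSON, lambda r: not r["Hobby"] == "hiking")
      == select(PERSON, lambda r: r["Hobby"] != "hiking"))                       # True
```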
Project Operator
Produces a table containing a subset of the columns of the argument table
Project Operator
SQL:
SELECT <column specs> FROM <relation>
Relational algebra notation:
π_attribute list(relation)
Retains only the attributes that are in the projection list.
Schema of result: exactly the fields in the projection list, with the same names that they had in the input relation.
The projection operator has to eliminate duplicates
Project Operator
SQL:
SELECT Name, Hobby
FROM Person
Relational Algebra: π_Name,Hobby(Person)
Person
Id    Name  Address    Hobby
1123  John  123 Main   stamps
1123  John  123 Main   coins
5556  Mary  7 Lake Dr  hiking
9876  Bart  5 Pine St  stamps
Result
Name  Hobby
John  stamps
John  coins
Mary  hiking
Bart  stamps
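Projection can be sketched the same way. Because the result is built as a Python set, duplicate tuples disappear automatically, matching the relational-algebra rule. SQL behaves differently: `SELECT Name FROM Person` would list John twice unless written as `SELECT DISTINCT Name FROM Person`. The helper name `project` is hypothetical.

```python
COLS = ("Id", "Name", "Address", "Hobby")
PERSON = {
    (1123, "John", "123 Main", "stamps"),
    (1123, "John", "123 Main", "coins"),
    (5556, "Mary", "7 Lake Dr", "hiking"),
    (9876, "Bart", "5 Pine St", "stamps"),
}

def project(relation, wanted):
    """π_wanted(relation): keep only the wanted columns, in the listed order.
    Building a set removes duplicate result tuples automatically."""
    idx = [COLS.index(c) for c in wanted]
    return {tuple(t[i] for i in idx) for t in relation}

print(sorted(project(PERSON, ["Name", "Hobby"])))
# [('Bart', 'stamps'), ('John', 'coins'), ('John', 'stamps'), ('Mary', 'hiking')]
print(sorted(project(PERSON, ["Name"])))  # [('Bart',), ('John',), ('Mary',)]
```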
Expressions
π_Id,Name(σ_Hobby='stamps' OR Hobby='coins'(Person))
Person
Id    Name  Address    Hobby
1123  John  123 Main   stamps
1123  John  123 Main   coins
5556  Mary  7 Lake Dr  hiking
9876  Bart  5 Pine St  stamps
Result
Id    Name
1123  John
9876  Bart
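Because each operator returns a relation, operators compose like functions. The sketch evaluates the nested expression with the same hypothetical `select` and `project` helpers as before, redefined so the block stands alone. It reproduces the two-row result. One assumption is that both helpers work on relations with the Person column list, which holds here because selection keeps the schema unchanged.

```python
COLS = ("Id", "Name", "Address", "Hobby")
PERSON = {
    (1123, "John", "123 Main", "stamps"),
    (1123, "John", "123 Main", "coins"),
    (5556, "Mary", "7 Lake Dr", "hiking"),
    (9876, "Bart", "5 Pine St", "stamps"),
}

def select(relation, condition):
    """σ: keep tuples whose column->value dict satisfies condition (schema unchanged)."""
    return {t for t in relation if condition(dict(zip(COLS, t)))}

def project(relation, wanted):
    """π: keep only the wanted columns; the set removes duplicates."""
    idx = [COLS.index(c) for c in wanted]
    return {tuple(t[i] for i in idx) for t in relation}

# π_Id,Name(σ_Hobby='stamps' OR Hobby='coins'(Person)): the inner result feeds the outer operator.
result = project(select(PERSON, lambda r: r["Hobby"] in ("stamps", "coins")), ["Id", "Name"])
print(sorted(result))  # [(1123, 'John'), (9876, 'Bart')]
```

John's two matching rows collapse into one result tuple, which is why the result has two rows rather than three.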
Relational Set Operations

Union of relations R and S:
R ∪ S = the set of tuples that are in R or S (or both).
NB: no repetitions created!
Intersection of relations R and S:
R ∩ S = the set of tuples that are in both R and S.
Difference of relations R and S:
R − S = the set of tuples that are in R but not S.
Union-compatible relations
The result of combining two relations R and S with a set operator is a relation, so all its elements must be tuples having the same structure.
Hence, the scope of set operations is limited to union-compatible relations.
Two relations A and B are union-compatible if they have the same number of columns and corresponding columns have the same domains.
Union
Let A and B be two union-compatible relations.
The result of $A \cup B$ contains all rows in A and all rows in B, with duplicate rows eliminated.
Which of these are union-compatible?
(A)
(B)
(C)
To retrieve the Social Security numbers of all employees who either work in
department 5 or directly supervise an employee who works in department 5,
using the UNION operation
Alternative solution
Note the renaming of the result set.
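One way to write the UNION solution, sketched in SQLite with a hypothetical cut-down EMPLOYEE table (columns Ssn, Super_ssn, Dno and made-up rows). The relational algebra steps in the comments are one common decomposition, not necessarily the exact one on the slide; the second SELECT renames Super_ssn to Ssn so both sides line up.

```python
import sqlite3

# Hypothetical EMPLOYEE table with only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Super_ssn TEXT, Dno INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [
        ("111", None, 1),   # dept 1; supervises 222, who is in dept 5
        ("222", "111", 5),  # dept 5; supervises 333
        ("333", "222", 5),  # dept 5
        ("444", "111", 4),  # dept 4; supervises nobody in dept 5
    ],
)

# DEP5_EMPS    <- sigma_{Dno=5}(EMPLOYEE)
# RESULT1      <- pi_{Ssn}(DEP5_EMPS)
# RESULT2(Ssn) <- pi_{Super_ssn}(DEP5_EMPS)    (renamed so both sides match)
# RESULT       <- RESULT1 UNION RESULT2
rows = conn.execute("""
    SELECT Ssn FROM EMPLOYEE WHERE Dno = 5
    UNION
    SELECT Super_ssn AS Ssn FROM EMPLOYEE WHERE Dno = 5
    ORDER BY Ssn
""").fetchall()
print(rows)  # [('111',), ('222',), ('333',)]
```

Employee 222 qualifies on both sides of the UNION but appears only once in the result.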
Difference
Let R and S be two union-compatible relations.
Then their difference $R - S$ is a relation which contains the tuples that are in R but not in S.
Intersect
Let R and S be two union-compatible relations.
Then their intersection is a relation $R \cap S$ which contains the tuples that are in both R and S.
Note that INTERSECTION can be expressed in terms of union and set difference as follows:
$R \cap S = ((R \cup S) - (R - S)) - (S - R)$
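The identity can be checked exhaustively in a few lines. The sketch compares $((R \cup S) - (R - S)) - (S - R)$ with Python's built-in intersection for every pair of subsets of a small hypothetical universe of four tuples, and also checks the shorter equivalent $R - (R - S)$. The reasoning: $(R \cup S) - (R - S)$ leaves exactly $S$, and removing $S - R$ from $S$ leaves $R \cap S$.

```python
from itertools import combinations

# Hypothetical universe of tuples; R and S range over all of its subsets.
UNIVERSE = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]


def all_subsets(items):
    """Yield every subset of items as a Python set."""
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)


def intersect_via_union_and_difference(r, s):
    """R INTERSECT S written with union and difference only."""
    return ((r | s) - (r - s)) - (s - r)


checked = 0
for r in all_subsets(UNIVERSE):
    for s in all_subsets(UNIVERSE):
        assert intersect_via_union_and_difference(r, s) == r & s
        assert r - (r - s) == r & s  # the shorter equivalent form
        checked += 1
print(checked)  # 256 pairs (16 subsets x 16 subsets)
```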
Cross-Join or Product
SQL:
SELECT * FROM two [or more] tables
NB: it's the mere listing of the tables that forms the product, but it's also possible to write:
SELECT * FROM T1 CROSS JOIN T2 CROSS JOIN ...
Relational algebra notation:
Result table is $T1 \times T2$ where T1 and T2 are the given tables.
Each row of T1 is paired with each row of T2.
Yields a table containing all concatenations of whole rows from the first given table with whole rows from the second given table.
If the second table also had a PRICE attribute, then the product would have both a Table1.PRICE attribute and a Table2.PRICE attribute.
Note that the two tables need not be union-compatible.
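A sketch of the product using `itertools.product`, assuming two hypothetical tables that both carry a PRICE column and are not union-compatible. The helper qualifies every attribute with its table name, which is one way to keep the two PRICE columns apart; the result has |T1| x |T2| rows.

```python
from itertools import product


def cross_product(name1, attrs1, rows1, name2, attrs2, rows2):
    """Hypothetical helper for T1 x T2: every row of T1 concatenated with every row of T2.
    Attribute names are prefixed with their table name so duplicates stay distinguishable."""
    attrs = [f"{name1}.{a}" for a in attrs1] + [f"{name2}.{a}" for a in attrs2]
    rows = [r1 + r2 for r1, r2 in product(rows1, rows2)]
    return attrs, rows


# Hypothetical tables with different numbers of columns (not union-compatible).
T1_ATTRS, T1_ROWS = ("P_CODE", "PRICE"), [("P1", 9.99), ("P2", 4.50), ("P3", 12.00)]
T2_ATTRS, T2_ROWS = ("STORE", "REGION", "PRICE"), [("S1", "North", 10.00), ("S2", "South", 5.00)]

attrs, rows = cross_product("T1", T1_ATTRS, T1_ROWS, "T2", T2_ATTRS, T2_ROWS)
print(attrs)      # ['T1.P_CODE', 'T1.PRICE', 'T2.STORE', 'T2.REGION', 'T2.PRICE']
print(len(rows))  # 6 = 3 rows x 2 rows
```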
JOIN operation
Join (various types)
Allows us to join related rows from two or more tables
It's an important feature of the relational database idea
Joining has been implicitly important because of the use of multi-table queries and the use of WHERE to test for attribute equality between tables.
Denoted by $\bowtie$
Condition-Join
$R \bowtie_c S = \sigma_c(R \times S)$
where R and S are relations and c is the condition applied.
The JOIN operation can be specified as a PRODUCT operation followed by a SELECT operation.
Fewer tuples than PRODUCT.
Filters tuples not satisfying the join condition.
Sometimes called a theta-join.
A general join condition is of the form
<condition> AND <condition> AND...AND <condition>
Tuples whose join attributes are NULL or for which the join condition is
FALSE do not appear in the result.
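A sketch of a condition (theta) join as PRODUCT followed by SELECT, using a non-equality condition on two hypothetical relations ITEM(Name, Price) and BUYER(Buyer, Budget). `None` stands in for NULL, and the condition treats a NULL operand as "not satisfied", so that tuple does not appear in the result.

```python
from itertools import product

# Hypothetical relations; None plays the role of NULL.
ITEM = [("pen", 2), ("book", 15), ("lamp", None)]
BUYER = [("Ann", 10), ("Bob", 20)]


def theta_join(rows_r, rows_s, condition):
    """Hypothetical helper: R join_c S = sigma_c(R x S), i.e. PRODUCT then SELECT."""
    return [r + s for r, s in product(rows_r, rows_s) if condition(r, s)]


def affordable(item, buyer):
    """Join condition Price <= Budget; a NULL operand makes the condition fail."""
    price, budget = item[1], buyer[1]
    if price is None or budget is None:
        return False
    return price <= budget


joined = theta_join(ITEM, BUYER, affordable)
print(joined)  # [('pen', 2, 'Ann', 10), ('pen', 2, 'Bob', 20), ('book', 15, 'Bob', 20)]
print(len(joined), "of", len(ITEM) * len(BUYER))  # 3 of 6
```

As the slide says, the join returns fewer tuples than the product: 3 of the 6 pairs survive, and the lamp with a NULL price never matches.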
Ramez A. Elmasri, Shamkant B. Navathe. 1999. Fundamentals of Database Systems (3rd ed.). Carter Shanklin (Ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
Fundamentals/ICY: (06-21923)/(06-21980) Databases

2014/15
Week 10 (Friday)
Shereen Fouad
Teaching Fellow
Announcements
Lab session next week to catch up on any exercises you have missed or to complete the practical requirements of assignment 9.
Next Wednesday's lecture (week 11) will be in LT2, Gisbert Kapp (G 8), from 12:00 pm to 1:00 pm (last lecture).
No lecture on Friday (week 11).
Normalization handouts/exercises have been released on Canvas.
Relational Algebra handouts/exercises have been released on Canvas.
Module evaluation forms will be completed today (last 10 minutes of the lecture).
Relational Algebra in a DBMS

SQL query → parser → relational algebra expression → query optimizer → optimized relational algebra expression → query execution plan → code generator → executable code → DBMS
Relational Algebra Operations

Set operations from mathematical set theory (each relation is considered as a set of tuples)
Set-difference ($-$): tuples in r1, but not in r2.
Union ($\cup$): tuples in r1 or in r2.
Intersection ($\cap$): tuples in r1 and in r2.
Cross-product ($\times$): allows us to combine two relations.
Operations developed specifically for relational databases
Selection ($\sigma$): selects a subset of rows from a relation (horizontal).
Projection ($\pi$): retains only wanted columns from a relation (vertical).
Join ($\bowtie$): joins two relations.
Using relational algebra operators on existing tables produces new tables.
JOIN operation
Join (various types)
Allows us to join related rows from two or more tables
It's an important feature of the relational database idea
Joining has been implicitly important because of the use of multi-table queries and the use of WHERE to test for attribute equality between tables.
Denoted by $\bowtie$
Performs a cross join, then keeps only the rows that satisfy the join condition
Note that the two tables need not be union-compatible.
Overview
Review on Condition Join
Equijoin
Natural Join
Outer Joins
Left
Right
Full
Condition-Join
(Relation 1 $\bowtie_{condition}$ Relation 2)
The JOIN operation can be specified as a PRODUCT operation followed by a SELECT operation.
Fewer tuples than PRODUCT.
Sometimes called a theta-join.
Condition-Join Example
Retrieve the department names of employees who earn more than 40000 pounds
$\pi_{DName}(EMPLOYEE \bowtie_{Dno = Dnumber \text{ AND } Salary > 40000} DEPARTMENT)$
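A sketch of this condition join in SQLite, assuming hypothetical EMPLOYEE(Name, Salary, Dno) and DEPARTMENT(Dnumber, DName) tables with made-up rows. The whole join condition goes into the ON clause, and DISTINCT supplies the duplicate elimination of the projection on DName.

```python
import sqlite3

# Hypothetical tables and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, DName TEXT);
    CREATE TABLE EMPLOYEE (Name TEXT, Salary INTEGER, Dno INTEGER);
    INSERT INTO DEPARTMENT VALUES (1, 'Headquarters'), (4, 'Administration'), (5, 'Research');
    INSERT INTO EMPLOYEE VALUES
        ('Ann', 55000, 1), ('Bob', 38000, 4), ('Cat', 43000, 5), ('Dan', 41000, 5);
""")

# pi_DName(EMPLOYEE join_{Dno=Dnumber AND Salary>40000} DEPARTMENT)
rows = conn.execute("""
    SELECT DISTINCT DName
    FROM EMPLOYEE JOIN DEPARTMENT ON Dno = Dnumber AND Salary > 40000
    ORDER BY DName
""").fetchall()
print(rows)  # [('Headquarters',), ('Research',)]
```

Bob's department is missing because he earns less than 40000; Research appears once even though two of its employees qualify.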
Equijoin
The most common use of JOIN involves join conditions with equality comparisons only.
The only comparison operator used is =.
Example of EQUIJOINs.
So a condition join is like an equijoin, except that its join condition may use any comparison operator (=, <, <=, >, >=, <>), not only equality.
Equijoin - Example
Retrieve the name of the manager of each department.
Note that, in the result of an EQUIJOIN we always have one or more pairs of
attributes that have identical values in every tuple.
Mgr_ssn and Ssn are identical in every tuple of DEPT_MGR (the EQUIJOIN
result)
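A sketch of the manager query as an equijoin, assuming hypothetical DEPARTMENT(Dname, Dnumber, Mgr_ssn) and EMPLOYEE(Fname, Lname, Ssn) rows. It checks the point made above: every tuple of the result carries the same value twice, once as Mgr_ssn and once as Ssn.

```python
# Hypothetical relations as lists of tuples.
DEPT_ATTRS = ("Dname", "Dnumber", "Mgr_ssn")
EMP_ATTRS = ("Fname", "Lname", "Ssn")
DEPARTMENT = [("Research", 5, "222"), ("Administration", 4, "333")]
EMPLOYEE = [("Ann", "Lee", "111"), ("Bob", "Ray", "222"), ("Cat", "Kim", "333")]

# DEPT_MGR <- DEPARTMENT join_{Mgr_ssn = Ssn} EMPLOYEE  (the only operator is "=")
dept_mgr = [d + e for d in DEPARTMENT for e in EMPLOYEE if d[2] == e[2]]
attrs = DEPT_ATTRS + EMP_ATTRS

# Every result tuple repeats the join value: Mgr_ssn equals Ssn.
i, j = attrs.index("Mgr_ssn"), attrs.index("Ssn")
assert all(row[i] == row[j] for row in dept_mgr)

# Manager name of each department: keep only the columns we need.
managers = [
    (row[attrs.index("Dname")], row[attrs.index("Fname")], row[attrs.index("Lname")])
    for row in dept_mgr
]
print(managers)  # [('Research', 'Bob', 'Ray'), ('Administration', 'Cat', 'Kim')]
```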
Natural Join
Because one of each pair of attributes with identical values is
superfluous, a new operation called NATURAL JOIN was created to get
rid of the second (superfluous) attribute in an EQUIJOIN condition.
NATURAL JOIN is basically an EQUIJOIN followed by the removal of
the duplicate columns from the result
Used when tables share one or more common attributes with same
names.
SQL Syntax:
SELECT column-list FROM table1 NATURAL JOIN table2
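A sketch of the SQL syntax in SQLite, assuming hypothetical CUSTOMER(CUS_CODE, CUS_LNAME, AGENT_CODE) and AGENT(AGENT_CODE, AGENT_PHONE) tables with made-up rows. As documented for SQLite, NATURAL JOIN joins on every column name the tables share, and `SELECT *` then returns each shared column only once.

```python
import sqlite3

# Hypothetical tables; customer 1312 has an agent code with no matching agent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AGENT (AGENT_CODE INTEGER, AGENT_PHONE TEXT);
    CREATE TABLE CUSTOMER (CUS_CODE INTEGER, CUS_LNAME TEXT, AGENT_CODE INTEGER);
    INSERT INTO AGENT VALUES (125, '555-1234'), (231, '555-2345'), (333, '555-3456');
    INSERT INTO CUSTOMER VALUES (1132, 'Smith', 231), (1217, 'Brown', 125), (1312, 'Lee', 421);
""")

cur = conn.execute("SELECT * FROM CUSTOMER NATURAL JOIN AGENT ORDER BY CUS_CODE")
columns = [d[0] for d in cur.description]  # column names of the result
rows = cur.fetchall()
print(columns)  # ['CUS_CODE', 'CUS_LNAME', 'AGENT_CODE', 'AGENT_PHONE']
print(rows)     # [(1132, 'Smith', 231, '555-2345'), (1217, 'Brown', 125, '555-1234')]
```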
Two Tables That Will Be Used to Illustrate the Execution of a Natural Join
(CUSTOMER $\bowtie$ AGENT)
The common attributes or columns are called the join attributes
Step 1: PRODUCT
Note the two AGENT_CODE columns
Step 2: SELECT
to get equal agent codes in each row
SELECT is performed on the resulting table to yield only the rows for which the
join-attribute values (e.g. AGENT_CODE values) are equal
Step 3: PROJECT
to get just one agent column
PROJECT is now performed to yield a single copy of each join attribute, thereby eliminating duplicate columns
What if the two tables have no attributes in common?

So in this case the result is the PRODUCT (CROSS JOIN) of the two
tables!!
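The three steps can be sketched directly in Python, assuming relations given as attribute tuples plus rows; `natural_join` is a hypothetical helper and the CUSTOMER, AGENT and REGION rows are made up. The last call shows the no-common-attribute case: the equality test over an empty list of join attributes is always true, so every pair survives and the result is the PRODUCT.

```python
from itertools import product


def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """Hypothetical helper: natural join as PRODUCT, then SELECT, then PROJECT."""
    common = [a for a in r_attrs if a in s_attrs]  # the join attributes
    out_attrs = list(r_attrs) + [a for a in s_attrs if a not in common]
    result = set()
    for r, s in product(r_rows, s_rows):  # Step 1: PRODUCT
        r_row, s_row = dict(zip(r_attrs, r)), dict(zip(s_attrs, s))
        if all(r_row[a] == s_row[a] for a in common):  # Step 2: SELECT equal join values
            merged = {**s_row, **r_row}
            result.add(tuple(merged[a] for a in out_attrs))  # Step 3: PROJECT one copy
    return out_attrs, result


CUSTOMER_ATTRS = ("CUS_CODE", "CUS_LNAME", "AGENT_CODE")
CUSTOMER = [(1132, "Smith", 231), (1217, "Brown", 125), (1312, "Lee", 421)]
AGENT_ATTRS = ("AGENT_CODE", "AGENT_PHONE")
AGENT = [(125, "555-1234"), (231, "555-2345"), (333, "555-3456")]

attrs, joined = natural_join(CUSTOMER_ATTRS, CUSTOMER, AGENT_ATTRS, AGENT)
print(attrs)           # ['CUS_CODE', 'CUS_LNAME', 'AGENT_CODE', 'AGENT_PHONE']
print(sorted(joined))  # [(1132, 'Smith', 231, '555-2345'), (1217, 'Brown', 125, '555-1234')]

# No attributes in common: all() over no join attributes is True,
# so the natural join degenerates to the PRODUCT.
REGION_ATTRS, REGION = ("REGION",), [("North",), ("South",)]
_, crossed = natural_join(CUSTOMER_ATTRS, CUSTOMER, REGION_ATTRS, REGION)
print(len(crossed))    # 6 = 3 x 2
```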
My Plan
Get the John Smith information
Get project numbers he works on
Get employee names working on these projects (including John
Smith)
Exclude John Smith from the final result
Note that $\wedge$ denotes an AND operator
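The four-step plan can be sketched with Python sets, assuming hypothetical EMPLOYEE(Fname, Lname, Ssn) and WORKS_ON(Essn, Pno) relations with made-up rows. Each step's relational algebra form is written in a comment (AND stands for $\wedge$); this is one possible formulation, not necessarily the model answer.

```python
# Hypothetical relations as sets of tuples.
EMPLOYEE = {("John", "Smith", "123"), ("Ann", "Lee", "456"), ("Bob", "Ray", "789"), ("Cat", "Kim", "321")}
WORKS_ON = {("123", 1), ("123", 2), ("456", 2), ("789", 3), ("321", 1)}

# Step 1: SMITH <- pi_Ssn(sigma_{Fname='John' AND Lname='Smith'}(EMPLOYEE))
smith = {ssn for (fname, lname, ssn) in EMPLOYEE if fname == "John" and lname == "Smith"}

# Step 2: SMITH_PNOS <- pi_Pno(WORKS_ON join_{Essn=Ssn} SMITH)
smith_pnos = {pno for (essn, pno) in WORKS_ON if essn in smith}

# Step 3: ALL_NAMES <- pi_{Fname,Lname}(EMPLOYEE join_{Ssn=Essn} (WORKS_ON join SMITH_PNOS))
co_ssns = {essn for (essn, pno) in WORKS_ON if pno in smith_pnos}
all_names = {(fname, lname) for (fname, lname, ssn) in EMPLOYEE if ssn in co_ssns}

# Step 4: RESULT <- ALL_NAMES - {('John', 'Smith')}
result = all_names - {("John", "Smith")}
print(sorted(result))  # [('Ann', 'Lee'), ('Cat', 'Kim')]
```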
Last week's answers
Note that $\wedge$ denotes an AND operator
References
Source of images in this lecture:
Management, by Stephen Morris, Peter Rob, Carlos Coronel
Fundamentals of Database Systems by Ramez Elmasri, Shamkant B. Navathe
Databases
2014/15
Week 11 (Wednesday)
Shereen Fouad
Teaching Fellow
Announcements
Lab session tomorrow to catch up on any exercises you have missed or to complete the practical requirements of assignment 9.
No lecture on Friday (week 11).
Today is the Last Lecture!!!

Overview
Condition Join
Equijoin
Natural Join
Outer Joins
Left
Right
Full
Division (optional, not included in the exam)
Outer Joins
Developed for the case where the user wants to keep all the tuples in
Relation 1, or all those in Relation 2, or all those in both relations in
the result of the JOIN, regardless of whether or not they have
matching tuples in the other relation.
So an outer join:
Returns rows matching the join condition
Also returns rows with unmatched attribute values from the tables being joined
Three types (Left, Right and Full)
Left and right designate the order in which tables are processed
Outer Joins (continued)

Left outer join
Returns rows matching the join condition
Returns rows in left side table with unmatched values
Right outer join
Returns rows matching join condition
Returns rows in right side table with unmatched values
SQL Syntax:
SELECT column-list
FROM table1 LEFT [OUTER] JOIN table2 ON join-condition
Outer Joins (continued)

Full outer join
Returns rows matching join condition
Returns all rows with unmatched values in either side table
Syntax:
SELECT column-list
FROM table1 FULL [OUTER] JOIN table2
ON join-condition
Outer Join - Example
Left Outer Join
(CUSTOMER ⟕ AGENT)
Left outer join of CUSTOMER and AGENT, using equal AGENT_CODE
Uses all the rows in the CUSTOMER table, doing an equijoin on AGENT_CODE but also including non-matching CUSTOMER rows.
Right Outer Join
(CUSTOMER ⟖ AGENT)
Right outer join of CUSTOMER and AGENT, using equal AGENT_CODE
Uses all the rows in the AGENT table, doing an equijoin on AGENT_CODE but also including non-matching AGENT rows.
Full Outer Join (CUSTOMER ⟗ AGENT)
Would have the extra row of this table as well as the extra row of the Left Outer Join table
Uses all the rows in the AGENT and CUSTOMER tables, doing an equijoin on AGENT_CODE but also including non-matching rows from each table.
= Union of the Left Outer Join result and the Right Outer Join result.
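A sketch of the three outer joins in plain Python, assuming hypothetical CUSTOMER and AGENT rows (customer Lee has no matching agent; agent 333 has no customers) and `None` for NULL. The full outer join is built literally as the union of the left and right results, as stated above; matched rows occur in both and are kept once.

```python
# Hypothetical CUSTOMER(CUS_CODE, CUS_LNAME, AGENT_CODE) and AGENT(AGENT_CODE, AGENT_PHONE).
CUSTOMER = [(1132, "Smith", 231), (1217, "Brown", 125), (1312, "Lee", 421)]
AGENT = [(125, "555-1234"), (231, "555-2345"), (333, "555-3456")]
CUST_WIDTH, AGENT_WIDTH = 3, 2
C_KEY, A_KEY = 2, 0  # position of AGENT_CODE in each table


def left_outer_join(left, right, lk, rk, right_width):
    """Equijoin on left[lk] == right[rk], plus unmatched left rows padded with None."""
    result = []
    for lrow in left:
        matches = [rrow for rrow in right if lrow[lk] == rrow[rk]]
        if matches:
            result.extend(lrow + rrow for rrow in matches)
        else:
            result.append(lrow + (None,) * right_width)
    return result


def right_outer_join(left, right, lk, rk, left_width):
    """Equijoin, plus unmatched right rows padded with None on the left-hand side."""
    result = []
    for rrow in right:
        matches = [lrow for lrow in left if lrow[lk] == rrow[rk]]
        if matches:
            result.extend(lrow + rrow for lrow in matches)
        else:
            result.append((None,) * left_width + rrow)
    return result


left = left_outer_join(CUSTOMER, AGENT, C_KEY, A_KEY, AGENT_WIDTH)
right = right_outer_join(CUSTOMER, AGENT, C_KEY, A_KEY, CUST_WIDTH)
full = set(left) | set(right)  # full outer join = union of left and right results

print(len(left), len(right), len(full))  # 3 3 4
```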
Example: list all employee names, together with the name of the department each one manages; if an employee does not manage a department, we can indicate it with a NULL value.
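A sketch of this query with a SQL left outer join in SQLite, assuming hypothetical EMPLOYEE(Fname, Lname, Ssn) and DEPARTMENT(Dname, Dnumber, Mgr_ssn) tables with made-up rows. The NULL for a non-manager comes back to Python as `None`.

```python
import sqlite3

# Hypothetical tables and rows: Ann manages no department.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Ssn TEXT PRIMARY KEY);
    CREATE TABLE DEPARTMENT (Dname TEXT, Dnumber INTEGER PRIMARY KEY, Mgr_ssn TEXT);
    INSERT INTO EMPLOYEE VALUES ('Ann', 'Lee', '111'), ('Bob', 'Ray', '222'), ('Cat', 'Kim', '333');
    INSERT INTO DEPARTMENT VALUES ('Research', 5, '222'), ('Administration', 4, '333');
""")

# pi_{Fname, Lname, Dname}(EMPLOYEE left-outer-join_{Ssn = Mgr_ssn} DEPARTMENT)
rows = conn.execute("""
    SELECT Fname, Lname, Dname
    FROM EMPLOYEE LEFT OUTER JOIN DEPARTMENT ON Ssn = Mgr_ssn
    ORDER BY Ssn
""").fetchall()
for row in rows:
    print(row)
# ('Ann', 'Lee', None)        <- manages no department: Dname is NULL
# ('Bob', 'Ray', 'Research')
# ('Cat', 'Kim', 'Administration')
```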
The following material on DIVISION is optional
Division
Goal: produce the tuples in one relation, r, that match all tuples in another relation, s
$r(A_1, \ldots, A_n, B_1, \ldots, B_m)$
$s(B_1, \ldots, B_m)$
$r/s$, with attributes $A_1, \ldots, A_n$, is the set of all tuples $\langle a \rangle$ such that for every tuple $\langle b \rangle$ in s, $\langle a, b \rangle$ is in r
Can be expressed in terms of projection, set difference, and cross-product
DIVIDE operation on DB tables

Simplest case: a 2-column table T divided by a 1-column table S, giving Q = T/S
The only value of LOC that is associated in T with both values A and B of CODE is 5.
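A sketch of division built only from projection, set difference and cross-product, using the standard rewriting $r/s = \pi_A(r) - \pi_A((\pi_A(r) \times s) - r)$, checked against the "for every tuple in s" definition. The helper names are hypothetical, and the rows of T are made up to match the statement that only LOC 5 pairs with both A and B; the helper assumes the B attributes come last in each r tuple.

```python
from itertools import product


def divide(r, s, n_b):
    """Hypothetical helper: r / s via pi_A(r) - pi_A((pi_A(r) x s) - r).
    Each r tuple is (a_1..a_n, b_1..b_m); its last n_b >= 1 values match s's columns."""
    pi_a_r = {t[:-n_b] for t in r}                      # pi_A(r)
    all_pairs = {a + b for a, b in product(pi_a_r, s)}  # pi_A(r) x s
    missing = {t[:-n_b] for t in all_pairs - r}         # a values lacking some b
    return pi_a_r - missing


def divide_by_definition(r, s, n_b):
    """All <a> such that <a, b> is in r for every <b> in s."""
    return {a for a in {t[:-n_b] for t in r} if all(a + b in r for b in s)}


# Hypothetical T(LOC, CODE) and S(CODE).
T = {(5, "A"), (5, "B"), (9, "A"), (4, "B"), (4, "C")}
S = {("A",), ("B",)}

print(divide(T, S, 1))                                   # {(5,)}
print(divide(T, S, 1) == divide_by_definition(T, S, 1))  # True
```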
Division - Example
Student_Records (StudId, CrsCode, Semester, Grade)
Teaching (ProfId, CrsCode, Semester)
List the Ids of students who have passed all courses that were taught in summer 2013
Numerator: StudId and CrsCode for every course passed by every student
$\pi_{StudId, CrsCode}(\sigma_{Grade \neq 'F'}(Student\_Records))$
Denominator: CrsCode of all courses taught in summer 2013
$\pi_{CrsCode}(\sigma_{Semester = 'S2013'}(Teaching))$
Result is numerator/denominator
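Standard SQL has no DIVIDE operator, so a common idiom writes division as a double NOT EXISTS: keep a student if there is no summer-2013 course that the student has not passed. This is a sketch in SQLite with hypothetical Student_Records and Teaching rows; 'S2013' follows the semester code used on the slide.

```python
import sqlite3

# Hypothetical rows: student 10 passed both S2013 courses, 20 failed CS102, 30 never took it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student_Records (StudId INTEGER, CrsCode TEXT, Semester TEXT, Grade TEXT);
    CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Semester TEXT);
    INSERT INTO Teaching VALUES (1, 'CS101', 'S2013'), (2, 'CS102', 'S2013'), (3, 'CS201', 'F2013');
    INSERT INTO Student_Records VALUES
        (10, 'CS101', 'S2013', 'A'), (10, 'CS102', 'S2013', 'B'),
        (20, 'CS101', 'S2013', 'A'), (20, 'CS102', 'S2013', 'F'),
        (30, 'CS101', 'S2013', 'C');
""")

# numerator / denominator as "no denominator course is missing from the student's passes"
rows = conn.execute("""
    SELECT DISTINCT sr.StudId
    FROM Student_Records AS sr
    WHERE NOT EXISTS (
        SELECT 1 FROM Teaching AS t
        WHERE t.Semester = 'S2013'
          AND NOT EXISTS (
              SELECT 1 FROM Student_Records AS p
              WHERE p.StudId = sr.StudId
                AND p.CrsCode = t.CrsCode
                AND p.Grade <> 'F'))
    ORDER BY sr.StudId
""").fetchall()
print(rows)  # [(10,)]
```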
References
Source of images in this lecture:
Management, by Stephen Morris, Peter Rob, Carlos Coronel
Fundamentals of Database Systems by Ramez Elmasri, Shamkant B. Navathe
THANK YOU
GOOD LUCK
