
Fundamentals/ICY: (06-21923)/(06-21980)

Databases
2014/15
Week 1 (Wednesday)
Initial Orientation
Shereen Fouad
Teaching Fellow in School of Computer Science

Overview
Motivation
What is a database.
Applications of database technology.
Initial orientation about the course.

Motivation
Consider a supermarket business.
What do you want to keep track of?
What is the size of the data?
How can the business:
store and manage such data?
retrieve, manipulate and disseminate data?
take critical business decisions?
How can we monitor the performance of the business?

The answer is Database Technology!!!

What is a Database?
The broad interpretation of a Database =
A collection of logically coherent, interrelated data (raw facts of interest to the end user)
together with
a description of data characteristics and relationships (metadata: data about data)

Data vs. Information


To understand what drives a database design, you need to differentiate between data and information.
Data are raw facts.
Information is produced by processing raw data to reveal its meaning.
Data are the foundation of information, which is the basis of knowledge.

Raw data must be structured for storage, processing, and presentation.
Database technology provides the most efficient data management.
Database technology is crucial for good decision making.

Applications of Database Technology


Storage and retrieval of numerical and alphanumerical data.
E.g. data about a company's employees, products, projects, customers, suppliers, orders, sales, assets, etc.

Applications of Database Technology (cont.)

Storage and retrieval of multimedia data.
E.g. YouTube.
Storage and retrieval of Web content (HTML, PDF, images, ...).
E.g. Google.
Storage of huge amounts of data for analysis.
E.g. a data warehouse.
Monitoring data to take action when required.
E.g. accurately keeping track of employee pay and tax, or of the status of the items that any given customer has ordered.

Why This Course?


Database systems are at the core of Computer Science.
They integrate various computer science concepts:
languages, data structures, concurrency.
The (digital) world runs on data.
The topic is intellectually rich.
It provides valued job skills.

Teaching Staff


What We'll Mainly Study

Database design: techniques to design and model a conceptual/physical database
SQL: SQL statements to define, query and control a relational database
Application Development
Database Internals

What We'll Mainly Study (cont.)

Key aspects of how to develop the conceptual/logical design of relational databases.
The nature of relational databases, the central modern type of database.
Some basic mathematical concepts underpinning relational databases, and useful also in many other branches of CS.
In particular, how to achieve certain types of good structuring, to help achieve certain types of correctness and efficiency.
How to create and manipulate databases using the database language of PostgreSQL (a version of SQL, which is very widely used in various forms).

Note about SQL Coverage


The main coverage of SQL will be via the very detailed weekly Additional Notes
and SQL exercises starting in Week 2 of the term.
Lectures will cover some basic concepts of SQL.
Your learning of SQL is best done by:
Reading the notes
Doing the exercises
Seeking help from the demonstrators, whether in the lab or in their office hours.
The lecture material on concepts, theory, and design issues is essential for
designing good databases and writing good SQL.


Lectures and Practical Sessions


There are two lectures a week:
Every Wednesday (Week 1 to Week 10) from 12:00 pm to 1:00 pm in WG5, Aston Webb, and in Week 11 in LT2, Gisbert Kapp.
Every Friday (from Week 1 onwards) from 1:00 pm to 2:00 pm in LT1, Law.

One practical session a week:
There is a PRACTICAL SESSION (LAB SESSION) every Thursday from Week 2 onwards, 2:00-5:00 pm, in the Lower Ground floor lab (LG04) in the CS building.
For the practical work you will be using a database management system called PostgreSQL.

Database Management System - PostgreSQL


PostgreSQL is the relational database management system (RDBMS)
that we will be using for practical exercises in the module.
It contains a database definition/manipulation language that is one of
many versions of Structured Query Language "SQL".
It has a simple command-line interface and works on the School Unix
system (Linux system).
Many database systems exist with fancy interfaces, but I want to
concentrate in the module on the core technical detail.


Course Text
C. Coronel, S. Morris, P. Rob & K. Crockett,
Database Principles: Fundamentals of Design, Implementation and
Management, 2nd Ed or 10th Edition, 2013.
The chapters you need are published on the module website.
You can find the book in the CS and University libraries.

Exercises
Every week I will give you some exercises to do in the lab session.
You need to submit the exercises electronically via Canvas.
The ones up to and including Week 8 will be UNASSESSED.
You will get feedback from demonstrators via Canvas.
The ones in Week 9 will be ASSESSED and will be due for submission in Week 11, accounting for 10% of the module mark.
Late submission of the assessed exercise will lead to penalties.

Grading
Your final grade will be based on:
ONE CLASS TEST,
in Week 8, Friday 21/11/2014, in the lecture hall (LT1, Law),
accounting for 10% of the module mark.

One ASSESSED exercise,
announced in Week 9 of the term: Friday 28/11/2014,
submission due in Week 11: Friday 12/12/2014,
accounting for 10% of the module mark.

Final Examination,
accounting for 80% of the module mark.

Assessment Differentiation between CS Master's Students and Year-in-CS Students

The Master's students, but NOT the Year-in-CS students, have the following Learning Outcome (LO5):
Apply relational algebra and the mathematical theory of relations to describe databases, queries, and consistency conditions.
Some lectures (starting from Week 8) will be partly or wholly on LO5 topics.
Year-in-CS students will be expected to come to these lectures in full.
Note that in these lectures I may make occasional additional comments that are not on LO5.

Assessment Differentiation between CS Master's Students and Year-in-CS Students (cont.)

The class test will not cover Learning Outcome LO5, so all of its questions will be mandatory for everyone.
Unassessed exercise sets (from Week 9 to Week 11) will contain work on the LO5 topic.
If assessed work items (Class Test, Week 9-11 Exercises, or Examination) contain questions on LO5 material, then these questions will be optional for Year-in-CS students even when compulsory for Master's students.

Summary
Database technologies are everywhere.
A database is a collection of logically coherent, interrelated data together with a description of this data.
Information is the result of processing data to reveal its meaning.

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 2 (Friday)
Introduction to Tables

Shereen Fouad
Teaching Fellow

Reminder of previous lecture


File system data management suffers from data redundancy, structural and data dependence, and inadequate security features.
The difference between databases, database management systems and database systems:
A database (DB) consists of a DB schema and a DB state.
A database management system (DBMS) is a collection of programs that manage the database structure and control access to the database.
A database system (DBS) consists of a DBMS and a database.

Overview
Table Structure Example
Cross-references between places in a data repository (referential integrity)
Associative linking versus pointing
Restrictions on database tables

Student Table Example


Imagine that this table (or relation) has been defined to help keep track of student details.
The name of the table (relation): STUDENT

A Table is composed of rows and columns.


A Table contains a group of related entities -- i.e. an entity set.

| STUDENT ID | F NAME | L NAME | AGE | STUDENT ADDRESS | COURSE NO. | COURSE NAME | DEPARTMENT LOCATION |
|---|---|---|---|---|---|---|---|
| E12345 | John | Chopples | 37 | 42 George St. | | Finance | Building b1 |
| E12367 | Kent | Danial | 42 | 56 Malcom St. | 24 | CS | Building c2 |
| E54321 | Michal | Blurp | 21 | 5 Bristol St. | 12 | Marketing | Building b2 |
| E5099 | Amber | Rumpel | 60 | 45 Lime St. | 24 | CS | Building c2 |
| E54344 | Lea | John | 34 | 6 Dan St. | | Finance | Building b1 |

Student Table Example



Each column represents an attribute and is identified by a distinct name.


Tables must have an attribute to uniquely identify each row.
The number of columns in a table is known as its degree.


Student Table Example


The schema (structure) for the table:
STUDENT (STUDENT ID, F NAME, L NAME, AGE, STUDENT ADDRESS, COURSE NO., COURSE NAME, DEPARTMENT LOCATION)

Student Table Example


Each entry in the table is called a row (tuple).
Sometimes an entry in the table is called a data record.
The number of tuples in a table is called its cardinality.

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.


Student Table Example


Imagine that this table (or relation) has been defined to help keep track of student details.

STUDENT
| STUDENT ID | F NAME | L NAME | AGE | STUDENT ADDRESS | COURSE NO. | COURSE NAME | DEPARTMENT LOCATION |
|---|---|---|---|---|---|---|---|
| E12345 | John | Chopples | 15 | 42 George St. | | Finance | Building b1 |
| E12367 | Kent | Danial | 18 | 56 Malcom St. | 24 | CS | Building c2 |
| E54321 | Michal | Blurp | 19 | 5 Bristol St. | 12 | Marketing | Building b2 |
| E5099 | Amber | Rumpel | 22 | 45 Lime St. | 24 | CS | Building c2 |
| E54344 | Lea | John | 32 | 6 Dan St. | | Finance | Building b1 |

Student Table Example


STUDENT
| STUDENT ID | F NAME | L NAME | AGE | STUDENT ADDRESS | COURSE NO. |
|---|---|---|---|---|---|
| E12345 | John | Chopples | 15 | 42 George St. | |
| E12367 | Kent | Danial | 18 | 56 Malcom St. | 24 |
| E54321 | Michal | Blurp | 19 | 5 Bristol St. | 12 |
| E5099 | Amber | Rumpel | 22 | 45 Lime St. | 24 |
| E54344 | Lea | John | 32 | 6 Dan St. | |

COURSE
| COURSE NO. | COURSE NAME | COURSE LOCATION |
|---|---|---|
| | Finance | Building b1 |
| 24 | CS | Building c2 |
| 12 | Marketing | Building b2 |

Referential Integrity
Referential integrity is relevant when one place in a data repository needs to refer to something in another place: cross-references.
Referential integrity is achieved when every such referring place contains a successful reference to another place or place-occupant (or no reference at all).
"Successful" here just means that the reference succeeds in specifying some other place(-occupant).
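As a minimal sketch of referential integrity in practice, the snippet below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table and column names (course, student, course_no) are assumptions modelled on the STUDENT/COURSE example, and course number 99 is a made-up value that refers to nothing.

```python
import sqlite3

# Hypothetical two-table layout modelled on the STUDENT/COURSE example.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only checks references when asked to
conn.execute("CREATE TABLE course (course_no INTEGER PRIMARY KEY, course_name TEXT)")
conn.execute(
    "CREATE TABLE student ("
    " student_id TEXT PRIMARY KEY,"
    " course_no INTEGER REFERENCES course(course_no))"
)
conn.execute("INSERT INTO course VALUES (24, 'CS')")

# A successful reference: course 24 exists.
conn.execute("INSERT INTO student VALUES ('E12367', 24)")
# No reference at all (NULL) is also allowed.
conn.execute("INSERT INTO student VALUES ('E12345', NULL)")

# A dangling reference: there is no course 99, so the insert is rejected.
try:
    conn.execute("INSERT INTO student VALUES ('E54321', 99)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(dangling_rejected, student_count)
```

Only the two rows whose reference is successful or absent end up in the table.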

Ways of Doing Cross-Reference


Notice the distinction above between referring to places or to place-occupants: i.e., where or what, respectively.
These are pointing and associative linking, respectively.
Your party-attendance plan for the month would use
pointing if it referred to the party-givers by position, e.g. by page and line number in your address book,
associative linking if it referred to them by means of the party-givers' names.
Labels in a diagram are a means for associative linking between the diagram and the legend (= explanation of the labels, etc.) or other text.

Associative Linking
The notion of a relational database rests heavily on associative linking.
Notice that associative linkages between different places constitute a
specialized sort of needed redundancy.

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Student Table Example


STUDENT
| STUDENT ID | F NAME | L NAME | AGE | STUDENT ADDRESS | COURSE NAME |
|---|---|---|---|---|---|
| E12345 | John | Chopples | 15 | 42 George St. | Finance |
| E12367 | Kent | Danial | 18 | 56 Malcom St. | CS |
| E54321 | Michal | Blurp | 19 | 5 Bristol St. | Marketing |
| E5099 | Amber | Rumpel | 22 | 45 Lime St. | CS |
| E54344 | Lea | John | 32 | 6 Dan St. | Finance |

COURSE
| COURSE NAME | COURSE LOCATION |
|---|---|
| Finance | Building b1 |
| CS | Building c2 |
| Marketing | Building b2 |

What are the disadvantages of using character strings like COURSE NAME as linking values?

Disadvantages of using character strings as linking values

When entering values, you have to ensure exactly the same string of characters on each occasion
and avoid typos, e.g. Finance vs. Finace.
Comparing such complex values is inefficient.
Reduce such problems by:
Using artificial linking values that are simpler in form and easier to make distinct.
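The typo problem can be sketched as follows, again using Python's sqlite3 module as a stand-in for PostgreSQL. The lowercase table and column names are assumptions; the misspelling 'Finace' is the one mentioned above and is entered deliberately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_name TEXT, course_location TEXT)")
conn.execute("CREATE TABLE student (student_id TEXT, course_name TEXT)")
conn.executemany(
    "INSERT INTO course VALUES (?, ?)",
    [("Finance", "Building b1"), ("CS", "Building c2")],
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [
        ("E12345", "Finance"),
        ("E54344", "Finace"),  # deliberate typo in the linking value
        ("E12367", "CS"),
    ],
)

# Link each student to a course by comparing the course-name strings.
matched_ids = [
    row[0]
    for row in conn.execute(
        "SELECT s.student_id FROM student s "
        "JOIN course c ON s.course_name = c.course_name "
        "ORDER BY s.student_id"
    )
]
print(matched_ids)  # the student with the misspelt course name is silently lost
```

The student whose linking value was mistyped no longer links to any course, and nothing reports the problem.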

An Analogy with Programming


Analogous redundancy/anomaly issues arise in program text. E.g.:
If a constant numerical value such as π or g (gravitational acceleration) needs to be used in several places, it is best to give it a name and replicate the name, not the value. This aids consistency and maintainability.
If a sequence of operations needs to be invoked in many different places in the program, package it as a named procedure (function, method, ...).
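A small Python sketch of both points: the value of g is written once under a name, and a repeated calculation is packaged as a function. The function names and the value 9.81 m/s^2 are assumptions chosen for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2, written once and reused by name


def fall_time(height_m):
    """Time in seconds for an object to fall height_m metres from rest."""
    return math.sqrt(2 * height_m / G)


def impact_speed(height_m):
    """Speed in m/s on reaching the ground after falling height_m metres."""
    # Reuses the named procedure instead of repeating its formula.
    return G * fall_time(height_m)


t = fall_time(G / 2)     # falling g/2 metres takes one second
v = impact_speed(G / 2)  # so the speed at the end of that fall equals g
print(t, v)
```

If a different value of g were needed, only the one named definition would change.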

A relational database developer refers to a data record as
(A) a criterion.
(B) a relation.
(C) a tuple.
(D) an attribute.

Specifying the location of a particular piece of information in a book, e.g. by page and line number, is considered as
(A) associative linking.
(B) pointing.
(C) degree.
(D) tuple.

The number of tuples in a table is called its
(A) cardinality
(B) degree
(C) attribute
(D) relation

Tables must have ------------------- to uniquely identify each row
(A) a cardinality
(B) a tuple
(C) an attribute
(D) a relation

Problems with that Table


| NAME | ADDRESS | PHONES | BIRTHDAY |
|---|---|---|---|
| Babloop Porkypasta | 107 Worm Drive, Hedgebarton, Birmngham, B15 9ZZ | 0121-944-5677; 07979-888777 | 11 January 1969 |
| Coriolanus Zebedee O'Crackpotham | The Wellyboots, Boring-under-Mosswood, Berks, HP11 1XX | 016789-997710 | |
| Johnny | Next to the Tescos in Upper Street | H: 020-7111-2222; W: 020-7111-2255; M: 07887-842657 | |
| Full Monty chip shop | Harborne | ??? | Oct 05 |
| Hilary R. Clinton (grr!) | The Old Black House, 15768 Aplanalp St., Las Cruces, NM 880011, USA | ex-dir | 16 Sep? (refused to tell me how old she was) |

Problems with that Table


Although that table illustrates the sort of table used in databases in some sense, it has many tricky features:
Empty entries: what's the interpretation?
Spelling error (Birmngham)
Names/addresses of different forms (perhaps unavoidably)
Different numbers of alternatives in different cells
Different interpretations of the birthday field (per year, or when born, or when shop opened)
Vague entries ("next to the Tescos in Upper St."; "Harborne")
Expressed uncertainty (the question marks, alone or attached)
Additional comments ("grr!", "refused ...")
Exceptional entry types ("ex-dir", and the contents of the chip-shop row)

Restrictions on Database Tables: Overall Structure
Regular overall shape: rows all same length, similarly columns.
No division into different regions (with a certain exception).
No labels for rows, as opposed to columns.
Mostly no significance to the order of rows.
No additional comments, footnotes, etc.

Restrictions on Database Tables: Nature of Entries
All cells in any one column are given the same intuitive interpretation.
Each cell's item is restricted to a pre-specified, usually fairly simple value range (data type), and all cells in any given column are restricted to the same data type.
No exceptional entries, with one exception: empty entries.
One data item per cell (but it can be a variable-length character string, containing anything).
Uncertainty and vagueness markers are not supported.

Extra, Crucial Restriction (on the main tables)
No row can be repeated in a table. (I.e., no two rows can contain
exactly the same values.)
This is equivalent to saying:

Rows are uniquely determined (picked out) by the values in some set
of columns (possibly the whole set, but could be fewer).
That is, if you imagine some values for those columns, there is at
most one row that has exactly those values in those columns.
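A minimal sketch of this restriction, using Python's sqlite3 module as a stand-in for PostgreSQL. The table contact and the choice of (last_name, first_name) as the identifying set of columns are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The PRIMARY KEY names the set of columns whose values pick out at most one row.
conn.execute(
    "CREATE TABLE contact ("
    " last_name TEXT, first_name TEXT, home_phone TEXT,"
    " PRIMARY KEY (last_name, first_name))"
)
conn.execute("INSERT INTO contact VALUES ('Porkypasta', 'Babloop', '0121-944-5677')")

# A second row with the same values in the identifying columns is refused,
# even though its phone number differs.
try:
    conn.execute("INSERT INTO contact VALUES ('Porkypasta', 'Babloop', '07979-888777')")
    duplicate_key_rejected = False
except sqlite3.IntegrityError:
    duplicate_key_rejected = True

rows_for_key = conn.execute(
    "SELECT COUNT(*) FROM contact "
    "WHERE last_name = 'Porkypasta' AND first_name = 'Babloop'"
).fetchone()[0]
print(duplicate_key_rejected, rows_for_key)
```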

A table closer to what might be in a database

| LAST Name | FIRST Name | MI | ADDRESS | Home Phone | Mobile | B year | B day |
|---|---|---|---|---|---|---|---|
| Porkypasta | Babloop | | 107 Worm Drive, Hedgebarton, Birmngham, B15 9ZZ | 0121-9445677 | 07979888777 | 1969 | Jan 11 |
| O'Crackpotham | Coriolanus | | The Wellyboots, Boring-under-Mosswood, Berks, HP11 1XX | 016-789997710 | | 1999 | May 20 |
| Delfino | Johnny | | Next to the Tescos in Upper Street | 020-71112222 | 07887842657 | 1957 | June 1 |
| Clinton | Hilary | | The Old Black House, 15768 Aplanalp St., Las Cruces, NM 880011, USA | 0121-9545646 | | 1997 | Sep 16 |

Summary
Database design defines the database structure.
A DBMS enforces data integrity and helps eliminate redundancy.
A relational database rests heavily on associative linking rather than pointing.
A DBMS imposes some restrictions on database tables.

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 2 (Wednesday)
Introduction to Database and Database Management System
Shereen Fouad
Teaching Fellow in School of Computer Science

Reminder of previous lecture


Database technology has several practical applications.
A database is a collection of logically coherent, interrelated data together with a description of this data.
Information is the result of processing data to reveal its meaning.

Overview
More about Databases.
Database Management Systems.
Database Systems.
Types of Databases.
Problems with File System Data Management

A Closer Look at a Database Definition

A database is a structured body of information about entities of various specific, precisely defined types.
Generally there are many entities of at least some of the types.
The entities are generally in various specific types of relationship to each other.
Each entity has a specific set of (intrinsic) attributes of interest. Their values are generally of fairly basic, simple sorts (e.g., numbers, dates, names).
The entities of a given type are typically not in any special order other than an order arising naturally from their attributes.

A Closer Look at a Database Definition (cont.)

The individual data elements held are directly meaningful and interesting to the users.
The data held and retrieved is generally of exact form (no vagueness expressed) and of definite form (no uncertainty expressed or expected).
The operations provided to users for extracting, inserting and updating data are of conceptually straightforward sorts, not requiring elaborate reasoning, problem-solving or analysis.
However, aggregate/overview/statistical information (counts, averages, maxima, etc.) often needs to be computed from the data.

Types of Databases
Databases can be classified according to various aspects, for
example:

1. Number of users
Single-user database: supports only one user at a time
Desktop database
Multi-user database: supports multiple users at the same time
Workgroup database
Enterprise database
2. Database location(s)
Centralized database: data located at a single site
Distributed database: data distributed across several different sites

Centralized vs. Distributed database


[Figure: clients and databases in a centralized arrangement vs. a distributed arrangement]

Types of Databases (cont.)

3. Time sensitivity
Operational database: supports a company's day-to-day operations
Online Transaction Processing
Analytical database: stores data used for tactical or strategic decisions
Data warehouse
4. Type of data stored
General purpose database
Discipline specific database

Can you think of examples here?

Database Management System (DBMS)


A Database Management System (DBMS) is a software system designed to:
Define and create the database structure
Manage and manipulate data
Control access to the database

The DBMS is the intermediary between the user and the database.
Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.

DBMS Languages
The Data Definition Language (DDL)
used by the Database Administrator (DBA)
used to describe/create the external and logical schemas

The Data Manipulation Language (DML)
used to retrieve, insert, delete and modify data
used interactively or embedded in a programming language
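The split between the two languages can be sketched with SQL statements embedded in a Python program, using the standard sqlite3 module as a stand-in for PostgreSQL. The table student and its columns are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: describes/creates the structure (schema) of the database.
conn.execute(
    "CREATE TABLE student (student_id TEXT PRIMARY KEY, f_name TEXT, age INTEGER)"
)

# DML: inserts, modifies, deletes and retrieves the data held in that structure.
conn.execute("INSERT INTO student VALUES ('E12345', 'John', 37)")
conn.execute("INSERT INTO student VALUES ('E54321', 'Michal', 21)")
conn.execute("UPDATE student SET age = 38 WHERE student_id = 'E12345'")
conn.execute("DELETE FROM student WHERE student_id = 'E54321'")
remaining = conn.execute("SELECT student_id, f_name, age FROM student").fetchall()
print(remaining)
```

The same statements could equally be typed interactively at a database prompt; here they are embedded in a host programming language.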


Database System
An organization of components that control the collection, storage, management and use of data.
Five major parts of a database system:
Hardware
Software
People
Procedures
Data

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.

The DBMS acts as an interface between
(A) Data and Databases
(B) Database Application and Database
(C) Database and SQL
(D) Database and Users

DML is provided for
(A) Description of logical structure of database.
(B) Addition of new structures in the database system.
(C) Manipulation & processing of database.
(D) Definition of physical structure of database system.

Which of the following are the properties of entities?
(A) Groups
(B) Table
(C) Attributes
(D) Switchboards

A step back in time: Files and File Systems


University File System

| Departments | Files |
|---|---|
| Academic | Student ID, Student Name, Courses |
| Finance | Student ID, Student Name, Student fees |
| Student Services | Student ID, Student Name, Accommodation No. |

Problems with File System Data Management


1. Data Redundancy.
Replicating data in different places in a data repository.
(E.g. student data is replicated in several department files.)
Data inconsistency: different and conflicting versions of the same data occur at different places.
Data anomalies: abnormalities that arise when changes to redundant data are not all made correctly:
Update anomalies
Insertion anomalies
Deletion anomalies

Redundancy implies that if you want to modify/delete a student name, you need to:
know whether there is replication, or check for possible replications
go to the effort of repeating changes when the student name is replicated
avoid errors in such repeated changes.
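An update anomaly of this kind can be sketched in plain Python, with three dictionaries standing in for the three department files. The field values (fees, accommodation number, the new name) are made up for illustration.

```python
# Three separate "files", each holding its own copy of the student's name.
academic = {"E12345": {"name": "John Chopples", "courses": ["Finance"]}}
finance = {"E12345": {"name": "John Chopples", "fees": 9000}}
services = {"E12345": {"name": "John Chopples", "accommodation_no": 12}}

# The name is changed in one file only; the other copies are forgotten.
academic["E12345"]["name"] = "John Chopples-Smith"

# Collect the distinct versions of the name now held across the files.
names = {f["E12345"]["name"] for f in (academic, finance, services)}
inconsistent = len(names) > 1
print(sorted(names), inconsistent)
```

After the partial update the repository holds two conflicting versions of the same fact, which is the data inconsistency described above.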


Problems with File System Data Management


2. Structural and data dependence.
Unlike databases, which store data as well as metadata (the catalog), file systems store data only.
The structure of the data is stored in the application that accesses the file.
Structural dependence: changing the file structure requires changing the application that accesses that file.
E.g. adding a student DoB field.
Data dependence: data access changes when data storage characteristics change.
E.g. changing a data field from integer to character.
Structural and data dependence make file systems very difficult to manage (high maintenance).

Other problems
Poor design and lack of standardized data modeling
Security features difficult to program
Requires extensive programming to perform ad hoc queries
System administration complex and difficult
Difficult and expensive to integrate various applications
Difficult to have multiple people or applications working on the same file at the same time

Alternative solution: Database System Application
Departments (Academic, Finance, Student Services) -> DBMS -> Database (Data + Metadata)

Advantages of the DBMS


Improved data sharing
Improved data security
Better data integration
Minimized data inconsistency
Improved data access
Improved decision making
Increased end-user productivity



When would it make sense not to use a database system?


It depends on the data application at hand:
if
you are designing a small-scale data application and you won't really suffer from the limitations above
then
using a collection of files may be a better solution, because of the increased cost and overhead of purchasing and maintaining a DBMS.

Data independence means
(A) data is defined separately and not included in programs.
(B) programs are not dependent on the physical attributes of data.
(C) programs are not dependent on the logical attributes of data.
(D) both (B) and (C).

An advantage of the database management approach is
(A) data is dependent on programs.
(B) data redundancy increases.
(C) data is integrated and can be accessed by multiple programs.
(D) none of the above.

The language used in application programs to request data from the DBMS is referred to as the
(A) DML
(B) DDL
(C) VDL
(D) SDL

Summary
Data are usually stored in a database.
Databases can be classified into different types according to various aspects.
A DBMS implements a database and manages its contents.
A database system is the combination of a database and a DBMS.
File system data management suffers from several limitations when compared to database systems.

Announcement
The Week 2 exercise (non-technical and non-assessed) is now available on the module Canvas website.
Hand in an electronic copy via Canvas (submission is optional, for those seeking feedback).
The hand-out for getting started with PostgreSQL is now available on Canvas.

Lab sessions are starting this week (Thursday from 2 pm to 5 pm in LG04).
Documents are now available on the module Canvas website.

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 3 (Friday)
Advanced SQL

Shereen Fouad
Teaching Fellow

School of Computer Science


University of Birmingham, UK

Simple Queries Review

READING A TABLE
SELECT * FROM EMPLOYEE;
SELECT FNAME, LNAME FROM EMPLOYEE;

Source of Image: Fundamentals of Database Systems (6th Edition) by Ramez Elmasri and Shamkant B. Navathe

DISTINCT OUTPUT VALUES
SELECT DISTINCT SALARY FROM EMPLOYEE;

RENAMING ATTRIBUTES
SELECT DISTINCT SALARY AS "MONTHLY PAYMENT" FROM EMPLOYEE;

COMPUTED ATTRIBUTES
SELECT SALARY AS "USD", (SALARY*0.78) AS "EUROS" FROM EMPLOYEE;

SIMPLE-COMPLEX CONDITIONS (AND, OR, NOT)
SELECT FNAME, LNAME, SUPER_SSN FROM EMPLOYEE
WHERE DNO = 4 AND SEX = 'F' AND NOT (SUPER_SSN = 123456789);

Simple Queries Review (cont.)

PARTIAL MATCHING (LIKE % _)
SELECT FNAME, LNAME, ADDRESS FROM EMPLOYEE
WHERE ADDRESS LIKE '%TX%' AND FNAME LIKE '_A%';
SELECT FNAME, LNAME FROM EMPLOYEE WHERE LNAME LIKE 'W%';

Source of Image: Fundamentals of Database Systems (6th Edition) by Ramez Elmasri and Shamkant B. Navathe

ASC-DESC ORDERING COMBINATIONS
SELECT FNAME, SALARY, DNO FROM EMPLOYEE ORDER BY SALARY;
SELECT FNAME, SALARY, DNO FROM EMPLOYEE ORDER BY SALARY DESC;
SELECT FNAME, SALARY, DNO FROM EMPLOYEE ORDER BY DNO ASC, SALARY DESC;

CHECKING FOR NULLS
SELECT FNAME, LNAME FROM EMPLOYEE WHERE SUPER_SSN IS NULL;

BETWEEN
SELECT * FROM EMPLOYEE WHERE (SALARY BETWEEN 30000 AND 40000);
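The pattern characters and BETWEEN from the review can be tried out with a small sketch, using Python's sqlite3 module as a stand-in for PostgreSQL. The four employee rows are invented for illustration, and all text is upper case so that the result does not depend on how a given system treats letter case in LIKE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fname TEXT, lname TEXT, address TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        ("JAMES", "WONG", "HOUSTON, TX", 55000),
        ("ALICIA", "ZELAYA", "SPRING, TX", 25000),
        ("RAMESH", "NARAYAN", "HUMBLE, TX", 38000),
        ("MARIA", "WALLACE", "LEEDS, UK", 31000),
    ],
)

# '_' matches exactly one character and '%' matches any run of characters,
# so '_A%' means "the second letter is A".
second_letter_a = [
    row[0]
    for row in conn.execute(
        "SELECT fname FROM employee "
        "WHERE address LIKE '%TX%' AND fname LIKE '_A%' ORDER BY fname"
    )
]

# BETWEEN includes both end points; DESC puts the highest salary first.
mid_salaries = [
    row[0]
    for row in conn.execute(
        "SELECT fname FROM employee "
        "WHERE salary BETWEEN 30000 AND 40000 ORDER BY salary DESC"
    )
]
print(second_letter_a, mid_salaries)
```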

Overview
Aggregate Functions
GROUP BY and HAVING
Nested Query
ANY and ALL
EXISTS and NOT EXISTS

Aggregate Functions
Summary information can easily be extracted from a table using one of the aggregate functions COUNT, MAX, MIN, AVG, SUM, STDDEV.
Example:
Find the sum of the salaries of all employees, the maximum salary, the minimum salary, and the average salary.
SELECT SUM(Salary), MAX(Salary), MIN(Salary), AVG(Salary)
FROM EMPLOYEE;

Retrieve the total number of employees working in department number 5.
SELECT COUNT(*)
FROM EMPLOYEE
WHERE DNO = 5;
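The two queries above can be run on a few made-up rows, as sketched here with Python's sqlite3 module standing in for PostgreSQL. The employee names, salaries and department numbers are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fname TEXT, salary INTEGER, dno INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("John", 30000, 5),
        ("Franklin", 40000, 5),
        ("Alicia", 25000, 4),
        ("Ramesh", 38000, 5),
    ],
)

# One row of summary values computed over the whole table.
totals = conn.execute(
    "SELECT SUM(salary), MAX(salary), MIN(salary), AVG(salary) FROM employee"
).fetchone()

# WHERE restricts the rows before they are counted.
dept5_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE dno = 5"
).fetchone()[0]
print(totals, dept5_count)
```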

Count the number of unique salary values in the database.


Can you work out the answer???

GROUP BY
Allows for categorical output.
Apply aggregate operators to each of several groups of tuples.
First select these rows
Syntax:
SELECT columnlist
FROM tablelist
[WHERE conditionlist]
[GROUP BY columnlist]
[HAVING conditionlist];

GROUP BY
For each department, retrieve the department number, the number
of employees in the department, and their average salary.
SELECT Dno, COUNT (*), AVG (Salary)
FROM EMPLOYEE
GROUP BY Dno;

Note that the GROUP BY clause specifies the grouping attributes, which should also appear in the SELECT clause, so that the value resulting from applying each aggregate function to a group of tuples appears along with the value of the grouping attribute(s).
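A sketch of the per-department query on invented data, using Python's sqlite3 module as a stand-in for PostgreSQL. The rows and the added ORDER BY (used only to make the output order predictable) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fname TEXT, salary INTEGER, dno INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("John", 30000, 5),
        ("Franklin", 40000, 5),
        ("Alicia", 25000, 4),
        ("Ramesh", 38000, 5),
    ],
)

# One output row per department: the grouping attribute plus the aggregates
# computed over the rows of that group.
per_department = conn.execute(
    "SELECT dno, COUNT(*), AVG(salary) FROM employee GROUP BY dno ORDER BY dno"
).fetchall()
print(per_department)
```

Each result row pairs a department number with the count and average for that department alone.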

Source of Image: Fundamentals of Database Systems (6th Edition) by Ramez Elmasri and Shamkant B. Navathe

Grouping Data on the fundamentals database


What is the average mark for students in each individual course? (Table: allmarks03)
SELECT bc AS "Course Code", AVG(mark) AS "Average mark"
FROM allmarks03
GROUP BY bc;

GROUP BY and HAVING


What if we want to exclude from our summary table all those courses which had fewer than 5 students enrolled in them?
The SQL HAVING clause is used in combination with the GROUP BY clause to restrict the returned groups to only those for which the condition is TRUE.
SELECT bc AS "Course Code", AVG(mark) AS "Average mark"
FROM allmarks03
GROUP BY bc
HAVING COUNT(*) >= 5;
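The effect of HAVING can be sketched on a small invented version of allmarks03, using Python's sqlite3 module as a stand-in for PostgreSQL. The column sid, the course codes 'db' and 'ai', and all marks are assumptions; only bc and mark appear in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE allmarks03 (sid INTEGER, bc TEXT, mark INTEGER)")

# Course 'db' has five students; course 'ai' has only two.
rows = [(i, "db", m) for i, m in enumerate([50, 60, 70, 80, 90], start=1)]
rows += [(6, "ai", 40), (7, "ai", 60)]
conn.executemany("INSERT INTO allmarks03 VALUES (?, ?, ?)", rows)

# Without HAVING every course gets a summary row.
all_courses = conn.execute(
    "SELECT bc, AVG(mark) FROM allmarks03 GROUP BY bc ORDER BY bc"
).fetchall()

# With HAVING, groups of fewer than 5 rows are dropped after grouping.
big_courses = conn.execute(
    "SELECT bc, AVG(mark) FROM allmarks03 "
    "GROUP BY bc HAVING COUNT(*) >= 5 ORDER BY bc"
).fetchall()
print(all_courses, big_courses)
```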


STUDENT
| STUDENT ID | FNAME | LNAME | AGE | STUDENT ADDRESS | COURSE_No |
|---|---|---|---|---|---|
| E12345 | John | Chopples | 24 | 42 George St. | |
| E12367 | Kent | Danial | 16 | 56 Malcom St. | 24 |
| E54321 | Michal | Blurp | 21 | 5 Bristol St. | 12 |
| E5099 | Amber | Rumpel | 25 | 45 Lime St. | 24 |
| E54344 | Lea | John | 20 | 6 Dan St. | |

Find the age of the youngest student aged 20 or over, for each course with at least 2 such students.
SELECT COURSE_No, MIN(AGE)
FROM STUDENT
WHERE AGE >= 20
GROUP BY COURSE_No
HAVING COUNT(*) > 1;
Only the grouping attribute and aggregates can appear in the SELECT clause, so the students' names cannot be listed by this query.

WHERE and HAVING


WHERE refers to the rows of tables, and so cannot use aggregate functions.
HAVING refers to the groups of rows, and so cannot use columns which are not in the GROUP BY, except inside aggregate functions.
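The order in which the two clauses act can be sketched as follows, with Python's sqlite3 module standing in for PostgreSQL. The student rows and course numbers are invented for illustration; the exact error raised for an aggregate inside WHERE differs between systems, so the sketch only checks that the query is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (fname TEXT, age INTEGER, course_no INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        ("John", 24, 30),
        ("Kent", 16, 24),
        ("Michal", 21, 12),
        ("Amber", 25, 24),
        ("Lea", 20, 30),
        ("Omar", 22, 24),
    ],
)

# WHERE removes individual rows first (Kent, aged 16), then the rows are grouped,
# then HAVING removes whole groups (course 12 has only one remaining row).
youngest_per_course = conn.execute(
    "SELECT course_no, MIN(age) FROM student "
    "WHERE age >= 20 "
    "GROUP BY course_no "
    "HAVING COUNT(*) > 1 "
    "ORDER BY course_no"
).fetchall()

# An aggregate function inside WHERE is refused.
try:
    conn.execute("SELECT course_no FROM student WHERE COUNT(*) > 1")
    aggregate_in_where_rejected = False
except sqlite3.Error:
    aggregate_in_where_rejected = True
print(youngest_per_course, aggregate_in_where_rejected)
```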

Nested Query (Subquery)


Find the first name and age of the oldest employee?

EMPLOYEE
| F_NAME | L_NAME | PHONE | EMPL. ID | AGE | SALARY |
|---|---|---|---|---|---|
| John | Chopples | 0121-414-3816 | E22561 | 37 | 23,000 |
| Alex | Blurp | 01600-719975 | E85704 | 21 | 21,000 |
| Anbreen | Rumpel | 07970-852657 | E22561 | 88 | 40,000 |

Nested Query (Subquery)


Find the first name and age of the oldest employee?
SELECT F_NAME, MAX(AGE)
FROM EMPLOYEE;

What will the result be?

Result
| F_NAME | AGE |
|---|---|
| John | 88 |
| Alex | 88 |
| Anbreen | 88 |

Not legal syntax; no other columns are allowed in the SELECT clause without a GROUP BY clause.
Remember: aggregate functions can only be used in the SELECT clause or in a HAVING clause.

Nested Query (Subquery)


Find the first name and age of the oldest employee??
SELECT F_NAME, AGE
FROM EMPLOYEE
WHERE AGE =
(SELECT MAX (AGE)
FROM EMPLOYEE)

The inner query is executed first: find the maximum age.
And then the outer query finds the employee(s) of that age.

In nested queries the WHERE clause can itself contain a SQL query!
The FROM and HAVING clauses can contain one too.
The above subquery returns a single value.
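
As a minimal sketch, the nested query can be run with Python's built-in sqlite3 module. The in-memory table below holds only the three columns needed here, filled with the names and ages from the EMPLOYEE slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (F_NAME TEXT, L_NAME TEXT, AGE INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("John", "Chopples", 37), ("Alex", "Blurp", 21), ("Anbreen", "Rumpel", 88)],
)

# The inner query yields a single value (the maximum age);
# the outer query then keeps the rows whose AGE equals that value.
oldest = conn.execute(
    """
    SELECT F_NAME, AGE
    FROM EMPLOYEE
    WHERE AGE = (SELECT MAX(AGE) FROM EMPLOYEE)
    """
).fetchall()
print(oldest)  # [('Anbreen', 88)]
```

If several employees shared the maximum age, all of them would be returned.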

Nested Query (Subquery)


Often a subquery will return a set of values rather than a single
value
You can't directly compare a single value to a set
Options
IN / NOT IN: checks to see if a value is in the set
ALL / ANY: checks to see if a relationship holds for every/at least one member of the
set
EXISTS / NOT EXISTS: checks to see if the set is empty or not

Nested Query (Subquery)


Find the first and last names of employees who have a registered phone number.
EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John    | Chopples |          | E22561      | 37
668768  | Alex    | Blurp    |          | E85704      | 21
978098  | Anbreen | Rumpel   |          | E22561      | 70

PHONE_NUMBERS
PHONE_ID | PHONE         | TYPE   | STATUS
         | 0121-414-3816 | office | OK
         | 01600-719975  | home   | FAULT
         | 0121-440-5677 | home   | OK
         | 07970-852657  | mobile | UNPAID

Can you work out the answer ??

Nested Query (Subquery)


Find the first and last names of employees who have a registered phone number.
SELECT F_NAME, L_NAME
FROM EMPLOYEE
WHERE PHONE_ID IN
(SELECT PHONE_ID
FROM PHONE_NUMBERS);

Result
F_NAME  | L_NAME
John    | Chopples
Alex    | Blurp
Anbreen | Rumpel

Find the first and last names of employees who don't have a registered phone number.
SELECT F_NAME, L_NAME
FROM EMPLOYEE
WHERE PHONE_ID NOT IN
(SELECT PHONE_ID
FROM PHONE_NUMBERS);

Result
Empty (no rows are returned)
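
A minimal sketch of both queries with Python's built-in sqlite3 module, comparing each employee's PHONE_ID against the PHONE_ID column of PHONE_NUMBERS. The numeric PHONE_ID values (1 to 4) are assumptions made for illustration, since the slides do not show them legibly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PHONE_NUMBERS (PHONE_ID INTEGER PRIMARY KEY, PHONE TEXT, TYPE TEXT, STATUS TEXT);
    CREATE TABLE EMPLOYEE (Empl_ID INTEGER PRIMARY KEY, F_NAME TEXT, L_NAME TEXT, PHONE_ID INTEGER);
    -- PHONE_ID values are hypothetical
    INSERT INTO PHONE_NUMBERS VALUES
        (1, '0121-414-3816', 'office', 'OK'),
        (2, '01600-719975', 'home', 'FAULT'),
        (3, '0121-440-5677', 'home', 'OK'),
        (4, '07970-852657', 'mobile', 'UNPAID');
    INSERT INTO EMPLOYEE VALUES
        (798687, 'John', 'Chopples', 1),
        (668768, 'Alex', 'Blurp', 4),
        (978098, 'Anbreen', 'Rumpel', 2);
    """
)

# IN: keep employees whose PHONE_ID appears in the set returned by the subquery.
registered = conn.execute(
    """
    SELECT F_NAME, L_NAME FROM EMPLOYEE
    WHERE PHONE_ID IN (SELECT PHONE_ID FROM PHONE_NUMBERS)
    ORDER BY Empl_ID
    """
).fetchall()

# NOT IN: keep employees whose PHONE_ID does not appear in that set.
unregistered = conn.execute(
    """
    SELECT F_NAME, L_NAME FROM EMPLOYEE
    WHERE PHONE_ID NOT IN (SELECT PHONE_ID FROM PHONE_NUMBERS)
    """
).fetchall()

print(registered)
print(unregistered)  # an empty list: the query returns no rows
```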

ANY and ALL


ANY and ALL compare a single value v to a set of values V, using a
comparison operator such as =, >, >=, <, <= or <>.
v = ALL V returns TRUE if the value v is equal to all values in the set V.
v = ANY V returns TRUE if the value v is equal to some value in the
set V, and is hence equivalent to IN.
With the other operators, e.g. v > ALL V or v > ANY V, the comparison must
hold for every value in V (ALL) or for at least one value in V (ANY).

ALL
List the names of employees whose salary is greater than the salary of
all the employees in department 4:
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ALL
( SELECT Salary
FROM EMPLOYEE
WHERE Dno=4 );

Source of Image: Fundamentals of Database Systems (6th Edition) by Ramez Elmasri and Shamkant B. Navathe

ANY
Find the names of employees who earn more than someone else.
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ANY
( SELECT Salary
FROM EMPLOYEE);

Source of Image: Fundamentals of Database Systems (6th Edition) by Ramez Elmasri and Shamkant B. Navathe
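
The meaning of > ALL and > ANY can be sketched in plain Python, where the built-in all() and any() play the same roles. This is only an illustration of the semantics (it ignores NULL values, and SQLite itself does not provide ANY/ALL); the salary figures and the department 4 salaries are hypothetical.

```python
def greater_than_all(v, values):
    """v > ALL values: the comparison holds for every member of the set."""
    return all(v > x for x in values)


def greater_than_any(v, values):
    """v > ANY values: the comparison holds for at least one member of the set."""
    return any(v > x for x in values)


# Hypothetical data for illustration only.
salaries = {"John": 23000, "Alex": 21000, "Anbreen": 40000}
dept4_salaries = [21000, 23000]

more_than_all_dept4 = [
    name for name, s in salaries.items() if greater_than_all(s, dept4_salaries)
]
more_than_someone = [
    name for name, s in salaries.items() if greater_than_any(s, salaries.values())
]

print(more_than_all_dept4)  # ['Anbreen']
print(more_than_someone)    # ['John', 'Anbreen']
```

Alex is missing from the second list because nobody earns less than Alex, so the comparison holds for no member of the set.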

EXISTS and NOT EXISTS


Used to check whether the result of a nested query is empty (contains
no tuples) or not.
The result of EXISTS is a Boolean value TRUE if the nested query result
contains at least one tuple, or FALSE if the nested query result
contains no tuples
SELECT <columns>
FROM <tables>
WHERE EXISTS <set>

SELECT <columns>
FROM <tables>
WHERE NOT EXISTS <set>

EXISTS and NOT EXISTS


Retrieve the names of employees who have no dependents.
SELECT Fname, Lname
FROM EMPLOYEE
WHERE NOT EXISTS
( SELECT *
FROM DEPENDENT
WHERE Ssn=Essn );

Source of Image: Fundamentals of Database Systems (6th Edition) by Ramez Elmasri and Shamkant B. Navathe
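
A minimal sketch of the NOT EXISTS query with Python's built-in sqlite3 module. The tables are cut down to the columns the query needs, and every row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Fname TEXT, Lname TEXT);
    CREATE TABLE DEPENDENT (Essn TEXT, Dependent_name TEXT);
    -- hypothetical rows
    INSERT INTO EMPLOYEE VALUES
        ('111', 'John', 'Chopples'),
        ('222', 'Alex', 'Blurp'),
        ('333', 'Anbreen', 'Rumpel');
    INSERT INTO DEPENDENT VALUES ('111', 'Sam'), ('111', 'Kim');
    """
)

# The inner query is re-evaluated for each EMPLOYEE row (it refers to that
# row's Ssn); the row is kept only when the inner result is empty.
no_dependents = conn.execute(
    """
    SELECT Fname, Lname
    FROM EMPLOYEE
    WHERE NOT EXISTS (SELECT * FROM DEPENDENT WHERE Ssn = Essn)
    ORDER BY Ssn
    """
).fetchall()
print(no_dependents)  # [('Alex', 'Blurp'), ('Anbreen', 'Rumpel')]
```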

References
Some of the SQL Examples presented in this lecture have been
obtained from the following text book:
Fundamentals of Database Systems (6th Edition)
by Ramez Elmasri and Shamkant B. Navathe

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 3 (Wednesday)
Introduction to SQL

Shereen Fouad
Teaching Fellow

Reminder of previous lecture


The importance of database Table Design.
Cross-references between places in a data repository (referential
integrity).

Associative linking versus pointing.

Remember:
Associative Linking
This is how the tables are linked.

Coordination between Tables


EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John    | Chopples |          | E22561      | 37
668768  | Alex    | Blurp    |          | E85704      | 21
978098  | Anbreen | Rumpel   |          | E22561      | 70

PHONE_NUMBERS
PHONE_ID | PHONE         | TYPE   | STATUS
         | 0121-414-3816 | office | OK
         | 01600-719975  | home   | FAULT
         | 0121-440-5677 | home   | OK
         | 07970-852657  | mobile | UNPAID

EMPLOYER
EMPLOYER_ID | EMPLOYER                  | ADDRESS                | NUM. EMPLS | SECTOR
E48693      | BT                        | BT House, London,      | 1,234,5678 | Private TCOM
E85704      | Monmouth School for Girls | Hereford Rd, Monmouth, | 245        | Private 2E
E22561      | University of Birmingham  | Edgbaston Park Rd,     | 4023       | Public HE

Overview
The main categories of SQL commands
The basic DDL and DML commands
How to use SQL to query a database for useful information

Introduction to Structured Query Language (SQL)


SQL is a specially designed programming language for managing data
stored in a Relational Database Management System (RDBMS)
SQL functions fit into two broad categories:
The Data Definition Language (DDL):
used to describe/create database schema

The Data Manipulation Language (DML):


used for selecting, inserting, deleting and updating data items in a database

Data Definition Commands


CREATE
Creating a new database object. E.g. empty table of a particular shape (mainly, particular
column names and value-types for the columns)
DROP
Deleting an existing database object.
ALTER
Changing the shape of an existing database object (e.g., adding/deleting a column in table,
or changing the type of a column)
RENAME
Giving a new name for an existing database object.
Referential integrity statements
Need to ensure consistency between related tables. E.g.:
Deletion of something in one table may require deletions from or other modifications to
other tables.

Data Manipulation Commands


INSERT
Adding a row or rows to a table
DELETE
Deleting a row or rows (question: how are they identified?)

UPDATE
Updating values in an individual cell (column specified by name; but how is the row identified?)
SELECT
Retrieving values from an individual cell; doing calculations on them
Retrieving the values in the cells in some or all columns for some or all rows
Calculating statistics concerning values in particular columns across all rows, a subset of rows,
or several subsets of rows (count, max, min, average, standard deviation, etc.)
Ordering rows in different ways in displays of a table.
COMMIT
Save a database transaction.
ROLLBACK
Rollback a database transaction.
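
The commands above can be tried out with a minimal sketch that uses Python's built-in sqlite3 module. The table is a cut-down EMPLOYEE with illustrative values; the rows to update or delete are identified by a WHERE condition on the key column, which is one answer to the "how identified?" questions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: create a table, then change its shape.
conn.execute("CREATE TABLE EMPLOYEE (Empl_ID INTEGER PRIMARY KEY, F_NAME TEXT, AGE INTEGER)")
conn.execute("ALTER TABLE EMPLOYEE ADD COLUMN SALARY INTEGER")

# DML: insert rows and save the transaction.
conn.execute("INSERT INTO EMPLOYEE VALUES (798687, 'John', 37, 23000)")
conn.execute("INSERT INTO EMPLOYEE VALUES (668768, 'Alex', 21, 21000)")
conn.commit()

# DML: the WHERE clause identifies which row(s) to change or remove.
conn.execute("UPDATE EMPLOYEE SET SALARY = 25000 WHERE Empl_ID = 798687")
conn.execute("DELETE FROM EMPLOYEE WHERE Empl_ID = 668768")

# ROLLBACK undoes everything since the last COMMIT.
conn.rollback()
after_rollback = conn.execute(
    "SELECT Empl_ID, SALARY FROM EMPLOYEE ORDER BY Empl_ID"
).fetchall()
print(after_rollback)  # [(668768, 21000), (798687, 23000)]

# DDL: delete the table itself.
conn.execute("DROP TABLE EMPLOYEE")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining_tables)  # []
```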

Example Database
EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John    | Chopples |          | E22561      | 37
668768  | Alex    | Blurp    | 4        | E85704      | 21
978098  | Anbreen | Rumpel   | 2        | E22561      | 70

PHONE_NUMBERS
PHONE_ID | PHONE         | TYPE   | STATUS
         | 0121-414-3816 | office | OK
         | 01600-719975  | home   | FAULT
         | 0121-440-5677 | home   | OK
         | 07970-852657  | mobile | UNPAID

EMPLOYER
EMPLOYER_ID | EMPLOYER                  | ADDRESS                | NUM. EMPLS | SECTOR
E48693      | BT                        | BT House, London.      | 1,234,5678 | Private TCOM
E85704      | Monmouth School for Girls | Hereford Rd, Monmouth. | 245        | Private 2E
E22561      | University of Birmingham  | Edgbaston Park Rd.     | 4023       | Public HE

SELECT Queries

EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John    | Chopples |          | E22561      | 37
668768  | Alex    | Blurp    |          | E85704      | 21
978098  | Anbreen | Rumpel   |          | E22561      | 88

SELECT
Used to list contents of table
Syntax:
SELECT column_list
FROM table_name;

column_list: represents one or more attributes, separated by commas (projection)
table_name: one or more tables to be queried, separated by commas

Listing table rows:


Asterisk can be used as wildcard character to list all attributes (columns)
Example: find all employees:
SELECT *
FROM EMPLOYEE

or

SELECT *
FROM EMPLOYEE E

EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John    | Chopples |          | E22561      | 37
668768  | Alex    | Blurp    |          | E85704      | 21
978098  | Anbreen | Rumpel   |          | E22561      | 88

What if I want to ask the database to give us the number of records in
the Employee table?
SELECT count(*)
FROM EMPLOYEE

Result
3

Listing Unique Values


PHONE_NUMBERS
PHONE         | TYPE   | STATUS
0121-414-3816 | office | OK
01600-719975  | home   | FAULT
0121-440-5677 | home   | OK
07970-852657  | mobile | UNPAID

DISTINCT clause produces list of unique values in a table
Example:
SELECT DISTINCT P.STATUS
FROM PHONE_NUMBERS P

Result
STATUS
OK
FAULT
UNPAID

Ordering a Listing
ORDER BY clause is used when listing order is important
ORDER BY clause

Used to sort output of SELECT statement


Can sort by one or more columns
Ascending (ASC) or descending order (DESC)
ASC is the default

Example:
SELECT *
FROM EMPLOYEE
ORDER BY AGE DESC;

Result
F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE | SALARY
Anbreen | Rumpel   |          | E22561      | 70  | 40,000
John    | Chopples |          | E22561      | 37  | 23,000
Alex    | Blurp    |          | E85704      | 21  | 21,000

SELECT Queries
Fine-tune SELECT command by adding restrictions to search
criteria using:
Conditional restrictions
e.g. =, <, >, <=, >=, <>, etc.

Arithmetic operators
e.g. power operations, multiplications, divisions, additions and
subtractions

Logical operators
Used when searching data involves multiple conditions
e.g. AND, OR and NOT

Special operators
e.g. BETWEEN, IS NULL, LIKE, IN and EXISTS

Conditional Restrictions
Add conditional restrictions to SELECT statement, using WHERE clause
Syntax:
SELECT columnlist
FROM tablelist
[ WHERE conditionlist ] ;

The WHERE clause is evaluated for each row in the table

Conditional Restrictions
EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John    | Chopples |          | E22561      | 37
668768  | Alex    | Blurp    |          | E85704      | 21
978098  | Anbreen | Rumpel   |          | E22561      | 70

Example: find all 70-year-old employees.
SELECT *
FROM EMPLOYEE
WHERE AGE=70;

or

SELECT *
FROM EMPLOYEE E
WHERE E.AGE=70;

Aliases rename tables

Result
Empl_ID | F_NAME  | L_NAME | PHONE_ID | EMPLOYER_ID | AGE | SALARY
978098  | Anbreen | Rumpel |          | E22561      | 70  |

How does a DBMS evaluate a query?


The system goes through the stored table line by line, checking
each time whether the age field matches exactly the value 70.
If it does, then the employee's fields are printed to the output;
if it doesn't, then that line is ignored and the search continues.
On the technical level, DBMSs employ all kinds of clever tricks
to speed up the search, but the end result will be the same.
Since there could be more than one member of staff whose
age is 70, it is possible that the system has to output many
employees.
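
The line-by-line evaluation described above can be sketched in a few lines of Python. This is only a conceptual model of what WHERE does, not how a real DBMS is implemented; the function name select_where is made up for this illustration.

```python
employees = [
    {"Empl_ID": 798687, "F_NAME": "John", "L_NAME": "Chopples", "AGE": 37},
    {"Empl_ID": 668768, "F_NAME": "Alex", "L_NAME": "Blurp", "AGE": 21},
    {"Empl_ID": 978098, "F_NAME": "Anbreen", "L_NAME": "Rumpel", "AGE": 70},
]


def select_where(rows, condition):
    """Go through the table row by row and keep the rows that satisfy the condition."""
    output = []
    for row in rows:
        if condition(row):      # the WHERE condition is evaluated once per row
            output.append(row)  # matching rows go to the output
        # non-matching rows are ignored and the search continues
    return output


# Corresponds to: SELECT * FROM EMPLOYEE WHERE AGE=70;
matches = select_where(employees, lambda row: row["AGE"] == 70)
print([row["F_NAME"] for row in matches])  # ['Anbreen']
```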

Conditional Restrictions (cont.)


To find just names and phones, replace the first line:
SELECT F_NAME,L_NAME,PHONE_ID
FROM EMPLOYEE
WHERE AGE=70;

Result
F_NAME  | L_NAME | PHONE_ID
Anbreen | Rumpel |

or

SELECT F_NAME AS "FIRST NAME",L_NAME AS "LAST NAME",PHONE_ID
FROM EMPLOYEE
WHERE AGE=70;

The AS keyword is used to put aliases (rename columns) in the result set

Result
FIRST NAME | LAST NAME | PHONE_ID
Anbreen    | Rumpel    |

Arithmetic Operators
EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE | SALARY
798687  | John    | Chopples |          | E22561      | 37  | 25000
668768  | Alex    | Blurp    |          | E85704      | 21  | 21000
978098  | Anbreen | Rumpel   |          | E22561      | 70  | 50000
Source of Image: Database Principles: Fundamentals of
Design, Implementation and Management, 2nd Ed.

SELECT E.L_NAME AS "LAST NAME", E.AGE-5 AS "NEW AGE"
FROM EMPLOYEE E
WHERE E.SALARY=E.AGE*1000;

Result
LAST NAME | NEW AGE
Blurp     | 16

Logical Operators: AND, OR, and NOT


PHONE_NUMBERS
PHONE         | TYPE   | STATUS
0121-414-3816 | office | OK
01600-719975  | home   | FAULT
0121-440-5677 | home   | OK
07970-852657  | mobile | UNPAID

SELECT P.PHONE,P.TYPE
FROM PHONE_NUMBERS P
WHERE P.TYPE='office' AND P.STATUS='OK';

Result
PHONE         | TYPE
0121-414-3816 | office

Which of the following is correct:


(A) a SQL query automatically eliminates duplicates.
(B) SQL permits attribute names to be repeated in the same relation.

(C) a SQL query will not work if there are no indexes on the relations
(D) None of these
AS clause is used in SQL for

(A) Selection operation.

(B) Rename operation.

(C) Join operation.

(D) Projection operation.

Which of the following operation is used if we are interested in only certain columns of a table?

(A) PROJECTION

(B) SELECTION

(C) UNION

(D) JOIN

A file manipulation command that extracts some of the records from a file is called

(A) SELECT

(B) PROJECT

(C) JOIN

(D) PRODUCT

Special Operators
BETWEEN: checks whether attribute value is within a range
LIKE: checks whether attribute value matches given string pattern
IS NULL: checks whether attribute value is null
IN: checks whether attribute value matches any value within a value list
EXISTS: checks if subquery returns any rows

BETWEEN
EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John    | Chopples |          | E22561      | 37
668768  | Alex    | Blurp    |          | E85704      | 21
978098  | Anbreen | Rumpel   |          | E22561      | 70

SELECT *
FROM EMPLOYEE
WHERE AGE BETWEEN 20 AND 40;

Result
Empl_ID | F_NAME | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John   | Chopples |          | E22561      | 37
668768  | Alex   | Blurp    |          | E85704      | 21

LIKE
EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | AGE
798687  | John    | Chopples |          | E22561      | 37
668768  | Alex    | Blurp    |          | E85704      | 21
978098  | Anbreen | Rumpel   |          | E22561      | 70

SELECT F_NAME,L_NAME,AGE
FROM EMPLOYEE
WHERE F_NAME LIKE 'A%';
'%' is a wildcard for any substring (including the empty substring).

Result
F_NAME  | L_NAME | AGE
Alex    | Blurp  | 21
Anbreen | Rumpel | 70
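
A minimal sketch of BETWEEN and LIKE with Python's built-in sqlite3 module, using the names and ages from the EMPLOYEE slide. The bounds 21 and 37 are chosen here to show that BETWEEN includes both end points, and the '_' wildcard (exactly one character) is a standard SQL feature added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (F_NAME TEXT, L_NAME TEXT, AGE INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("John", "Chopples", 37), ("Alex", "Blurp", 21), ("Anbreen", "Rumpel", 70)],
)

# BETWEEN is inclusive: ages 21 and 37 themselves are within the range.
between = conn.execute(
    "SELECT F_NAME FROM EMPLOYEE WHERE AGE BETWEEN 21 AND 37 ORDER BY AGE"
).fetchall()

# '%' matches any substring, including the empty one.
starts_with_a = conn.execute(
    "SELECT F_NAME FROM EMPLOYEE WHERE F_NAME LIKE 'A%' ORDER BY F_NAME"
).fetchall()

# '_' matches exactly one character.
one_char = conn.execute(
    "SELECT F_NAME FROM EMPLOYEE WHERE F_NAME LIKE '_lex'"
).fetchall()

print(between)        # [('Alex',), ('John',)]
print(starts_with_a)  # [('Alex',), ('Anbreen',)]
print(one_char)       # [('Alex',)]
```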

IS NULL
EMPLOYEE
Empl_ID | F_NAME  | L_NAME   | PHONE_ID | EMPLOYER_ID | SALARY | AGE
798687  | John    | Chopples |          | E22561      | 20000  | 37
668768  | Alex    | Blurp    |          | E85704      |        | 21
978098  | Anbreen | Rumpel   |          | E22561      | 25000  | 70

SELECT E.L_NAME AS "LAST NAME"
FROM EMPLOYEE E
WHERE E.SALARY IS NULL;

Result
LAST NAME
Blurp
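
A minimal sketch with Python's built-in sqlite3 module, showing why IS NULL is needed: a comparison with = NULL is never TRUE, so it returns no rows. The table is cut down to the two columns the query uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (L_NAME TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [("Chopples", 20000), ("Blurp", None), ("Rumpel", 25000)],  # None is stored as NULL
)

# IS NULL finds the row with the missing salary.
is_null = conn.execute(
    'SELECT L_NAME AS "LAST NAME" FROM EMPLOYEE WHERE SALARY IS NULL'
).fetchall()

# "= NULL" evaluates to unknown for every row, so nothing is returned.
equals_null = conn.execute(
    "SELECT L_NAME FROM EMPLOYEE WHERE SALARY = NULL"
).fetchall()

print(is_null)      # [('Blurp',)]
print(equals_null)  # []
```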

Special Operators
BETWEEN: checks whether attribute value is within a range
LIKE: checks whether attribute value matches given string pattern
IS NULL: checks whether attribute value is null
IN: checks whether attribute value matches any value within a value list
EXISTS: checks if subquery returns any rows

The remaining two operators will be discussed next lecture

Summary
SQL commands can be divided into two overall categories:
Data definition language commands
Data manipulation language commands

The basic DML commands:


SELECT, INSERT, UPDATE, DELETE, COMMIT, and ROLLBACK

SELECT statement is main data retrieval command in SQL

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 4 (Friday)
Relational Model, Keys and Integrity Rules
Shereen Fouad
Teaching Fellow
School of Computer Science
University of Birmingham, UK

Reminder of previous lecture


Database development comprises three main stages
Data modeling improves the understanding of the organization for which the
database design is developed
Good design begins by identifying entities, attributes, and relationships
Entities are the main objects about which data are to be collected and stored in a table.
Attribute: describes a characteristic of an entity.
A relationship is an association between entities
Relationships are described by:
Connectivity (1:1, 1:M, M:N),
Cardinality (in a relationship from entity type A to entity type B, a minimum and a maximum
can be specified for the number of B entities for each A entity) and
Participation (optional or mandatory)

Overview
What business rules are and how they influence database design
The Evolution of Data Models
The relational database model offers a logical view of data
Database Keys

Superkey
Candidate key
Primary key
Foreign key

Database Integrity Rules

Business Rules
Business rules give the DB designer the main information about the organization;
they are the main building blocks of a data model.
Business Rules allow designer to:
understand the nature, role, and scope of data
understand business processes
develop appropriate relationship participation rules and constraints
Translating Business Rules into Data Model
Nouns translate into entities
Verbs into relationships among entities
Identify the relationship type and connectivity
The translation step should use comprehensive and unique object names.

The Development
of Data Models

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Relational Model
Implemented through the Relational Database Management System (RDBMS)
The relational database model offers a logical view of data.
Hides the complexity of the hierarchical and network models from the user
An entity is mapped to a relational table
A relational table stores a collection of related entities
An attribute is mapped to a table column

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

The Entity Relationship Model


The Entity Relationship model offers a conceptual view of data.
It uses graphical representations to model database components

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10nd Ed.

Keys
Each row in a table must be uniquely identifiable by a key

A superkey for a table is a collection of one or more attributes that determines all
the other attributes in the table, i.e. determines a whole row.
Trivially, the collection of all the attributes is a superkey.
A set of attributes in a relation is called a candidate key if, and only if,
Every tuple has a unique value for the set of attributes (uniqueness)
No proper subset of the set has the uniqueness property (minimality)
To determine what is a candidate key, use knowledge of the real world (what is
going to stay unique!)

Superkeys & Candidate Keys: Example


Candidate key: {STUDENT ID}; {FNAME, LNAME} looks acceptable but we may get
people with the same name
{STUDENT ID, FNAME}, {STUDENT ID, LNAME} and {STUDENT ID, FNAME, LNAME}
satisfy uniqueness, but are not minimal.
{FNAME} and {LNAME} do not give a unique identifier for each row
{STUDENT ID} will be the best candidate key.
STUDENT
STUDENT ID | F NAME | L NAME
E12345     | John   | Chopples
E12367     | Kent   | Danial
E54321     | Michal | Blurp
E5099      | Amber  | Rumpel
E54344     | Lea    | John
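
Uniqueness and minimality can be checked mechanically on a given set of rows, as in the Python sketch below (the function names and the extra sixth student are made up for illustration). Note what it can and cannot show: in the five rows above every first name happens to be different, so the data alone would accept {FNAME} as a key. Only knowledge of the real world tells us that it will not stay unique.

```python
from itertools import combinations

students = [
    {"STUDENT ID": "E12345", "FNAME": "John", "LNAME": "Chopples"},
    {"STUDENT ID": "E12367", "FNAME": "Kent", "LNAME": "Danial"},
    {"STUDENT ID": "E54321", "FNAME": "Michal", "LNAME": "Blurp"},
    {"STUDENT ID": "E5099", "FNAME": "Amber", "LNAME": "Rumpel"},
    {"STUDENT ID": "E54344", "FNAME": "Lea", "LNAME": "John"},
]


def is_unique(rows, attrs):
    """Uniqueness: no two rows agree on all of the given attributes."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def minimal_unique_sets(rows):
    """Attribute sets that are unique in these rows and have no unique proper subset."""
    attrs = list(rows[0])
    found = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            contains_smaller = any(set(k) <= set(combo) for k in found)
            if is_unique(rows, combo) and not contains_smaller:
                found.append(combo)
    return found


print(minimal_unique_sets(students))
# [('STUDENT ID',), ('FNAME',), ('LNAME',)]  (true of this sample only)

# A hypothetical sixth student shows that names do not stay unique.
more_students = students + [{"STUDENT ID": "E60001", "FNAME": "John", "LNAME": "Rumpel"}]
print(minimal_unique_sets(more_students))
# [('STUDENT ID',), ('FNAME', 'LNAME')]
```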

Primary Keys
A primary key for a table (entity type) is a candidate key that the DB designer has
chosen as being the main way of uniquely identifying a row (entity).
Primary keys are the main way of identifying target entities in entity relationships,
e.g., the way to identify someone's employing organization.
Cannot have null values (A null value is no value, it is NOT equal to a zero or a
blank space).
For efficiency (and correctness) reasons, the simpler that primary keys are, the
better.

Typical primary keys examples are Identity numbers (of people, companies,
products, courses, etc.), or combinations of them with one or two other
attributes.
Composite key: Composed of more than one attribute

Superkeys, Candidate Keys & Primary Keys

[Figure: diagram relating superkey, candidate key and primary key]

Functional dependence
Attribute B is functionally dependent on A if all rows in the table that agree in value for A
also agree in value for B

A key's role is based on determination
If you know the value of attribute A, you can determine the value of attribute B

E.g., the collection DAY-NUMBER, MONTH and YEAR specifying birthdate in a table about people could determine DAY-NAME.
We alternatively say that DAY-NAME is functionally dependent on DAY-NUMBER, MONTH and YEAR.
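
The definition can be turned into a small check over a set of rows, sketched below in Python. The function name and the sample people are hypothetical; the day names are computed with the standard datetime module rather than typed in by hand. As with keys, a check like this can only refute a dependency on the given rows, it cannot prove that the dependency holds in general.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def person(name, day, month, year):
    """Build one row; DAY-NAME is derived from the birthdate."""
    return {
        "NAME": name,
        "DAY-NUMBER": day,
        "MONTH": month,
        "YEAR": year,
        "DAY-NAME": DAY_NAMES[date(year, month, day).weekday()],
    }


def functionally_determines(rows, lhs, rhs):
    """True if all rows that agree on the lhs attributes also agree on the rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # two rows agree on lhs but differ on rhs
    return True


# Hypothetical people for illustration only.
people = [
    person("Ann", 1, 1, 2000),
    person("Bob", 1, 1, 2000),
    person("Cy", 15, 1, 2000),
    person("Dee", 25, 12, 1995),
]

fd_holds = functionally_determines(people, ["DAY-NUMBER", "MONTH", "YEAR"], ["DAY-NAME"])
fd_fails = functionally_determines(people, ["MONTH"], ["DAY-NUMBER"])
print(fd_holds)  # True
print(fd_fails)  # False: January occurs with day numbers 1 and 15
```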

Foreign Keys
Remember!! Relationships are represented by associative linking by means of shared
attributes
Standardly, a relationship is represented by means of Foreign keys.

Foreign key: an attribute whose values match primary key values in the related table
Referential integrity: a set of attributes in the first (referencing) relation is a Foreign
Key if its value always either
matches a Candidate Key value in the second (referenced) relation, or
is NULL

Primary & Foreign Keys Example

Primary keys are underlined
Foreign keys are in a blue border

A key that is composed of more than one attribute is known as
a Composite Key

Primary & Foreign Keys Example

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Is this redundancy?
Multiple occurrences of values are not redundant when needed to
make the relationship work
Redundancy occurs only when there is unnecessary duplication
of attribute values
Foreign keys control typical data redundancies by using common
attributes shared by tables

In case of entity integrity, the primary key may be
(A) not Null
(B) Null
(C) both Null & not Null.
(D) any value.
Key to represent relationship between tables is called
(A) Primary key
(B) Secondary Key
(C) Foreign Key
(D) None of these
An instance of relational schema R (A, B, C) has distinct values of A
including NULL values. Which one of the following is true?
(A) A is a candidate key (B) A is not a candidate key
(C) A is a primary Key
(D) Both (A) and (C)

Referential Integrity
When relations are updated, referential integrity can be violated
This usually occurs when a referenced tuple is updated or deleted
There are a number of options:
RESTRICT - stop the user from doing it
CASCADE - let the changes flow on
NULLIFY - make values NULL

Referential Integrity - Example


What happens if Administration Dnumber is changed to 3 in DEPARTMENT?
The entry for Research is deleted from DEPARTMENT?
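
The three options can be tried out with Python's built-in sqlite3 module, as in the sketch below. In SQL the NULLIFY option is written SET NULL. The tables are cut down to the columns needed, and the department numbers, the PROJECT table and all rows are assumptions made for illustration. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript(
    """
    CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT);
    CREATE TABLE EMPLOYEE (
        Ssn TEXT PRIMARY KEY,
        Dno INTEGER REFERENCES DEPARTMENT(Dnumber)
            ON UPDATE CASCADE      -- CASCADE: let the change flow on
            ON DELETE SET NULL     -- NULLIFY: make the referencing value NULL
    );
    CREATE TABLE PROJECT (
        Pnumber INTEGER PRIMARY KEY,
        Dnum INTEGER REFERENCES DEPARTMENT(Dnumber)
            ON DELETE RESTRICT     -- RESTRICT: stop the user from doing it
    );
    -- hypothetical rows
    INSERT INTO DEPARTMENT VALUES (4, 'Administration'), (5, 'Research'), (1, 'Headquarters');
    INSERT INTO EMPLOYEE VALUES ('111', 4), ('222', 5);
    INSERT INTO PROJECT VALUES (10, 1);
    """
)

# Administration's Dnumber is changed to 3: the change cascades to EMPLOYEE.
conn.execute("UPDATE DEPARTMENT SET Dnumber = 3 WHERE Dname = 'Administration'")

# The entry for Research is deleted: the referencing Dno becomes NULL.
conn.execute("DELETE FROM DEPARTMENT WHERE Dname = 'Research'")

# Deleting a department that a project still refers to is refused.
try:
    conn.execute("DELETE FROM DEPARTMENT WHERE Dnumber = 1")
    delete_refused = False
except sqlite3.IntegrityError:
    delete_refused = True

employees = conn.execute("SELECT Ssn, Dno FROM EMPLOYEE ORDER BY Ssn").fetchall()
print(employees)       # [('111', 3), ('222', None)]
print(delete_refused)  # True
```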

Integrity Rules
Many RDBMSs enforce integrity rules automatically

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Summary
Relational database model offers a logical view of data.
Keys are central to the use of relational tables
Keys define functional dependencies
Each table row must have a primary key that uniquely identifies all
attributes
Tables linked by common attributes (foreign keys)

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 4 (Wednesday)
Database Modeling
Shereen Fouad
Teaching Fellow
School of Computer Science
University of Birmingham, UK

Overview
Stages of Database development
Understand the basic data modeling concepts and importance
What are the basic data-modeling building blocks?

Entities and Entity sets


Attribute, Attributes Domains and Attributes Determination
Relationship Connectivity, cardinality and participation
Constraint

Stages of Database Development


1. Requirement Analysis Stage
Understand the problems of organization in order to provide solutions
Sources of requirements include forms, interviews, reports, use case,
observations and business rules

2. Design Stage
Requirements information is processed into a data model (database design)

3. Implementation Stage
Physical implementation of the developed database design into a real world
database application

Data Model
Data model is the collection of concepts that can be used to describe
the structure/design of the database.
Designers, programmers, and end users see data in different ways
Different views of the same data can lead to designs that do not correctly
reflect the organization's operations
Data modeling reduces the complexities of database design and organizes
data for various users
It improves the understanding of the organization for which the
database design is developed
Entity Relationship Model is the most successful database model

Data Model (cont.)


Data modeling is iterative and progressive process
Serves as a communications tool to facilitate the interaction among
the designer, the application programmer and the end user

Data Model Basic Building Blocks


Entity
Attribute
Relationship

Entities
Entities: are real-world objects, distinct from other objects, for which
we intend to collect data (e.g. person, place, event)
Entities are just the things about which data are to be collected and stored in a
table.
A row in a table corresponds to an entity instance.
Entity Set: a group of entities of the same type, e.g., all employees.
Examples of database entities in a company business environment.
Employee

Department

What else??

Attributes
Attribute: describes a characteristic of an entity.
Each attribute has a data type and other properties
Attributes of entities of a given type are the names of the different
pieces of information that need to be stored for entities of that type.
Attributes are just the column names for the table for the entity type.
E.g., entities of the type Employee could have the following attributes:
Employee ID number, last name, first name, phone number, age, etc.

Attributes have a domain: the attribute's set of possible values.
Each tuple assigns a value to each attribute from its domain

The ______ operator is used to compare a value to a list of literals values that have been specified.
(A) BETWEEN

(B) ANY

(C) IN

(D) ALL

A set of possible data values is called


(A) attribute.

(B) degree.

(C) tuple.

(D) domain.

Which of the following is a legal expression in SQL?


(A) SELECT NULL FROM EMPLOYEE;
(B) SELECT NAME FROM EMPLOYEE;

(C) SELECT NAME FROM EMPLOYEE WHERE SALARY = NULL;


(D) None of the above
Which of the following are the properties of entities?

(A) Groups

(B) Table

(C) Attributes

(D) Switchboards

Relationship
A relationship is an association between entities, e.g.:
An employee works in a single department
A department employs several employees
Note that they are mostly described as verbs.
Relationship Set: Collection of similar relationships.
Same entity set can participate in different relationship sets.

Relationship Connectivity
Relationships are importantly categorized as to uniqueness or multiplicity of
entities at either end: connectivity.
Has a big effect on DB design.

[Figure: Student -- Enrolls -- Class (Many-to-Many relationship)]
(M:N) A student may be enrolled in more than one class (or none) and a class
enrols more than one student.

[Figure: Professor (1) -- Teaches -- Class (One-to-Many relationship)]
(1:M) A professor teaches more than one class (or none) and a class is taught
by at most one professor.

[Figure: Student (1) -- Has -- Graduation Report (One-to-One relationship)]
(1:1) Each student has at most one graduation report and each graduation
report is provided to at most one student.

Relationship Cardinality
Relationships can be further specified as to how many entities are allowed or
required at either end: cardinality.
Has significant effect on DB design.
This is determined by an organization's business policy.
In a relationship from entity type A to entity type B, a minimum and a
maximum can be specified for the number of B entities for each A entity.
Example:
[Figure: Professor (0,3) -- Teaches -- (1,1) Class (One-to-Many relationship)]
[Figure: Student (1,6) -- Enrolls in -- (5,35) Class (Many-to-Many relationship)]

Relationship Participation
Optional [in a particular direction, X to Y]:
an X entity does not require a corresponding Y entity occurrence
i.e. the minimum number of Ys per X is 0
E.g. Class is optional to Professor: a Professor may or may not teach a
Class
Mandatory [in a particular direction, X to Y]:
an X entity requires a corresponding Y entity occurrence
i.e. the minimum number of Ys per X is 1 or more
E.g. Professor is mandatory to Class: every Class must have a Professor
assigned to it.
Relationship participation depends on the business rules of the organization.

Employee Department Relationship Example

Each employee works in a single department and each
department employs several employees.
Relationships are represented by associative linking by means
of shared attributes

Summary
Database development comprises three main stages
Data modeling improves the understanding of the organization for
which the database design is developed
Entity, Attribute and Relationship are the main building blocks for generating a
database model

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 5 (Friday)
Conceptual Data Model (Part 2)
Entity Relationship Diagrams (ERD)
Shereen Fouad
Teaching Fellow
School of Computer Science
University of Birmingham, UK

Reminder of Previous Lecture


The Conceptual Database Model
The Entity Relationship Diagram (ERD) model
The main characteristics and notations of entity
relationship components
Classes of Attributes

Identifier attributes
Simple versus Composite Attribute
Single-Valued versus Multivalued Attribute
Derived Attribute

Phases of Database Design


[Figure: Data Requirements -> (specification of requirements and results) -> Conceptual Design -> (Conceptual Schema) -> Logical Design -> (Logical Schema) -> Physical Design -> (Physical Schema)]

Conceptual design begins with the collection of
requirements and results needed from the
database. It is a high-level description (often done
with the Entity Relationship (ER) model)
Logical schema is a description of the structure of
the database (Relational, Network, etc.)
translate the ERD into the DBMS data model
Schema refinement: consistency, normalization
Physical schema is a description of the
implementation (programs, tables, dictionaries
and catalogs)

Overview
How relationships between entities are defined and
graphically presented
Relationship Connectivity in ERD
Relationships Cardinality in an ERD
Relationship Participation in ERD
Relationship Degree in ERD
Weak Entities in ERD
Associative (Composite) Entities
Example of ERD that represents a business situation

Types of Relationship Connectivity


A relationship is an association between entities
Relationships are classified by how many entities are allowed or required
at either end.
Established by business rules

Many-to-Many

1-to-Many

1-to-1

Relationships Connectivity in an ERD

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Relationships Cardinality in an ERD


Cardinality means count, and is expressed as a number (Min, Max)
Maximum cardinality is the maximum number of entity instances that
can participate in a relationship. [1 or M]
Minimum cardinality is the minimum number of entity instances that
must participate in a relationship. [1 or 0]
Established by business rules

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Relationship Participation in ERD


Optional participation
One entity occurrence does not require corresponding entity occurrence in
particular relationship
As shown in the examples below, a minimum cardinality of zero [0] indicates
optional participation, which is drawn by placing an oval next to the optional entity.
Mandatory participation
One entity occurrence requires corresponding entity occurrence in particular
relationship
As shown in the examples below, a minimum cardinality of one [1] indicates
mandatory (required) participation, which is not marked in the ERD Chen
model.

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Relationship Degree in ERD


Indicates number of entities or participants associated with a
relationship
Unary relationship (degree =1)
Association is maintained within single entity
Binary relationship (degree =2)
Two entities are associated
Ternary relationship (degree =3)
Three entities are associated

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Weak Entities in ERD


Weak entity meets two conditions
Existence-dependent, i.e. the entity exists in the database only when it is
associated with another related entity occurrence
Primary key partially or totally derived from parent entity in
relationship

Database designer determines whether an entity is weak
based on business rules

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Associative (Composite) Entities


Also known as bridge entities
Used to implement M:N relationships
Composed of primary keys of each of the entities to be
connected
May also contain additional attributes that play no role in
connective process

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.
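
A bridge entity can be sketched with Python's built-in sqlite3 module. The example below implements the M:N relationship between students and classes; the table and column names (ENROLL, STU_ID, CLASS_ID, GRADE) and all rows are hypothetical. The primary key of the bridge table is composed of the primary keys of the two connected tables, and GRADE is an additional attribute that plays no role in the linking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE STUDENT (STU_ID TEXT PRIMARY KEY, NAME TEXT);
    CREATE TABLE CLASS (CLASS_ID TEXT PRIMARY KEY, TITLE TEXT);
    CREATE TABLE ENROLL (
        STU_ID TEXT REFERENCES STUDENT(STU_ID),
        CLASS_ID TEXT REFERENCES CLASS(CLASS_ID),
        GRADE TEXT,                      -- extra attribute of the bridge entity
        PRIMARY KEY (STU_ID, CLASS_ID)   -- composite key built from both parents
    );
    -- hypothetical rows
    INSERT INTO STUDENT VALUES ('E12345', 'John'), ('E12367', 'Kent');
    INSERT INTO CLASS VALUES ('C1', 'Databases'), ('C2', 'Logic');
    INSERT INTO ENROLL VALUES ('E12345', 'C1', 'A'), ('E12345', 'C2', 'B'), ('E12367', 'C1', 'C');
    """
)

# One student in many classes, one class with many students.
enrolments = conn.execute(
    """
    SELECT S.NAME, C.TITLE
    FROM ENROLL E
    JOIN STUDENT S ON S.STU_ID = E.STU_ID
    JOIN CLASS C ON C.CLASS_ID = E.CLASS_ID
    ORDER BY S.NAME, C.TITLE
    """
).fetchall()
print(enrolments)

# The composite primary key rejects enrolling the same student in the same class twice.
try:
    conn.execute("INSERT INTO ENROLL VALUES ('E12345', 'C1', 'B')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```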

The Chen Representation of the Invoicing Problem

Bridging entity types are weak, but this is not normally shown

Create an ERD that represents this business situation
Consider a database that is to represent a large business. In this
typical business, there is a Division that operates several Departments.
A Division is described by the name of its business sector. The Division
is run by one Employee and each Department is managed by one
Employee. The database needs to keep track of Employee (ID, First
name, Last name, Salary, title and date of birth). It also wants to keep
track of Department name and location. Of course the Department
employs many Employees who work on projects that are assigned to
them. Each Project has a certain budget. Everyone needs to be busy,
so it is not uncommon for an Employee to be assigned many Projects
and a Project may have many Employees assigned to it. However, we
need to keep track of the employee working hours in each project.
There is a special case of Employees that are not assigned to any
Department; they roam around looking for work from the various
Departments.

Steps to Complete an ERD


Step 1) Business Rules
Step 2) Listing Entities and Attributes (considering the attribute class)
Step 3) Simple ERDs with relations (considering Connectivities,
Cardinalities and Participation)
Step 4) The Complete ERD

Can You Spot Entities and their Attributes??

STEP 1) Identify the Business Rules


A department employs many employees, but each employee
is employed by one department.
Some employees, known as "rovers," are not assigned to any
department.
A division operates many departments, but each department
is operated by one division
An employee may be assigned to many projects and a project
may have many employees assigned to it.
A project must have at least one employee assigned to it.
One of the employees manages each department.
One of the employees runs each division.

Step 2) Make a list of the Entities and their Attributes
Entity: EMPLOYEE
Attributes: ID, First name, Last name, Salary, Title and Date of birth

Entity: DIVISION
Attributes: Division ID, Business sector name

Entity: DEPARTMENT
Attributes: Department ID, Department name and Location

Entity: PROJECT
Attributes: Project ID, Project name and Project budget

Step 3) List ALL simple Relations
[DIVISION] 1 <operates> M [DEPARTMENT]
[EMPLOYEE] 1 <runs> 1 [DIVISION]
[EMPLOYEE] 1 <manages> 1 [DEPARTMENT]
[EMPLOYEE] N <assigned> M [PROJECT]
[DEPARTMENT] 1 <employs> M [EMPLOYEE]

Connectivities and Cardinalities and Participation
My procedure for determining the cardinality:
A DIVISION will operate a minimum of ____1____ DEPARTMENT
A DIVISION will operate a maximum of ____N____ DEPARTMENTs
Then reverse the order:
A DEPARTMENT is operated by a minimum of ____1____ DIVISION
A DEPARTMENT is operated by a maximum of ____1____ DIVISION
Putting this information together you get:

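ERD
[Figure: Chen-style ERD of the business situation, with (min,max)
cardinalities marked on each relationship.
Entities and attributes: DIVISION (ID, BSName); DEPARTMENT (ID, Name,
Location); EMPLOYEE (ID, Name composed of Fname and Lname, DoB, Title,
Salary); PROJECT (ID, Name, Budget).
Relationships: DIVISION operates DEPARTMENT (1:M); DEPARTMENT employs
EMPLOYEE (1:M); EMPLOYEE manages DEPARTMENT (1:1); EMPLOYEE runs
DIVISION (1:1); ASSIGN links EMPLOYEE and PROJECT (M:N).]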

One thing is missing!!


Where do I put the working hours??

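ERD
[Figure: the same Chen-style ERD as before, now with a Working_Hours
attribute added. It appears next to ASSIGN, the link between EMPLOYEE
and PROJECT, since hours are recorded per employee per project.
Entities and attributes: DIVISION (ID, BSName); DEPARTMENT (ID, Name,
Location); EMPLOYEE (ID, Name composed of Fname and Lname, DoB, Title,
Salary); PROJECT (ID, Name, Budget).
Relationships: operates, employs, manages, runs, ASSIGN.]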

Announcements
Next Friday's lecture (7th of Nov) will be 13:00-14:00 in Main Lecture
Theatre Arts.
Next week there will be no lab session as you will have a
non-technical exercise.

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 5 (Wednesday)
Conceptual Data Model
Entity Relationship Diagrams (ERD)
Shereen Fouad
Teaching Fellow
School of Computer Science
University of Birmingham, UK

Reminder of previous lecture


Superkey
Any key (set of attributes) that uniquely identifies each row

Candidate key
A superkey without unnecessary attributes

Primary Key
A candidate key selected to uniquely identify all other attributes; it
cannot contain Null entries.

Foreign key (FK)
An attribute whose values match primary key values in the related
table

Composite key
Composed of more than one attribute

Overview
The Conceptual Database Model
The Entity Relationship Diagram (ERD) model
The main characteristics and notations of entity
relationship components
Classes of Attributes

Identifier attributes
Simple versus Composite Attribute
Single-Valued versus Multivalued Attribute
Derived Attribute

Phases of Database Design
Data Requirements -> (Specification of requirements and results) ->
Conceptual Design -> (Conceptual Schema) -> Logical Design ->
(Logical Schema) -> Physical Design -> (Physical Schema)

Conceptual design begins with the collection of requirements and
results needed from the database. It is a high-level description (often
done with an Entity Relationship Diagram (ERD)).
Logical schema is a description of the structure of the database
(Relational, Network, etc.):
translate ERD into DBMS data model
Schema Refinement: consistency, normalization
Physical schema is a description of the implementation (programs,
tables, dictionaries and catalogs).

The Entity Relationship Model
Introduced by Chen in 1976
Most widely used conceptual model of DBs.
Graphical representation of entities, attributes and the relationships
among entities in a database structure, plus (depending on the diagram
style) varying amounts of other info such as connectivities,
cardinalities, keys, weakness, ...
An ER model of an environment forms the basis of an ER diagram (ERD)
or several ERDs.
Diagrams based on the model are a widely accepted and adopted
graphical approach to database design.

Quick Flavour of Two Styles of Entity Relationship Diagram (ERD)
There are several markedly different styles of ERD, and for each main style there
are several variants.
In this module we will focus only on the Chen Model Style.

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Entities and Attributes Notation in Chen Model
Entity represented by rectangle with entity's name
Entity name, a noun, written in capital letters
Attributes represented by ovals connected to entity rectangle
with a line
Each oval contains the name of attribute it represents

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Identifier Attributes
An identifier attribute (primary key) is underlined.
In the example below, Ssn (Social Security number) is underlined
as it represents the identifier attribute (primary key).

Simple versus Composite Attribute
A simple attribute cannot be subdivided:
e.g. employee has simple attributes like Salary, Gender, and Department.

A composite attribute can be subdivided into further attributes.
e.g. Name: First name, Middle Initials, Last name

Simple and Single valued attribute


Composite attribute

Single-Valued versus Multivalued Attribute
A single-valued attribute can have only a single value.
e.g.: a car can have only one car year.
A multivalued attribute can have many values.
e.g.: a car may have several colors for different body parts (top
color, body color, etc.)
Multivalued attributes are shown in an ER diagram by a double
line connecting to the entity

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Multivalued Attribute
One (usually poor) possibility: Use a variable-length string for the
attribute, and list all the values within the string.
Disadvantage: little support supplied by the DBMS; insertions and
deletions require special extra programming.

Multivalued Attribute
Another possibility: Within original entity type, split the
attribute into several different attributes corresponding to
different natural components of the entity.
Disadvantages:
The attribute may in reality need to be split differently for different
entities in the entity type (e.g. different cars).
The attribute may not have naturally nameable aspects at all. E.g.,
imagine blotches of color in random places on a car.
Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Multivalued Attribute
Another possibility: Within original entity type, split the
attribute into several different attributes not corresponding
to specific components of the entity.
E.g., have attributes called Colour1, Colour2, ..., Colour6.
Advantage: copes with the no-identifiable-components problem and
the different-split problems.
Disadvantages:
Have to set aside enough columns to accommodate the conceivable
max, but if this max is large and not often approached then have a lot
of wasted space.
Searching for a colour, or doing insertions and deletions, can be very
cumbersome.

Multivalued Attribute
Often Better: Replace the attribute by a new 1:M relationship to a new
entity type holding the original attribute's data.
If the components of the original attribute are conceptually
distinguishable in a natural way, the new entity can have an attribute
whose values identify those components.
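
A minimal sketch of the "often better" option, assuming SQLite (via Python's sqlite3 module) as a stand-in DBMS. The table and column names (car, car_colour, section) and the sample rows are made up for illustration; they are not taken from the textbook figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

# Hypothetical schema: the multivalued colour attribute moves into its own
# table, one row per (car, section), linked back to CAR by a foreign key.
conn.executescript("""
CREATE TABLE car (
    car_vin  TEXT PRIMARY KEY,
    car_year INTEGER NOT NULL
);
CREATE TABLE car_colour (
    car_vin TEXT NOT NULL REFERENCES car (car_vin),
    section TEXT NOT NULL,   -- identifies the component (top, body, ...)
    colour  TEXT NOT NULL,
    PRIMARY KEY (car_vin, section)
);
""")

conn.execute("INSERT INTO car VALUES ('VIN001', 2012)")
conn.executemany(
    "INSERT INTO car_colour VALUES (?, ?, ?)",
    [("VIN001", "top", "black"),
     ("VIN001", "body", "red"),
     ("VIN001", "trim", "silver")],
)

# Searching for a colour is now one simple query, however many colours a car has.
red_cars = [row[0] for row in conn.execute(
    "SELECT DISTINCT car_vin FROM car_colour WHERE colour = 'red'")]
colour_count = conn.execute(
    "SELECT COUNT(*) FROM car_colour WHERE car_vin = 'VIN001'").fetchone()[0]
print(red_cars, colour_count)
```

Adding or removing a colour is an ordinary INSERT or DELETE on car_colour, and no columns are reserved for colours that a car does not have.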

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Derived Attribute
A derived attribute is one whose value is computed
from other attributes.
It is indicated in ER diagram using a
dotted line connecting the attribute with
the entity.
e.g.: employee age can be calculated from
the date of birth and current date.
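
As a small illustration of a derived attribute, the sketch below computes an age from a stored date of birth instead of storing the age itself. The function name age_on and the sample dates are assumptions made for this example.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Return the age in whole years on the given day."""
    years = today.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


print(age_on(date(1990, 11, 7), date(2014, 10, 31)))
```

Because the value is recomputed on demand, it cannot go stale the way a stored AGE column would after each birthday.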

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

What do you recommend???

Announcements
Next lecture Friday 31st of October (ONLY in
week 5) will be 13:00-14:00 in 101, Haworth
(Y2 in Edgbaston Campus Map).
The following Friday lectures (week 6-11) will
be 13:00-14:00 in Main Lecture Theatre
Arts.
This week's (week 5) hand-out and exercise have
been released on canvas.

Summary
The ERM uses ERDs to represent the conceptual database as
viewed by the end user
ERM's main components:
Entities
Relationships
Attributes

Classes of Attributes include

Identifier attributes
Simple versus Composite Attribute
Single-Valued versus Multivalued Attribute
Derived Attribute

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 6 (Friday)
Logical Data Model (Mapping E/R design to relational schema) Part 2
Shereen Fouad
Teaching Fellow
School of Computer Science
University of Birmingham, UK

Announcements
So far we have covered most of Chapters (1, 3, 4, 5 and 7) in the
reference text book
C. Coronel, S. Morris, P. Rob & K. Crockett,
Database Principles: Fundamentals of Design, Implementation and
Management, 10th Edition, 2013.
Next week we will start Chapter 8.

Reminder of Previous Lecture


Mapping E/R design to relational schema
Mapping entity sets
Mapping weak entity sets
Mapping Multivalued Attribute
Mapping relationship sets into the database relational schema
1:M Relationships
1:1 Relationships
N:M Relationships
Strong versus weak Relationships
Strong versus weak Entities

Phases of Database Design
Data Requirements -> (Specification of requirements and results) ->
Conceptual Design -> (Conceptual Schema) -> Logical Design ->
(Logical Schema) -> Physical Design -> (Physical Schema)

Conceptual design: begins with the collection of requirements and
results needed from the database. It is a high-level description (often
done with the Entity Relationship (ER) model). DBMS independent.
Logical design: description of the structure of the relational
database; maps the ERD into the relational data model. Closer to the
actual implementation. DBMS specific.
Schema Refinement: consistency, normalization
Physical schema is a description of the implementation (programs,
tables, dictionaries and catalogs)

Relations, Entities, Tables

E/R Diagram                  | Relational model | SQL
Entity                       | Relation         | Table
Instance                     | Tuple            | Row
Attribute                    | Attribute        | Column or Field
Relationship (1:M, 1:1, M:N) | Foreign Key      | Foreign Key
Identifying Attribute        | Primary Key      | Primary Key

Overview
Relationship Degree (Revised)
What are symmetric recursive relationships
Implementation of the non-symmetric 1:M recursive relationship
Implementation of the non-symmetric N:M recursive relationship
Implementation of the symmetric 1:1 recursive relationship: redundant & non-redundant implementations
The problem of Symmetry
Redundant Relationships

Relationship Degree (Revised)
The number of entities that are joining in the relationship indicates a
relationship's degree.
A unary (recursive) relationship: a single entity association,
e.g. Employee Manages Employee
A binary relationship: two entities association (most common),
e.g. Employee Works Department
A ternary relationship: three entities association,
e.g. Customer, Item and Employee joined by Issue invoice

Relationship Degree (Revised)

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.

Mapping of Ternary Relationship
Mapping Ternary (and n-ary) Relationships
One relation for each entity and one for the associative entity
Associative entity has foreign keys to each entity in the relationship

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.

Tables for a Ternary Relationship

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.

CFR is just like the bridging entity types you've seen before,
but has 3 links to other types instead of 2

Tables for a Ternary Relationship

Unary (Recursive) Relationships
A recursive relationship links entities of the same type.
E.g.: marriage, management, parthood, ...

Can have partial recursion: just some of the entity types involved in a
relationship could be the same.

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.

Recursive Relationships: Symmetry


A relationship R between entity types E,F (possibly the same) is symmetric iff:

if eRf then fRe (i.e., IF R relates entity e of type E to entity f of type F, then it
must ALSO relate f to e.)
E.g.: marriage, being-sibling-of.
Recursive relationships cause major redundancy problems when ALSO
symmetric.

Symmetry only makes sense in the 1:1 and M:N cases.


((Can generalize the points to partly-recursive cases.))

(Necessarily non-symmetric) 1:M recursive: EMPLOYEE Manages EMPLOYEE
Mapping Unary (1:M) Relationships - Recursive foreign key in the same
relation

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.

Just a standard 1:M implementation except linking a table to itself.
No redundancy problem.
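
The recursive foreign key can be sketched as follows, assuming SQLite through Python's sqlite3 module. The column names (emp_num, emp_lname, emp_manager) and the rows are illustrative assumptions, not the textbook's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# emp_manager is a foreign key that points back into the same table.
# It is nullable so that the top manager can have no manager.
conn.execute("""
CREATE TABLE employee (
    emp_num     INTEGER PRIMARY KEY,
    emp_lname   TEXT NOT NULL,
    emp_manager INTEGER REFERENCES employee (emp_num)
)""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Chopples", None), (2, "Blurp", 1), (3, "Rumpel", 1)],
)

# A self-join (the table under two aliases) pairs each employee with a manager.
reports = conn.execute("""
    SELECT m.emp_lname, e.emp_lname
    FROM employee AS e
    JOIN employee AS m ON e.emp_manager = m.emp_num
    ORDER BY e.emp_num""").fetchall()
print(reports)
```

Each "manages" fact is stored exactly once, in the managed employee's row, which is why no redundancy arises.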

Non-symmetric M:N recursive: PART Contains PART
Mapping Unary (M:N) Relationships
Two relations:
- One for the entity type
- One for an associative relation in which the primary key has two
attributes, both taken from the primary key of the entity

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.

The COMPONENT entity type is just a bridging type, linking PART to
itself. NB: its first two columns both refer to PART's PK but must be
differently named.
No redundancy problem.
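
A possible shape for the PART and COMPONENT tables, assuming SQLite via Python's sqlite3 module. The column names (part_code, assembly_code, component_code, quantity) and the bicycle data are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# COMPONENT bridges PART to itself: both key columns reference part.part_code,
# so they need different names (assembly_code vs component_code).
conn.executescript("""
CREATE TABLE part (
    part_code TEXT PRIMARY KEY,
    part_desc TEXT NOT NULL
);
CREATE TABLE component (
    assembly_code  TEXT NOT NULL REFERENCES part (part_code),
    component_code TEXT NOT NULL REFERENCES part (part_code),
    quantity       INTEGER NOT NULL,
    PRIMARY KEY (assembly_code, component_code)
);
""")
conn.executemany(
    "INSERT INTO part VALUES (?, ?)",
    [("BIKE", "Bicycle"), ("WHEEL", "Wheel"), ("SPOKE", "Spoke")],
)
conn.executemany(
    "INSERT INTO component VALUES (?, ?, ?)",
    [("BIKE", "WHEEL", 2), ("WHEEL", "SPOKE", 32)],
)

bike_parts = conn.execute(
    "SELECT component_code, quantity FROM component "
    "WHERE assembly_code = 'BIKE'").fetchall()
print(bike_parts)
```

The relationship is not symmetric ("BIKE contains WHEEL" does not imply the reverse), so each row carries information that no other row repeats.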

A primary key if combined with a foreign key creates
(A) Parent-Child relationship between the tables that connect them.
(B) Many to many relationship between the tables that connect them.
(C) Network model between the tables that connect them.
(D) None of the above

Mapping Unary (1:M) Relationships
(A) foreign key in the same relation
(B) foreign key in an associative relation
(C) foreign key in both (the same and an associative relation)
(D) no foreign key is required

Mapping Unary (N:M) Relationships
(A) foreign key in the same relation
(B) foreign key in the associative relation
(C) foreign key in both (the same and an associative relation)
(D) None of the above.

Symmetric (1:1) recursive relationship: EMPLOYEE Married to EMPLOYEE
Suppose you tried the following:
Mapping Unary (1:1) Relationships - Recursive foreign key in the same
relation
Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Redundancy problem!!

Symmetry is the Problem


A non-symmetric 1-1 relationship would not have the problem
shown on previous slide.
A symmetric M:N relationship would have a redundancy problem,
whether implemented as in the 1-1 case or by a bridging table.
E.g.: being-sibling-of.

Symmetric (1:1) recursive relationship: redundant & non-redundant
implementations

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

1) As previously: redundant.
2) MARRIED_V1 is just a bridging entity type: still redundant.
3) MARRIAGE together with MARPART act as a sort of bridge. Non-redundant.
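
Method 3 might look roughly like this, assuming SQLite via Python's sqlite3 module. The column names (mar_num, mar_date, emp_num, emp_lname), the sample rows and the helper spouse_of are assumptions for illustration; the figure's actual columns may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    emp_num   INTEGER PRIMARY KEY,
    emp_lname TEXT NOT NULL
);
CREATE TABLE marriage (
    mar_num  INTEGER PRIMARY KEY,
    mar_date TEXT
);
-- One row per participant; UNIQUE keeps each employee in at most one marriage.
CREATE TABLE marpart (
    mar_num INTEGER NOT NULL REFERENCES marriage (mar_num),
    emp_num INTEGER NOT NULL UNIQUE REFERENCES employee (emp_num),
    PRIMARY KEY (mar_num, emp_num)
);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(345, "Ramirez"), (347, "Jones")])
conn.execute("INSERT INTO marriage VALUES (1, '2014-06-21')")
conn.executemany("INSERT INTO marpart VALUES (?, ?)", [(1, 345), (1, 347)])


def spouse_of(emp_num):
    """Find the other participant in the same marriage, if any."""
    row = conn.execute("""
        SELECT other.emp_num
        FROM marpart AS me
        JOIN marpart AS other
          ON other.mar_num = me.mar_num AND other.emp_num <> me.emp_num
        WHERE me.emp_num = ?""", (emp_num,)).fetchone()
    return row[0] if row else None


print(spouse_of(345), spouse_of(347))
```

The marriage itself is recorded once, and both directions of the symmetric relationship are answered from the same two MARPART rows.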

Symmetric M:N, etc.


Method 3 on previous slide can straightforwardly be generalized to:
symmetric recursive M:N relationships

Redundant Relationships
Occur when there are multiple relationship paths between related
entities
Main concern is that redundant relationships remain consistent
across model

Summary: Creating ERMs/ERDs


Designing an ER model for a database is an iterative process, because, e.g.:
As you proceed, you think of new ways of conceiving what's going on (much
as in ordinary programming)
Multivalued attributes need to be re-represented eventually
M:N relationships can be included as such at an early stage, but usually need
to be replaced by means of bridging entity types later
Implementation of 1:1 relationships varies depending on the relationship
participation
Symmetric recursive relationships (1:1 or N:M) usually need
special handling.
Weak entities usually need special handling.

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 6 (Wednesday)
Logical Data Model ( Mapping E/R design to relational schema) Part 1
Shereen Fouad
Teaching Fellow
School of Computer Science
University of Birmingham, UK

Announcements
This week there will be no lab session.
Assignment 5 will be conceptual (designing an ERD) and it won't
involve any practical work.
Assignment 5 (unassessed) has been released on canvas.
In week 8 you will have an online test on canvas
It will become available on Friday 21/11/2014 (on canvas), accounting
for 10% of the module mark. (The exact time will be announced on
canvas soon)
Once you start you have 60 minutes to complete it.

Reminder of Previous Lecture


How relationships between entities (in both directions) are defined
and graphically presented in ERD,
Relationship Connectivity in ERD
Relationships Cardinality in an ERD
Relationship Participation in ERD
Relationship Degree in ERD
Weak Entities in ERD
Associative (Composite) Entities in ERD
Example of ERD that represents a business situation

Overview
Mapping E/R design to relational schema
Mapping entity sets
Mapping weak entity sets
Mapping Multivalued Attribute
Mapping relationship sets
1:M Relationships
1:1 Relationships
N:M Relationships
Strong versus weak Relationships
Strong versus weak Entities

Logical Design
Logical design translates the conceptual design (ER model) into the
internal model (relational schema) for a selected DBMS.

E/R Diagram                  | Relational model | SQL
Entity                       | Relation         | Table
Instance                     | Tuple            | Row
Attribute                    | Attribute        | Column or Field
Relationship (1:M, 1:1, M:N) | Foreign Key      | Foreign Key
Identifying Attribute        | Primary Key      | Primary Key

Mapping entity sets
An entity set translates directly to a table
Attributes become columns
Key attributes become key columns

Mapping weak entity sets
A weak entity becomes a separate relation with a foreign key taken from the
strong entity
Primary key composed of:
Partial identifier of weak entity
Primary key of the identifying relation (strong entity)

Primary keys are underlined
Foreign Keys are circled in red

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Mapping Multivalued Attribute
Multi-valued Attribute - Becomes a separate relation with a foreign key
taken from the superior entity
Primary keys are underlined
Foreign Keys are circled in red
Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Mapping weak entity sets

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

EMPLOYEE (EMP_NUM,EMP_LNAME,EMP_FNAME, ..)


DEPENDANT(EMP_NUM, DEP_NUM, DEP_FNAME, DEP_DOB)
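
The two relations above could be declared roughly as follows. This is a sketch assuming SQLite via Python's sqlite3 module, with lower-case versions of the column names from the schema lines; the data types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    emp_num   INTEGER PRIMARY KEY,
    emp_lname TEXT NOT NULL,
    emp_fname TEXT NOT NULL
);
-- Weak entity: its PK is the owner's PK plus a partial identifier (dep_num).
CREATE TABLE dependant (
    emp_num   INTEGER NOT NULL REFERENCES employee (emp_num),
    dep_num   INTEGER NOT NULL,
    dep_fname TEXT NOT NULL,
    dep_dob   TEXT,
    PRIMARY KEY (emp_num, dep_num)
);
""")
conn.execute("INSERT INTO employee VALUES (1001, 'Rumpel', 'Ann')")
conn.executemany(
    "INSERT INTO dependant VALUES (?, ?, ?, ?)",
    [(1001, 1, "Mary", "2005-03-14"), (1001, 2, "David", "2008-09-02")],
)

# Existence dependence: a dependant of a non-existent employee is rejected.
try:
    conn.execute("INSERT INTO dependant VALUES (9999, 1, 'Orphan', NULL)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

Note that dep_num only has to be unique within one employee's dependants, since emp_num completes the key.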

Mapping 1:M Relationships
Connectivity of the relationship set determines the key of the table
Primary key on the one side (Parent) becomes a foreign key on the many
side (Child)

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Example 1:M Relationship
ORGANIZATION employs EMPLOYEE: One-to-Many relationship

EMPLOYEE
Emp-ID (PK) | NAME     | PHONE         | ORG_ID (FK) | AGE
9568876     | Chopples | 0121-414-3816 | E22561      | 37
2544799     | Blurp    | 01600-719975  | E85704      | 21
1698674     | Rumpel   | 07970-852657  | E22561      | 88
1800748     | Dunston  | 0121-414-3886 | E22561      | 29

ORGANIZATION
ORG_ID (PK) | EMPL NAME                | ADDRESS                 | NUM EMPLS  | SECTOR
E48693      | BT                       | BT House, London, ...   | 1,234,5678 | Private TCOM
E85704      | Monmouth School          | Hereford Rd, Monmouth, ... | 245     | Private 2E
E22561      | University of Birmingham | Edgbaston Park Rd, ...  | 3023       | Public HE

Each Organization employs many Employees.
More than one employee allowed per organization, but no more than
one employer per person.
Primary keys (underlined on the slide) are marked (PK) here
Foreign Keys (denoted in red on the slide) are marked (FK) here
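
The same 1:M mapping, sketched in SQLite via Python's sqlite3 module using a subset of the columns and rows from the example tables. The lower-case column names and the chosen data types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE organization (
    org_id   TEXT PRIMARY KEY,
    org_name TEXT NOT NULL
);
-- The PK of the "one" side (organization) is a foreign key on the "many" side.
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    org_id TEXT REFERENCES organization (org_id)
);
""")
conn.executemany("INSERT INTO organization VALUES (?, ?)", [
    ("E48693", "BT"),
    ("E85704", "Monmouth School"),
    ("E22561", "University of Birmingham"),
])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (9568876, "Chopples", "E22561"),
    (2544799, "Blurp", "E85704"),
    (1698674, "Rumpel", "E22561"),
    (1800748, "Dunston", "E22561"),
])

# Count employees per organization; LEFT JOIN keeps organizations with none.
headcount = conn.execute("""
    SELECT o.org_name, COUNT(e.emp_id)
    FROM organization AS o
    LEFT JOIN employee AS e ON e.org_id = o.org_id
    GROUP BY o.org_id
    ORDER BY o.org_id""").fetchall()
print(headcount)
```

Each employee row holds a single org_id, which is what enforces "no more than one employer per person".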

Mapping 1:1 Relationships
Primary key on the mandatory side becomes a foreign key on the optional side.
If both sides of relation are optional, it doesn't matter which table receives the
foreign key.

[PEOPLE] 1 <Has> 1 [PHONES]
1:1: that is, no more than one phone allowed per person, and vice versa.

PEOPLE
PERS-ID | NAME     | EMPL ID | AGE
9568876 | Chopples | E22561  | 37
2544799 | Blurp    | E85704  | 21
1698674 | Rumpel   | E22561  | 88
5099235 | Biggles  | E22561  | 29

PHONES
PHONE         | TYPE   | PERS-ID | STATUS
0121-414-3816 | office | 9568876 | OK
01600-719975  | home   | 5099235 | FAULT
0121-440-5677 | home   | 1698674 | OK
07970-852657  | mobile | 2544799 | UNPAID
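
One way to enforce the 1:1 rule is to declare the foreign key UNIQUE. The sketch below assumes SQLite via Python's sqlite3 module; choosing PHONE as the primary key of PHONES is an assumption, since the slide text does not say which column is the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE people (
    pers_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- UNIQUE on the foreign key turns the usual 1:M link into 1:1.
CREATE TABLE phones (
    phone   TEXT PRIMARY KEY,
    type    TEXT,
    pers_id INTEGER UNIQUE REFERENCES people (pers_id)
);
""")
conn.execute("INSERT INTO people VALUES (9568876, 'Chopples')")
conn.execute("INSERT INTO phones VALUES ('0121-414-3816', 'office', 9568876)")

# A second phone for the same person violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO phones VALUES ('07970-852657', 'mobile', 9568876)")
    second_phone_rejected = False
except sqlite3.IntegrityError:
    second_phone_rejected = True
print(second_phone_rejected)
```

Without the UNIQUE keyword the same schema would silently allow several phones per person, i.e. a 1:M relationship.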

Mapping M:N Relationships
Consider the shown relationship
IF we represent M:N connectivity in a similar way to 1:M, then we can expect that
in the STUDENT table: some students will each have several classes listed
or in the CLASS table: some classes will each have several students listed
or both.
This is a problem. Why?

The Problem with M:N Relationship

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Example M:N Relationship
Because of this problem, an M:N relationship is usually broken up into
two 1:M relationships.
This means introducing an extra "bridging" or "linking" or "composite"
entity type (hence table) to stand between the two original ones.
Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

The composite entity ENROLL has a primary key composed of the
primary keys of two entities STUDENT and CLASS.
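
A sketch of the bridging table, assuming SQLite via Python's sqlite3 module. The column names stu_num, class_code and enroll_grade follow the naming used elsewhere in these slides; the remaining columns and all sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (stu_num TEXT PRIMARY KEY, stu_lname TEXT NOT NULL);
CREATE TABLE class   (class_code INTEGER PRIMARY KEY, class_title TEXT NOT NULL);
-- ENROLL sits between STUDENT and CLASS: two 1:M links replace the M:N one.
CREATE TABLE enroll (
    stu_num      TEXT    NOT NULL REFERENCES student (stu_num),
    class_code   INTEGER NOT NULL REFERENCES class (class_code),
    enroll_grade TEXT,
    PRIMARY KEY (stu_num, class_code)
);
""")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("S1", "Bowser"), ("S2", "Smithson")])
conn.executemany("INSERT INTO class VALUES (?, ?)",
                 [(10014, "Accounting"), (10018, "Databases")])
conn.executemany("INSERT INTO enroll VALUES (?, ?, ?)",
                 [("S1", 10014, "A"), ("S1", 10018, "B"), ("S2", 10018, "C")])

classes_of_s1 = [r[0] for r in conn.execute(
    "SELECT class_code FROM enroll WHERE stu_num = 'S1' ORDER BY class_code")]
students_in_10018 = [r[0] for r in conn.execute(
    "SELECT stu_num FROM enroll WHERE class_code = 10018 ORDER BY stu_num")]
print(classes_of_s1, students_in_10018)
```

Every table now holds a single value per cell, and the grade, which belongs to neither the student nor the class alone, has a natural home in ENROLL.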

Example M:N Relationship

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

An entity set that does not have sufficient attributes to form a primary key is a
(A) strong entity set.
(B) weak entity set.
(C) simple entity set.
(D) primary entity set.

A logical schema
(A) is the entire database.
(B) is a standard way of organizing information into accessible parts.
(C) describes how data is actually stored on disk.
(D) both (A) and (C)

E-R model uses this symbol to represent weak entity set?
(A) Dotted rectangle.
(B) Diamond
(C) Doubly outlined rectangle
(D) None of these

The conceptual model is
(A) dependent on hardware.
(B) dependent on software.
(C) dependent on both hardware and software.
(D) independent of both hardware and software.

Strong Relationships
Strong (identifying) relationships
Exists when PK of related entity contains PK component of parent entity
A relationship from entity type A to entity type B, mediated by having A's primary key
(PK) as a foreign key in B, is strong when B's PK contains A's PK.
Includes the case of B's PK just being the same as A's PK.
E.g., A = Customers, B = Dependants, where
A's PK is: CUST_ID
B's PK is: CUST_ID, FIRST_NAME, CONNECTION.

So a PK value in B could be (1698674, Mary, child), meaning that this entity is the child called Mary
of person 1698674 in the Customers table.

Dependants is a weak entity, because there is a strong relationship to it from Customers, and
Dependants is existence-dependent on Customers via this relationship.

Strong Relationship
Strong relationship going from A to B
(we could say: B is strongly dependent on A)

CUSTOMERS (the A type)
CUST-ID | NAME     | PHONE         | EMPL ID | AGE
9568876 | Chopples | 0121-414-3816 | E22561  | 37
2544799 | Blurp    | 01600-719975  | E85704  | 21
1698674 | Rumpel   | 07970-852657  | E22561  | 88
1800748 | Dunston  | 0121-414-3886 | E22561  | 29

DEPENDANTS (the B type)
CUST-ID | FIRST NAME | CONNECTION    | LIVES_WITH
2544799 | John       | civil partner | TRUE
1698674 | Mary       | child         | FALSE
1698674 | Mary       | spouse        | FALSE
1698674 | David      | child         | TRUE

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Weak (or Non-Identifying) Relationships
Exists if PK of related entity does not contain PK component of parent entity
A relationship is weak when it isn't strong!
So, most relationships are weak.
Note that strength/weakness is directional: the Customers to Dependants
relationship (above) is strong, but the Dependants to Customers relationship is weak.

Strong Entity Types
A strong entity type is one that is not weak!
So, in particular, any entity type that receives only weak relationships from other
entity types is strong.
So the usual case is for an entity type to be strong.
And any entity type that is not existence-dependent on anything is strong.

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 7 (Wednesday)
SQL Data Definition
Shereen Fouad
Teaching Fellow
School of Computer Science
University of Birmingham, UK

Reminder of Previous Lecture


About the extended entity relationship (EER) model's main constructs
Supertype and subtype relationships
Why and When to Consider Supertypes and Subtypes?
Relationships and Subtypes
Generalization and specialization
Completeness Constraint
Disjoint and Overlapping Constraints
Mapping Supertype/Subtype Relationships to Relational Data Model

Overview
How to use SQL for data administration to create databases and tables.
SQL Data types
SQL Constraints
NOT NULL constraint
UNIQUE constraint
DEFAULT constraint
CHECK constraint
Primary Key
Foreign Key
DROP TABLE
ALTER TABLE
INSERT, UPDATE, and DELETE

The Data Definition Language (DDL):
SQL functions fit into two broad categories:
The Data Definition Language (DDL): used to describe/create database schema
The Data Manipulation Language (DML): used for selecting, inserting,
deleting and updating data items in a database
Basic command set has vocabulary of fewer than 100 words

Creating the Database
It involves the following:
Create the Database
CREATE DATABASE dbname;
Create DB Schema (Group of database objects that are related to each other)

Creating a Table
CREATE TABLE <name> (
    <col-def-1>,
    <col-def-2>,
    :
    <col-def-n>,
    <constraint-1>,
    :
    <constraint-k>);

You supply:
A name for the table
A list of column definitions (including their names, data types, [NOT] NULL,
DEFAULT values), e.g. column_name1 data_type(size),
A list of constraints (Primary keys, Unique columns, Foreign keys)

For Better Table Structures


Use one line per column (attribute) definition
Use spaces to line up attribute characteristics and constraints
Table and attribute names are capitalized
NOT NULL specification
UNIQUE specification
Primary key attributes contain both a NOT NULL and a UNIQUE
specification
RDBMS will automatically enforce referential integrity for foreign keys
Command sequence ends with semicolon

Data Types
Data type selection is usually dictated by nature of data and by
intended use
Supported data types:

Number(L,D), Integer, Smallint, Decimal(L,D)


Char(L), Varchar(L), Varchar2(L)
Date, Time, Timestamp
Real, Double, Float
Interval day to hour
Many other types

Some of the Supported data types in Postgresql


Numeric Data types

Alphanumeric Data types

Date/time
Data types

SQL Constraints
Each constraint is given a name - Access requires a name, but some others don't
Constraints which refer to single columns can be included in their definition
NOT NULL constraint
Ensures that column does not accept nulls
UNIQUE constraint
Ensures that all values in column are unique
DEFAULT constraint
Assigns value to attribute when a new row is added to table
CHECK constraint
Validates data when attribute value is entered
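
The four column constraints can be tried out as follows. This sketch assumes SQLite via Python's sqlite3 module in place of PostgreSQL; the product table, its columns and the helper rejected are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE product (
    prod_code TEXT PRIMARY KEY,
    prod_name TEXT NOT NULL,               -- NOT NULL: a name is required
    barcode   TEXT UNIQUE,                 -- UNIQUE: no two rows share a barcode
    stock     INTEGER NOT NULL DEFAULT 0,  -- DEFAULT: used when no value is given
    price     REAL CHECK (price > 0)       -- CHECK: validated on every insert/update
)""")


def rejected(sql):
    """Run a statement and report whether a constraint refused it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


conn.execute("INSERT INTO product (prod_code, prod_name, barcode, price) "
             "VALUES ('P1', 'Kettle', '500001', 19.99)")
default_stock = conn.execute(
    "SELECT stock FROM product WHERE prod_code = 'P1'").fetchone()[0]

null_name = rejected("INSERT INTO product (prod_code, prod_name, price) "
                     "VALUES ('P2', NULL, 5.0)")
dup_barcode = rejected("INSERT INTO product (prod_code, prod_name, barcode, price) "
                       "VALUES ('P3', 'Toaster', '500001', 25.0)")
bad_price = rejected("INSERT INTO product (prod_code, prod_name, price) "
                     "VALUES ('P4', 'Mug', -1.0)")
print(default_stock, null_name, dup_barcode, bad_price)
```

Only the first row is stored; each of the other three statements fails on exactly one constraint, so the table never contains invalid data.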

Primary Keys
Primary Keys are defined through constraints
A PRIMARY KEY constraint also includes a UNIQUE constraint and makes the
columns involved NOT NULL
The <details> for a primary key is a list of columns which make up the key
CONSTRAINT <name>
    PRIMARY KEY
    (col1, col2, ...)

Example
CREATE TABLE distributors (
    did integer,
    name varchar(40),
    PRIMARY KEY(did) );

CREATE TABLE distributors (
    did integer PRIMARY KEY,
    name varchar(40) );

Unique Constraints
As well as a single primary key, any set of columns can be specified as UNIQUE
This has the effect of making candidate keys in the table
The <details> for a unique constraint are a list of columns which make up the
candidate key
CONSTRAINT <name>
    UNIQUE
    (col1, col2, ...)

Example
CREATE TABLE films (
    code char(5) CONSTRAINT firstkey PRIMARY KEY,
    title varchar(40) NOT NULL,
    did integer NOT NULL,
    date_prod date,
    kind varchar(10),
    CONSTRAINT production UNIQUE(date_prod));

CREATE TABLE distributors (
    did integer PRIMARY KEY DEFAULT nextval('serial'),
    name varchar(40) NOT NULL CHECK (name <> '') );

Foreign Keys
Foreign Keys are also defined as constraints

You need to give


The columns which make up the FK
The referenced table
The columns which are referenced by the FK
CONSTRAINT <name>
FOREIGN KEY
(col1, col2, ...)
REFERENCES
<table>
[(ref1, ref2, ...)]
[ON DELETE action ] [ ON UPDATE action ] (table constraint)

Example
CREATE TABLE cities

( city varchar(80) primary key, location point );

CREATE TABLE weather


( city varchar(80) references cities(city),
temp_lo int, temp_hi int, prcp real, date date );

Example

CREATE TABLE Enrolment (
STU_NUM char(10),
CLASS_CODE integer,
ENROLL_GRADE char(6) NOT NULL,
PRIMARY KEY (STU_NUM, CLASS_CODE),
FOREIGN KEY (STU_NUM) REFERENCES STUDENT (STU_NUM),
FOREIGN KEY (CLASS_CODE) REFERENCES CLASS (CLASS_CODE) );
Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

ON DELETE/ ON UPDATE
When the data in the referenced columns is changed, certain actions
are performed on the data in this table's columns.
The ON DELETE clause specifies the action to perform when a
referenced row in the referenced table is being deleted.
Likewise, the ON UPDATE clause specifies the action to perform when
a referenced column in the referenced table is being updated to a
new value.

Possible actions for each clause


NO ACTION
Produce an error indicating that the deletion or update would create a
foreign key constraint violation.
CASCADE
Delete any rows referencing the deleted row, or update the value of the
referencing column to the new value of the referenced column,
respectively.
SET NULL
Set the referencing column(s) to null.
SET DEFAULT
Set the referencing column(s) to their default values.

Example

CREATE TABLE Dept_Mgr(


did INTEGER,
dname CHAR(20),
budget REAL,
ssn CHAR(11) NOT NULL,
since DATE,
PRIMARY KEY (did),
FOREIGN KEY (ssn) REFERENCES Employees
ON DELETE NO ACTION)

Example of Weak Entity Sets


When the owner entity is deleted, all owned weak entities must also be
deleted.

CREATE TABLE Dep_Policy (


pname CHAR(20),
age INTEGER,
cost REAL,
ssn CHAR(11) NOT NULL,
PRIMARY KEY (pname, ssn),
FOREIGN KEY (ssn) REFERENCES Employees
ON DELETE CASCADE)
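
A small sketch of ON DELETE CASCADE in action, again using Python's sqlite3 module as an assumed stand-in for PostgreSQL. The Employees table definition and the sample rows are hypothetical; only Dep_Policy follows the slide. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.execute("CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20))")
conn.execute("""
    CREATE TABLE Dep_Policy (
        pname CHAR(20),
        age   INTEGER,
        cost  REAL,
        ssn   CHAR(11) NOT NULL,
        PRIMARY KEY (pname, ssn),
        FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO Employees VALUES ('111', 'Ann')")
conn.execute("INSERT INTO Employees VALUES ('222', 'Bob')")
conn.execute("INSERT INTO Dep_Policy VALUES ('Joe', 7, 10.0, '111')")
conn.execute("INSERT INTO Dep_Policy VALUES ('Sue', 9, 12.0, '222')")

# Deleting the owner entity also deletes the weak entities it owns.
conn.execute("DELETE FROM Employees WHERE ssn = '111'")
remaining = conn.execute("SELECT pname, ssn FROM Dep_Policy").fetchall()
```

With NO ACTION in place of CASCADE, the same DELETE would be refused while a dependent policy row still references the employee.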

CREATE TABLE AS
Define a new table from the results of a query
CREATE TABLE films_recent AS
SELECT * FROM films WHERE date_prod >= '2002-01-01';

CREATE TABLE films2 AS


TABLE films;

Deleting Tables
To delete a table use
DROP TABLE [IF EXISTS] <name>
Example:
DROP TABLE Module
BE CAREFUL with any SQL statement with DROP in it
You will delete any information in the table as well
You won't normally be asked to confirm
There is no easy way to undo the changes

Changing Tables
Sometimes you want to change the structure of an existing table
One way is to DROP it then rebuild it
This is dangerous, so there is the ALTER TABLE command instead

ALTER TABLE can

Add a new column


Remove an existing column
Add a new constraint
Remove an existing constraint

ALTERing Columns
To add or remove columns use
ALTER TABLE <table>
ADD COLUMN <col>
ALTER TABLE <table>
DROP COLUMN <name>

Examples
ALTER TABLE Student
ADD COLUMN
Degree VARCHAR(50)
ALTER TABLE Student

DROP COLUMN Degree

ALTERing Constraints
To add or remove constraints use
ALTER TABLE <table>
ADD CONSTRAINT

<definition>
ALTER TABLE <table>
DROP CONSTRAINT
<name>

Examples
ALTER TABLE Module
ADD CONSTRAINT
ck UNIQUE (title)
ALTER TABLE Module

DROP CONSTRAINT ck

The basic data type char(n) is a _____ length character string and varchar(n) is a _____
length character string.
A) Fixed, equal
B) Equal, variable
C) Fixed, variable
D) Variable, equal
Updates that violate __________ are disallowed .
A) Integrity constraints
B) Transaction control
C) Authorization
D) DDL constraints
Which of the following SQL commands can be used to modify the basic storage
characteristics of a database table?
A) MODIFY
B) UPDATE
C) CHANGE
D) ALTER

INSERT, UPDATE, DELETE


The Data Manipulation Language (DML):
used for selecting, inserting, deleting and updating data items in a database
INSERT - add a row to a table
UPDATE - change row(s) in a table
DELETE - remove row(s) from a table
UPDATE and DELETE use WHERE clauses to specify which rows to change or remove
BE CAREFUL with these - an incorrect WHERE clause can destroy lots of data

INSERT
INSERT INTO <table>
(col1, col2, ...)
VALUES
(val1, val2, ...)

The number of columns and values must be the same
If you are adding a value to every column, you don't have to list them
SQL doesn't require that all rows are different (unless a constraint says so)

INSERT
Each statement below is applied to this original table:

Student (before)
ID  Name  Year
1   John  1

INSERT INTO Student
(ID, Name, Year)
VALUES (2, 'Mary', 3)

Student (after)
ID  Name  Year
1   John  1
2   Mary  3

INSERT INTO Student
(Name, ID)
VALUES ('Mary', 2)

Student (after)
ID  Name  Year
1   John  1
2   Mary  NULL

INSERT INTO Student
VALUES (2, 'Mary', 3)

Student (after)
ID  Name  Year
1   John  1
2   Mary  3

UPDATE
UPDATE <table>
SET col1 = val1
[,col2 = val2]
[WHERE <condition>]

All rows where the condition is true have the columns set to the given values
If no condition is given all rows are changed so BE CAREFUL
Values are constants or can be computed from columns

UPDATE
Each statement below is applied to this original table:

Student (before)
ID  Name  Year
1   John  1
2   Mark  3
3   Anne  2
4   Mary  2

UPDATE Student
SET Year = 1,
Name = 'Jane'
WHERE ID = 4

Student (after)
ID  Name  Year
1   John  1
2   Mark  3
3   Anne  2
4   Jane  1

UPDATE Student
SET Year = Year + 1

Student (after)
ID  Name  Year
1   John  2
2   Mark  4
3   Anne  3
4   Mary  3

DELETE
Removes all rows which satisfy the condition
DELETE FROM <table>
[WHERE <condition>]

If no condition is given then ALL rows are deleted - BE CAREFUL

DELETE
Each statement below is applied to this original table:

Student (before)
ID  Name  Year
1   John  1
2   Mark  3
3   Anne  2
4   Mary  2

DELETE FROM
Student
WHERE Year = 2

Student (after)
ID  Name  Year
1   John  1
2   Mark  3

DELETE FROM Student

Student (after)
ID  Name  Year
(no rows)
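
The DML statements above can be tried end to end with the sketch below, which uses Python's sqlite3 module as an assumed stand-in for PostgreSQL. Unlike the slides, the statements here are applied one after another to the same table, so later results reflect the earlier changes. The column types and the helper rows() are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name VARCHAR(50), Year INTEGER)"
)
conn.executemany(
    "INSERT INTO Student (ID, Name, Year) VALUES (?, ?, ?)",
    [(1, "John", 1), (2, "Mark", 3), (3, "Anne", 2), (4, "Mary", 2)],
)

def rows():
    """Current contents of Student, ordered by ID."""
    return conn.execute("SELECT ID, Name, Year FROM Student ORDER BY ID").fetchall()

conn.execute("UPDATE Student SET Year = 1, Name = 'Jane' WHERE ID = 4")
after_update = rows()

conn.execute("DELETE FROM Student WHERE Year = 2")  # only Anne is still in year 2
after_delete = rows()

conn.execute("DELETE FROM Student")  # no WHERE clause: every row is removed
after_delete_all = rows()
```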

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 7 (Wednesday)
The Extended Entity Relationship (EER) Model
Shereen Fouad
Teaching Fellow
School of Computer Science
University of Birmingham, UK

Announcements
The test will be online on Canvas
You need to make sure that you are registered on the correct module
(Fundamentals/ICY) on Canvas.
Test will be Multiple Choice Questions (20 questions)
It will cover all of the concepts that we have discussed so far, including this week's
lectures (week 7) but not next week's (week 8).
The ONLINE TEST accounts for 10% of the module mark
Test will be available on Friday 21 November at 3 pm and will close the same day at 10 pm
The last opportunity for you to take the test is from 9 pm to 10 pm
Once you start the test it should take you only 60 minutes to complete it.

Announcements
The test is marked automatically and you are going to see your mark
right away after you finish the test
Correct answers will be released on Saturday on Canvas.
Last year's class test and answers are released on Canvas.
If you are entitled to extra time you need to contact welfare ASAP
Today we are starting chapter 8 in the book.
C. Coronel, S. Morris, P. Rob & K. Crockett, Database Principles: Fundamentals of Design,
Implementation and Management, 10th Edition, 2013.

Reminder of Previous Lecture


Relationship Degree (Revised)
What is the recursive Symmetry Relationships
Implementation of the non-symmetric 1:M recursive relationship
Implementation of the non-symmetric N:M recursive relationship
Implementation of the symmetric 1:1 recursive relationship & nonredundant implementations
The problem of Symmetry
Redundant Relationships

Phases of Database Design


[Diagram: Data Requirements -> Specification of requirements and results -> Conceptual
Design -> Conceptual Schema -> Logical Design -> Logical Schema -> Physical Design ->
Physical Schema]

Conceptual design: begins with the collection of requirements and results needed from the
database. It is a high-level description (often done with the Entity Relationship (ER) model).
DBMS independent.
Logical design: description of the structure of the relational database: map the ERD into the
relational data model. Closer to the actual implementation. DBMS specific.
Schema refinement: consistency, normalization
Physical schema is a description of the implementation (programs, tables, dictionaries and
catalogs)

Overview
About the extended entity relationship (EER) model's main constructs
Supertype and subtype relationships
Why and When to Consider Supertypes and Subtypes?
Relationships and Subtypes
Generalization and specialization
Completeness Constraint
Disjoint and Overlapping Constraints
Mapping Supertype/Subtype Relationships to Relational Data Model
Summary

The Extended Entity Relationship Model


It aims at adding more semantic constructs to original entity
relationship (ER) model
Diagram using this model is called an Extended Entity Relationship
Diagram (EERD)
It depends on the idea of Entity supertype and Entity subtypes

Entity Supertypes and Subtypes

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.

Entity Supertypes and Subtypes


Entity supertype
Generic entity type related to one or more entity subtypes
Contains common characteristics (attributes shared by all its subtypes)
Entity subtypes
Contains unique characteristics (special attributes) of each entity subtype
May participate in unique relationships
Primary key of a subtype is normally that of the supertype
Subtype exists only within context of supertype
Every subtype has only one supertype to which it is directly related
Can have many levels of supertype/subtype relationships

Why Consider Supertypes and Subtypes?

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.

Why Consider Supertypes and Subtypes?


The grouping of Employees into various types provides two benefits:
It avoids unnecessary null values in some non-shared attributes
It enables a certain employee type to participate in relationships that are unique
to that employee type.

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.

When to Consider Supertypes and Subtypes?


If you have different kinds or types of the entity in the user's environment.
The different kinds or types of instance should each have one or more attributes
that are unique to that particular type.

Source of Image: Modern Database Management 6th Edition, Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden

Relationships and Subtypes


Relationships at the supertype level indicate that all subtypes will participate in the
relationship.
The instances of a subtype may participate in a relationship unique to that subtype.
In this situation, the relationship is shown at the subtype level.

Source of Image: Modern Database Management 6th Edition, Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden

Specialization and Generalization


Specialization
Identifies more specific entity subtypes from higher-level entity supertype
Top-down process
Based on grouping unique characteristics and relationships of the subtypes

Generalization
Identifies more generic/general entity supertype from lower-level entity
subtypes
Bottom-up process
Based on grouping common characteristics and relationships of the subtypes

Inheritance
Enables entity subtype to inherit attributes and relationships of
supertype
All entity subtypes inherit their primary key attribute from their
supertype
At implementation level, supertype and its subtype(s) maintain a 1:1
relationship

Completeness Constraint
Specifies whether entity supertype occurrence must be a member of at least one
subtype
Partial completeness
Symbolized by a single line
Some supertype occurrences are not members of any subtype
Total completeness
Symbolized by a double line
Every supertype occurrence must be member of at least one subtype

Examples of completeness constraints


Partial completeness

A vehicle could be a car, a truck, or neither

Source of Image: Modern Database Management 6th Edition, Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden

Examples of completeness constraints


Total completeness

A patient must be either an outpatient or a resident patient

Source of Image: Modern Database Management 6th Edition, Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden

Disjoint and Overlapping Constraints


Whether an instance of a supertype may simultaneously be a
member of two (or more) subtypes.
Disjoint subtypes
An instance of the supertype can be only ONE of the subtypes
Symbolized by the letter d

Overlapping subtypes
An instance of the supertype could be more than one of the subtypes
Symbolized by the letter o

Example of Disjoint constraints

A patient can either be outpatient or resident, but not both

Source of Image: Modern Database Management 6th Edition, Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden

Example of Overlap constraint

A part may be both purchased and manufactured

Source of Image: Modern Database Management 6th Edition, Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden

Example of supertype/subtype hierarchy

Source of Image: Modern Database Management 6th Edition, Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden

When an entity instance must be a member of only one subtype, it is which of the following?
A) Disjoint with total specialization

B) Disjoint with partial specialization

C) Overlap with total specialization

D) Overlap with partial specialization

When an entity instance may be a member of multiple subtypes or it does not have to be a member of a
subtype, it is which of the following?
A) Disjoint with total specialization

B) Disjoint with partial specialization

C) Overlap with total specialization

D) Overlap with partial specialization

Use of a supertype/subtype relationship is necessary when which of the following exists?


A) An instance of a subtype participates in a relationship that is unique to that subtype.
B) An instance of a subtype participates in a relationship that is the same as the other subtypes

C) Attributes apply to all of the instances of an entity type.


D) No attributes apply to any of the instances of an entity type.

Mapping Supertype/Subtype Relationships

One relation for supertype and for each subtype


Supertype attributes (including identifier) go into supertype relation
Subtype attributes go into each subtype;
Primary key of supertype relation also becomes primary key and a foreign
key of subtype relation
There is no way to enforce completeness constraint or disjointness
(disjoint/overlap)
These must be enforced through application programming
You may consider it as 1:1 relationship established between supertype and
each subtype, with supertype as primary table
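
A minimal sketch of this mapping, using Python's sqlite3 module as an assumed stand-in for PostgreSQL. EMPLOYEE, PILOT and EMP_NUM follow the example used in these slides; the other column names and the sample rows are hypothetical. The subtype's primary key is at the same time a foreign key to the supertype, which gives the 1:1 link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.execute(
    "CREATE TABLE EMPLOYEE (EMP_NUM INTEGER PRIMARY KEY, EMP_LNAME VARCHAR(30))"
)
conn.execute("""
    CREATE TABLE PILOT (
        EMP_NUM     INTEGER PRIMARY KEY REFERENCES EMPLOYEE (EMP_NUM),
        PIL_LICENSE VARCHAR(20)
    )
""")
conn.execute("INSERT INTO EMPLOYEE VALUES (100, 'Smith')")
conn.execute("INSERT INTO EMPLOYEE VALUES (101, 'Lewis')")
conn.execute("INSERT INTO PILOT VALUES (101, 'ATP')")

# A subtype row cannot exist without its supertype row.
try:
    conn.execute("INSERT INTO PILOT VALUES (999, 'COM')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

# Joining on the shared key reassembles the full pilot record.
pilots = conn.execute("""
    SELECT e.EMP_NUM, e.EMP_LNAME, p.PIL_LICENSE
    FROM EMPLOYEE e JOIN PILOT p ON p.EMP_NUM = e.EMP_NUM
""").fetchall()
```

Employee 100 has no PILOT row, which the schema allows; that is why completeness and disjointness have to be checked elsewhere.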

Mapping Supertype/Subtype Relationships

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.

Mapping Supertype/Subtype Relationships

EMP_NUM is considered here as


Primary key for the table PILOT
Foreign key referring to the EMPLOYEE table
Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 10th Ed.

Mapping Supertype/Subtype Relationships

What do you suggest here??

Source of Image: Modern Database Management 6th Edition, Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden

Summary
Extended entity relationship (EER) model adds semantics to ER model via entity supertypes and
subtypes
Entity supertype is a generic entity type related to one or more entity subtypes
Specialization hierarchy depicts arrangement and relationships between entity supertypes and
entity subtypes

Inheritance means an entity subtype inherits attributes and relationships of supertype


Disjoint subtypes: an instance of the supertype can be only ONE of the subtypes
Overlapping subtypes: an instance of the supertype could be more than one of the subtypes

Partial completeness: some supertype occurrences are not members of any subtype
Total completeness: every supertype occurrence must be a member of at least one subtype

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 8 (Friday)
Functional Dependencies and Normalization for Relational Databases
(part 2)
Shereen Fouad
Teaching Fellow
School of Computer Science
University of Birmingham, UK

Reminder of Previous Lecture


Normalization is the process of evaluating and correcting table
structures to minimize data redundancies and reduce data
anomalies
An un-normalized relation (table) stores redundant data, which can
cause
Insertion anomalies
Deletion anomalies
Update anomalies

Normal Forms
Normalization can be divided into a series of stages called normal
forms, giving more and more protection:
1NF Relations
2NF Relations
3NF Relations

BCNF Relations
4NF Relations

First Normal Form


Disallows
multivalued attributes
nested relations; attributes whose values for an individual tuple
are non-atomic
Considered to be part of the definition of relation

Normalization into 1NF

Composite Keys

Overview

Prime vs. Nonprime Attribute


If a relation schema has more than one key, each is called a candidate
key.
One of the candidate keys is arbitrarily designated to be the primary key, and
the others are called secondary keys.

A Prime attribute must be a member of some candidate key


A Nonprime attribute is not a prime attribute; that is, it is not a
member of any candidate key.

Second Normal Form (2NF)


An entity type is in second normal form (2NF) if:

It is in 1NF and
Every non-prime attribute A in R is fully functionally dependent on
the primary key
It includes no partial dependencies (no attribute is dependent
on only a portion of the primary key)

1NF but not in 2NF because of a partial dependency

Conversion to Second Normal Form


Step 1:
For each determinant D involved in a partial dependency in the
original entity type T, use D as, also, the PK for a new entity type
NT(D)
Step 2:
Move out the attributes X determined by D into NT(D).

D itself stays in T as well as being copied into NT(D).


At this point, most anomalies have been eliminated

It is in 1NF
It includes no partial dependencies

Third Normal Form (3NF)


A table is in third normal form (3NF) if:
It is in 2NF
It contains no transitive dependencies
(no non-prime attribute A in R is transitively dependent on the
primary key)
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a
problem only if Y is not a candidate key.
When Y is a candidate key, there is no problem with the transitive
dependency .

2NF but not in 3NF because of a transitive dependency

A non-prime attribute is determined by another nonprime


attribute

Conversion to Third Normal Form


Step 1: Identify Each New Determinant
For each determinant D involved in a transitive dependency in the original
entity type T, use D also as the PK of a new entity type NT(D)
Step 2: Identify the Dependent Attributes
and move out the attributes X transitively determined by D into NT(D).
NB: the determinants themselves stay in T as well.
Name tables to reflect their contents and function

It is in 2NF
It contains no transitive dependencies

Figure 10.11: Normalizing the following relation to 3NF

Figure 10.11: Normalization into 2NF

Figure 10.11: Normalization into 2NF and 3NF

The Boyce-Codd Normal Form (BCNF)


Every determinant in table is a candidate key
Has same characteristics as primary key, but for some reason, not
chosen to be primary key
When table contains only one candidate key, the 3NF and the BCNF
are equivalent
BCNF can be violated only when table contains more than one
candidate key

The Boyce-Codd Normal Form (BCNF)


(continued)
Most designers consider the BCNF as special case of 3NF
Table is in 3NF when it is in 2NF and there are no transitive
dependencies
Table can be in 3NF and fail to meet BCNF
No partial dependencies, nor does it contain transitive
dependencies
A nonkey attribute is the determinant of a key attribute
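
A rough sketch of the BCNF test "every determinant is a candidate key", written as a check that every determinant is a superkey. The relation R(A, B, C, D) and its dependencies (A,B -> C,D and C -> B) are assumed for illustration, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Non-trivial FDs whose determinant does not determine the whole relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != relation
    ]

relation = {"A", "B", "C", "D"}
fds = [({"A", "B"}, {"C", "D"}), ({"C"}, {"B"})]
violations = bcnf_violations(relation, fds)

# After splitting into R1(A, C, D) and R2(C, B), each table passes the check.
r1_violations = bcnf_violations({"A", "C", "D"}, [({"A", "C"}, {"D"})])
r2_violations = bcnf_violations({"C", "B"}, [({"C"}, {"B"})])
```

Here C -> B is the offending dependency: C determines B but is not a key of the whole table, even though the table has no partial or transitive dependencies on non-prime attributes.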

A,B
A,C

C,D
B,D

This change is appropriate because the dependency C -> B means that
C is effectively a superset of B

[Figure: a table with attributes Student ID, Staff ID, Class Code and Enroll_Grade, decomposed
into (Student ID, Class Code, Enroll_Grade) and (Class Code, Staff ID)]

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 8 (Wednesday)
Functional Dependencies and Normalization for Relational Databases
(part 1)
Shereen Fouad
Teaching Fellow
School of Computer Science
University of Birmingham, UK

Reminder of Previous Lecture


How to use SQL for data administration to create databases and tables.
SQL Data types
SQL Constraints
NOT NULL constraint
UNIQUE constraint
DEFAULT constraint
CHECK constraint
Primary Key
Foreign Key
DROP TABLE
ALTER TABLE
INSERT, UPDATE, and DELETE

Overview
Motivation
What normalization is and what role it plays in the database design
process
Identify possible insertion, deletion, and update anomalies in a
relation
How normalization and ER modeling are used concurrently to
produce a good database design
What are the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF
Identify functional dependencies, determinants, and dependent
attributes
First Normal Form

Phases of Database Design


[Diagram: Data Requirements -> Specification of requirements and results -> Conceptual
Design -> Conceptual Schema -> Logical Design -> Logical Schema -> Physical Design ->
Physical Schema]

Conceptual design: begins with the collection of requirements and results needed from the
database. It is a high-level description (often done with the Entity Relationship (ER) model).
DBMS independent.
Logical design: description of the structure of the relational database: map the ERD into the
relational data model. Closer to the actual implementation. DBMS specific.
Schema refinement: consistency, normalization
Physical schema is a description of the implementation (programs, tables, dictionaries and
catalogs)

Motivation
Consider that you are asked to design a database from existing
spreadsheet data, as given in the table below.
SKU stands for Stock Keeping Unit

What is the best table design??

Should this data be stored as two separate tables??
Or join the tables together and design the database with just one table??

What are the criteria for "good" base relations?
How can we convert a bad relation to a better-designed relation?

The process of decomposing unsatisfactory "bad" relations by breaking up their attributes
into smaller relations is known as Normalization

Database Tables and Normalization


Normalization is the process of evaluating and correcting table
structures to minimize data redundancies and reduce data
anomalies
It is often used within ER modeling, to help produce a good database
design.
It can be considered as an alternative approach for database
modeling.
It evaluates entity types, and when appropriate creates new entity
types and adjusts attributes in existing ones
Normalization generally increases the number of tables and makes
many queries more elaborate.

The Need for Normalization


An un-normalized relation (table) stores redundant data, which can
cause
Insertion anomalies
Deletion anomalies
Update anomalies

Deletion Anomaly
Suppose we delete the data for repair number 2100.
When we delete this row (the second one), we remove not only data about the
repair, but also data about the machine itself.
We will no longer know, for example, that the machine was a Lathe and that its
AcquisitionCost was 4750.00.
When we delete one row, the structure of this table forces us to lose facts about
two different things, a machine and a repair.

Insertion Anomaly
Now suppose we want to enter the first repair for a piece of equipment.
To enter repair data, we need to know not just RepairNumber, RepairDate, and
RepairCost, but also ItemNumber, EquipmentType, and AcquisitionCost.
If we work in the repair department, this is a problem, because we are unlikely to
know the value of AcquisitionCost.
The structure of this table forces us to enter facts about two entities when we
just want to enter facts about one.

Update Anomaly
Suppose we update the last row of the following table using the data (100, Drill
Press, 5500, 2500, 08/17/09, 275).
The drill press has two different AcquisitionCosts (data inconsistency).
Equipment cannot be acquired at two different costs. If there were, say, 10,000
rows in the table, however, it might be very difficult to detect this error.


Normal Forms
Normalization can be divided into a series of stages called normal
forms, giving more and more protection:
1NF Relations
2NF Relations
3NF Relations

BCNF Relations
4NF Relations

The Normalization Process

2NF is better than 1NF; 3NF is better than 2NF


Objective of normalization is to ensure all tables are in at least 3NF
Each table represents a single subject
No data item will be unnecessarily stored in more than one table
All attributes in a table are dependent on the primary key
Each table is void of insertion, update, deletion anomalies
Normalization works one relation at a time
Progressively breaks table into new set of relations based on
identified dependencies

The Normalization Process


For most business database design purposes, 3NF is as high as
needed in normalization
Highest level of normalization is not always most desirable
Price paid for increased performance is greater data redundancy
Some situations require non-normalization or denormalization for
efficiency reasons.
Denormalization produces a lower normal form

Normal Forms

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Functional dependencies are used to specify formal measures of the
"goodness" of relational designs

Functional Dependency (FD)


In general, a functional dependency exists when the value of one or more
attributes determines the value of another attribute.
Suppose you are buying boxes of cookies and each box costs 5.00.
Then the cost of several boxes is given by the formula:
CookieCost = NumberOfBoxes * 5
Then we can say that CookieCost is functionally dependent on NumberOfBoxes
and the UnitPrice (i.e., 5).
This can be read as "NumberOfBoxes and UnitPrice determine CookieCost":
(NumberOfBoxes, UnitPrice) -> CookieCost
The variables on the left, here NumberOfBoxes and UnitPrice, are called the
determinant.
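
A small sketch of what "determines" means on actual rows: a dependency is broken as soon as two rows agree on the determinant but differ on the dependent attribute. The function name holds and the sample rows are assumptions for illustration. A sample of rows can only refute a dependency; whether it holds in general follows from the real-world rules.

```python
def holds(rows, determinant, dependent):
    """True if determinant -> dependent is not contradicted by `rows` (dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

purchases = [
    {"NumberOfBoxes": 1, "UnitPrice": 5, "CookieCost": 5},
    {"NumberOfBoxes": 2, "UnitPrice": 5, "CookieCost": 10},
    {"NumberOfBoxes": 2, "UnitPrice": 5, "CookieCost": 10},
]

fd_ok = holds(purchases, ["NumberOfBoxes", "UnitPrice"], ["CookieCost"])
price_alone = holds(purchases, ["UnitPrice"], ["CookieCost"])
```

UnitPrice on its own does not determine CookieCost here, because the same price appears with two different costs.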

Functional Dependency (FD)

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

FDs are derived from the real-world constraints on the attributes


Can be displayed graphically on a relation schema as in the following slide.

Functional Dependency (FD)


Consider the relation
Student (ID, Name, Soc Sec Nbr, Major, Deptmt)
Assume a department offers several majors, e.g. INSY department offers, INSY,
MASI, and POMA majors.
How many determinants can you identify in Student?
(Soc Sec Nbr) -> (ID, Name, Major, Deptmt)
(ID) -> (Name, Soc Sec Nbr, Major, Deptmt)
(Major) -> (Deptmt)
A dependency diagram over the attributes ID, Name, Soc_Sec_Nbr, Major, Dept

Functional Dependency (FD)


Full functional dependency
Attribute B is fully functionally dependent on attribute A if it is
functionally dependent on A and not functionally dependent on
any proper subset of A (partial dependency).
This becomes an issue only with composite keys.

Transitive dependency
If A, B and C are attributes of a relation such that A -> B and B -> C,
then C is transitively dependent on A via B (provided that A is not
functionally dependent on B or C)
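
Transitive dependencies can be followed mechanically by computing an attribute closure: start from a set of attributes and keep adding whatever the known dependencies let you derive. The sketch below is one common way to do this; the function name and the example dependencies are assumptions for illustration.

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know every attribute on the left, we learn the right.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
a_closure = closure({"A"}, fds)
b_closure = closure({"B"}, fds)
```

C appears in the closure of A only by way of B, which is exactly the transitive dependency described above.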

What are the functional dependencies in this table?


Can you spot any Partial dependency
Can you spot any Transitive dependency
Note that Primary Keys are underlined

Source of Image: Fundamentals of Database Systems by Ramez Elmasri , Shamkant B. Navathe

Partial dependency

Source of Image: Fundamentals of Database Systems by Ramez Elmasri , Shamkant B. Navathe

What are the functional dependencies in this table?


Can you spot any Partial dependency
Can you spot any Transitive dependency
Note that Primary Keys are underlined

Source of Image: Fundamentals of Database Systems by Ramez Elmasri , Shamkant B. Navathe

Transitive dependency

Source of Image: Fundamentals of Database Systems by Ramez Elmasri , Shamkant B. Navathe

First Normal Form (1NF)


Just insists on some restrictions we have already explicitly or
implicitly imposed on entity types and tables:
A relation is in 1NF if all underlying domains contain atomic
values only, i.e., no repeating groups. (The relation must not
contain multivalued attribute)
In the entity type there is a candidate key whose attributes
never have NULL values, and one such key has been chosen as
the primary key.
Normalizing table structure will reduce data redundancies

A Sample Report Layout with repeated groups

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Conversion to First Normal Form


Step 1: Eliminate the Repeating Groups
Eliminate nulls: each repeating group attribute contains an appropriate
data value

Step 2: Identify the Primary Key


Must uniquely identify attribute value
New key must be composed

Step 3: Identify All Dependencies


Dependencies depicted with a diagram

That Table put into 1NF (assuming there is a PK)


There are no repeating groups in the table.

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Conversion to First Normal Form (continued)


All key attributes are defined

Dependency diagram:

No repeated groups

Source of Image: Database Principles: Fundamentals of Design, Implementation and Management, 2nd Ed.

Example of First
Normal Form

Source of Image: Fundamentals of Database Systems by Ramez Elmasri , Shamkant B. Navathe

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 9 (Friday)
Mathematical background to tables

Shereen Fouad
Teaching Fellow
School of Computer Science
University of Birmingham, UK

Motivation
Manipulation of data (query and update operations) corresponds to operations on relations

A query is applied to relation instances, and the result of a query is also a relation
instance.

Relational algebra describes those operations


Data is represented in a Relational Model
Supports simple, powerful Query languages

Relational algebra
First described by Codd at IBM.
It is a family of algebras with a well-founded semantics used for
modelling the data stored in relational databases, and defining
queries on it.
Relational algebra contains two kinds of operators:
common set operations (such as union, intersection, and
cartesian product),
operators specific to relations (for example projection onto some of
the columns, and selection, which keeps only some rows of a table)

Relational Model

Emp is a relation
and it is a set with eight members
A mathematical relation is a set of
tuples: sequences of values.
Each tuple corresponds to a row in a
table.
Each tuple can be considered as a
member/element in the set.

Mathematical sets: Basics


A set is an unordered collection of items of any sort (people, numbers,
numerals, shoes, atoms, strings of characters, databases, sets, blades of grass, ...)
without any duplication of items.
The items are called elements or members.

S = {34, SHF, 59, UoB}, where SHF is a name for me and UoB is a name for
this university,
means that
S is the set consisting of (exactly) the following four items:
the abstract number 34, me, the abstract number 59, this university.

Basics, cont'd
{34, SHF, 59, UoB} = {UoB, 59, 34, SHF, SHF, 34}

Order of writing the members doesn't matter; duplication in the writing
doesn't duplicate the member.
A set can be infinite (e.g., the set of all whole numbers).
A set can contain just one member. Singleton set.
There's a set with no members at all: the empty set, usually notated as ∅, but
can also be written { }.
Somewhat analogous to zero, or a new committee which has no members
yet.

Another Notation
{n | n is an integer, n > 301} =
The set of n such that n is an integer and n > 301.
(Actually, this notation is a slight simplification.)

The set is the same as that denoted by, for instance,
{n | n is an integer, n ≥ 302}.

Some More Examples


{SHF, "SHF"} has 2 members: me, and a 3-char string.
{3, {4,5}, 4, 6} has 4 members, one of which is a set.
{3, {5,4}, 4, 6} is that same set.
{ {4,5} } has 1 member, which is a set.
{4,5} has 2 members, both numbers.
{∅} is a singleton set. Its only member is the empty set.
{{∅}} is a different singleton set.

Membership Relationship
a ∈ A means that a is a member of A.
5 ∈ {4,5}
{5,4} ∈ {3, {4,5}, 4, 6}

a ∉ A means that a is not a member of A.
5 ∉ {3, {4,5}, 4, 6}
{5} ∉ {3, {4,5}, 4, 6}
{4,6} ∉ {3, {4,5}, 4, 6}
{3,4,5} ∉ {3, {4,5}, 4, 6}
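
The membership examples above can be tried out in Python. This is only a sketch: Python's built-in sets cannot contain ordinary (mutable) sets, so `frozenset` is used here as a stand-in for the inner set {4,5}.

```python
# A = {3, {4,5}, 4, 6}; frozenset stands in for the inner set {4,5}.
A = {3, frozenset({4, 5}), 4, 6}

print(len(A))                    # number of members of A
print(frozenset({5, 4}) in A)    # {5,4} is the same set as {4,5}, so it is a member
print(5 in A)                    # 5 is a member of the inner set, not of A itself
print(frozenset({4, 6}) in A)    # 4 and 6 are members, but the set {4,6} is not
```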

Subsets and Supersets


A ⊆ B means that A is a subset of B (and that B is a superset of A). I.e., every
member of A is also a member of B.
Carefully distinguish between subset-of and member-of!
The symbol ⊂ means the same as ⊆ here:
⊂ does NOT mean that there cannot be equality.
Examples:
{4,5}
{5} ⊆ {4,5,6}, {6,4} ⊆ {4,5,6,7}, {6,4,7,5} ⊆ {4,5,6,7}
{n | n is an EVEN whole number} ⊆ {n | n is a whole number}

Subsets and Supersets


∅ ⊆ A for any set A.
A ⊆ A for any set A. (Reflexivity)
If A ⊆ B and B ⊆ A then A = B. (Antisymmetry)
If A ⊆ B and B ⊆ C then A ⊆ C. (Transitivity)

Some Operations on Sets


Union of sets A and B:
A ∪ B = the set of things that are in A or B (or both).
NB: no repetitions created.
Intersection of sets A and B:
A ∩ B = the set of things that are in both A and B.
Difference of sets A and B:
A − B = the set of things that are in A but not B.
Note: also notated by a backslash instead of a minus sign.
The minus sign is also more standardly used as in A − a to mean remove
member a from A (if it's a member of A at all).

Some Properties of those Operations


Union and intersection are commutative (can switch):
A ∪ B = B ∪ A
A ∩ B = B ∩ A
Union and intersection are associative
(can group differently):
A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
Because of associativity, we can omit parentheses for union-only or intersection-only
cases:
A ∪ B ∪ C ∪ D
A ∩ B ∩ C ∩ D

Bad Associations
Caution: if an operation is not associative, the position of
parentheses is normally important.
In arithmetic, division is non-associative.
(x/y)/z is usually a different value from x/(y/z).

Two Other Properties


Union distributes over intersection:
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

Intersection distributes over union:
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
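
As a quick sanity check, the laws above can be spot-checked on small example sets in Python. The three sets are made up for illustration, and checking one example is of course not a proof.

```python
# Example sets chosen only for illustration.
A, B, C = {1, 2, 3}, {3, 4}, {2, 4, 5}

# | is union, & is intersection, - is difference.
commutative = (A | B == B | A) and (A & B == B & A)
associative = (A | (B | C) == (A | B) | C) and (A & (B & C) == (A & B) & C)
distributive = (A | (B & C) == (A | B) & (A | C)) and (A & (B | C) == (A & B) | (A & C))

# Difference, by contrast, is not commutative on these sets.
difference_differs = (A - B) != (B - A)

print(commutative, associative, distributive, difference_differs)
```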

Tuples in a Table

People

PERS-ID    NAME      AGE
9568876A   Chopples  37
2544799Z   Blurp     NULL
1698674F   Rumpel    88

The tuples are just lists representing the rows:

<9568876A, Chopples, 37>
<2544799Z, Blurp, NULL>
<1698674F, Rumpel, 88>

Table Rows are Tuples


In a table, each attribute has a value domain: the set of values
that the attribute can have. E.g., the set of integers, the set of all
character strings of any length, or the set of character strings of a
specific format and length.
If the attribute allows NULL values, we include NULL in the value
domain as well.
The values in a row form a tuple of values from the respective value
domains. Just a list of the values, one for each attribute.

Tuples in General
A tuple in general is an ordered sequence of items of any sort. We will only deal with
finite tuples. Items CAN be duplicated.
Can also be called a vector in other CS terminology.

Notation by angle brackets and commas:
<6, JAB, 5, JAB, 5, , 9>
Singleton and empty tuples: <6>, <>
The concatenation of two tuples is just the result of putting them end to end to get
one tuple.
<6, JAB, 5> concatenated with <5, 6> = <6, JAB, 5, 5, 6>
<6, JAB, 5> concatenated with <> = <6, JAB, 5>

Cartesian Products
The set of all possible tuples formed from some list of sets is called the Cartesian
product of the sets.
Notation, e.g.:
D × E × F × G × H
if D, E, F, G, H are the sets (not necessarily different).
The tuples are all possible tuples of the form
<d, e, ..., h>
where
d ∈ D, e ∈ E, ..., h ∈ H

Examples
Let A = {3, 8, 2} and B = {jjj, bb}.
Then A × B =
{ <3, jjj>, <3, bb>, <8, jjj>, <8, bb>, <2, jjj>, <2, bb> }.
B × B = { <jjj, jjj>, <jjj, bb>, <bb, jjj>, <bb, bb> }.
A × ∅ = ∅ = ∅ × A
A × {TRUE} = { <3, TRUE>, <8, TRUE>, <2, TRUE> }

Relations
Any subset at all of a Cartesian product is called a relation on the sets in question
(D, E, ... above),
even the whole of the product (even if infinite)
and even the empty set.

I.e., a relation on D, E, ..., H is just some set of tuples that are each of form <d, e,
..., h> where d ∈ D, e ∈ E, ..., h ∈ H.

Examples
Let A = {3, 8, 2} and B = {jjj, bb}.
The Cartesian product A × B =
{ <3, jjj>, <3, bb>, <8, jjj>, <8, bb>, <2, jjj>, <2, bb> }.

Some relations on A and B:
{ <3, jjj>, <3, bb>, <2, jjj> }
{ <2, bb> }
A × B
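
A minimal Python sketch of these examples, assuming Python tuples stand in for the angle-bracket tuples and `itertools.product` builds the Cartesian product:

```python
from itertools import product

A = {3, 8, 2}
B = {"jjj", "bb"}

# The Cartesian product A × B as a set of 2-tuples.
AxB = set(product(A, B))

# A relation on A and B is any subset of A × B.
relation_1 = {(3, "jjj"), (3, "bb"), (2, "jjj")}
relation_2 = {(2, "bb")}

print(len(AxB))                 # |A| * |B| pairs
print(relation_1 <= AxB)        # subset test
print(set(product(A, set())))   # product with the empty set is empty
```

The whole product and the empty set pass the same subset test, so both count as relations on A and B.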

Rows as forming a Relation


So, for a given table, the tuples corresponding to all possible rows
that you could create using whatever values you like from the value
domains form the Cartesian product of the value domains of the
table.

And, provided the table does not have repeated rows:
AT ANY MOMENT the actual set of rows, considered as tuples, is a
relation on the table's value domains.
NB: crucial here that no row is exactly repeated, because a mathematical set
cannot have repeated elements.

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 9 (Wednesday)
Functional Dependencies and Normalization for Relational Databases
(part 3)
Shereen Fouad
Teaching Fellow
School of Computer Science
University of Birmingham, UK

Announcements
Assignment 9 (assessed; accounts for 10% of the module mark) will be
available on Canvas today.
No lab session this week because you will be starting the conceptual
phase of assignment 9. However, you are very welcome to attend
the lab, but no demonstrators will be available.

Syllabus for ICY Databases students will be finished today


The Master's students (but NOT the ICY students) still have the
following Learning Outcome (LO 5):
Apply relational algebra and the mathematical theory of relations
to describe databases, queries, and consistency conditions.
However, ICY students will be expected to come to lectures in full, as I
may make occasional additional comments that are not on LO5.

Reminder of Previous Lecture

Overview
Fourth Normal Form (4NF)
Normalization and Database Design
Denormalization
Summary
Assignment 9 specifications.

Reminder of Previous Lecture

Fourth Normal Form (4NF)


About a different sort of issue from 2NF/3NF/BCNF.
Those NFs are concerned with the redundancy from functional
dependencies (FDs).
4NF is concerned with redundancy resulting from multivalued
dependencies (MVDs).

Fourth Normal Form (4NF)


A relation is in 4NF if it is in BCNF and either:
there is no multivalued dependency in the relation, or
there are multivalued dependencies, but the attributes that are multivalued
dependent on a specific attribute are dependent on each other.

What is a multivalued dependency (MVD)?

Definition of MVD
A multivalued dependency of some attribute X on an attribute-set D
is like a functional dependency except that X is allowed to have
several values for a given value of D.
The crucial point is that once the D value is specified, the X values are
independent of other attributes.
So, we can think of X as a multivalued attribute implemented by
putting different values in different rows, where the set of X values is
fully determined by just the value of D.

Not 4 NF Example
Assume the following relation with multivalued dependency:
Employee (Eid:pk1, Languages:pk2, Skills:pk3)
Recall that a relation is in BCNF if all its determinants are candidate
keys.
Relation Employee has only one determinant, (Eid, Language,
Skill), which is the composite primary key.
Since the primary key is a candidate key, Employee is in BCNF.
However, this relation has two MVDs:
Eid --->> Languages
Eid --->> Skills
Languages and Skills are independent.

Not 4 NF Example (conti...)


Eid   Language   Skill
100   English    Teaching
100   Kurdish    Politics
100   English    Politics
100   Kurdish    Teaching
200   Arabic     Singing

Insertion anomaly: to insert the row (200, English, Cooking) we also have to insert
two extra rows, (200, Arabic, Cooking) and (200, English, Singing); otherwise
the database will be inconsistent.

Not 4 NF Example (conti...)


Here is the table after the insertion:

Eid   Language   Skill
100   English    Teaching
100   Kurdish    Politics
100   English    Politics
100   Kurdish    Teaching
200   Arabic     Singing
200   English    Cooking
200   Arabic     Cooking
200   English    Singing

Change to 4NF
By placing the multivalued attributes in tables by themselves we can
convert the table to the following two tables:

Table 1 (Eid --->> Languages):   Eid | Language
Table 2 (Eid --->> Skills):      Eid | Skill
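
The decomposition can be sketched in Python, assuming each table is a set of tuples. The helper names `project` and `join_on_eid` are made up for this illustration. The sketch splits the five-row Employee table into the two smaller tables, checks that joining them on Eid gives back the original rows, and shows that one new language and one new skill for employee 200 now need only one inserted row per table.

```python
# Employee(Eid, Language, Skill) rows from the "Not 4NF" example.
employee = {
    (100, "English", "Teaching"), (100, "Kurdish", "Politics"),
    (100, "English", "Politics"), (100, "Kurdish", "Teaching"),
    (200, "Arabic", "Singing"),
}

def project(relation, positions):
    """Keep only the columns at the given positions (duplicates vanish in a set)."""
    return {tuple(row[i] for i in positions) for row in relation}

def join_on_eid(emp_lang, emp_skill):
    """Pair every language row with every skill row that has the same Eid."""
    return {(e, lang, skill)
            for e, lang in emp_lang
            for e2, skill in emp_skill
            if e == e2}

emp_lang = project(employee, (0, 1))    # (Eid, Language)
emp_skill = project(employee, (0, 2))   # (Eid, Skill)
lossless = join_on_eid(emp_lang, emp_skill) == employee

# One insert per table replaces the three inserts needed in the single table.
emp_lang.add((200, "English"))
emp_skill.add((200, "Cooking"))
after_insert = join_on_eid(emp_lang, emp_skill)
print(lossless, len(after_insert))
```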

4 NF Example
Assume the following relation:
Employee (Eid:pk1, Language:pk2, Skill:pk3)

Eid   Language   Skill
100   English    Teaching
100   Kurdish    Politics
100   French     Cooking
200   English    Cooking
200   Arabic     Singing

4 NF Example (conti...)
Assume the following relation with multi-value dependency:
Employee (Eid:pk1, Languages:pk2, Skills:pk3)

Eid --->> Languages

Eid --->> Skills

Languages and Skills are dependent.


This says an employee speaks several languages and has several
skills. However for each skill, a specific language is used when that
skill is practiced.

4 NF Example (conti...)
Thus when employee 100 teaches, she uses English; but when she cooks, she
uses French. This relation is in fourth normal form.

Eid   Language   Skill
100   English    Teaching
100   Kurdish    Politics
100   French     Cooking
200   English    Cooking
200   Arabic     Singing

Normal Forms Overall


Normalization helps eliminate data redundancies and some other aspects
of poor structure.
Normalization focuses on problems in individual entity types.
Make sure that proposed entities meet required normal form before table
structures are created
Difficult to separate normalization from overall ER modelling process.
Normalization cannot, by itself, guarantee good designs.
Non-normalized entity types may be desirable in some cases, to increase
processing speed and/or reduce conceptual complexity of operations.

Normal Forms Overall


Let "<" mean "provides less protection than". Then:

1NF < 2NF < 3NF < BCNF (and 3NF < 4NF)
(Also BCNF < 4NF under the second definition of 4NF.)
BCNF and 4NF guard against relatively unusual situations. BCNF is
more disruptive to achieve than 2NF or 3NF.
3NF is a reasonable target, but BCNF, 4NF etc. may also need to be
considered.

Non-Normalization/Denormalization
If tables are decomposed to conform to normalization requirements:
The number of database tables expands.
Joining a larger number of tables takes additional disk input/output
(I/O) operations, additional manipulation complexity, and possibly
substantial communication delays.
Meeting processing requirements should also be a goal.
Conflicts among design principles, information requirements, and
processing speed are often resolved through compromises that may
include ending up with some non-normalized tables.

Summary
Normalization is used to minimize data redundancies
First three normal forms (1NF, 2NF, and 3NF) are most commonly
encountered
Table is in 1NF when it doesn't contain a repeated group attribute. All
key attributes are defined
Table is in 2NF when it is in 1NF and contains no partial dependencies
Table is in 3NF when it is in 2NF and contains no transitive
dependencies
Table that is not in 3NF may be split into new tables until all of the
tables meet 3NF requirements

Summary (continued)
Normalization is an important part, but only part, of the design
process
Table in 3NF may contain multivalued dependencies
Numerous null values or redundant data

Convert 3NF table to 4NF by:
Splitting table to remove multivalued dependencies

Tables are sometimes denormalized to yield less I/O, which increases
processing speed

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 10 (Wednesday)
Relational Algebra (part 1)
Shereen Fouad
Teaching Fellow
School of Computer Science
University of Birmingham, UK

Relational Query Languages


Languages for describing queries on a relational database
Structured Query Language (SQL)
Predominant application-level query language
Declarative

Relational Algebra
Intermediate language used within DBMS
The basic set of operations for the relational model
Procedural

Relational Algebra
A formal language (based on operators and a domain of values) that
aims to perform queries in relational databases
It is often considered to be an integral part of the relational data
model.
Why is it important??
It provides a formal foundation for relational model operations.
It is used as a basis for implementing and optimizing queries in the
query processing and optimization modules that are integral parts of
relational database management systems (RDBMSs).

Relational Algebra in a DBMS


[Figure: query processing pipeline inside the DBMS]
SQL query → parser → relational algebra expression → query optimizer →
optimized relational algebra expression → query execution plan →
code generator → executable code

Relational Algebra Operations


Set operations from mathematical set theory (each relation is considered
as a set of tuples)
Set-difference ( − ) Tuples in r1, but not in r2.
Union ( ∪ ) Tuples in r1 or in r2.
Intersection ( ∩ ) Tuples in r1 and in r2.
Cross-product ( × ) Allows us to combine two relations.
Operations developed specifically for relational databases
Selection ( σ ) Selects a subset of rows from relation (horizontal).
Projection ( π ) Retains only wanted columns from relation (vertical).
Join ( ⋈ ) Joining two relations.
Use of relational algebra operators on existing tables produces new tables

Select Operator
Produce table containing subset of rows of argument table satisfying condition

Select Operator
SQL:
SELECT * FROM <table> WHERE <condition>
Note: it's the WHERE part that is actually doing the selection
according to a criterion.

Relational algebra notation
σ_{condition}(relation)
More compact than SQL notation. Avoids notation private to
particular versions of particular programming languages.

Select Operator
SQL:

SELECT *
FROM Person
WHERE Hobby='stamps'
Relational Algebra: σ_{Hobby='stamps'}(Person)

Person
Id     Name   Address     Hobby
1123   John   123 Main    stamps
1123   John   123 Main    coins
5556   Mary   7 Lake Dr   hiking
9876   Bart   5 Pine St   stamps

Result
Id     Name   Address     Hobby
1123   John   123 Main    stamps
9876   Bart   5 Pine St   stamps

Selection Condition - Examples


σ_{Id>3000 OR Hobby='hiking'}(Person)
σ_{Id>3000 AND Id<3999}(Person)
σ_{NOT(Hobby='hiking')}(Person)
σ_{Hobby≠'hiking'}(Person)
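
To make the selection operator concrete, here is a minimal Python sketch. It assumes a relation is held as a list of dicts (one dict per row) and that the condition is passed in as a function; `select` is a made-up helper name, not part of any DBMS.

```python
# Person table from the slides, one dict per row.
PERSON = [
    {"Id": 1123, "Name": "John", "Address": "123 Main", "Hobby": "stamps"},
    {"Id": 1123, "Name": "John", "Address": "123 Main", "Hobby": "coins"},
    {"Id": 5556, "Name": "Mary", "Address": "7 Lake Dr", "Hobby": "hiking"},
    {"Id": 9876, "Name": "Bart", "Address": "5 Pine St", "Hobby": "stamps"},
]

def select(relation, condition):
    """Selection: keep only the rows for which condition(row) is true."""
    return [row for row in relation if condition(row)]

stamps = select(PERSON, lambda r: r["Hobby"] == "stamps")
big_or_hiking = select(PERSON, lambda r: r["Id"] > 3000 or r["Hobby"] == "hiking")
in_range = select(PERSON, lambda r: r["Id"] > 3000 and r["Id"] < 3999)
not_hiking = select(PERSON, lambda r: not (r["Hobby"] == "hiking"))

print([r["Id"] for r in stamps])
```

Selection never changes the columns: every result row has the same attributes as the input rows.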

Project Operator
Produces table containing subset of columns of argument table

Project Operator
SQL:
SELECT <column specs> FROM <table>

Relational algebra notation
π_{attribute list}(relation)
Retains only attributes that are in the projection list.
Schema of result:
exactly the fields in the projection list, with the same names that they had in the input relation.

Projection operator has to eliminate duplicates
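
A minimal Python sketch of projection with duplicate elimination, again assuming a relation is a list of dicts; `project` is a made-up helper name. Note that a plain SQL SELECT keeps duplicate rows unless DISTINCT is written, whereas the algebra's projection always removes them.

```python
# Person table from the slides, one dict per row.
PERSON = [
    {"Id": 1123, "Name": "John", "Address": "123 Main", "Hobby": "stamps"},
    {"Id": 1123, "Name": "John", "Address": "123 Main", "Hobby": "coins"},
    {"Id": 5556, "Name": "Mary", "Address": "7 Lake Dr", "Hobby": "hiking"},
    {"Id": 9876, "Name": "Bart", "Address": "5 Pine St", "Hobby": "stamps"},
]

def project(relation, attributes):
    """Projection: keep only the listed attributes and drop duplicate rows."""
    # dict.fromkeys keeps the first occurrence of each tuple, in order.
    unique = dict.fromkeys(tuple(row[a] for a in attributes) for row in relation)
    return [dict(zip(attributes, values)) for values in unique]

name_hobby = project(PERSON, ["Name", "Hobby"])   # no duplicates to remove here
names = project(PERSON, ["Name"])                 # the two John rows collapse into one

print(len(name_hobby), [r["Name"] for r in names])
```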

Project Operator
SQL:

SELECT name, hobby
FROM Person
Relational Algebra: π_{Name, Hobby}(Person)

Person
Id     Name   Address     Hobby
1123   John   123 Main    stamps
1123   John   123 Main    coins
5556   Mary   7 Lake Dr   hiking
9876   Bart   5 Pine St   stamps

Result
Name   Hobby
John   stamps
John   coins
Mary   hiking
Bart   stamps

Expressions
π_{Id, Name}(σ_{Hobby='stamps' OR Hobby='coins'}(Person))

Person
Id     Name   Address     Hobby
1123   John   123 Main    stamps
1123   John   123 Main    coins
5556   Mary   7 Lake Dr   hiking
9876   Bart   5 Pine St   stamps

Result
Id     Name
1123   John
9876   Bart

Relational Set Operations


Union of relations R and S:
R ∪ S = the set of tuples that are in R or S (or both).
NB: no repetitions created!
Intersection of relations R and S:
R ∩ S = the set of tuples that are in both R and S.
Difference of relations R and S:
R − S = the set of tuples that are in R but not S.

Union-compatible relations
The result of combining two relations R and S with a set operator
is a relation => all its elements must be tuples having the same
structure.
Hence, the scope of set operations is limited to union-compatible
relations.
Two relations A and B are union-compatible if they have the
same number of columns and corresponding
columns have the same domains.

Union
Let A and B be two union-compatible relations.
The result of A ∪ B contains all rows in A and all rows in B, with duplicate rows eliminated

Which of these are union-compatible?
[Figure: examples labelled (A), (B) and (C)]

Example: retrieve the Social Security numbers of all employees who either work in
department 5 or directly supervise an employee who works in department 5,
using the UNION operation.
[Figure: solution using UNION, and an alternative solution. Note the renaming of the result set.]

Difference
Let R and S be two union-compatible relations.
Then their difference R - S is a relation which contains tuples which are in R but
not in S

Intersect
Let R and S be two union-compatible relations.
Then their intersection R ∩ S is a relation which contains tuples which are both in R and S
Note that INTERSECTION can be expressed in terms of union and set difference as follows:
R ∩ S = ((R ∪ S) − (R − S)) − (S − R)
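
The identity above can be checked on a small example in Python, assuming two union-compatible relations held as sets of tuples. The rows are made up for illustration.

```python
# Two union-compatible relations: same number of columns, same domains.
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (3, "c"), (4, "d")}

union = R | S          # tuples in R or S (or both), no repetitions
difference = R - S     # tuples in R but not in S
intersection = R & S   # tuples in both R and S

# Intersection rebuilt from union and difference only.
via_identity = ((R | S) - (R - S)) - (S - R)

print(sorted(union), sorted(difference), sorted(via_identity))
```

Step by step: (R ∪ S) − (R − S) leaves exactly the tuples of S, and removing S − R from that leaves the tuples that are also in R.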

Cross-Join or Product
SQL:
SELECT * FROM two [or more] tables
NB: it's the mere listing of the tables that does the Product, but it's possible
also to write:
SELECT * FROM T1 CROSS JOIN T2 CROSS JOIN ...
Relational algebra notation:
Result table is T1 × T2 where T1 and T2 are the given tables.
Each row of T1 is paired with each row of T2.
Yields a table containing all concatenations of whole rows from first given
table with whole rows from second given table.

Cross-Join or Product
If second table also had a PRICE attribute, then the product would have a Table1.PRICE
attr. and a Table2.PRICE attr.

Note that the two tables need not be union compatible

JOIN operation
Join (various types)
Allows us to join related rows from two or more tables
It's an important feature of the relational database idea

Joining has been implicitly important because of the use of multi-table
queries and the use of WHERE to test for attribute equality between
tables.
Denoted by ⋈

Condition-Join
R ⋈_c S = σ_c(R × S)
where R and S are relations and c is the condition applied.

The JOIN operation can be specified as a PRODUCT operation followed by a
SELECT operation.
Fewer tuples than PRODUCT.
Filters tuples not satisfying the join condition.
Sometimes called a theta-join.
A general join condition is of the form
<condition> AND <condition> AND...AND <condition>
Tuples whose join attributes are NULL or for which the join condition is
FALSE do not appear in the result.

Ramez A. Elmasri, Shankrant B. Navathe. 1999. Fundamentals of Database Systems (3rd ed.). Carter
Shanklin (Ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.


Fundamentals/ICY: (06-21923)/(06-21980) Databases


2014/15
Week 10 (Friday)
Relational Algebra (part 2)
Shereen Fouad
Teaching Fellow
School of Computer Science
University of Birmingham, UK

Announcements
Lab session next week to catch up with any of the exercises that you have
missed or to complete the practical requirements of assignment 9.
Next Wednesday's lecture (week 11) will be in LT2, Gisbett Kapp (G 8)
from 12:00 pm to 1:00 pm (last lecture)
No lecture on Friday (week 11).
Normalization hand-outs /exercise has been released on canvas.
Relational Algebra hand-outs /exercise has been released on canvas.
Module evaluation Forms are today (last 10 minutes of the lecture)

Reminder of Previous Lecture


Cross-Join or Product
Perform a cross join that yields specified attributes

Note that the two tables need not be union compatible

Overview
Review on Condition Join
Equijoin Join
Natural Join
Outer Joins
Left
Right
Full

Condition-Join
Relation 1 ⋈_{condition} Relation 2

The JOIN operation can be specified as a PRODUCT operation
followed by a SELECT operation.
Fewer tuples than PRODUCT.
Sometimes called a theta-join.

Condition-Join Example
Retrieve the department names of employees who earn more than
40000 pounds
π_{DName}(EMPLOYEE ⋈_{Dno=Dnumber AND Salary>40000} DEPARTMENT)
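
The example above can be sketched in Python as a product followed by a selection and a projection. The rows of EMPLOYEE and DEPARTMENT below are invented for illustration, and `theta_join` is a made-up helper name. Merging the two row dicts assumes that the relations share no attribute names.

```python
from itertools import product

# Hypothetical sample data; only the attribute names follow the example.
EMPLOYEE = [
    {"Ename": "Ann", "Salary": 45000, "Dno": 5},
    {"Ename": "Bob", "Salary": 30000, "Dno": 5},
    {"Ename": "Cara", "Salary": 52000, "Dno": 4},
    {"Ename": "Dan", "Salary": 20000, "Dno": 1},
]
DEPARTMENT = [
    {"DName": "Research", "Dnumber": 5},
    {"DName": "Administration", "Dnumber": 4},
    {"DName": "Headquarters", "Dnumber": 1},
]

def theta_join(r, s, condition):
    """Condition join: PRODUCT of r and s, then SELECT the pairs satisfying condition."""
    return [{**a, **b} for a, b in product(r, s) if condition(a, b)]

joined = theta_join(
    EMPLOYEE, DEPARTMENT,
    lambda e, d: e["Dno"] == d["Dnumber"] and e["Salary"] > 40000,
)
# Projection on DName, with duplicates removed.
dnames = sorted({row["DName"] for row in joined})
print(dnames)
```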

Equijoin Join
The most common use of JOIN involves join conditions with equality
comparisons only.
A JOIN in which the only comparison operator used is = is called an EQUIJOIN.

So a condition join is just like an equijoin, but it may also use non-equality
comparisons in the join condition.

Equijoin Join -Example


Retrieve the name of the manager of each department.

Note that, in the result of an EQUIJOIN we always have one or more pairs of
attributes that have identical values in every tuple.
Mgr_ssn and Ssn are identical in every tuple of DEPT_MGR (the EQUIJOIN
result)

Natural Join
Because one of each pair of attributes with identical values is
superfluous, a new operation called NATURAL JOIN was created to get
rid of the second (superfluous) attribute in an EQUIJOIN condition.
NATURAL JOIN is basically an EQUIJOIN followed by the removal of
the duplicate columns from the result
Used when tables share one or more common attributes with same
names.
SQL Syntax:
SELECT column-list FROM table1 NATURAL JOIN table2

Two Tables That Will Be Used
to Illustrate the Execution of a Natural Join
(CUSTOMER ⋈ AGENT)

The common attributes or columns are called the join attributes

Step 1: PRODUCT

Note the two AGENT_CODE columns

Step 2: SELECT
to get equal agent codes in each row

SELECT is performed on the resulting table to yield only the rows for which the
join-attribute values (e.g. AGENT_CODE values) are equal

Step 3: PROJECT
to get just one agent column

PROJECT is now performed to yield a single copy of each join attribute,
thereby eliminating duplicate columns

What if the two tables have no attributes in common?


So in this case the result is the PRODUCT (CROSS JOIN) of the two
tables!!
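
The three steps (PRODUCT, SELECT, PROJECT) can be sketched in Python, assuming relations are non-empty lists of dicts. The CUSTOMER and AGENT rows are invented for illustration, and `natural_join` is a made-up helper name.

```python
from itertools import product

# Hypothetical sample rows; AGENT_CODE is the join attribute.
CUSTOMER = [
    {"CUS_CODE": 1, "CUS_LNAME": "Ramas", "AGENT_CODE": 502},
    {"CUS_CODE": 2, "CUS_LNAME": "Dunne", "AGENT_CODE": 501},
    {"CUS_CODE": 3, "CUS_LNAME": "Smith", "AGENT_CODE": 421},
]
AGENT = [
    {"AGENT_CODE": 501, "AGENT_PHONE": "555-0101"},
    {"AGENT_CODE": 502, "AGENT_PHONE": "555-0102"},
    {"AGENT_CODE": 333, "AGENT_PHONE": "555-0103"},
]

def natural_join(r, s):
    """Natural join of two non-empty relations given as lists of dicts."""
    if not r or not s:
        return []
    common = [attr for attr in r[0] if attr in s[0]]     # the join attributes
    result = []
    for a, b in product(r, s):                           # step 1: PRODUCT
        if all(a[attr] == b[attr] for attr in common):   # step 2: SELECT equal join values
            result.append({**a, **b})                    # step 3: one copy of each join attribute
    return result

joined = natural_join(CUSTOMER, AGENT)
no_common = natural_join(
    [{"X": 1}, {"X": 2}],
    [{"Y": "a"}, {"Y": "b"}, {"Y": "c"}],
)
print(len(joined), len(no_common))
```

With no attributes in common the equality test is vacuously true for every pair, so the function returns the full product, matching the case described above.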

My Plan
Get the John Smith information
Get project numbers he works on
Get employee names working on these projects (including John
Smith)
Exclude John Smith from the final result

Note that ∧ denotes an AND operator

Last week answers

Note that ∧ denotes an AND operator

References
Source of images in this lecture:
Database Principles: Fundamentals of Design, Implementation and
Management, by Stephen Morris, Peter Rob, Carlos Coronel
Fundamentals of Database Systems by Ramez Elmasri, Shamkant B. Navathe

Fundamentals/ICY: (06-21923)/(06-21980)
Databases
2014/15
Week 11 (Wednesday)
Relational Algebra (part 3)
Shereen Fouad
Teaching Fellow
School of Computer Science
University of Birmingham, UK

Announcements
Lab session tomorrow to catch up with any of the exercise that you have
missed or to complete the practical requirements of assignment 9.
No lecture on Friday (week 11).
Today is the Last Lecture!!!

Reminder of Previous Lecture


Condition Join
Equijoin Join
Natural Join

Overview
Outer Joins
Left
Right
Full

Division (optional, not included in the exam)

Outer Joins
Developed for the case where the user wants to keep all the tuples in
Relation 1, or all those in Relation 2, or all those in both relations in
the result of the JOIN, regardless of whether or not they have
matching tuples in the other relation.
So.
Returns rows matching the join condition
Also returns rows with unmatched attribute values for tables to be
joined
Three types (Left, Right and Full)
Left and right designate order in which tables are processed

Outer Joins (continued)


Left outer join
Returns rows matching the join condition
Returns rows in left side table with unmatched values
Right outer join
Returns rows matching join condition
Returns rows in right side table with unmatched values
SQL Syntax:
SELECT column-list
FROM table1 LEFT [OUTER] JOIN table2 ON join-condition

Outer Joins (continued)


Full outer join
Returns rows matching join condition
Returns all rows with unmatched values in either side table
Syntax:
SELECT column-list
FROM table1 FULL [OUTER] JOIN table2
ON join-condition

Outer Join - Example

Left Outer Join

(CUSTOMER ⟕ AGENT)

Left Outer join of CUSTOMER and AGENT, using equal AGENT_CODE

Uses all the rows in the CUSTOMER table, by doing equijoin on AGENT_CODE
but also including NON-matching CUSTOMER rows.

Right Outer Join

(CUSTOMER ⟖ AGENT)

Right Outer join of CUSTOMER and AGENT, using equal AGENT_CODE

Uses all the rows in the AGENT table, doing equijoin on AGENT_CODE but also
including NON-matching AGENT rows.

Full Outer Join (CUSTOMER ⟗ AGENT)

Would have the extra row of this table as well as the extra row of
the Left Outer Join table

Using all the rows in the AGENT and CUSTOMER tables, doing equijoin on
AGENT_CODE but also including NON-matching rows from each table.
= Union of Left Outer Join result and Right Outer Join result.
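
A Python sketch of the three outer joins, assuming relations are lists of dicts, an equijoin on a single key attribute, and no NULL values in the key column. The rows are invented for illustration, the helper names are made up, and `None` plays the role of NULL.

```python
CUSTOMER_ATTRS = ["CUS_CODE", "CUS_LNAME", "AGENT_CODE"]
AGENT_ATTRS = ["AGENT_CODE", "AGENT_PHONE"]

# Hypothetical sample rows.
CUSTOMER = [
    {"CUS_CODE": 1, "CUS_LNAME": "Ramas", "AGENT_CODE": 502},
    {"CUS_CODE": 2, "CUS_LNAME": "Dunne", "AGENT_CODE": 501},
    {"CUS_CODE": 3, "CUS_LNAME": "Smith", "AGENT_CODE": 421},
]
AGENT = [
    {"AGENT_CODE": 501, "AGENT_PHONE": "555-0101"},
    {"AGENT_CODE": 502, "AGENT_PHONE": "555-0102"},
    {"AGENT_CODE": 333, "AGENT_PHONE": "555-0103"},
]

def left_outer_join(r, s, key, s_attrs):
    """Equijoin on key, keeping unmatched rows of r padded with None (NULL)."""
    padding = {attr: None for attr in s_attrs if attr != key}
    result = []
    for a in r:
        matches = [b for b in s if b[key] == a[key]]
        if matches:
            result.extend({**a, **b} for b in matches)
        else:
            result.append({**a, **padding})
    return result

def right_outer_join(r, s, key, r_attrs):
    """A right outer join is a left outer join with the tables swapped."""
    return left_outer_join(s, r, key, r_attrs)

def full_outer_join(r, s, key, r_attrs, s_attrs):
    """Union of the left and right outer joins, with duplicate rows removed."""
    rows = left_outer_join(r, s, key, s_attrs) + right_outer_join(r, s, key, r_attrs)
    unique = {frozenset(row.items()): row for row in rows}
    return list(unique.values())

left = left_outer_join(CUSTOMER, AGENT, "AGENT_CODE", AGENT_ATTRS)
right = right_outer_join(CUSTOMER, AGENT, "AGENT_CODE", CUSTOMER_ATTRS)
full = full_outer_join(CUSTOMER, AGENT, "AGENT_CODE", CUSTOMER_ATTRS, AGENT_ATTRS)
print(len(left), len(right), len(full))
```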

Example: list all employee names as well as the name of the departments they manage if
they happen to manage a department; if they do not manage one, we can indicate it
with a NULL value


Following stuff on DIVIDE is optional

Division
Goal: Produce the tuples in one relation, r, that match all tuples in
another relation, s
r (A1, ..., An, B1, ..., Bm)
s (B1, ..., Bm)
r/s, with attributes A1, ..., An, is the set of all tuples <a> such that for every
tuple <b> in s, <a,b> is in r

Can be expressed in terms of projection, set difference, and cross-product

DIVIDE operation on DB tables


Simplest case: 2-col table by 1-col table (T/S)
[Figure: tables T and S and the resulting table Q]
The only value of LOC that is associated in T with both values A and B of
CODE is 5.

Division - Example
Student_Records (StudId, CrsCode, Semester, Grade)
Teaching (ProfId, CrsCode, Semester)
List the Ids of students who have passed all courses that were taught
in summer 2013
Numerator: StudId and CrsCode for every course passed by every
student
π_{StudId, CrsCode}(σ_{Grade≠'F'}(Student_Records))

Denominator: CrsCode of all courses taught in summer 2013
π_{CrsCode}(σ_{Semester='S2013'}(Teaching))

Result is numerator/denominator
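
A Python sketch of division for the simplest case, assuming the numerator is a set of (StudId, CrsCode) pairs and the denominator is a set of CrsCode values. The student and course identifiers are invented, and both helper names are made up. The second function follows the projection / cross-product / set-difference formulation.

```python
# Hypothetical numerator: (StudId, CrsCode) for every course passed.
passed = {
    ("s1", "CS101"), ("s1", "CS102"),
    ("s2", "CS101"),
    ("s3", "CS101"), ("s3", "CS102"), ("s3", "CS103"),
}
# Hypothetical denominator: courses taught in summer 2013.
summer_courses = {"CS101", "CS102"}

def divide(r, s):
    """r / s: the a-values paired in r with every b-value of s."""
    candidates = {a for a, _ in r}
    return {a for a in candidates if all((a, b) in r for b in s)}

def divide_algebraic(r, s):
    """The same result using only projection, cross-product and set difference."""
    candidates = {a for a, _ in r}                        # projection of r on A
    all_pairs = {(a, b) for a in candidates for b in s}   # candidates × s
    missing = all_pairs - r                               # pairs that r lacks
    disqualified = {a for a, _ in missing}                # projection of missing on A
    return candidates - disqualified

print(sorted(divide(passed, summer_courses)))
```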

References
Source of images in this lecture:
Database Principles: Fundamentals of Design, Implementation and
Management, by Stephen Morris, Peter Rob, Carlos Coronel
Fundamentals of Database Systems by Ramez Elmasri, Shamkant B. Navathe

THANK YOU
GOOD LUCK
