
FBE Computer Science Department Lecture Notes Theory of Databases

ICT 252: Theory of Databases


Lecture Notes
Contents
1 Introduction to Databases
1.1 Course Overview
1.2 Introduction
1.2.1 What is a database?
1.2.2 What is a DBMS?
1.2.3 Why use databases
2 Database Models
2.1 File-Processing Systems
2.2 Network Navigation Systems
2.3 Relational Model
2.4 Object-oriented Model
2.5 Deductive Model
2.6 Summary
3 Components of a Database Application
3.1 Data Independence
3.2 Different Types of User
3.3 Three-Tier Application Architecture
3.4 Features of a DBMS
3.5 Types of DBMS
4 The Relational Model
4.1 Tables & Relations
4.2 Domains & Attributes
4.3
4.3.1 More about attributes
4.4 Schemas
4.5 Relationships
4.6 Keys
4.6.1 Superkeys
4.6.2 Candidate Keys
4.6.3 Primary key
4.7 Constraints
4.7.1 Functional Dependencies
4.7.2 Entity Integrity
4.7.3 Referential Integrity
4.7.4 Triggers
5 Relational Algebra
5.1 Basic Operations
5.1.1 Select
5.1.2 Project
5.1.3 Union
5.1.4 Set Difference
5.1.5 Cartesian Product
5.1.6 Rename
5.2 Additional Operations
5.2.1 Set Intersection
5.2.2 Natural Join
5.2.3 Division
5.2.4 Assignment
5.3 Extended Relational-Algebra Operations
5.3.1 Generalised Projection
5.3.2 Aggregate Functions
5.3.3 Outer Join
5.4 Database Modifications
5.4.1 Deletion
5.4.2 Insertion
5.4.3 Update
6 Entity-Relationship Modelling
6.1 Introduction
6.2 Overview: Entities, Attributes, Relationships
6.3 Identify Entity Types
6.4 Identify Attributes of the Entities
6.5 Select Identifiers for Entity Types
6.6 Identify Relationships Between Entities
6.6.1 Relationship Degree
6.6.2 Relationship Cardinality
6.7 Weak Entities
6.8 Associative Entities
6.9 Generalisation Hierarchies
6.10 Approach to E-R Diagrams
6.11 Convert an E-R Diagram to a Relational Database Design
6.12 Normalisation of the Data Model
6.13 Physical Database Design
6.14 UML
7 Physical Database Design
7.1 Overview: Physical DB Design
7.2 Choosing Data Types
7.2.1 Calculated Fields
7.2.2 Coding/Compression
7.3 Controlling Data Integrity
7.3.1 Default values
7.3.2 Range Control
7.3.3 Referential Integrity
7.3.4 Null Value Control
8 Indexing
8.1 Physical vs Logical views of data
8.2 File Structures
8.2.1 Sequential files
8.2.2 Hash files
8.3 Index-Sequential File Organization
8.3.1 What is an index?
8.3.2 Primary/Clustering Index
8.3.3 Secondary/Non-clustering Index
8.3.4 Dense & Sparse Indices
8.4 Modifying Index-Sequential files (insert, update, delete)
8.4.1 Delete from the data file
8.4.2 Insert to the data file
8.4.3 Update the data file
8.4.4 Algorithm for inserting to the index
8.4.5 Algorithm for deleting from the index
8.5 Multilevel Indices
8.6 Summary: Index-Sequential File Organization
8.7 Binary Search Trees - Recap
8.7.1 Binary Search Tree as an Index
8.8 M-way search trees
8.9 B Trees
8.10 Indexed Sequential Files - B+ Tree
8.11 Inserting & Deleting to/from a B+ Tree
8.12 Summary of index file structures
8.13 Indices on Multiple Keys
8.14 Enforcing Uniqueness with an Index
8.15 Why use indexes & choosing fields to index
8.15.1 Choosing indices
8.15.2 Query Optimization: how indexes are used
8.15.3 Use of Statistics
8.16 Creating Indexes with SQL
8.17 Quiz
9 SQL
10 Introduction to Transactions & Concurrency
10.1 Introduction
10.2 ACID Properties
10.3 DBMS Services for Transactions
10.3.1 Atomicity
10.3.2 Consistency
10.3.3 Isolation
10.3.4 Durability
10.4 Concurrency Control
10.5 Locks
10.6 Implicit Transactions
10.7 Demo in Lab
11 Introduction to Security
1 Introduction to Databases
Objective: To introduce students to the course and to explain what a database is and
what a DBMS is.
Reading Material:
Preparation: have copies of the course outline ready to give to students (or have
already given them out).
1.1 Course Overview
Review the course outline.
Discuss course texts: emphasise which books are available in the library and which
ones they can get from the bookstore.
Emphasise the importance of reading books: I will not be providing detailed handouts
as the texts are available.
You should also make sure to attend lectures and take notes, as I will explain
concepts to you and then you can read more about them in the textbooks.
As you do your reading, you should try to make notes too. Being a university student
is not just about copying things from the board; you must think and analyse for
yourself!
1.2 Introduction
1.2.1 What is a database?
Ask class, elicit ideas, write up on board.
Can they give examples of a database?
Think about a list of names & phone numbers: for example, the phone book, your
mobile phone, even a list written on a piece of paper.
Think about the library (want them to recognize that the card catalogue is a database,
arranged by author and by title).
Data in a database is related in some way; a collection of random data is not really a
database in the true sense of the word.
When finished, summarise and introduce this definition of a database:
A database is a collection of related data.
This is quite a loose definition of a database. For example, all the words on a printed
page of text could be seen as a database. For our purposes, the definition of a database
is more restricted.
For this course, we are looking at databases that are used mostly in organizations. In
this sense, a database has the following properties:
1. A database represents some aspect of the real world; this is sometimes called
the miniworld.
2. A database is a collection of data that is logically coherent. In other words, it
is not just random data put together; there is some connection between the
different data in the database.
3. A database is designed, built and populated with data for a specific purpose. It
has an intended group of users and some applications (uses) in which those
users are interested.
To summarise: a database has some source that provides its data, it interacts with
events in the real world and it has an interested audience of users.
The size and complexity of a database varies a lot. Your list of names and phone
numbers may be less than 100 records of data, each record having a simple structure.
The registrar's office of the university would have a more complex database that
reflects the links between students, courses, teachers and so on, with 1000s of
records in it.
1.2.2 What is a DBMS?
DBMS: Database Management System.
Ask the class: what do you think a DBMS is?
A DBMS is a set of programs that enables users to create a database and access the
data in the database.
A DBMS allows the user to define the structure of a database, put the data into the
database and manipulate the data in the database.
There are various DBMS packages available to buy or as freeware. They also provide
other functions, such as ways to control who can access the data.
You could also write your own DBMS application (e.g. using C++, VB or Java)
because data is stored in structures that you can manipulate programmatically.
But why reinvent the wheel every time?
So, most database applications are created using an existing DBMS, so you don't
have to worry too much about underlying data structures and how to work with them.
We can call the database and the DBMS together a database system.
We will start using the Microsoft Access database in the labs and we will also look at
Microsoft SQL Server later.
Most DBMSs now have a client-server architecture so data can easily be shared on a
network. The database resides on the server and the clients access the database using
special software applications.
We will look at the functions of a DBMS and some types of database system in more
detail a little later in the course.
1.2.3 Why use databases
You tell me: now that you know what a database is, why do you think we should use
databases?
Some reasons:
- To organize information
- To be able to get reports from data
- To protect data: the security features of a database allow you to specify who
  has access, what data they can see and what they can do with the data
- To be able to share data
2 Database Models
Objective: to briefly describe the evolution of database systems and to understand
some of the main data models, both those used in the past and those now in use.
Reading: Post Chapter 1; Mannino Chapter 1.
Preparation: maybe photocopy Chapter 4 of Everest (Logical Data Structures)
for extra reading; may aid students' understanding of the different models.
There are different ways of storing the data in a database. The way you are probably
most familiar with is storing data as rows in a table, e.g. a list of names, addresses and
phone numbers in Excel.
[to illustrate, draw a simple table on the board with column titles and a couple of
rows of data]
This is the basis of one database model: the relational model.
There are other models for databases. The data model is a way of describing the data
structures used in the database.
Over time, database technology has evolved (show on board like this with a simple
timeline):
1960s  File-based models
1970s  Network navigation models (network & hierarchical models)
1980s  Relational model
1990s  Object models
Now, many database systems use the relational model or object-oriented models.
To give you an idea of how database systems have evolved, we will look at some of
the earlier models in brief.
After this, the course will focus on the relational model. Most database systems now
use the relational model, although OO databases are becoming more widely used.
2.1 File-Processing Systems
The earliest database systems during the 1960s were really just file-processing
systems. Data was stored in flat text files in the operating system.
Data was also grouped into records, e.g. a record of Customer data in a bank might
look like this:
Number, Name, FathersName, PhoneNumber, Address, SavingsAccountNumber,
SavingsAccountBalance, CurrentAccountNumber, CurrentAccountBalance
123, Tesfay, Kinfe, 04 407600, Kebele 7 Mekelle, 1234, 1000, 9876, 546.50
This data would be stored in a flat text file on the computer. A file contains data about
one entity, e.g. Customer is an entity. Each record in the file represents an instance of
that entity.
Each row is a record; the file represents a record type.
A record is a fundamental unit in any database system. In fact, we will see later that
the data models used now still store data in records, but in different data structures.
Programmers had to write specific programs to carry out tasks, e.g. to retrieve all
customer records or to find the customer record for a given customer number.
Different programs had to be written for each task, so a lot of work was involved.
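To make this concrete, here is a minimal sketch (in Python, which is not what was used at the time) of the two tasks mentioned above. The file contents, the second customer and the function names are assumptions for illustration only; the field order follows the record layout shown earlier.

```python
import csv
import io

# Field order as listed in the notes.
FIELDS = ["Number", "Name", "FathersName", "PhoneNumber", "Address",
          "SavingsAccountNumber", "SavingsAccountBalance",
          "CurrentAccountNumber", "CurrentAccountBalance"]

# Hypothetical flat file: one customer record per line.
# The second record is invented for the example.
CUSTOMER_FILE = """\
123,Tesfay,Kinfe,04 407600,Kebele 7 Mekelle,1234,1000,9876,546.50
124,Hana,Bekele,04 111222,Kebele 3 Mekelle,2234,250,,
"""


def read_customers(text):
    """Task 1: retrieve all customer records as dictionaries."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(FIELDS, (value.strip() for value in row)))
            for row in reader if row]


def find_customer(text, number):
    """Task 2: scan the whole file for one customer number."""
    for record in read_customers(text):
        if record["Number"] == str(number):
            return record
    return None
```

Note that each new task (for example, "find all customers in Kebele 7") would need another hand-written function like these, which is exactly the extra work described above.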
It could also lead to data being duplicated in different files, e.g. think of a bank's
database. A customer's name and phone number could appear in a file containing all
savings accounts. If the same customer also has a current account, their name and
phone number would also appear in the file containing all current accounts.
If the customer changes their phone number, it has to be changed in both files.
OR: if a customer has 2 savings accounts, we need 2 rows in the file, and the name and
phone number are repeated.
This system worked, but didn't allow data to be related, e.g. to have the customer
name and phone number in one place, and relate a savings account to it and a current
account to it.
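The duplication problem can be sketched in a few lines of Python. The two "files" are modelled as lists of records, and the account details and function names are hypothetical; the point is only that updating one file leaves the other out of date.

```python
# Two hypothetical flat files; each repeats the customer's name and phone.
savings_file = [
    {"Name": "Tesfay", "PhoneNumber": "04 407600",
     "AccountNumber": "1234", "Balance": 1000},
]
current_file = [
    {"Name": "Tesfay", "PhoneNumber": "04 407600",
     "AccountNumber": "9876", "Balance": 500},
]


def change_phone(records, name, new_phone):
    """Update the phone number in ONE file; return how many rows changed."""
    changed = 0
    for record in records:
        if record["Name"] == name:
            record["PhoneNumber"] = new_phone
            changed += 1
    return changed


def phones_on_file(name, *files):
    """Collect every phone number stored for a customer across the files."""
    return {r["PhoneNumber"] for f in files for r in f if r["Name"] == name}


# The savings file is updated, but the current-account file is forgotten.
change_phone(savings_file, "Tesfay", "04 555000")
inconsistent = len(phones_on_file("Tesfay", savings_file, current_file)) > 1
```

After the update the customer has two different phone numbers on file, so `inconsistent` is true. Keeping the name and phone number in one place avoids this.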
2.2 Network Navigation Systems
The next generation of systems during the 1970s came about as developers
recognized that data in a system was related to other data in the same system. This
could be modelled in hierarchies or networks.
For example, in our bank system, if we used a hierarchical model, the customer
record type would be at the top of the hierarchy. Like this:
The legs on the links between the record types indicate a 1-to-many relationship (try
to elicit this from students; they may already have seen relationships in the lab). This
is the hierarchical structure. A Customer has Accounts. An Account contains
Transactions.
[Figure: hierarchy of record types. Customer "has" Account; Account "contains"
Transaction.]
[To explain further, compare to a filing cabinet: a section for each customer; inside
the section for a Customer, there is a folder for each Account they have. Inside the
Account folder, details of each transaction made in the account are stored.]
The data is organized into a tree structure. The tree consists of data records (as before)
and links between them.
Customer is a record type; each of the bank's customers has a record of type
Customer.
A record is a collection of attributes or fields, e.g. for Customer, the fields might be
Name, Phone, Address. For Account, the fields might be type (savings or current),
account number and balance.
The tree represents parent-child relationships: one Customer can have many
Account records associated with it, e.g. a savings Account and a current Account. An
Account has Transactions associated with it (e.g. deposit money, withdraw money).
To help understanding, draw a tree showing a particular customer and his/her
accounts like this:
[Figure: tree for one customer, with the Customer record at the root, the Account
records below it, and Transaction 1 to Transaction 5 as the leaves.]
The data is still stored in a flat file, like previously, but now the flat file has 3
different record types in it and the record types are related:
Customer: 123, Solomon, Tesfay, 04 407600, Kebele 7,
  Account: Savings, 123456, 5400
    Transaction: ...
    Transaction: ...
  Account: Current, 278654, 2000
Customer: ...
There are still problems with this model: a child node in the tree cannot have more
than one parent.
So if the same account is associated with 2 customers (e.g. a husband and wife have a
joint account and also separate accounts), we cannot link the one account record to 2
different customer records.
But we can have the account record appearing in two tree branches. This can lead to
duplicated data and inconsistent data, if the account is not updated in all the
branches.
[Figure labels: Customer record Solomon Tesfay, 04 123456, Kebele 7; Account records
Savings, 123456, 5400 and Current, 278654, 2000.]
A programmer still has to write the program to access the account information, but
now the program navigates through the hierarchy, e.g. to find the balance of the
savings account for a given customer number.
Accessing data in this type of structure is fast, but only if you are accessing data
from the top, e.g. by customer name or number. If you want to find all savings
accounts with a balance greater than 1000, you have to write a program that accesses
each customer and then the savings account data for each customer.
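A rough sketch of the two kinds of access, using nested Python dictionaries to stand in for the tree. The structure, the second customer and the function names are assumptions made for the example, not how a real hierarchical DBMS stored its data.

```python
# Hypothetical hierarchy, keyed by customer number (the top of the tree).
# Customer 124 is invented for the example.
bank = {
    123: {
        "name": "Solomon Tesfay",
        "accounts": [
            {"type": "Savings", "number": "123456", "balance": 5400,
             "transactions": []},
            {"type": "Current", "number": "278654", "balance": 2000,
             "transactions": []},
        ],
    },
    124: {
        "name": "Hana Bekele",
        "accounts": [
            {"type": "Savings", "number": "222333", "balance": 800,
             "transactions": []},
        ],
    },
}


def savings_balance(db, customer_number):
    """Access from the top: go straight to the customer, then down."""
    for account in db[customer_number]["accounts"]:
        if account["type"] == "Savings":
            return account["balance"]
    return None


def savings_over(db, minimum):
    """No way in at Account level: every customer has to be visited."""
    found = []
    for customer in db.values():
        for account in customer["accounts"]:
            if account["type"] == "Savings" and account["balance"] > minimum:
                found.append(account["number"])
    return found
```

`savings_balance` touches one branch of the tree, while `savings_over` has to walk the whole tree, which is the weakness described above.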
The network model (nothing to do with LAN) also uses records and links, but in a
different way.
Instead of putting data in a hierarchy, it structures data in a network like this:
[Figure: records linked together in a network, with entry points marked.]
All the points in the network are linked together, in a chain.
So the network for the same customer again would be like this:
It is a network of links between the different records. Again, Customer, Account and
Transaction are record types.
Relationships are modelled using sets. In a set, there is one owner record type
(Customer) and 1 or more member record types (Account).
This is a set type; let us name it CustomerAccount. It models the relationship between
Customer & Account: that a Customer can have 1 or more accounts.
There will be many occurrences of the CustomerAccount set in the database, one for
each Customer. A set occurrence relates one record from the owner record type
(Customer) to the set of records from the member record type related to it (Account).
For each CustomerAccount set, there is a network of links from the Customer record
to the Account records.
The data is still stored in files, but now we also have to define the set types.
Question to class: Can you identify another set type in this model? A: Account-
Transaction; an account has 0 or more transactions.
The entry points are records that can be searched. For example, to find all savings
accounts with a balance over 1000, a program can enter the network at the Account
entry point and then search through all the Account records. An entry point is
implemented as an index on a list of records, so the index must be created before
such queries can be run against the database.
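[Figure labels: record types Customer, Account and Transaction, with entry points;
example records Solomon Tesfay, 04 123456, Kebele 7 (Customer); Savings, 123456, 5400
and Current, 278654, 2000 (Account); Trans1 and Trans2 (Transaction).]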
[What is an index? It is a list of all possible values in a particular field, e.g. all the
balance values. Each value in the list has pointers to all the records that have that
value in the field. We will learn more about indexing later in the course.]
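As a simple illustration of this idea, the sketch below builds an index on the balance field in Python. List positions play the role of pointers, and the third account and the function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical Account records; the third one is invented so that two
# records share the same balance.
accounts = [
    {"number": "123456", "type": "Savings", "balance": 5400},
    {"number": "278654", "type": "Current", "balance": 2000},
    {"number": "222333", "type": "Savings", "balance": 2000},
]


def build_index(records, field):
    """Map each value of `field` to the positions of the records holding it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[field]].append(position)
    return index


def lookup(records, index, value):
    """Follow the 'pointers' (list positions) stored under one value."""
    return [records[position] for position in index.get(value, [])]


balance_index = build_index(accounts, "balance")
```

Looking up the balance 2000 goes directly to the two matching records without reading the others. The cost is that the index has to be built first and kept up to date.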
The arrows can be followed by the DBMS programs to find matching data, e.g. to find
the savings account that a transaction belongs to, or to find the customer who is the
owner of the savings account.
The arrows represent efficient connections between data elements.
Question: in terms of data structures, can you think of how this could be
implemented (a way of linking records in a list or set)? A: pointers (if they have done
the Data Structures course, they should be able to figure this out).
The arrows are actually implemented as pointers embedded in the files. Each record
has a pointer to the next and previous record in the network.
Remember: a pointer points to a storage location.
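For students who have done the Data Structures course, one occurrence of a set can be pictured as a doubly linked chain. The sketch below is only an illustration: Python object references stand in for storage locations, and the class and function names are made up.

```python
class Record:
    """A record with embedded links to its neighbours in the chain."""

    def __init__(self, record_type, data):
        self.record_type = record_type
        self.data = data
        self.next = None  # "pointer" to the next record
        self.prev = None  # "pointer" to the previous record


def link(*records):
    """Chain the records together in the order given; return the first."""
    for left, right in zip(records, records[1:]):
        left.next = right
        right.prev = left
    return records[0]


def members(owner):
    """Follow next-pointers from an owner to list the records chained to it."""
    found = []
    node = owner.next
    while node is not None:
        found.append(node)
        node = node.next
    return found


def owner_of(record):
    """Follow prev-pointers back to the record at the start of the chain."""
    while record.prev is not None:
        record = record.prev
    return record


# One occurrence of the CustomerAccount set: owner first, then its members.
customer = Record("Customer", "Solomon Tesfay")
savings = Record("Account", ("Savings", "123456", 5400))
current = Record("Account", ("Current", "278654", 2000))
link(customer, savings, current)
```

Going from the customer to the accounts uses the next-pointers, and going from an account back to its owner uses the prev-pointers, without searching any other records.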
Both file and network type systems used procedural programming to access data:
languages where the code had step-by-step instructions of what to do with the data.
They store data in files, consisting of records of data; think of a text file that you
type characters into.
The development of the network model was helped by a committee of experts who
defined a DDL (Data Definition Language) and a DML (Data Manipulation
Language) for the model. The DDL was designed to be independent of the language
being used to manipulate the data. The DML was included in different languages, e.g.
with FORTRAN.
The notions of a DDL and a DML are still in use now, in SQL, as we will see later.
During the 1980s, another type of model evolved that used nonprocedural languages
to access data.
2.3 Relational Model
This type of model is the relational model, which we are going to focus on in this course.
The relational approach was originated by E.F. Codd in the 1970s and became widely used in the 1980s and 1990s.
Based on mathematics: relational algebra.
Software programmers developed code that was efficient at doing it... hardware improved in a way that made it all possible, so the relational model became the most widely used.
In a relational database, data is stored in tables (also called relations). Each table is physically separate from other tables in the system, unlike network or hierarchy models, where there must be physical file links between data sets.
A table stores data about one specific entity in the mini-world represented by the database, e.g. about Customers. A row in a table represents an instance of the entity, i.e. a row in the Customers table represents one Customer record. A row in the Accounts table represents one Account record.
The columns in a table represent attributes of the entity. Therefore the Customers table would have columns for name, phone number, address.
There are no physical links between tables; instead, the links between the data are modelled by storing matching data in each table. For example, we can store the unique Customer ID number with each Account record, so CustomerID becomes an attribute or column in the Accounts table.
Customers
CustomerID | Name | PhoneNumber | Address
1 | Solomon Tesfay | 04 123456 | Kebele 7
2 | ...
Accounts
Type | Number | Balance | CustomerID
Savings | 123456 | 5400 | 1
Current | 278654 | 2000 | 1
Question: why do you think we would use a CustomerID instead of Customer Name in the Accounts table?
A: try to get students to think about this: what if two different customers have the same name? The answer is that we can ensure that CustomerID is different or unique for each customer; we cannot make them all have different names. Mention that this is the idea of keys in relational database tables; we will learn more about this soon.
Ask them what they think is the unique value for Accounts. A: Number.
The big advantage of the relational model is that if the data is well-designed, any queries can be answered without writing specific programs. This is different to the network and hierarchical models, where the data design would have to take into account what queries will be made on the data.
SQL (Structured Query Language) was developed to define and manipulate data in relational database tables; if the data is well-designed, almost any possible query on the data can be answered using SQL.
SQL has DDL & DML elements.
For example, to find all accounts with a balance of more than 1000 or to find the customer phone number for the account number 123456.
SQL is not a procedural language; it is declarative. That means you do not have to write code that specifies how to do the work. You simply state what you want from the database and the DBMS engine does the rest of the work. This makes it a very powerful language: less effort is required to get more results, compared to older database models. It is also easier for programmers to learn.
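The two example queries mentioned above could look roughly like this. The sketch uses Python's built-in `sqlite3` module and an in-memory database purely so that it can be run anywhere; the course itself uses Access and SQL Server, and the column types and key declarations here are assumptions rather than part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL statements define the tables; DML statements insert the sample rows
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT,
                            PhoneNumber TEXT, Address TEXT);
    CREATE TABLE Accounts (Type TEXT, Number INTEGER PRIMARY KEY, Balance INTEGER,
                           CustomerID INTEGER REFERENCES Customers(CustomerID));
    INSERT INTO Customers VALUES (1, 'Solomon Tesfay', '04 123456', 'Kebele 7');
    INSERT INTO Accounts VALUES ('Savings', 123456, 5400, 1);
    INSERT INTO Accounts VALUES ('Current', 278654, 2000, 1);
""")

# Query 1: all accounts with a balance of more than 1000
big_accounts = conn.execute(
    "SELECT Number, Balance FROM Accounts WHERE Balance > 1000 ORDER BY Number"
).fetchall()

# Query 2: the customer phone number for account number 123456,
# found by matching CustomerID values in the two tables
phone = conn.execute(
    "SELECT c.PhoneNumber FROM Customers AS c "
    "JOIN Accounts AS a ON a.CustomerID = c.CustomerID "
    "WHERE a.Number = ?",
    (123456,),
).fetchone()[0]

print(big_accounts, phone)
```

Neither query says how to search the tables; each only states which rows are wanted.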
2.4 Object-oriented Model
OO (Object-oriented) concepts started out being used for programming: Java is an OO language, as is C++.
The OO approach of defining objects that can be used in many programs is now also being applied to database systems.
An object can have properties (or attributes) but also behaviour, which is modelled in methods (functions) in the object.
In an OO db, each type of object in the database's mini-world is modelled by a class (Customer class, Account class), like tables in the relational model. A class has properties (attributes).
A class also has methods that are stored with the class definition, e.g. the code to create a new Customer object belongs in the Customer class. When an application is working with data in the database, the app creates (instantiates) objects from the class definitions.
One advantage of the OO model is sub-classes. As there are different types of account, they can be modelled as sub-classes of the Account class: SavingsAccount and CurrentAccount. This makes sense because the different account types have some different behaviour, e.g. gaining interest in a savings account, but some behaviour the same, e.g. lodging or withdrawing cash. This is the inheritance concept of OO programming.
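The sub-class idea can be sketched as follows. Python is used here only for brevity (the course's OO language is Java); the method bodies, the whole-number interest calculation and the overdraft check are assumptions made for illustration, with names loosely following the Account, SavingsAccount and CurrentAccount classes.

```python
class Account:
    """Behaviour shared by every type of account."""

    def __init__(self, account_number, balance, customer_id):
        self.account_number = account_number
        self.balance = balance
        self.customer_id = customer_id

    def lodge_money(self, amount):
        self.balance += amount

    def withdraw_money(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class SavingsAccount(Account):
    """Adds interest, which only savings accounts gain."""

    def __init__(self, account_number, balance, customer_id, interest_percent):
        super().__init__(account_number, balance, customer_id)
        self.interest_percent = interest_percent

    def add_interest(self):
        # Whole-number arithmetic keeps the example exact
        self.balance += self.balance * self.interest_percent // 100


class CurrentAccount(Account):
    """Adds cheque cashing, which may use an overdraft."""

    def __init__(self, account_number, balance, customer_id, overdraft_limit):
        super().__init__(account_number, balance, customer_id)
        self.overdraft_limit = overdraft_limit

    def cash_cheque(self, amount):
        if amount > self.balance + self.overdraft_limit:
            raise ValueError("overdraft limit exceeded")
        self.balance -= amount


savings = SavingsAccount(123456, 5400, 1, interest_percent=5)
savings.lodge_money(600)   # inherited from Account
savings.add_interest()     # specific to SavingsAccount

current = CurrentAccount(278654, 2000, 1, overdraft_limit=500)
current.cash_cheque(2300)  # specific to CurrentAccount
```

Both sub-classes reuse `lodge_money` and `withdraw_money` without redefining them, which is the inheritance the notes describe.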
Diagram: class name at the top, properties in the middle, methods at the bottom.
You should be learning the ideas of OO in your Java course: Internet Programming. We will not cover the OO data model in this course, but you should be aware that it exists. Many developers are using OO databases now, but the relational model is still very widely used and probably the most widely used.
2.5 Deductive Model
Another model for databases is the deductive model. In a deductive db system, rules can be defined; the rules deduce or infer additional information from the facts stored in the database.
Deductive databases are really a type of knowledge base, used in the area of AI (Artificial Intelligence).
A deductive db has facts and rules in it. Facts are stored similar to relations in a relational database, but attribute names are not necessary.
Rules are specifications that can be applied to the facts to produce new information. The rules are defined using a declarative language (what, rather than how...).
The system has an inference engine that deduces new facts from the db by interpreting the rules.
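To make facts, rules and an inference engine a little more concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the predicate names `owns`, `balance` and `large_depositor`, the 5000 threshold, and the naive "apply rules until nothing new appears" loop. Real deductive systems use a dedicated rule language rather than Python functions.

```python
# Facts are stored as plain tuples: no attribute names are needed
facts = {
    ("owns", "Solomon", 123456),
    ("owns", "Solomon", 278654),
    ("balance", 123456, 5400),
    ("balance", 278654, 2000),
}

def rule_large_depositor(known):
    """If X owns account A and A's balance is over 5000, X is a large depositor."""
    derived = set()
    for _, customer, account in [f for f in known if f[0] == "owns"]:
        for _, acc, amount in [f for f in known if f[0] == "balance"]:
            if acc == account and amount > 5000:
                derived.add(("large_depositor", customer))
    return derived

def infer(initial_facts, rules):
    """Apply every rule repeatedly until no new facts are produced."""
    known = set(initial_facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new

inferred = infer(facts, [rule_large_depositor])
print(sorted(f for f in inferred if f[0] == "large_depositor"))
```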
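[Class diagram for the object-oriented model (section 2.4):
Customer: properties CustomerID, Name, PhoneNumber, Address; methods newCustomer, removeCustomer, ...
Account: properties AccountNumber, Balance, CustomerID; methods lodgeMoney, withdrawMoney
SavingsAccount (sub-class of Account): property InterestRate
CurrentAccount (sub-class of Account): property OverdraftLimit; method cashCheque]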
The deductive model is closely related to the relational model; it also has its basis in a branch of mathematics (domain relational calculus).
In a deductive database system, the emphasis is on deriving new knowledge from existing data by supplying rules based on knowledge of the real world.
2.6 Summary
The data models we have discussed (name them...) are models for implementation of databases. These models are not very useful for modelling data in a way that end-users of a system understand; they are more about how data is stored on the computer.
But we also have conceptual data models; these provide ways of modelling data that are close to the way end-users perceive the data in their system. One of these is Entity-Relationship modelling, which we will cover later in this course.
ER modelling is often used as a step in designing a relational database.
The languages used to access data in databases have also evolved. For the earlier models, procedural languages were necessary: code that had all the steps needed to access or process data. The code had to include loops to step through all the records in a set, for example.
The relational and OO models have declarative (non-procedural) languages to work with data. This type of language is easier for programmers to learn: you only have to write statements that say what you want to do, not how to do it. For example, in SQL, you can ask to get a list of all the records in a table. You do not have to write the loop that goes through the table to read each of the column values for each row.
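The contrast between the two styles can be seen side by side in the sketch below. The comma-separated "file" lines and the table layout are assumptions for the example, and `sqlite3` stands in for whichever DBMS is actually in use.

```python
import sqlite3

# Procedural style: the program spells out every step, including the loop
file_lines = ["Savings,123456,5400", "Current,278654,2000"]
procedural = []
for line in file_lines:
    acc_type, number, balance = line.split(",")
    procedural.append((acc_type, int(number), int(balance)))

# Declarative style: state what is wanted; the DBMS decides how to fetch it
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (Type TEXT, Number INTEGER, Balance INTEGER)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?, ?)", procedural)
declarative = conn.execute("SELECT * FROM Accounts ORDER BY Number").fetchall()

print(declarative)
```

The `SELECT` statement contains no loop and no parsing logic, yet it returns the same records as the hand-written code.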
3 Components of a Database Application
Objective: to know what are the major components of a database application and what are the main features of a DBMS.
Reading material: Silberschatz et al Chapter 1, sections 1.8 & 1.9
Preparation: photocopy/print diagram of 3-tier architecture (Handout 1); maybe also diagram on pg 19 of Mannino (user types); students can write notes on these.
Before we move on to look at relational databases, we will first look at the major components of a database application.
3.1 Data Independence
Early systems (file processing, hierarchical, network): close link between the database and the programs to access it; the definition of the database was a part of the programs accessing it.
Conceptual (data definitions) not separate from physical storage on disk: data stored in records inside files.
Problem with this:
- Changes to db definitions => changing all code that accesses the data; a lot of inspection of code to make all the necessary changes.
- Expensive manual work.
- Performance tuning (making changes to a database to make it run faster) would have to recompile lots of programs for one change.
This led to the concept of data independence in database systems: data definitions should be separate from applications/programs that use the data. Overall, this makes databases easier to maintain and to optimise. [aside: students may have talked about data-oriented vs. process-oriented approach in SAD; data independence is something that came about as the approach became more data-oriented].
Database definitions are part of the database schema. The schema is a description of the database including what tables are in the database, the columns in each table, the data types for the columns and other information.
The schema is specified when designing the database and can usually be shown in diagram form by the DBMS (in SQL Server look in Diagrams in a database; in Access look in Relationships).
The schema can be changed without affecting existing data or programs that access the data.
For example, to add a new column to a table (StartDate to Employees): we can change the schema and existing programs still work. We only need to change programs that need to access the new column.
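The StartDate example can be tried out with a small sketch. The `EmployeeID` and `Name` columns, the sample row and the function `list_names` are hypothetical; `sqlite3` is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Employees VALUES (1, 'Sara Negash')")

def list_names(connection):
    """Stands in for an existing program: it asks only for the column it needs."""
    return connection.execute(
        "SELECT Name FROM Employees ORDER BY EmployeeID"
    ).fetchall()

before = list_names(conn)

# Schema change: add the new column; existing rows get NULL in it
conn.execute("ALTER TABLE Employees ADD COLUMN StartDate TEXT")

after = list_names(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(Employees)")]
print(before, after, columns)
```

The existing function returns the same result before and after the change because it never mentions the new column. A program written with `SELECT *` and fixed column positions would be more fragile, which is one reason to name columns explicitly.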
3.2 Different Types of User
Another factor that influences the architecture of database applications: different types of user.
Bank example: can you think of different groups of people who would use the system? Think about what they need to be able to do [ask students to think about this in pairs for 5 minutes].
- Managers: view reports
- Tellers/cashiers: carry out transactions, open new accounts, close accounts
- Customers: view their own accounts
- Programmers: write new programs, change existing programs
- Database administrator: change the schema, make the database perform better
So we have different types of user in a system, all with different needs; they need to see different views of the data. Some need to change data, others only to read data, some to see the schema.
We can categorise users of the system by the roles they have:
Functional users
- Indirect user: receives reports of data from the db, usually from someone who is a more direct user of the system (next user type), e.g. in a bank, a teller who looks up a printed list of customer names and account numbers.
- Parametric user: uses predefined forms and reports, where he/she simply has to click a button to run it. May enter input values (parameters), e.g. all accounts with a balance greater than 1000; all the lodgements in a given date range.
- Power user: can write his/her own reports/forms as needed, not using predefined reports, e.g. could build their own form or report in an Access database or even write a SQL query.
IT users
- DBA (Database Administrator): works with functional and IT users; makes schema changes, monitors database performance and tunes it to improve performance.
- Analyst/programmer: gathers requirements, designs applications, implements applications, so they need to create programs that access the data.
- Management: project managers supervise development of applications; they do not often use the database directly but may want to see schema diagrams or other design information.
If we can keep the database design & definitions (the schema) separate from programs, then we can write different programs or applications for the different users.
For example: a networked VB application for bank staff to maintain accounts. This application could have a section for managers to view summary reports, e.g. at the end of each day. Customers could access their accounts on a web-based application to view their accounts and maybe also to transfer money between different accounts, to pay bills online etc.
Data independence makes this easier. Let's have a look at a general architecture for database systems, where the data is kept independent of the programs that use it.
3.3 Three-Tier Application Architecture
Most database systems are now run on a client-server network.
- allows sharing of data: database on server, clients access it
- internet is an extension of this: web pages that access a central database
In practice, an application that makes a lot of use of a database will have components to present/display data to end-users and to process data as well as the DBMS itself. This fits well with the concept of data independence.
The approach to systems design now tries to separate presentation of data from business rules as much as possible.
[Draw this diagram then discuss.]
[Figure: Three-tier architecture for a database application.
Presentation (user view), on client PCs: displays data, forms to change/enter data; there can be more than one (Presentation 2, ...). Will also have an application component to deal with making connections to the application server & calling the right functions on the application server, e.g. call openNewAccount() function.
Application server, on server: business rules in programs/code, e.g. Java, CGI, ASP, VB. Accepts calls to functions; the functions build queries for the database in the appropriate language (SQL for a relational db) and pass them to the database. Uses an interface standard such as ODBC or JDBC.
Data, on server: the database.]
Conceptually, this architecture has 3 levels, or tiers.
3-D cylinder shape used to denote a database in system diagrams.
At the presentation layer, there can be different views of the data: different presentations to different types of user.
In a client-server system, this part usually communicates with the application server using the network.
At the application layer there is a lot of code that reflects the 'business rules', e.g. code to calculate the interest to apply to an account. If something changes, e.g. the bank changes the way it applies interest, then we only need to change it here.
At the data layer: this is where the actual database is; it could be an Access db or a SQL Server db or an Oracle db (any DBMS). The DBA may work directly with the database through the DBMS itself, but programmers generally work at the application layer.
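The division of work between the three layers can be sketched in a single Python file, as below. This is a deliberately flattened illustration: in a real system the layers run on different machines and talk over the network and a database interface such as ODBC or JDBC. The function names, the table layout and the 5 percent interest rule are all assumptions.

```python
import sqlite3

# Data layer: the database itself
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (Number INTEGER PRIMARY KEY, Type TEXT, Balance INTEGER)"
)

# Application layer: business rules live here and build the SQL
INTEREST_PERCENT = 5  # hypothetical rate; a rule change is made only here

def open_new_account(number, acc_type, opening_balance):
    conn.execute(
        "INSERT INTO Accounts VALUES (?, ?, ?)", (number, acc_type, opening_balance)
    )

def apply_interest(number):
    conn.execute(
        "UPDATE Accounts SET Balance = Balance + Balance * ? / 100 "
        "WHERE Number = ? AND Type = 'Savings'",
        (INTEREST_PERCENT, number),
    )

def get_balance(number):
    row = conn.execute(
        "SELECT Balance FROM Accounts WHERE Number = ?", (number,)
    ).fetchone()
    return row[0]

# Presentation layer: only formats what the application layer returns
def show_balance(number):
    return f"Account {number}: balance {get_balance(number)}"

open_new_account(123456, "Savings", 5400)
apply_interest(123456)
print(show_balance(123456))
```

The presentation function contains no SQL and no interest calculation, so either could change without touching it.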
This architecture is suited to internet applications, e.g. e-commerce, email, where it would be very difficult to put the business rules into the client (the web browser).
The web browser can send HTTP requests to the web server; the web server can make calls to the application server, or sometimes, direct to the database server.
The web server is then the application server in our diagram. Web scripts written in a language like Perl or ASP can use database interfaces & drivers based on ODBC to access the database.
NB: in the Mannino book, pg 16, there is a diagram showing the Three Schema Architecture. This is different to this 3-tier architecture for database applications. The Three Schema Architecture refers to a standard for DBMSs. You can see this implemented in, for example, MS Access: the internal schema (file storage; you do not need to see it), a conceptual schema (the table design) and views (queries to see data from particular tables).
3.4 Features of a DBMS
A database application has at its core a DBMS to manage the database itself.
A DBMS provides an environment that allows storage and retrieval of data in a database, and provides ways of carrying out database administration tasks.
Next year's course, ICT352, will go more in-depth into the architecture of a DBMS.
For now, we will talk about the general functions provided by a DBMS:
- Storage & retrieval: can be done independent of internal structures of the db. The DBMS will have its own internal data structures for storing data, but a user of the DBMS should not have to know anything about those structures; you can store and retrieve data in a logical/conceptual view of the data. Think of an Access database: we work with data in tables. We do not need to know anything about the underlying data structures; an Access db is all inside a .mdb file, which we do not have to look into.
- Catalog: describes all the data items stored in the db which are accessible to users; includes data definitions, e.g. for a column, what is the data type and what is the size. This is the database schema.
- Shared update: to support concurrency, i.e. when more than one user is updating the database at the same time. This is done with transactions; you will get an introduction to transactions towards the end of this course and cover them in more depth in the ICT352 course next year.
- Recovery: if the db is damaged, we need to be able to restore a working copy. A DBMS provides backup and restore functions. It is usually possible to schedule a backup to occur on a regular basis, e.g. every night or every 4 hours (if there is time, I will show you how to do this with SQL Server, in the lab).
- Security: access restricted to authorised users; users assigned permissions to carry out certain actions (e.g. to update or delete data); usually password-protected access. Data can also be encrypted for further protection (will have a brief intro to this in this course, more detail in ICT352; will look at how to create logins & users on SQL Server).
- Integrity: mechanisms to ensure data integrity and referential integrity. Data types, formats, check constraints and key constraints are all used for this (will learn about keys, data types & constraints in the relational model).
- Data independence: the manipulation of the data is independent of where the data is physically stored; in other words, data manipulation works with a logical view of the data and the process that is manipulating the data does not need to know where or how the data is stored (we will use SQL to manipulate data).
- Utility services: provides ways to import & export data, query the data etc. (SQL Server has the Query Analyzer tool; Access also has tools).
3.5 Types of DBMS
There are various DBMS packages available on the market.
Many of them are based on the relational model: RDBMS packages.
Microsoft has 2: Access and SQL Server.
Access is good for small scale applications that do not have large numbers of users.
SQL Server is better for large applications where the number of connections and database transactions is big.
SQL Server provides better performance, security and data protection than Access.
The current versions (as of March 2005) are Access 2000 (maybe 2003, I am not sure) and SQL Server 2000. You may still see Access 97 and SQL Server 7.0 in use.
The Oracle Corporation has the Oracle DBMS, which is a competitor to SQL Server. The current version is 9i.
IBM also has an RDBMS: DB2.
Microsoft, Oracle and IBM have most of the market share between them.
In the open-source world, MySQL is a DBMS application. It can be downloaded from the internet; I also have a copy if anyone would like to borrow it to install it.
Another open-source DBMS is PostgreSQL, for relational databases, originally developed at the University of California in the US.
Some other commercial packages are Ingres and Informix.
4 The Relational Model
Objective: in this unit, you will learn about the theory behind the relational model. First we will cover some basics and terminology associated with the model, then we will look in more detail at the concepts.
As most databases now are based on the relational model, if you get a good understanding of the theoretical concepts behind the model, you are more likely to become a proficient database designer and developer.
Reading Material: Mannino, Chapter 2; Silberschatz et al, Chapter 3.
The Mannino book does not really cover this using relational algebra terminology but Silberschatz et al does; recommend that you read both.
Preparation: Duplicate Handout 2, a one page handout showing the relation schemas and relations that are used as examples during this unit: student, programme, course, studentCourse (to save time for students writing them out; they can also write notes on the page). Shows the relations as in section 4.5, but without the relationship connectors.
4.1 Tables & Relations
A relational database consists of a collection of tables.
Table: also called a relation, because a row in a table represents a set of related values (the values in the columns).
Relation is a mathematical term used in relational algebra. As we will see later, the relational model is based on relational algebra.
A table has columns and rows, or a relation has attributes and tuples.
Relation, tuple, attribute: the more formal terminology. We will use both.
4.2 Domains & Attributes
Let us look at the relational model in more detail...
A domain D is a set of values.
Each attribute of a relation has a set of permitted values: the domain for that attribute.
An attribute represents a characteristic of the entity that is represented by the relation.
We can combine two domains, $D_1$ and $D_2$, to get the Cartesian product $D_1 \times D_2$: this is all the possible combinations of the values in $D_1$ and $D_2$.
Example: $D_1$: names; $D_2$: phone numbers.
Think of a relation that has n attributes. Each tuple in the relation is some combination of values from each of the attribute domains. But there will not be a tuple for every possible combination.
So, a relation is a subset of the Cartesian product of a list of domains: $D_1 \times D_2 \times \dots \times D_{n-1} \times D_n$
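A small worked example may help here. The sketch below builds the Cartesian product of two tiny domains with `itertools.product` and checks that a relation over them is a subset of it. The domain values are made up for the illustration.

```python
from itertools import product

names = {"Sara", "Tekle"}                    # domain D1 (hypothetical values)
phone_numbers = {"04 407600", "04 123456"}   # domain D2 (hypothetical values)

# Every possible (name, phone number) combination: D1 x D2
cartesian = set(product(names, phone_numbers))

# A relation keeps only the combinations that are actually true
relation = {("Sara", "04 407600"), ("Tekle", "04 123456")}

print(len(cartesian), relation <= cartesian)
```

With two values in each domain the product has four pairs, while the relation holds just the two pairs that describe real people.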
4.3
When working with relations, we use mathematical terminology like this:
t[col_name] denotes the value of the attribute with the name col_name in the tuple t, e.g. t[phone_number] = 04 407600.
$t \in r$ denotes that the tuple t is in the relation r.
In a relation, the order of the tuples is not important because, mathematically, elements of a set are not ordered.
But... in a computer-based file, the records must be physically stored on the disk somewhere, so they are stored in some order. And when you view the rows in a relation (e.g. in a table in Access), you are viewing them in some order (it may not be the same as the physical order on disk).
When defining a relation, we do not define anything about the order of the tuples.
4.3.1 More about attributes
For all relations r, the domains of all attributes must be atomic: the values must be indivisible, i.e. a simple, single value that cannot be further divided.
Example: the set of all integers is atomic.
The set of sets of integers is not atomic, because a set is not a simple, single value; it is a list of integer values. So this could not be the domain for an attribute in a relation.
When determining if a domain is atomic or not, you need to consider the usage in the database. For example:
- The set of all possible names of people is atomic.
- The set of full names (first name and father's name) is not atomic, as you can split it into first name and father's name. Generally, this is what is done in a relational database. You could, in theory, have an attribute for full name if you take the view that names cannot be split. But then it would be difficult to search for a person knowing just their first name or just their father's name.
Multiple attributes can have the same domain. For example, in a database for the university, Student_name and Staff_name have the same domain while they belong to different relations (Student, Staff). Likewise, Student_FirstName and Student_FathersName can both be in the same relation and have the same domain.
Think about attributes like Student_FirstName and Department_Name.
Take a physical view, i.e. think about what is stored on the disk. Both are just a string of characters, e.g. T, e, s, f, a, y for the name 'Tesfay', or 'Computer Science'. So they could be the same domain, i.e. all character strings. But only certain strings make up person names and only certain strings make up department names.
So logically, we know they have different domains.
On the other hand, some domains are obviously different, e.g. GPA is a number and Student_FirstName is character data.
Null value is a special value that can be a member of any possible domain. Null means the actual value is unknown or does not exist. For example, if the Staff relation has a Phone_Number attribute, it should be null and not 0 if you do not know the number or if there is no phone number for the person.
But... null values can cause some problems when working with data. If possible, you should try not to allow them in database tables; we will see how to do this later in the course.
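A few of the problems that nulls cause can be seen in a short experiment. The Staff table and its two rows are invented for the illustration, and the behaviour shown is that of SQLite through Python's `sqlite3` module; the rules for comparing with NULL are part of standard SQL, but details can differ between DBMS products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (Staff_Name TEXT, Phone_Number TEXT)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?)",
    [("Sara", "04 407600"), ("Tekle", None)],  # None is stored as NULL
)

# Counting rows versus counting a column: NULLs are skipped in the second case
total = conn.execute("SELECT COUNT(*) FROM Staff").fetchone()[0]
with_phone = conn.execute("SELECT COUNT(Phone_Number) FROM Staff").fetchone()[0]

# Comparing with '=' never matches a NULL; IS NULL must be used instead
equals_null = conn.execute(
    "SELECT COUNT(*) FROM Staff WHERE Phone_Number = NULL"
).fetchone()[0]
is_null = conn.execute(
    "SELECT Staff_Name FROM Staff WHERE Phone_Number IS NULL"
).fetchall()

print(total, with_phone, equals_null, is_null)
```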
4.4 Schemas
The database schema is the logical design of the database.
A database instance is a snap-shot (picture) of the data in the database at any given instant in time.
In the relational model, we also talk about a relation schema. This refers to the definition of one relation. Think of a relation as being like a variable in programming, while a relation schema is like the type definition for the variable.
For example:
We have a relation called student (like a variable).
The relation schema for student is:
Student-schema = (student_id, student_firstname, student_fathersname, programme_code)
(like the variable type definition)
programme_code is an alpha-numeric value that identifies the degree or diploma the student is registered for, e.g. Computer Science Degree (CompSciDeg), Management Diploma (MgtDip).
We can show that student is a relation on Student-schema (a variable of the type...) like this:
student (Student-schema)
Let us say we also have:
Programme-schema = (programme_code, prog_name, prog_description)
And
programme (Programme-schema)
We have the attribute programme_code in both relation schemas; this is a way to relate the tuples in two different relations. The programme_code in the student relation tells you what programme that student is registered for. You can get the programme details (name, description) in the programme relation.
With this structure, we can answer questions like
'get a list of all students registered for the Computer Science Degree'
[elaborate on this: explain how you can use the value CompSciDeg to look up the student relation]
This works well because we know that one student can be registered in one programme only.
Consider students taking courses. A student can take many different courses.
Course-schema = (course_code, course_name, course_description, credit_hours)
course (Course-schema)
Question: how would you show this in these relations?
First answer likely to be: put course_code in the student schema. If someone suggests a separate relation, find out what they think should be in it and congratulate them if they are right, but say that first we'll look at what happens if you put course_code in the student relation.
Student-schema = (student_id, student_firstname, student_fathersname, programme_code, course_code)
Each student takes several courses... so now there would be several tuples for each student in the student relation.
Question: Can you see any problem with this?
A: student name, id and programme code are duplicated: repeated in each tuple.
And what if a student is registered but has no courses selected yet? Then the tuple for that student is incomplete and we have to put a null value for the course_code.
Question: Can you see how we can fix the duplication problem and not have to use null values?
A: make a new relation like this (elicit from students what attributes they would put in the new relation)
Student-Course-Schema = (student_id, course_code)
studentCourse (Student-Course-Schema)
We create tuples in the studentCourse relation only when a student registers for a particular course. Then we do not have nulls.
And, as before, we can look up a given student_id in studentCourse to find what courses the student is taking.
Now, we are using a relation to describe an association between the student and course entities. So, a relation can describe an entity (student or course or programme) or it can describe an association between entities.
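The effect of moving course_code into its own relation can be checked with plain Python sets. The three "wide" tuples below follow the sample students and courses used in this unit; the variable names are just for the illustration.

```python
# One wide relation: student details repeated for every course taken
wide = [
    (100, "Sara", "Negash", "CSDEG", "ICT252"),
    (100, "Sara", "Negash", "CSDEG", "ICT231"),
    (101, "Tekle", "Haimanot", "CSDEG", "ICT252"),
]

# Split into two relations: student details once, plus (student_id, course_code)
student = {row[:4] for row in wide}
student_course = {(row[0], row[4]) for row in wide}

# Look up the courses for one student in the new relation
courses_for_100 = sorted(code for sid, code in student_course if sid == 100)

print(len(student), len(student_course), courses_for_100)
```

Sara's details appear twice in the wide relation but only once after the split, and a student with no courses would simply have no tuples in `student_course` rather than a null.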
4.5 Relationships
We now have a number of relations, based on these schemas:
Student-schema = (student_id, student_firstname, student_fathersname, programme_code)
Programme-schema = (programme_code, prog_name, prog_description)
Course-schema = (course_code, course_name, course_description, credit_hours)
Student-Course-Schema = (student_id, course_code)
When working with a relational database, we can represent the relations in a schema diagram like this [first draw without the relationships; ask class to put them in themselves; possibly have a handout with the tables on it already. Also show some sample data records for each table].
The relation name appears in the top bit of the rectangle. The attribute names appear in the bottom bit. Different authors/texts use variations of this way of drawing a schema diagram; for example, some put the relation name outside the box.
[Schema diagram: four boxes, each with the relation name at the top and the attribute names below. student (student_id, student_firstname, student_fathersname, programme_code); programme (programme_code, prog_name, prog_description); course (course_code, course_name, course_description, credit_hours); studentCourse (student_id, course_code). Relationship lines are marked with 1 and the infinity symbol.]
Sample data:
student
student_id | student_firstname | student_fathersname | programme_code
100 | Sara | Negash | CSDEG
101 | Tekle | Haimanot | CSDEG
102 | Terhas | Girma | CSDEG
103 | Solomon | Kebede | CSDIP
programme
programme_code | prog_name | prog_description
CSDEG | Computer Science Degree | 3 Year Degree in Computer Science
CSDIP | Computer Science Diploma | Diploma in Computer Science
course
course_code | course_name | course_description | credit_hours
ICT252 | Theory of Databases | Introduction to databases, DBMS, database models; focus on relational model; using E-R modelling to design databases. | 4
ICT231 | Internet & Web Page Development | Basic skills required for web development, including HTML and scripting. | 3
studentCourse
student_id | course_code
100 | ICT252
101 | ICT252
100 | ICT231
102 | ICT231
103 | ICT252
103 | ICT231
There are connections between the data in the different relations. We call these
relationships because the rows in a table can be related to rows in another table.
They are related by values that match in the different tables.
Question: Can you identify relationships between tables on this diagram?
- A student is registered for one programme (must be 1)
- A programme has many students registered for it (0 or more)
So there is a 1-many relationship between the Programme and Student relations.
- A student can enrol for many courses (1 or more)
- A course can have many students registered for it (0 or more).
So there is a 1-many relationship between Student and StudentCourse, and a 1-many
relationship between Course and StudentCourse. There is also a many-many relationship
between Student and Course, but we cannot show that directly between the tables, so we
use the StudentCourse relation to model the association between the 2 relations.
Now add the relationships to your diagram: put a line between the related
attributes, with a 1 for the one side and a ∞ (infinity) symbol for the many side.
Some designers use a different notation: a line with an arrow head pointing to the
many side of the relationship.
We can also have a 1-1 relationship, e.g. if we introduce a Teacher relation and assume
that each course is taught by one teacher only and each teacher teaches one course only.
Relationships between relations are important because often we need to extract data
from 2 or more tables for the data to be meaningful. For example, if the registrar
wants a list showing each student and what courses he/she has registered for, the data
must come from the student and studentCourse relations. We can get this data by
matching the student_id field in both tables.
This way of combining tables to get data from them is called a join. We can use SQL
to join tables.
It is important to understand the relationships between your db tables for extracting
meaningful & useful data from them.
We will learn more about relationships when we look at E-R modelling.
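The matching described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how a DBMS implements a join; the shortened column names (firstname, fathersname) and the function name courses_per_student are assumptions made for the example, and the rows are the sample data from the tables above.

```python
# Sample rows copied from the student and studentCourse tables in these notes.
student = [
    {"student_id": 100, "firstname": "Sara", "fathersname": "Negash"},
    {"student_id": 101, "firstname": "Tekle", "fathersname": "Haimanot"},
    {"student_id": 102, "firstname": "Terhas", "fathersname": "Girma"},
    {"student_id": 103, "firstname": "Solomon", "fathersname": "Kebede"},
]
student_course = [
    {"student_id": 100, "course_code": "ICT252"},
    {"student_id": 101, "course_code": "ICT252"},
    {"student_id": 100, "course_code": "ICT231"},
    {"student_id": 102, "course_code": "ICT231"},
    {"student_id": 103, "course_code": "ICT252"},
    {"student_id": 103, "course_code": "ICT231"},
]


def courses_per_student(students, enrolments):
    """Pair each student row with the enrolment rows that share its student_id."""
    result = []
    for s in students:
        for e in enrolments:
            if s["student_id"] == e["student_id"]:  # the matching values
                result.append((s["firstname"], s["fathersname"], e["course_code"]))
    return result


registrar_list = courses_per_student(student, student_course)
for row in registrar_list:
    print(row)
```

Each output row combines data that lives in two different tables, which is exactly why the relationship between them matters.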
4.6 Keys
In a relation, each tuple (row) represents an instance of the real-world entity that the
relation models.
We need some way to distinguish the instances from each other: the values of the
attributes of an instance must uniquely identify that instance.
Put another way: no two instances (tuples/rows) can have exactly the same values for
all the attributes.
A key is an attribute or set of attributes in a relation that uniquely identifies each tuple
in the relation.
4.6.1 Superkeys
Look at the Student-Schema relation schema. If we take the combination of all
attributes, it would uniquely identify each row. Or we could take the combination of
student_id, student_firstname, student_fathersname. But we could not use
student_firstname, student_fathersname on their own, as different students could have
the same name.
Each of the first two combinations is called a superkey.
A superkey is an attribute or combination of attributes containing unique values
for each tuple in the relation.
4.6.2 Candidate Keys
If we take away student_fathersname from the student_id, student_firstname,
student_fathersname combination, we still have a superkey.
If we then take away student_firstname, we have just student_id left and this still
uniquely identifies each tuple.
There is no sub-set of this set of attributes that is itself a superkey. We have reduced
the superkey as much as we can.
What we are left with is called a candidate key.
A candidate key is a superkey for which no proper subset is itself a superkey.
We can also say that a candidate key is a minimal superkey: it is minimal if removing
any attribute makes it no longer unique.
Look at the Programme-Schema. Starting with all the attributes as a superkey, can you
identify one or more candidate keys? Assume that no two Programmes have the same
Programme name.
A: 2 candidate keys, Programme_Code and Prog_Name, because each can uniquely
identify a programme. The combination of both is a superkey but not a candidate key
because it has sub-sets (Programme_Code and Prog_Name) that are superkeys.
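As a rough illustration of these definitions, the sketch below tests attribute sets against the sample programme data. The function names is_superkey and is_candidate_key are made up for this example. Note the important caveat: a key is a property of the schema (it must hold for every possible set of rows), so sample data can only show that an attribute set is not a key; it can never prove that it is one.

```python
from itertools import combinations

# Sample rows from the programme table in these notes.
programme = [
    {"programme_code": "CSDEG", "prog_name": "Computer Science Degree",
     "prog_description": "3 Year Degree in Computer Science"},
    {"programme_code": "CSDIP", "prog_name": "Computer Science Diploma",
     "prog_description": "Diploma in Computer Science"},
]


def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs (in this sample only)."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, attrs):
    """True if attrs is a superkey and no proper, non-empty subset of it is one."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


print(is_superkey(programme, ["programme_code", "prog_name"]))
print(is_candidate_key(programme, ["programme_code", "prog_name"]))
print(is_candidate_key(programme, ["programme_code"]))
```

On this tiny sample even prog_description would pass the check, which is why the designer's knowledge of the real world, not the current data, decides what the keys are.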
4.6.3 Primary key
Now we have identified 2 candidate keys for the Programme-Schema.
When designing a relational database, you (as the designer) must choose one of these
to be the Primary Key for the relation.
Question: Which would you choose for this relation?
See what the students choose. Find someone who chooses Programme_Code and ask
them why they chose it.
The reason is that the code is less likely to change over time, whereas the university
might change programme names. A good candidate for primary key is an attribute whose
values are least likely to have to be changed over time.
A primary key is a candidate key chosen to be the main way to uniquely identify
tuples in the relation. It represents a constraint in the real world that the database
models. For example, in choosing Programme_Code as the primary key, we are
reflecting the fact that the Code must be unique for every Programme. Likewise for
Student_ID.
The decision is up to the designer, but you should put some thought into it. Sometimes
it is obvious what the primary key should be, e.g. in the Student-Schema. But
sometimes it isn't, e.g. in Student-Course-Schema.
After you have some experience of designing databases, you will not often think
about superkeys and candidate keys, as your experience will guide you and you will
be able to tell quite quickly what should be the primary key.
A primary key that consists of more than one attribute is a composite primary key.
Now, on your diagram of the relation schema, underline the primary keys in each
relation.
4.7 Constraints
I mentioned earlier that a primary key reflects a constraint in the real world of the
database.
A constraint is a rule that restricts the possible values that can go into a relation (table)
in a relational database.
Besides keys, there are some other types of constraint in the relational model.
Some constraints are value-based, some are value-neutral.
Value-based: comparison of an attribute value to some constant value, e.g.
CreditHours >= 0 to reflect that a course's credit hours must be greater than or equal
to 0.
Value-neutral: comparison of attribute values to other attribute values. For example, if
we had start date and finish date attributes for a Student, the finish date should be
later than the start date.
Q: Keys are a form of constraint. Do you think they are value-based or value-neutral?
A: value-neutral, because they compare attribute values in a given column to other
values in the same column.
We are going to look at these types of constraint:
- Functional dependency
- Entity integrity
- Referential integrity
- Triggers
4.7.1 Functional Dependencies
[cover as part of normalisation]
4.7.2 Entity Integrity
Entity integrity means that each relation must have an attribute or combination of
attributes whose values uniquely identify each tuple in the relation.
In other words, no two tuples in the relation can have the same value for that attribute
or combination of attributes.
This is to ensure that entities from the real world are uniquely identified in the
database, e.g. students, courses, programmes.
Entity integrity is enforced with primary keys: no two tuples in a relation can contain
the same values for the primary key attribute(s).
Also, the primary key of a relation cannot have a null value in any tuple, so primary
key attributes must have an additional constraint that does not allow nulls.
If nulls were allowed, then two or more rows could have the null value, which
violates the entity integrity constraint.
4.7.3 Referential Integrity
Look back at the related tables student and studentCourse.
StudentID is the primary key in the student relation.
We know that if we look at studentCourse, the values for StudentID match values in
Student.StudentID (point out this dot notation, table.column / relation.attribute, to
the students).
In fact, we require that the values in studentCourse.studentID match values in
student.StudentID. This is referential integrity, where the values in columns of one
table must match values in columns of other tables.
Referential integrity is enforced using another type of key: a foreign key.
StudentID is a foreign key in the studentCourse relation.
A formal definition for a foreign key:
A relation r1 can have an attribute that is the primary key of another relation, r2.
This is a foreign key from r1, referencing r2.
r1 is the referencing relation. r2 is the referenced relation.
In a database instance, given any tuple, say t_a, from the r1 relation, there must be some
tuple, t_b, in the r2 relation where the value of the foreign key attribute of t_a is the same
as the value of the primary key attribute of t_b.
Alternatively, the value of the foreign key attribute of t_a can be null.
However, the db designer can decide whether or not to allow nulls in the foreign key
attribute. It depends on the usage in the real world.
Q: For example, in the studentCourse relation, does it make sense to allow studentID
or courseID to be null?
A: no, because both are part of the primary key and the primary key cannot be null.
Q: What about in the student relation: could the programme_code be null?
A: depends on usage. Can a student be registered but not have selected a programme?
I think no, as the student would have to choose a programme. They can always change
it later. If you allow nulls here, you could end up with data that shows students that
are not registered for a programme but that are registered for courses.
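To see entity and referential integrity being enforced, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table layout is a cut-down assumption based on the student and studentCourse relations (only the columns needed for the demonstration), and the helper name rejected is made up for the example. SQLite only enforces foreign keys when the foreign_keys pragma is switched on, and it needs an explicit NOT NULL on composite primary key columns.

```python
import sqlite3


def build_database():
    """Create a tiny in-memory database with a primary key and a foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript("""
        CREATE TABLE student (
            student_id        INTEGER PRIMARY KEY,
            student_firstname TEXT
        );
        CREATE TABLE studentCourse (
            student_id  INTEGER NOT NULL REFERENCES student(student_id),
            course_code TEXT    NOT NULL,
            PRIMARY KEY (student_id, course_code)  -- composite primary key
        );
        INSERT INTO student VALUES (100, 'Sara');
        INSERT INTO studentCourse VALUES (100, 'ICT252');
    """)
    return conn


def rejected(conn, sql, params=()):
    """Return True if the DBMS refuses the statement because of a constraint."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


conn = build_database()
insert = "INSERT INTO studentCourse VALUES (?, ?)"
print(rejected(conn, insert, (100, "ICT252")))   # duplicate primary key
print(rejected(conn, insert, (999, "ICT252")))   # no such student
print(rejected(conn, insert, (None, "ICT252")))  # null in the primary key
print(rejected(conn, insert, (100, "ICT231")))   # a valid new row
```

The first three inserts break entity or referential integrity and are refused by the database itself; only the last one is accepted.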
4.7.4 Triggers
Many DBMSs include a capability to define rules that are processed (or triggered)
when certain events occur.
For example, if the balance on a current account becomes negative (less than 0), the
account should be marked as being overdrawn. This could be modelled with an
attribute named Overdrawn in the Account relation, which can have the values true or
false (or yes/no).
We want the database to automatically update the Overdrawn column when the
balance changes from being zero or positive to negative, or vice versa.
This is called a trigger in the database.
This is not strictly a constraint in the relational model, but it is a feature that has been
implemented in many RDBMS packages.
The trigger is a rule defined in the database itself. The rule for this example would be
something like this:
    After a row in the Account relation has been updated
    FOR EACH updated row
        IF new_balance < 0 AND old_balance >= 0 THEN
            set Overdrawn = true
        ELSE IF old_balance < 0 AND new_balance >= 0 THEN
            set Overdrawn = false
        END IF
    END FOR
A trigger has 3 parts to it:
- Event (Account is updated; can be any event, e.g. delete or insert)
- Condition (balance changes from being zero or positive to negative, or vice versa;
  can look at the values in the row before and after the event occurred)
- Actions (set the Overdrawn flag; can carry out any SQL action, e.g. insert
  to another table)
A trigger can be created using SQL and it is then part of the database schema, like
tables, columns and other objects in the database.
As a database designer or programmer, you should be careful in your use of triggers,
as they can slow down the operation of the database: the trigger will run every
time a row or group of rows in the table is updated.
Sometimes you can find another way of carrying out the action, so think about it
first and use a trigger only if you cannot find another way of doing it.
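The overdrawn rule can be tried out with SQLite through Python's sqlite3 module. This is a sketch under assumptions: the account table, its column names and the two trigger names are invented for the example, and trigger syntax differs between DBMS products. SQLite has no ELSE branch in a trigger, so the rule is written as two triggers, one for each direction of the change.

```python
import sqlite3


def build_accounts():
    """Create an account table with triggers that maintain the overdrawn flag."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (
            account_id INTEGER PRIMARY KEY,
            balance    REAL    NOT NULL,
            overdrawn  INTEGER NOT NULL DEFAULT 0   -- 0 = false, 1 = true
        );

        -- Event: balance updated. Condition: it went below zero. Action: set flag.
        CREATE TRIGGER mark_overdrawn
        AFTER UPDATE OF balance ON account
        FOR EACH ROW
        WHEN NEW.balance < 0 AND OLD.balance >= 0
        BEGIN
            UPDATE account SET overdrawn = 1 WHERE account_id = NEW.account_id;
        END;

        -- The reverse direction: the balance came back up to zero or more.
        CREATE TRIGGER clear_overdrawn
        AFTER UPDATE OF balance ON account
        FOR EACH ROW
        WHEN NEW.balance >= 0 AND OLD.balance < 0
        BEGIN
            UPDATE account SET overdrawn = 0 WHERE account_id = NEW.account_id;
        END;

        INSERT INTO account (account_id, balance) VALUES (1, 50);
    """)
    return conn


def overdrawn_after(conn, new_balance):
    """Update the balance of account 1 and return its overdrawn flag."""
    conn.execute("UPDATE account SET balance = ? WHERE account_id = 1", (new_balance,))
    return conn.execute("SELECT overdrawn FROM account WHERE account_id = 1").fetchone()[0]


conn = build_accounts()
print(overdrawn_after(conn, -20))  # flag is set by the trigger
print(overdrawn_after(conn, 10))   # flag is cleared by the trigger
```

The application code only ever updates the balance; the flag is kept in step by the database, which is the point of a trigger.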
5 Relational Algebra
Objective: to learn the basic and additional operations of relational algebra, as these
form the basis for SQL. This is done by looking at each operation, doing some
examples and giving the class exercises to do the operations themselves.
Preparation: duplicate Handout 3, which is a short worksheet on relational algebra.
Give to students after the basic operations have been covered. They should do the
exercises outside of class, and the instructor can briefly run through the solutions at
the beginning of the next class.
We have looked at the relational model:
- Relations/attributes/tuples (tables/columns/records)
- Domains for attributes
- Database schemas
- Keys
- Constraints
As a foundation for learning SQL, we will take some time to look at some of the
operators of relational algebra, as SQL is based on it.
Relational algebra has operators that operate on relations.
It is procedural in nature: the operators generally take 1 or 2 relations as input and
produce a new relation as the output.
SQL is based on relational algebra, but is itself mostly a declarative language.
The basic operations are:
- Select
- Project
- Union
- Set difference
- Cartesian product
- Rename
Some others, which are themselves defined in terms of the basic operations:
- Set intersection
- Natural join
- Division
- Assignment
5.1 Basic Operations
Unary operators: operate on one relation (select, project, rename)
Binary operators: operate on pairs of relations (union, set difference, Cartesian
product)
The result of an operation is a new relation.
5.1.1 Select
The select operator selects tuples from a relation that satisfy a given predicate
(condition).
We use the Greek letter sigma (σ) for the operator.
[Students should have to hand the handout showing the student, course, programme
relations and sample data].
To select tuples from the student relation, where the students are on the CSDEG
programme:
σ_{programme_code = "CSDEG"}(student)
student is the argument relation
programme_code = "CSDEG" is the predicate
The result of this operation is a new relation containing 3 tuples.
The predicate has a comparison operator (>, <, =, ≠ (not equal), ≤, ≥). You can
compare values in different attributes or compare an attribute to a constant value.
It can also include logical AND, OR and NOT operators: ∧, ∨, ¬.
Exercises:
Write operations for the following:
Select all courses that have more than 3 credit hours.
A: σ_{credit_hours > 3}(course)
Select students that are in the CSDEG programme and whose name is Tekle.
A: σ_{programme_code = "CSDEG" ∧ student_firstname = "Tekle"}(student)
5.1.2 Project
The project operator can be used to get a sub-set of the attributes from a relation.
The result is a new relation, containing all the tuples in the operand relation (with any
duplicate tuples removed), and the specified attributes.
We use the Greek letter pi (π) for the operator.
To get only the first name and father's name from the student relation:
π_{student_firstname, student_fathersname}(student)
Because the result of an operation is a relation, we can use the result as a relation
input to another operation.
So, if you want to get the first name and father's name for all students on the CSDEG
programme, you can combine this operation with the select operation:
π_{student_firstname, student_fathersname}(σ_{programme_code = "CSDEG"}(student))
The result of this is a new relation with attributes student_firstname and
student_fathersname and containing only those tuples that have CSDEG in the
programme_code attribute of the student relation
(put another way: names of all students on the CSDEG programme).
This is a relational algebra expression, as it combines different operations.
Exercise:
Write an expression to get the course code and names of courses that have 4 or more
credit hours.
A: π_{course_code, course_name}(σ_{credit_hours ≥ 4}(course))
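The select and project operators, and the way one feeds into the other, can be imitated in Python. This is a minimal sketch, assuming a relation is represented as a list of dictionaries; the function names select and project simply mirror the operators and are not part of any library.

```python
# Sample rows from the student table in these notes.
student = [
    {"student_id": 100, "student_firstname": "Sara",
     "student_fathersname": "Negash", "programme_code": "CSDEG"},
    {"student_id": 101, "student_firstname": "Tekle",
     "student_fathersname": "Haimanot", "programme_code": "CSDEG"},
    {"student_id": 102, "student_firstname": "Terhas",
     "student_fathersname": "Girma", "programme_code": "CSDEG"},
    {"student_id": 103, "student_firstname": "Solomon",
     "student_fathersname": "Kebede", "programme_code": "CSDIP"},
]


def select(relation, predicate):
    """sigma: keep only the tuples for which the predicate is true."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """pi: keep only the named attributes, dropping duplicate tuples."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:  # a relation is a set, so no duplicates
            result.append(projected)
    return result


# pi_{student_firstname, student_fathersname}(sigma_{programme_code = "CSDEG"}(student))
csdeg_names = project(
    select(student, lambda t: t["programme_code"] == "CSDEG"),
    ["student_firstname", "student_fathersname"],
)
for row in csdeg_names:
    print(row)
```

Because select returns a relation, it can be passed straight into project, which is the same nesting as in the algebra expression.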
Aside: at this point, you should start to see how these operations are used in SQL.
Think of the project and select operators we have just looked at.
Now think of doing a query in a database with these relations in it.
(In Access Query Designer you would add the student table, then choose the columns
stud_id, stud_firstname, stud_fathersname from the student table, then add the criteria
that prog_code must be equal to CSDEG.)
The query would look like this if you wrote it in SQL (in the Query Designer, right-
click in the tables area, choose SQL View and you can see the SQL for the query.
Right-click in the title bar to go back to Query Design.):
    select stud_id, stud_firstname, stud_fathersname
    from student
    where prog_code = 'CSDEG'
What part is equivalent to a relational algebra select operation? Can you see an
argument relation and a predicate?
A: the from clause and the where clause are the argument relation and the predicate.
What part is equivalent to the project operation?
A: the line 'select stud_id, stud_firstname, stud_fathersname'
So, it's a bit confusing that SQL uses the word 'select' for what is actually the project
operation. For this reason, some textbooks use a different name for the select
operation, restrict, because it is restricting the number of tuples.
- Select: a subset of tuples
- Project: a subset of attributes
The basic SQL query does selection and projection. You can picture it like this. The
selection keeps the rows where programme_code is CSDEG; the projection keeps the
student_ID, student_firstname and student_fathersname columns.
student_ID | student_firstname | student_fathersname | programme_code
100        | Sara              | Negash              | CSDEG
101        | Tekle             | Haimanot            | CSDEG
102        | Terhas            | Girma               | CSDEG
103        | Solomon           | Kebede              | CSDIP
Resulting relation looks like this:
student_ID | student_firstname | student_fathersname
100        | Sara              | Negash
101        | Tekle             | Haimanot
102        | Terhas            | Girma
5.1.3 Union
We use the union operator to get tuples from 2 different relations.
Let us add a teacher relation based on this schema:
Teacher-Schema = (teacherid, teacher_firstname, teacher_fathersname, teacher_email)
And let us also add an email address attribute for students:
Student-Schema = (student_id, student_firstname, student_fathersname,
programme_code, student_email)
Now let us say we want to get a list of all student and teacher names, along with their
email addresses, and put all the data into a single relation. If we project from each one,
we have 2 relations:
π_{student_firstname, student_fathersname, student_email}(student)
π_{teacher_firstname, teacher_fathersname, teacher_email}(teacher)
But we can use the union to put the 2 relations together, like this:
π_{student_firstname, student_fathersname, student_email}(student)
∪ π_{teacher_firstname, teacher_fathersname, teacher_email}(teacher)
The result is one new relation.
A relation behaves like a set. Does a set allow duplicate values?
A: no.
So, duplicate tuples are eliminated from the resulting relation. If 2 tuples have the
same values in all the attributes, only 1 tuple is included in the result.
When doing a union operation:
- Must use compatible relations: it must make sense to union the 2 relations.
  For example, it does not make sense to union the student and course relations.
- The relations being union-ed must have the same number of attributes. We
  say that the relations have the same arity.
- The domains of the corresponding attributes must be the same. In other words:
  for r ∪ s, the domain of the i-th attribute of r must be the same domain as that
  for the i-th attribute of s.
In this example, the teacher and student relations do not have the same number of
attributes, so we use the projection operation to get 2 relations that have got the same
number of attributes in them.
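A short Python sketch of union follows, with relations represented as sets of tuples. The rows are hypothetical (the notes give no teacher data or email addresses), and the function name union is made up for the example. It checks only the arity rule; checking that the domains match is left out to keep the sketch short.

```python
def union(r, s):
    """Return r U s for relations given as collections of tuples."""
    r, s = set(r), set(s)
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("relations must have the same arity")
    return r | s  # a set keeps only one copy of duplicate tuples


# Hypothetical rows: (firstname, fathersname, email), as produced by the projections.
student_contacts = {
    ("Sara", "Negash", "sara@example.org"),
    ("Tekle", "Haimanot", "tekle@example.org"),
}
teacher_contacts = {
    ("Abebe", "Bekele", "abebe@example.org"),
    ("Sara", "Negash", "sara@example.org"),  # also appears as a student
}

everyone = union(student_contacts, teacher_contacts)
print(len(everyone))
```

The tuple that appears in both relations is included once only in the result, as described above.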
5.1.4 Set Difference
Let's say we want to find students who are taking one particular course but are not
taking another particular course.
We can use the set difference operator to find tuples that are in one relation but not in
another. To get tuples that are in relation r but not in relation s:
r − s
As for union, the operation should be on relations that are compatible.
To find students who are taking the ICT252 course but who are not taking the ICT231
course:
(get students to do this themselves)
1. Get a relation showing students taking ICT252; just get the student ID:
π_{student_id}(σ_{course_code = "ICT252"}(studentCourse))
2. Get a relation showing students taking ICT231; it should be compatible with the
first relation:
π_{student_id}(σ_{course_code = "ICT231"}(studentCourse))
Now we can get the difference between these two:
π_{student_id}(σ_{course_code = "ICT252"}(studentCourse))
− π_{student_id}(σ_{course_code = "ICT231"}(studentCourse))
The result is, again, a new relation, with one attribute and a number of tuples.
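The same difference can be worked through with Python sets, using the studentCourse sample data from earlier in these notes. The helper name student_ids_taking is an assumption for this sketch; it stands in for the projection of a selection.

```python
# studentCourse sample data as (student_id, course_code) tuples.
student_course = {
    (100, "ICT252"), (101, "ICT252"), (100, "ICT231"),
    (102, "ICT231"), (103, "ICT252"), (103, "ICT231"),
}


def student_ids_taking(enrolments, course_code):
    """pi_{student_id}(sigma_{course_code = ...}(studentCourse)) as a set of IDs."""
    return {sid for sid, code in enrolments if code == course_code}


taking_252 = student_ids_taking(student_course, "ICT252")
taking_231 = student_ids_taking(student_course, "ICT231")

# r - s: students taking ICT252 but not ICT231.
only_252 = taking_252 - taking_231
print(only_252)
```

With this sample data, student 101 is the only one taking ICT252 without also taking ICT231.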
5.1.5 Cartesian Product
We already talked about the Cartesian product of domains. Remember that a relation
is a subset of the Cartesian product of a set of domains.
In the same way, we can combine 2 relations with a Cartesian product operator. The
result is a relation that has all the attributes from both relations and whose tuples are
all the possible combinations of the tuples from each of the 2 relations.
The operator is ×, e.g.
r1 × r2
We could have the same attribute name in the 2 relations, so we have to make sure
we can distinguish between the attributes in the resulting relation.
r = student × programme
The attributes in r are:
(student.stud_id, student.stud_firstname, student.stud_fathersname,
student.prog_code, programme.prog_code, programme.prog_name,
programme.prog_desc)
We use the relation name as a prefix to indicate which schema each attribute comes
from. But if the name occurs in one of the relations only, we can omit the relation
name. Here, only prog_code occurs in both, so we have to put the relation names in
front of those 2 attributes.
This also means that the argument relations must have different names.
r contains a tuple for every possible pair of student & programme tuples, so if we have
n tuples in student and m tuples in programme, r contains (n × m) tuples. For example,
based on the data in handout 2:
student × programme
stud_id | stud_firstname | stud_fathersname | student.prog_code | programme.prog_code | prog_name                | prog_desc
100     | Sara           | Negash           | CSDEG             | CSDEG               | Computer Science Degree  | 3 Year Degree in Computer Science
100     | Sara           | Negash           | CSDEG             | CSDIP               | Computer Science Diploma | Diploma in Computer Science
101     | Tekle          | Haimanot         | CSDEG             | CSDEG               | Computer Science Degree  | 3 Year Degree in Computer Science
101     | Tekle          | Haimanot         | CSDEG             | CSDIP               | Computer Science Diploma | Diploma in Computer Science
102     | Terhas         | Girma            | CSDEG             | CSDEG               | Computer Science Degree  | 3 Year Degree in Computer Science
102     | Terhas         | Girma            | CSDEG             | CSDIP               | Computer Science Diploma | Diploma in Computer Science
103     | Solomon        | Kebede           | CSDIP             | CSDEG               | Computer Science Degree  | 3 Year Degree in Computer Science
103     | Solomon        | Kebede           | CSDIP             | CSDIP               | Computer Science Diploma | Diploma in Computer Science
OR give a simple example:
r1 (R1-Schema)
R1-Schema = (A, B, C)
A B C
1
2
r2 (R2-Schema)
R2-Schema = (D, E)
We can end up with a lot of tuples in the result, depending on how many tuples are in
the argument relations.
In some tuples, student.prog_code = programme.prog_code, but in others they are
not equal:
t[student.prog_code] ≠ t[programme.prog_code]
If we do r1 × r2, the schema for the resulting relation r is a concatenation of the
schemas R1 and R2.
What use is this operation?
We can use it to answer questions like 'get a list of all students on the Computer
Science Degree programme'. If we do not know the prog_code, we can use the select
operator to get only those tuples where the prog_name is 'Computer Science Degree',
from the Cartesian product relation.
σ_{prog_name = "Computer Science Degree"}(student × programme)
How many tuples will this get?
A: 4, because each student tuple has a tuple that has prog_name = "Computer
Science Degree" in the Cartesian product.
But is this correct? How many students are doing the degree programme?
A: 3. We can see this in the table, but how can we write an algebra expression to do
this using another operation?
Elicit from students: we need to find only those tuples where what is true?
A: where the prog_code matches in both relations
Use another select to do this:
σ_{student.prog_code = programme.prog_code}(σ_{prog_name = "Computer Science Degree"}(student × programme))
Note that we prefix with the relation name for prog_code but not for prog_name,
because prog_name occurs only in the programme relation.
Now how many tuples do we have? 3, because only 3 have matching values.
One more step: the resulting relation has all the attributes in it. We can use a
projection to get only the student name attributes:
r = π_{stud_firstname, stud_fathersname}(σ_{student.prog_code = programme.prog_code}(σ_{prog_name = "Computer Science Degree"}(student × programme)))
Exercise:
Write an expression to get a list of all students taking the course ICT252. The list
should show each student's full name.
A:
r = π_{stud_firstname, stud_fathersname}(σ_{student.stud_id = studentCourse.stud_id}(σ_{course_code = "ICT252"}(student × studentCourse)))
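The product, select, select, project sequence from this section can be traced in Python to confirm the tuple counts (8, then 4, then 3). This sketch represents tuples positionally, with the layouts given in the comments; it uses itertools.product from the standard library for the Cartesian product, and the prog_desc column is left out to keep the rows short.

```python
from itertools import product

# (stud_id, stud_firstname, stud_fathersname, prog_code)
student = [
    (100, "Sara", "Negash", "CSDEG"),
    (101, "Tekle", "Haimanot", "CSDEG"),
    (102, "Terhas", "Girma", "CSDEG"),
    (103, "Solomon", "Kebede", "CSDIP"),
]
# (prog_code, prog_name)
programme = [
    ("CSDEG", "Computer Science Degree"),
    ("CSDIP", "Computer Science Diploma"),
]

# student x programme: every student tuple paired with every programme tuple.
pairs = list(product(student, programme))

# sigma_{prog_name = "Computer Science Degree"}
by_name = [(s, p) for s, p in pairs if p[1] == "Computer Science Degree"]

# sigma_{student.prog_code = programme.prog_code}
matched = [(s, p) for s, p in by_name if s[3] == p[0]]

# pi_{stud_firstname, stud_fathersname}
names = [(s[1], s[2]) for s, p in matched]

print(len(pairs), len(by_name), len(matched))
print(names)
```

The first selection alone still includes Solomon Kebede, who is on the diploma; only the second selection, on matching prog_code values, removes him.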
5.1.6 Rename
As we write more complex expressions, it would be nice not to have to write out long
relation names like 'studentCourse'.
We can rename a result to a shorter name, using the rename operator ρ (rho).
This is also useful if you want to do a Cartesian product of a relation with itself, as
the two operands must have different names.
To rename the relation given by an expression E to the name x:
ρ_{x}(E)
For example:
ρ_{csdeg_students}(σ_{programme_code = "CSDEG"}(student))
The operation returns the relation given by the expression, and the relation is named
csdeg_students. We can now use that name in further operations.
To find any students named Solomon who are on the CSDEG programme:
σ_{stud_firstname = "Solomon"}(ρ_{csdeg_students}(σ_{programme_code = "CSDEG"}(student)))
Or, we can simply rename a relation, as a relation is itself a trivial relational algebra
expression:
ρ_{student2}(student)
We can also rename the attributes in the relation, using syntax like this:
ρ_{x(A1, A2, ..., An)}(E)
Here the expression E has arity n (n attributes); A1 is a name for the first attribute and
An is a name for the n-th attribute.
Example: find the highest credit hours (this is a trivial example as we can easily see
from the data, but an equivalent question might be 'find the account with the highest
balance', which would be more difficult to see).
1. Make a relation containing all courses that do not have the highest credit hours,
i.e. where the credit hours value is less than the credit hours value in
some other tuple in the relation.
2. Do a set difference between all courses and those that are in the relation made
in step 1.
This means we are doing an operation that requires 2 operands but where the 2
operands are based on the same relation.
Step 1:
[use different colours to show each new bit you add to this expression]
We start with a selection that compares credit hours values:
σ_{course.credit_hours < ...}
We have to have a different relation containing the same attributes, so we make a
new relation that is just the Course relation renamed:
ρ_{c}(course)
Now we can do a Cartesian product of course with the renamed relation:
σ_{course.credit_hours < c.credit_hours}(course × ρ_{c}(course))
Now let's project to get just the credit_hours attribute from the course relation,
because we have selected those tuples where the course.credit_hours is less than
c.credit_hours:
π_{course.credit_hours}(σ_{course.credit_hours < c.credit_hours}(course × ρ_{c}(course)))
Step 2:
Now we have a relation that has one attribute, credit_hours. Each tuple represents
some course that has credit_hours less than some other course, i.e. all courses that
do not have the highest credit_hours.
So if we now do a set difference of this relation from all credit_hours, whatever is
remaining must be the highest credit_hours. Remember that for set difference, the
argument relations must be union-compatible.
π_{course.credit_hours}(course)
− π_{course.credit_hours}(σ_{course.credit_hours < c.credit_hours}(course × ρ_{c}(course)))
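The two steps can be followed in Python with the course sample data (credit hours 4 and 3). This is a sketch only: the variable c plays the role of the renamed copy of course, and each tuple is (course_code, credit_hours).

```python
from itertools import product

# (course_code, credit_hours) from the course table in these notes.
course = [("ICT252", 4), ("ICT231", 3)]
c = list(course)  # rho_c(course): the same relation under a second name

# Step 1: credit hours that are less than the credit hours of some other tuple.
# pi_{course.credit_hours}(sigma_{course.credit_hours < c.credit_hours}(course x c))
not_highest = {a[1] for a, b in product(course, c) if a[1] < b[1]}

# Step 2: all credit hours minus the ones found in step 1.
all_hours = {hours for _, hours in course}
highest = all_hours - not_highest

print(not_highest)
print(highest)
```

The value 3 is removed because it is smaller than 4 in some pairing, so only the highest value, 4, survives the set difference.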
5.2 Additional Operations
The basic relational algebra operators that we have just looked at are sufficient to
express any relational algebra query.
But even with these, some types of common query that we make on relations become
complex and lengthy to express.
To make some of these easier, there are some additional operations that simplify some
common types of query.
These are:
- Set intersection
- Natural join
- Division
- Assignment
Each one of these can also be expressed in terms of the basic operations.
5.2.1 Set Intersection
Suppose we want to find out what students are taking the course ICT252 and the
course ICT231.
This can be viewed as the intersection of 2 sets or relations:
π_{stud_id}(σ_{course_code = "ICT252"}(studentCourse))
∩ π_{stud_id}(σ_{course_code = "ICT231"}(studentCourse))
∩ is the set intersection operator.
r ∩ s can be written using the basic operations like this:
r − (r − s)
(r − s) gives tuples that are in r but not in s.
If we take those tuples away from r, we are left with only tuples that are in r and in s.
It is easier just to use ∩.
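A quick check of the identity r ∩ s = r − (r − s) with Python sets, using the student IDs for the two courses taken from the studentCourse sample data:

```python
# Student IDs taking each course, from the studentCourse sample data.
r = {100, 101, 103}  # taking ICT252
s = {100, 102, 103}  # taking ICT231

r_minus_s = r - s              # in r but not in s
via_difference = r - r_minus_s  # what is left of r: tuples in r and in s
via_intersection = r & s        # Python's built-in set intersection

print(via_difference, via_intersection)
```

Both expressions give students 100 and 103, the two students taking both courses.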
5.2.2 Natural Join
Think back to the Cartesian product operation. Usually, to get something meaningful
from the results, we do a select operation with some predicate on the Cartesian product
result, because we are looking for matching values in the tuples.
For example, we did this operation to get a list of students who are taking the
Computer Science Degree programme (assuming we do not know the prog_code for
it):
π_{stud_firstname, stud_fathersname}(σ_{student.prog_code = programme.prog_code}(σ_{prog_name = "Computer Science Degree"}(student × programme)))
We can do this in a simpler expression using the natural join operator (⋈), like this:
π_{stud_firstname, stud_fathersname}(σ_{prog_name = "Computer Science Degree"}(student ⋈ programme))
The natural join does a Cartesian product but considers only the pairs of tuples where
the attribute that appears in both schemas has equal values. It also removes one
occurrence of the attribute that appears in both.
So, in this case only tuples where student.prog_code = programme.prog_code are
considered. The resulting relation has the prog_code once only, because one
occurrence is removed.
If we want just a list of students and the programmes they are on, we can do this:
π_{stud_firstname, stud_fathersname, prog_code, prog_name}(student ⋈ programme)
Here, we don't need the select on the prog_name = "Computer Science Degree"
predicate. We also do not have to prefix prog_code with a relation name, as the
natural join operator has already removed one of the prog_code columns.
This operator depends on the matching attribute having the same name in both
relation schemas.
We can break the natural join down into 3 steps. These are the 3 steps we had to do
before to get the same result, using the basic operations:
- Cartesian product to combine the tuples
- Select to remove tuples where the common attribute does not have matching
  values
- Project to get only the attributes we are interested in (usually also to remove
  one of the matching attributes)
There is also a formal definition of the natural join:
Consider that we operate on two relations, r and s. r has relation schema R and s has
relation schema S.
If we consider the relation schemas to be sets of attributes:
R ∩ S is the intersection of the sets, i.e. the attributes that appear in both schemas.
R ∪ S is the union of the sets, i.e. the attributes that appear in R, in S or in both.
So, we can say that
r ⋈ s = π_{R ∪ S}(σ_{r.A1 = s.A1 ∧ r.A2 = s.A2 ∧ ... ∧ r.An = s.An}(r × s))
where R ∩ S = {A1, A2, ..., An}.
Note that this shows that we can do a natural join on more than one attribute: there
could be 2 or more attributes in the relations that have matching values in them.
Aside: in your labs, you have been doing queries using the MS Access Query
Designer. You have done some queries with 2 tables, where there is a relationship
between the tables. The query in this case is a join between the 2 tables.
If the matching columns have the same name in the 2 relations and if the query
removes one of the matching columns from the result, it is a natural join operation.
A more general name for this type of query is equi-join, where the join is based on
equality between rows in one or more attributes. In an equi-join, the matching
attributes do not have to have the same name.
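The formal definition translates fairly directly into Python. The sketch below is an illustration only, assuming a relation is a list of dictionaries that all have the same keys; the function name natural_join is made up for the example, and the prog_desc column is omitted to keep the rows short.

```python
def natural_join(r, s):
    """Join r and s on every attribute name the two schemas have in common."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]  # R intersect S
    result = []
    for t in r:          # Cartesian product: every pair of tuples ...
        for u in s:
            # ... selected where all common attributes have equal values
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})  # each common attribute kept once
    return result


student = [
    {"stud_id": 100, "stud_firstname": "Sara", "stud_fathersname": "Negash", "prog_code": "CSDEG"},
    {"stud_id": 101, "stud_firstname": "Tekle", "stud_fathersname": "Haimanot", "prog_code": "CSDEG"},
    {"stud_id": 102, "stud_firstname": "Terhas", "stud_fathersname": "Girma", "prog_code": "CSDEG"},
    {"stud_id": 103, "stud_firstname": "Solomon", "stud_fathersname": "Kebede", "prog_code": "CSDIP"},
]
programme = [
    {"prog_code": "CSDEG", "prog_name": "Computer Science Degree"},
    {"prog_code": "CSDIP", "prog_name": "Computer Science Diploma"},
]

joined = natural_join(student, programme)
for row in joined:
    print(row["stud_firstname"], row["prog_code"], row["prog_name"])
```

Merging the two dictionaries keeps prog_code once, which corresponds to the projection onto R ∪ S. If the two schemas share no attribute names, the condition is always true and the function returns the full Cartesian product.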
Examples:
1. Joining more than 2 relations
Suppose we want to get a list of all Computer Science Degree students, showing the
student name, course codes and course names for the courses they are taking. We need
to join the student, studentCourse and course relations.
We can do a natural join between student and studentCourse and then natural join the
result to course:
π_{stud_firstname, stud_fathersname, course_code, course_name}(σ_{prog_code='CSDEG'}(student ⋈ studentCourse ⋈ course))
We could write (student ⋈ studentCourse) ⋈ course or student ⋈ (studentCourse ⋈
course) and get the same result because the order in which the operations are
executed does not matter. We say that the natural join is associative as an operation.
2. Using a join instead of set intersection
Remember our example for set intersection: find the IDs of all students who are
taking both ICT252 and ICT231 courses:
π_{stud_id}(σ_{course_code='ICT252'}(studentCourse)) ∩ π_{stud_id}(σ_{course_code='ICT231'}(studentCourse))
This can also be expressed as a natural join:
π_{stud_id}(σ_{course_code='ICT252'}(studentCourse)) ⋈ π_{stud_id}(σ_{course_code='ICT231'}(studentCourse))
Both operands must be projected onto stud_id before the join. When two relations have
exactly the same schema, the natural join matches on every attribute and so gives the
same result as set intersection. Joining the two selections without projecting first
would also match on course_code and return no tuples.
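The equivalence between intersection and a natural join over identical schemas can be
checked on a small data set. This is an illustrative Python sketch: the tuples are
sample values, and Python sets stand in for relations.

```python
# Sample studentCourse tuples as (stud_id, course_code); assumed for illustration
student_course = {
    (100, "ICT252"), (101, "ICT252"), (100, "ICT231"),
    (102, "ICT231"), (103, "ICT252"), (103, "ICT231"),
}

# π_{stud_id}(σ_{course_code=...}(studentCourse)) for each course
ids_252 = {sid for sid, code in student_course if code == "ICT252"}
ids_231 = {sid for sid, code in student_course if code == "ICT231"}

by_intersection = ids_252 & ids_231

# Natural join of two relations whose only attribute is stud_id:
# a pair matches exactly when the stud_id values are equal.
by_join = {a for a in ids_252 for b in ids_231 if a == b}

# Joining the selections without projecting first also compares course_code,
# and no tuple has both 'ICT252' and 'ICT231' in that attribute.
sel_252 = {t for t in student_course if t[1] == "ICT252"}
sel_231 = {t for t in student_course if t[1] == "ICT231"}
join_without_projection = {a for a in sel_252 for b in sel_231 if a == b}

print(sorted(by_intersection), sorted(by_join), join_without_projection)
```

On this sample both approaches give students 100 and 103, while the join without the
projection is empty.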
3. Joining relations that do not have a common attribute
If we join 2 relations that do not have a common attribute, the result of the natural
join is the same as that of the Cartesian product:
r ⋈ s = r × s
when R ∩ S = ∅ (empty set)
4. Combine a selection and a Cartesian product into a single operation. Earlier, we
did this:
π_{stud_firstname, stud_fathersname}(σ_{prog_name='Computer Science Degree'}(student ⋈ programme))
We can combine a selection predicate with the join:
r ⋈_θ s = σ_θ(r × s)
where θ (theta) is a predicate on attributes in the schema R ∪ S.
Because this join starts from the Cartesian product, θ must also state the join
condition that the natural join applied implicitly. So we can do:
π_{stud_firstname, stud_fathersname}(student ⋈_{student.prog_code=programme.prog_code ∧ prog_name='Computer Science Degree'} programme)
The ⋈_θ operator is called a theta join.
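A theta join can be sketched as a filter over the Cartesian product. The Python below
is an illustration only: theta_join is a hypothetical helper, the second programme
name is a made-up placeholder, and the predicate includes the equality on prog_code
as well as the test on prog_name because a theta join starts from every pairing of
tuples.

```python
def theta_join(r, s, theta):
    """σ_θ(r × s): pairs (t_r, t_s) from the Cartesian product that satisfy theta."""
    return [(t_r, t_s) for t_r in r for t_s in s if theta(t_r, t_s)]


student = [
    {"stud_firstname": "Sara", "stud_fathersname": "Negash", "prog_code": "CSDEG"},
    {"stud_firstname": "Solomon", "stud_fathersname": "Kebede", "prog_code": "CSDIP"},
]
# The second programme name is a made-up placeholder
programme = [
    {"prog_code": "CSDEG", "prog_name": "Computer Science Degree"},
    {"prog_code": "CSDIP", "prog_name": "Computer Science Diploma"},
]

pairs = theta_join(
    student,
    programme,
    lambda st, pr: st["prog_code"] == pr["prog_code"]
    and pr["prog_name"] == "Computer Science Degree",
)
names = [(st["stud_firstname"], st["stud_fathersname"]) for st, _ in pairs]

# Without the prog_code equality, every student pairs with the degree programme
name_only = theta_join(
    student, programme, lambda st, pr: pr["prog_name"] == "Computer Science Degree"
)
print(names, len(name_only))
```

Keeping each result as a pair of tuples avoids a name clash on prog_code, which
appears in both relations.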
5.2.3 Division
The division operation is used to answer queries like 'find courses that are being taken
by all students on the CSDEG programme' or 'find students who are taking all
courses'.
In contrast, a natural join can find courses that are being taken by any student.
Example: to find courses that are being taken by all students on the CSDEG
programme
studentCourse ÷ π_{stud_id}(σ_{prog_code='CSDEG'}(student))
Refer to your Handout 2 for the studentCourse sample data. But first, add a new tuple
to it: (102, ICT252).
studentCourse
stud_id  course_code
100      ICT252
101      ICT252
100      ICT231
102      ICT231
103      ICT252
103      ICT231
102      ICT252
What tuples are in π_{stud_id}(σ_{prog_code='CSDEG'}(student))?
Answer:
stud_id
100
101
102
Now if we 'divide' studentCourse by this, we will get a result that has only the
attribute course_code. Is there any course_code that has a tuple for every one of {100,
101, 102}?
A: ICT252.
When dividing, we are looking for values in the studentCourse.course_code column
that are associated with every value in the stud_id column of the relation resulting
from selecting with the predicate prog_code='CSDEG' on student.
To make it more clear, we should add a projection to the relation on the left, to show
what attributes it has:
π_{stud_id, course_code}(studentCourse) ÷ π_{stud_id}(σ_{prog_code='CSDEG'}(student))
Formal definition:
r(R) and s(S) are relations with schemas R and S.
S ⊆ R, that is, every attribute in schema S is also in schema R.
The relation r ÷ s is a relation on the schema R - S (all attributes in schema R that are
not in schema S).
For this example: the result has only the attribute course_code.
A tuple t is in the resulting relation if these 2 conditions hold:
1. t is in π_{R-S}(r)
2. for every tuple t_s in s, there is a tuple t_r in r that satisfies both of the following:
   a. t_r[S] = t_s[S]
   b. t_r[R - S] = t
For this example:
Condition 1: π_{R-S}(r) gives all the course_code values in studentCourse: {ICT252,
ICT252, ICT231, ICT231, ICT252, ICT231, ICT252}, but duplicate tuples are
eliminated by a projection, so we have {ICT252, ICT231}.
Condition 2:
We have {100, 101, 102} in our s relation.
Part (a): For every tuple t_s in s, there is a tuple t_r in r that satisfies:
t_r[S] = t_s[S]
In other words, for every stud_id in s we look for the matching rows in
studentCourse. These rows are then checked against every course_code value in the
set {ICT252, ICT231}.
For every tuple t_s in s: this is the set of tuples {100, 101, 102}.
100: the tuples t_r with t_r[S] = <100> are {<100, ICT252>, <100, ICT231>}
101: the tuples t_r with t_r[S] = <101> are {<101, ICT252>}
102: the tuples t_r with t_r[S] = <102> are {<102, ICT231>, <102, ICT252>}
To fulfil part (b), we have to take the above tuples and check that t_r[R - S] = t, where
t is in {ICT252, ICT231}.
For 100:
<100, ICT252> gives <ICT252> = <ICT252>, and <100, ICT231> gives <ICT231> = <ICT231>:
there is a matching tuple t_r[R - S] for every t
For 101:
there is NOT a matching tuple t_r[R - S] for every t: ICT231 does not have a match.
So we must eliminate ICT231 from the set of tuples t in π_{R-S}(r).
For 102:
there is a matching tuple t_r[R - S] for every t
Express r ÷ s using the basic operations
r ÷ s = π_{R-S}(r) - π_{R-S}((π_{R-S}(r) × s) - π_{R-S,S}(r))
Let's break this down. The steps are:
1. Get all the course_code values
2. Eliminate course_code values that do not have all possible student-course
combinations in studentCourse. We do it like this...
π_{R-S}(r) × s
gives us every possible pairing of course_code and stud_id
course_code  stud_id
ICT252       100
ICT252       101
ICT252       102
ICT231       100
ICT231       101
ICT231       102
π_{R-S,S}(r)
gives us the tuples in r, but with the course_code attribute first, then stud_id (to do set
difference, the relations must be union compatible: same domain for the i-th attribute
in each relation).
course_code  stud_id
ICT252       100
ICT252       101
ICT231       100
ICT231       102
ICT252       103
ICT231       103
ICT252       102
Set difference of the 2 above leaves the tuple (ICT231, 101), the only tuple in the top
relation that is not in the bottom relation.
This is giving us the possible combinations of tuples (from the Cartesian product of
course_code and stud_id) that do not actually appear in studentCourse.
Doing the projection π_{R-S} on this leaves the course_code only (ICT231).
Now do set difference between π_{R-S}(r) and (ICT231) to find the course_code
values that have all possible combinations existing in studentCourse.
The tuples for π_{R-S}(r) are (ICT252, ICT231).
The set difference is ICT252.
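The steps above can be sketched in Python using sets of tuples. This is an
illustration under simplifying assumptions: the function divide is a hypothetical
helper that only handles the two-attribute case used in this example, with r holding
(stud_id, course_code) pairs and s holding stud_id values.

```python
def divide(r, s):
    """r ÷ s, with r a set of (stud_id, course_code) pairs and s a set of stud_id values."""
    candidates = {code for _, code in r}                          # π_{R-S}(r)
    all_pairs = {(code, sid) for code in candidates for sid in s}  # π_{R-S}(r) × s
    actual = {(code, sid) for sid, code in r}                     # π_{R-S,S}(r)
    missing = all_pairs - actual                                  # pairings absent from r
    rejected = {code for code, _ in missing}                      # π_{R-S} of the difference
    return candidates - rejected


student_course = {
    (100, "ICT252"), (101, "ICT252"), (100, "ICT231"), (102, "ICT231"),
    (103, "ICT252"), (103, "ICT231"), (102, "ICT252"),
}
csdeg_ids = {100, 101, 102}

result = divide(student_course, csdeg_ids)

# The same answer straight from the formal definition: keep a course_code
# only if it is paired with every stud_id in s.
by_definition = {
    code
    for _, code in student_course
    if all((sid, code) in student_course for sid in csdeg_ids)
}
print(result, by_definition)
```

Both computations leave only ICT252, because (101, ICT231) is the one pairing that
does not appear in studentCourse.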
5.2.4 Assignment
It would be nice if we could assign the result of an operation to a variable and then
use that variable in subsequent expressions, like in programming languages.
Well... we can assign an expression to a temporary relation variable, like this:
tempRel1 ← E
This assigns the result of the expression E to the relation variable tempRel1. The
variable can then be used in subsequent expressions.
Let's take one of the longer expressions we have used as an example:
π_{stud_firstname, stud_fathersname}(σ_{student.prog_code=programme.prog_code}(σ_{prog_name='Computer Science Degree'}(student × programme)))
To make it easier to read, we can do an assignment or two:
csdegStudents ← σ_{student.prog_code=programme.prog_code}(σ_{prog_name='Computer Science Degree'}(student × programme))
result ← π_{stud_firstname, stud_fathersname}(csdegStudents)
This means that a query can be written as a sequential program consisting of a series
of assignments, followed by an expression whose value is displayed as the result of
the query. It also makes it easier to express complex queries.
In relational algebra queries, assignment must always be made to a temporary
variable. An assignment to a permanent, pre-existing relation would be a modification
to the database that changes that relation.
5.3 Extended Relational-Algebra Operations
We have now looked at basic operations and some additional operations that are
defined in terms of the basic operations.
For working with relational databases, the relational algebra has had some extensions
added to it.
5.3.1 Generalised Projection
The projection operation has been extended to allow arithmetic functions to be used in
the projection list.
For example, suppose we had a different schema for the course relation, like this:
Course-Schema = (course_code, course_name, course_desc, theory_hours, lab_hours)
The formula for calculating credit hours is theory_hours + (lab_hours/2).
So we could do a projection on the course relation like this:
π_{course_code, course_name, theory_hours + (lab_hours/2)}(course)
This gives us the credit hours for each course. The credit hours attribute in the
resulting relation does not have a name. We can give it a name using 'as', like this:
π_{course_code, course_name, theory_hours + (lab_hours/2) as credit_hours}(course)
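A generalised projection is a per-tuple computation, which a list comprehension
expresses directly. In this Python sketch the course rows are hypothetical: the hour
values and the second course name are assumed for illustration and do not come from
the handouts.

```python
# Hypothetical course rows; the hours and the second course name are assumed
course = [
    {"course_code": "ICT252", "course_name": "Theory of Databases",
     "theory_hours": 3, "lab_hours": 2},
    {"course_code": "ICT231", "course_name": "Example Course",
     "theory_hours": 2, "lab_hours": 2},
]

# π_{course_code, course_name, theory_hours + (lab_hours/2) as credit_hours}(course)
credit = [
    {
        "course_code": c["course_code"],
        "course_name": c["course_name"],
        "credit_hours": c["theory_hours"] + (c["lab_hours"] / 2),
    }
    for c in course
]

for row in credit:
    print(row)
```

The dictionary key credit_hours plays the role of the 'as' clause: it names the
computed attribute in the result.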
5.3.2 Aggregate Functions
Sometimes it is useful to be able to calculate an aggregate, or summary, value for
something. For example, to find out the total credit hours being taken by each student.
For this purpose, there are a number of aggregate functions defined, like sum, avg,
max, min, count.
These can be used to get an aggregate value for some attribute.
An aggregate function operates on a set of values. For example, we can apply the sum
function to the set of values
{4, 2, 5, 1}
The result is 12.
Applying the avg function returns a value of 3.
The set can have the same value appearing more than once, e.g. the set could be
{4, 2, 4, 5, 1}
With relations, we can apply aggregate functions to all the values in a particular
attribute.
Example 1: sum all the credit hours (this does not really make sense... but an
equivalent operation would be to get a sum of all employee salaries).
We use a G in calligraphic font for this (calligraphic G):
𝒢_{sum(credit_hours)}(course)
This would give a result of 10.
A more useful thing would be to find the total number of credit_hours to be taken by
each student. To do this, we can first get a relation that has attributes stud_id,
course_code, credit_hours. Then we can partition the relation into groups based on the
stud_id and apply the sum function to each group.
Example 2: find the total credit_hours to be taken by each student.
First, we need to do a natural join of course and studentCourse and project to get the
attributes we want:
π_{stud_id, course_code, credit_hours}(course ⋈ studentCourse)
This is now the argument relation for the aggregate. We must also specify what
attribute to group on (stud_id) as a subscript at the left of the operator:
_{stud_id}𝒢_{sum(credit_hours)}(π_{stud_id, course_code, credit_hours}(course ⋈ studentCourse))
The groups look like this:
stud_id  course_code  credit_hours
100      ICT252       4
100      ICT231       3
101      ICT252       4
102      ICT231       3
102      ICT252       4
103      ICT252       4
103      ICT231       3
The resulting relation would look like this:
stud_id  Sum of credit_hours
100      7
101      4
102      7
103      7
When an aggregate is applied over a set of values that includes duplicates, there are
cases where we want to eliminate the duplicates. For example, the count function
counts the number of tuples in the relation:
𝒢_{count(course_code)}(studentCourse)
This would return a tuple with the value 7 as there are 7 tuples in the studentCourse
relation.
But if we want to count how many courses are being taken by students overall, we
do not want to count duplicate values of course_code more than once. To do this, we
use a distinct aggregate, like this:
Example 3: get the count of how many courses are being taken by students.
𝒢_{count-distinct(course_code)}(studentCourse)
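The difference between count and count-distinct can be seen by counting a list
against counting a set. A minimal Python sketch using the seven studentCourse tuples
from the division example:

```python
# studentCourse as (stud_id, course_code) tuples
student_course = [
    (100, "ICT252"), (101, "ICT252"), (100, "ICT231"), (102, "ICT231"),
    (103, "ICT252"), (103, "ICT231"), (102, "ICT252"),
]

course_codes = [code for _, code in student_course]  # keeps duplicates
count_all = len(course_codes)                         # count(course_code)
count_distinct = len(set(course_codes))               # count-distinct(course_code)

print(count_all, count_distinct)
```

Converting the list to a set removes the repeated course codes before counting, which
is what the distinct form of the aggregate does.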
The general form of the aggregate operation 𝒢 is:
_{G1, G2, …, Gn}𝒢_{F1(A1), F2(A2), …, Fm(Am)}(E)
where E is any relational-algebra expression;
G1, G2, …, Gn are a list of attributes on which to group (we can group on more than
one attribute);
each Fi is an aggregate function;
and each Ai is an attribute name.
The tuples in E are partitioned into groups such that
- all tuples in a group have the same values for G1, G2, …, Gn
and
- tuples in different groups have different values for G1, G2, …, Gn.
So, the groups can be identified by the values of the attributes G1, G2, …, Gn.
For each group, there is a tuple in the result:
(g1, g2, …, gn, a1, a2, …, am)
For each i, ai is the result of applying the aggregate function Fi on the set of values for
the attribute Ai in the group.
[illustrate by showing in the previous example]
A special case is when the list of attributes G1, G2, …, Gn is empty. Then there is only
one single group that contains all the tuples in E. This is aggregation without
grouping.
Example 4: use two aggregate functions: find the total credit hours for each student
and the maximum credit hours each student has.
_{stud_id}𝒢_{sum(credit_hours), max(credit_hours)}(π_{stud_id, course_code, credit_hours}(course ⋈ studentCourse))
Ask students to write what the result would look like.
The resulting relation would look like this:
stud_id  Sum of credit_hours  Max of credit_hours
100      7                    4
101      4                    4
102      7                    4
103      7                    4
As we did in generalised projection, we can give a name for the new attributes:
_{stud_id}𝒢_{sum(credit_hours) as sum-credit_hours, max(credit_hours) as max-credit_hours}(π_{stud_id, course_code, credit_hours}(course ⋈ studentCourse))
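Grouping followed by aggregation can be sketched with a dictionary of groups. The
Python below is an illustration, not a definitive implementation: aggregate is a
hypothetical helper, and the rows are the joined studentCourse and course tuples
shown in the groups table.

```python
from collections import defaultdict


def aggregate(rows, group_attrs, aggs):
    """Group rows on group_attrs, then apply each (function, attribute) pair in aggs."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[g] for g in group_attrs)  # empty tuple when there is no grouping
        groups[key].append(row)
    result = []
    for key, members in groups.items():
        out = dict(zip(group_attrs, key))  # g1 ... gn
        for name, (func, attr) in aggs.items():
            out[name] = func([m[attr] for m in members])  # a1 ... am
        result.append(out)
    return result


rows = [
    {"stud_id": 100, "course_code": "ICT252", "credit_hours": 4},
    {"stud_id": 100, "course_code": "ICT231", "credit_hours": 3},
    {"stud_id": 101, "course_code": "ICT252", "credit_hours": 4},
    {"stud_id": 102, "course_code": "ICT231", "credit_hours": 3},
    {"stud_id": 102, "course_code": "ICT252", "credit_hours": 4},
    {"stud_id": 103, "course_code": "ICT252", "credit_hours": 4},
    {"stud_id": 103, "course_code": "ICT231", "credit_hours": 3},
]

per_student = aggregate(
    rows,
    ["stud_id"],
    {"sum_credit_hours": (sum, "credit_hours"), "max_credit_hours": (max, "credit_hours")},
)
overall = aggregate(rows, [], {"sum_credit_hours": (sum, "credit_hours")})

for row in per_student:
    print(row)
print(overall)
```

Passing an empty list of grouping attributes puts every row in one group, which
corresponds to aggregation without grouping.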
5.3.3 Outer Join
The outer join operation extends the natural join operation to deal with missing
information. For example, if we want to get a list of all registered students and the
courses they are taking, we would do a natural join:
student ⋈ studentCourse
Now, let us add a new student tuple:
stud_id  stud_firstname  stud_fathersname  prog_code
104      Fikir           Yohannis          CSDEG
Will this tuple appear in the result relation of the natural join?
A: no, because it does not have a matching tuple in studentCourse.
So, sometimes, we want to do a join where we get not only the tuples that have
matches (the natural join), but also the tuples that do not have a match.
There are 3 types of outer join: left outer join, right outer join and full outer join.
In each form, the join computes a natural join and then adds some extra tuples to the
result.
Left outer join: adds all tuples from the left relation that did not have a matching tuple
in the right relation. The attributes of the right relation are filled with null values for
the non-matched tuples in the result.
Right outer join: is the opposite of the left outer join. It adds all tuples from the
right relation that did not have a matching tuple in the left relation. The attributes of
the left relation are filled with null values for the non-matched tuples.
Full outer join: does both the left and right outer join operations, filling the
non-matched tuples from both sides with null values.
Left outer join:
student ⟕ studentCourse
The result has all the matching tuples and also a tuple for the stud_id 104, with a
null value for the course_code.
stud_id  stud_firstname  stud_fathersname  prog_code  course_code
100      Sara            Negash            CSDEG      ICT252
100      Sara            Negash            CSDEG      ICT231
101      Tekle           Haimanot          CSDEG      ICT252
102      Terhas          Girma             CSDEG      ICT231
102      Terhas          Girma             CSDEG      ICT252
103      Solomon         Kebede            CSDIP      ICT252
103      Solomon         Kebede            CSDIP      ICT231
104      Fikir           Yohannis          CSDEG      null
To illustrate a right outer join with the same 2 relations, suppose that our relations did
not always enforce the referential integrity for the stud_id foreign key and the
studentCourse relation has a tuple:
stud_id  course_code
200      ICT252
Right outer join:
student ⟖ studentCourse
The result for this one is:
stud_id  stud_firstname  stud_fathersname  prog_code  course_code
100      Sara            Negash            CSDEG      ICT252
100      Sara            Negash            CSDEG      ICT231
101      Tekle           Haimanot          CSDEG      ICT252
102      Terhas          Girma             CSDEG      ICT231
102      Terhas          Girma             CSDEG      ICT252
103      Solomon         Kebede            CSDIP      ICT252
103      Solomon         Kebede            CSDIP      ICT231
200      null            null              null       ICT252
Note that the tuple for stud_id 104 is not in this result, because it does not have a
matching tuple in studentCourse and it is in the left relation, not the right relation.
But if we do a full outer join, we will have it and the stud_id 200 tuple:
student ⟗ studentCourse
stud_id  stud_firstname  stud_fathersname  prog_code  course_code
100      Sara            Negash            CSDEG      ICT252
100      Sara            Negash            CSDEG      ICT231
101      Tekle           Haimanot          CSDEG      ICT252
102      Terhas          Girma             CSDEG      ICT231
102      Terhas          Girma             CSDEG      ICT252
103      Solomon         Kebede            CSDIP      ICT252
103      Solomon         Kebede            CSDIP      ICT231
104      Fikir           Yohannis          CSDEG      null
200      null            null              null       ICT252
Aside: In relational databases, outer joins are often used to check for missing or bad
data. For example, if the referential integrity on the stud_id foreign key in
studentCourse was missing for some reason, you could do an outer join to check for
rows that were added with stud_id values for students that do not exist in the database.
In database terminology, such rows are sometimes called orphans because they are
child records of a 1-to-many relationship that do not have a corresponding parent
record.
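The three outer joins differ only in which unmatched tuples they keep. The following
Python sketch is an illustration under simplifying assumptions: outer_join is a
hypothetical helper that joins on a single shared attribute, null is represented by
None, and the rows are a reduced version of the example data.

```python
def outer_join(left, right, key, how):
    """Left, right or full outer join of two lists of dicts on one shared attribute."""
    left_attrs, right_attrs = list(left[0]), list(right[0])
    result, matched_right = [], set()
    for l_row in left:
        partners = [i for i, r_row in enumerate(right) if r_row[key] == l_row[key]]
        for i in partners:  # the ordinary join part
            result.append({**l_row, **right[i]})
            matched_right.add(i)
        if not partners and how in ("left", "full"):
            # pad the right relation's attributes with nulls (None)
            result.append({**dict.fromkeys(right_attrs), **l_row})
    if how in ("right", "full"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                # pad the left relation's attributes with nulls (None)
                result.append({**dict.fromkeys(left_attrs), **r_row})
    return result


student = [
    {"stud_id": 100, "stud_firstname": "Sara", "prog_code": "CSDEG"},
    {"stud_id": 104, "stud_firstname": "Fikir", "prog_code": "CSDEG"},
]
student_course = [
    {"stud_id": 100, "course_code": "ICT252"},
    {"stud_id": 200, "course_code": "ICT252"},  # orphan: no such student
]

left_result = outer_join(student, student_course, "stud_id", "left")
right_result = outer_join(student, student_course, "stud_id", "right")
full_result = outer_join(student, student_course, "stud_id", "full")

# Orphans are the rows whose student attributes came back as null
orphans = [row for row in right_result if row["stud_firstname"] is None]
print(orphans)
```

Filtering the right outer join for rows with a null student name is one way to find
the orphan rows described in the aside.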
5.4 Database Modifications
So far, all the relational algebra we have looked at is to retrieve data from relations.
In terms of SQL, all of the operations we have looked at can be done using the SQL
Select query statement. It has different clauses and keywords for selection,
projection, aggregate functions, joins, set difference and rename.
In a relational database, we also need to be able to modify the data in the relations.
There are 3 types of modification.
Ask class: what are they?
A: insert, update, delete.
In SQL, there are different query statements for these: insert, update, delete.
We'll look briefly at the relational algebra operations for each of these.
We can use the assignment operator to assign the result of an expression to an existing
relation.
5.4.1 Deletion
To delete selected tuples from the database.
Can only delete whole tuples; cannot delete values only from particular attributes.
r ← r - E
where r is a relation, E is a relational-algebra query or expression
Example: delete all the records showing students taking course code ICT231
studentCourse ← studentCourse - σ_{course_code='ICT231'}(studentCourse)
Exercise: delete all records for the student whose ID is 102.
Allow students to try themselves first.
Need to delete from 2 relations: student and studentCourse.
studentCourse ← studentCourse - σ_{stud_id=102}(studentCourse)
student ← student - σ_{stud_id=102}(student)
If you tried to delete from student first, what do you think would happen?
A: if referential integrity is enforced, you might get an error because the foreign key
values in studentCourse would now not be values existing in student.
When defining a foreign key, you can specify if deletes of the primary key should
cascade to the foreign key.
If delete cascade is enabled, when a tuple in the primary key relation is deleted, any
related tuples in the foreign key relation are also deleted.
In MS Access you can see this option in Relationships, when you are defining a
relationship or if you double-click on an existing relationship.
But... beware, it is often not a good idea to turn on the cascade delete option, as it
means that you may lose data in the FK relation unintentionally. Better to have it off,
and allow the user or application to first delete the related tuples.
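Deletion as set difference, and the effect of deleting in the wrong order, can be
sketched with Python sets. This is an illustration only: the helper name orphans and
the reduced sample tuples are assumptions.

```python
student = {(100, "Sara", "Negash", "CSDEG"), (102, "Terhas", "Girma", "CSDEG")}
student_course = {(100, "ICT252"), (102, "ICT231"), (102, "ICT252")}


def orphans(student, student_course):
    """studentCourse tuples whose stud_id has no matching student tuple."""
    ids = {t[0] for t in student}
    return {t for t in student_course if t[0] not in ids}


# Deleting from student first would leave dangling foreign key values
student_first = student - {t for t in student if t[0] == 102}
dangling = orphans(student_first, student_course)

# r ← r - σ_{stud_id=102}(r): child relation first, then the parent
student_course = student_course - {t for t in student_course if t[0] == 102}
student = student - {t for t in student if t[0] == 102}

print(sorted(dangling), student_course, orphans(student, student_course))
```

Deleting from studentCourse before student means no intermediate state contains a
foreign key value without a matching student.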
5.4.2 Insertion
Specify a tuple to be inserted or write a query whose result is a set of tuples to be
inserted.
The tuples to be inserted must have the correct arity (# of attributes) for the relation
being inserted into and the values specified must be in the attribute domains.
r ← r ∪ E
E is a relational algebra expression.
To insert a specified tuple, E is a constant relation containing that tuple.
Example: insert a new student to the student relation, with ID 105, name Ababa
Girma, programme CSDEG.
student ← student ∪ { (105, 'Ababa', 'Girma', 'CSDEG') }
Exercise: insert a record to show that the new student is taking course ICT231.
studentCourse ← studentCourse ∪ { (105, 'ICT231') }
Example: insert tuples to studentCourse to have all CSDEG students taking the
ICT311 course.
First, we can project and select on student to get the student IDs we want:
r1 ← π_{stud_id}(σ_{prog_code='CSDEG'}(student))
Second, combine each student ID with the course code ICT311 and put those tuples
into studentCourse:
studentCourse ← studentCourse ∪ (r1 × { ('ICT311') })
This is how we specify a set of tuples to insert into a relation.
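Insertion as union, including the arity check, can be sketched as follows. The
function insert is a hypothetical helper, and the starting tuples are a reduced
sample rather than the handout data.

```python
def insert(relation, new_tuples, arity):
    """r ← r ∪ E, rejecting tuples with the wrong number of attributes."""
    for t in new_tuples:
        if len(t) != arity:
            raise ValueError(f"expected {arity} attributes, got {len(t)}")
    return relation | set(new_tuples)


student = {(100, "Sara", "Negash", "CSDEG"), (103, "Solomon", "Kebede", "CSDIP")}
student_course = {(100, "ICT252")}

# Insert one specified tuple (a constant relation with a single tuple)
student = insert(student, {(105, "Ababa", "Girma", "CSDEG")}, arity=4)

# r1 ← π_{stud_id}(σ_{prog_code='CSDEG'}(student))
r1 = {t[0] for t in student if t[3] == "CSDEG"}

# studentCourse ← studentCourse ∪ (r1 × {('ICT311')})
student_course = insert(student_course, {(sid, "ICT311") for sid in r1}, arity=2)

print(sorted(student_course))
```

Because relations are sets, inserting a tuple that is already present leaves the
relation unchanged.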
5.4.3 Update
Sometimes, we want to change the value of one or more attributes in a tuple, without
changing all the values in the tuple.
Update the values in all tuples in the relation or update for only some tuples.
Using the employee relation (your handout 3).
Example: increase all Employee salaries by 10%.
To do this, we can project on employee to get all attributes, use an arithmetic function
on the salary attribute and assign the result back to the employee relation.
employee ← π_{emp_id, emp_firstname, emp_fathersname, emp_phone, salary + (.1 * salary), dept_id}(employee)
But if we wanted to only do the increase for employees in the department that has
dept_id 2, we have to use a select to only change those tuples, and make sure we
assign the changed tuples AND the unchanged tuples back to the relation.
employee ← π_{emp_id, emp_firstname, emp_fathersname, emp_phone, salary + (.1 * salary), dept_id}(σ_{dept_id=2}(employee)) ∪ (σ_{dept_id≠2}(employee))
The union is necessary to take the updated tuples and the non-updated tuples.
The general form is:
r ← π_{F1, F2, …, Fn}(r)
where each Fi is an attribute of r or an expression, involving only constants and
attributes of r, that gives a new value for the attribute.
For updating only a sub-set of tuples, the general form is:
r ← π_{F1, F2, …, Fn}(σ_P(r)) ∪ (r - σ_P(r))
In other words: a projection on a selection from r, union-ed with the tuples not
included in the selection.
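The general form for a partial update can be sketched in Python. The employee rows
below are hypothetical (handout 3 is not reproduced here) and are reduced to three
attributes, and raise_salary is an assumed helper name.

```python
def raise_salary(emp):
    """Generalised projection: copy every attribute, replacing salary by salary + .1 * salary."""
    return {**emp, "salary": emp["salary"] + 0.1 * emp["salary"]}


# Hypothetical employee rows
employee = [
    {"emp_id": 1, "salary": 1000, "dept_id": 2},
    {"emp_id": 2, "salary": 2000, "dept_id": 1},
    {"emp_id": 3, "salary": 1500, "dept_id": 2},
]

selected = [e for e in employee if e["dept_id"] == 2]   # σ_P(r)
unchanged = [e for e in employee if e["dept_id"] != 2]  # r - σ_P(r)

# r ← π_{F1, ..., Fn}(σ_P(r)) ∪ (r - σ_P(r))
employee = [raise_salary(e) for e in selected] + unchanged

for row in employee:
    print(row)
```

Leaving out the unchanged tuples would silently drop every employee outside
department 2, which is why the union is required.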
That concludes the unit on relational algebra.
We covered:
- Basic operations: select, project, union, set difference, Cartesian product,
rename
- Additional operations: set intersection, natural join, division, assignment
- Extended operations: generalised projection, aggregate functions, outer join
- Database modifications: delete, insert, update
As you start to learn SQL in the labs, you should be able to make connections
between these operations and SQL statements. We'll talk about SQL again later in the
theory part of the course.
[could do now, but doing in lab, and you need to learn E-R for projects, so we'll
move on.]
6 Entity-Relationship Modelling
Objectives: by the end of the unit, students will know how to create E-R models and
use them as a tool for conceptual data modelling; also how to convert an E-R model to
a relational database schema.
Preparation: ER Assistant (on CD-ROM with Mannino Book) installed in Lab 1.
Copy Handout 4 for students.
It would also be good to add some more practical exercises to this section, perhaps in
the form of tutorials where students have to create E-R models based on given
information. Use last year's assignment, for example.
Reading Material: Mannino ch. 7, Silberschatz et al ch. 2.
Systems Analysis & Design: my handouts (Handouts 9, 10, 11), but note that these
were written for a different course, so not all material is relevant.
Assignment 1: describe a scenario and students have to create an E-R diagram,
convert it to a relational database schema and write a report on it, including writing 2
simple SQL queries on the database.
6.1 Introduction
E-R is a conceptual data-modelling tool that can be used in the initial stages of
designing a relational database.
Because it is conceptual, it allows you to model entities in the mini-world of the
database system and the links between them in ways that end-users of the proposed
system will understand.
An E-R model is a diagram that is used to show how data is organized in a system.
It usually comes before the relation schema for a database: after completing an E-R
model, you can convert it to a relation schema for a relational database.
Now you know about relational databases, let's learn how to go about making them
from scratch.
Note: in class, we will use simple examples. Read the textbooks for different, more
complex examples.
We will look at E-R modelling in terms of the steps you could follow to create an E-R
model:
1. Identify entity types
2. Identify attributes of the entity types
3. Select the identifier for each entity type
4. Identify relationships between entity types
6.2 Overview - Entities, Attributes, Relationships
An E-R diagram shows
- entities (objects) in the mini-world
- relationships between the entities
- attributes of the entities and of the relationships.
Example: Take for example a personnel system in a company, in which employees
belong to departments and employees are assigned to projects.
Q: what are the entities in this system?
A: Department
   Employee
   Project
Q: what do you think are relationships in this system?
A: A Department has Employees
   An Employee manages projects
   Employees work on projects
Some of the Attributes are:
- A Department has a name and a location
- An Employee has a name, a phone number and a salary
Entities are described by nouns.
Relationships are described by verbs.
Let's begin by drawing an E-R diagram to show the entities, attributes and
relationships in a simple system that has employees and departments only.
Then we'll look at the steps to building this diagram.
There are symbols used in E-R diagrams.
- Entity type: rectangle
- Relationship: connecting line between entity types, with a diamond shape
showing the name of the relationship. Diamond sometimes omitted. The many
side of a relationship is indicated with something called the Crow's Foot
notation.
- Attributes: sometimes shown in an oval shape connected to the entity type;
sometimes as a list under the entity type name. We will use the list.
[draw this diagram on the board, refer back to it through the class; students can use
this to complete the diagram on their handout 4.]
6.3 Identify Entity Types
Entity: a person, place, object, event or concept in the system.
- has its own identity that distinguishes it from every other entity.
For example, each Employee has a unique ID that distinguishes it from every other
Employee.
Entity type: a collection of entities that share common properties or characteristics.
Each entity type in an E-R model is given a name. The name is singular and is a
simple noun.
So we have an entity type called "Employee"; it is not called "Employees". This is
because the name represents a set of entities.
In an E-R diagram, an entity type is represented by a rectangle, and the name is
indicated in capital letters.
Entity instance: a single occurrence of an entity type e.g. an employee, a department.
An entity type is described just one time in the data model, but many instances of that
entity type may be represented by data stored in the database.
For example, there may be hundreds or thousands of employees in an organization;
each one is an instance of the Employee entity type.
In data modelling, the term 'entity' is often used to refer to an entity type. The term
'entity instance' is usually used to indicate an instance of the entity.
6.4 Identify Attributes of the Entities
An entity type has a set of attributes (properties or characteristics) associated with
it. An attribute is a fact about the entity that is of interest to the organization or
system.
So, for example, the EMPLOYEE entity has attributes of Employee_ID,
Employee_Name, Salary.
In E-R diagrams, attributes are named with an initial capital letter followed by
lowercase letters.
An attribute can be represented by:
- an ellipse (oval) shape with a line connecting it to the associated entity.
Or
- by listing them within the entity rectangle, under the entity name.
[Diagram: EMPLOYEE (EmployeeID, Employee_Name, Salary, Address) and DEPARTMENT
(DepartmentID, DepartmentName), with the relationships Manages and Has.]
Add attributes to diagram:
Note that at this stage, we are doing a conceptual model, as end-users see the system.
End-users typically think about the name of a person, not the first name and father's
name. Or they look at the address, but do not think about it in terms of postcode,
town, region etc.
6.5 Select Identifiers for Entity Types
Now you need to identify candidate keys for each entity type.
Remember from before: a candidate key is a minimal superkey.
When working with an E-R model, a candidate key for an entity is an attribute or
combination of attributes that uniquely identifies each instance of the entity.
(before, we talked about a key uniquely identifying each tuple; this is the equivalent
for entities).
Need to select one candidate key as an identifier for the entity.
Q: Choosing an identifier is like choosing a primary key. If the choices for
Employee are Employee_Name & Address (if we assume that every Employee has a
different address) or Employee_ID, which would you choose and why?
A: Employee_ID because it is less likely to change over time.
The identifier attribute(s) is (are) underlined in the diagram.
Identifiers are critical to data integrity in a database, so when selecting identifiers, you
should be careful. Some more guidelines are:
- Choose a candidate key that is guaranteed to always have valid values and not
be null for each instance. It may be necessary to use validation controls in the
DBMS to eliminate the possibility of errors (e.g. nulls not allowed in a
column, validation rules). For a candidate key that includes more than one
attribute, all the parts of the key should be guaranteed to have valid values and
not be null.
- Avoid using keys where part of the value indicates some classification of the
entity instance, or some other property of the entity. For example, in a
database that tracks computer maintenance, computers may be named with a
code like "FBE01" where the first three characters indicate the location of the
computer. If the location of the computer changes, then the code needs to
change. Therefore the code is not a good identifier.
- If a candidate key is a composite of two or more attributes, consider creating a
new key with a single value. For example, if a football league is being tracked,
a candidate key for a GAME entity type might be the Home_Team_Name and
the Away_Team_Name. This could be substituted with a new attribute called
Game_ID.
[Diagram: EMPLOYEE entity type with attributes Employee_ID, Employee_Name, Salary,
Address.]
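The first guideline, that an identifier must be unique and never null for every
instance, can be checked mechanically against sample data. This Python sketch is an
illustration only: is_identifier is a hypothetical helper and the employee instances
are made up. Passing such a check on sample data does not prove that an attribute is
a good identifier; it can only rule out poor choices.

```python
def is_identifier(instances, attrs):
    """True if the attribute combination is non-null and unique across all instances."""
    seen = set()
    for inst in instances:
        key = tuple(inst[a] for a in attrs)
        if None in key or key in seen:  # a null part or a repeated value disqualifies it
            return False
        seen.add(key)
    return True


# Made-up entity instances
employees = [
    {"Employee_ID": 1, "Employee_Name": "Sara Negash", "Address": "Addr A", "Salary": 3000},
    {"Employee_ID": 2, "Employee_Name": "Sara Negash", "Address": "Addr B", "Salary": 3000},
    {"Employee_ID": 3, "Employee_Name": "Tekle Haimanot", "Address": None, "Salary": 2500},
]

print(is_identifier(employees, ["Employee_ID"]))
print(is_identifier(employees, ["Employee_Name"]))
print(is_identifier(employees, ["Employee_Name", "Address"]))
```

In this sample, Employee_Name alone fails because a name repeats, and the
Employee_Name and Address combination fails because one Address is null.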
6.6 Identify Relationships Between Entities
Relationships connect the various components of an E-R model. A relationship is an
association between the instances of one or more entity types that is of interest to the
organization or the system.
Either: a natural link between entities, like 'a department has employees', or
some event that occurs, like 'a computer is maintained by a staff member'.
Label relationships with verb phrases: a department has employees; an employee
manages a department.
Relationships are shown in one of two ways:
- a line connecting the entities, with a diamond shape containing the description
of the relationship. This is from the Chen notation for ER diagrams.
- a line connecting the entities, with the description on the line, as we have
drawn above. This is from the Crow's Foot notation for ER diagrams.
To show the diamond notation:
[Diagram: EMPLOYEE (EmployeeID, Employee_Name, Salary, Address) and DEPARTMENT
(DepartmentID, DepartmentName), connected by the relationship Has.]
6.6.1 Relationship Degree
The above relationships each involve 2 entities, so they are binary.
Relationships can also be unary, involving 1 entity.
For example, if we add that an employee is managed by another employee:
[Diagram: EMPLOYEE and DEPARTMENT with the relationships Manages and Has, plus a
second Manages relationship.]
Ternary relationships are also possible, between 3 entities.
Also higher degree, N-ary relationships, but these do not often occur and they make
things complex.
Exercise:
1. Add a Project entity to your diagram. A Project has a code and a start date. The
project code will be the identifier.
2. Employees work on projects: put this relationship on your diagram.
6.6.2 Relationship Cardinality
Consider the relationships on our diagram.
Department Has Employees: we know that every employee must be in a department
and one department only. We also know that a department must have 1 or more
employees.
We can show this on the diagram. This is called relationship cardinality. It means
the number of relationships in which a given entity instance can appear. An entity
instance can appear in:
- one (1) relationship between its entity type and the other entity type or
- any variable number (N) of relationships between its entity type and the other
entity type.
An Employee can be in only one Department, so an instance of Employee can
appear in only one relationship with a Department instance.
A Department can have many Employees, so an instance of Department can appear
in many (N) relationships with Employee instances.
LDra $ia-ram but $o not a$$ car$ina*ity in$icators 3ust yet a$$ them as e -o
throu-h them= be*o.M
[Diagram: EMPLOYEE (EmployeeID, Employee_Name, Salary, Address), DEPARTMENT
(DepartmentID, DepartmentName) and PROJECT (ProjectCode, StartDate), with the
relationships Has, Manages and Works on.]
To show that a Department instance can be related to many Employee instances, put
an N indicator at the Employee side of the connector.
This is called the crow's foot symbol.
To show that an Employee instance can be related to only one Department instance,
put a line across the connector at the Department side.
A Department can have at most N Employees. The Department must also have at
least one Employee. The minimum cardinality for Department in the relationship is
therefore 1. The maximum cardinality is N. We show the minimum with a horizontal
bar across the connector, again at the Employee side.
Q: What do you think is the minimum cardinality for an Employee instance in the
relationship? What is the least number of departments an employee can be in?
A: 1, if we assume that every employee is always assigned to a department.
So show this on your diagram also by putting another horizontal bar at the
Department side.
Look at your diagram now... read it by starting at the Employee instance.
You read the cardinality for Employee in the relationship by looking at the symbols at
the Department side of the connector.
The two lines next to Department show that an Employee instance must be in 1 and
only 1 relationship with a Department instance.
The Employee entity has mandatory participation in the relationship because it
must be in at least 1 Department.
The Department entity also has mandatory participation because it must have at least
one Employee, i.e. the minimum cardinality is 1.
We also say that an entity that has a min cardinality of 1 in a relationship is existence
dependent on the relationship: instances of the entity cannot exist without a related
instance of the other entity. An Employee instance cannot exist unless there is a
Department instance that the Employee instance is related to.
Optional participation is also possible, if the minimum cardinality is 0. Consider
the Employee-works-on-Project relationship. Let us say that an Employee can work
on projects but does not have to; also that an Employee can work on more than 1
project at a time.
[Diagram: EMPLOYEE (EmployeeID, Employee_Name, Salary, Address) and DEPARTMENT
(DepartmentID, DepartmentName) joined by the Has relationship.]
A Project always has at least 1 Employee working on it.
Can you show the cardinality for this relationship on your diagram?
You can put a 0 to indicate a cardinality of zero.
If the maximum cardinality is 1, the relationship is single-valued or functional.
The Has relationship is single-valued for Employee.
Q: Can you see the relationship types (1-1, 1-to-many etc) on the diagram?
Department-Employee: 1-to-many. The max cardinality is 1 in one direction, many in
the other direction.
Employee-Project: many-to-many. The max cardinality is many in both directions.
If the max cardinality is 1 in both directions, it is a 1-1 relationship.
To get the relationship type, we look only at the maximum cardinalities.
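The rule above can be sketched as a small function. This is only an illustration: the function name and the "1"/"N" string encoding are assumptions made for this example, not notation from the course.

```python
def relationship_type(max_first: str, max_second: str) -> str:
    """Classify a binary relationship from its two maximum cardinalities.

    Each argument is "1" or "N": the maximum number of instances of the
    other entity type that one instance can be related to.
    Minimum cardinalities are deliberately ignored.
    """
    for value in (max_first, max_second):
        if value not in ("1", "N"):
            raise ValueError(f"maximum cardinality must be '1' or 'N', got {value!r}")
    if max_first == "1" and max_second == "1":
        return "1-1"
    if max_first == "N" and max_second == "N":
        return "many-to-many"
    return "1-to-many"


# Department-Employee: an Employee is in at most 1 Department,
# a Department has at most N Employees.
dept_emp = relationship_type("1", "N")

# Employee-Project: many in both directions.
emp_proj = relationship_type("N", "N")
```

Note that the function never looks at whether participation is mandatory or optional, which matches the point that only the maximum cardinalities decide the relationship type.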
This shows one difference between an E-R diagram and a relation schema in a
database: the schema cannot show every minimum cardinality. Some are rules that
must be implemented elsewhere in the system, maybe by applications inserting data
to the db, e.g. that a Project must have at least one Employee working on it.
If the maximum cardinality is a specific number, e.g. a Department can have a
maximum of 50 employees, then we can show that by putting the number beside the
crow's foot.
Homework: input the Employee, Department and Project entities to E-R Assistant
and produce the diagram. Use this to figure out how to use the tool; you will
probably get an assignment using it later on.
Exercise: add the cardinality for the manages relationship, given that an Employee
can manage many other Employees, but an Employee has to be managed by one and
only one person.
[Diagram: EMPLOYEE (EmployeeID, Employee_Name, Salary, Address), DEPARTMENT
(DepartmentID, DepartmentName) and PROJECT (ProjectCode, StartDate), with the
relationships Has, Works on and Manages.]
Exercise: draw an E-R diagram for the following system, including cardinalities for
the relationships. (There is some space on the back of your Handout 4.)
• Computers have a unique code, an operating system and a location.
• Staff have an ID, a name and a phone number.
• Computers are maintained by staff when they need maintenance.
6.7 Weak Entities
Consider the Operating System attribute of Computer.
What if we say that a computer can have 2 or more different operating systems
installed on it?
We could say that OperatingSystem is an attribute that can have 2 values for one
computer: it is multi-valued. We don't want this situation in a relational database
(atomic rule for attributes).
So we can make a separate entity type for Operating System. But... in terms of this
system, an operating system is an entity that does not exist unless it is installed on a
computer.
So, Operating System is a weak entity: an entity that does not exist without the
existence of some other entity. To identify instances of the weak entity, we need to
associate them with instances of another (strong) entity.
A weak entity gets part or all of its identifier from the other entity. In this case,
Operating System gets part of its identifier from the Computer entity.
[Draw new entity onto diagram, remove OperatingSystem attribute from Computer.]
Q: What is the identifier for this new entity?
A: The OperatingSystem name uniquely identifies each instance for a particular
Computer instance: we say that OperatingSystem is the discriminator of the weak
entity. But to uniquely identify any instance from all others, the identifier is
OperatingSystem AND ComputerID together.
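As a rough illustration of this identifier, the sketch below builds the two tables in an in-memory SQLite database using Python's sqlite3 module. The choice of SQLite, the column types and the sample rows are assumptions made for the example; the course itself may use a different DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE Computer (
    ComputerID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
-- Weak entity: the discriminator (OperatingSystem) plus the owner's key
CREATE TABLE OperatingSystem (
    ComputerID      INTEGER NOT NULL REFERENCES Computer(ComputerID),
    OperatingSystem TEXT    NOT NULL,
    PRIMARY KEY (ComputerID, OperatingSystem)
);
""")

conn.execute("INSERT INTO Computer VALUES (1, 'Lab-PC-1')")
conn.execute("INSERT INTO Computer VALUES (2, 'Lab-PC-2')")
# The same OS name on two different computers is allowed.
conn.execute("INSERT INTO OperatingSystem VALUES (1, 'Linux')")
conn.execute("INSERT INTO OperatingSystem VALUES (2, 'Linux')")


def rejected(sql: str) -> bool:
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Same computer and same OS name: violates the composite identifier.
duplicate_rejected = rejected("INSERT INTO OperatingSystem VALUES (1, 'Linux')")
# No such computer: the weak entity cannot exist without its owner.
orphan_rejected = rejected("INSERT INTO OperatingSystem VALUES (99, 'Linux')")
```

The discriminator alone would not work as a key here, because 'Linux' appears for both computers.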
Draw the cardinalities on your diagram: assume a computer must have 1 OS and can
have up to 3.
A weak entity is indicated by diagonal lines in the corners of the entity rectangle.
Normal relationships should have a dotted line, while the relationship between a weak
entity and its owning entity is drawn with a complete (solid) line.
[Diagrams: COMPUTER (ComputerID, Name, OperatingSystem) maintained by STAFF
(StaffID, Name, FathersName), shown before and after the OperatingSystem attribute
is removed from COMPUTER.]
The relationship between the weak entity and the entity from which it gets part of its
identifier is called an identifying relationship.
This is called identification dependency: a specialised kind of existence
dependency (when there is a min cardinality of 1). The weak entity is dependent on
the identifying relationship. The weak entity also borrows part of its identifier from
the other entity.
Another type of situation in which you can have a weak entity is when you have an
entity type that is closely associated with another entity type and in fact does not have
a separate identity. Suppose we say that an Employee has an Office and an Office is in
a Building: Office & Building are new entity types. Every Office is in a building, so
Office is a weak entity with an identifying relationship between it and Building. See
the Building-Room example on pg. 218 of Mannino.
6.8 Associative Entities
Look again at the computer maintenance diagram.
The maintains relationship itself has attributes, because we need to know when the
maintenance happened, as well as which staff member did it.
So the Date is an attribute of the maintains relationship. If the staff member records
some notes about the maintenance, we might also have a Notes attribute.
Many-to-many (M-N) and 1-to-many (1-M) relationships can have attributes.
Add them by putting a connecting line from the relationship name to each attribute.
Relationship attributes are associated with the relationship only, not with just one of
the entities.
We can change the relationship into an entity, but make it a weak entity because the
identifier for instances of the weak entity consists of identifiers from both the other
entities. An entity should have a noun name, so we'll call it Maintenance.
Q: Of the other attributes, do you think either of them identifies instances?
[Diagram: OPERATING_SYSTEM (OperatingSystem) is a weak entity linked to COMPUTER
(ComputerID, Name) by has, with a maximum cardinality of 3. STAFF (StaffID, Name,
FathersName) maintains COMPUTER, and the maintains relationship has the attributes
Date and Notes.]
If no answer is forthcoming, suggest that a computer is maintained only once in a
given day.
A: Date.
We also have to change the cardinalities and participation for Staff and Computer with
the new entity.
What we have done now is to replace the many-to-many relationship between Staff
and Computer with 2 1-many identifying relationships and an associative entity.
The Maintenance entity associates the other 2 entities and also will get its primary
key as a combination of the other primary keys.
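A minimal sketch of the Maintenance associative entity follows, again using an in-memory SQLite database through Python's sqlite3 module. The primary key here also includes Date, following the answer above; the column types and all sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Computer (ComputerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Staff    (StaffID    INTEGER PRIMARY KEY, Name TEXT NOT NULL);

-- Associative entity: its key is built from the keys of the two
-- entities it associates, plus Date.
CREATE TABLE Maintenance (
    ComputerID INTEGER NOT NULL REFERENCES Computer(ComputerID),
    StaffID    INTEGER NOT NULL REFERENCES Staff(StaffID),
    Date       TEXT    NOT NULL,
    Notes      TEXT,
    PRIMARY KEY (ComputerID, StaffID, Date)
);

-- Sample data, invented for this example
INSERT INTO Computer VALUES (1, 'Lab-PC-1'), (2, 'Lab-PC-2');
INSERT INTO Staff    VALUES (10, 'Tekle'), (11, 'Sara');
INSERT INTO Maintenance VALUES
    (1, 10, '2024-01-05', 'Replaced fan'),
    (1, 11, '2024-02-01', NULL),
    (2, 10, '2024-02-01', 'Reinstalled OS');
""")

# Who maintained which computer, and when: two 1-many joins
# replace the original many-to-many relationship.
history = conn.execute("""
    SELECT c.Name, s.Name, m.Date
    FROM Maintenance AS m
    JOIN Computer AS c ON c.ComputerID = m.ComputerID
    JOIN Staff    AS s ON s.StaffID    = m.StaffID
    ORDER BY m.Date, c.Name
""").fetchall()
```

Each Maintenance row belongs to exactly one Computer and one Staff member, which is why the two new relationships are 1-many.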
Other notation: some notations depict an associative entity as a diamond inside a
rectangle.
Exercise: let us say that when an Employee works on a project, he/she is on the
project for a fixed length of time, so the assignment has a start date and a finish date.
Amend your diagram to show this: you will use an associative entity for the
relationship, call it Assignment.
Associative entities can be used to associate more than 2 entities: N-ary
relationships. These are rare but they do occur sometimes; ternary relationships,
between 3 entities, can happen.
For example: take the Assignment associative entity. Suppose we also say that Role
is an entity in the system, e.g. Project Manager, Requirements Coordinator. When an
Employee is assigned to a Project, he/she is assigned in a particular Role. So
Assignment is now a 3-way relationship. It gets its primary key from all 3 entities.
See also Mannino pg 222, Part-Supplier-Project example. Note that a ternary
relationship can sometimes be replaced with 2 1-M relationships instead.
6.9 Generalisation Hierarchies
Some entity types can be classified into different sub-categories.
For example, some organizations have salaried employees and voluntary employees.
We can show this in an E-R diagram using a generalisation hierarchy.
Let us say there are 2 types of Employee entities: SalaryEmp and VoluntaryEmp.
Salaried employees are paid a salary while voluntary employees receive a daily
allowance. Other attributes, like ID and name, are common to both types.
Depict this as a hierarchy in the E-R diagram [draw diagram below].
[Diagram: MAINTENANCE (Date, Notes) as an associative entity between COMPUTER
(ComputerID, Name) and STAFF (StaffID, Name, FathersName).]
The Employee entity type is the supertype or parent. SalaryEmp and VoluntaryEmp are
subtypes or children.
We can say that a SalaryEmp is an Employee and a VoluntaryEmp is an Employee:
this type of relationship is often called IS-A.
The common attributes are inherited by the subtypes, e.g. EmployeeID also applies
to SalaryEmp and to VoluntaryEmp.
Each subtype can also have its own direct attributes: Salary, DailyAllowance.
The D and the C indicate some constraints on the hierarchy.
D is for a Disjointness constraint: when subtypes in the hierarchy do not have any
entity instances in common.
This one is disjoint because an Employee cannot be both a SalaryEmp and a
VoluntaryEmp.
C is for a Completeness constraint: it means that every entity instance of a supertype
must be an entity instance in one of the subtypes.
This one is complete.
If we said that some Employees can be both types, then the hierarchy is not disjoint:
it is overlapping.
If we said that some Employees are neither type, then it is not complete, e.g. some
employees are voluntary but do not take an allowance.
The hierarchy can be extended to have more levels.
6.10 Approach to E-R Diagrams
To help you in drawing E-R diagrams, consider these points (printed on back of
Handout 4):
Key thing is: Keep Things Clear and Simple
1. Each Entity Type should model only one concept. Remember that an Entity
Type is a collection of entities that share common characteristics.
2. A Relationship should model one interaction between Entity Types.
[Diagram: EMPLOYEE (EmployeeID, Employee_Name, Address) is the supertype of
SALARYEMP (Salary) and VOLUNTARYEMP (DailyAllowance); the hierarchy is marked D,C.]
3. Attributes should model simple concepts. This means that attributes should
not be multi-valued (i.e. possible for one instance to have multiple values for
the attribute) or be structured (have different parts).
4. If an attribute is multi-valued, it can be changed to a weak entity that is
connected to the original entity type by a relationship connector.
Recap: steps to go through:
1. Identify entity types: if you have DFDs, these are a good starting point (data
stores & external entities); read through your descriptions of the system and
underline the nouns, as these may indicate entity types
2. Identify attributes of the entity types from your description of the system
3. Select the identifier for each entity type
4. Identify relationships between entity types, again from the description of the
system
Now let's add [prompt students: add weak entities for what...; add associative
entities for what...]:
5. Add weak entities for multi-valued attributes and identification-dependent
entities
6. Add associative entities for M-N relationships
These steps are a guide: you might identify more entity types as you look at
relationships, for example.
There are tools available for creating E-R diagrams. We only have the E-R Assistant
here, which is quite simple and easy to use. Software like MS Visio includes templates
with the E-R symbols.
Reading: Mannino has a full example, going through all the steps: section 7.5, pg 230.
Note in particular the Completeness & Consistency Checks (pg 237).
6.11 Convert an E-R Diagram to a Relational Database Design
Now... once you have a nice E-R diagram, what do you do with it?
Answer: you use it to design your database. We'll look at how to convert to a
relational database.
Group activity
Look at your Computers E-R diagram.
Working in groups of 3-4: can you define some rules for converting the diagram to a
relational database?
Think about... how to identify the relations, the attributes in the relations, the primary
keys, the foreign keys.
Allow 5 minutes for students working together.
Then each group turns to a group beside them and they put their rules together.
Ask different groups for a rule each... put on board.
Tie into the following.
1. Represent entities: each normal entity type becomes a relation, with the
identifier being the primary key and the other attributes being non-primary-
key attributes of the relation. A weak entity does have a relation, but we will
add to the PK later. Sub-types do have a relation, but we will add to the PK
later. Only create relations for parent entity types at this stage.
Egs: Staff, Computer, Operating System (weak entity)
2. Represent 1-Many relationships: the primary key of the 1 side becomes a
foreign key in the many side (e.g. 1 Dept has many Employees: DeptID is
FK in Employee relation).
If min cardinality = 1 on the 1 side then the FK cannot allow null values (e.g.
an Employee must be in 1 Dept, so the DeptID FK cannot allow nulls)
Consider this example:
[Diagram: Instructor (InstructorID, InstName) teaches Course (CourseCode, CourseName).]
Using this rule, the PK of Instructor becomes a FK in Course.
But... the relationship is optional for Course (min card = 0 on 1 side), so a
Course can have no Instructor (e.g. ICT202 is the project course: there is no
instructor assigned to it). This means the FK can have null values...
Some db designers prefer to not have a situation where nulls are necessary.
Q: Can you think how to address this?
A: create a new table: CourseInst (CourseCode, InstID). CourseCode is the
PK, InstID is a FK.
This adds another table to the db, so queries to get data about instructors and
courses now have to join 3 instead of 2 tables: more complex.
Some designers keep the FK in Course and allow nulls, or you could have a
default value of 0 that you decide means no instructor is assigned (if
referential integrity is enforced, this needs a placeholder Instructor row with
that ID).
I would keep the FK and allow nulls... but it depends on the system.
The rule is:
3. Optional rule for Optional 1-Many Relationships: if the min cardinality on
the 1 side is 0, then the FK can allow null values. Can avoid this by creating a
new relation with PK being the PK of the entity type on the M side. PK of
the 1 side is FK in the new relation.
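Rules 2 and 3 can be sketched with an in-memory SQLite database via Python's sqlite3 module. This is an illustration under assumptions: the column types, the department name and the instructor row are invented; only the table and key names come from the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DepartmentName TEXT NOT NULL);

-- Rule 2, min cardinality 1 on the 1 side: the FK is NOT NULL.
CREATE TABLE Employee (
    EmployeeID INTEGER PRIMARY KEY,
    DeptID     INTEGER NOT NULL REFERENCES Department(DeptID)
);

CREATE TABLE Instructor (InstructorID INTEGER PRIMARY KEY, InstName TEXT NOT NULL);

-- Rule 3, min cardinality 0 on the 1 side: the FK may be NULL.
CREATE TABLE Course (
    CourseCode   TEXT PRIMARY KEY,
    CourseName   TEXT NOT NULL,
    InstructorID INTEGER REFERENCES Instructor(InstructorID)
);

-- Sample rows, invented for this example
INSERT INTO Department VALUES (1, 'Computer Science');
INSERT INTO Instructor VALUES (7, 'Sample Instructor');
INSERT INTO Course VALUES ('ICT252', 'Theory of Databases', 7);
INSERT INTO Course VALUES ('ICT202', 'Project', NULL);
""")

# An Employee without a Department is refused by the NOT NULL constraint.
try:
    conn.execute("INSERT INTO Employee VALUES (100, NULL)")
    null_dept_rejected = False
except sqlite3.IntegrityError:
    null_dept_rejected = True

# Courses with no instructor assigned are found with IS NULL.
unassigned = [row[0] for row in conn.execute(
    "SELECT CourseCode FROM Course WHERE InstructorID IS NULL")]
```

Keeping the nullable FK means instructor and course data can still be fetched with a single join, which is the trade-off discussed above.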
4. Represent Many-Many relationships (associative entities): each one
becomes a separate relation. The PK is a combination of the PKs in each of the
entity types in the relationship and maybe another attribute of the relationship.
Examples: Maintenance: PK is ComputerID, StaffID and Date. Emp-Project
Assignment entity: PK is EmpID, ProjectCode (as an Employee is assigned
to a particular project one time only, no need for another attribute in the PK).
5. Represent identifying dependencies (solid connecting lines): each one
adds an attribute to the primary key (PK) of the connected entity type, e.g.
OperatingSystem: add ComputerID to the PK.
Exercises to do in class:
If you haven't already done so: convert the Emp-Dept diagram on Handout 4.
Mannino pg 236 (figure 7.32), pg 238 (figures 7.33 & 7.35)
6. Generalization hierarchy: the parent entity type and each sub-type becomes
a relation. The sub-type relations inherit the PK from the parent but the other
inherited attributes appear only in the parent.
The PK of a sub-type entity is a FK to the parent relation.
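The following sketch shows rule 6 for the Employee hierarchy, using an in-memory SQLite database via Python's sqlite3 module. The column types and sample rows are assumptions for illustration. The tables on their own do not enforce the disjointness or completeness constraints; that would need extra rules elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Parent relation holds the attributes common to every Employee.
CREATE TABLE Employee (
    EmpID           INTEGER PRIMARY KEY,
    Emp_FathersName TEXT NOT NULL,
    Emp_FirstName   TEXT NOT NULL
);
-- Each sub-type relation reuses the parent's PK, which is also a FK.
CREATE TABLE SalaryEmp (
    EmpID  INTEGER PRIMARY KEY REFERENCES Employee(EmpID),
    Salary REAL NOT NULL
);
CREATE TABLE VoluntaryEmp (
    EmpID          INTEGER PRIMARY KEY REFERENCES Employee(EmpID),
    DailyAllowance REAL NOT NULL
);

-- Sample rows, invented for this example
INSERT INTO Employee VALUES (1, 'Sample', 'Salaried'), (2, 'Sample', 'Volunteer');
INSERT INTO SalaryEmp    VALUES (1, 5000.0);
INSERT INTO VoluntaryEmp VALUES (2, 50.0);
""")

# Inherited attributes live only in the parent, so a join is needed
# to see a salaried employee's name next to the salary.
salaried = conn.execute("""
    SELECT e.EmpID, e.Emp_FirstName, s.Salary
    FROM Employee AS e
    JOIN SalaryEmp AS s ON s.EmpID = e.EmpID
""").fetchall()
```

Because EmpID is the whole primary key in each sub-type table, one parent row matches at most one row in each sub-type table.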
[Diagram from ER Assistant, hierarchy marked D,C:
Employee: EmpID, Emp_FathersName, Emp_FirstName
SalaryEmp: (EmpID), Salary, (Emp_FathersName), (Emp_FirstName)
VoluntaryEmp: (EmpID), DailyAllowance, (Emp_FathersName), (Emp_FirstName)]
Parentheses show inherited attributes from ER Assistant: no need to show them when
drawing this diagram.
Get these tables (ask students what type of relationships they think there are:
should be 1-1):
Employee (EmpID, Emp_FathersName, Emp_FirstName)
SalaryEmp (EmpID, Salary)
VoluntaryEmp (EmpID, DailyAllowance)
6.12 Normalisation of the Data Model
This will be covered in ICT352 and also in this semester's SAD course.
6.13 Physical Database Design
Data types etc
[Check with Anthoni: are they doing this in SAD? If so, will not do in this course;
will focus instead on index file structures.]
6.14 UML
7 Physical Database Design
Objective: by the end of this unit, students will know what tasks are involved in
physical database design.
Reading material:
Preparation:
7.1 Overview - Physical DB Design
When you convert an E-R model to a relational database schema, you have the tables
& columns, PKs and FKs.
As well as the logical database design (from E-R), there are other types of information
required to complete the physical database design.
These are:
• Normalised relations, including estimates of the volume of data (number of
  rows) that will be stored in each relation
  We are not covering normalisation on this course; you should be covering it in
  SAD and also on next year's ICT352 (Advanced Databases) course.
  Normalisation is to convert complex data structures to more simple, stable
  structures, with minimal data redundancy.
• Definitions of each attribute
  Type of data (text, number, date etc), number of characters allowed, does it
  allow null values and so on.
  o the data types [we will be learning SQL data types]
  o the entity integrity constraints (PKs and uniqueness constraints)
  o the referential integrity constraints (FKs)
  o definitions of any triggers necessary
  o validation rules, which in SQL can be done with check constraints
• Descriptions of where and when data are used (entered, retrieved, deleted,
  updated), including frequencies
  For a small system, this may not be necessary but for a large system, it will
  help later on, e.g. when deciding on indexes.
• Expectations and/or requirements for response time
  How fast does the db have to respond? This will affect what indexes you
  choose.
• Knowledge and understanding of the technologies to be used for file storage
  and for the DBMS
  If you have understanding of the file & data structures that a DBMS uses, it
  will help you to understand your database better.
[May be able to relate these to what SQL has been covered in the labs: if the Tech
Assistant has covered the SQL for tables, they will know how to create different
constraints (PK, FK, unique, check).]
7.2 Choosing Data Types
Every field must have a data type. The data type is a coding scheme used by the
DBMS to represent data. The data type determines what are the valid values for the
field and also certain rules of behaviour for the field.
It also determines how much space the field takes up on disk, for each row of data.
Different DBMS applications offer different data types.
For example, for a numerical value, MS Access has types such as long integer and
double.
When selecting data types, you need to balance these 5 objectives:
• Minimise storage space
• Represent all possible values for the field
• Improve the data integrity of the field
• Support all data manipulations required on the field
• Ensure that the data type will be suitable for future as well as present needs,
  e.g. that the range allowed by a number field covers future growth in values.
Some DBMS packages provide other capabilities for certain data types. Examples are
calculated fields and coding/compression techniques.
7.2.1 Calculated Fields
A calculated field is one where the value is derived from other field values. So a
formula can be specified for the field. Some DBMS applications allow a calculated
field to be explicitly defined along with other raw data fields.
7.2.2 Coding/Compression
If the values in a field are from a limited range (number or character), consider
assigning a code to each value. So, for example, suppose manufactured furniture
items are being stored in a database table, and one attribute is the wood from which
the item is made. The possible values may be birch, oak, mahogany, pine, and
eucalyptus. These values require a character field of length 10. However, if a code,
e.g. a single letter or a number, is assigned to each one, the field data type can be
character of length 1 or integer. The storage space required for the field is thus
reduced.
This can also have disadvantages: users may not recognise the code values, so they
need to be decoded by programs that read them.
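The wood example can be sketched in a few lines of Python. The single-letter codes and the helper names are assumptions chosen for the illustration; the character counts assume a fixed-length field, as in the text above.

```python
# Hypothetical one-letter codes for the wood values in the example.
WOOD_CODES = {
    "B": "birch",
    "O": "oak",
    "M": "mahogany",
    "P": "pine",
    "E": "eucalyptus",
}
WOOD_TO_CODE = {name: code for code, name in WOOD_CODES.items()}


def encode(wood: str) -> str:
    """Translate a wood name into the code that is stored."""
    return WOOD_TO_CODE[wood]


def decode(code: str) -> str:
    """Translate a stored code back into a name users recognise."""
    return WOOD_CODES[code]


items = ["oak", "pine", "eucalyptus", "oak"]
stored = [encode(wood) for wood in items]

# Fixed-length character field: every row reserves room for the longest value.
field_length = max(len(name) for name in WOOD_CODES.values())
plain_chars = len(items) * field_length  # uncoded: length-10 field per row
coded_chars = len(stored) * 1            # coded: length-1 field per row
```

The decode step is the cost mentioned above: every program that displays the field has to translate the code back.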
7.3 Controlling Data Integrity
Most DBMS applications provide further means of controlling the integrity of data.
Some of these are as follows.
7.3.1 Default values
A default value is a value that will be assigned to a field if no explicit value is
provided. It is useful when a field often has the same value.
Picture Control (Patterns)
Picture controls allow for a pattern to be specified for a field. The pattern can specify,
for example, that the character in position 1 must be in the range A-Z and that the
second character must be a digit. These can also be used to format currency values.
7.3.2 Range Control
When the values allowed for a field must be in a specified range, the range can be
defined. For example, where a number must be in the range 0 to 100 or a month field
must have the values Jan, Feb, Mar etc.
7.3.3 Referential Integrity
Referential integrity controls can be used where there is cross-referencing between
attribute values in different relations. This is usually used for foreign key attributes,
i.e. the values in the foreign key field must exist in the related primary key field.
7.3.4 Null Value Control
A field can be specified to allow nulls or not: this depends on the nature of the
attribute. For example, when entering a new customer into a system, the customer
name should be known, so this field should not allow nulls. However, it is reasonable
to expect that the customer's phone number may not yet be known, so this field can
allow nulls.
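Several of these controls can be tried out in one table. The sketch below uses an in-memory SQLite database through Python's sqlite3 module; the Customer columns other than the name and phone number (CustomerCode, Status, Discount) are hypothetical, and the pattern check uses SQLite's GLOB operator as a stand-in for a picture control.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Customer (
    CustomerID   INTEGER PRIMARY KEY,
    -- pattern: first character A-Z, second character a digit
    CustomerCode TEXT NOT NULL CHECK (CustomerCode GLOB '[A-Z][0-9]*'),
    -- null value control: the name must be known, the phone may not be
    CustomerName TEXT NOT NULL,
    Phone        TEXT,
    -- default value
    Status       TEXT NOT NULL DEFAULT 'active',
    -- range control
    Discount     INTEGER NOT NULL DEFAULT 0 CHECK (Discount BETWEEN 0 AND 100)
)
""")


def accepted(sql: str) -> bool:
    """Return True if the row is stored, False if a constraint refuses it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


ok = accepted(
    "INSERT INTO Customer (CustomerCode, CustomerName) VALUES ('A1', 'Tekle')")
bad_range = accepted(
    "INSERT INTO Customer (CustomerCode, CustomerName, Discount) VALUES ('B2', 'X', 150)")
bad_pattern = accepted(
    "INSERT INTO Customer (CustomerCode, CustomerName) VALUES ('1A', 'X')")
bad_null = accepted(
    "INSERT INTO Customer (CustomerCode) VALUES ('C3')")

# The accepted row picked up the defaults; Phone was left null.
defaults = conn.execute(
    "SELECT Status, Discount, Phone FROM Customer WHERE CustomerCode = 'A1'"
).fetchone()
```

Referential integrity is left out here because it needs a second table; it works the same way as the foreign keys shown in section 6.11.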
8 Indexing
Objective: by the end of this unit, students will have learned what an index is, why
indexes are important, how indexes are implemented and how to choose indexes in a
database.
Reading material: Mannino chapter 10; Silberschatz et al chapter 11 (for revision of
physical storage) and chapter 12 (Indexing & Hashing)
Useful also for students to review their notes from the Data Structures course,
particularly binary search trees.
SQL Server Books Online: look up indexes (on the Index tab) and read the architecture
topic. It gives some information on how SQL Server manages indexes and mentions
B-trees.
Also use the search tab, e.g. a search for 'b-tree' brings up several results that will
give you more information on how SQL Server uses B-trees.
Preparation
• Print/copy Handout 8 for the section on File Structures
• Print/copy Handout 9 for the section on Index-Sequential File Organization
• Print/copy Handout 10 (lab worksheet for tables/indices) for the section on Why
  Use Indexes.
• Print/copy Handout 11 (quiz) to use in class on completion of the unit.
  Alternatively, you can just call out the questions and students can write them
  into their own notes.
8.1 Physical vs Logical views of data
At an even lower level, the actual physical storage of the data on the disk is also
important. However, nowadays, the DBMS you use will take care of this for you,
based on the physical db design you specify (constraints, data types etc as mentioned
earlier).
For this course, we are going to look at some file structures that are used by DBMSs
for data storage. If you recall your Computer Org & Architecture course, you learned
about different types of storage: magnetic disk, RAM and so on.
When you design a relational database, you specify the relations (tables) and the
attributes (columns) in them.
Data is then inserted into the tables by some means (SQL insert, import to the
database, using a data entry tool).
The db user has a logical view of that data... but the way the data is actually stored on
the physical disk may not be the same.
Remember that in a DBMS, the storage, retrieval and manipulation of data should be
independent of the internal structures: this means we can work with the logical view
of the data without having to know how the DBMS stores the data on the physical
disk.
We have also discussed the concept of data independence: that applications using
the data should be separate from the data structures and data storage.
But... for this course, you will learn how this physical storage relates to the logical
view.
We will also consider how physical storage affects speed: how fast records can be
retrieved from the disk.
[Mannino, pg 317: useful diagrams]
Recall (from Computer Organization & Architecture): a block, or physical record,
is the smallest amount of data that can be transferred between secondary storage and
primary storage in a single access.
Primary storage: can be directly accessed by the CPU; gives fast access but low
capacity. Volatile: main memory.
Secondary storage: magnetic/optical disk, tapes; typically slower access but high
capacity. Stable storage.
Blocks or physical records are organized into files on the disk.
Logical record (LR): a row in a table.
One physical record (PR) could contain several LRs from one table.
Or an LR could be contained in 2 or more PRs.
Or a PR could contain several LRs from different tables.
But typically a PR contains several LRs.
Typical size of a PR: a number of bytes that is a power of 2, e.g. 1024 bytes
(1 KB, 2^10) or 4096 bytes (4 KB, 2^12).
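To get a feel for these sizes, the short calculation below works out how many logical records fit in one physical record and how many physical records a table needs. It assumes fixed-length logical records that are never split across physical records; the 100-byte record size and the 10,000-row table are made-up figures.

```python
import math


def records_per_block(block_size: int, record_size: int) -> int:
    """Whole logical records that fit in one physical record (block)."""
    return block_size // record_size


def blocks_needed(num_records: int, block_size: int, record_size: int) -> int:
    """Physical records needed to hold all the logical records."""
    per_block = records_per_block(block_size, record_size)
    if per_block == 0:
        raise ValueError("logical record is larger than one physical record")
    return math.ceil(num_records / per_block)


# Hypothetical table: 10,000 rows of 100 bytes each.
per_block_1k = records_per_block(1024, 100)
per_block_4k = records_per_block(4096, 100)
blocks_1k = blocks_needed(10_000, 1024, 100)
blocks_4k = blocks_needed(10_000, 4096, 100)
```

Under these assumptions a full sequential read of the table needs a quarter as many physical record transfers with 4 KB blocks as with 1 KB blocks.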
The OS looks after the file organization on disk. The DBMS knows about the logical
view of the data. The 2 (OS & DBMS) must work together to transfer the bytes of
data to applications when needed.
Applications will ask for data in logical records in different ways:
• Sequential retrieval, e.g. all the rows in a table
• Retrieval by a search key, e.g. all Customers whose first name is Tekle, or all
  Customers ordered by the CustomerName.
The different applications running under the OS usually have their own areas of
memory, called buffers. Bytes of data for logical records must be transferred to the
appropriate buffer when needed.
If data is requested and it is already in the buffer, it does not need to be transferred
again.
The overall performance of applications using the db will depend on a number of
factors. Based on what we have been talking about, these factors are:
• Physical record transfers (to buffers)
• CPU operations/cycles (required to do things like sorting the rows in a table to
  order by a particular column)
• Main memory (how much memory is available)
• Disk space (how much disk space is available)
Generally, you are working with fixed amounts of memory and disk space. But you
can increase them by getting a new PC or adding to the existing one.
What we can attempt to control is the number of physical record transfers, by trying
to organize the data on disk in a way that minimises how much has to be transferred
for the types of request our applications make to the database.
We can also try to minimise the CPU operations required by considering what the
most frequent queries are on the data and how the data should be ordered on the disk.
Example: consider the fact that the logical records can be ordered on the disk, in the
physical records, only one way, e.g. in order of CustomerID or CustomerName but
not by both. But applications will want to get the Customer data in different orders for
different purposes, or they will want to get one customer given a CustomerID value or
a CustomerName value. If it is more common to query the Customer data by the
CustomerName, then it would make more sense to have the physical order be by
CustomerName, even if CustomerID is the primary key.
This is why we have indexes: we can index a table by the primary key and also by
other columns.
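The effect of indexing a non-key column can be observed in SQLite through Python's sqlite3 module. This is a sketch under assumptions: the generated customer names and the index name idx_customer_name are invented, and other DBMSs report their query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [(i, f"Customer{i:04d}") for i in range(1, 1001)],
)


def plan(sql: str) -> str:
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)  # column 3 holds the detail text


query = "SELECT CustomerID FROM Customer WHERE CustomerName = 'Customer0500'"

plan_before = plan(query)  # no index on the name: the whole table is scanned
conn.execute("CREATE INDEX idx_customer_name ON Customer (CustomerName)")
plan_after = plan(query)   # the plan now mentions idx_customer_name

found = conn.execute(query).fetchall()
```

The query returns the same row either way; what changes is how many records the DBMS has to read to find it.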
8.2 File Structures
In order to get an understanding of indexes (why they are used and how they are
implemented), we will start by looking at some different file structures that can be
used by DBMSs.
You will be familiar with some of these from your Data Structures & Algorithms
course.
We will look at these types of file organization:
• Sequential files
• Hash files
• B trees
Distribute Handout 8: tell students to use it during class; they should try to fill in the
spaces by listening to what you are saying. The purpose of this is 2-fold: 1, to give
them a note-taking aid and 2, to give them a chance to practice 'active listening', an
important skill. They should be able to listen to what you are saying and pick out the
important points.
8.2.1 Sequential files
Unordered sequential file
Simplest file structure: logical records are stored in the order in which they are
inserted.
New records are appended to the end of the file.
The file is unordered because there is no ordering based on values in the records, e.g.
based on CustomerName. But if each new Customer is given the next CustomerID in
a numerical sequence, then the file happens to be ordered by the CustomerID.
Also called a heap file.
Deleting a logical record leaves a space that can be filled by a new record. 2 ways to
approach:
• keep adding records at the end of the file; periodically reorganize the file to
  free up the space.
• mark the deleted record as free space; when adding, look for free space and fill
  it.
Advantage: fast insertion (at the end, or into spaces created by deleted records); fast
to retrieve in the order in which records were entered.
1rdered se=uential file *o-ica* recor$s arran-e$ in or$er of a /ey Cone of the
co*umnsD. 4sua**y the /ey is the ,9 co*umn but not a*ays.
#d3anta&eG fast retrieva* if retrievin- a subset of recor$s or$ere$ by the /eyK fast for
seAuentia* searches
Disad3anta&eG s*o insertion= because recor$s have to reor$ere$ to /eep the /ey
or$er. E.-. if /ey is CustomerName= hen a ne Customer recor$ is inserte$= have to
put it in the ri-ht position for the name va*ue so the recor$s are sti** or$ere$ by the
name.
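A minimal Python sketch of the two insertion behaviours described above. It assumes an in-memory list stands in for the file; the names heap_file and ordered_file and the sample records are illustrative only, not taken from the handouts.

```python
import bisect

# Each record is (CustomerName, CustomerID); sample values are made up.
heap_file = []      # unordered (heap) file: records kept in arrival order
ordered_file = []   # ordered sequential file: records kept sorted by name

def insert_heap(record):
    """Append to the end of the file: no other record moves."""
    heap_file.append(record)

def insert_ordered(record):
    """Find the correct position by key, then shift later records along."""
    bisect.insort(ordered_file, record)

for rec in [("Tesfay", 1), ("Abebe", 2), ("Mulu", 3)]:
    insert_heap(rec)
    insert_ordered(rec)
```

The heap file keeps arrival order, while the ordered file has to make room in the middle of the list, which is the cost that makes insertion slow on disk.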
8.2.2 Hash files
[expect that students know hash files already from the Data Structures course]
Sequential file access is not fast when you want to access individual records by key
value.
A hash file consists of: a list of key values & record addresses (pointers) and the
records.
A hash function is applied to each key value to give a physical record address where
the information associated with the key is stored.
Mod is an example of a simple hash function.
Collisions can occur, where two keys hash to the same physical address. If there is
not enough space in the physical record for both logical records, there is a problem.
This requires collision handling: for example, putting the logical record in the next
available physical record space (this is called linear probe collision handling; there
are other techniques).
The hash function can be chosen to minimise collisions e.g. using a prime number for
the mod function.
A file that is not full is less likely to have collisions.
If the file gets full, reorganization is necessary to insert all the logical records into a
new, bigger hash file.
To avoid this, dynamic hash files can be used. A dynamic hash file can grow
automatically. The space allocated for the hash file is divided into buckets; a bucket
can hold multiple logical records.
If a bucket is full, it is split into 2 buckets and the records distributed over the 2
buckets.
Advantage: fast for insertion & retrieval (if there are no collisions); fast for searches
by the key value e.g. find a Customer given the CustomerID, if the CustomerID is the
key value.
Disadvantage: not so fast for sequential searches e.g. get all Customers ordered by
the CustomerID or the CustomerName. This is because a good hash function tends to
spread the logical records uniformly across the physical records in the hash file.
So many physical record accesses are required, and each access costs in terms of
resources (like CPU & transferring to buffers).
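The mod hash function and linear probe collision handling described in this section can be sketched roughly as follows. This is only an illustration: it assumes integer keys, one logical record per slot and a small in-memory table in place of physical records, and the table size and sample keys are made up.

```python
TABLE_SIZE = 7  # a prime number of slots (illustrative value)

table = [None] * TABLE_SIZE  # each slot stands in for one physical record

def hash_address(key):
    """Simple mod hash function: maps an integer key to a slot number."""
    return key % TABLE_SIZE

def insert(key, record):
    """Store the record, probing linearly for the next free slot on collision."""
    start = hash_address(key)
    for step in range(TABLE_SIZE):
        slot = (start + step) % TABLE_SIZE
        if table[slot] is None or table[slot][0] == key:
            table[slot] = (key, record)
            return slot
    raise RuntimeError("hash file is full; reorganize into a bigger file")

def lookup(key):
    """Follow the same probe sequence that insert uses."""
    start = hash_address(key)
    for step in range(TABLE_SIZE):
        slot = (start + step) % TABLE_SIZE
        if table[slot] is None:
            return None
        if table[slot][0] == key:
            return table[slot][1]
    return None

slot_a = insert(10, "Customer A")   # 10 mod 7 is 3
slot_b = insert(17, "Customer B")   # 17 mod 7 is also 3: collision, next slot used
```

Both keys hash to slot 3, so the second record is pushed to slot 4, and a lookup has to probe past slot 3 to find it.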
8.3 Index-Sequential File Organization
What is really necessary in a DBMS is fast sequential and key access to data.
Sequential files & hash files are ways of organizing data on disk.
To get faster access, we use something called an index.
We are going to talk about:
What is an index?
How are indexes implemented by a DBMS?
Why are indices important in a database?
8.3.1 What is an index?
Think of an index in a book: you can look up a word in the index and it gives you a
page number or numbers to look at. In terms of file structures, an index is similar: it
is a list of key values and addresses for records.
More formally:
an index is a structured collection of key value & address pairs. The purpose of
an index is to facilitate access to a collection of records.
[Refer to Handout 9, figure 1 & figure 2: an example database table and an index on
it.]
8.3.2 Primary/Clustering Index
Right side of figure 2: an ordered sequential file of the Customer data ordered on
CustomerID.
We can have an index for the file. The index consists of a list of possible values for
the column that is used to order the file i.e. CustomerID. The values in the index are
also stored in order. This is shown at the left of Figure 2.
Call the CustomerID values the search keys. Each search key is paired with a pointer
to the record that has that CustomerID value.
Pointer: consists of the identifier of a disk block and an offset within the disk block to
identify the logical record within the block.
Because CustomerID is a candidate key (also happens to be the primary key), each
search key in the index corresponds to only 1 logical record in the data file.
What if the search key is not a candidate key? Can have many records with that key
value, so the pointer is to the first record in the 1st block containing records with that
key value. Because the file is an ordered sequential file, all records for that key value
are in sequence.
If a file is ordered by a non-candidate key, it is usually organized so that logical
records with different key values are in different blocks.
The index file itself can also be stored as a separate sequential file but it takes up a
lot less space than the file containing the data so it takes up fewer blocks.
[during this session, you can test how well the students recall what they learned in the
Data Structures & Algorithms course by asking some questions about the algorithms
& their complexity]
Q: What algorithm could be used to search the sequential index file?
A: A binary search algorithm.
Q: What is the Big-O complexity of a binary search algorithm?
A: O(log n) complexity.
If we say n is the number of blocks (physical records) that the file is stored in, n is
smaller for the index file than for the data file. So it is faster to search the index file.
If a file occupies b blocks, a binary search needs to read up to:
⌈log_2(b)⌉ blocks (⌈ ⌉ means round up i.e. ceiling).
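As a rough illustration of where the block count comes from, the sketch below runs a binary search over a list of blocks and counts one read per block examined. The block layout and key values are made up for the example. The ⌈log_2(b)⌉ figure is the usual estimate; when b is an exact power of 2 (e.g. b = 8) this style of search can take one read more.

```python
import math

def binary_search_blocks(blocks, key):
    """Binary search over a list of blocks; each block is a sorted list of keys.

    Returns (found, block_reads). Examining blocks[mid] counts as one block read.
    """
    low, high = 0, len(blocks) - 1
    reads = 0
    while low <= high:
        mid = (low + high) // 2
        block = blocks[mid]          # one block read
        reads += 1
        if key < block[0]:
            high = mid - 1
        elif key > block[-1]:
            low = mid + 1
        else:
            return key in block, reads
    return False, reads

# 10 blocks of 4 keys each, holding keys 0..39 (illustrative data)
blocks = [list(range(i, i + 4)) for i in range(0, 40, 4)]
found, reads = binary_search_blocks(blocks, 29)
estimate = math.ceil(math.log2(len(blocks)))   # ceiling of log base 2 of b
```

With 10 blocks the estimate is 4 reads, and no key in this file needs more than that.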
If the search key is the value that orders the sequential file, the index is a primary
index (different to primary key).
Also called a clustering index. In other words, the physical order of records on disk
is the same as the order of values in the index.
Exercise: take a few minutes to read Handout 9 Example 1.
Review: in this example, the number of index entries (r_i) is equal to the number of
blocks in the data file. This type of index does not have an entry for every possible
search key value. This is called a sparse index (see Figure 3 on Handout 9). The
opposite, where the index has an entry for every possible search key value, is a dense
index. We'll talk more about these later.
Make sure all students have understood the improvement in performance by asking
these questions:
a) Which is faster: to search the ordered data file or to search the index file?
[index]
b) Why? [smaller number of block accesses required]
c) Recalculate: if the index is dense i.e. has an entry for every search key, what
is the number of block accesses required?
bfr_i = 68 (as before)
r_i = 30,000 (number of index entries)
b_i = ⌈30,000/68⌉ = 442 (number of blocks required)
⌈log_2(442)⌉ = 9 (block accesses to search the index)
9 + 1 = 10 (including one more access to read the data block)
d) Based on the answer to (c), which do you think is better: a dense or a sparse
index with a pointer to each block of the data file? [dense index is faster than
searching the data file but only just... sparse index is better]
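The dense-index calculation in (c) can be checked with a few lines of Python. The two input values are the ones quoted above; the variable names are just for this sketch.

```python
import math

bfr_i = 68          # index entries per block (value quoted in the exercise)
r_i = 30_000        # dense index: one entry per record

b_i = math.ceil(r_i / bfr_i)               # blocks needed to hold the index
index_reads = math.ceil(math.log2(b_i))    # binary search on the index file
total_reads = index_reads + 1              # plus one read for the data block
```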
8.3.3 Secondary/Non-clustering Index
[Refer to Handout 9, figure 4: note that one of the index values has 2 pointers from
it]
Can have an index on a search key that is not the value that the records are ordered by.
E.g. an index on CustomerName. Each CustomerName points to all the records that
have that name value. This is a secondary or non-clustering index.
Consider the improvement in searching this gives us...
Q: Big-O complexity for a linear search on the data file?
A: O(n) (linear complexity), n = number of blocks
(would have to read b/2 blocks on average)
But a binary search on the secondary index will be much faster.
Exercise: take a few minutes to read Handout 9 example 2.
This shows that there is a greater improvement than that of a primary index over
binary search of the data file.
A sequential data file can have more than one index but only one of them can be a
primary index because the logical records can be in blocks in one order only.
8.3.4 Dense & Sparse Indices
There are 2 types of ordered index:
Dense index: every search key value has an entry in the index. To find a logical data
record based on a key value, find the key value in the index, then follow the pointer to
the block & record.
Sparse index: the index has entries only for some of the search key values. To find a
logical data record based on a key value, find the largest key value that is less than or
equal to the search value; follow the pointer to the block and then follow the pointers
inside the file until the logical record is found.
Usually store an index entry for each block so don't have to read another block to
find the key value.
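A minimal sketch of the sparse index lookup just described, assuming one index entry per block that holds the first key in the block. The blocks, CustID values and names are invented for the example and are not the handout data.

```python
import bisect

# Data file: blocks of records sorted by key (sample values are made up).
data_blocks = [
    [("C100", "Abebe"), ("C120", "Almaz")],
    [("C200", "Kebede"), ("C230", "Mulu")],
    [("C300", "Sara"), ("C340", "Tesfay")],
]
# Sparse index: one entry per block, holding the first key in that block.
sparse_index = [block[0][0] for block in data_blocks]

def sparse_lookup(key):
    """Return the record value for key, or None if it is not in the file."""
    # Position of the largest index key that is <= the search key.
    pos = bisect.bisect_right(sparse_index, key) - 1
    if pos < 0:
        return None                           # smaller than every index entry
    for rec_key, value in data_blocks[pos]:   # scan inside that one block
        if rec_key == key:
            return value
    return None
```

Looking up C230 goes through the index entry C200, because that is the largest indexed value not greater than the search value, and then scans only that block.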
Exercise: You are given the CustID value C3499503. Use the binary search
algorithm to find the logical record containing this search key value, using first figure
2 and then figure 3.
Q: What value in the index do you follow a pointer from in each figure?
A: in figure 2: from the C3499503 key value; in figure 3, from the C3340959 value
(because it is the highest value that is less than the one we want).
Q: Refer to Handout 9 figures 2, 3 & 4: which is dense and which is sparse?
A: figure 2 is dense, figure 3 is sparse, figure 4 is dense.
Q: could the secondary index in figure 4 be sparse?
A: no, because the order of the logical records may not be the same as the order of the
keys so the algorithm to look sequentially for the search key value won't work.
A primary index can be dense or sparse. The algorithm for searching the sparse index
will work because the order of the keys in the index is the same as the ordering of the
logical records.
If it is dense: every logical record has a pointer to it
If it is sparse: only some logical records have a pointer to them
Still works whether the search key is a candidate key or not.
So, a secondary index always has to be dense.
Secondary index on a candidate key: looks like a dense primary index but the
pointers are not to successive logical records in the file. Each value in the index has a
pointer to one record only.
Secondary index on a non-candidate key: each search key value in the index may
have pointers to more than one record. Cannot have a pointer only to the first record
with the value because other records may be scattered throughout the file.
[Refer to Handout 9, figure 5: different to figure 4]
Sometimes the secondary index has an extra level of indirection so that each search
key has only 1 pointer from it. This keeps the index entries to a fixed length.
The pointer is to a bucket (a block of space that can hold multiple data records). The
bucket holds pointers to all the logical records that have the key value in them. If the
bucket cannot hold all the pointers necessary, it can expand automatically e.g. by
using a linked list to pointers outside the bucket.
Comparison of dense & sparse indices
Sparse: takes up less space, and less work for inserting/deleting (don't always have to
insert or delete a key value from the index)
Dense: takes up more space but faster to find a logical record as do not have to scan
logical records when searching for a key value that is not in the index.
8.4 Modifying Index-Sequential files (insert, update, delete)
The algorithms for inserting and deleting vary slightly depending on the type of the
index.
Both the index and the data file must be updated.
8.4.1 Delete from the data file
We already talked about how deletion of logical records can be handled (2 methods:
keep adding to the end & do a file reorg periodically, or mark free space).
8.4.2 Insert to the data file
Insertions for an ordered file have to find the correct position, based on the
ordering value, then make space to put the record in. This involves moving records: on
average, half the records have to be moved.
Other options for insertion:
1. Keep some unused space in each block; but when the blocks fill up, same
problem.
2. Insert to a temporary unordered file called an overflow file.
Periodically, the overflow file is sorted and records merged with the
main/master file. But this makes the search algorithms more complex:
have to do linear searches on the overflow file.
8.4.3 Update the data file
Modifications/updates to data depend on 2 factors:
1. the search condition to find the logical record
2. the field to be modified
Have to first find the record to be updated: either binary search, if the value of the
search key in the record is known, OR linear search on the data file.
Then do the update: if a non-ordering field, change the record and rewrite in the same
address.
If the ordering field, the record may have to change position in the file => delete the
old record then insert the new record.
8.4.4 Algorithm for inserting to the index
[ref Pg 451 in Silberschatz et al]
Perform a look up for the search key value that is in the new record
Dense index:
  If key is not in the index, insert an index entry with the new value, in
  the correct position
  Else if [key is in the index and] index has pointers to all records for
  the key value, add a pointer to the new record
  Else if [key is in the index and] index has pointers only to first record
  for the key value, ensure the record is after other records with the same
  key value.
Sparse index (entry for each block):
  If a new block has been created, insert the first key value in the block
  into the index.
  Else if the new record has the lowest key value in its block, update the
  index entry pointing to that block, so it has the new key value
  Else no change to the index.
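A minimal sketch of the first two dense-index cases above, assuming the index keeps pointers to all records for each key value. The index is held as two parallel in-memory lists; the names index_keys and index_ptrs and the sample keys and addresses are illustrative, not from Silberschatz.

```python
import bisect

index_keys = []      # search key values, kept in sorted order
index_ptrs = []      # index_ptrs[i] lists the record addresses for index_keys[i]

def dense_index_insert(key, address):
    """Add a new record's key and address to a dense index."""
    pos = bisect.bisect_left(index_keys, key)
    if pos < len(index_keys) and index_keys[pos] == key:
        # Key already in the index: add a pointer to the new record.
        index_ptrs[pos].append(address)
    else:
        # Key not in the index: insert a new entry at the correct position.
        index_keys.insert(pos, key)
        index_ptrs.insert(pos, [address])

for key, address in [("Mulu", 7), ("Abebe", 3), ("Mulu", 9)]:
    dense_index_insert(key, address)
```

After the three insertions the index has two entries, with the entry for Mulu holding two pointers.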
8.4.5 Algorithm for deleting from the index
[ref Pg 451 in Silberschatz et al]
Look up the record to be deleted
Dense index:
  If deleted record was the only one with the search key value, delete the
  index entry
  Else if the index has pointers to all records with the same key value,
  delete the pointer to the deleted record
  Else if the index has only a pointer to the first record with the key
  value: if deleted record was the first record, update the pointer to
  point to the next record.
Sparse index:
  If index contains an entry for the key value of the deleted record
    If deleted record was the only record with the key value, replace
    the index entry with an entry for the next search key value (in
    order). If next search key value already has an entry, delete the
    index entry instead of replacing it.
    Else if index entry for the key value points to the deleted
    record, update the index entry to point to the next record with
    the same key value.
  Else do nothing (no change required).
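The dense-index deletion cases can be sketched in the same spirit, again assuming the index keeps pointers to all records for each key value. The function name and the sample index contents are hypothetical.

```python
import bisect

def dense_index_delete(index_keys, index_ptrs, key, address):
    """Remove one record's pointer; drop the entry if no records remain."""
    pos = bisect.bisect_left(index_keys, key)
    if pos == len(index_keys) or index_keys[pos] != key:
        return False                      # key not indexed: nothing to do
    if address not in index_ptrs[pos]:
        return False
    index_ptrs[pos].remove(address)       # delete the pointer to the record
    if not index_ptrs[pos]:               # it was the only record for the key
        del index_keys[pos]
        del index_ptrs[pos]
    return True

keys = ["Abebe", "Mulu"]
ptrs = [[3], [7, 9]]
dense_index_delete(keys, ptrs, "Mulu", 7)    # one of two records: pointer removed
dense_index_delete(keys, ptrs, "Abebe", 3)   # only record: whole entry removed
```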
8.5 Multilevel Indices
Index-sequential file organization: binary search to find logical records needs
⌈log_2(b)⌉ accesses if the index file is in b blocks, because each step of the algorithm
reduces the part of the index file that we continue to search by a factor of 2.
Even with a sparse index, the index file can become very big thus making the binary
search less efficient. For example: it's not uncommon to have a database table
containing 100,000 records.
blocking factor, bfr = 10; assume one index entry per block => 10,000 records in the
index.
Assume 100 index records in a block (index records smaller than data records), stored
as a sequential file on disk.
[record: here, we are talking about a basic unit of storage on disk, similar to a record
in a database table but not quite the same concept]
If index is small enough, can keep in main memory => fast access.
If large, keep on disk => need several block accesses to do the search.
100 index records in a block; 10,000 index entries => 10,000/100 = 100 blocks
binary search needs ⌈log_2(100)⌉ = 7 block reads.
If a block read takes 30ms, search takes up to 210ms, which is quite slow.
If we can reduce the part of the index that we continue to search during the binary
search algorithm, we can reduce the number of block accesses.
We do this with a multi-level index, like this:
Create an index on the index file itself: this is the second (or outer) level of
the index.
The original index is the first (or inner) level.
The second level index has an entry for the key value of the first entry in each
block of the first level index; the pointer is to the block. This second level is
itself a primary index on the index.
Search algorithm:
Binary search on second level index: find record for largest search key value
<= search value
Follow the pointer; binary search the block for largest search key value <=
search value
Follow the pointer: it will be to the record (for a candidate key dense index) or
the block containing the record (candidate key sparse index or non-candidate
key)
The process can be repeated: create a third level index on the second level and so on.
Keep repeating if the new level needs more than one block for storage i.e. until the
entries in a level all fit in 1 block.
Consider the performance:
Assume first level has r_1 entries and the blocking factor is bfr => number of blocks
b = r_1/bfr
And also second level has r_2 = r_1/bfr entries.
A third level would have r_3 = r_2/bfr entries.
And so on.
Search of a multi-level index needs approx. log_{bfr}(b) block accesses, an
improvement on log_2(b) if bfr > 2.
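A small sketch of how the number of levels can be counted, using the figures from the example above (10,000 first-level entries, 100 index records per block). It assumes one block read per level once the top level fits in a single block; the function name is made up for the illustration.

```python
import math

def index_levels(first_level_entries, bfr):
    """Count the levels needed until the top level fits in a single block."""
    levels = 1
    blocks = math.ceil(first_level_entries / bfr)
    while blocks > 1:
        # The next level up has one entry per block of the level below.
        blocks = math.ceil(blocks / bfr)
        levels += 1
    return levels

r1, bfr = 10_000, 100
b1 = math.ceil(r1 / bfr)                    # blocks in the first level
binary_reads = math.ceil(math.log2(b1))     # single-level index, binary search
multi_reads = index_levels(r1, bfr)         # one block read per index level
```

For these figures the single-level binary search needs 7 index block reads, while the multi-level index needs 2, one per level.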
Exercise: take a few minutes to read Handout 9 example 3.
8.6 Summary - Index-Sequential File Organization
Review
Ordered sequential files with an index
fast sequential access
fast key search (using indices)
Performance degrades as the data file and the index file grow in size, particularly if
an overflow file is used (linear searches).
Can fix this by reorganizing the file but frequent reorganization is not desirable.
A commonly used file structure for indices is a B Tree, which maintains its efficiency
despite insertion & deletion. It is based on search trees so let's do a quick recap of
binary search trees & search trees, which you should have covered in the Data
Structures course.
[perhaps do this by asking the class to come up with what they know about binary
search trees: what are the characteristics]
8.7 Binary Search Trees - Recap
A binary tree has a maximum of 2 branches from each node.
In a binary search tree, for each node, the values in the left sub-tree are lower; the
values in the right sub-tree are higher.
To retrieve all the key values from the tree, in sequential order, do an in-order
traversal.
[from Data Structures: tree traversal: depth-first -> read a whole branch before the
next; breadth-first -> read a whole level before the next; traversal means to visit all the
nodes in the tree.
order: when the node is visited => in-order = left tree, node, right tree; pre-order =
node, left tree, right tree; post-order = left tree, right tree, node]
Q: algorithm for a binary search?
A: compare search value with the root. If it is equal, search terminates. If it is
greater, proceed down the right sub-tree. If it is less, proceed down the left sub-tree.
Repeat at each node; if a null pointer is reached, the value is not in the tree.
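The search and in-order traversal described above can be sketched as follows. The Node class and the sample key values are illustrative only.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def search(node, key):
    """Return the node holding key, or None if the key is absent."""
    while node is not None:
        if key == node.key:
            return node
        # Greater keys are in the right sub-tree, smaller ones in the left.
        node = node.right if key > node.key else node.left
    return None

def in_order(node):
    """Yield keys in ascending order: left sub-tree, node, right sub-tree."""
    if node is not None:
        yield from in_order(node.left)
        yield node.key
        yield from in_order(node.right)

root = Node(50, Node(30, Node(20), Node(40)), Node(70, Node(60), Node(80)))
```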
8.7.1 Binary Search Tree as an Index
A binary search tree can be used as the structure for the index files we have discussed,
instead of storing an index in a sequential file.
Rather than storing whole data records in a tree (this would make the tree very big in
terms of disk space), we store the index entries (value & pointer pairs) in the tree.
Each node can store a search key value and a pointer to the data record containing that
search key (or a bucket of pointers to multiple records).
A node can also hold up to 2 more pointers, to its left and right sub-trees.
The value stored in the index is the search key. Each key has associated information.
Show a diagram like this (complete to show pointers to data records from every node
in the tree; also put a 0 in the pointers from leaf nodes to show that they are null
pointers).
[Diagram: binary search tree whose nodes hold the key values Addis Ababa, Awassa,
Bahir Dar, Dessie, Gonder, Mekelle and Nazret, with pointers into the data file]
Figure 1 - a binary search tree with pointers to data records
This extension of the binary search tree to include pointers to data records outside the
tree itself makes it an index.
The addresses of the records in the data file are determined by the strategy used to
insert records to the data file.
8.8 M-way search trees
An m-way search tree can have up to m branches from each node.
This means that a node can store up to (m-1) key values and can have up to m
branches from it.
[Ref Handout 8: m-way search trees diagram]
Draw this diagram on the board: P indicates a pointer, K indicates a key value.
This shows a node in an m-way search tree, where the node has n key values in it, so
it is of order (n+1).
P_0 | K_0 | P_1 | K_1 | P_2 | ... | P_{n-1} | K_{n-1} | P_n
Figure 2 - node in an m-way search tree (sub-trees under the arrows)
The key values inside a node are stored in ascending order i.e. K_i < K_{i+1} for
i = 0 to n-2
As before, the branches from a node are pointers to the root nodes of its sub-trees.
In an m-way tree, a node can contain up to m-1 values.
A node containing n key values (n <= m-1) has n+1 pointers or branches.
For each key value, the values in the sub-tree to the left of it are less than it and the
values in the sub-tree to the right of it are greater than it.
In other words:
all key values in the nodes pointed to by P_i are less than K_i and are greater than
K_{i-1}, for i = 1 to n-1
(P_0 leads to values less than K_0, and P_n to values greater than K_{n-1}).
The sub-trees in the m-way tree are all themselves m-way search trees.
For a given number of key values, an m-way search tree will have a smaller height
than a binary search tree.
The maximum search length is the height of the tree i.e. to find a given key value, the
max number of nodes that must be read is the height of the tree. No more than one
node is visited on any given level of the tree.
If the tree height is minimized, then the search time can be minimized.
To do this, the tree should have as many branches from each node as possible i.e. a
higher value of m. In other words, a low-height, bushy tree gives faster searches.
The algorithm for searching an m-way tree is similar to that for a binary search tree,
except it has to scan the array of key values in each node to find the key or to find the
pointer to the next branch to follow.
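A rough sketch of the m-way search just described. Each node holds a sorted list of keys and one more child pointer than keys. For brevity it locates the position inside a node with Python's bisect module rather than a left-to-right scan; the class name and sample keys are assumptions for the example.

```python
import bisect

class MWayNode:
    """Node with sorted keys K_0..K_{n-1} and n+1 child pointers P_0..P_n."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or [None] * (len(keys) + 1)

def mway_search(node, key):
    """Return (found, nodes_read); one node is read per level visited."""
    nodes_read = 0
    while node is not None:
        nodes_read += 1
        i = bisect.bisect_left(node.keys, key)    # first position with K_i >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True, nodes_read
        node = node.children[i]                   # follow pointer P_i
    return False, nodes_read

# A 3-way tree of height 2 (illustrative key values).
root = MWayNode([30, 60],
                [MWayNode([10, 20]), MWayNode([40, 50]), MWayNode([70, 80])])
```

Eight keys fit in two levels here, so no search reads more than two nodes, whereas a binary search tree holding the same keys needs at least four levels.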
Ref: Handout 9, figure 6 shows a 3-way search tree.
8.9 B Trees
The B-tree structure is a special form of an m-way search tree that has become the
most popular for organizing index structures because it gives good performance for
both sequential and key searches. It is also more efficient for inserting/deleting
compared to the more basic index-sequential file organization.
A B-tree is an m-way search tree with these properties:
Each node of the tree, except for the root and the leaf nodes, has at least ⌈m/2⌉
sub-trees and no more than m sub-trees i.e. each node is at least half-full.
The root of the tree must have at least 2 subtrees, unless it is itself a leaf
node. This forces the tree to branch early so searching is faster.
All leaf nodes of the tree must be on the same level. This gives faster
searching.
Note: Some texts say that the last property above is that the tree should be balanced.
But this is a different definition of balanced to what students have learned in the Data
Structures course. To check their knowledge from Data Structures, ask these
questions:
Q: What is a height-balanced tree?
A: When the difference between the heights of the left and right sub-trees is 0 or 1 for
all nodes in the tree.
Q: What is a perfectly balanced tree?
A: a tree that is height-balanced and for which all leaf nodes are on 1 or 2 levels.
To avoid confusion, we will not use the word balanced here; we will say that all leaf
nodes must be on the same level.
All of these properties help to optimise the performance of a search tree in terms of
searching and insertion/deletion of keys.
In addition, a B-tree is often implemented so that each node is a block on disk.
The capacity of each node i.e. the value of m is then determined by the physical
record size, the key size and the pointer size (all in bytes).
The height of the tree can be minimised by maximising the number of keys stored in
each node.
The height also then determines how many physical record accesses are required to
find a given key value.
Ref: Handout 9, figure 7. This is a B tree of order 3 containing the same key values as
the 3-way search tree in figure 6. Check for yourself that all three properties are
fulfilled.
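As an illustration of how the block size fixes the value of m, the sketch below assumes a node layout of m sub-tree pointers plus (m-1) keys, each key stored with its own pointer to a data record. The layout and the byte sizes used are assumptions for the example, not figures from the handouts.

```python
def btree_order(block_size, key_size, tree_ptr_size, data_ptr_size):
    """Largest m such that m tree pointers plus (m - 1) keys, each with a
    data pointer, fit in one block (all sizes in bytes)."""
    entry = key_size + data_ptr_size
    # m * tree_ptr_size + (m - 1) * entry <= block_size, solved for m
    return (block_size + entry) // (tree_ptr_size + entry)

# Hypothetical sizes: 512-byte block, 9-byte key, 6- and 7-byte pointers.
m = btree_order(block_size=512, key_size=9, tree_ptr_size=6, data_ptr_size=7)
min_subtrees = -(-m // 2)   # m/2 rounded up: minimum for a non-root, non-leaf node
```

With these sizes each node can have up to 24 sub-trees, so even a tree of height 3 can index thousands of key values.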
Insertion & deletion in a B-tree is a bit more complicated than in an m-way search
tree because the balance and the minimum number of keys per node must be
preserved.
Insertion algorithm: Loomis pg 377/Mannino pg 329
Tell students to read Mannino, chapter 10, pages 326-328, with particular reference
to the diagram on pg 329, showing how values are inserted to nodes of a B-tree.
Deletion: Loomis pg 382
Cost: cover formulas? Mannino pg 328; Loomis pg 376.
8.10 Indexed Sequential Files - B+ Tree
B-tree problem: for sequential search, performance is not so good.
Have to do an in-order traversal of the tree. Each node has to be visited once for each
key value in it.
A variation of the B-tree, a B+ Tree, is often used because it gives good performance
for sequential and key searches.
A B+ Tree is essentially a multi-level index, where the bottom level of the tree is
equivalent to the first level of a multi-level index.
The structure of a B+ Tree is different to a multi-level index-sequential file:
Consists of 2 parts: an index set (as in a B tree) and a sequence set.
Refer to Handout 9 figure 10. The first 3 levels in this one are the index set,
the bottom level is the sequence set.
Sequence set: the leaf nodes.
Only the leaf nodes have pointers to the data records, whereas in a B tree, all
nodes have pointers to data records.
Structure of a leaf node in a B+ Tree of order (n+1):
P_0 | K_0 | P_1 | K_1 | P_2 | ... | P_{n-1} | K_{n-1} | P_n -> next leaf node
Figure 3 - structure of a leaf node in a B+ Tree
Note that the pointer P_n is different to the others.
For i < n, P_i is a pointer to the record containing the search key value K_i.
Or, if the search key is not a candidate key and the file is not ordered by the search
key, a pointer to a bucket of pointers to records with the search key value K_i (an extra
level of indirection).
P_n is a pointer to the next leaf node in the sequence set, so the last pointer in the leaf
nodes chains the leaf nodes together to form the sequence set.
For an m-way search tree (and a B Tree), we said that all key values in the
sub-tree pointed to by P_i are less than K_i and greater than K_{i-1}.
For a B+ Tree, every search key value must appear at least once in a leaf
node and possibly also in a non-leaf node. That means that a key value K_i in a
non-leaf node will also appear in a leaf node.
So all key values in the sub-tree pointed to by P_i are less than or equal to
K_i and greater than K_{i-1}. So the key value K_i appears in the sub-tree pointed
to by P_i.
Or sometimes, the key values in the sub-tree pointed to by P_i are less than K_i
and greater than or equal to K_{i-1}. In that case, the key value K_i appears in the
sub-tree pointed to by P_{i+1}.
[Ref figure 10 on Handout 9: consider if the value 20 was moved to the node
(m) from the node (l)]
A leaf node can hold up to n key values, and has a minimum of n/2 values.
Ranges in the leaf nodes do not overlap; they are sequential i.e. if L_i and L_j are leaf
nodes, where i < j, every search key value in L_i is less than every search key in L_j.
The leaf nodes form a dense index for the data file.
The non-leaf nodes form a multi-level, sparse index on the leaf nodes.
Q: Now, can you see how a B+ Tree gives better performance than a B-Tree for
sequential searches?
A: because the leaf nodes form a sequence set: can follow the pointers from node to
node to get all records in order of the search key. This is faster than doing a tree
traversal.
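A minimal sketch of a sequential scan along the sequence set. It models only the leaf nodes and their chaining pointer, not the index set above them; the class name, keys and record addresses are made up.

```python
class Leaf:
    """B+ tree leaf: sorted keys, record pointers, and a link to the next leaf."""
    def __init__(self, keys, pointers):
        self.keys = keys
        self.pointers = pointers
        self.next = None        # P_n: pointer to the next leaf in the sequence set

def scan_in_key_order(first_leaf):
    """Yield (key, pointer) pairs by following the leaf chain left to right."""
    leaf = first_leaf
    while leaf is not None:
        yield from zip(leaf.keys, leaf.pointers)
        leaf = leaf.next

# Three chained leaves (illustrative values).
l1 = Leaf([5, 10], ["r5", "r10"])
l2 = Leaf([15, 20], ["r15", "r20"])
l3 = Leaf([25], ["r25"])
l1.next, l2.next = l2, l3
```

The scan touches each leaf once and never goes back up into the index set, which is why it is cheaper than an in-order traversal of a B-tree.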
8.11 Inserting & Deleting to/from a B+ Tree
Inserting and deleting to/from a B+ Tree is similar to a B Tree, except that the
sequence set must be considered as well as the index nodes.
The algorithms must keep the non-leaf nodes (except for the root node) at least half
full.
8.12 Summary of index file structures
[Ref Silberschatz pg 446, pg 477/478]
We have discussed several different file organizations for indexes:
Index-sequential file
  Ordered sequential data file with an index that is itself an ordered sequential
  file
  Fast access by search key
  Fast sequential access by ordering field
B-Tree
  Ordered sequential data file with an index in a B-tree structure
  Fast access by search key
  Fast sequential access by ordering field
  Slow sequential access by search key
B+ Tree
  Ordered sequential data file with a multi-level index in a B+ tree structure
  Fast access by search key
  Fast sequential access by ordering field
  Fast sequential access by search key
In addition to these, the hash file organization can also be used to build an index.
Hash Index
  Addresses of logical records in the data file determined by a hash function on
  the search key values
  Search key values also stored in a hash file consisting of buckets
  The hash function is used to determine the bucket in which each search key is
  stored
  Each search key has pointer(s) to data file records
  To find a key value, the hash function is applied to it and then the index entry
  can be located
  A hash index is a secondary index as the order of the search keys does not
  correspond to the order of the data file. But can use the hash function to easily
  find records in the data file.
Remember also the non-ordered sequential file or heap file as a method of file
organization. If sequential and key searches are not frequently needed, a heap file is as
good a way as any to store data.
Many DBMSs now use B Trees or B+ Trees for their indexing. They also use heap
files for unordered data.
Generally, if you are using a DBMS for your database, you don't have a choice over
the file structure it uses. But you can decide what indexes are built for the data.
MS SQL Server 2000 uses B-Trees, not sure if it also uses B+ Trees.
Don't know what Access uses; probably the same.
MS SQL Server & MS Access both create an index on the primary key by default, for
example.
If you do not tell it otherwise, the index will be a primary/clustered index.
But in SQL Server, you can tell it to be a non-clustered/secondary index and create a
primary/clustered index on another field. In Access (as far as I know), you cannot
create a primary index on a field other than the primary key. SQL Server has greater
flexibility over indices.
[aside: Programs->SQL Server->Books Online: use the Index or Search tab to find
articles about b-tree and index]
[maybe omit this next bit]
Some factors that may influence the DBMS designer's choice of file structures are:
Access types: sequential or by search key value; by individual search key
value or in a range
Access time: how long it takes to find a particular data item or set of items
e.g. binary search of a tree
Insertion time: time to insert the data record and to update the index
Deletion time: time to delete a data record and to update the index
Space overhead: how much space the index takes up; usually worthwhile to
go for some extra space e.g. a dense index if the other factors are improved.
8.13 Indices on Multiple Keys
All the indexes we have considered so far were on a search key value based on one attribute. But an index can also be on a combination of values, e.g. CustomerName and CustomerFathersName.
The index structure is still the same, except that the key values stored are really tuples of values, e.g. all the combinations of Name-FathersName that exist in the Customer table. The index entries are alphabetically ordered on both values, e.g. Ababa Tekle comes before Ababa Tesfay.
The pointers are to data records containing both the values.
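A minimal sketch of a two-column index, assuming an in-memory sorted list stands in for the index structure (a real DBMS would use a B-Tree). The sample rows, record ids and function names are hypothetical. It shows that entries are ordered on both values, and that the same index can still be searched when only the first column is known.

```python
import bisect

# Hypothetical customer rows: (name, fathers_name, record_id)
customers = [
    ("Ababa", "Tesfay", 101),
    ("Sara", "Bekele", 102),
    ("Ababa", "Tekle", 103),
]

# Each index entry is (search key tuple, pointer to the data record), kept sorted.
index = sorted(((name, father), rid) for name, father, rid in customers)
keys = [key for key, _ in index]

def lookup(name, father):
    """Return the record ids whose composite key equals (name, father)."""
    lo = bisect.bisect_left(keys, (name, father))
    hi = bisect.bisect_right(keys, (name, father))
    return [rid for _, rid in index[lo:hi]]

def lookup_by_name(name):
    """Search the same index when only the leading column is known."""
    out = []
    for key, rid in index[bisect.bisect_left(keys, (name,)):]:
        if key[0] != name:
            break
        out.append(rid)
    return out

first_key = keys[0]
exact_match = lookup("Ababa", "Tesfay")
all_ababa = lookup_by_name("Ababa")
```

Searching on the second column alone would not benefit in the same way, because the entries are not ordered by FathersName on its own.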
8.14 Enforcing Uniqueness with an Index
You may often want to specify that a column other than the primary key should have unique values.
Most DBMSs allow this by creating an index and letting you specify that data records must have unique values for the search key of the index.
This is because checking index entries to see if the search key value in a new record already exists or not is more efficient than checking in the data file.
See, for example, MS Access: when you are defining a column, you can choose whether it is indexed with duplicates allowed or duplicates not allowed.
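As a small illustration, the sketch below uses Python's built-in sqlite3 module rather than Access or SQL Server (an assumption made so the example is self-contained). The table, column and index names are hypothetical. It shows a unique index on a non-key column rejecting a duplicate value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")

# "Duplicates not allowed" on a column that is not the primary key.
conn.execute("CREATE UNIQUE INDEX idx_customer_email ON customer (email)")
conn.execute("INSERT INTO customer (email) VALUES ('a@example.com')")

try:
    conn.execute("INSERT INTO customer (email) VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The DBMS found the value already present in the index.
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```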
8.15 Why use indexes & choosing fields to index
Now that you know how indexes are implemented, let's talk about why you would want to use them.
Put simply, indexes can speed up queries in a database. To understand why, we'll look at how queries are executed.
Generally, the DBMS stores each db table in one file. It may or may not have an index (or indices) on the file.
Indexing will make some operations faster, e.g. searching, but others slower, e.g. insert/delete, so the choice is a trade-off.
To make good choices, you need to understand your db system and what queries will be most frequently made in it. A good starting point would be to try to write down the queries that are frequently run, e.g. look at forms in the interface and think about searches the users will need to do.
8.15.1 Choosing indices
You can have only one clustering index per table because the records can be in only one order on secondary storage. So choose this index carefully.
Some DBMSs will automatically make the PK be the clustering index, e.g. Access & SQL Server. Some will allow you to change it.
Guidelines for choosing it:
• Useful on a column that is often searched for a range of values, because the logical records are likely to be in the same block. If, for example, a date column is often queried to find a range of dates, the date column should be in a clustered index.
• If there is a column that is used frequently to sort the data retrieved from a table, then that column is a candidate for a clustering index. This is because the rows are already physically sorted by the search key, so it is not necessary for the DBMS to sort the rows again. For example, in the pubs database, the employee table has the clustered index on the lname, fname and minit columns, as it is likely that most queries on employees will sort the data by name, not by emp_id.
• Use for a primary key or other unique column because it makes queries that need to find a specific value in the column very fast.
In summary, think about the data and what fields are most often used in the WHERE part of queries.
Remember also that you can create an index on 2 or more columns if the values are often accessed together. In addition, the DBMS may use the index even when only one of the columns is being searched.
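To see the effect of indexing a column used in the WHERE part of a query, the sketch below asks a DBMS how it would run a range query before and after an index is created. It assumes SQLite (through Python's sqlite3 module) instead of SQL Server, and the orders table and idx_orders_date index are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)")

# A range search on the date column, as in the guideline above.
query = ("SELECT order_date FROM orders "
         "WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31'")

def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

plan_without_index = plan(query)   # expected to be a scan of the whole table
conn.execute("CREATE INDEX idx_orders_date ON orders (order_date)")
plan_with_index = plan(query)      # expected to mention idx_orders_date
```

The exact wording of the plan text differs between SQLite versions, so it is best to print both strings and compare them rather than rely on a particular phrase.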
See also chapter 10, section 10.5 of Mannino, pgs 336-342. This gives some more insight and some good rules to use. We will not discuss them here in class, but consider this as part of the course, so it can be in exams. You can ask me about them either personally or through the discussion forums.
8.15.2 Query Optimization: how indexes are used
Reading: Mannino, chapter 10 (as before); Silberschatz et al Chapter 13, section 13.5 on Join Operations (specifically Nested-Loop join on pg 503 and Merge Join, pg 506).
When you define or write a SQL query, the DBMS carries out a process of translating and analysing the query to produce an execution plan.
Read: section 10.4 of Chapter 10, Mannino, pg 332.
1. Check for syntax & semantic errors: if there are errors, stop processing. Syntax = misused keywords, e.g. FROM in the wrong place or misspelled. Semantic = columns or tables wrongly used, e.g. comparing values in columns that have incompatible data types.
2. Query transformation: transform to a standard format, usually based on relational algebra. Can involve rearrangement to make the query faster, called query optimisation.
3. Access plan evaluation: based on the relational algebra expression, come up with an access plan that lists the individual file operations needed, e.g. to use a B Tree, to merge records, to sort records.
4. Execution of the access plan: interpret and execute the list of file operations.
Example (a join of 3 tables, with join condition in the WHERE clause):
SELECT t.title, t.type, advance, notes, a.au_id, a.au_fname, a.au_lname
FROM titles t, titleauthor ta, authors a
WHERE t.title_id = ta.title_id AND ta.au_id = a.au_id
AND t.type = 'business' AND a.au_fname LIKE 'm%'
This query does 2 inner joins: one to join title and titleauthor and then another to join the result to author. Title and author are related through a many-many relationship, modelled by the titleauthor table.
Step 1: syntax & semantic check. If there was an error, e.g. one of the keywords misspelled or a missing AND, an error will be reported.
Step 2: query transformation: transform to relational algebra operations. Putting 2 tables in the FROM clause indicates a Cartesian product. To do an inner join, the standard syntax is to use the INNER JOIN/ON keywords. But SQL allows you to specify the matching fields in the WHERE clause, so a Cartesian product is converted to a natural join if there is a join condition in the WHERE clause.
One possible expression would be:
π [title, type, advance, notes, au_id, au_fname, au_lname] (
    σ [title.type = 'business'] (Title) nat.join TitleAuthor nat.join ( σ [author.au_fname LIKE 'm%'] (Author) ) )
Note that the select operations are done before the natural joins. This eliminates some rows before doing the join, making the joins more efficient.
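The saving from selecting first can be counted in a small sketch. It assumes a simple nested-loop join over Python lists, with hypothetical title ids and author ids; the DBMS would work on files and indexes, but the effect on the number of comparisons is the same idea.

```python
titles = [
    {"title_id": "T1", "type": "business"},
    {"title_id": "T2", "type": "cooking"},
    {"title_id": "T3", "type": "business"},
    {"title_id": "T4", "type": "psychology"},
]
title_author = [
    {"title_id": "T1", "au_id": "A1"},
    {"title_id": "T2", "au_id": "A2"},
    {"title_id": "T3", "au_id": "A1"},
    {"title_id": "T4", "au_id": "A3"},
]

def nested_loop_join(left, right, key):
    """Join two row lists on `key` and count the comparisons made."""
    out, comparisons = [], 0
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if l_row[key] == r_row[key]:
                out.append({**l_row, **r_row})
    return out, comparisons

def is_business(row):
    return row["type"] == "business"

# Plan 1: join everything, then apply the selection.
joined, cost_join_first = nested_loop_join(titles, title_author, "title_id")
result_join_first = [row for row in joined if is_business(row)]

# Plan 2: apply the selection first, then join the smaller input.
result_select_first, cost_select_first = nested_loop_join(
    [t for t in titles if is_business(t)], title_author, "title_id")
```

Both plans return the same rows, but the second makes half as many comparisons on this data.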
Step 3: Access plan evaluation: file operations. The access plan looks like a tree structure. The leaf nodes are the starting points, i.e. the tables in the query. It shows what indices will be used. This is one possible access plan.
Merge join (get matching rows)
    Sort (au_id)
        Merge join (get matching rows)
            Titles: use B-Tree index on Title_ID to get rows in sequence, filtering to get only those with type='business'
            TitleAuthor: use B-Tree index on Title_ID to get rows in sequence
    Authors: use B-Tree index on Au_id to get rows in sequence, filtering to get only those where au_fname begins with m
Another possible plan for this query might be to first join Authors & TitleAuthor.
The DBMS will evaluate the different plans. Each operation type (e.g. read a B Tree index, sort, merge join) has an associated cost. The DBMS can estimate costs by estimating the number of rows that will result from each operation.
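One of the operations named above, the merge join, can be sketched as follows. This is an illustrative version over Python lists that are already sorted on the join column (which is what reading a B-Tree index in sequence provides). The row values are hypothetical, and it is not a description of how SQL Server implements the operation.

```python
def merge_join(left, right, key):
    """Join two lists of dicts that are both already sorted on `key`."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][key] < right[j][key]:
            i += 1                      # left row has no partner yet, move on
        elif left[i][key] > right[j][key]:
            j += 1                      # right row has no partner, move on
        else:
            # Pair this left row with every right row that shares the key.
            k = j
            while k < len(right) and right[k][key] == left[i][key]:
                out.append({**left[i], **right[k]})
                k += 1
            i += 1
    return out

titles = [{"title_id": "T1", "type": "business"},
          {"title_id": "T2", "type": "business"}]
title_author = [{"title_id": "T1", "au_id": "A1"},
                {"title_id": "T1", "au_id": "A2"},
                {"title_id": "T9", "au_id": "A3"}]

matches = merge_join(titles, title_author, "title_id")
```

Each input is read once from start to end, which is why the plan needs the rows in sequence (from an index or a sort) before the merge join can run.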
You can see a representation of the plan created by SQL Server in the Query Analyzer. Before running the query, use Query menu, Show Execution Plan. When you run the query, an extra tab appears in the results pane: Execution Plan. This shows a diagram, similar to the one above. If you point the mouse at a node in the diagram, you see more information about what is happening, e.g. exactly what index is being used, what operation is being carried out, the CPU cost etc.
In this case, you will see that the indices on Titles and Authors are clustered, so accessing those in order is fast. The index on TitleAuthor.Title_ID is non-clustered, so it may not be so fast, but as the number of Titles has already been reduced (type='business'), the number of matching rows is small. Only those blocks pointed to by index entries for the remaining Title_ID values need to be accessed.
Step 4: execution of access plan: interpret the plan and execute it.
8.15.3 Use of Statistics
The query optimisation part of the DBMS needs statistics about the data in the database to make good decisions.
I mentioned that in evaluating the access plan, the DBMS estimates the cost of each operation by estimating the number of rows that will be returned.
It does this by referencing table profiles and statistics, such as the number of rows in each table and the distribution of values in a column (e.g. of CustomerName values).
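A rough sketch of how such statistics could feed a row estimate, assuming a very simple table profile (a row count plus a count of each column value). The function names and sample data are hypothetical, and real optimisers use more elaborate histograms.

```python
from collections import Counter

def build_stats(rows, column):
    """Collect a table profile: row count and the value distribution of a column."""
    return {"row_count": len(rows),
            "distribution": Counter(row[column] for row in rows)}

def estimate_uniform(stats):
    """Estimate rows per value knowing only the row count and distinct values."""
    return stats["row_count"] / len(stats["distribution"])

def estimate_with_distribution(stats, value):
    """Estimate rows for one value using the recorded distribution."""
    return stats["distribution"].get(value, 0)

titles = ([{"type": "cooking"}] * 6
          + [{"type": "business"}]
          + [{"type": "psychology"}])
stats = build_stats(titles, "type")

uniform_guess = estimate_uniform(stats)                       # 8 rows / 3 values
better_guess = estimate_with_distribution(stats, "business")  # uses the counts
```

With skewed data the two estimates differ a lot, which is why the distribution of values matters and not just the number of rows.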
A DBMS like SQL Server keeps these statistics in its system tables. It also updates the statistics regularly, automatically.
The DBMS will also analyse the available indices on a table and try to use the one that gives the lowest cost for a particular operation.
Sometimes, you may find that it does not use the index you expect it to, or want it to. SQL Server allows you to force it to use a particular index in a query using something called a table hint (use Books Online if you want to find out more).
8.16 Creating Indexes with SQL
You can create indices using SQL queries or with the Enterprise Manager.
In SQL:
CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX index_name
ON table_name (column_name [ASC | DESC], ...)
Unique: if unique, no 2 rows can have the same value for the index column(s)
Clustered/nonclustered: clustering or not
ASC/DESC: ascending order or descending order of search keys in the index. For a clustered index, this will affect the order of the rows in the data file.
You can also work with indices using the Enterprise Manager: see Handout 8, section 7, under the course Database with SQL Server for details.
In MS Access, indexing is one of the properties of a column: you can specify if the column is to be indexed and, if yes, if duplicates are allowed or not and also if it is ascending or descending. You can also use the Indexes button on the toolbar, which is useful if you want to create an index on two or more columns together.
[During this unit, arrange with the technical assistant to do a lab session on indexing. Give him/her and the students a copy of Handout 10 (Tables & Indices Worksheet) to use in the lab. Ideally near the end of the unit, in the same week as the last few sections above are covered in class.]
8.17 Quiz
Handout 11 is a quiz for students to help check their own learning of this unit.
Either give them each a copy or call out the questions in class.
Give them about 20 minutes to answer the questions, working on their own.
Then get them to swap with the person beside them.
Then you go through the questions and their answers; students can mark each other's papers. Use this as an opportunity to discuss the answers: if students have different answers to the suggested answers, check them and explain why they are not quite right, or accept them if they are good answers.
The solutions to the quiz are in the doc Handout 11 Indexing Quiz Solutions.doc.
9 SQL
(mostly to be covered in the labs by the course technical assistant)
10 Introduction to Transactions & Concurrency
Objectives: students to know what database transactions are, the ACID properties of a transaction, what concurrency means and to understand the issues involved in concurrency control.
Reading: Mannino, chapter 13; Silberschatz et al chapters 15 & 16.
10.1 Introduction
A transaction usually means an interaction among 2 or more parties in the conduct of business, e.g. a customer lodging money into a bank account, a customer withdrawing money from the bank account, the purchase of a car or booking a flight.
In a DBMS, this type of transaction may require several database operations to take place. For example, to lodge money into a bank account, the operations that must take place could be:
1. Insert a new record to the AccountHistory table
2. Update the account balance in the Accounts table
To withdraw money, the operations might be:
1. Check the account balance to make sure the withdrawal amount is available
2. Insert a new record to the AccountHistory table
3. Update the account balance in the Accounts table
[If you need another example: transfer money between 2 accounts. Need to debit one account, credit the other and update both balances. Will refer back to these examples, so keep on the board if possible.]
So, in a database system, one transaction can involve any number of reads from and writes to the database. Any one of the operations, e.g. check the account balance, does not make sense to the end user if taken on its own. The end user may ask the system to perform a lodgement or a withdrawal, and does not need to know about the operations that take place.
In other words, a collection of operations together is a single unit from the point of view of the database user.
In database terminology, a transaction is: a sequence of operations performed as a single, logical unit of work.
The operations performed as part of a transaction make changes to the data, i.e. they are data modification operations.
At any point in time, there may be many transactions happening. For example, in a bank's system, there are customers coming into different branches and making lodgements and withdrawals at all times during the day. So the system has to be able to handle concurrent transactions (transactions happening at the same time, and using the same database tables).
The key feature of a transaction is that all the operations must succeed. If any one of the operations fails, then the transaction must fail: it must undo any of the operations that did succeed.
For example, if the system removes money from the savings account but fails to add the money to the current account, there will be a problem with the customer's account.
So, database transactions must have ways of checking that all operations have completed.
10.2 ACID Properties
For a logical unit of work to be considered a transaction, it should have certain properties. These properties help to ensure the integrity of data in a database.
They are:
• Atomicity
• Consistency
• Isolation
• Durability
These are known as the ACID properties of a transaction (from the first letters).
[write on board but leave room to put extra info for each one; add notes as you go through the following section that describes each property]
Atomicity
All data changes made by the operations are reflected in the database or none of them are (all data modifications performed or none).
For example, to withdraw money from an account requires 3 steps:
1. Check the account balance to make sure the withdrawal amount is available
2. Insert a new record to the AccountHistory table
3. Update the account balance in the Accounts table
If all 3 steps succeed, the transaction is complete: we say it is committed.
Let us suppose the first step succeeds, then the second step fails for some reason: a record cannot be inserted to the AccountHistory table.
In this case, the entire transaction must fail because the account cannot be debited.
If the second step does succeed and then the third step fails, the second step must be undone, because the account cannot be debited without updating the account balance.
If a transaction fails, the process of undoing the previous steps and going back to the initial state is called rolling back the transaction.
Thus, Atomicity means that a DBMS must be capable of recovering transactions if something goes wrong while the operations are being executed. If a transaction is partially completed, it must undo the completed operations.
There are different situations that can cause this. Possible causes are:
• 2 sets of operations (transactions) get into a deadlock because both are waiting to use the same data. Eventually one of them will be terminated; this is called transaction recovery.
• A crash of the system: if the DBMS has a system failure while transactions are taking place, after the db is recovered, there may be partially completed transactions (where some of the operations took place but others didn't). This is called crash recovery.
Consistency
When completed, a transaction must preserve the consistency of a database. This means that after completion, all data must be in a correct state and comply with all data constraints and validation rules.
E.g. for a bank transaction that transfers money between 2 customer accounts, this would mean that the total amount of money recorded in the customers' accounts should still be the same.
Consistency also means that data integrity in the database must be maintained. This means avoiding the situation where a transaction can read 'dirty data'. Dirty data is data that has been changed but not yet committed to the database.
To help ensure consistency, the DBMS must be able to deal with concurrent transactions. We will talk more about this later.
Isolation
In summary, this means that any transaction must be unaware of other transactions executing in the system concurrently (meaning at the same time).
No other transactions or elements of the database can see the changes resulting from a transaction until the transaction completes. Other transactions should see the data in the state it was in before the transaction or after it completes, not in between.
Or, in other words, a transaction must see a consistent database: a transaction cannot read or write data that is being modified by another transaction.
Possible consequences of transactions that are not isolated:
• Lost updates: an update by one user overwrites an update by another user
• Uncommitted dependency (also known as a dirty read): when one transaction reads data written by another transaction before that other transaction commits
• Phantom rows: when a transaction A reads data rows, then another transaction B does something to update the rows, then transaction A carries out the same read again but gets different records from the first time.
We will talk more about each of these when we cover Concurrency Control.
Durability
After a transaction completes successfully, the changes it has made in the database will persist (remain) even if there is a system failure (e.g. a disk crash). In other words, the changes must be permanent and cannot be erased from the database (after they are committed).
E.g. for banking: if the system crashes, the customer will still see that she moved B100 to the current account. Or if customer X transferred B1000 to the account of customer Y, the money must still show as having been transferred.
10.3 DBMS Services for Transactions
A DBMS should provide services to help meet the ACID properties of transactions.
SQL also has some features that give the db designer control over transactions.
10.3.1 Atomicity
SQL has statements to begin/commit/rollback transactions.
The DBMS automatically carries out transaction & crash recovery.
10.3.1.1 SQL statements
If there is a sequence of operations that form a transaction, the programmer should begin a transaction before the first one and commit the transaction after the last one.
You also have to put logic in the code to make all the operations fail if any one of them fails. SQL also has a statement to roll back a transaction.
Begin transaction
Operation 1
If failure
    Rollback transaction
    Exit
End if
Operation 2
If failure
    Rollback transaction
    Exit
End if
Commit transaction
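The outline above can be tried out in a runnable form. The sketch below assumes SQLite through Python's sqlite3 module rather than SQL Server. The accounts and account_history tables, the CHECK constraint and the withdraw function are hypothetical names chosen for the example. The second operation is made to fail so that the rollback has to undo the first.

```python
import sqlite3

# isolation_level=None stops the Python driver from opening transactions
# itself, so the BEGIN / COMMIT / ROLLBACK below are the only ones issued.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("CREATE TABLE account_history (account_id INTEGER, amount INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")

def withdraw(account_id, amount):
    """Run both operations as one transaction; return True if it committed."""
    conn.execute("BEGIN")
    try:
        # Operation 1: record the withdrawal.
        conn.execute("INSERT INTO account_history VALUES (?, ?)",
                     (account_id, -amount))
        # Operation 2: update the balance (fails if it would go below zero).
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, account_id))
    except sqlite3.Error:
        conn.execute("ROLLBACK")   # undo operation 1 as well
        return False
    conn.execute("COMMIT")
    return True

first = withdraw(1, 30)     # enough money: commits
second = withdraw(1, 500)   # not enough money: rolled back
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
history_rows = conn.execute("SELECT COUNT(*) FROM account_history").fetchone()[0]
```

After the failed withdrawal there is still only one history row, because the insert made by the failed transaction was undone along with everything else.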
10.3.2 Consistency
Data integrity constraints are checked when updates are made.
The relational model has built-in integrity rules like entity integrity (primary key constraint, uniqueness constraint), referential integrity (foreign key constraint), check constraints (validation rules), null checks.
When a database operation is carried out, the DBMS checks that all the changed or added data still meets all the constraints on it. If it does not, an error occurs.
In SQL Server, certain SQL statements, e.g. CREATE TABLE, INSERT, are automatically rolled back if an error occurs.
For other statements or combinations of statements, the programmer can check for such errors and roll back the transaction if an error occurs.
10.3.3 Isolation
Handling of concurrent transactions is called concurrency control. A mechanism called locking is used to help control concurrency.
To prevent other transactions accessing a table it is using, a transaction can lock the table. A lock tells other transactions that the table cannot be accessed.
But there are different levels of locking possible. For example, if one step in a transaction is to simply read data from a particular table, there is no need to stop other transactions from reading the table.
We will talk more about this later.
10.3.4 Durability
A database can be backed up regularly. Then if there is a system failure, the database backup can be restored to recover the database.
But if the backup was taken 3 hours before the failure, the last 3 hours of data changes will not be in it, so the transactions made in that time are not durable.
To address this, the DBMS can keep a log of transactions. A log file can be added to all the time and does not take as much space as a complete backup.
The log is like a list of all the changes that have taken place.
If the backup is restored, then the log file from the time of backup can be applied to the database. This will bring it right up to date, up to the point in time where it crashed.
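The restore-then-replay idea can be sketched with a dictionary standing in for the database and a list standing in for the log file. This is a simplified illustration with hypothetical account names; a real transaction log records much more (for example, which transaction made each change and whether it committed).

```python
import copy

database = {"acct_1": 100, "acct_2": 50}
log = []   # every committed change since the last backup, in order

def commit_change(key, new_value):
    """Append the change to the log, then apply it to the database."""
    log.append((key, new_value))
    database[key] = new_value

backup = copy.deepcopy(database)   # full backup taken at this point
log.clear()                        # the log restarts from the backup point

commit_change("acct_1", 70)
commit_change("acct_2", 80)
state_before_crash = dict(database)

database.clear()                   # simulated loss of the data drive

# Recovery: restore the backup, then replay the logged changes in order.
database.update(backup)
for key, new_value in log:
    database[key] = new_value
```

The recovery only works because the log survived the crash, which is one reason for keeping the log file on a different drive from the data.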
This is known as a transaction log.
[Ask: have you noticed that when you create a database on SQL Server, there are two file locations specified? One is for the data itself and the other is for the log file. They can be put in different drives, so if the data drive crashes, you may still have the log file.]
10.4 Concurrency Control
Concurrent transactions are transactions that are running at the same time. A DBMS should be able to handle this and still maintain the isolation property of transactions.
What isolation actually means:
modifications made by concurrent transactions must be isolated from modifications made by any other concurrent transactions.
Let us suppose there are two transactions, A and B, occurring concurrently and using the same tables. Then transaction A should see the data in the state it was in before transaction B is carried out or after transaction B has completed. Transaction A should not see the data in any intermediate state (e.g. after one update but before another one).
This is because a transaction can make changes to data and it can consist of several different operations to make those changes. Suppose Trans A is making changes. If Trans B sees the data while A is executing, it may read data before A changes it. Then A completes, but B has read some data that is now incorrect.
E.g. for banking: we cannot have another transaction trying to take money out of the savings account while money is being moved to the current account. What if there will be no money left in the account after the transaction completes?
Let's look at a simple example to show what can happen when transactions run concurrently. Draw a time line (T1-T6) and a series to show 2 transactions running at the same time, operating on the same data.
The text in parentheses is information you (the teacher) can talk about/add after drawing the diagram.
[Possible class activity to do for this, to involve the students and to help your visual and kinaesthetic learners (see note 1 below). You can have students act out the process of the different transactions and what they do.
Make some cards: one saying Transaction A, one saying Transaction B. Get a volunteer to hold up each one, or stick on the wall or board. Then have a student for each transaction. Take the steps written in the table for each transaction at each time (T1, T2 etc) and write them on pieces of card/paper as instructions. Have something to be the database, e.g. written on the board or another student holding the value of X in the db. Then hand each 'transaction' student a piece of paper with an instruction on it: give the T1 instructions first, let them do the action and then continue with T2 and so on. Remember that each transaction reads data into its own buffer, so the value it has in its buffer doesn't change if the value in the db changes, i.e. if Trans A writes x, Trans B does not get the new value until it does a new read of X.
Hopefully this will help students to understand how concurrent transactions operate and to see how a lost update can occur.]
Note 1: One learning theory says that learners are visual, auditory or kinesthetic. In short, this means some people learn best by seeing diagrams, pictures etc; some by listening and some by doing things. Most people are visual or auditory but some students will be kinesthetic. They'll be bored by plain old chalk & talk teaching!
Time  Transaction A                        Transaction B
T1    read X (result: 1)
T2    x := X + 50 (result: x=51)           read X (result: 1, because Trans A has not yet written the new value of x)
T3    write X (X in db now has value 51)   X := X + 20 (result: 21)
T4                                         write x (X in db now has value 21)
T5    Commit transaction
T6                                         Commit transaction
Q: What is the value of X after T6?
A: 21.
Q: What should it be?
A: 1 + 20 + 50 = 71.
If this was your bank account, would you be happy?!
This is called a lost update: the update made by Transaction A was lost.
Uncommitted dependency (or dirty read) example:
Time  Transaction A                        Transaction B
T1    read X (result: 1)
T2    x := X - 1 (result: x=0)
T3    write X (X in db now has value 0)
T4                                         read X (result: 0, because Trans A has now written the new value of x)
T5    rollback transaction
After T5, Transaction B has an incorrect value for X. If it performs an operation, e.g. to add to X, the result will be incorrect because Transaction A rolled back its operations, putting X back to a value of 1.
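A minimal simulation of this dirty read, assuming one dictionary holds the last committed state and another holds the current state including uncommitted writes (both are hypothetical stand-ins for the database):

```python
committed = {"X": 1}     # last committed state
db = dict(committed)     # current state, including uncommitted writes

db["X"] = db["X"] - 1    # T1-T3: A reads 1, computes 0, writes 0 (not committed)
seen_by_b = db["X"]      # T4: B reads the uncommitted value
db = dict(committed)     # T5: A rolls back, restoring the committed state

value_after_rollback = db["X"]
b_holds_stale_value = seen_by_b != value_after_rollback
```

Transaction B is left holding 0 although the database never committed that value.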
Incorrect summary (phantom rows) example:
Occurs when Transaction A is updating data while Transaction B is reading the data to calculate a summary.
Time  Transaction A                        Transaction B
T1    read X (result: 1)
T2    x := X - 1 (result: x=0)
T3    write X (X in db now has value 0)
T4                                         read X (result: 0, because Trans A has now written the new value of x)
T5                                         sum = sum + X
T6                                         read Y (result: 3)
T7                                         sum = sum + Y (result: 0+3 = 3)
T8    read Y (result: 3)
T9    Y = y - 1 (result: 2)
T10   write y (result y=2 in the db)
Here, Transaction B has read X after Transaction A updated it, but read Y before Transaction A updated it.
So Transaction B has inconsistent data. Remember that isolation means that Transaction B should see the data in the state it was in before Transaction A modified it, or after it modified it, not in between the two states.
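The incorrect summary can be replayed in the same way. The sketch assumes X starts at 1 and Y at 3, and uses a dictionary as a stand-in for the database.

```python
db = {"X": 1, "Y": 3}
sum_before_a = db["X"] + db["Y"]   # what B would see if it ran before A

db["X"] = db["X"] - 1      # T1-T3: A updates X
total = 0
total += db["X"]           # T4-T5: B reads the new X
total += db["Y"]           # T6-T7: B reads the old Y
db["Y"] = db["Y"] - 1      # T8-T10: A updates Y

sum_after_a = db["X"] + db["Y"]    # what B would see if it ran after A
```

The total that B calculates matches neither the state before Transaction A nor the state after it, which is exactly the in-between view that isolation is meant to prevent.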
10.5 Locks
All of the above problems can be avoided by using a mechanism called locks.
When a transaction is making changes to data in a table, the transaction can get a lock on the table.
While the table is locked by the transaction, other transactions wishing to access the data cannot do so: they must wait until the lock is released.
[You can show how this would prevent any of the previous 3 examples: in each case, Transaction B would have to wait for Transaction A to release its lock before accessing the data.]
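A sketch of how a lock prevents the lost update, assuming Python threads stand in for the two transactions and a threading.Lock stands in for a table lock. The names db, table_lock and add_to_x are hypothetical.

```python
import threading

db = {"X": 1}
table_lock = threading.Lock()

def add_to_x(amount):
    """Read, modify and write X while holding the lock on the table."""
    with table_lock:              # any other transaction waits here
        value = db["X"]           # read
        value = value + amount    # modify in the private buffer
        db["X"] = value           # write
    # the lock is released here, so the next waiting transaction can proceed

threads = [threading.Thread(target=add_to_x, args=(n,)) for n in (50, 20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_value = db["X"]
```

Whichever transaction gets the lock first, the other reads X only after the first has written it, so the result is always 71.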
This seems like a good solution: now, transactions join a queue (a line in US English) when they want to access data in a particular table.
When the lock is available, it is granted to the next transaction in the queue.
But consider what this means if there are many transactions concurrently accessing the same data: they have to wait in the queue. This can cause delays in the applications completing their transactions.
So this can reduce the db performance, slowing things down. And with this, transactions are not really concurrent any more. Instead they wait, and a transaction is the only one updating the data at a given point in time.
This is called serialization, where transactions access data in sequence, forming a queue.
This is one extreme of isolation level. The other extreme is to allow transactions to read data before it is committed by other transactions, i.e. to accept the possibility of uncommitted dependency happening.
In SQL Server, this isolation level is called read uncommitted.
It means the DBA (db administrator) is allowing more transactions to access the same data at the same time, knowing that sometimes this will result in an uncommitted dependency and thus some inconsistent data.
But this may be acceptable if it does not happen often.
It also depends on the data, e.g. sensitive data like bank account transactions vs customer address data. A change in the bank account balance should not be read until it is committed; a change to the customer's address does not happen often and could be read before committing. If the change is committed, the effect of having the data before it changed is minimal.
A DBMS will have different levels of isolation. SQL Server has 4 levels.
The choice of isolation level is a trade-off between concurrency and data consistency.
The highest level of isolation, Serializable, gives low concurrency but high consistency.
The lowest level, read uncommitted, gives high concurrency but low consistency.
Draw a diagram like this (don't need the grid lines):
Isolation level                     Concurrency   Consistency
Serializable (high isolation)       low           high
Repeatable Read
Read Committed
Read Uncommitted (low isolation)    high          low
10.6 Lock Types
10.7 Implicit Transactions
We have not been using begin/commit transaction when we write SQL, but SQL Server is smart: it does something called Implicit Transactions. It does these for certain statements:
Alter, create, drop, delete, insert, update, select.
An implicit transaction occurs automatically. So if you do an insert and it creates an error (e.g. you try to put a character value into an int column), SQL Server automatically rolls back the transaction.
This default transaction mode is called Autocommit Transactions because it automatically commits if the statement execution is successful and it automatically rolls back if the statement fails.
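The difference between autocommit behaviour and a transaction that the programmer starts explicitly can be seen in a small sketch. It assumes SQLite through Python's sqlite3 module, not SQL Server, so the setting used here (isolation_level=None) is a property of the Python driver and not a SQL Server option. The table name is hypothetical.

```python
import sqlite3

# With isolation_level=None the driver leaves transactions to the DBMS,
# so each statement is committed as soon as it succeeds (autocommit).
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (n INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
open_after_autocommit = conn.in_transaction    # nothing is left open

# Explicit transaction: the programmer marks the start and the end.
conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES (2)")
open_inside_explicit = conn.in_transaction     # stays open until commit/rollback
conn.execute("ROLLBACK")

rows_after_rollback = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

The first insert survives because it was committed on its own; the second disappears because it belonged to the transaction that was rolled back.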
When you connect to the SQL Server through the Query Analyzer, you can choose a different Transaction Mode. Others are Implicit Transactions and Explicit Transactions.
[get explanations from Handout 7 pages 5/6]
Implicit Transactions:
Explicit Transactions:
10.8 Demo in Lab
To demonstrate:
On one client, change the Implicit_Transactions setting to ON.
Start an implicit transaction to update the name of a title:
UPDATE titles SET title = 'The Busy Executive''s Database Guide updated by xxx'
WHERE title_id = 'BU1032'
(where xxx is the student's user name).
If other clients now try to select from the titles table, they should find that the query is taking some time because the transaction has a lock on the table.
Commit the transaction on the first client:
COMMIT TRAN
Now the select on the other clients should complete.
11 Introduction to Security
Types of Database: didn't cover earlier in great detail
Could do:
Analytic (OLAP)
Operational (OLTP)