
Chapter 1
Databases and Database Users

Databases and database systems are an essential component of everyday life in modern society. Daily, most of us encounter several activities that involve some interaction with a database. For example, if we go to the bank to deposit or withdraw funds, if we make a hotel or airline reservation, if we access a computerized library catalog to search for a bibliographic item, or if we purchase something online (such as a book, toy, or computer), chances are that our activities will involve someone or some computer program accessing a database. Even purchasing items at a supermarket, in many cases, automatically updates the database that holds the inventory of grocery items.
These interactions are examples of what we may call traditional database applications, in which most of the information that is stored and accessed is either textual or numeric. In the past few years, advances in technology have led to exciting new applications of database systems. New media technology has made it possible to store images, audio clips, and video streams digitally. These types of files are becoming an important component of multimedia databases. Geographic information systems (GIS) can store and analyze maps, weather data, and satellite images. Data warehouses and online analytical processing (OLAP) systems are used in many companies to extract and analyze useful information from very large databases to support decision making. Real-time and active database technology is used to control industrial and manufacturing processes. And database search techniques are being applied to the World Wide Web to improve the search for information that is needed by users browsing the Internet.


To understand the fundamentals of database technology, however, we must start from the basics of traditional database applications. In Section 1.1 we start by defining a database, and then we explain other basic terms. In Section 1.2, we provide a simple UNIVERSITY database example to illustrate our discussion. Section 1.3 describes some of the main characteristics of database systems, and Sections 1.4 and 1.5 categorize the types of personnel whose jobs involve using and interacting with database systems. Sections 1.6, 1.7, and 1.8 offer a more thorough discussion of the various capabilities provided by database systems and discuss some typical database applications. Section 1.9 summarizes the chapter.
The reader who desires only a quick introduction to database systems can study Sections 1.1 through 1.5, then skip or browse through Sections 1.6 through 1.8 and go on to Chapter 2.

1.1 Introduction
Databases and database technology have a major impact on the growing use of computers. It is fair to say that databases play a critical role in almost all areas where computers are used, including business, electronic commerce, engineering, medicine, law, education, and library science. The word database is so commonly used that we must begin by defining what a database is. Our initial definition is quite general.
A database is a collection of related data.¹ By data, we mean known facts that can be recorded and that have implicit meaning. For example, consider the names, telephone numbers, and addresses of the people you know. You may have recorded this data in an indexed address book or you may have stored it on a hard drive, using a personal computer and software such as Microsoft Access or Excel. This collection of related data with an implicit meaning is a database.
The preceding definition of database is quite general; for example, we may consider the collection of words that make up this page of text to be related data and hence to constitute a database. However, the common use of the term database is usually more restricted. A database has the following implicit properties:
■ A database represents some aspect of the real world, sometimes called the miniworld or the universe of discourse (UoD). Changes to the miniworld are reflected in the database.
■ A database is a logically coherent collection of data with some inherent meaning. A random assortment of data cannot correctly be referred to as a database.
■ A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.
1. We will use the word data as both singular and plural, as is common in database literature; context will determine whether it is singular or plural. In standard English, data is used for plural; datum is used for singular.

In other words, a database has some source from which data is derived, some degree of interaction with events in the real world, and an audience that is actively interested in its contents. The end users of a database may perform business transactions (for example, a customer buys a camera) or events may happen (for example, an employee has a baby) that cause the information in the database to change. In order for a database to be accurate and reliable at all times, it must be a true reflection of the miniworld that it represents; therefore, changes must be reflected in the database as soon as possible.
A database can be of any size and complexity. For example, the list of names and addresses referred to earlier may consist of only a few hundred records, each with a simple structure. On the other hand, the computerized catalog of a large library may contain half a million entries organized under different categories (by primary author's last name, by subject, by book title), with each category organized alphabetically. A database of even greater size and complexity is maintained by the Internal Revenue Service (IRS) to monitor tax forms filed by U.S. taxpayers. If we assume that there are 100 million taxpayers and each taxpayer files an average of five forms with approximately 400 characters of information per form, we would have a database of 100 × 10^6 × 400 × 5 characters (bytes) of information. If the IRS keeps the past three returns of each taxpayer in addition to the current return, we would have a database of 8 × 10^11 bytes (800 gigabytes). This huge amount of information must be organized and managed so that users can search for, retrieve, and update the data as needed. An example of a large commercial database is Amazon.com. It contains data for over 20 million books, CDs, videos, DVDs, games, electronics, apparel, and other items. The database occupies over 2 terabytes (a terabyte is 10^12 bytes worth of storage) and is stored on 200 different computers (called servers). About 15 million visitors access Amazon.com each day and use the database to make purchases. The database is continually updated as new books and other items are added to the inventory and stock quantities are updated as purchases are transacted. About 100 people are responsible for keeping the Amazon database up-to-date.
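The IRS size estimate above can be checked with a few lines of arithmetic. The sketch below only restates the figures given in the text; the variable names are our own, and we take one gigabyte to mean 10^9 bytes.

```python
# Back-of-the-envelope size estimate for the IRS example in the text.
TAXPAYERS = 100 * 10**6    # 100 million taxpayers
FORMS_PER_TAXPAYER = 5     # average number of forms filed per return
CHARS_PER_FORM = 400       # approximate characters (bytes) per form
RETURNS_KEPT = 4           # the current return plus the past three

bytes_per_return = TAXPAYERS * FORMS_PER_TAXPAYER * CHARS_PER_FORM
total_bytes = bytes_per_return * RETURNS_KEPT

print(bytes_per_return)      # 200000000000 (2 x 10^11)
print(total_bytes)           # 800000000000 (8 x 10^11)
print(total_bytes / 10**9)   # 800.0 gigabytes
```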
A database may be generated and maintained manually or it may be computerized. For example, a library card catalog is a database that may be created and maintained manually. A computerized database may be created and maintained either by a group of application programs written specifically for that task or by a database management system. We are only concerned with computerized databases in this book.
A database management system (DBMS) is a collection of programs that enables users to create and maintain a database. The DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications. Defining a database involves specifying the data types, structures, and constraints of the data to be stored in the database. The database definition or descriptive information is also stored in the database in the form of a database catalog or dictionary; it is called meta-data. Constructing the database is the process of storing the data on some storage medium that is controlled by the DBMS. Manipulating a database includes functions such as querying the database to retrieve specific data, updating the database to reflect changes in the miniworld, and generating reports from the data. Sharing a database allows multiple users and programs to access the database simultaneously.
An application program accesses the database by sending queries or requests for data to the DBMS. A query² typically causes some data to be retrieved; a transaction may cause some data to be read and some data to be written into the database.
Other important functions provided by the DBMS include protecting the database and maintaining it over a long period of time. Protection includes system protection against hardware or software malfunction (or crashes) and security protection against unauthorized or malicious access. A typical large database may have a life cycle of many years, so the DBMS must be able to maintain the database system by allowing the system to evolve as requirements change over time.
It is not necessary to use general-purpose DBMS software to implement a computerized database. We could write our own set of programs to create and maintain the database, in effect creating our own special-purpose DBMS software. In either case, whether we use a general-purpose DBMS or not, we usually have to deploy a considerable amount of complex software. In fact, most DBMSs are very complex software systems.
To complete our initial definitions, we will call the database and DBMS software together a database system. Figure 1.1 illustrates some of the concepts we have discussed so far.

1.2 An Example
Let us consider a simple example that most readers may be familiar with: a UNIVERSITY database for maintaining information concerning students, courses, and grades in a university environment. Figure 1.2 shows the database structure and some sample data for such a database. The database is organized as five files, each of which stores data records of the same type.³ The STUDENT file stores data on each student, the COURSE file stores data on each course, the SECTION file stores data on each section of a course, the GRADE_REPORT file stores the grades that students receive in the various sections they have completed, and the PREREQUISITE file stores the prerequisites of each course.
To define this database, we must specify the structure of the records of each file by specifying the different types of data elements to be stored in each record. In Figure 1.2, each STUDENT record includes data to represent the student's Name, Student_number, Class (such as freshman or '1', sophomore or '2', and so forth), and Major (such as mathematics or 'MATH' and computer science or 'CS'); each COURSE record includes data to represent the Course_name, Course_number, Credit_hours, and Department (the department that offers the course); and so on. We must also specify a data type for each data element within a record. For example, we can specify that Name of STUDENT is a string of alphabetic characters, Student_number of STUDENT is an integer, and Grade of GRADE_REPORT is a single character from the set {'A', 'B', 'C', 'D', 'F', 'I'}. We may also use a coding scheme to represent the values of a data item. For example, in Figure 1.2 we represent the Class of a STUDENT as 1 for freshman, 2 for sophomore, 3 for junior, 4 for senior, and 5 for graduate student.
2. The term query, originally meaning a question or an inquiry, is loosely used for all types of interactions with databases, including modifying the data.
3. We use the term file informally here. At a conceptual level, a file is a collection of records that may or may not be ordered.

Figure 1.1 A simplified database system environment. [Diagram labels: Users/Programmers; Application Programs/Queries; Software to Process Queries/Programs; Software to Access Stored Data; Stored Database Definition (Meta-Data); Stored Database.]

To construct the UNIVERSITY database, we store data to represent each student, course, section, grade report, and prerequisite as a record in the appropriate file. Notice that records in the various files may be related. For example, the record for Smith in the STUDENT file is related to two records in the GRADE_REPORT file that specify Smith's grades in two sections. Similarly, each record in the PREREQUISITE file relates two course records: one representing the course and the other representing the prerequisite. Most medium-size and large databases include many types of records and have many relationships among the records.
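As a rough illustration of defining and constructing part of this database, the following sketch uses Python's built-in sqlite3 module as a stand-in DBMS. This is our own choice of tool, not something the chapter prescribes. The column types and CHECK constraints mirror the data types and coding scheme described above. Smith's student number (17) and section identifiers (112, 119) follow Figure 1.2; the Class value and the two grades are placeholders chosen for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Defining: specify data types, structures, and constraints.
conn.executescript("""
CREATE TABLE STUDENT (
    Name           TEXT NOT NULL,
    Student_number INTEGER PRIMARY KEY,
    Class          INTEGER CHECK (Class BETWEEN 1 AND 5),
    Major          TEXT
);
CREATE TABLE GRADE_REPORT (
    Student_number     INTEGER REFERENCES STUDENT(Student_number),
    Section_identifier INTEGER,
    Grade              TEXT CHECK (Grade IN ('A', 'B', 'C', 'D', 'F', 'I'))
);
""")

# Constructing: store records (Class and grades are illustrative placeholders).
conn.execute("INSERT INTO STUDENT VALUES ('Smith', 17, 1, 'CS')")
conn.executemany(
    "INSERT INTO GRADE_REPORT VALUES (?, ?, ?)",
    [(17, 112, "B"), (17, 119, "C")],
)

# Related records: Smith's STUDENT record is linked to two GRADE_REPORT records.
rows = conn.execute("""
    SELECT g.Section_identifier, g.Grade
    FROM STUDENT AS s JOIN GRADE_REPORT AS g ON g.Student_number = s.Student_number
    WHERE s.Name = 'Smith'
    ORDER BY g.Section_identifier
""").fetchall()
print(rows)  # [(112, 'B'), (119, 'C')]

# The declared constraint rejects a grade outside the allowed set.
rejected = False
try:
    conn.execute("INSERT INTO GRADE_REPORT VALUES (17, 135, 'Z')")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Declaring the grade set as a constraint lets the DBMS, rather than each application program, enforce it.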

Figure 1.2 A database that stores student and course information.

STUDENT
Name    Student_number  Class  Major
Smith   17                     CS
Brown                          CS

COURSE
Course_name                Course_number  Credit_hours  Department
Intro to Computer Science  CS1310                       CS
Data Structures            CS3320                       CS
Discrete Mathematics       MATH2410                     MATH
Database                   CS3380                       CS

SECTION
Section_identifier  Course_number  Semester  Year  Instructor
85                  MATH2410       Fall      04    King
92                  CS1310         Fall      04    Anderson
102                 CS3320         Spring    05    Knuth
112                 MATH2410       Fall      05    Chang
119                 CS1310         Fall      05    Anderson
135                 CS3380         Fall      05    Stone

GRADE_REPORT
Student_number  Section_identifier  Grade
17              112
17              119
                85
                92
                102
                135

PREREQUISITE
Course_number  Prerequisite_number
CS3380         CS3320
CS3380         MATH2410
CS3320         CS1310


Database manipulation involves querying and updating. Examples of queries are as follows:
■ Retrieve the transcript (a list of all courses and grades) of 'Smith'
■ List the names of students who took the section of the 'Database' course offered in fall 2005 and their grades in that section
■ List the prerequisites of the 'Database' course
Examples of updates include the following:
■ Change the class of 'Smith' to sophomore
■ Create a new section for the 'Database' course for this semester
■ Enter a grade of 'A' for 'Smith' in the 'Database' section of last semester
These informal queries and updates must be specified precisely in the query language of the DBMS before they can be processed.
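To give a feel for what "specified precisely" can look like, here is a minimal sketch of one of the queries and one of the updates above, written in SQL and run through Python's sqlite3 module. The chapter has not yet introduced a query language, so the choice of SQL and sqlite3 is ours. The COURSE and PREREQUISITE rows follow Figure 1.2 (Credit_hours is omitted for brevity); Smith's starting Class value of 1 is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (Name TEXT, Student_number INTEGER, Class INTEGER, Major TEXT);
CREATE TABLE COURSE (Course_name TEXT, Course_number TEXT, Department TEXT);
CREATE TABLE PREREQUISITE (Course_number TEXT, Prerequisite_number TEXT);
INSERT INTO STUDENT VALUES ('Smith', 17, 1, 'CS');  -- starting Class is assumed
INSERT INTO COURSE VALUES
    ('Intro to Computer Science', 'CS1310', 'CS'),
    ('Data Structures', 'CS3320', 'CS'),
    ('Discrete Mathematics', 'MATH2410', 'MATH'),
    ('Database', 'CS3380', 'CS');
INSERT INTO PREREQUISITE VALUES
    ('CS3380', 'CS3320'), ('CS3380', 'MATH2410'), ('CS3320', 'CS1310');
""")

# Query: list the prerequisites of the 'Database' course.
prereqs = conn.execute("""
    SELECT p.Prerequisite_number
    FROM COURSE AS c JOIN PREREQUISITE AS p ON p.Course_number = c.Course_number
    WHERE c.Course_name = ?
    ORDER BY p.Prerequisite_number
""", ("Database",)).fetchall()
print(prereqs)  # [('CS3320',), ('MATH2410',)]

# Update: change the class of 'Smith' to sophomore (coded as 2).
conn.execute("UPDATE STUDENT SET Class = 2 WHERE Name = ?", ("Smith",))
smith_class = conn.execute(
    "SELECT Class FROM STUDENT WHERE Name = ?", ("Smith",)
).fetchone()[0]
print(smith_class)  # 2
```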
At this stage, it is useful to describe the database as a part of a larger undertaking known as an information system within any organization. The Information Technology (IT) department within a company designs and maintains an information system consisting of various computers, storage systems, application software, and databases. Design of a new application for an existing database or design of a new database starts off with a phase called requirements definition and analysis. These requirements are documented in detail and transformed into a conceptual design that can be represented and manipulated using some computerized tools so that it can be easily maintained, modified, and transformed into a database implementation. We will introduce a model called the Entity-Relationship model in Chapter 3 that is used for this purpose. The design is then translated to a logical design that can be expressed in a data model implemented in a commercial DBMS. In this book we will emphasize a data model known as the Relational Data model from Chapter 5 onward. This is currently the most popular approach for designing and implementing databases using (relational) DBMSs. The final stage is physical design, during which further specifications are provided for storing and accessing the database. The database design is implemented, populated with actual data, and continuously maintained to reflect the state of the miniworld.

1.3 Characteristics of the Database Approach
A number of characteristics distinguish the database approach from the traditional approach of programming with files. In traditional file processing, each user defines and implements the files needed for a specific software application as part of programming the application. For example, one user, the grade reporting office, may keep a file on students and their grades. Programs to print a student's transcript and to enter new grades into the file are implemented as part of the application. A second user, the accounting office, may keep track of students' fees and their payments. Although both users are interested in data about students, each user maintains separate files, and programs to manipulate these files, because each requires some data not available from the other user's files. This redundancy in defining and storing data results in wasted storage space and in redundant efforts to maintain common up-to-date data.
In the database approach, a single repository of data is maintained that is defined once and then accessed by various users. In file systems, each application is free to name data elements independently. In contrast, in a database, the names or labels of data are defined once, and used repeatedly by queries, transactions, and applications. The main characteristics of the database approach versus the file-processing approach are the following:
■ Self-describing nature of a database system
■ Insulation between programs and data, and data abstraction
■ Support of multiple views of the data
■ Sharing of data and multiuser transaction processing
We describe each of these characteristics in a separate section. We will discuss additional characteristics of database systems in Sections 1.6 through 1.8.

1.3.1 Self-Describing Nature of a Database System
A fundamental characteristic of the database approach is that the database system contains not only the database itself but also a complete definition or description of the database structure and constraints. This definition is stored in the DBMS catalog, which contains information such as the structure of each file, the type and storage format of each data item, and various constraints on the data. The information stored in the catalog is called meta-data, and it describes the structure of the primary database (Figure 1.1).
The catalog is used by the DBMS software and also by database users who need information about the database structure. A general-purpose DBMS software package is not written for a specific database application. Therefore, it must refer to the catalog to know the structure of the files in a specific database, such as the type and format of data it will access. The DBMS software must work equally well with any number of database applications (for example, a university database, a banking database, or a company database) as long as the database definition is stored in the catalog.
In traditional file processing, data definition is typically part of the application programs themselves. Hence, these programs are constrained to work with only one specific database, whose structure is declared in the application programs. For example, an application program written in C++ may have struct or class declarations, and a COBOL program has data division statements to define its files. Whereas file-processing software can access only specific databases, DBMS software can access diverse databases by extracting the database definitions from the catalog and then using these definitions.
For the example shown in Figure 1.2, the DBMS catalog will store the definitions of all the files shown. Figure 1.3 shows some sample entries in a database catalog. These definitions are specified by the database designer prior to creating the actual database and are stored in the catalog. Whenever a request is made to access, say, the Name of a STUDENT record, the DBMS software refers to the catalog to determine the structure of the STUDENT file and the position and size of the Name data item within a STUDENT record. By contrast, in a typical file-processing application, the file structure and, in the extreme case, the exact location of Name within a STUDENT record are already coded within each program that accesses this data item.
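Many real systems expose their catalog so that it can be queried like ordinary data. As one concrete illustration (SQLite via Python's sqlite3 module, our choice rather than anything the chapter specifies), the sketch below creates a STUDENT table and then reads its definition back from the system's own meta-data. The column types are SQLite's, not the Character(30) style types of Figure 1.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (Name TEXT, Student_number INTEGER, Class INTEGER, Major TEXT)"
)

# sqlite_master is SQLite's catalog table: one row per stored definition.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(STUDENT)")]

print(tables)   # ['STUDENT']
print(columns)  # [('Name', 'TEXT'), ('Student_number', 'INTEGER'), ('Class', 'INTEGER'), ('Major', 'TEXT')]
```

A general-purpose program can use output like this to discover the structure of any table at run time instead of hard-coding it.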

1.3.2 Insulation between Programs and Data, and Data Abstraction
In traditional file processing, the structure of data files is embedded in the application programs, so any changes to the structure of a file may require changing all programs that access that file. By contrast, DBMS access programs do not require such changes in most cases. The structure of data files is stored in the DBMS catalog separately from the access programs. We call this property program-data independence.

Figure 1.3 An example of a database catalog for the database in Figure 1.2.

RELATIONS
Relation_name  No_of_columns
STUDENT        4
COURSE         4
SECTION        5
GRADE_REPORT   3
PREREQUISITE   2

COLUMNS
Column_name          Data_type      Belongs_to_relation
Name                 Character(30)  STUDENT
Student_number       Character(4)   STUDENT
Class                Integer(1)     STUDENT
Major                Major_type     STUDENT
Course_name          Character(10)  COURSE
Course_number        XXXXNNNN       COURSE
Prerequisite_number  XXXXNNNN       PREREQUISITE

Note: Major_type is defined as an enumerated type with all known majors. XXXXNNNN is used to define a type with four alpha characters followed by four digits.


For example, a file access program may be written in such a way that it can access only STUDENT records of the structure shown in Figure 1.4. If we want to add another piece of data to each STUDENT record, say the Birth_date, such a program will no longer work and must be changed. By contrast, in a DBMS environment, we only need to change the description of STUDENT records in the catalog (Figure 1.3) to reflect the inclusion of the new data item Birth_date; no programs are changed. The next time a DBMS program refers to the catalog, the new structure of STUDENT records will be accessed and used.
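The Birth_date scenario can be tried out directly. In this sketch (again using sqlite3 as a stand-in DBMS), the hypothetical function student_name plays the role of an existing application program. It keeps working, unchanged, after the new data item is added to the STUDENT definition. The sample row's Class value is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (Name TEXT, Student_number INTEGER, Class INTEGER, Major TEXT)"
)
conn.execute("INSERT INTO STUDENT VALUES ('Smith', 17, 1, 'CS')")  # Class is assumed

def student_name(connection, number):
    """An 'application program' that refers to data items only by name."""
    row = connection.execute(
        "SELECT Name FROM STUDENT WHERE Student_number = ?", (number,)
    ).fetchone()
    return row[0] if row else None

before = student_name(conn, 17)

# Change only the stored description of STUDENT; the program above is untouched.
conn.execute("ALTER TABLE STUDENT ADD COLUMN Birth_date TEXT")

after = student_name(conn, 17)
print(before, after)  # Smith Smith
```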
In some types of database systems, such as object-oriented and object-relational systems (see Chapters 20 through 22), users can define operations on data as part of the database definitions. An operation (also called a function or method) is specified in two parts. The interface (or signature) of an operation includes the operation name and the data types of its arguments (or parameters). The implementation (or method) of the operation is specified separately and can be changed without affecting the interface. User application programs can operate on the data by invoking these operations through their names and arguments, regardless of how the operations are implemented. This may be termed program-operation independence.
The characteristic that allows program-data independence and program-operation independence is called data abstraction. A DBMS provides users with a conceptual representation of data that does not include many of the details of how the data is stored or how the operations are implemented. Informally, a data model is a type of data abstraction that is used to provide this conceptual representation. The data model uses logical concepts, such as objects, their properties, and their interrelationships, that may be easier for most users to understand than computer storage concepts. Hence, the data model hides storage and implementation details that are not of interest to most database users.
For example, reconsider Figures 1.2 and 1.3. The internal implementation of a file may be defined by its record length (the number of characters, or bytes, in each record), and each data item may be specified by its starting byte within a record and its length in bytes. The STUDENT record would thus be represented as shown in Figure 1.4. But a typical database user is not concerned with the location of each data item within a record or its length; rather, the user is concerned that when a reference is made to Name of STUDENT, the correct value is returned. A conceptual representation of the STUDENT records is shown in Figure 1.2. Many other details of file storage organization, such as the access paths specified on a file, can be hidden from database users by the DBMS; we discuss storage details in Chapters 13 and 14.

Figure 1.4 Internal storage format for a STUDENT record, based on the database catalog in Figure 1.3.

Data Item Name   Starting Position in Record   Length in Characters (bytes)
Name             1                             30
Student_number   31                            4
Class
Major


In the database approach, the detailed structure and organization of each file are stored in the catalog. Database users and application programs refer to the conceptual representation of the files, and the DBMS extracts the details of file storage from the catalog when these are needed by the DBMS file access modules. Many data models can be used to provide this data abstraction to database users. A major part of this book is devoted to presenting various data models and the concepts they use to abstract the representation of data.
In object-oriented and object-relational databases, the abstraction process includes not only the data structure but also the operations on the data. These operations provide an abstraction of miniworld activities commonly understood by the users. For example, an operation CALCULATE_GPA can be applied to a STUDENT object to calculate the grade point average. Such operations can be invoked by the user queries or application programs without having to know the details of how the operations are implemented. In that sense, an abstraction of the miniworld activity is made available to the user as an abstract operation.
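A plain Python class is enough to sketch the idea of an abstract operation. Everything below is hypothetical: the class, the grade-point scale (A = 4.0 down to F = 0.0), the unweighted average, and the sample grades are assumptions for illustration, not definitions taken from the text.

```python
class Student:
    """A STUDENT object whose storage details are hidden behind operations."""

    # Assumed grade-point scale; 'I' (incomplete) carries no points.
    _POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

    def __init__(self, name, grades):
        self.name = name
        self._grades = list(grades)  # internal representation, free to change

    def calculate_gpa(self):
        """Interface: no arguments; returns a float, or None if nothing is graded."""
        graded = [self._POINTS[g] for g in self._grades if g in self._POINTS]
        return sum(graded) / len(graded) if graded else None


student = Student("Smith", ["B", "C"])  # illustrative grades
print(student.calculate_gpa())  # 2.5
```

A caller depends only on the name calculate_gpa and its return type. The implementation could later weight grades by credit hours without changing any calling program.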

1.3.3 Support of Multiple Views of the Data
A database typically has many users, each of whom may require a different perspective or view of the database. A view may be a subset of the database or it may contain virtual data that is derived from the database files but is not explicitly stored. Some users may not need to be aware of whether the data they refer to is stored or derived. A multiuser DBMS whose users have a variety of distinct applications must provide facilities for defining multiple views. For example, one user of the database of Figure 1.2 may be interested only in accessing and printing the transcript of each student; the view for this user is shown in Figure 1.5(a). A second user, who is interested only in checking that students have taken all the prerequisites of each course for which they register, may require the view shown in Figure 1.5(b).
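The second user's perspective can be approximated with an SQL view, sketched here with sqlite3. The view definition and the column name Prerequisite are our own, and the result has one row per prerequisite rather than the grouped layout a report might use. The view stores no data of its own; its rows are derived from COURSE and PREREQUISITE each time it is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COURSE (Course_name TEXT, Course_number TEXT);
CREATE TABLE PREREQUISITE (Course_number TEXT, Prerequisite_number TEXT);
INSERT INTO COURSE VALUES ('Database', 'CS3380'), ('Data Structures', 'CS3320');
INSERT INTO PREREQUISITE VALUES
    ('CS3380', 'CS3320'), ('CS3380', 'MATH2410'), ('CS3320', 'CS1310');

-- Virtual data: derived from the stored files, not stored itself.
CREATE VIEW COURSE_PREREQUISITES AS
    SELECT c.Course_name, c.Course_number, p.Prerequisite_number AS Prerequisite
    FROM COURSE AS c JOIN PREREQUISITE AS p ON p.Course_number = c.Course_number;
""")

view_rows = conn.execute(
    "SELECT * FROM COURSE_PREREQUISITES ORDER BY Course_number, Prerequisite"
).fetchall()
for row in view_rows:
    print(row)
# ('Data Structures', 'CS3320', 'CS1310')
# ('Database', 'CS3380', 'CS3320')
# ('Database', 'CS3380', 'MATH2410')
```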

1.3.4 Sharing of Data and Multiuser Transaction Processing
A multiuser DBMS, as its name implies, must allow multiple users to access the database at the same time. This is essential if data for multiple applications is to be integrated and maintained in a single database. The DBMS must include concurrency control software to ensure that several users trying to update the same data do so in a controlled manner so that the result of the updates is correct. For example, when several reservation clerks try to assign a seat on an airline flight, the DBMS should ensure that each seat can be accessed by only one clerk at a time for assignment to a passenger. These types of applications are generally called online transaction processing (OLTP) applications. A fundamental role of multiuser DBMS software is to ensure that concurrent transactions operate correctly and efficiently.
Figure 1.5 Two views derived from the database in Figure 1.2. (a) The TRANSCRIPT view. (b) The COURSE_PREREQUISITES view.

(a) TRANSCRIPT
Student_name  Student_transcript
              Course_number  Grade  Semester  Year  Section_id
Smith         CS1310                Fall      05    119
              MATH2410              Fall      05    112
Brown         MATH2410              Fall      04    85
              CS1310                Fall      04    92
              CS3320                Spring    05    102
              CS3380                Fall      05    135

(b) COURSE_PREREQUISITES
Course_name      Course_number  Prerequisites
Database         CS3380         CS3320
                                MATH2410
Data Structures  CS3320         CS1310

The concept of a transaction has become central to many database applications. A transaction is an executing program or process that includes one or more database accesses, such as reading or updating of database records. Each transaction is supposed to execute a logically correct database access if executed in its entirety without interference from other transactions. The DBMS must enforce several transaction properties. The isolation property ensures that each transaction appears to execute in isolation from other transactions, even though hundreds of transactions may be executing concurrently. The atomicity property ensures that either all the database operations in a transaction are executed or none are. We discuss transactions in detail in Part 5.
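The seat-assignment example and the atomicity property can be sketched on a small scale with sqlite3. The SEAT table, the seat number, and the helper assign_seat are hypothetical. The sketch uses a single connection, so it shows the all-or-nothing behavior and the "only one clerk wins" rule, but not true concurrent execution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SEAT (Seat_no TEXT PRIMARY KEY, Passenger TEXT)")
conn.execute("INSERT INTO SEAT VALUES ('12A', NULL)")
conn.commit()

def assign_seat(connection, seat_no, passenger):
    """Assign the seat only if it is still free; return True on success."""
    with connection:  # commits on success, rolls back if an exception escapes
        cur = connection.execute(
            "UPDATE SEAT SET Passenger = ? WHERE Seat_no = ? AND Passenger IS NULL",
            (passenger, seat_no),
        )
        return cur.rowcount == 1

first = assign_seat(conn, "12A", "passenger of clerk 1")
second = assign_seat(conn, "12A", "passenger of clerk 2")
print(first, second)  # True False

# Atomicity: a transaction that fails partway leaves no partial changes behind.
try:
    with conn:
        conn.execute("UPDATE SEAT SET Passenger = NULL WHERE Seat_no = '12A'")
        raise RuntimeError("simulated failure in the middle of the transaction")
except RuntimeError:
    pass

holder = conn.execute("SELECT Passenger FROM SEAT WHERE Seat_no = '12A'").fetchone()[0]
print(holder)  # passenger of clerk 1
```

Putting the "still free" test inside the UPDATE itself means the check and the assignment happen as one step, so a second clerk cannot slip in between them.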
The preceding characteristics are most important in distinguishing a DBMS from traditional file-processing software. In Section 1.6 we discuss additional features that characterize a DBMS. First, however, we categorize the different types of people who work in a database system environment.

1.4 Actors on the Scene


For a small personal database, such as the list of addresses discussed in Section 1.1, one person typically defines, constructs, and manipulates the database, and there is no sharing. However, in large organizations, many people are involved in the design, use, and maintenance of a large database with hundreds of users. In this section we identify the people whose jobs involve the day-to-day use of a large database; we call them the actors on the scene. In Section 1.5 we consider people who may be called workers behind the scene: those who work to maintain the database system environment but who are not actively interested in the database itself.

1.4.1 Database Administrators
In any organization where many people use the same resources, there is a need for a chief administrator to oversee and manage these resources. In a database environment, the primary resource is the database itself, and the secondary resource is the DBMS and related software. Administering these resources is the responsibility of the database administrator (DBA). The DBA is responsible for authorizing access to the database, coordinating and monitoring its use, and acquiring software and hardware resources as needed. The DBA is accountable for problems such as breach of security or poor system response time. In large organizations, the DBA is assisted by a staff that carries out these functions.

1.4.2 Database Designers
Database designers are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data. These tasks are mostly undertaken before the database is actually implemented and populated with data. It is the responsibility of database designers to communicate with all prospective database users in order to understand their requirements and to create a design that meets these requirements. In many cases, the designers are on the staff of the DBA and may be assigned other staff responsibilities after the database design is completed. Database designers typically interact with each potential group of users and develop views of the database that meet the data and processing requirements of these groups. Each view is then analyzed and integrated with the views of other user groups. The final database design must be capable of supporting the requirements of all user groups.

1.4.3 End Users
End users are the people whose jobs require access to the database for querying, updating, and generating reports; the database primarily exists for their use. There are several categories of end users:

■ Casual end users occasionally access the database, but they may need different information each time. They use a sophisticated database query language to specify their requests and are typically middle- or high-level managers or other occasional browsers.
■ Naive or parametric end users make up a sizable portion of database end users. Their main job function revolves around constantly querying and updating the database, using standard types of queries and updates (called canned transactions) that have been carefully programmed and tested. The tasks that such users perform are varied:
    ■ Bank tellers check account balances and post withdrawals and deposits.
    ■ Reservation clerks for airlines, hotels, and car rental companies check availability for a given request and make reservations.
    ■ Clerks at receiving stations for shipping companies enter package identifications via bar codes and descriptive information through buttons to update a central database of received and in-transit packages.
■ Sophisticated end users include engineers, scientists, business analysts, and others who thoroughly familiarize themselves with the facilities of the DBMS in order to implement their applications to meet their complex requirements.
■ Standalone users maintain personal databases by using ready-made program packages that provide easy-to-use menu-based or graphics-based interfaces. An example is the user of a tax package that stores a variety of personal financial data for tax purposes.

A typical DBMS provides multiple facilities to access a database. Naive end users need to learn very little about the facilities provided by the DBMS; they simply have to understand the user interfaces of the standard transactions designed and implemented for their use. Casual users learn only a few facilities that they may use repeatedly. Sophisticated users try to learn most of the DBMS facilities in order to achieve their complex requirements. Standalone users typically become very proficient in using a specific software package.
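
To make the idea of a canned transaction concrete, the following minimal sketch shows how a bank teller's "post withdrawal" operation might be packaged so that the teller supplies only parameters. It is an illustration only: the account table, its columns, and the function names are assumptions made for this example, and Python's built-in sqlite3 module stands in for a full multiuser DBMS.

```python
import sqlite3


def open_demo_bank():
    """Create an in-memory database with one hypothetical ACCOUNT table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " account_number INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO account VALUES (1001, 500)")
    conn.commit()
    return conn


def check_balance(conn, account_number):
    """Canned query: the user supplies only the account number."""
    row = conn.execute(
        "SELECT balance FROM account WHERE account_number = ?",
        (account_number,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no such account: {account_number}")
    return row[0]


def post_withdrawal(conn, account_number, amount):
    """Canned update: programmed and tested once, then reused with parameters."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "UPDATE account SET balance = balance - ?"
            " WHERE account_number = ? AND balance >= ?",
            (amount, account_number, amount),
        )
        if cursor.rowcount != 1:
            raise ValueError("unknown account or insufficient funds")
    return check_balance(conn, account_number)


bank = open_demo_bank()
new_balance = post_withdrawal(bank, 1001, 120)
```

The teller never writes a query; the interface only needs to collect the account number and the amount, which is why naive end users need to learn very little about the DBMS itself.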

1.4.4 System Analysts and Application Programmers (Software Engineers)

System analysts determine the requirements of end users, especially naive and parametric end users, and develop specifications for canned transactions that meet these requirements. Application programmers implement these specifications as programs; then they test, debug, document, and maintain these canned transactions. Such analysts and programmers (commonly referred to as software developers or software engineers) should be familiar with the full range of capabilities provided by the DBMS to accomplish their tasks.

1.5 Workers behind the Scene


In addition to those who design, use, and administer a database, others are associated with the design, development, and operation of the DBMS software and system environment. These persons are typically not interested in the database itself. We call them the workers behind the scene, and they include the following categories:

■ DBMS system designers and implementers design and implement the DBMS modules and interfaces as a software package. A DBMS is a very complex software system that consists of many components, or modules, including modules for implementing the catalog, processing query language, processing the interface, accessing and buffering data, controlling concurrency, and handling data recovery and security. The DBMS must interface with other system software such as the operating system and compilers for various programming languages.

■ Tool developers design and implement tools: the software packages that facilitate database modeling and design, database system design, and improved performance. Tools are optional packages that are often purchased separately. They include packages for database design, performance monitoring, natural language or graphical interfaces, prototyping, simulation, and test data generation. In many cases, independent software vendors develop and market these tools.
■ Operators and maintenance personnel (system administration personnel) are responsible for the actual running and maintenance of the hardware and software environment for the database system.

Although these categories of workers behind the scene are instrumental in making the database system available to end users, they typically do not use the database for their own purposes.


1.6 Advantages of Using the DBMS Approach

In this section we discuss some of the advantages of using a DBMS and the capabilities that a good DBMS should possess. These capabilities are in addition to the four main characteristics discussed in Section 1.3. The DBA must utilize these capabilities to accomplish a variety of objectives related to the design, administration, and use of a large multiuser database.

1.6.1 Controlling Redundancy
In traditional software development utilizing file processing, every user group maintains its own files for handling its data-processing applications. For example, consider the UNIVERSITY database example of Section 1.2; here, two groups of users might be the course registration personnel and the accounting office. In the traditional approach, each group independently keeps files on students. The accounting office keeps data on registration and related billing information, whereas the registration office keeps track of student courses and grades. Other groups may further duplicate some or all of the same data in their own files.
This redundancy in storing the same data multiple times leads to several problems. First, there is the need to perform a single logical update (such as entering data on a new student) multiple times: once for each file where student data is recorded. This leads to duplication of effort. Second, storage space is wasted when the same data is stored repeatedly, and this problem may be serious for large databases. Third, files that represent the same data may become inconsistent. This may happen because an update is applied to some of the files but not to others. Even if an update (such as adding a new student) is applied to all the appropriate files, the data concerning the student may still be inconsistent because the updates are applied independently by each user group. For example, one user group may enter a student's birthdate erroneously as 'JAN-19-1988', whereas the other user groups may enter the correct value of 'JAN-29-1988'.


In the database approach, the views of different user groups are integrated during database design. Ideally, we should have a database design that stores each logical data item (such as a student's name or birthdate) in only one place in the database. This ensures consistency and saves storage space. However, in practice, it is sometimes necessary to use controlled redundancy to improve the performance of queries. For example, we may store Student_name and Course_number redundantly in a GRADE_REPORT file (Figure 1.6(a)) because whenever we retrieve a GRADE_REPORT record, we want to retrieve the student name and course number along with the grade, student number, and section identifier. By placing all the data together, we do not have to search multiple files to collect this data. In such cases, the DBMS should have the capability to control this redundancy in order to prohibit inconsistencies among the files. This may be done by automatically checking that the Student_name-Student_number values in any GRADE_REPORT record in Figure 1.6(a) match one of the Name-Student_number values of a STUDENT record (Figure 1.2). Similarly, the Section_identifier-Course_number values in GRADE_REPORT can be checked against SECTION records. Such checks can be specified to the DBMS during database design and automatically enforced by the DBMS whenever the GRADE_REPORT file is updated. Figure 1.6(b) shows a GRADE_REPORT record that is inconsistent with the STUDENT file of Figure 1.2, which may be entered erroneously if the redundancy is not controlled.
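
One way such a check could be specified is sketched below, using Python's built-in sqlite3 module and a composite foreign key. This is an assumption-laden illustration rather than the mechanism any particular DBMS prescribes: the lowercase table and column names are invented for the example, and the redundant Course_number column is omitted to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE student (
    student_number INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    UNIQUE (student_number, name)       -- lets the pair be a foreign-key target
);
CREATE TABLE grade_report (
    student_number     INTEGER NOT NULL,
    student_name       TEXT NOT NULL,   -- stored redundantly for faster retrieval
    section_identifier INTEGER NOT NULL,
    grade              TEXT,
    FOREIGN KEY (student_number, student_name)
        REFERENCES student (student_number, name)
);
INSERT INTO student VALUES (17, 'Smith'), (8, 'Brown');
""")

# A record whose redundant name matches the STUDENT file is accepted.
conn.execute("INSERT INTO grade_report VALUES (17, 'Smith', 112, 'B')")

# A record like the one in Figure 1.6(b): student 17 is Smith, not Brown.
try:
    conn.execute("INSERT INTO grade_report VALUES (17, 'Brown', 112, 'B')")
    inconsistent_record_rejected = False
except sqlite3.IntegrityError:
    inconsistent_record_rejected = True

stored_rows = conn.execute("SELECT COUNT(*) FROM grade_report").fetchone()[0]
```

The redundancy is still there, but it is controlled: the check is declared once during design and then applied by the system on every update, instead of relying on each user group to keep its copies consistent.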

Figure 1.6
Redundant storage of Student_name and Course_number in GRADE_REPORT. (a) Consistent data. (b) Inconsistent record.

(a) GRADE_REPORT
Student_number  Student_name  Section_identifier  Course_number  Grade
17              Smith         112                 MATH2410       B
17              Smith         119                 CS1310         C
8               Brown         85                  MATH2410       A
8               Brown         92                  CS1310         A
8               Brown         102                 CS3320         B
8               Brown         135                 CS3380         A

(b) GRADE_REPORT
Student_number  Student_name  Section_identifier  Course_number  Grade
17              Brown         112                 MATH2410       B

1.6.2 Restricting Unauthorized Access
When multiple users share a large database, it is likely that most users will not be authorized to access all information in the database. For example, financial data is often considered confidential, and only authorized persons are allowed to access such data. In addition, some users may only be permitted to retrieve data, whereas others are allowed to retrieve and update. Hence, the type of access operation (retrieval or update) must also be controlled. Typically, users or user groups are given account numbers protected by passwords, which they can use to gain access to the database. A DBMS should provide a security and authorization subsystem, which the DBA uses to create accounts and to specify account restrictions. Then, the DBMS should enforce these restrictions automatically. Notice that we can apply similar controls to the DBMS software. For example, only the DBA's staff may be allowed to use certain privileged software, such as the software for creating new accounts. Similarly, parametric users may be allowed to access the database only through the canned transactions developed for their use.
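
The distinction between retrieval and update privileges can be pictured with a small, purely illustrative sketch in plain Python. It does not model any real DBMS interface: the class, its methods, and the account names are all hypothetical, and passwords are left out entirely to keep the example short.

```python
class AuthorizationSubsystem:
    """Toy stand-in for a DBMS security and authorization subsystem."""

    def __init__(self):
        # Maps (account, table) to the set of operations allowed on that table.
        self._privileges = {}

    def grant(self, account, table, *operations):
        """Used by the DBA to record what an account may do to a table."""
        self._privileges.setdefault((account, table), set()).update(operations)

    def check(self, account, table, operation):
        """Raise PermissionError unless the operation was granted."""
        if operation not in self._privileges.get((account, table), set()):
            raise PermissionError(f"{account} may not {operation} {table}")


auth = AuthorizationSubsystem()
auth.grant("registrar", "GRADE_REPORT", "retrieve", "update")
auth.grant("advisor", "GRADE_REPORT", "retrieve")  # read-only account


def is_allowed(account, table, operation):
    """Convenience wrapper that turns the check into a yes/no answer."""
    try:
        auth.check(account, table, operation)
        return True
    except PermissionError:
        return False
```

In this sketch the advisor account can retrieve grade reports but cannot update them, and an account with no grant on a table can do nothing with it. The important design point is that the check runs inside the system on every request, so the restrictions do not depend on each application remembering to apply them.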

1.6.3 Providing Persistent Storage for Program Objects

Databases can be used to provide persistent storage for program objects and data structures. This is one of the main reasons for object-oriented database systems. Programming languages typically have complex data structures, such as record types in Pascal or class definitions in C++ or Java. The values of program variables are discarded once a program terminates, unless the programmer explicitly stores them in permanent files, which often involves converting these complex structures into a format suitable for file storage. When the need arises to read this data once more, the programmer must convert from the file format to the program variable structure. Object-oriented database systems are compatible with programming languages such as C++ and Java, and the DBMS software automatically performs any necessary conversions. Hence, a complex object in C++ can be stored permanently in an object-oriented DBMS. Such an object is said to be persistent, since it survives the termination of program execution and can later be directly retrieved by another C++ program.
The persistent storage of program objects and data structures is an important function of database systems. Traditional database systems often suffered from the so-called impedance mismatch problem, since the data structures provided by the DBMS were incompatible with the programming language's data structures. Object-oriented database systems typically offer data structure compatibility with one or more object-oriented programming languages.
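
The manual conversion that the programmer must otherwise write can be illustrated with a short sketch. It uses Python rather than C++ or Java, and the Student class, the JSON file format, and the helper names are assumptions chosen for the example; it shows the hand-written conversion code, not how an object-oriented DBMS works internally.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class Student:
    """A program object whose value would normally vanish at program exit."""
    name: str
    student_number: int
    major: str


def save_student(student, path):
    """Convert the object into a file format (JSON here) and write it out."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(student), f)


def load_student(path):
    """Convert from the file format back into the program's own structure."""
    with open(path, encoding="utf-8") as f:
        return Student(**json.load(f))


path = os.path.join(tempfile.mkdtemp(), "student.json")
original = Student(name="Smith", student_number=17, major="CS")
save_student(original, path)
restored = load_student(path)  # an equal but distinct object
```

Every class that needs to outlive the program requires a pair of conversion routines like these, and they must be kept in step with the class definition. An object-oriented DBMS aims to remove that burden by performing the equivalent conversions automatically.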

1.6.4 Providing Storage Structures for Efficient Query Processing

Database systems must provide capabilities for efficiently executing queries and updates. Because the database is typically stored on disk, the DBMS must provide specialized data structures to speed up disk search for the desired records. Auxiliary files called indexes are used for this purpose. Indexes are typically based on tree data structures or hash data structures, suitably modified for disk search. In order to process the database records needed by a particular query, those records must be copied from disk to memory. Therefore, the DBMS often has a buffering module that maintains parts of the database in main memory buffers. In other cases, the DBMS may use the operating system to do the buffering of disk data.
The query processing and optimization module of the DBMS is responsible for choosing an efficient query execution plan for each query based on the existing storage structures. The choice of which indexes to create and maintain is part of physical database design and tuning, which is one of the responsibilities of the DBA staff. We discuss query processing, optimization, and tuning in detail in Chapters 15 and 16.
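
The effect of an index on the chosen execution plan can be observed with a small experiment. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement; the table contents and the index name are made up for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_number INTEGER PRIMARY KEY,"
    " name TEXT, class INTEGER, major TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(n, f"Student{n}", 1 + n % 5, "CS") for n in range(1, 1001)],
)


def plan(sql):
    """Return the optimizer's plan for a query as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)  # last column holds the description


query = "SELECT * FROM student WHERE name = 'Student500'"

plan_without_index = plan(query)   # no index on name: every record is scanned
conn.execute("CREATE INDEX idx_student_name ON student (name)")
plan_with_index = plan(query)      # the optimizer now searches via the index
```

The query text is identical in both cases; only the available storage structures changed, and the optimizer picked a different plan accordingly. This is why the choice of indexes is treated as a tuning decision rather than something application programs need to know about.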

1.6.5 Providing Backup and Recovery

A DBMS must provide facilities for recovering from hardware or software failures. The backup and recovery subsystem of the DBMS is responsible for recovery. For example, if the computer system fails in the middle of a complex update transaction, the recovery subsystem is responsible for making sure that the database is restored to the state it was in before the transaction started executing. Alternatively, the recovery subsystem could ensure that the transaction is resumed from the point at which it was interrupted so that its full effect is recorded in the database.
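
The first of these two behaviors, restoring the state that held before the transaction started, can be imitated on a small scale. The sketch below uses Python's built-in sqlite3 module; the account table is hypothetical, and a raised exception stands in for a failure, which is far milder than a real system crash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_number INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def balances():
    """Return the current balances as {account_number: balance}."""
    return dict(conn.execute("SELECT account_number, balance FROM account"))


def transfer(amount, fail_midway=False):
    """Move money from account 1 to account 2 as a single transaction."""
    with conn:  # commit if the block completes, roll back if it raises
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE account_number = 1",
            (amount,),
        )
        if fail_midway:
            raise RuntimeError("simulated failure in the middle of the update")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE account_number = 2",
            (amount,),
        )


before = balances()
try:
    transfer(30, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances()   # the half-finished debit has been undone

transfer(30)
after_success = balances()   # both updates are recorded together
```

After the simulated failure the balances are exactly what they were beforehand, so no money has disappeared; only the complete transfer leaves a visible effect.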

1.6.6 Providing Multiple User Interfaces

Because many types of users with varying levels of technical knowledge use a database, a DBMS should provide a variety of user interfaces. These include query languages for casual users, programming language interfaces for application programmers, forms and command codes for parametric users, and menu-driven interfaces and natural language interfaces for standalone users. Both forms-style interfaces and menu-driven interfaces are commonly known as graphical user interfaces (GUIs). Many specialized languages and environments exist for specifying GUIs. Capabilities for providing Web GUI interfaces to a database (or Web-enabling a database) are also quite common.

1.6.7 Representing Complex Relationships among Data
A database may include numerous varieties of data that are interrelated in many ways. Consider the example shown in Figure 1.2. The record for 'Brown' in the STUDENT file is related to four records in the GRADE_REPORT file. Similarly, each section record is related to one course record and to a number of GRADE_REPORT records: one for each student who completed that section. A DBMS must have the capability to represent a variety of complex relationships among the data, to define new relationships as they arise, and to retrieve and update related data easily and efficiently.
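
As a rough illustration of retrieving related data, the sketch below follows the relationships from a student through GRADE_REPORT records to sections. It uses Python's built-in sqlite3 module; the lowercase table names are assumptions, and the sample rows follow Figure 1.6(a).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_number INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE section (section_identifier INTEGER PRIMARY KEY, course_number TEXT);
CREATE TABLE grade_report (
    student_number     INTEGER REFERENCES student,
    section_identifier INTEGER REFERENCES section,
    grade              TEXT
);
INSERT INTO student VALUES (17, 'Smith'), (8, 'Brown');
INSERT INTO section VALUES
    (85, 'MATH2410'), (92, 'CS1310'), (102, 'CS3320'),
    (112, 'MATH2410'), (119, 'CS1310'), (135, 'CS3380');
INSERT INTO grade_report VALUES
    (17, 112, 'B'), (17, 119, 'C'),
    (8, 85, 'A'), (8, 92, 'A'), (8, 102, 'B'), (8, 135, 'A');
""")

# Follow STUDENT -> GRADE_REPORT -> SECTION for one student.
brown_courses = [
    row[0]
    for row in conn.execute("""
        SELECT sec.course_number
        FROM student AS s
        JOIN grade_report AS g ON g.student_number = s.student_number
        JOIN section AS sec ON sec.section_identifier = g.section_identifier
        WHERE s.name = 'Brown'
        ORDER BY sec.section_identifier
    """)
]
```

The query names the relationships it needs and leaves it to the system to locate the four related records, rather than having the program open and search each file itself.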

1.6.8 Enforcing Integrity Constraints
Most database applications have certain integrity constraints that must hold for the data. A DBMS should provide capabilities for defining and enforcing these constraints. The simplest type of integrity constraint involves specifying a data type for each data item. For example, in Figure 1.3, we specified that the value of the Class data item within each STUDENT record must be a one-digit integer and that the value of Name must be a string of no more than 30 alphabetic characters. To restrict the value of Class between 1 and 5 would be an additional constraint that is not shown in the current catalog. A more complex type of constraint that frequently occurs involves specifying that a record in one file must be related to records in other files. For example, in Figure 1.2, we can specify that every section record must be related to a course record. Another type of constraint specifies uniqueness on data item values, such as every course record must have a unique value for Course_number. These constraints are derived from the meaning or semantics of the data and of the miniworld it represents. It is the responsibility of the database designers to identify integrity constraints during database design. Some constraints can be specified to the DBMS and automatically enforced. Other constraints may have to be checked by update programs or at the time of data entry. For typical large applications, it is customary to call such constraints business rules.
A data item may be entered erroneously and still satisfy the specified integrity constraints. For example, if a student receives a grade of 'A' but a grade of 'C' is entered in the database, the DBMS cannot discover this error automatically because 'C' is a valid value for the Grade data type. Such data entry errors can only be discovered manually (when the student receives the grade and complains) and corrected later by updating the database. However, a grade of 'Z' would be rejected automatically by the DBMS because 'Z' is not a valid value for the Grade data type. When we discuss each data model in subsequent chapters, we will introduce rules that pertain to that model implicitly. For example, in the Entity-Relationship model in Chapter 3, a relationship must involve at least two entities. Such rules are inherent rules of the data model and are automatically assumed to guarantee the validity of the model.
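
The kinds of constraints described in this section can be declared to a relational DBMS roughly as follows. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module, the lowercase table names are invented, the set of valid grades ('A', 'B', 'C', 'D', 'F', 'I') is an assumption rather than something given in the text, and the Name check covers only the length limit, not the alphabetic restriction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE course (
    course_number TEXT PRIMARY KEY,          -- uniqueness constraint
    course_name   TEXT NOT NULL
);
CREATE TABLE section (
    section_identifier INTEGER PRIMARY KEY,
    course_number TEXT NOT NULL
        REFERENCES course (course_number)    -- every section relates to a course
);
CREATE TABLE student (
    student_number INTEGER PRIMARY KEY,
    name  TEXT NOT NULL CHECK (length(name) <= 30),
    class INTEGER NOT NULL CHECK (class BETWEEN 1 AND 5)
);
CREATE TABLE grade_report (
    student_number     INTEGER NOT NULL REFERENCES student (student_number),
    section_identifier INTEGER NOT NULL REFERENCES section (section_identifier),
    grade TEXT CHECK (grade IN ('A', 'B', 'C', 'D', 'F', 'I'))
);
INSERT INTO course VALUES ('CS1310', 'Intro to Computer Science');
INSERT INTO section VALUES (119, 'CS1310');
INSERT INTO student VALUES (17, 'Smith', 1);
""")


def is_rejected(sql):
    """Run one statement; report whether the DBMS refused it as a violation."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


bad_class      = is_rejected("INSERT INTO student VALUES (18, 'Jones', 9)")
orphan_section = is_rejected("INSERT INTO section VALUES (200, 'NOPE999')")
duplicate_key  = is_rejected("INSERT INTO course VALUES ('CS1310', 'Duplicate')")
invalid_grade  = is_rejected("INSERT INTO grade_report VALUES (17, 119, 'Z')")
wrong_but_valid_grade = is_rejected("INSERT INTO grade_report VALUES (17, 119, 'C')")
```

The first four statements are refused automatically. The last one is accepted even if the student actually earned an 'A', because 'C' is a legal grade; this is the kind of data entry error that no declared constraint can catch.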


1.6.9 Permitting Inferencing and Actions Using Rules

Some database systems provide capabilities for defining deduction rules for inferencing new information from the stored database facts. Such systems are called deductive database systems. For example, there may be complex rules in the miniworld application for determining when a student is on probation. These can be specified declaratively as rules, which when compiled and maintained by the DBMS can determine all students on probation. In a traditional DBMS, explicit procedural program code would have to be written to support such applications. But if the miniworld rules change, it is generally more convenient to change the declared deduction rules than to recode procedural programs. In today's relational database systems, it is possible to associate triggers with tables. A trigger is a form of a rule activated by updates to the table, which results in performing some additional operations to some other tables, sending messages, and so on. More involved procedures to enforce rules are popularly called stored procedures; they become a part of the overall database definition and are invoked appropriately when certain conditions are met. More powerful functionality is provided by active database systems, which provide active rules that can automatically initiate actions when certain events and conditions occur.
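
A trigger of the kind described above might look like the following sketch, written with Python's built-in sqlite3 module. The grade_change_log table, the trigger name, and the sample row are assumptions made for the example; trigger syntax differs somewhat between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grade_report (
    student_number INTEGER, section_identifier INTEGER, grade TEXT
);
CREATE TABLE grade_change_log (
    student_number INTEGER, section_identifier INTEGER,
    old_grade TEXT, new_grade TEXT
);

-- Rule: whenever a grade is changed, record the change in another table.
CREATE TRIGGER log_grade_change
AFTER UPDATE OF grade ON grade_report
FOR EACH ROW
WHEN OLD.grade <> NEW.grade
BEGIN
    INSERT INTO grade_change_log
    VALUES (OLD.student_number, OLD.section_identifier, OLD.grade, NEW.grade);
END;

INSERT INTO grade_report VALUES (17, 119, 'C');
""")

# The application only updates the grade; the DBMS fires the trigger itself.
conn.execute(
    "UPDATE grade_report SET grade = 'A'"
    " WHERE student_number = 17 AND section_identifier = 119"
)
log = conn.execute("SELECT * FROM grade_change_log").fetchall()
```

The logging rule lives in the database definition, so every program that updates a grade is covered without containing any logging code of its own.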


1.6.10 Additional Implications of Using the Database Approach
This section discusses some additional implications of using the database approach that can benefit most organizations.
Potential for Enforcing Standards. The database approach permits the DBA to define and enforce standards among database users in a large organization. This facilitates communication and cooperation among various departments, projects, and users within the organization. Standards can be defined for names and formats of data elements, display formats, report structures, terminology, and so on. The DBA can enforce standards in a centralized database environment more easily than in an environment where each user group has control of its own files and software.
Reduced Application Development Time. A prime selling feature of the database approach is that developing a new application (such as the retrieval of certain data from the database for printing a new report) takes very little time. Designing and implementing a new database from scratch may take more time than writing a single specialized file application. However, once a database is up and running, substantially less time is generally required to create new applications using DBMS facilities. Development time using a DBMS is estimated to be one-sixth to one-fourth of that for a traditional file system.
Flexibility. It may be necessary to change the structure of a database as requirements change. For example, a new user group may emerge that needs information not currently in the database. In response, it may be necessary to add a file to the database or to extend the data elements in an existing file. Modern DBMSs allow certain types of evolutionary changes to the structure of the database without affecting the stored data and the existing application programs.
Availability of Up-to-Date Information. A DBMS makes the database available to all users. As soon as one user's update is applied to the database, all other users can immediately see this update. This availability of up-to-date information is essential for many transaction-processing applications, such as reservation systems or banking databases, and it is made possible by the concurrency control and recovery subsystems of a DBMS.
Economies of Scale. The DBMS approach permits consolidation of data and applications, thus reducing the amount of wasteful overlap between activities of data-processing personnel in different projects or departments as well as redundancies among applications. This enables the whole organization to invest in more powerful processors, storage devices, or communication gear, rather than having each department purchase its own (weaker) equipment. This reduces overall costs of operation and management.


1.7 A Brief History of Database Applications



We now give a brief historical overview of the applications that use DBMSs and how these applications provided the impetus for new types of database systems.

1.7.1 Early Database Applications Using Hierarchical and Network Systems
Many early database applications maintained records in large organizations such as corporations, universities, hospitals, and banks. In many of these applications, there were large numbers of records of similar structure. For example, in a university application, similar information would be kept for each student, each course, each grade record, and so on. There were also many types of records and many interrelationships among them.
One of the main problems with early database systems was the intermixing of conceptual relationships with the physical storage and placement of records on disk. For example, the grade records of a particular student could be physically stored next to the student record. Although this provided very efficient access for the original queries and transactions that the database was designed to handle, it did not provide enough flexibility to access records efficiently when new queries and transactions were identified. In particular, new queries that required a different storage organization for efficient processing were quite difficult to implement efficiently. It was also laborious to reorganize the database when changes were made to the requirements of the application.
Another shortcoming of early systems was that they provided only programming language interfaces. This made it time-consuming and expensive to implement new queries and transactions, since new programs had to be written, tested, and debugged. Most of these database systems were implemented on large and expensive mainframe computers starting in the mid-1960s and continuing through the 1970s and 1980s. The main types of early systems were based on three main paradigms: hierarchical systems, network model based systems, and inverted file systems.


1.7.2 Providing Application Flexibility with Relational Databases
Relational databases were originally proposed to separate the physical storage of data from its conceptual representation and to provide a mathematical foundation for content storage. The relational data model also introduced high-level query languages that provided an alternative to programming language interfaces; hence, it was a lot quicker to write new queries. Relational representation of data somewhat resembles the example we presented in Figure 1.2. Relational systems were initially targeted to the same applications as earlier systems, but were meant to provide flexibility to develop new queries quickly and to reorganize the database as requirements changed.
Early experimental relational systems developed in the late 1970s and the commercial relational database management systems (RDBMS) introduced in the early 1980s were quite slow, since they did not use physical storage pointers or record placement to access related data records. With the development of new storage and indexing techniques and better query processing and optimization, their performance improved. Eventually, relational databases became the dominant type of database system for traditional database applications. Relational databases now exist on almost all types of computers, from small personal computers to large servers.

1.7.3 Object-Oriented Applications and the Need for More Complex Databases
The emergence of object-oriented programming languages in the 1980s and the need to store and share complex-structured objects led to the development of object-oriented databases (OODB). Initially, OODB were considered a competitor to relational databases, since they provided more general data structures. They also incorporated many of the useful object-oriented paradigms, such as abstract data types, encapsulation of operations, inheritance, and object identity. However, the complexity of the model and the lack of an early standard contributed to their limited use. They are now mainly used in specialized applications, such as engineering design, multimedia publishing, and manufacturing systems. Despite expectations that they will make a big impact, their overall penetration into the database products market remains under 5% today.

1.7.4 Interchanging Data on the Web for E-Commerce
The World Wide Web provides a large network of interconnected computers. Users can create documents using a Web publishing language, such as HyperText Markup Language (HTML), and store these documents on Web servers where other users (clients) can access them. Documents can be linked through hyperlinks, which are pointers to other documents. In the 1990s, electronic commerce (e-commerce) emerged as a major application on the Web. It quickly became apparent that parts of the information on e-commerce Web pages were often dynamically extracted data from DBMSs. A variety of techniques were developed to allow the interchange of data on the Web. Currently, Extensible Markup Language (XML) is considered to be the primary standard for interchanging data among various types of databases and Web pages. XML combines concepts from the models used in document systems with database modeling concepts. Chapter 27 is devoted to the discussion of XML.

1.7.5 Extending Database Capabilities for New Applications

The success of database systems in traditional applications encouraged developers of other types of applications to attempt to use them. Such applications traditionally used their own specialized file and data structures. The following are examples of these applications:


■ Scientific applications that store large amounts of data resulting from scientific experiments in areas such as high-energy physics or the mapping of the human genome.
■ Storage and retrieval of images, from scanned news or personal photographs to satellite photograph images and images from medical procedures such as x-rays or MRI (magnetic resonance imaging).
■ Storage and retrieval of videos such as movies, or video clips from news or personal digital cameras.
■ Data mining applications that analyze large amounts of data searching for the occurrences of specific patterns or relationships.


■ Spatial applications that store spatial locations of data such as weather information or maps used in geographical information systems.
■ Time series applications that store information such as economic data at regular points in time (for example, daily sales or monthly gross national product figures).

It was quickly apparent that basic relational systems were not very suitable for many of these applications, usually for one or more of the following reasons:

■ More complex data structures were needed for modeling the application than the simple relational representation.
■ New data types were needed in addition to the basic numeric and character string types.
■ New operations and query language constructs were necessary to manipulate the new data types.


■ New storage and indexing structures were needed.


This led DBMS developers to add functionality to their systems. Some functionality was general purpose, such as incorporating concepts from object-oriented databases into relational systems. Other functionality was special purpose, in the form of optional modules that could be used for specific applications. For example, users could buy a time series module to use with their relational DBMS for their time series application.
Today, most large organizations use a variety of software application packages that work closely with database back-ends. The database back-end represents one or more databases, possibly from different vendors and different data models, that maintain the data that is manipulated by these packages for supporting transactions, generating reports, and answering ad-hoc queries. One of the most commonly used systems is Enterprise Resource Planning (ERP), which is used to consolidate a variety of functional areas within an organization, including production, sales, distribution, marketing, finance, human resources, and so on. Another popular type of system is Customer Relationship Management (CRM) software that spans order processing and marketing and customer support functions. These applications are Web-enabled in that internal and external users are given a variety of Web-portal interfaces to interact with the back-end databases.


1.7.6DatabasesversusInformationRetrieval
Traditionally, database technology applies to structured and formatted data that
arises in routine applications in government, business, and industry. Database
technology is heavily used in manufacturing, retail, banking, insurance, finance, and
healthcare industries, where structured data originates as forms such as invoices or
patient registration documents. There has been a concurrent development of a field
called information retrieval (IR) that deals with books, manuscripts, and various
forms of library-based articles. Data is indexed, cataloged, and annotated using
keywords. IR is concerned with searching for material based on these keywords, and
with the many problems dealing with document processing and free-form text
processing. There has been a considerable amount of work done on searching for text
based on keywords, finding documents and ranking them based on relevance,
automatic text categorization, classification of text by topics, and so on. With the advent
of the Web and the proliferation of HTML pages running into billions, there is a
need to apply many of the IR techniques to processing data on the Web. Data on
Web pages typically contains images, text, and objects that are active and change
dynamically. Retrieval of information on the Web is a new problem that requires
techniques from databases and IR to be applied in a variety of novel combinations.
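
The keyword indexing and relevance ranking described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the sample documents, the function names (tokenize, build_index, search), and the scoring rule (the raw count of query keywords in each document) are all hypothetical choices made for this example, and real IR systems use considerably more refined ranking.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def build_index(documents):
    """Build an inverted index: keyword -> Counter of {doc_id: occurrences}."""
    index = defaultdict(Counter)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word][doc_id] += 1
    return index


def search(index, query):
    """Return document ids ranked by how often the query keywords occur."""
    scores = Counter()
    for word in tokenize(query):
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    # Highest score first; ties are broken by document id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked]


# Hypothetical sample collection, used only for illustration.
DOCS = {
    "d1": "Database systems store structured data.",
    "d2": "Information retrieval ranks documents by keywords.",
    "d3": "Web data mixes database and retrieval techniques; retrieval is hard.",
}
INDEX = build_index(DOCS)
```

Calling search(INDEX, "retrieval") ranks d3 ahead of d2 because the keyword occurs twice in d3. Unlike a database query over formatted records, the result is an ordered list of free-form documents rather than an exact set of matching rows.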

1.8 When Not to Use a DBMS


In spite of the advantages of using a DBMS, there are some situations in which a
DBMS may involve unnecessary overhead costs that would not be incurred in traditional
file processing. The overhead costs of using a DBMS are due to the following:
- High initial investment in hardware, software, and training
- The generality that a DBMS provides for defining and processing data
- Overhead for providing security, concurrency control, recovery, and
  integrity functions
Additional problems may arise if the database designers and DBA do not properly
design the database or if the database systems applications are not implemented
properly. Hence, it may be more desirable to use regular files under the following
circumstances:
- Simple, well-defined database applications that are not expected to change
- Stringent, real-time requirements for some programs that may not be met
  because of DBMS overhead
- No multiple-user access to data
Certain industries and applications have elected not to use general-purpose
DBMSs. For example, many computer-aided design tools (CAD) used by mechanical
and civil engineers have proprietary file and data management software that is
geared for the internal manipulations of drawings and 3D objects. Similarly,
communication and switching systems designed by companies like AT&T were early
manifestations of database software that was made to run very fast with
hierarchically organized data for quick access and routing of calls. Similarly, GIS
implementations often implement their own data organization schemes for efficiently
implementing functions related to processing maps, physical contours, lines, polygons,
and so on. General-purpose DBMSs are inadequate for their purpose.
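
As a rough illustration of the first and third circumstances listed above (a simple, fixed application with a single user), the following Python sketch keeps its records in a regular file. The record layout, file name, and function names are hypothetical, chosen only for this example. Note what is absent: there is no concurrency control, recovery, or security layer, which is exactly the overhead a DBMS would add.

```python
import csv
import os
import tempfile

# Hypothetical fixed record layout; the application assumes it never changes.
FIELDS = ["part_no", "description", "quantity"]


def append_record(path, record):
    """Append one record, writing the header row first if the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(record)


def total_quantity(path):
    """Scan the whole file sequentially and sum the quantity field."""
    with open(path, newline="") as f:
        return sum(int(row["quantity"]) for row in csv.DictReader(f))


def demo():
    """Write two records to a temporary file and return the summed quantity."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "parts.csv")
        append_record(path, {"part_no": "P1", "description": "bolt", "quantity": 40})
        append_record(path, {"part_no": "P2", "description": "nut", "quantity": 60})
        return total_quantity(path)
```

Here demo() returns 100. The approach stops being adequate as soon as several users must update the file at once or the record layout starts to change, which is where the generality of a DBMS begins to justify its cost.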

1.9 Summary
In this chapter we defined a database as a collection of related data, where data
means recorded facts. A typical database represents some aspect of the real world
and is used for specific purposes by one or more groups of users. A DBMS is a
generalized software package for implementing and maintaining a computerized
database. The database and software together form a database system. We identified
several characteristics that distinguish the database approach from traditional
file-processing applications, and we discussed the main categories of database users, or
the actors on the scene. We noted that in addition to database users, there are several
categories of support personnel, or workers behind the scene, in a database environment.
We presented a list of capabilities that should be provided by the DBMS software to
the DBA, database designers, and users to help them design, administer, and use a
database. Then we gave a brief historical perspective on the evolution of database
applications. We pointed out the marriage of database technology with information
retrieval technology, which will play an important role due to the popularity of the
Web. Finally, we discussed the overhead costs of using a DBMS and discussed some
situations in which it may not be advantageous to use one.

Review Questions
1.1. Define the following terms: data, database, DBMS, database system, database
catalog, program-data independence, user view, DBA, end user, canned transaction,
deductive database system, persistent object, meta-data, and
transaction-processing application.
1.2. What four main types of actions involve databases? Briefly discuss each.
1.3. Discuss the main characteristics of the database approach and how it differs
from traditional file systems.
1.4. What are the responsibilities of the DBA and the database designers?
1.5. What are the different types of database end users? Discuss the main
activities of each.
1.6. Discuss the capabilities that should be provided by a DBMS.
1.7. Discuss the differences between database systems and information retrieval
systems.
