Chapter 1
Databases and Database Users
1.1 Introduction
Databases and database technology have a major impact on the growing use of computers. It is fair to say that databases play a critical role in almost all areas where computers are used, including business, electronic commerce, engineering, medicine, law, education, and library science. The word database is so commonly used that we must begin by defining what a database is. Our initial definition is quite general.
A database is a collection of related data.¹ By data, we mean known facts that can be recorded and that have implicit meaning. For example, consider the names, telephone numbers, and addresses of the people you know. You may have recorded this data in an indexed address book or you may have stored it on a hard drive, using a personal computer and software such as Microsoft Access or Excel. This collection of related data with an implicit meaning is a database.
The preceding definition of database is quite general; for example, we may consider the collection of words that make up this page of text to be related data and hence to constitute a database. However, the common use of the term database is usually more restricted. A database has the following implicit properties:
■ A database represents some aspect of the real world, sometimes called the miniworld or the universe of discourse (UoD). Changes to the miniworld are reflected in the database.
■ A database is a logically coherent collection of data with some inherent meaning. A random assortment of data cannot correctly be referred to as a database.
■ A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.
1. We will use the word data as both singular and plural, as is common in database literature; context will determine whether it is singular or plural. In standard English, data is used for plural; datum is used for singular.
tions such as querying the database to retrieve specific data, updating the database to reflect changes in the miniworld, and generating reports from the data. Sharing a database allows multiple users and programs to access the database simultaneously.
An application program accesses the database by sending queries or requests for data to the DBMS. A query² typically causes some data to be retrieved; a transaction may cause some data to be read and some data to be written into the database.
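The difference between a query and a transaction can be sketched in a few lines of Python. This is only an illustration: the standard-library sqlite3 module stands in for the DBMS, and the table layout, the Class value for Smith, and the function names query_name and promote are assumptions made for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (Student_number INTEGER PRIMARY KEY, Name TEXT, Class INTEGER)"
)
conn.execute("INSERT INTO STUDENT VALUES (17, 'Smith', 1)")  # sample record
conn.commit()


def query_name(student_number):
    """A query: data is only retrieved from the database."""
    row = conn.execute(
        "SELECT Name FROM STUDENT WHERE Student_number = ?", (student_number,)
    ).fetchone()
    return row[0] if row else None


def promote(student_number):
    """A transaction: some data is read and some data is written, as one unit."""
    with conn:  # commits on success, rolls back if an exception is raised
        (current,) = conn.execute(
            "SELECT Class FROM STUDENT WHERE Student_number = ?", (student_number,)
        ).fetchone()
        conn.execute(
            "UPDATE STUDENT SET Class = ? WHERE Student_number = ?",
            (current + 1, student_number),
        )
    return current + 1
```

Here query_name never changes the stored data, whereas promote both reads the current Class and writes a new one.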
Other important functions provided by the DBMS include protecting the database and maintaining it over a long period of time. Protection includes system protection against hardware or software malfunction (or crashes) and security protection against unauthorized or malicious access. A typical large database may have a life cycle of many years, so the DBMS must be able to maintain the database system by allowing the system to evolve as requirements change over time.
It is not necessary to use general-purpose DBMS software to implement a computerized database. We could write our own set of programs to create and maintain the database, in effect creating our own special-purpose DBMS software. In either case, whether we use a general-purpose DBMS or not, we usually have to deploy a considerable amount of complex software. In fact, most DBMSs are very complex software systems.
To complete our initial definitions, we will call the database and DBMS software together a database system. Figure 1.1 illustrates some of the concepts we have discussed so far.
1.2 An Example
Let us consider a simple example that most readers may be familiar with: a UNIVERSITY database for maintaining information concerning students, courses, and grades in a university environment. Figure 1.2 shows the database structure and some sample data for such a database. The database is organized as five files, each of which stores data records of the same type.³ The STUDENT file stores data on each student, the COURSE file stores data on each course, the SECTION file stores data on each section of a course, the GRADE_REPORT file stores the grades that students receive in the various sections they have completed, and the PREREQUISITE file stores the prerequisites of each course.
To define this database, we must specify the structure of the records of each file by specifying the different types of data elements to be stored in each record. In Figure 1.2, each STUDENT record includes data to represent the student's Name, Student_number, Class (such as freshman or '1', sophomore or '2', and so forth), and Major.
2. The term query, originally meaning a question or an inquiry, is loosely used for all types of interactions with databases, including modifying the data.
3. We use the term file informally here. At a conceptual level, a file is a collection of records that may or may not be ordered.
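To make the idea of specifying a record structure concrete, the STUDENT file can be defined and populated as sketched below. SQL data definition, run through Python's standard sqlite3 module, is used only as one possible notation. The type choices and the sample Class value are assumptions for illustration, not part of Figure 1.2.

```python
import sqlite3

# One possible notation for a record structure: each data element gets a
# name and a type. The type choices here are assumptions for illustration.
STUDENT_DEFINITION = """
CREATE TABLE STUDENT (
    Name           TEXT,
    Student_number INTEGER,
    Class          INTEGER,   -- e.g. 1 = freshman, 2 = sophomore
    Major          TEXT
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(STUDENT_DEFINITION)  # defining the file structure

# Storing an actual record in the file (sample values).
conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", ("Smith", 17, 1, "CS"))

stored = conn.execute(
    "SELECT Name, Student_number, Class, Major FROM STUDENT"
).fetchall()
```

The definition is given once, before any data is stored; every record added afterwards has to fit it.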
Figure 1.1: A simplified database system environment. (Diagram components: Users/Programmers; Application Programs/Queries; Software to Process Queries/Programs; Software to Access Stored Data; Stored Database Definition (Meta-Data).)
Figure 1.2: A database that stores student and course information.
(-- marks a value that is not legible in the source.)

STUDENT
Name    Student_number  Class  Major
Smith   17              --     CS
Brown   --              --     CS

COURSE
Course_name            Course_number  Credit_hours  Department
--                     CS1310         --            CS
Data Structures        CS3320         --            CS
Discrete Mathematics   MATH2410       --            MATH
Database               CS3380         --            CS

SECTION
Section_identifier  Course_number  Semester  Year  Instructor
85                  MATH2410       Fall      04    King
92                  CS1310         Fall      04    Anderson
102                 CS3320         Spring    05    Knuth
112                 MATH2410       Fall      05    Chang
119                 CS1310         Fall      05    Anderson
135                 CS3380         Fall      05    Stone

GRADE_REPORT
Student_number  Section_identifier  Grade
17              112                 --
17              119                 --
--              85                  --
--              92                  --
--              102                 --
--              135                 --

PREREQUISITE
Course_number  Prerequisite_number
CS3380         CS3320
CS3380         MATH2410
CS3320         CS1310
1.3 Characteristics of the Database Approach
A number of characteristics distinguish the database approach from the traditional approach of programming with files. In traditional file processing, each user defines and implements the files needed for a specific software application as part of programming the application. For example, one user, the grade reporting office, may keep a file on students and their grades. Programs to print a student's transcript and to enter new grades into the file are implemented as part of the application. A second user, the accounting office, may keep track of students' fees and their payments. Although both users are interested in data about students, each user maintains separate files, and programs to manipulate these files, because each requires some data not available from the other user's files. This redundancy in defining and storing data results in wasted storage space and in redundant efforts to maintain common up-to-date data.
In the database approach, a single repository of data is maintained that is defined once and then accessed by various users. In file systems, each application is free to name data elements independently. In contrast, in a database, the names or labels of data are defined once, and used repeatedly by queries, transactions, and applications. The main characteristics of the database approach versus the file-processing approach are the following:
■ Self-describing nature of a database system
■ Insulation between programs and data, and data abstraction
■ Support of multiple views of the data
■ Sharing of data and multiuser transaction processing
We describe each of these characteristics in a separate section. We will discuss additional characteristics of database systems in Sections 1.6 through 1.8.
1.3.1 Self-Describing Nature of a Database System
A fundamental characteristic of the database approach is that the database system contains not only the database itself but also a complete definition or description of the database structure and constraints. This definition is stored in the DBMS catalog, which contains information such as the structure of each file, the type and storage format of each data item, and various constraints on the data. The information stored in the catalog is called meta-data, and it describes the structure of the primary database (Figure 1.1).
The catalog is used by the DBMS software and also by database users who need information about the database structure. A general-purpose DBMS software package is not written for a specific database application. Therefore, it must refer to the catalog to know the structure of the files in a specific database, such as the type and format of data it will access. The DBMS software must work equally well with any number of database applications (for example, a university database, a banking database, or a company database) as long as the database definition is stored in the catalog.
In traditional file processing, data definition is typically part of the application programs themselves. Hence, these programs are constrained to work with only one specific database, whose structure is declared in the application programs. For example, an application program written in C++ may have struct or class declarations, and a COBOL program has data division statements to define its files. Whereas file-processing software can access only specific databases, DBMS software can access diverse databases by extracting the database definitions from the catalog and then using these definitions.
For the example shown in Figure 1.2, the DBMS catalog will store the definitions of all the files shown. Figure 1.3 shows some sample entries in a database catalog. These definitions are specified by the database designer prior to creating the actual database and are stored in the catalog. Whenever a request is made to access, say, the Name of a STUDENT record, the DBMS software refers to the catalog to determine the structure of the STUDENT file and the position and size of the Name data item within a STUDENT record. By contrast, in a typical file-processing application, the file structure and, in the extreme case, the exact location of Name within a STUDENT record are already coded within each program that accesses this data item.
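The role of the catalog can be sketched with a small generic access routine. This is a sketch under assumptions: SQLite, through Python's standard sqlite3 module, stands in for the DBMS, its PRAGMA table_info command plays the part of the catalog lookup, and the ACCOUNT table and the helper names describe and dump are hypothetical.

```python
import sqlite3


def describe(conn, table):
    """Return (column name, declared type) pairs read from the DBMS catalog."""
    # PRAGMA table_info exposes SQLite's catalog entry for one table.
    # Table names are interpolated directly; acceptable only in a sketch.
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]


def dump(conn, table):
    """Generic access routine: it learns the record structure at run time."""
    columns = [name for name, _ in describe(conn, table)]
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    return [dict(zip(columns, row)) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Name CHAR(30), Student_number CHAR(4))")
conn.execute("CREATE TABLE ACCOUNT (Account_id INTEGER, Balance REAL)")  # hypothetical
conn.execute("INSERT INTO STUDENT VALUES ('Smith', '17')")
conn.execute("INSERT INTO ACCOUNT VALUES (1, 250.0)")
```

The same dump routine serves a university table and a banking table because nothing about either structure is coded into it.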
In traditional file processing, the structure of data files is embedded in the application programs, so any changes to the structure of a file may require changing all programs that access that file. By contrast, DBMS access programs do not require such changes in most cases. The structure of data files is stored in the DBMS catalog separately from the access programs. We call this property program-data independence.
Figure 1.3: An example of a database catalog for the database in Figure 1.2.
(-- marks a value that is not legible in the source.)

RELATIONS
Relation_name   No_of_columns
STUDENT         --
COURSE          --
SECTION         --
GRADE_REPORT    --
PREREQUISITE    --

COLUMNS
Column_name          Data_type       Belongs_to_relation
Name                 Character(30)   STUDENT
Student_number       Character(4)    STUDENT
Class                Integer(1)      STUDENT
Major                Major_type      STUDENT
Course_name          Character(10)   COURSE
Course_number        XXXXNNNN        COURSE
Prerequisite_number  XXXXNNNN        PREREQUISITE
For example, a file access program may be written in such a way that it can access only STUDENT records of the structure shown in Figure 1.4. If we want to add another piece of data to each STUDENT record, say the Birth_date, such a program will no longer work and must be changed. By contrast, in a DBMS environment, we only need to change the description of STUDENT records in the catalog (Figure 1.3) to reflect the inclusion of the new data item Birth_date; no programs are changed. The next time a DBMS program refers to the catalog, the new structure of STUDENT records will be accessed and used.
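The Birth_date scenario can be tried out directly. The sketch below assumes SQLite (Python's standard sqlite3 module) as the DBMS; the function name student_name and the sample record are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (Name TEXT, Student_number TEXT, Class INTEGER, Major TEXT)"
)
conn.execute("INSERT INTO STUDENT VALUES ('Smith', '17', 1, 'CS')")  # sample record


def student_name(student_number):
    """Access program: refers to data items by name, not by position."""
    row = conn.execute(
        "SELECT Name FROM STUDENT WHERE Student_number = ?", (student_number,)
    ).fetchone()
    return row[0]


before = student_name("17")

# Only the catalog description of STUDENT changes; student_name() is untouched.
conn.execute("ALTER TABLE STUDENT ADD COLUMN Birth_date TEXT")

after = student_name("17")
```

The access program returns the same result before and after the structural change, which is the behavior that program-data independence promises.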
In some types of database systems, such as object-oriented and object-relational systems (see Chapters 20 through 22), users can define operations on data as part of the database definitions. An operation (also called a function or method) is specified in two parts. The interface (or signature) of an operation includes the operation name and the data types of its arguments (or parameters). The implementation (or method) of the operation is specified separately and can be changed without affecting the interface. User application programs can operate on the data by invoking these operations through their names and arguments, regardless of how the operations are implemented. This may be termed program-operation independence.
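The split between interface and implementation can be illustrated in an ordinary programming language. The following Python sketch is an analogy, not the mechanism of any particular object database; the operation GradeAverage and both of its implementations are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class GradeAverage(ABC):
    """Interface (signature): operation name plus argument and result types."""

    @abstractmethod
    def calculate(self, grade_points: Sequence[float]) -> float:
        ...


class LoopAverage(GradeAverage):
    """One implementation: an explicit accumulation loop."""

    def calculate(self, grade_points: Sequence[float]) -> float:
        total = 0.0
        for points in grade_points:
            total += points
        return total / len(grade_points)


class BuiltinAverage(GradeAverage):
    """A replacement implementation with the same interface."""

    def calculate(self, grade_points: Sequence[float]) -> float:
        return sum(grade_points) / len(grade_points)


def report(operation: GradeAverage, grade_points: Sequence[float]) -> str:
    """Application code: invokes the operation by name and arguments only."""
    return f"GPA {operation.calculate(grade_points):.2f}"
```

Swapping LoopAverage for BuiltinAverage requires no change to report, because report depends only on the interface.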
The characteristic that allows program-data independence and program-operation independence is called data abstraction. A DBMS provides users with a conceptual representation of data that does not include many of the details of how the data is stored or how the operations are implemented. Informally, a data model is a type of data abstraction that is used to provide this conceptual representation. The data model uses logical concepts, such as objects, their properties, and their interrelationships, that may be easier for most users to understand than computer storage concepts. Hence, the data model hides storage and implementation details that are not of interest to most database users.
For example, reconsider Figures 1.2 and 1.3. The internal implementation of a file may be defined by its record length (the number of characters, or bytes, in each record), and each data item may be specified by its starting byte within a record and its length in bytes. The STUDENT record would thus be represented as shown in Figure 1.4. But a typical database user is not concerned with the location of each data item within a record or its length; rather, the user is concerned that when a reference is made to Name of STUDENT, the correct value is returned. A conceptual representation of the STUDENT records is shown in Figure 1.2. Many other details of file storage organization, such as the access paths specified on a file, can be hidden from database users by the DBMS; we discuss storage details in Chapters 13 and 14.
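A fixed-length record of this kind can be sketched as follows. The layout uses the positions given for Name (byte 1, length 30) and Student_number (byte 31, length 4); the remaining data items are left out, and the helper names pack and get are assumptions for illustration.

```python
# Catalog-style layout: data item -> (starting position, length in bytes).
# Positions are 1-based, as in the description above. Only Name and
# Student_number are laid out here; the other items are omitted.
LAYOUT = {
    "Name": (1, 30),
    "Student_number": (31, 4),
}


def pack(values):
    """Build a fixed-length record by padding each item to its declared length."""
    record = bytearray(b" " * sum(length for _, length in LAYOUT.values()))
    for item, (start, length) in LAYOUT.items():
        data = values[item].encode("ascii")[:length].ljust(length)
        record[start - 1 : start - 1 + length] = data
    return bytes(record)


def get(record, item):
    """Return one data item by name; the byte arithmetic stays hidden here."""
    start, length = LAYOUT[item]
    return record[start - 1 : start - 1 + length].decode("ascii").rstrip()


record = pack({"Name": "Smith", "Student_number": "17"})
```

A caller writes get(record, "Name") and never sees a byte offset, which is the kind of detail a conceptual representation is meant to hide.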
Figure 1.4: Internal storage format for a STUDENT record, based on the database catalog in Figure 1.3.
(-- marks a value that is not legible in the source.)

Data Item Name   Starting Position in Record   Length in Characters (bytes)
Name             1                             30
Student_number   31                            4
Class            --                            --
Major            --                            --
Figure 1.5: Two views derived from the database in Figure 1.2. (a) The TRANSCRIPT view. (b) The COURSE_PREREQUISITES view.
(-- marks a value that is not legible in the source.)

(a) TRANSCRIPT
Student_name   Student_transcript
               Course_number  Grade  Semester  Year  Section_id
Smith          CS1310         --     Fall      05    119
               MATH2410       --     Fall      05    112
Brown          MATH2410       --     Fall      04    85
               CS1310         --     Fall      04    92
               CS3320         --     Spring    05    102
               CS3380         --     Fall      05    135

(b) COURSE_PREREQUISITES
Course_name       Course_number   Prerequisites
Database          CS3380          CS3320
                                  MATH2410
Data Structures   CS3320          CS1310
1.4 Actors on the Scene
1.4.1 Database Administrators
1.4.2 Database Designers
Database designers are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data. These tasks are mostly undertaken before the database is actually implemented and populated with data. It is the responsibility of database designers to communicate with all prospective database users in order to understand their requirements and to create a design that meets these requirements. In many cases, the designers are on the staff of the DBA and may be assigned other staff responsibilities after the database design is completed. Database designers typically interact with each potential group of users and develop views of the database that meet the data and processing requirements of these groups. Each view is then analyzed and integrated with the views of other user groups. The final database design must be capable of supporting the requirements of all user groups.
1.4.3 End Users
End users are the people whose jobs require access to the database for querying, updating, and generating reports; the database primarily exists for their use. There are several categories of end users:
Clerks at receiving stations for shipping companies enter package identifications via bar codes and descriptive information through buttons to update a central database of received and in-transit packages.
1.6 Advantages of Using the DBMS Approach
1.6.1 Controlling Redundancy
In traditional software development utilizing file processing, every user group maintains its own files for handling its data-processing applications. For example, consider the UNIVERSITY database example of Section 1.2; here, two groups of users might be the course registration personnel and the accounting office. In the traditional approach, each group independently keeps files on students. The accounting office keeps data on registration and related billing information, whereas the registration office keeps track of student courses and grades. Other groups may further duplicate some or all of the same data in their own files.
This redundancy in storing the same data multiple times leads to several problems. First, there is the need to perform a single logical update, such as entering data on a new student, multiple times: once for each file where student data is recorded. This leads to duplication of effort. Second, storage space is wasted when the same data is stored repeatedly, and this problem may be serious for large databases. Third, files that represent the same data may become inconsistent. This may happen because an update is applied to some of the files but not to others. Even if an update, such as adding a new student, is applied to all the appropriate files, the data concerning the student may still be inconsistent because the updates are applied independently by each user group. For example, one user group may enter a student's birthdate erroneously as 'JAN-19-1988', whereas the other user groups may enter the correct value of 'JAN-29-1988'.
In the database approach, the views of different user groups are integrated during database design. Ideally, we should have a database design that stores each logical data item, such as a student's name or birthdate, in only one place in the database. This ensures consistency and saves storage space. However, in practice, it is sometimes necessary to use controlled redundancy to improve the performance of queries. For example, we may store Student_name and Course_number redundantly in a GRADE_REPORT file (Figure 1.6(a)) because whenever we retrieve a GRADE_REPORT record, we want to retrieve the student name and course number along with the grade, student number, and section identifier. By placing all the data together, we do not have to search multiple files to collect this data. In such cases, the DBMS should have the capability to control this redundancy in order to prohibit inconsistencies among the files. This may be done by automatically checking that the Student_name-Student_number values in any GRADE_REPORT record in Figure 1.6(a) match one of the Name-Student_number values of a STUDENT record (Figure 1.2). Similarly, the Section_identifier-Course_number values in GRADE_REPORT can be checked against SECTION records. Such checks can be specified to the DBMS during database design and automatically enforced by the DBMS whenever the GRADE_REPORT file is updated. Figure 1.6(b) shows a GRADE_REPORT record that is inconsistent with the STUDENT file of Figure 1.2, which may be entered erroneously if the redundancy is not controlled.
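The kind of automatic check described above can be sketched in a few lines of Python. This is a simplified stand-in for what a DBMS would enforce internally: the function name check_grade_report is hypothetical, and the student number used for Brown is an assumption made for the example.

```python
# (Name, Student_number) pairs from the STUDENT file. Smith/17 follows the text;
# Brown's number is assumed here for illustration.
STUDENT = {("Smith", 17), ("Brown", 8)}


def check_grade_report(record):
    """Accept a GRADE_REPORT record only if its redundant name matches STUDENT."""
    pair = (record["Student_name"], record["Student_number"])
    if pair not in STUDENT:
        raise ValueError(f"inconsistent Student_name/Student_number: {pair}")
    return record


consistent = {
    "Student_number": 17,
    "Student_name": "Smith",
    "Section_identifier": 112,
    "Course_number": "MATH2410",
}
# Same record with the wrong name: student 17 stored as 'Brown'.
inconsistent = dict(consistent, Student_name="Brown")
```

Calling check_grade_report(inconsistent) raises an error, so the mismatched record is rejected before it reaches the file.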
Figure 1.6: Redundant storage of Student_name and Course_number in GRADE_REPORT. (a) Consistent data. (b) Inconsistent record.
(-- marks a value that is not legible in the source.)

(a) GRADE_REPORT
Student_number  Student_name  Section_identifier  Course_number  Grade
17              Smith         112                 MATH2410       --
17              Smith         119                 CS1310         --
--              Brown         85                  MATH2410       --
--              Brown         92                  CS1310         --
--              Brown         102                 CS3320         B
--              Brown         135                 CS3380         A

(b) GRADE_REPORT
Student_number  Student_name  Section_identifier  Course_number  Grade
17              Brown         112                 MATH2410       --

1.6.2 Restricting Unauthorized Access
When multiple users share a large database, it is likely that most users will not be authorized to access all information in the database. For example, financial data is often considered confidential, and only authorized persons are allowed to access such data. In addition, some users may only be permitted to retrieve data, whereas others are allowed to retrieve and update. Hence, the type of access operation (retrieval or update) must also be controlled. Typically, users or user groups are given account numbers protected by passwords, which they can use to gain access to the database. A DBMS should provide a security and authorization subsystem, which the DBA uses to create accounts and to specify account restrictions. Then, the DBMS should enforce these restrictions automatically. Notice that we can apply similar controls to the DBMS software. For example, only the DBA's staff may be allowed to use certain privileged software, such as the software for creating new accounts. Similarly, parametric users may be allowed to access the database only through the canned transactions developed for their use.
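A very small model of account restrictions is sketched below. It is not how any particular DBMS implements authorization: the account names, the PRIVILEGES table, and the functions authorize, update_grade, and read_grade are all hypothetical.

```python
# Account -> set of permitted operations. Account names are hypothetical.
PRIVILEGES = {
    "registrar": {"retrieve", "update"},
    "advisor": {"retrieve"},
}


def authorize(account, operation):
    """Enforce the restriction automatically before any access is performed."""
    if operation not in PRIVILEGES.get(account, set()):
        raise PermissionError(f"{account!r} may not {operation}")


def update_grade(account, grades, key, grade):
    """An update operation: allowed only for accounts holding 'update'."""
    authorize(account, "update")
    grades[key] = grade


def read_grade(account, grades, key):
    """A retrieval operation: allowed only for accounts holding 'retrieve'."""
    authorize(account, "retrieve")
    return grades[key]
```

Every access path goes through authorize, so a retrieve-only account cannot update data even if the calling program tries to.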
copied from disk to memory. Therefore, the DBMS often has a buffering module that maintains parts of the database in main memory buffers. In other cases, the DBMS may use the operating system to do the buffering of disk data.
The query processing and optimization module of the DBMS is responsible for choosing an efficient query execution plan for each query based on the existing storage structures. The choice of which indexes to create and maintain is part of physical database design and tuning, which is one of the responsibilities of the DBA staff. We discuss the query processing, optimization, and tuning in detail in Chapters 15 and 16.
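The effect of an index on the chosen execution plan can be observed in a small experiment. The sketch below assumes SQLite (Python's standard sqlite3 module) and its EXPLAIN QUERY PLAN command, which is specific to SQLite; the index name section_course_idx and the helper plan are illustrative, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SECTION (Section_identifier INTEGER, Course_number TEXT, Instructor TEXT)"
)
conn.executemany(
    "INSERT INTO SECTION VALUES (?, ?, ?)",
    [(85, "MATH2410", "King"), (92, "CS1310", "Anderson"), (102, "CS3320", "Knuth")],
)


def plan(sql):
    """Return the optimizer's chosen plan as text (SQLite-specific command)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


QUERY = "SELECT Instructor FROM SECTION WHERE Course_number = 'CS3320'"

plan_without_index = plan(QUERY)  # expected to describe a full scan of SECTION

# Physical design decision: maintain an index on Course_number.
conn.execute("CREATE INDEX section_course_idx ON SECTION (Course_number)")

plan_with_index = plan(QUERY)  # expected to mention section_course_idx
```

The query text is identical in both runs; only the available storage structures differ, and the optimizer adjusts its plan accordingly.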
1.6.7 Representing Complex Relationships among Data
A database may include numerous varieties of data that are interrelated in many ways. Consider the example shown in Figure 1.2. The record for 'Brown' in the STUDENT file is related to four records in the GRADE_REPORT file. Similarly, each section record is related to one course record and to a number of GRADE_REPORT records, one for each student who completed that section. A DBMS must have the capability to represent a variety of complex relationships among the data, to define new relationships as they arise, and to retrieve and update related data easily and efficiently.
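The relationship between a STUDENT record and its GRADE_REPORT records can be followed with a join. The sketch below assumes SQLite through Python's standard sqlite3 module; the section identifiers follow Figure 1.2, while the student number used for Brown and the function name sections_of are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE STUDENT (Student_number INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE GRADE_REPORT (Student_number INTEGER, Section_identifier INTEGER);
    -- Brown's student number (8) is an assumption for this sketch.
    INSERT INTO STUDENT VALUES (17, 'Smith'), (8, 'Brown');
    INSERT INTO GRADE_REPORT VALUES
        (17, 112), (17, 119), (8, 85), (8, 92), (8, 102), (8, 135);
    """
)


def sections_of(name):
    """Follow the STUDENT to GRADE_REPORT relationship through the shared number."""
    rows = conn.execute(
        """SELECT g.Section_identifier
           FROM STUDENT AS s JOIN GRADE_REPORT AS g
             ON g.Student_number = s.Student_number
           WHERE s.Name = ?
           ORDER BY g.Section_identifier""",
        (name,),
    )
    return [section for (section,) in rows]
```

Here sections_of("Brown") returns the four related sections, found through the shared Student_number value rather than through any physical link between records.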
1.6.8 Enforcing Integrity Constraints
Most database applications have certain integrity constraints that must hold for the data. A DBMS should provide capabilities for defining and enforcing these constraints.
1.6.10 Additional Implications of Using the Database Approach
This section discusses some additional implications of using the database approach that can benefit most organizations.
Potential for Enforcing Standards. The database approach permits the DBA to define and enforce standards among database users in a large organization. This facilitates communication and cooperation among various departments, projects, and users within the organization. Standards can be defined for names and formats of data elements, display formats, report structures, terminology, and so on. The DBA can enforce standards in a centralized database environment more easily than in an environment where each user group has control of its own files and software.
Reduced Application Development Time. A prime selling feature of the database approach is that developing a new application, such as the retrieval of certain data from the database for printing a new report, takes very little time. Designing and implementing a new database from scratch may take more time than writing a single specialized file application. However, once a database is up and running, substantially less time is generally required to create new applications using DBMS facilities. Development time using a DBMS is estimated to be one-sixth to one-fourth of that for a traditional file system.
Flexibility. It may be necessary to change the structure of a database as requirements change. For example, a new user group may emerge that needs information not currently in the database. In response, it may be necessary to add a file to the database or to extend the data elements in an existing file. Modern DBMSs allow certain types of evolutionary changes to the structure of the database without affecting the stored data and the existing application programs.
Availability of Up-to-Date Information. A DBMS makes the database available to all users. As soon as one user's update is applied to the database, all other users can immediately see this update. This availability of up-to-date information is essential for many transaction-processing applications, such as reservation systems or banking databases, and it is made possible by the concurrency control and recovery subsystems of a DBMS.
Economies of Scale. The DBMS approach permits consolidation of data and applications, thus reducing the amount of wasteful overlap between activities of data-processing personnel in different projects or departments as well as redundancies among applications. This enables the whole organization to invest in more powerful processors, storage devices, or communication gear, rather than having each department purchase its own (weaker) equipment. This reduces overall costs of operation and management.
1.7 A Brief History of Database Applications
We now give a brief historical overview of the applications that use DBMSs and how these applications provided the impetus for new types of database systems.
1.7.1 Early Database Applications Using Hierarchical and Network Systems
Many early database applications maintained records in large organizations such as corporations, universities, hospitals, and banks. In many of these applications, there were large numbers of records of similar structure. For example, in a university application, similar information would be kept for each student, each course, each grade record, and so on. There were also many types of records and many interrelationships among them.
One of the main problems with early database systems was the intermixing of conceptual relationships with the physical storage and placement of records on disk. For example, the grade records of a particular student could be physically stored next to the student record. Although this provided very efficient access for the original queries and transactions that the database was designed to handle, it did not provide enough flexibility to access records efficiently when new queries and transactions were identified. In particular, new queries that required a different storage organization for efficient processing were quite difficult to implement efficiently. It was also laborious to reorganize the database when changes were made to the requirements of the application.
Another shortcoming of early systems was that they provided only programming language interfaces. This made it time-consuming and expensive to implement new queries and transactions, since new programs had to be written, tested, and debugged. Most of these database systems were implemented on large and expensive mainframe computers starting in the mid-1960s and continuing through the 1970s and 1980s. The main types of early systems were based on three main paradigms: hierarchical systems, network model based systems, and inverted file systems.
1.7.2 Providing Application Flexibility with Relational Databases
Relational databases were originally proposed to separate the physical storage of data from its conceptual representation and to provide a mathematical foundation for content storage. The relational data model also introduced high-level query languages that provided an alternative to programming language interfaces; hence, it was a lot quicker to write new queries. Relational representation of data somewhat resembles the example we presented in Figure 1.2. Relational systems were initially targeted to the same applications as earlier systems, but were meant to provide flexibility to develop new queries quickly and to reorganize the database as requirements changed.
Early experimental relational systems developed in the late 1970s and the commercial relational database management systems (RDBMS) introduced in the early 1980s were quite slow, since they did not use physical storage pointers or record placement to access related data records. With the development of new storage and indexing techniques and better query processing and optimization, their performance improved. Eventually, relational databases became the dominant type of database system for traditional database applications. Relational databases now exist on almost all types of computers, from small personal computers to large servers.
■ Scientific applications that store large amounts of data resulting from scientific experiments in areas such as high-energy physics or the mapping of the human genome.
■ Storage and retrieval of images, from scanned news or personal photographs to satellite photograph images and images from medical procedures such as x-rays or MRI (magnetic resonance imaging).
■ Storage and retrieval of videos, such as movies, or video clips from news or personal digital cameras.
■ Data mining applications that analyze large amounts of data searching for the occurrences of specific patterns or relationships.
■ Spatial applications that store spatial locations of data such as weather information or maps used in geographical information systems.
■ Time series applications that store information such as economic data at regular points in time (for example, daily sales or monthly gross national product figures).
It was quickly apparent that basic relational systems were not very suitable for many of these applications, usually for one or more of the following reasons:
■ More complex data structures were needed for modeling the application than the simple relational representation.
■ New data types were needed in addition to the basic numeric and character string types.
■ New operations and query language constructs were necessary to manipulate the new data types.
1.7.6 Databases versus Information Retrieval
Traditionally, database technology applies to structured and formatted data that arises in routine applications in government, business, and industry. Database technology is heavily used in manufacturing, retail, banking, insurance, finance, and health care industries, where structured data originates as forms such as invoices or patient registration documents. There has been a concurrent development of a field called information retrieval (IR) that deals with books, manuscripts, and various forms of library-based articles. Data is indexed, cataloged, and annotated using keywords. IR is concerned with searching for material based on these keywords, and with the many problems dealing with document processing and free-form text processing. There has been a considerable amount of work done on searching for text based on keywords, finding documents and ranking them based on relevance, automatic text categorization, classification of text by topics, and so on. With the advent of the Web and the proliferation of HTML pages running into billions, there is a need to apply many of the IR techniques to processing data on the Web. Data on Web pages typically contains images, text, and objects that are active and change dynamically. Retrieval of information on the Web is a new problem that requires techniques from databases and IR to be applied in a variety of novel combinations.
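Keyword indexing and relevance ranking, as described above, can be sketched with a small inverted index. The Python example below is a minimal illustration under stated assumptions: the three sample documents, the function names build_index and search, and the scoring rule (count of distinct query keywords present in a document) are all hypothetical choices, far simpler than the ranking methods used by real IR systems.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase keyword to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?")].add(doc_id)
    return index

def search(index, query):
    """Rank documents by how many distinct query keywords they contain."""
    scores = defaultdict(int)
    for word in set(query.lower().split()):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical sample documents, for illustration only.
docs = {
    "d1": "Relational database systems store structured data.",
    "d2": "Information retrieval ranks text documents by relevance.",
    "d3": "Web pages mix text, images, and structured data.",
}
index = build_index(docs)
results = search(index, "structured text data")
```

Here d3 matches all three query keywords, d1 matches two, and d2 matches one, so the documents are returned in that order. A query over structured data would instead test exact column values, which is one way to see the difference between the two fields.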
1.9 Summary
In this chapter we defined a database as a collection of related data, where data means recorded facts. A typical database represents some aspect of the real world and is used for specific purposes by one or more groups of users. A DBMS is a generalized software package for implementing and maintaining a computerized database. The database and software together form a database system. We identified several characteristics that distinguish the database approach from traditional file-processing applications, and we discussed the main categories of database users, or the actors on the scene. We noted that in addition to database users, there are several categories of support personnel, or workers behind the scene, in a database environment.
We presented a list of capabilities that should be provided by the DBMS software to the DBA, database designers, and users to help them design, administer, and use a database. Then we gave a brief historical perspective on the evolution of database applications. We pointed out the marriage of database technology with information retrieval technology, which will play an important role due to the popularity of the Web. Finally, we discussed the overhead costs of using a DBMS and discussed some situations in which it may not be advantageous to use one.
Review Questions
1.1. Define the following terms: data, database, DBMS, database system, database catalog, program-data independence, user view, DBA, end user, canned transaction, deductive database system, persistent object, meta-data, and transaction-processing application.
1.2. What four main types of actions involve databases? Briefly discuss each.
1.5. What are the different types of database end users? Discuss the main activities of each.
1.6. Discuss the capabilities that should be provided by a DBMS.