
Introduction to Database Systems

What Is a DBMS?
A database is a very large, integrated collection of data.
Models real-world enterprise.
Entities (e.g., students, courses)
Relationships (e.g., Madonna is taking CS564)

A Database Management System (DBMS) is a software package designed to store and manage databases.

Historical Perspective
Early 1960s
Integrated Data Store, the first general-purpose DBMS, designed by Charles Bachman at GE
Formed basis for network data model
Bachman received Turing Award in 1973 for his work in database area

Historical Perspective
Late 1960s
IBM developed Information Management System (IMS), used even today in many major installations
IMS formed the basis for hierarchical data model
American Airlines and IBM jointly developed SABRE for making airline reservations
SABRE is used today to populate Web-based travel services such as Travelocity

Historical Perspective

1970
Edgar Codd, at IBM's San Jose Research Laboratory, proposed the relational data model.
It sparked the rapid development of several DBMSs based on the relational model, along with a rich body of theoretical results that placed the field on a firm foundation.
Codd won the 1981 Turing Award.
Database systems matured as an academic discipline.
The benefits of DBMS were widely recognized, and the use of DBMSs for managing corporate data became standard practice.

Historical Perspective
1980s
Relational data model consolidated its position as dominant DBMS paradigm, and database systems continued to gain widespread use
SQL query language, developed as part of IBM's System R project, is now the standard query language
SQL was standardized in late 1980s, and current standard SQL:1999 was adopted by ANSI and ISO

Historical Perspective

Late 1980s till 1990s
Considerable research into more powerful query language and richer data model, with emphasis on supporting complex analysis of data from all parts of an enterprise
Several vendors, e.g., IBM's DB2, Oracle 8, Informix UDS, extended their systems with the ability to store new data types such as images and text, and to ask more complex queries
Data warehouses have been developed by many vendors to consolidate data from several databases, and for carrying out specialized analysis

File Systems vs DBMS


Managing data directly in files means we:
Must write special programs to answer each question a user may want to ask about data
Must protect data from inconsistent changes made by different users accessing data concurrently
Must cope with system crashes to ensure data consistency
Need to enforce security policies in which different users have permission to access different subsets of the data

Why Use a DBMS?


Data independence (see next page) and efficient access.
Reduced application development time.
Data integrity and security.
Uniform data administration.
Concurrent access, recovery from crashes.

Program-data dependence --- Three file processing systems at Some Company
File descriptions are stored within each application program that accesses a given file. Any change to a file structure requires changes to the file descriptions for all programs that access the file.
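
A minimal sketch of this dependence, assuming two hypothetical programs ("billing" and "shipping") that each embed their own copy of a fixed-width record description. The layouts, names, and sample record are invented for illustration and are not taken from the slide's figure.

```python
import struct

# Hypothetical record layout: 4-byte customer id, 10-byte name.
# Each program carries its own copy of the file description.
BILLING_LAYOUT = "<i10s"
SHIPPING_LAYOUT = "<i10s"


def write_record(layout, cust_id, name):
    """Encode one record using the caller's file description."""
    return struct.pack(layout, cust_id, name.encode("ascii"))


def read_name(layout, record):
    """Decode the name field using the caller's file description."""
    _cust_id, raw = struct.unpack(layout, record)
    return raw.rstrip(b"\x00").decode("ascii")


# While both descriptions agree, shipping can read what billing wrote.
record = write_record(BILLING_LAYOUT, 7, "Smith")
name_seen_by_shipping = read_name(SHIPPING_LAYOUT, record)

# The file structure changes: names widen to 20 bytes. Only the billing
# program is updated; shipping still holds the old description.
BILLING_LAYOUT = "<i20s"
new_record = write_record(BILLING_LAYOUT, 7, "Smith")
try:
    read_name(SHIPPING_LAYOUT, new_record)
    shipping_still_works = True
except struct.error:
    shipping_still_works = False
```

Every program that touches the file has to be edited in step with the file structure, which is the maintenance cost the slide describes.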

Why Study Databases??


Shift from computation to information
at the low end: scramble to webspace (a mess!)
at the high end: scientific applications
Datasets increasing in diversity and volume.
Digital libraries, interactive video, Human Genome project, EOS project
... need for DBMS exploding
DBMS encompasses most of CS

Data Models
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using a given data model.
The relational model of data is the most widely used model today.
Main concept: relation, basically a table with rows and columns.
Every relation has a schema, which describes the columns, or fields.
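
To make "relation" and "schema" concrete, here is a small sketch in plain Python. The `Relation` class, its type check, and the sample rows are illustrative assumptions, not how any real DBMS represents relations.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """A relation: a named schema plus a collection of rows."""
    name: str
    schema: dict                      # column (field) name -> Python type
    rows: list = field(default_factory=list)

    def insert(self, *values):
        """Add a row only if it conforms to the schema."""
        if len(values) != len(self.schema):
            raise ValueError("wrong number of fields")
        for value, (column, expected) in zip(values, self.schema.items()):
            if not isinstance(value, expected):
                raise TypeError(f"{column} expects {expected.__name__}")
        self.rows.append(values)


# Illustrative schema and data (made up for this sketch).
students = Relation("Students", {"sid": str, "name": str, "age": int})
students.insert("53666", "Jones", 18)

try:
    students.insert("53688", "Smith", "eighteen")   # age is not an integer
    rejected = False
except TypeError:
    rejected = True
```

The schema describes the columns once; each row is then just a tuple of values that must fit that description.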

Levels of Abstraction

Many views, single conceptual (logical) schema and physical schema.
Views describe how users see the data.
Conceptual schema defines logical structure.
Physical schema describes the files and indexes used.
[Figure: three levels, top to bottom: View 1, View 2, View 3; Conceptual Schema; Physical Schema]

Schemas are defined using DDL; data is modified/queried using DML.

Example: University Database
Conceptual schema:
Students(sid: string, name: string, login: string, age: integer, gpa: real)
Courses(cid: string, cname: string, credits: integer)
Enrolled(sid: string, cid: string, grade: string)
Physical schema:
Relations stored as unordered files.
Index on first column of Students.
External schema (View):
Course_info(cid: string, enrollment: integer)
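
The three levels can be sketched with Python's built-in `sqlite3` module. Two things here are assumptions: the slide gives only the schema of `Course_info`, so defining it as a per-course count over `Enrolled` is a guess at its intent, and the index name and sample rows are invented. SQLite's own storage layout also differs from "unordered files"; the index line only mirrors the physical-schema bullet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: conceptual schema (tables), one physical choice (index),
# and an external schema (view).
conn.executescript("""
CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

CREATE INDEX students_sid_idx ON Students (sid);

-- Assumed definition: enrollment = number of Enrolled rows per course.
CREATE VIEW Course_info AS
    SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")

# DML: modify and query the data (sample rows are made up).
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [("s1", "CS564", "A"), ("s2", "CS564", "B"), ("s1", "CS101", "A")],
)
course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid"
).fetchall()
```

The `CREATE` statements are DDL and the `INSERT` and `SELECT` statements are DML, matching the split described under Levels of Abstraction.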

Data Independence
Applications insulated from how data is structured and stored.
Logical data independence: Protection from changes in logical structure of data (the capacity to change the conceptual schema without having to change external schemas or application programs).
One of the most important benefits of using a DBMS!
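
A sketch of logical data independence using `sqlite3`. The restructuring (splitting `Enrolled` into hypothetical `Registration` and `Grades` tables) and the view definition are assumptions made for illustration. The view's schema, `Course_info(cid, enrollment)`, stays fixed; only its mapping onto the conceptual schema is redefined, so the application query is untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
INSERT INTO Enrolled VALUES ('s1', 'CS564', 'A'), ('s2', 'CS564', 'B');
CREATE VIEW Course_info AS
    SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")

# The application only knows the external schema.
APP_QUERY = "SELECT cid, enrollment FROM Course_info ORDER BY cid"
before = conn.execute(APP_QUERY).fetchall()

# Change the conceptual schema (hypothetical redesign), then re-map the view.
conn.executescript("""
DROP VIEW Course_info;
CREATE TABLE Registration (sid TEXT, cid TEXT);
CREATE TABLE Grades (sid TEXT, cid TEXT, grade TEXT);
INSERT INTO Registration SELECT sid, cid FROM Enrolled;
INSERT INTO Grades SELECT sid, cid, grade FROM Enrolled;
DROP TABLE Enrolled;
CREATE VIEW Course_info AS
    SELECT cid, COUNT(*) AS enrollment FROM Registration GROUP BY cid;
""")

# Same application query, same answer.
after = conn.execute(APP_QUERY).fetchall()
```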

Data Independence (cont.)


Physical data independence: Protection from changes in physical structure of data (the capacity to change the internal schema without having to change the conceptual (or external) schemas).

One of the most important benefits of using a DBMS!
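
A sketch of physical data independence using `sqlite3`: adding an index changes how the query is evaluated but not the query text or its answer. The table contents and index name are invented, and the wording of SQLite's `EXPLAIN QUERY PLAN` output varies between versions, so the sketch only checks whether the index name appears in the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?)",
    [(f"s{i}", f"name{i}") for i in range(1000)],
)

QUERY = "SELECT name FROM Students WHERE sid = 's42'"


def plan(sql):
    """Return SQLite's description of how it would evaluate `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)   # 4th column is the detail text


result_before, plan_before = conn.execute(QUERY).fetchall(), plan(QUERY)

# Physical change only: no table, view, or query is modified.
conn.execute("CREATE INDEX students_sid_idx ON Students (sid)")

result_after, plan_after = conn.execute(QUERY).fetchall(), plan(QUERY)
```

The answer is identical before and after; only the access path chosen by the system differs.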

Concurrency Control
Concurrent execution of user programs is essential for good DBMS performance.
Because disk accesses are frequent, and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.
Interleaving actions of different user programs can lead to inconsistency: e.g., check is cleared while account balance is being computed.
DBMS ensures such problems don't arise: users can pretend they are using a single-user system.
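
The inconsistency can be reproduced with a deterministic simulation. This is only a sketch: the account names, amounts, and step names are invented, and the interleaving is scripted by hand rather than produced by real threads.

```python
def run(schedule):
    """Execute the named steps in order against a fresh two-account database."""
    db = {"checking": 100, "savings": 100}
    state = {"total": 0}
    steps = {
        # Program 1: compute the customer's total balance.
        "sum_read_checking": lambda: state.update(total=state["total"] + db["checking"]),
        "sum_read_savings": lambda: state.update(total=state["total"] + db["savings"]),
        # Program 2: move 50 from checking to savings.
        "xfer_debit": lambda: db.update(checking=db["checking"] - 50),
        "xfer_credit": lambda: db.update(savings=db["savings"] + 50),
    }
    for name in schedule:
        steps[name]()
    return state["total"]


# One program after the other: the total is correct.
serial = run(["sum_read_checking", "sum_read_savings", "xfer_debit", "xfer_credit"])

# The transfer runs between the two reads: 50 is counted twice.
interleaved = run(["sum_read_checking", "xfer_debit", "xfer_credit", "sum_read_savings"])
```

Here `serial` is 200 while `interleaved` is 250, although no money was created. Preventing exactly this kind of outcome is the job of concurrency control.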

Transaction: An Execution of a DB Program
Key concept is transaction, which is an atomic sequence of database actions (reads/writes).
Each transaction, executed completely, must leave the DB in a consistent state if DB is consistent when the transaction begins.
Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints.
Beyond this, the DBMS does not really understand the semantics of the data (e.g., it does not understand how the interest on a bank account is computed).
Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user's responsibility!
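
A sketch of both halves of this point using `sqlite3`. The `Accounts` table, the `CHECK` constraint, and the two transfer functions are hypothetical. The DBMS rejects a transaction that violates the declared constraint and rolls it back as a unit, but it accepts a logically wrong transaction that violates nothing it was told about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()


def transfer(src, dst, amount):
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE Accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute(
            "UPDATE Accounts SET balance = balance - ? WHERE id = ?", (amount, src))


try:
    transfer("A", "B", 150)          # would overdraw A: constraint violated
    overdraft_rejected = False
except sqlite3.IntegrityError:
    overdraft_rejected = True

# The credit to B was undone together with the failed debit.
balances = dict(conn.execute("SELECT id, balance FROM Accounts"))


def buggy_transfer(dst, amount):
    # Programmer error: credits dst but never debits anyone.
    with conn:
        conn.execute(
            "UPDATE Accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


buggy_transfer("B", 50)              # no declared constraint is violated
total_after_bug = conn.execute("SELECT SUM(balance) FROM Accounts").fetchone()[0]
```

The total grows from 100 to 150 without complaint, which is why consistency of each transaction remains the user's responsibility.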

Scheduling Concurrent Transactions


DBMS ensures that execution of {T1, ..., Tn} is equivalent to some serial execution of the same transactions.
Before reading/writing an object, a transaction requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction. (Strict 2PL locking protocol.)
Idea: If an action of Ti (say, writing X) affects Tj (which perhaps reads X), one of them, say Ti, will obtain the lock on X first and Tj is forced to wait until Ti completes; this effectively orders the transactions.
What if Tj already has a lock on Y and Ti later requests a lock on Y? (Deadlock!) Ti or Tj is aborted and restarted!
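
A toy lock manager illustrating the protocol and the deadlock case above. It is a sketch under simplifying assumptions: locks are exclusive only, "waiting" is modeled by a return value rather than by blocking a thread, and deadlocks are found by following a waits-for chain. The class and method names are invented.

```python
class LockManager:
    def __init__(self):
        self.owner = {}       # object -> transaction holding its lock
        self.waits_for = {}   # blocked transaction -> transaction it waits on

    def request(self, txn, obj):
        """Return 'granted', 'wait', or 'deadlock' for a lock request."""
        holder = self.owner.get(obj)
        if holder is None or holder == txn:
            self.owner[obj] = txn
            self.waits_for.pop(txn, None)
            return "granted"
        self.waits_for[txn] = holder
        return "deadlock" if self._in_cycle(txn) else "wait"

    def _in_cycle(self, start):
        """Follow the waits-for chain from `start`; a return to it is a cycle."""
        seen, current = set(), start
        while current in self.waits_for:
            current = self.waits_for[current]
            if current == start:
                return True
            if current in seen:
                return False
            seen.add(current)
        return False

    def finish(self, txn):
        """Commit or abort: strict 2PL releases all locks only at the end."""
        for obj in [o for o, t in self.owner.items() if t == txn]:
            del self.owner[obj]
        self.waits_for = {
            w: h for w, h in self.waits_for.items() if w != txn and h != txn
        }


lm = LockManager()
events = [
    lm.request("Ti", "X"),   # Ti gets X first
    lm.request("Tj", "Y"),   # Tj already holds Y
    lm.request("Tj", "X"),   # Tj must wait for Ti
    lm.request("Ti", "Y"),   # Ti now waits for Tj as well: cycle
]
lm.finish("Ti")              # resolve by aborting Ti, releasing X
events.append(lm.request("Tj", "X"))
```

After Ti is aborted, Tj obtains X and can run to completion; Ti would then be restarted.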

Ensuring Atomicity
DBMS ensures atomicity (all-or-nothing property) even if system crashes in the middle of a Xact.
Idea: Keep a log (history) of all actions carried out by the DBMS while executing a set of Xacts:
Before a change is made to the database, the corresponding log entry is forced to a safe location. (WAL protocol; OS support for this is often inadequate.)
After a crash, the effects of partially executed transactions are undone using the log. (Thanks to WAL, if log entry wasn't saved before the crash, corresponding change was not applied to database!)
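
A sketch of write-ahead logging with undo-only recovery. It makes strong simplifying assumptions: the "stable" log and database are plain in-memory Python objects, every write goes straight to the stable database, and `None` stands for "key did not exist". The class and method names are invented.

```python
class TinyDB:
    def __init__(self, initial):
        self.stable_db = dict(initial)   # assumed to survive a crash
        self.stable_log = []             # assumed to survive a crash

    def write(self, txn, key, new_value):
        old_value = self.stable_db.get(key)
        # WAL: the log entry is saved before the database is changed.
        self.stable_log.append(("write", txn, key, old_value, new_value))
        self.stable_db[key] = new_value

    def commit(self, txn):
        self.stable_log.append(("commit", txn))

    def recover(self):
        """After a crash, undo the writes of every uncommitted transaction."""
        committed = {rec[1] for rec in self.stable_log if rec[0] == "commit"}
        for rec in reversed(self.stable_log):        # newest change first
            if rec[0] == "write" and rec[1] not in committed:
                _, _, key, old_value, _ = rec
                if old_value is None:
                    self.stable_db.pop(key, None)
                else:
                    self.stable_db[key] = old_value


db = TinyDB({"A": 100, "B": 100})

db.write("T1", "A", 50)      # T1 runs to completion
db.write("T1", "B", 150)
db.commit("T1")

db.write("T2", "A", 0)       # T2 is interrupted by a crash before commit
state_at_crash = dict(db.stable_db)

db.recover()
state_after_recovery = dict(db.stable_db)
```

Because each change is preceded by its log entry, recovery always finds the old value it needs to restore.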

The Log
The following actions are recorded in the log:
Ti writes an object: the old value and the new value.
Log record must go to disk before the changed page!
Ti commits/aborts: a log record indicating this action.
Log records chained together by Xact id, so it's easy to undo a specific Xact (e.g., to resolve a deadlock).
Log is often duplexed and archived on stable storage.
All log related activities (and in fact, all CC related activities such as lock/unlock, dealing with deadlocks etc.) are handled transparently by the DBMS.
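
One way to picture the chaining is a back-pointer from each log record to the previous record of the same Xact. The sketch below assumes that representation; the field names `lsn` and `prev_lsn` and the helper functions are illustrative choices, not something the slides specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int                          # position of this record in the log
    txn: str
    kind: str                         # "write", "commit" or "abort"
    key: Optional[str] = None
    old: Optional[int] = None
    new: Optional[int] = None
    prev_lsn: Optional[int] = None    # previous record of the same Xact


class Log:
    def __init__(self):
        self.records = []
        self.last_lsn = {}            # txn -> lsn of its most recent record

    def append(self, txn, kind, key=None, old=None, new=None):
        rec = LogRecord(len(self.records), txn, kind, key, old, new,
                        self.last_lsn.get(txn))
        self.records.append(rec)
        self.last_lsn[txn] = rec.lsn
        return rec


def write(log, db, txn, key, new):
    log.append(txn, "write", key, db[key], new)   # log first, then change
    db[key] = new


def undo(log, db, txn):
    """Walk one Xact's chain backwards, restoring old values."""
    lsn = log.last_lsn.get(txn)
    while lsn is not None:
        rec = log.records[lsn]
        if rec.kind == "write":
            db[rec.key] = rec.old
        lsn = rec.prev_lsn
    log.append(txn, "abort")


db = {"X": 1, "Y": 2}
log = Log()
write(log, db, "T1", "X", 10)
write(log, db, "T2", "Y", 20)
write(log, db, "T1", "X", 11)

undo(log, db, "T1")    # e.g., T1 was chosen as a deadlock victim
```

Undoing T1 touches only T1's records: X returns to 1 while T2's write to Y is left alone.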

Overview of System Architecture


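[Figure: database server architecture. Volatile memory holds the database cache (database pages) and the log buffer (log entries). Stable storage holds the stable database (database pages) and the stable log (log entries). Labeled operations: begin, read, write, commit, rollback; fetch and flush between the stable database and the database cache; force from the log buffer to the stable log.]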

Databases make these folks happy ...
End users and DBMS vendors
DB application programmers
E.g. smart webmasters
Database administrator (DBA)
Designs logical/physical schemas
Handles security and authorization
Data availability, crash recovery
Database tuning as needs evolve
Must understand how a DBMS works!

Structure of a DBMS

A typical DBMS has a layered architecture.
The figure does not show the concurrency control and recovery components.
This is one of several possible architectures; each system has its own variations.
[Figure: layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Annotation: these layers must consider concurrency control and recovery.]

Summary
DBMS used to maintain, query large datasets.
Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.
Levels of abstraction give data independence.
A DBMS typically has a layered architecture.
DBAs hold responsible jobs and are well-paid!
DBMS R&D is one of the broadest, most exciting areas in CS.
