2433-001
Database Systems
DBMS
Layers
Figure: DBMS layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB.
Figure: units of data named alongside the layers: Tuples/Records, Relations, Files, Pages, Blocks, Sectors.
Example:
Suppose we have a relation with fields: name, age, and salary.
How do we sort it?
Index
What is an index?
It is:
a data structure
made up of pointers (called data entries in the textbook) to data records
organized based on a search key
Hash-Based Indexing
Use a hash function h(r) where r is a field
value
The output of h(r) points to a bucket.
Bucket = primary page plus zero or more
overflow pages
The buckets contain <key, rid> or <key, ridlist> pairs.
Hash-based indexes are best for equality
selections; they cannot support range searches.
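The bucket structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: PAGE_CAPACITY, the class names, the modulo-based hash function, and the sample rids are all hypothetical, and each "page" is just an in-memory list.

```python
PAGE_CAPACITY = 2  # assumption: number of <key, rid> data entries that fit on one page


class Bucket:
    def __init__(self):
        # pages[0] is the primary page; any further pages are overflow pages
        self.pages = [[]]

    def insert(self, key, rid):
        if len(self.pages[-1]) == PAGE_CAPACITY:
            self.pages.append([])          # chain a new overflow page
        self.pages[-1].append((key, rid))

    def search(self, key):
        # equality search: scan the primary page, then the overflow pages
        return [rid for page in self.pages for k, rid in page if k == key]


class HashIndex:
    def __init__(self, num_buckets=4):
        self.buckets = [Bucket() for _ in range(num_buckets)]

    def _h(self, key):
        # h(r): maps a field value to a bucket number
        return hash(key) % len(self.buckets)

    def insert(self, key, rid):
        self.buckets[self._h(key)].insert(key, rid)

    def search(self, key):
        return self.buckets[self._h(key)].search(key)


index = HashIndex()
for slot, age in enumerate([30, 41, 30, 30, 52]):
    index.insert(age, (0, slot))           # rid = (page id, slot id), hypothetical values

print(index.search(30))                    # rids of all records with age = 30
```

Because h(r) scatters neighboring key values across unrelated buckets, a range condition such as age > 40 would have to visit every bucket, which is why the sketch offers only an equality search.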
Hash-Based Indexing
Tree-Based Indexing
File organizations compared:
(1) heap file
(2) file sorted on <age,sal>
(3) clustered B+ tree file sorted on <age,sal>
(4) heap file with unclustered B+ tree index on <age,sal>
(5) heap file with an unclustered hash index on <age,sal>

File type                    (a) Scan      (b) Equality          (c) Range                    (d) Insert            (e) Delete
(1) Heap                     BD            0.5BD                 BD                           2D                    Search + D
(2) Sorted                   BD            D log_2 B             D log_2 B + # matches        Search + BD           Search + BD
(3) Clustered                1.5BD         D log_F 1.5B          D log_F 1.5B + # matches     Search + D            Search + D
(4) Unclustered tree index   BD(R+0.15)    D(1 + log_F 0.15B)    D log_F 0.15B + # matches    D(3 + log_F 0.15B)    Search + 2D
(5) Unclustered hash index   BD(R+0.125)   2D                    BD                           4D                    Search + 2D
The table above shows average I/O costs only.
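To get a feel for the formulas, the scan and equality columns can be evaluated for sample values. The sketch below assumes the usual textbook reading of the symbols, which the slide does not spell out: B is the number of data pages, R the number of records per page, D the average time to read or write one page, and F the fan-out of the tree. The sample values are hypothetical.

```python
import math


def io_costs(B, R, D, F):
    """Scan and equality-search cost (in the time unit of D) per file organization."""
    return {
        "heap":             {"scan": B * D,               "equality": 0.5 * B * D},
        "sorted":           {"scan": B * D,               "equality": D * math.log2(B)},
        "clustered":        {"scan": 1.5 * B * D,         "equality": D * math.log(1.5 * B, F)},
        "unclustered tree": {"scan": B * D * (R + 0.15),  "equality": D * (1 + math.log(0.15 * B, F))},
        "unclustered hash": {"scan": B * D * (R + 0.125), "equality": 2 * D},
    }


# Hypothetical workload: 10,000 pages, 100 records per page, 15 ms per page I/O, fan-out 100.
costs = io_costs(B=10_000, R=100, D=0.015, F=100)
for name, c in costs.items():
    print(f"{name:17s} scan={c['scan']:10.1f} s   equality={c['equality']:.4f} s")
```

With these numbers the hash index gives the cheapest equality search, while scanning through either unclustered index is far more expensive than scanning the heap file directly.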
Index Choice
What indexes should we create?
Clustered? Hash/tree?
Hash-based indexes are optimized for equality selections
Tree-based indexes support both equality and range selections
A sorted file is expensive to maintain
Index Choice
One approach:
Guidelines
Attributes in WHERE clause are candidates for index
keys.
Exact match condition suggests hash index.
Range query suggests tree index.
Examples
SELECT E.dno
FROM Emp E
WHERE E.age>40
A clustered B+ tree index on E.age can be used to
get qualifying tuples.
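The way an ordered index answers E.age>40 can be sketched with a sorted list of data entries standing in for the leaf level of a B+ tree. This is not a real B+ tree: the Emp rows, the rid scheme (position in the list), and the function name are assumptions for illustration only.

```python
from bisect import bisect_right

# Hypothetical Emp rows: (name, age, dno); the rid is the row's position in this list.
emp = [("ann", 35, 10), ("bob", 52, 20), ("cal", 41, 10), ("dee", 40, 30)]

# Data entries <age, rid>, kept sorted by age like the leaf level of a B+ tree.
entries = sorted((age, rid) for rid, (_, age, _) in enumerate(emp))
ages = [age for age, _ in entries]


def dno_where_age_gt(limit):
    start = bisect_right(ages, limit)      # position of the first entry with age > limit
    # every entry from `start` onward qualifies; follow each rid to its record
    return [emp[rid][2] for _, rid in entries[start:]]


print(dno_where_age_gt(40))                # E.dno values for employees older than 40
```

The search finds the first qualifying entry and then reads entries sequentially. With a clustered index the records themselves are stored in the same order, so the sequential pass also touches consecutive data pages.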
Examples
SELECT E.dno, COUNT (*)
FROM Emp E
WHERE E.age>10
GROUP BY E.dno
Examples
To retrieve Emp records with
age=30 AND sal=4000, an index
on <age,sal> would be better
than an index on age or an index
on sal.
If condition is: 20<age<30 AND
3000<sal<5000:
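Figure: composite search keys. Data entries in each index, sorted by search key:
<age, sal>: 11,80  12,10  12,20  13,75
<age>: 11  12  12  13
<sal, age>: 10,12  20,12  75,13  80,11
<sal>: 10  20  75  80
Data records (name, age, sal) are sorted by name: cal 11 80, joe 12 20, sue 13 75, and one record with age 12 and sal 10.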
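The ordering of composite-key data entries is lexicographic, which Python tuples reproduce directly. The sketch below uses the age and sal values from the figure; the name "bob" for the record with age 12 and sal 10 is an assumption, and bisect over a sorted list stands in for a tree search.

```python
from bisect import bisect_left

# (name, age, sal); "bob" is a hypothetical name for the (12, 10) record
records = [("bob", 12, 10), ("cal", 11, 80), ("joe", 12, 20), ("sue", 13, 75)]

# Data entries for the two composite indexes, in lexicographic key order
age_sal = sorted((age, sal) for _, age, sal in records)
sal_age = sorted((sal, age) for _, age, sal in records)
print(age_sal)
print(sal_age)

# Equality on both fields (age=12 AND sal=20): one search lands on the entry.
pos = bisect_left(age_sal, (12, 20))
print(age_sal[pos] == (12, 20))

# Equality on the leading field only (age=12): matching entries are contiguous.
lo = bisect_left(age_sal, (12,))
hi = bisect_left(age_sal, (13,))
print(age_sal[lo:hi])
```

The same prefix search does not work on sal_age for an age-only condition, because entries with equal age are scattered there. The field order in a composite key decides which conditions the index can serve with one contiguous run of entries.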
Figure: a header page points to two lists of data pages: full pages, and pages with free space.
Directory of Pages
Figure: a directory, starting at the header page, with entries pointing to Data Page 1, Data Page 2, ..., Data Page N.
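A directory of pages can be sketched as one entry per data page. The version below assumes each directory entry records the free bytes left on its page, a common choice that the slide itself does not state; PAGE_SIZE, the class name, and the first-fit policy are likewise hypothetical.

```python
PAGE_SIZE = 100  # assumption: usable bytes per data page


class HeapFile:
    def __init__(self):
        self.pages = []       # data pages; each page is a list of records (bytes)
        self.directory = []   # one entry per data page: free bytes remaining on it

    def insert(self, record: bytes):
        if len(record) > PAGE_SIZE:
            raise ValueError("record larger than a page")
        # consult the directory (not the data pages) to find a page with room
        for page_id, free in enumerate(self.directory):
            if free >= len(record):
                break
        else:
            # no existing page has room: allocate a page and add a directory entry
            page_id = len(self.pages)
            self.pages.append([])
            self.directory.append(PAGE_SIZE)
        self.pages[page_id].append(record)
        self.directory[page_id] -= len(record)
        return page_id, len(self.pages[page_id]) - 1   # rid = (page id, slot id)


hf = HeapFile()
rids = [hf.insert(b"x" * 60), hf.insert(b"y" * 60), hf.insert(b"z" * 30)]
print(rids, hf.directory)
```

The third record fits on the first page, and the directory alone is enough to discover that, so no data page has to be read while looking for space.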
Page Format
Page abstraction is useful for I/O issues.
Higher levels of DBMS see data as
collection of records.
How can a collection of records be
arranged on a page?
Page Format:
Fixed Length Records
Record slots are uniform and can be
arranged consecutively within a page.
At any instant: some slots are occupied
by records and some are not.
How do we keep track of empty slots
and how do we locate all records on a
page?
Page Format:
Fixed Length Records
Alternative 1:
Store records in the first N slots
Whenever a record is deleted move the last
record on the page into the vacated slot.
Advantage: Can locate the ith record on the
page with simple offset calculation.
Advantage: All empty slots appear at the end
of the page.
Disadvantage: Does not work if there are
external references to the record that moved.
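Alternative 1 can be sketched as follows. The class and method names are hypothetical, and slots are numbered from 0 for convenience.

```python
class PackedPage:
    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        self.n = 0                          # number of records; they occupy slots[0:n]

    def insert(self, record):
        if self.n == len(self.slots):
            raise ValueError("page full")
        self.slots[self.n] = record         # first empty slot is always slot n
        self.n += 1
        return self.n - 1

    def delete(self, slot):
        if not 0 <= slot < self.n:
            raise IndexError(slot)
        self.n -= 1
        self.slots[slot] = self.slots[self.n]   # move the last record into the vacated slot
        self.slots[self.n] = None


page = PackedPage(4)
for name in ("a", "b", "c"):
    page.insert(name)
page.delete(0)
print(page.slots[:page.n])                  # "c" now sits in slot 0
```

After the delete, record "c" has moved from slot 2 to slot 0. Any rid stored elsewhere that still says slot 2 is now stale, which is the disadvantage noted above.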
Page Format:
Fixed Length Records
Alternative 2:
Handle deletions using an array of bits
One bit per slot to keep track of free slots
When a record is deleted, its bit is turned
off (i.e. 0).
Locating records on the page requires
scanning the bit array to locate slots whose
bit is on.
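Alternative 2 can be sketched in the same style. Names are hypothetical, and a Python list of 0/1 values stands in for the bit array.

```python
class BitmapPage:
    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        self.bits = [0] * num_slots         # 1 = slot occupied, 0 = slot free

    def insert(self, record):
        slot = self.bits.index(0)           # first free slot; ValueError if the page is full
        self.slots[slot] = record
        self.bits[slot] = 1
        return slot

    def delete(self, slot):
        self.bits[slot] = 0                 # no record moves; only the bit is turned off

    def records(self):
        # locating records means scanning the bit array for bits that are on
        return [(s, self.slots[s]) for s, bit in enumerate(self.bits) if bit]


page = BitmapPage(4)
for name in ("a", "b", "c"):
    page.insert(name)
page.delete(1)
print(page.records())                       # slots 0 and 2 are still occupied
```

Unlike the packed layout, the surviving records keep their slot numbers, so external references to them stay valid.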
Page Format:
Fixed Length Records
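Figure: two layouts for fixed-length records. PACKED: records fill slots 1 to N, followed by free space; the page also stores N, the number of records. UNPACKED, BITMAP: the page has slots 1 to M, stores M, the number of slots, and keeps an M-bit array (e.g. 1 . . . 0 1 1) marking which slots are occupied.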
Page Format:
Variable Length Records
Cannot divide the page into fixed-length
slots
Challenge: When a new record is to be
inserted, we have to find an empty slot
of just the right length.
Challenge: We must ensure that the
free space on the page is contiguous.
So the ability to move records on a
page becomes very important.
Page Format:
Variable Length Records
Directory of slots
<record offset, record length> per slot
record offset: offset in bytes from the
start of the data area to the start of
the record
Deletion: set the record offset to -1
The rid <page id, slot id> does not change
when a record moves.
Page Format:
Variable Length Records
Maintain a pointer to the start of the free
space area
When a new record does not fit into the
remaining free space, move records in
the page to reclaim space freed by earlier deletions.
Cannot always remove a slot of a deleted
record (or the rid of the other slots will
change).
When a new record is inserted, the
directory is scanned for an element not
pointing to a record.
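The slot directory, the free-space pointer, and compaction can be put together in one sketch. It simplifies in ways the slides do not: the slot directory is kept as a separate Python list and is not charged against the page size, and the class and method names are hypothetical.

```python
class SlottedPage:
    def __init__(self, size=64):
        self.data = bytearray(size)
        self.free_start = 0     # pointer to the start of the free space area
        self.slots = []         # slot directory: [record offset, record length]; offset -1 = deleted

    def insert(self, record: bytes) -> int:
        if self.free_start + len(record) > len(self.data):
            self.compact()      # reclaim space left behind by deleted records
        if self.free_start + len(record) > len(self.data):
            raise ValueError("record does not fit")
        entry = [self.free_start, len(record)]
        self.data[self.free_start:self.free_start + len(record)] = record
        self.free_start += len(record)
        # scan the directory for an element not pointing to a record
        for slot_id, (offset, _) in enumerate(self.slots):
            if offset == -1:
                self.slots[slot_id] = entry
                return slot_id
        self.slots.append(entry)
        return len(self.slots) - 1

    def delete(self, slot_id: int) -> None:
        self.slots[slot_id][0] = -1         # the slot stays so other slot ids do not shift

    def read(self, slot_id: int) -> bytes:
        offset, length = self.slots[slot_id]
        if offset == -1:
            raise KeyError(slot_id)
        return bytes(self.data[offset:offset + length])

    def compact(self) -> None:
        packed, pos = bytearray(len(self.data)), 0
        for entry in self.slots:
            offset, length = entry
            if offset != -1:
                packed[pos:pos + length] = self.data[offset:offset + length]
                entry[0] = pos              # only the offset changes; the slot id stays the same
                pos += length
        self.data, self.free_start = packed, pos


page = SlottedPage(size=10)
a = page.insert(b"aaaa")
b = page.insert(b"bbbb")
page.delete(a)
c = page.insert(b"cccc")                    # triggers compaction, then reuses slot a
print(b, page.read(b), c, page.read(c))
```

Record b is physically moved during compaction, yet its slot id, and therefore its rid, is unchanged. Only the offset stored in the directory entry is updated.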
Page Format:
Variable Length Records
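Figure: page i holding records with Rid = (i,1), (i,2), ..., (i,N). The slot directory holds one entry per slot (24 for slot 1, 16 for slot 2, ..., 20 for slot N), the number of slots (N), and a pointer to the start of free space.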
Page Format
Besides slot information, a page usually
contains file-level information (e.g. the id of
the next page).
The slotted page organization used for
variable length records can also be used
for fixed-length records
Fixed-Length Records
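Figure: fields F1, F2, F3, F4 with lengths L1, L2, L3, L4 are stored consecutively starting at base address B. The address of F3 is:
Address = B+L1+L2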
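The address calculation generalizes to any field: add the lengths of all preceding fields to the base address. The function name and the sample lengths below are hypothetical.

```python
def field_address(base, lengths, i):
    """Address of field i (1-based) in a fixed-length record that starts at `base`."""
    return base + sum(lengths[:i - 1])


lengths = [20, 4, 8, 4]                     # hypothetical L1..L4 in bytes
print(field_address(1000, lengths, 3))      # B + L1 + L2 = 1000 + 20 + 4 = 1024
```

Because the lengths are the same for every record of the relation, they can be kept once in the system catalog rather than in each record.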
Variable-Length Records
Alternative 1:
Alternative 2:
Variable-Length Records
Figure: record layout with a leading field count (4) followed by the fields (F2, F3, F4 shown).
Conclusions
Many alternative file organizations exist,
each appropriate in some situation.
If selection queries are frequent, sorting
the file or building an index is important.
Indexes support efficient retrieval of
records based on the values in some fields.
Understanding the nature of the workload
for the application, and the performance
goals, is essential to developing a good
design.