
Databases 2010

Algorithms and Data Structures

Michael I. Schwartzbach
Computer Science, University of Aarhus

Implementing Queries

ƒ Implementation of queries requires:


• data structures
• algorithms

ƒ Only a few basic operations to consider:


• sorting
• selection (σ)
• projection (π)
• join (⋈)
• remove duplicates

Algorithms and Data Structures 2


Data Storage

ƒ Relations are stored as bits.


ƒ Functionality requirements
• Sequentially process rows
• Search for rows that meet some condition
• Insert and delete rows
ƒ Performance objectives
• Achieve a high packing density (little wasted space)
• Achieve fast response time



File Organization Overview

ƒ File
• A database is stored as a collection of files.
• Storage (usually a disk)
• Random access (requires disk arm move)
• Non-volatile
ƒ Records
• A file is a set of records, generally of the same type.
• A record is a sequence of fields.
• Record size
  • Fixed - same number of fields of same size
  • Variable - may have a different number of fields, or a field may vary in size

Disk Concepts – Files

ƒ Modern file systems


• Allocated in blocks
• Blocks not necessarily contiguous
[Figure: report.doc occupies blocks 1, 2, and 3, interleaved with a block of book.htm and an unused block]

ƒ Contiguous block I/O faster (avoids disk seek)


• about 8 ms to seek and read a random block
• about 0.4 ms to read the next contiguous block
ƒ Can reorganize to make block chains contiguous



Disk Concepts - Blocks

ƒ Input/Output entire blocks


ƒ Block size is O/S dependent (e.g., 4-8KB)
ƒ Block size is usually much bigger than record size
• Many records in each block
• Fill percentage – percentage of filled space per block

[Figure: a 50% filled block (room to grow) beside a 100% filled block (compact)]



Storing a Table

ƒ A table is typically stored on disk


ƒ Several rows fit into one disk block
ƒ Disks are slow:
• accessing a random block takes about 8 ms
• accessing the next consecutive block takes about 0.4 ms
• RAM access time is 8-10 ns (L1 and L2 caches are even faster)
• one random disk access ≈ 1,000,000 RAM accesses (8 ms / 8 ns)
• this justifies the "count only I/Os" model of complexity

ƒ Part of the table may be in RAM (buffer pool)


ƒ The table can be stored in sorted order

Heap File
ƒ Records are stored in blocks, no particular order
ƒ Example: Assume records are names
blocks
Leonardo Tom Gerard
Racquel Mufasa
Amanda Arnold
Omar Jamie

ƒ Assume n records per table and R records per block
• Search for 1 record: ⌈n/(2R)⌉ block accesses on average, O(n) - slow
• Insertion: 2 block accesses (read one block, write it back), O(1) - fast
• Deletion (k records): n/R block accesses, O(n) - slow
• Modification (k records): n/R block accesses, O(n) - slow
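
The heap file costs above can be tabulated with a few lines of Python. This is a minimal sketch: the function name `heap_file_costs` is hypothetical, and the sample values n = 10,000,000 and R = 40 are borrowed from the example scenario used later in these slides.

```python
import math

def heap_file_costs(n: int, R: int) -> dict:
    """Block accesses for a heap file with n records and R records per block."""
    blocks = math.ceil(n / R)
    return {
        "search_one_avg": math.ceil(n / (2 * R)),  # scan half the blocks on average
        "insert": 2,                               # read one block, write it back
        "delete_k": blocks,                        # full scan of the file
        "modify_k": blocks,                        # full scan of the file
    }

costs = heap_file_costs(n=10_000_000, R=40)
print(costs)
```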

Sequential File

ƒ Suitable for applications that need sequential access.
ƒ The records in the file are ordered by a search-key.
ƒ Example: Each record is a name.
Amanda Jamie Racquel
Arnold Leonardo Tom
Gerard Mufasa
Omar

blocks



Sequential File

ƒ Search for a key value: O(log₂(n)) cost


ƒ Deletion: 1 disk I/O (after search)
ƒ Insertion: locate where record is to be inserted
• If there is free space, insert
• If no free space, insert the record in an overflow block
or shift records to previous or next block
ƒ Reorganize
• Restore block contiguity, fill percentage
• Remove empty blocks
• O(n) cost
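
The logarithmic search cost can be illustrated with a binary search over the blocks of a sequential file, counting each block fetch as one I/O. This is a sketch only: `find_block` is a hypothetical helper, the in-memory list stands in for the disk, blocks are assumed non-empty, and the block layout is one possible reading of the names example.

```python
def find_block(blocks, key):
    """Binary search over sorted blocks; returns (found, blocks_read)."""
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]          # one simulated disk read
        reads += 1
        if key < block[0]:
            hi = mid - 1             # key lies in an earlier block
        elif key > block[-1]:
            lo = mid + 1             # key lies in a later block
        else:
            return key in block, reads
    return False, reads

# Assumed layout: three blocks, records ordered by name across blocks.
blocks = [["Amanda", "Arnold", "Gerard"],
          ["Jamie", "Leonardo", "Mufasa", "Omar"],
          ["Racquel", "Tom"]]
print(find_block(blocks, "Tom"))
```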



Sorting a Table

ƒ In RAM, many sorting algorithms are available


• typical time complexity is O(n log₂(n))

ƒ Those can also be performed on disks


• but they often perform poorly
• only I/O accesses need to be counted

ƒ Specialized versions of algorithms are needed



An Example Scenario

ƒ A 1 GB table with:
• 10,000,000 rows
• each row is 100 bytes

ƒ A disk with 4K blocks:


• each holding 40 rows
• 250,000 blocks for the entire table

ƒ 50MB RAM:
• 1/20 of the table



Recursive Merge Sort

ƒ The merge sort algorithm:


• split the list into two sublists
• recursively sort the sublists
• merge the sorted sublists to get the sorted list
• time complexity is O(n log₂(n))

ƒ Each row is read and written log₂(10⁷) ≈ 23 times

ƒ Time consumed:
• 23 × 2 × 10,000,000 × 8 ms ≈ 43 days
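
The arithmetic behind this estimate can be checked directly. The sketch below uses the numbers of the example scenario and assumes, as the estimate does, that every row read or write costs one random disk access; the constant names are illustrative.

```python
import math

ROWS = 10_000_000
ROW_BYTES = 100
BLOCK_BYTES = 4096
RANDOM_ACCESS_S = 0.008                      # 8 ms per random access

rows_per_block = BLOCK_BYTES // ROW_BYTES    # rows that fit into one 4K block
table_blocks = ROWS // rows_per_block        # blocks needed for the whole table
passes = round(math.log2(ROWS))              # recursion depth of merge sort
seconds = passes * 2 * ROWS * RANDOM_ACCESS_S
days = seconds / 86_400
print(rows_per_block, table_blocks, passes, round(days, 1))
```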



Two-Phase Multiway Merge Sort

ƒ Load RAM with 12,500 blocks = 500,000 rows


• sort those (for "free") using any RAM sorting algorithm
ƒ Do this 20 times to obtain 20 sorted sublists
• store the 20 sorted sublists on 20 disks
ƒ Merge the sublists using a RAM buffer for each
• only consecutive reads of blocks
• only consecutive writes of blocks
ƒ Each row is read and written 2 times

ƒ Time consumed:
• 2 × 2 × 20 × 12,500 × 0.4 ms = 400 s ≈ 6.7 minutes
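
The two phases can be sketched in Python with in-memory lists standing in for the sorted sublists on disk. This is an illustration under that assumption, not a disk-based implementation; `two_phase_sort` and its parameters are hypothetical names, and the data is scaled down so that it still yields 20 sublists.

```python
import heapq
import random

def two_phase_sort(rows, ram_rows):
    """Phase 1: sort RAM-sized chunks. Phase 2: merge all sorted sublists at once."""
    runs = [sorted(rows[i:i + ram_rows])            # one in-RAM sort per chunk
            for i in range(0, len(rows), ram_rows)]
    return list(heapq.merge(*runs)), len(runs)      # multiway merge of the sublists

random.seed(0)
data = [random.randrange(1_000_000) for _ in range(10_000)]
result, run_count = two_phase_sort(data, ram_rows=500)

# Timing from the slide: 2 phases x (read + write) x 20 sublists x 12,500 blocks x 0.4 ms
minutes = 2 * 2 * 20 * 12_500 * 0.0004 / 60
print(run_count, round(minutes, 1))
```

`heapq.merge` reads its inputs lazily and in order, which mirrors the "only consecutive reads" property of the merge phase.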

Lessons Learned

ƒ Naive algorithms won’t work

ƒ The reality of storage must be considered:


• use entire block contents
• read blocks consecutively
• buffer information in RAM



Selection

ƒ SELECT *
FROM R
WHERE condition;

ƒ Full table scan:


• read all rows in the table
• report those that satisfy the condition

ƒ Fine if many rows will actually be selected:


• rule of thumb is 5-10%



Range Query

ƒ SELECT *
FROM Meetings
WHERE date >= '2008-08-25' AND
date < '2008-12-24';

ƒ Optimization if Meetings is sorted on date:


• find first row with the start date
• report all rows until the end date (consecutive blocks)
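
A minimal sketch of this optimization, assuming the table is a Python list of ISO-formatted date strings sorted on date (the sample meetings are made up for illustration):

```python
from bisect import bisect_left

def range_query(sorted_dates, start, end):
    """Return rows with start <= date < end from a table sorted on date."""
    first = bisect_left(sorted_dates, start)   # locate the first qualifying row
    last = bisect_left(sorted_dates, end)      # first row at or after the end date
    return sorted_dates[first:last]            # consecutive rows, no further search

meetings = ["2008-06-01", "2008-08-25", "2008-09-10",
            "2008-12-23", "2008-12-24", "2009-01-05"]
hits = range_query(meetings, "2008-08-25", "2008-12-24")
print(hits)
```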



Point Query

ƒ SELECT *
FROM People
WHERE userid = 'amoeller';

ƒ We know that userid is a key


ƒ Optimization if People is sorted on userid:
• full table scan can stop sooner

ƒ Binary search not necessarily better:


• random disk access vs. sequential access
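
The trade-off can be made concrete with the access times quoted earlier (8 ms random, 0.4 ms sequential). The cost model below is a simplifying assumption: the scan pays one initial seek and stops halfway on average, while every probe of the binary search is a random access.

```python
import math

RANDOM_MS, SEQUENTIAL_MS = 8.0, 0.4

def scan_ms(blocks):
    """Expected cost of a sequential scan that stops halfway on average."""
    return RANDOM_MS + (blocks / 2) * SEQUENTIAL_MS

def binary_search_ms(blocks):
    """Cost of a binary search in which every probe is a random access."""
    return math.ceil(math.log2(blocks)) * RANDOM_MS

for blocks in (64, 250_000):
    print(blocks, scan_ms(blocks), binary_search_ms(blocks))
```

Under these assumptions the scan is cheaper for a 64-block table, while binary search is far cheaper for the 250,000-block table of the example scenario.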
ƒ So, what can help us?

Indexes

ƒ A table can be equipped with an index:


• a data structure that helps you find rows quickly
• rows are identified by a subset of the attributes

ƒ A table may have several indexes:


• whereas it can only be sorted on one criterion

ƒ Pros and cons of indexes:


• make (certain) queries faster
• make all modifications slower



Indexes in SQL

ƒ CREATE INDEX DateIndex
ON Meetings(date);

ƒ CREATE INDEX ExamIndex
ON Exams(vip,date,time);

ƒ An index on several attributes also gives an index for any prefix of those attributes

ƒ Think of this as a virtual sorting of the table

ƒ Each primary key has an index by default


Using Indexes

ƒ CREATE INDEX Idx ON R(a1,a2,...,an);


ƒ Some queries are now "easy":
• a range query or point query on a1
• a point query on a1 combined with a range query on a2
• a range query on a1, a2, and a3
ƒ Others are not really easier:
• a range query on a17

ƒ In case of large modifications of the table:


• DROP INDEX Idx;
• rebuild the index afterwards
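
Why a point query on a1 combined with a range query on a2 is easy can be seen by modelling the index as a sorted list of tuples. This is a sketch, not how an RDBMS stores an index: the entries are hypothetical, a2 is assumed to be an integer, and `point_a1_range_a2` is an illustrative name.

```python
from bisect import bisect_left

# Hypothetical index entries (a1, a2, row_id), sorted like an index on R(a1, a2).
index = sorted([(1, 5, "r1"), (2, 3, "r2"), (2, 7, "r3"), (2, 9, "r4"), (3, 1, "r5")])

def point_a1_range_a2(index, a1, lo, hi):
    """Point query on a1 combined with the range lo <= a2 <= hi (integer a2)."""
    start = bisect_left(index, (a1, lo))       # first entry with a1 and a2 >= lo
    end = bisect_left(index, (a1, hi + 1))     # first entry past a2 = hi
    return [row_id for _, _, row_id in index[start:end]]

print(point_a1_range_a2(index, a1=2, lo=5, hi=9))
```

The matching entries form one contiguous slice of the index. A range query on a2 alone has no such slice, which is why it is not really easier.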

Indexed File

ƒ Suitable for applications that require random access
ƒ Usually combined with sequential file
ƒ A single-level index is an auxiliary file of entries
<search-key, pointer to record>
ordered on the search-key.
ƒ Index is separate from data file
• Usually smaller (10-20% as a rule of thumb; take with a grain of salt!)
• Can have multiple indexes on same relation



Searching a Single-Level Index

ƒ Sequential search
• Faster than linear search of main file.
• Index is smaller than the main file
• Worst-case search cost is still O(n).
ƒ Binary search
• Key space:

• Search cost is O(log₂(n)) time (n = size of the index).
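
A binary search of a single-level index can be sketched as follows. The index entries and the integer "pointers" are hypothetical stand-ins for <search-key, pointer to record> pairs.

```python
from bisect import bisect_left

# Hypothetical index file: (search_key, pointer) entries ordered on the search key.
index = [("Amanda", 0), ("Arnold", 1), ("Gerard", 2), ("Jamie", 3), ("Omar", 4)]
keys = [k for k, _ in index]

def lookup(key):
    """Binary search the index; return the record pointer or None."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return index[i][1]
    return None

print(lookup("Jamie"), lookup("Zed"))
```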



B-Trees

ƒ A data structure for indexes on tables


• a variation of search trees
• trades some extra space to gain better performance
ƒ Supports the necessary operations:
• insert a new row
• delete an existing row
• search for a row given the index attributes
ƒ "Perfect" for disk storage
• high fanout
• very robust to data changes, data volumes, etc.
• used by ALL RDBMSes

B-Tree Example

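[Figure: example B-tree, listed level by level; "|" separates the children of different parents]
root:     [100]
internal: [30] | [120 150 180]
leaves:   [3 5 11] [30 35] | [100 101 110] [120 130] [150 156 179] [180 200]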

ƒ Each node is stored in one disk block


ƒ Each row is pointed to by a leaf node



B-Tree Internal Node

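[Figure: internal node with keys 57, 81, 95 and four child pointers]
• pointer 1: to keys k < 57
• pointer 2: to keys 57 ≤ k < 81
• pointer 3: to keys 81 ≤ k < 95
• pointer 4: to keys k ≥ 95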



B-Tree Leaf Node

[Figure: leaf node with keys 57, 81, 95]
• one pointer per key: to the record with key 57, to the record with key 81, to the record with key 95
• one extra pointer: to the next leaf node



B-Tree Invariants

ƒ Assume each node (block) holds at most k keys


• typically k is several hundred

ƒ Each node must hold at least ⎣(k+1)/2⎦ pointers


• except for the root: may have down to 2 pointers
ƒ All leaves must be at the same level

ƒ This ensures that the tree remains balanced:


• its height with n rows is at most 1 + log_{k/2}(n)
• in practice the height is 3 or 4 (1-2 top levels in RAM)
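
The height bound can be evaluated for concrete numbers. In the sketch below, k = 200 is an assumed value for "several hundred" keys per node and n = 10,000,000 rows comes from the earlier example scenario.

```python
import math

def max_height(n_rows, k):
    """Upper bound 1 + log_{k/2}(n) on the height of a B-tree holding n rows."""
    return 1 + math.log(n_rows, k / 2)

bound = max_height(10_000_000, 200)
print(round(bound, 2))
```

A bound of roughly 4.5 levels is in line with the practical heights of 3 or 4 mentioned above.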



B-Tree Point Query

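[Figure: the example B-tree; the search path for key 101 is [100] → [120 150 180] → [100 101 110]]
root:     [100]
internal: [30] | [120 150 180]
leaves:   [3 5 11] [30 35] | [100 101 110] [120 130] [150 156 179] [180 200]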

ƒ Search path for key 101


ƒ Time proportional to the height of the tree
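
A point query can be sketched on the example tree, using nested tuples as a stand-in for disk blocks. The representation and the helper names `leaf` and `search` are assumptions made for illustration only.

```python
from bisect import bisect_right

def leaf(*keys):
    """Build a leaf node holding the given keys."""
    return ("leaf", list(keys))

# The example tree: ("node", keys, children) for internal nodes.
tree = ("node", [100], [
    ("node", [30], [leaf(3, 5, 11), leaf(30, 35)]),
    ("node", [120, 150, 180],
     [leaf(100, 101, 110), leaf(120, 130), leaf(150, 156, 179), leaf(180, 200)]),
])

def search(node, key):
    """Return (found, nodes_visited) for a point query."""
    visited = 1
    while node[0] == "node":
        _, keys, children = node
        node = children[bisect_right(keys, key)]   # child whose key range covers key
        visited += 1
    return key in node[1], visited

print(search(tree, 101))
```

The number of nodes visited equals the height of the tree, regardless of which key is searched.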



B-Tree Range Query

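[Figure: the example B-tree; the range 101 to 166 covers the leaves [100 101 110], [120 130], and [150 156 179]]
root:     [100]
internal: [30] | [120 150 180]
leaves:   [3 5 11] [30 35] | [100 101 110] [120 130] [150 156 179] [180 200]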

ƒ Subtree for keys between 101 and 166


ƒ Time proportional to height + size of range
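
Once a point query has located the first leaf, a range query only follows the leaf chain. The sketch below models the chain as a Python list of leaves and assumes that the index of the starting leaf (2, the leaf reached when searching for 101) is already known.

```python
# Leaves of the example tree in key order; in a real B-tree each leaf points to the next.
leaves = [[3, 5, 11], [30, 35], [100, 101, 110], [120, 130], [150, 156, 179], [180, 200]]

def range_scan(leaves, first_leaf, lo, hi):
    """Scan the leaf chain from first_leaf, collecting keys with lo <= key <= hi."""
    result, leaves_read = [], 0
    for leaf in leaves[first_leaf:]:
        leaves_read += 1
        for key in leaf:
            if key > hi:
                return result, leaves_read   # past the range: stop scanning
            if key >= lo:
                result.append(key)
    return result, leaves_read

keys, leaves_read = range_scan(leaves, first_leaf=2, lo=101, hi=166)
print(keys, leaves_read)
```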



B-Tree Insertion (1/4)

[Figure: the tree before inserting 33]
root:     [100]
internal: [30] | [120 150 180]
leaves:   [3 5 11] [30 35] | [100 101 110] [120 130] [150 156 179] [180 200]

ƒ Inserting 33 (simple case)



B-Tree Insertion (1/4)

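[Figure: the tree after inserting 33; the leaf [30 35] had room and becomes [30 33 35]]
root:     [100]
internal: [30] | [120 150 180]
leaves:   [3 5 11] [30 33 35] | [100 101 110] [120 130] [150 156 179] [180 200]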

ƒ Inserting 33 (simple case)



B-Tree Insertion (2/4)

[Figure: the tree before inserting 7]
root:     [100]
internal: [30] | [120 150 180]
leaves:   [3 5 11] [30 33 35] | [100 101 110] [120 130] [150 156 179] [180 200]

ƒ Inserting 7 (split leaf)



B-Tree Insertion (2/4)

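[Figure: the tree after inserting 7; the full leaf [3 5 11] is split and key 7 is added to its parent]
root:     [100]
internal: [7 30] | [120 150 180]
leaves:   [3 5] [7 11] [30 33 35] | [100 101 110] [120 130] [150 156 179] [180 200]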

ƒ Inserting 7 (split leaf)



B-Tree Insertion (3/4)

[Figure: the tree before inserting 160]
root:     [100]
internal: [7 30] | [120 150 180]
leaves:   [3 5] [7 11] [30 33 35] | [100 101 110] [120 130] [150 156 179] [180 200]

ƒ Inserting 160 (split internal node)



B-Tree Insertion (3/4)

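[Figure: the tree after inserting 160; the leaf [150 156 179] is split, the full internal node [120 150 180] is split, and key 160 moves up to the root]
root:     [100 160]
internal: [7 30] | [120 150] | [180]
leaves:   [3 5] [7 11] [30 33 35] | [100 101 110] [120 130] [150 156] | [160 179] [180 200]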

ƒ Inserting 160 (split internal node)



B-Tree Insertion (4/4)

[Figure: a two-level tree before inserting 45]
root:   [10 20 30]
leaves: [1 2 3] [10 12] [20 25] [30 32 40]

ƒ Insert 45 (split the root)



B-Tree Insertion (4/4)

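[Figure: the tree after inserting 45; the leaf [30 32 40] is split, the full root is split, and key 30 becomes the new root]
root:     [30]
internal: [10 20] | [40]
leaves:   [1 2 3] [10 12] [20 25] | [30 32] [40 45]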

ƒ Insert 45 (split the root)


ƒ The height increases
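
The insertion cases above can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions rather than a production B-tree: nodes hold at most K = 3 keys as in the figures, the class `Node` and the functions `insert` and `insert_root` are hypothetical names, duplicate keys are not handled, and the pointers between neighbouring leaves are omitted.

```python
from bisect import bisect_right, insort

K = 3  # assumed maximum number of keys per node, as in the slide figures

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children        # None marks a leaf

def insert(node, key):
    """Insert key below node; return (separator, new_right_node) if node split."""
    if node.children is None:
        insort(node.keys, key)
        if len(node.keys) <= K:
            return None                 # simple case: the leaf had room
        mid = (K + 1) // 2              # leaf split: all keys stay in the leaves
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right
    i = bisect_right(node.keys, key)
    split = insert(node.children[i], key)
    if split is None:
        return None
    sep, right_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right_child)
    if len(node.keys) <= K:
        return None
    mid = len(node.keys) // 2           # internal split: the middle key moves up
    up = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right

def insert_root(root, key):
    """Insert key; create a new root if the old one was split."""
    split = insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])   # the height increases by one

root = Node([10, 20, 30],
            [Node([1, 2, 3]), Node([10, 12]), Node([20, 25]), Node([30, 32, 40])])
root = insert_root(root, 45)
print(root.keys, [c.keys for c in root.children])
```

Run on the two-level tree from the last example, the sketch reproduces the root split: the new root holds 30, with internal nodes [10 20] and [40] below it.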

B-Tree Deletion

ƒ Balanced deletion is also possible


• there are similar case-based algorithms

ƒ Generally, deleted rows are left as tombstones


• the overhead of deletion is too large

ƒ Most tables tend to grow with time


• the tombstones quickly get reused

ƒ Otherwise, periodically rebuild the index


• or perform online reorg of the index



Cluster Index

ƒ Generally, indexed rows are scattered in the table
ƒ A clustered index has consecutive rows:

ƒ Equivalent to sorting the table



Clustering Index

ƒ At most one index can be the clustering index


• but other indexes may happen to be clustered too
• attributes may be correlated

ƒ CREATE INDEX ExamIndex
ON Exams
CLUSTER(vip,date,time);

ƒ A cluster index on a primary key is a bad idea


• range queries are not often meaningful
• keys should not carry information themselves



Index Queries

ƒ If the query only uses index attributes:


• "virtually constant" time evaluation

ƒ SELECT date
FROM Exams
WHERE vip = 'amoeller';

ƒ SELECT vip, COUNT(date) AS Dates
FROM Exams
GROUP BY vip;

Boolean Index Selection

ƒ SELECT *
FROM R
WHERE x=42 AND y>87;

ƒ We have one index for x and another for y


• use index scan to find row pointers for x=42
• use index scan to find row pointers for y>87
• compute the intersection of those pointer sets

ƒ Similarly, OR corresponds to the union of the pointer sets
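
Combining two indexes can be sketched with Python sets of row pointers. The table contents and the dictionary-based "indexes" are hypothetical; a real system would obtain the pointer sets from index scans.

```python
# Hypothetical table: row pointer -> (x, y)
rows = {0: (42, 90), 1: (42, 10), 2: (7, 95), 3: (42, 88)}

# One index per attribute, mapping a value to the set of row pointers.
x_index, y_index = {}, {}
for ptr, (x, y) in rows.items():
    x_index.setdefault(x, set()).add(ptr)
    y_index.setdefault(y, set()).add(ptr)

x_hits = x_index.get(42, set())                                      # x = 42
y_hits = set().union(*(p for y, p in y_index.items() if y > 87))     # y > 87
and_result = x_hits & y_hits    # WHERE x=42 AND y>87
or_result = x_hits | y_hits     # WHERE x=42 OR y>87
print(sorted(and_result), sorted(or_result))
```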



Projection and Duplicates

ƒ Projection on a superkey:
• no duplicates
• full table scan

ƒ Removing duplicates:
• any index structure on the remaining attributes helps
• otherwise, use a variation of multiway merge sort
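
Sort-based duplicate removal can be sketched as follows. The in-memory `sorted` call stands in for the multiway merge sort a database would run on disk, and the sample rows are made up for illustration.

```python
def project_distinct(rows, columns):
    """Project rows onto the given column positions, then drop duplicates by sorting."""
    projected = sorted(tuple(row[c] for c in columns) for row in rows)
    result = []
    for t in projected:
        if not result or result[-1] != t:   # duplicates are adjacent after sorting
            result.append(t)
    return result

people = [("amoeller", "Aarhus"), ("mis", "Aarhus"),
          ("amoeller", "Aarhus"), ("xyz", "Odense")]
cities = project_distinct(people, columns=[1])
print(cities)
```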



Join

ƒ Many different join algorithms, in particular:


• nested loop join – often good for "small joins"
• merge scan join – often good for "large joins"

ƒ Which to use depends on many factors:


• sizes of the input tables
• expected size of the result table
• existence of indexes
• degree of clustering

ƒ Query plan selected based on cost estimates


Join Query Structure

ƒ SELECT a1, a2, ..., an
FROM R, S
WHERE localpred(R) AND
localpred(S) AND
joinpred(R,S);

ƒ localpred(R) is local to R
ƒ localpred(S) is local to S
ƒ joinpred(R,S) uses both R and S attributes



Nested Loop Join

ƒ scan the outer R table
    for each row satisfying localpred(R)
        search the inner table S
        select rows satisfying localpred(S) and joinpred(R,S)
        if row(s) exist
            concatenate the rows from R and S
        else
            discard row for inner join
            pad with NULLs for left outer join
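
The pseudocode above can be turned into a small runnable sketch. Tables are modelled as lists of tuples and the predicates as Python functions; the parameter names (`local_r`, `local_s`, `join_pred`, `left_outer`, `s_width`) are assumptions introduced for this example.

```python
def nested_loop_join(R, S, local_r, local_s, join_pred, left_outer=False, s_width=0):
    """Join lists of tuples R and S; optionally keep unmatched R rows (left outer)."""
    out = []
    for r in R:                                   # scan the outer table
        if not local_r(r):
            continue
        matches = [s for s in S                   # search the inner table
                   if local_s(s) and join_pred(r, s)]
        if matches:
            out.extend(r + s for s in matches)    # concatenate the rows
        elif left_outer:
            out.append(r + (None,) * s_width)     # pad with NULLs
    return out

def always(row):
    """Local predicate that accepts every row."""
    return True

def same_key(r, s):
    """Join predicate: equality on the first column."""
    return r[0] == s[0]

R = [(1, "a"), (2, "b"), (3, "c")]
S = [(1, "x"), (1, "y"), (3, "z")]
inner = nested_loop_join(R, S, always, always, same_key)
outer = nested_loop_join(R, S, always, always, same_key, left_outer=True, s_width=2)
print(inner)
print(outer)
```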



Nested Loop Join in Practice

ƒ Read R and S consecutively using blocks

ƒ Store as much of R in RAM as possible


• concatenate S rows with many R rows at once

ƒ If possible, use indexes for the local predicates

ƒ Assume k rows in a block, m rows in RAM


ƒ Time complexity is:
O(|R|/k + |R||S|/(k²m))



Merge Scan Join

ƒ sort R and S on attributes in joinpred(R,S)
merge the two tables
    look at the rows with smallest value
    if it satisfies all predicates with the other row
        combine all rows with these values
    else
        read next row in that table
if one table runs out of rows
    discard row for inner join
    pad with NULLs for full outer join
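
A minimal sketch of the merge scan for an inner join follows. It assumes that the join predicate is equality on a single key, leaves out the local predicates and the outer-join padding, and uses in-memory sorting in place of a disk-based sort; `merge_join`, `key_r`, and `key_s` are illustrative names.

```python
def merge_join(R, S, key_r, key_s):
    """Inner equi-join of R and S by sorting both tables and merging them."""
    R, S = sorted(R, key=key_r), sorted(S, key=key_s)
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        kr, ks = key_r(R[i]), key_s(S[j])
        if kr < ks:
            i += 1                                # advance the table with the smaller value
        elif kr > ks:
            j += 1
        else:
            i_end = i                             # find all R rows with this value
            while i_end < len(R) and key_r(R[i_end]) == kr:
                i_end += 1
            j_end = j                             # find all S rows with this value
            while j_end < len(S) and key_s(S[j_end]) == kr:
                j_end += 1
            out.extend(r + s for r in R[i:i_end] for s in S[j:j_end])
            i, j = i_end, j_end
    return out

def first(row):
    """Join key: the first column of a row."""
    return row[0]

R = [(1, "a"), (2, "b"), (3, "c"), (3, "d")]
S = [(3, "z"), (1, "x"), (1, "y")]
joined = merge_join(R, S, first, first)
print(joined)
```

After sorting, each table is read once from start to finish, which is what makes the approach attractive for large joins.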



Merge Scan Join in Practice

ƒ Read R and S consecutively using blocks

ƒ Store as much of R and S in RAM as possible


• when combining rows in the two tables

ƒ If possible, use indexes for local predicates

ƒ Assume k rows in a block, m rows in RAM


ƒ Time complexity is:
O(|R|/k + |S|/k + |R ⋈ S|/k + |R ⋈ S|/m)
