Algorithms and Data Structures
Michael I. Schwartzbach
Computer Science, University of Aarhus
Implementing Queries
File
• A database is stored as a collection of files.
• Storage: usually a disk
• Random access requires moving the disk arm
• Non-volatile
Records
• A file is a set of records, generally of the same type.
• A record is a sequence of fields.
• Record size
• Fixed: the same number of fields, each of the same size
• Variable: a record may have a different number of fields, or a field may vary in size
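The two record formats can be sketched with Python's struct module, assuming a hypothetical record type that is not from the slides: fixed-length records pad every field to the same size, while variable-length records store a field count and a length prefix per field.

```python
import struct

# Hypothetical fixed-length layout: a 4-byte int id and a 20-byte name field.
FIXED = struct.Struct("<i20s")


def pack_fixed(rec_id: int, name: str) -> bytes:
    # struct pads the name with NUL bytes up to 20 bytes (or truncates it).
    return FIXED.pack(rec_id, name.encode("utf-8"))


def unpack_fixed(data: bytes) -> tuple[int, str]:
    rec_id, raw = FIXED.unpack(data)
    return rec_id, raw.rstrip(b"\x00").decode("utf-8")


def pack_variable(fields: list[str]) -> bytes:
    # A 2-byte field count, then each field as a 2-byte length plus its bytes,
    # so both the number of fields and their sizes may vary.
    out = bytearray(struct.pack("<H", len(fields)))
    for f in fields:
        b = f.encode("utf-8")
        out += struct.pack("<H", len(b)) + b
    return bytes(out)


def unpack_variable(data: bytes) -> list[str]:
    (count,) = struct.unpack_from("<H", data, 0)
    offset = 2
    fields = []
    for _ in range(count):
        (n,) = struct.unpack_from("<H", data, offset)
        offset += 2
        fields.append(data[offset:offset + n].decode("utf-8"))
        offset += n
    return fields
```

Every fixed-length record occupies FIXED.size = 24 bytes regardless of the name, which makes the i-th record easy to locate; the variable-length format saves space but must be parsed field by field.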
Disk Concepts – Files
[Figure: a file stored in disk blocks, laid out either compactly or with room to grow]
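A 1 GB table with:
• 10,000,000 rows
• each row is 100 bytes
50 MB of RAM:
• holds 1/20 of the table
Time consumed by a binary merge sort (about log₂ 10,000,000 ≈ 23 passes, each reading and writing every row with a random disk access of 8 ms):
• 23 × 2 × 10,000,000 × 8 ms ≈ 43 days
Time consumed by a two-phase multiway merge sort (2 passes, each reading and writing 20 runs of 12,500 blocks sequentially at 0.4 ms per block):
• 2 × 2 × 20 × 12,500 × 0.4 ms ≈ 6.7 minutes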
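The two estimates can be reproduced with a few lines of arithmetic. Reading the first as about log₂ of the row count passes with one random access per row, and the factor 2 as one read plus one write, is an interpretation consistent with the slide's numbers; the 4,000-byte block size is an assumption inferred from 50 MB / 12,500 blocks.

```python
import math

# Figures from the slide.
ROWS = 10_000_000
ROW_BYTES = 100
RAM_BYTES = 50_000_000      # 50 MB
RANDOM_ACCESS_MS = 8.0      # one random disk access
BLOCK_READ_MS = 0.4         # one block read or written sequentially
# Assumption: 50 MB / 12,500 blocks implies 4,000-byte blocks.
BLOCK_BYTES = 4_000


def binary_merge_sort_days() -> float:
    """About log2(rows) passes, each reading and writing every row randomly."""
    passes = round(math.log2(ROWS))  # 23, as on the slide
    ms = passes * 2 * ROWS * RANDOM_ACCESS_MS
    return ms / 1000 / 86_400


def two_phase_merge_sort_minutes() -> float:
    """Two passes, each reading and writing all runs block by block."""
    runs = ROWS * ROW_BYTES // RAM_BYTES        # 20 memory-sized runs
    blocks_per_run = RAM_BYTES // BLOCK_BYTES   # 12,500 blocks per run
    ms = 2 * 2 * runs * blocks_per_run * BLOCK_READ_MS
    return ms / 1000 / 60


print(f"{binary_merge_sort_days():.0f} days")           # 43 days
print(f"{two_phase_merge_sort_minutes():.1f} minutes")  # 6.7 minutes
```

The gap comes from two factors together: far fewer passes, and whole blocks read sequentially instead of one random access per row.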
Lessons Learned
SELECT *
FROM R
WHERE condition;
SELECT *
FROM Meetings
WHERE date >= '2008-08-25' AND
date < '2008-12-24';
SELECT *
FROM People
WHERE userid = 'amoeller';
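Without any index, each of these queries is answered by scanning every row and testing the condition. A minimal sketch of that baseline, with hypothetical in-memory rows standing in for the stored file (ISO dates compare correctly as strings):

```python
from typing import Callable, Iterable, Iterator

Row = dict[str, str]


def select(rows: Iterable[Row], condition: Callable[[Row], bool]) -> Iterator[Row]:
    """SELECT * FROM R WHERE condition, evaluated by a full table scan."""
    for row in rows:  # every row is read, whatever the condition
        if condition(row):
            yield row


# Hypothetical sample data.
meetings = [
    {"date": "2008-08-20", "room": "A"},
    {"date": "2008-09-01", "room": "B"},
    {"date": "2008-12-24", "room": "C"},
]
people = [{"userid": "amoeller"}, {"userid": "bob"}]

autumn = list(select(meetings, lambda r: "2008-08-25" <= r["date"] < "2008-12-24"))
found = list(select(people, lambda r: r["userid"] == "amoeller"))
```

The range condition on date and the equality condition on userid are the two cases that an index on the attribute can answer without reading the whole table.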
Sequential search of an index
• Faster than a linear search of the main file, since the index is smaller than the main file.
• Worst-case search cost is still O(n).
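A minimal sketch of sequential search in a sparse index, assuming a sorted main file split into blocks and an index that holds the first key of each block (the data are hypothetical). A binary search over the same index, using Python's bisect_right, is included for comparison.

```python
from bisect import bisect_right
from typing import Callable

# Hypothetical sorted main file; each inner list stands for one disk block.
blocks = [[3, 5, 11], [30, 35, 100], [101, 110, 120], [130, 150, 156]]
# Sparse index: the first key of each block, in block order.
index = [block[0] for block in blocks]


def find_block_sequential(key: int) -> int:
    """Scan index entries in order: O(n) in the number of entries."""
    pos = -1
    for i, first in enumerate(index):
        if first > key:
            break
        pos = i
    return pos  # -1 means key is below every key in the file


def find_block_binary(key: int) -> int:
    """Last entry whose first key is <= key, found in O(log n)."""
    return bisect_right(index, key) - 1


def lookup(key: int, find_block: Callable[[int], int]) -> bool:
    i = find_block(key)
    # After the index search, only one block of the main file is read.
    return i >= 0 and key in blocks[i]
```

Both searches return the same block; they differ only in how many index entries they inspect.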
Binary search
• Key space:
[Figure: B+ tree nodes. An interior node with keys 57, 81 and 95 points to keys < 57, 57 ≤ k < 81, 81 ≤ k < 95 and ≥ 95; a leaf node also points to the next leaf.]
[Figure: a B+ tree with root 100, shown after inserting 33, 7 and 160; inserting 160 splits a leaf and the interior node 120 150 180, and 160 is added to the root.]
[Figure: a B+ tree with root 10 20 30; inserting 45 splits the leaf 30 32 40 and then the root, giving the new root 30 over the interior nodes 10 20 and 40.]
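Lookup in a B+ tree can be sketched as follows. In an interior node with keys k1 < k2 < ... < km, a search key k follows the child whose range contains it (k < k1, ki ≤ k < ki+1, or k ≥ km), and a range query walks the leaves through their next pointers. The Leaf and Interior classes and the hand-built tree, with keys loosely modelled on the example tree with root 100, are hypothetical. Insertion with node splits is omitted.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Leaf:
    keys: list[int]
    next_leaf: Optional["Leaf"] = None  # pointer to the next leaf in key order


@dataclass
class Interior:
    keys: list[int]
    children: list[Union["Interior", Leaf]]  # len(children) == len(keys) + 1


def find_leaf(node: Union[Interior, Leaf], key: int) -> Leaf:
    # bisect_right picks child i with keys[i-1] <= key < keys[i].
    while isinstance(node, Interior):
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root: Union[Interior, Leaf], key: int) -> bool:
    return key in find_leaf(root, key).keys


def range_scan(root: Union[Interior, Leaf], lo: int, hi: int) -> list[int]:
    """All keys k with lo <= k < hi, following the leaves' next pointers."""
    out = []
    leaf: Optional[Leaf] = find_leaf(root, lo)
    while leaf is not None:
        for k in leaf.keys:
            if k >= hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next_leaf
    return out


def build_example() -> Interior:
    leaves = [Leaf([3, 5, 11]), Leaf([30, 35]), Leaf([100, 101, 110]),
              Leaf([120, 130]), Leaf([150, 156, 179]), Leaf([180, 200])]
    for a, b in zip(leaves, leaves[1:]):
        a.next_leaf = b
    left = Interior([30], leaves[0:2])
    right = Interior([120, 150, 180], leaves[2:6])
    return Interior([100], [left, right])


root = build_example()
```

A point search touches one node per level, while a range scan pays that descent once and then reads consecutive leaves.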
SELECT date
FROM Exams
WHERE vip = 'amoeller';
SELECT *
FROM R
WHERE x=42 AND y>87;
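One common strategy for a conjunctive condition, assumed here rather than taken from the slides, is to answer one conjunct with an index and check the rest on the fetched rows only. A sketch with a hypothetical table R and a hypothetical in-memory hash index on x:

```python
from collections import defaultdict

# Hypothetical table R with attributes x and y.
R = [{"x": 42, "y": 90}, {"x": 42, "y": 10}, {"x": 7, "y": 99}, {"x": 42, "y": 88}]

# Hypothetical hash index on x: value -> positions of the rows having it.
index_x: defaultdict[int, list[int]] = defaultdict(list)
for pos, row in enumerate(R):
    index_x[row["x"]].append(pos)


def select_x_eq_y_gt(x_value: int, y_bound: int) -> list[dict]:
    # The index answers the equality conjunct (x = x_value); the range
    # conjunct (y > y_bound) is tested only on the rows it returns.
    # .get avoids inserting empty entries into the defaultdict.
    return [R[pos] for pos in index_x.get(x_value, []) if R[pos]["y"] > y_bound]


result = select_x_eq_y_gt(42, 87)
```

With an index on y instead, the roles would be reversed; which conjunct to use is a cost decision that depends on how selective each one is.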
Projection on a superkey:
• no duplicates can arise
• a full table scan suffices
Removing duplicates:
• any index structure on the remaining attributes helps
• otherwise, use a variation of multiway merge sort
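A minimal in-memory sketch of duplicate elimination as a variation of multiway merge sort: memory-sized runs are sorted (and deduplicated) in a first phase, then all runs are merged in one pass with heapq.merge, dropping every row equal to the one just emitted. Python lists stand in for the on-disk runs, and memory_rows is a hypothetical parameter for how many rows fit in RAM.

```python
import heapq
from typing import Iterable, Iterator


def distinct_by_merge_sort(rows: Iterable[tuple], memory_rows: int) -> Iterator[tuple]:
    """Yield the distinct rows in sorted order."""
    # Phase 1: cut the input into runs that fit in memory; sort each run
    # and drop duplicates inside it early.
    runs, current = [], []
    for row in rows:
        current.append(row)
        if len(current) == memory_rows:
            runs.append(sorted(set(current)))
            current = []
    if current:
        runs.append(sorted(set(current)))
    # Phase 2: merge all runs at once; equal rows now arrive next to each
    # other, so a duplicate is any row equal to the last one emitted.
    last = object()
    for row in heapq.merge(*runs):
        if row != last:
            yield row
            last = row
```

For example, projecting a table onto a non-key attribute and passing the projected tuples through this function yields each value once.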
localpred(R) is local to R
localpred(S) is local to S
joinpred(R,S) uses both R and S attributes
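Splitting the condition this way lets each local predicate be applied to its own table before the join, so fewer pairs are formed. A minimal nested-loop sketch, assuming in-memory tables; the tables People(userid, group) and Exams(vip, date) and their contents are hypothetical.

```python
from typing import Callable

Row = dict[str, str]
Pred = Callable[[Row], bool]


def join_with_local_predicates(
    R: list[Row], S: list[Row],
    localpred_R: Pred, localpred_S: Pred,
    joinpred: Callable[[Row, Row], bool],
) -> list[tuple[Row, Row]]:
    # Apply each local predicate once per row, before any pairs are formed.
    r_rows = [r for r in R if localpred_R(r)]
    s_rows = [s for s in S if localpred_S(s)]
    # Nested-loop join over the filtered inputs.
    return [(r, s) for r in r_rows for s in s_rows if joinpred(r, s)]


people = [{"userid": "amoeller", "group": "PL"}, {"userid": "bob", "group": "DB"}]
exams = [
    {"vip": "amoeller", "date": "2008-12-01"},
    {"vip": "bob", "date": "2009-01-10"},
    {"vip": "amoeller", "date": "2009-01-15"},
]

result = join_with_local_predicates(
    people, exams,
    lambda r: r["group"] == "PL",          # localpred(R)
    lambda s: s["date"] >= "2009-01-01",   # localpred(S)
    lambda r, s: r["userid"] == s["vip"],  # joinpred(R,S)
)
```

The result is the same as testing all three predicates on every pair of the cross product; only the amount of work differs.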