Outline
Cost of searching with and without indices
B+ trees
  Definition and structure
B+ tree operations
  Inserting
  Deleting
Data file (accounts table)

accountnum  branch     balance
1230001     Aberdeen     3,455
8220002     Aberdeen    68,000
5270008     Bolton      73,500
1210004     Croydon     55,345
6210005     Cambridge   25,114
3220003     Dublin       5,210
1200022     Edinburgh   85,300
1230018     Glasgow     15,772
[Figure: index entries (Glasgow, London, Manchester, ...) with pointers P0, P1, P2, ..., Pi, Pj, ..., Pm to the disk pages that hold the data records]
Assume
  NR: number of records in the table
  FR: data blocking factor, the number of data records that fit in a block
  Fi: index blocking factor, the number of index entries that fit in a block
  A dense primary index (#records = #index entries)
  Binary search on the index, then fetch the data block
Example
  NR = 2^20 (over 1 million records)
  FR = 10 (400-byte records and 4 KB pages)
  Fi = 256 = 2^8 (16-byte search-key, pointer pairs in 4 KB pages)
  Without an index (scan the data file): NR / FR = 2^20 / 10 = 104858 da, rounded up (disk accesses, ~18 minutes)
  With the index: 1 + log2(NR / Fi) = 1 + log2(2^20 / 2^8) = 1 + 12 = 13 da (130 msec)
(Calculations assume 10 msec per disk access)
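The two costs above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function names `scan_cost` and `dense_index_cost` are illustrative and not part of the slides, and the 10 msec per disk access is the slides' own assumption.

```python
import math

N_R = 2 ** 20        # number of records
F_R = 10             # data blocking factor (records per block)
F_I = 2 ** 8         # index blocking factor (index entries per block)
MS_PER_ACCESS = 10   # assumed cost of one disk access, in milliseconds


def scan_cost(n_records, f_r):
    """Disk accesses needed to read every data block (no index)."""
    return math.ceil(n_records / f_r)


def dense_index_cost(n_records, f_i):
    """Binary search over the index blocks, plus one data block fetch."""
    index_blocks = math.ceil(n_records / f_i)
    return 1 + math.ceil(math.log2(index_blocks))


scan = scan_cost(N_R, F_R)
indexed = dense_index_cost(N_R, F_I)
print(scan, "accesses,", round(scan * MS_PER_ACCESS / 60000, 1), "minutes")
print(indexed, "accesses,", indexed * MS_PER_ACCESS, "msec")
```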
Tree-structured indices
Each tree node is stored in a single disk block
Each node packs in a large number of key, pointer pairs
  The number of children of a node is called its fan-out (m)
Trees reduce search cost significantly
Example revisited
NR = 2^20 (over 1 million records)
FR = 10 (400-byte records and 4 KB pages)
Fi = 256 = 2^8 (16-byte key, pointer pairs in 4 KB pages)
Assume all nodes in the tree are half-full (fan-out m = Fi / 2 = 128)
Search cost: 1 + logm(NR / Fi) = 1 + log128(2^20 / 2^8) = 1 + log128(2^12) = 3 da (30 msec)
  log128(2^12) = 12/7, rounded up to 2 levels
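The same estimate for a tree-structured index, as a small sketch. The function name `tree_index_cost` is hypothetical; the formula is the one on the slide, with the logarithm rounded up to a whole number of levels.

```python
import math


def tree_index_cost(n_records, f_i, fan_out):
    """1 data block fetch plus one disk access per tree level."""
    index_blocks = math.ceil(n_records / f_i)
    levels = math.ceil(math.log(index_blocks, fan_out))
    return 1 + levels


# Half-full nodes: fan-out m = Fi / 2 = 128
cost = tree_index_cost(2 ** 20, 2 ** 8, 128)
print(cost, "disk accesses,", cost * 10, "msec at 10 msec per access")
```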
B trees
General form of multi-level index
Generalise binary search trees
Balanced tree
  All leaves are at the same depth
  At the expense of some space overhead
B+ tree
B tree
  Data pointers may be stored in internal nodes
  Every value of the search field appears once
B+ tree
  Data pointers stored only in leaf nodes
  Internal nodes contain only keys and tree pointers
    Can pack more pointers in internal nodes
    Improved search time due to fewer levels in the tree
  No wasted space due to null tree pointers in leaf nodes
B+ tree
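[Figure: example B+ tree. The root holds key 100; its left subtree (k < 100) is an internal node with key 30, and its right subtree (k >= 100) is an internal node with keys 120, 150, 180. Leaf nodes such as (30, 35), (120, 130) and (180, 200) hold the data pointers.]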
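Internal node with q tree pointers P1, ..., Pq and keys K1, ..., Kq-1
Where q <= p: the B+ tree is said to be of order p
  Order of internal node
  Order of leaf node
For every search-key value X in the subtree pointed at by Pi:
  Ki-1 <= X < Ki for 1 < i < q
  X < Ki for i = 1
  Ki-1 <= X for i = q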
Each internal node has at most p pointers
Each internal node has at least p/2 pointers
  Except for the root
Root node has at least two pointers unless it is a leaf node
[Figure: internal node with keys 120, 150, 180; its four pointers lead to keys K < 120, 120 <= K < 150, 150 <= K < 180, and K >= 180]
Leaf node: <(K1, Pr1), (K2, Pr2), ..., (Kq-1, Prq-1), Pnext> with q <= p
  Pnext (sequence pointer) points to the next leaf node of the B+ tree
  Pri is a data pointer; it points to the record whose key value is Ki
Within each leaf node, K1 < K2 < ... < Kq-1
Each leaf node has at least p/2 values
All leaf nodes are at the same level
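Because every leaf carries a sequence pointer, a range query can walk the leaf level from left to right once the first relevant leaf is known. A minimal sketch follows; the `Leaf` class and `range_scan` function are illustrative names, data pointers are left out for brevity, and the starting leaf is assumed to have been located already.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]                  # K1 < K2 < ... within the leaf
    next: Optional["Leaf"] = None    # Pnext, the sequence pointer


def range_scan(leaf, lo, hi):
    """Collect all keys in [lo, hi], following sequence pointers."""
    found = []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return found         # keys are sorted, so we can stop here
            if k >= lo:
                found.append(k)
        leaf = leaf.next
    return found


# Three chained leaves: (3, 5, 11) -> (30, 35) -> (100, 101, 110)
third = Leaf([100, 101, 110])
second = Leaf([30, 35], third)
first = Leaf([3, 5, 11], second)
print(range_scan(first, 5, 100))
```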
[Figure: leaf node with keys 3, 5, 11; each key has a data pointer to the record with that key, and a sequence pointer leads to the next leaf node]
B+ tree search
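[Figure: B+ tree search example. Root key 100 (k < 100 to the left, k >= 100 to the right); internal nodes with key 30 (k < 30, k >= 30) and with keys 120, 150, 180; leaf nodes (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200) holding the data pointers]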
Follow the associated pointer (Pi)
If no such key is found, follow the last pointer in the node
... until a leaf node is reached
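The descent can be sketched in a few lines. This is an illustration under stated assumptions, not the lecture's own code: the `Leaf` and `Internal` classes, the `leaf` helper and the `"rec<k>"` strings standing in for data pointers are all hypothetical, and the child to follow is chosen with the convention Ki-1 <= X < Ki. The tree built below mirrors the search example figure.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class Leaf:
    keys: List[int]
    records: List[str]       # stand-ins for data pointers


@dataclass
class Internal:
    keys: List[int]
    children: list           # len(children) == len(keys) + 1


def leaf(*keys):
    return Leaf(list(keys), [f"rec{k}" for k in keys])


def search(node, k):
    """Descend from the root to a leaf, then look for k in that leaf."""
    while isinstance(node, Internal):
        # Number of keys <= k is the index of the pointer to follow;
        # if k >= every key, this selects the last pointer.
        node = node.children[bisect_right(node.keys, k)]
    i = bisect_left(node.keys, k)
    if i < len(node.keys) and node.keys[i] == k:
        return node.records[i]
    return None


root = Internal([100], [
    Internal([30], [leaf(3, 5, 11), leaf(30, 35)]),
    Internal([120, 150, 180], [leaf(100, 101, 110), leaf(120, 130),
                               leaf(150, 156, 179), leaf(180, 200)]),
])
print(search(root, 156), search(root, 42))
```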
Leaf nodes usually contain record pointers
If a search value matches more than one record
Updates are more involved than search, because nodes may overflow or underflow
Insert data record with search-key k
  Find the leaf node where k belongs
  If k is already in the leaf node
    Add data record to file, create indirect block if there isn't one
    Add record pointer to indirect block
  Otherwise
    Add data record to file
    Insert (k, data pointer) in leaf node
      Such that all search keys in leaf node remain in order
Find leaf node with search-key k
Find data record pointer, delete data record from file
Remove (k, record pointer) entry from leaf node if there is no indirect block associated with that entry
B+ insert
Four cases
  1. Simple case: there is space in the leaf node
  2. Leaf node overflow
  3. Internal node overflow
  4. New root
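The four cases can be seen in a compact recursive sketch. It rests on several assumptions that the slides do not fix: a leaf holds at most p - 1 keys, an overflowing node is split in the middle, the separator is copied up from a leaf split and moved up from an internal split, and data pointers and sequence pointers are left out. `Node`, `insert`, `leaf_keys` and `leaf_depths` are illustrative names.

```python
from bisect import bisect_right, insort

ORDER = 3  # p: at most p pointers per internal node, p - 1 keys per leaf


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children      # None marks a leaf node


def _insert(node, k):
    """Insert k below node. Returns (separator, new right node) on a split."""
    if node.children is None:
        insort(node.keys, k)                      # keep keys in order
        if len(node.keys) <= ORDER - 1:
            return None                           # case 1: space in leaf
        mid = len(node.keys) // 2                 # case 2: leaf overflow
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right               # separator is copied up
    i = bisect_right(node.keys, k)
    split = _insert(node.children[i], k)
    if split is None:
        return None
    sep, right = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right)
    if len(node.children) <= ORDER:
        return None
    mid = len(node.keys) // 2                     # case 3: internal overflow
    up = node.keys[mid]                           # separator is moved up
    new_right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, new_right


def insert(root, k):
    split = _insert(root, k)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])             # case 4: new root


def leaf_keys(node):
    """All keys stored in the leaves, left to right."""
    if node.children is None:
        return list(node.keys)
    return [k for child in node.children for k in leaf_keys(child)]


def leaf_depths(node, depth=0):
    """Set of depths at which leaves occur (one value if balanced)."""
    if node.children is None:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))


root = Node([])
for key in [30, 120, 11, 35, 150]:
    root = insert(root, key)
print(root.keys, leaf_keys(root), leaf_depths(root))
```

With these five keys the last insertion overflows a leaf, then its parent, and finally creates a new root, so all four cases are exercised.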
[Figures: worked insertion examples on a B+ tree of order 3 (root key 100, with k < 100 to the left and k >= 100 to the right), showing insertion into a leaf node, leaf and internal node splits, and the creation of a new root node]
B+ tree delete
[Figures: deletion examples, Delete 110 and Delete 130, on a tree with internal keys 120, 150, 170 and leaf nodes (120, 130), (150, 156), (170, 179)]
Simple case
Borrow one key from adjacent node
Redistribute evenly between adjacent nodes
Coalesce with sibling node
  Update pointers at parent node
  Parent node may underflow
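The choice between redistributing and coalescing can be illustrated on two adjacent leaves after a key has been removed. This is a sketch under assumptions: leaves are plain sorted lists of keys, `min_keys` is the minimum occupancy of a leaf, and `fix_leaf_underflow` is a hypothetical helper. Updating the parent (a new separator after redistribution, one pointer fewer after coalescing) is only noted in comments.

```python
def fix_leaf_underflow(left, right, min_keys):
    """Rebalance two adjacent leaves after a deletion.

    Returns (left, right, merged); right is None if the leaves were coalesced.
    """
    if len(left) >= min_keys and len(right) >= min_keys:
        return left, right, False            # simple case: no underflow
    combined = left + right
    if len(combined) >= 2 * min_keys:
        # Redistribute evenly; the parent's separator becomes new_right[0].
        mid = (len(combined) + 1) // 2
        return combined[:mid], combined[mid:], False
    # Coalesce; the parent loses one key and one pointer and may underflow.
    return combined, None, True


# Delete 130 from leaf (120, 130); the sibling (150, 156, 179) can spare a key.
print(fix_leaf_underflow([120], [150, 156, 179], min_keys=2))
# Delete 130 from leaf (120, 130); the sibling (150, 156) cannot spare a key.
print(fix_leaf_underflow([120], [150, 156], min_keys=2))
```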
Index summary
Tree-structured indices
  B+ tree supports equality and range searching
  Dynamic schemes that grow/shrink with data file
Hash-based indices
  Support equality search
  But no support for range queries
Indices in SQL
Which field(s) to create indices on?
In principle, the DBMS should automatically figure this out
  Based on cost of index maintenance
  And the benefit to query workload
Although some DBMSs provide tools to assist
Indices in SQL
create index <index-name> on <table-name> (<attribute list>)
  e.g. create index c-index on account (cust-city)
create unique index <index-name> on <table-name> (<attribute list>)
  Index creation fails if the table contains duplicate values for the search-key
  Once the index is created, insertion of duplicate values for that field is rejected
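The behaviour can be tried out with SQLite through Python's standard `sqlite3` module. This is a sketch under assumptions: the table layout is modelled on the accounts table from the earlier slide, the index names `c_index`, `u_city` and `u_acc` are made up for the example, and underscores replace the hyphens of the slide's identifiers because hyphens are not valid in unquoted SQL names. Other DBMSs may report the errors differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (accountnum INTEGER, cust_city TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1230001, "Aberdeen", 3455),
     (8220002, "Aberdeen", 68000),
     (5270008, "Bolton", 73500)],
)

# An ordinary index accepts duplicate search-key values.
conn.execute("CREATE INDEX c_index ON account (cust_city)")

# A unique index cannot be created while duplicates exist ("Aberdeen" twice).
try:
    conn.execute("CREATE UNIQUE INDEX u_city ON account (cust_city)")
    unique_city_created = True
except sqlite3.IntegrityError:
    unique_city_created = False

# Account numbers are distinct, so this unique index is created...
conn.execute("CREATE UNIQUE INDEX u_acc ON account (accountnum)")

# ...and from then on a duplicate account number is rejected.
try:
    conn.execute("INSERT INTO account VALUES (1230001, 'Croydon', 1)")
    duplicate_inserted = True
except sqlite3.IntegrityError:
    duplicate_inserted = False

print(unique_city_created, duplicate_inserted)
```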