
More on indexing: B+ trees

amiri advanced databases '05

Outline

Motivation: search example
- Cost of searching with and without indices

B+ trees
- Definition and structure

B+ tree operations
- Inserting
- Deleting


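Dense ordered index on accounts

[Figure: a dense ordered index on the branch field of the accounts table. The index file holds one (search-key, pointer) entry per record, in search-key order (Aberdeen, Aberdeen, Bolton, ..., Glasgow, London, ..., Manchester); each pointer refers to a record in the data file, which is stored on disk pages P0, P1, P2, ..., Pi, Pj, ..., Pm.]

Data file (accounts table), records shown in the figure:

accountnum  branch      balance
1230001     Aberdeen    3,455
8220002     Aberdeen    68,000
5270008     Bolton      73,500
1210004     Croydon     55,345
6210005     Cambridge   25,114
3220003     Dublin      5,210
1200022     Edinburgh   85,300
1230018     Glasgow     15,772
...         ...         ...
4250618     London      35,314
4250406     London      225,210
4230309     Manchester  865,300
1230518     Manchester  1,135,772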


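Index access performance

Assume:
- $N_R$: number of records in the table
- $F_R$: data blocking factor, the number of data records that fit in a block
- $F_i$: index blocking factor, the number of index entries that fit in a block
- A dense primary index (number of records = number of index entries)
- Binary search on the index, then fetch the data block

Cost of searching for a data record (in disk I/Os), assuming the index is stored as a sequential file of blocks:

$\text{Cost} = 1 + \log_2(N_R / F_i)$

The index occupies $N_R / F_i$ blocks, so the binary search reads about $\log_2(N_R / F_i)$ of them; the extra 1 is the fetch of the data block.

Note that storing the index as a sequential file makes updating the index expensive.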
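A minimal Python sketch of the search this cost models, assuming the index is held as a list of blocks, each block a sorted list of (search-key, pointer) entries, and that every block read counts as one disk I/O. The function name `index_lookup` and the block layout are illustrative assumptions, not part of the slides.

```python
from bisect import bisect_left


def index_lookup(index_blocks, key):
    """Binary-search a sequential file of sorted index blocks.

    Returns (pointer or None, disk I/Os). Reading an index block costs one
    I/O; if the key is found, fetching its data block costs one more.
    """
    lo, hi = 0, len(index_blocks) - 1
    ios = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = index_blocks[mid]              # one index-block read
        ios += 1
        if key < block[0][0]:
            hi = mid - 1                       # key can only be in an earlier block
        elif key > block[-1][0]:
            lo = mid + 1                       # key can only be in a later block
        else:
            keys = [k for k, _ in block]       # search inside the block in memory
            i = bisect_left(keys, key)
            if keys[i] == key:
                return block[i][1], ios + 1    # +1 to fetch the data block
            return None, ios
    return None, ios


# 256 index blocks of 16 entries each (keys 0..4095, placeholder pointers).
blocks = [[(b * 16 + j, f"rec{b * 16 + j}") for j in range(16)] for b in range(256)]
print(index_lookup(blocks, 1234))
```

Binary search over b blocks reads at most ⌊log2 b⌋ + 1 of them, so with 256 blocks a lookup costs at most 10 I/Os including the data fetch, close to the 1 + log2(256) = 9 given by the formula.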



Index access example

- Number of records: $N_R = 2^{20}$ (over 1 million records)
- Data blocking factor: $F_R = 10$ (400-byte records and 4 KB pages)
- Index blocking factor: $F_i = 256 = 2^8$ (16-byte search-key, pointer pairs in 4 KB pages)

Cost of search without an index (scan every data block): $N_R / F_R = 2^{20} / 10 \approx 104{,}858$ da (disk accesses), about 17.5 minutes

Cost of search using binary search of the ordered index: $1 + \log_2(N_R / F_i) = 1 + \log_2(2^{20} / 2^8) = 13$ da (130 msec)

(Calculations assume 10 msec per disk access.)
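The same arithmetic as a short Python sketch, assuming 10 msec per disk access as the slide does; the helper names are illustrative.

```python
from math import ceil, log2

N_R = 2 ** 20       # records in the table
F_R = 10            # data records per 4 KB block (400-byte records)
F_I = 2 ** 8        # index entries per 4 KB block (16-byte entries)
MS_PER_IO = 10      # assumed cost of one disk access, in msec


def scan_cost(n_r, f_r):
    """Disk accesses to read every data block (no index)."""
    return ceil(n_r / f_r)


def binary_index_cost(n_r, f_i):
    """Binary search over n_r / f_i index blocks, plus one data-block fetch."""
    return 1 + ceil(log2(n_r / f_i))


for name, ios in [("no index", scan_cost(N_R, F_R)),
                  ("binary search on index", binary_index_cost(N_R, F_I))]:
    print(f"{name}: {ios} disk accesses, {ios * MS_PER_IO / 1000:.2f} s")
# no index: 104858 disk accesses, 1048.58 s
# binary search on index: 13 disk accesses, 0.13 s
```

1048.58 s is about 17.5 minutes, against 0.13 s with the index.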


Tree-structured indices

- Each tree node is stored in a single disk block
- Each node packs in a large number of (key, pointer) pairs
- The number of children of a node is called the fan-out ($m$)
- Trees reduce search cost significantly

Best case search cost: $1 + \log_m(N_R / F_i)$



Tree index access performance

Example revisited:
- Number of records: $N_R = 2^{20}$ (over 1 million records)
- Record blocking factor: $F_R = 10$ (400-byte records and 4 KB pages)
- Index blocking factor: $F_i = 256 = 2^8$ (16-byte key, pointer pairs in 4 KB pages)
- Assume all nodes in the tree are half-full (fan-out $m = F_i / 2 = 128$)

Cost: $1 + \log_m(N_R / F_i) = 1 + \log_{128}(2^{20} / 2^8) = 1 + \log_{128}(2^{12}) \approx 2.7$, i.e. 3 da (30 msec) after rounding up
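Written out step by step, the tree-index cost uses the change of base $128 = 2^7$; the ceiling reflects that a fractional number of disk accesses has to be rounded up:

```latex
\begin{aligned}
\text{Cost} &= 1 + \log_m\!\left(\frac{N_R}{F_i}\right)
             = 1 + \log_{128}\!\left(\frac{2^{20}}{2^{8}}\right)
             = 1 + \log_{2^7}\!\left(2^{12}\right) \\
            &= 1 + \frac{12}{7} \approx 2.71
             \quad\Rightarrow\quad \lceil 2.71 \rceil = 3 \text{ disk accesses (30 msec)}
\end{aligned}
```

Compared with the 13 disk accesses of binary search on the ordered index, the higher base of the logarithm ($m = 128$ instead of 2) is what shortens the search.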


Not all trees are good trees

- Trees should be short
- Nodes should be relatively full
- The tree should be balanced



B trees

- General form of multi-level index
- Generalise binary search trees
- Balanced tree
  - All leaves are at the same depth
  - At the expense of some space overhead
- Efficient insert and delete


B+ tree

Popular variant of the B tree

B tree
- Data pointers may be stored in internal nodes
- Every value of the search field appears once at some level in the tree

B+ tree
- Data pointers stored only in leaf nodes
- Internal nodes contain only keys and tree pointers
  - Can pack more pointers in internal nodes
  - Improved search time due to fewer levels in the tree
  - No wasted space due to null tree pointers in leaf nodes
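A rough sketch of why keeping data pointers out of internal nodes raises the fan-out: it computes the largest internal-node order p that fits in one block. The sizes (4 KB block, 8-byte search key, 8-byte tree pointer, 8-byte data pointer) are illustrative assumptions chosen to match the 16-byte key/pointer pairs used earlier. A B tree internal node is modelled as p tree pointers plus p - 1 (key, data pointer) pairs; a B+ tree internal node as p tree pointers plus p - 1 keys.

```python
def max_internal_order(block_size, key_size, tree_ptr_size, data_ptr_size=0):
    """Largest p such that p tree pointers plus p - 1 search values fit in a block.

    Each search value occupies key_size + data_ptr_size bytes: a B tree internal
    node stores a data pointer next to every key, a B+ tree internal node does not.
    """
    value_size = key_size + data_ptr_size
    # p * tree_ptr_size + (p - 1) * value_size <= block_size
    #   =>  p <= (block_size + value_size) / (tree_ptr_size + value_size)
    return (block_size + value_size) // (tree_ptr_size + value_size)


BLOCK, KEY, TREE_PTR, DATA_PTR = 4096, 8, 8, 8   # illustrative sizes in bytes
b_tree = max_internal_order(BLOCK, KEY, TREE_PTR, DATA_PTR)
b_plus = max_internal_order(BLOCK, KEY, TREE_PTR)
print(f"B tree internal order: {b_tree}, B+ tree internal order: {b_plus}")
# B tree internal order: 171, B+ tree internal order: 256
```

Under these assumptions a B+ tree internal node holds about 1.5 times as many tree pointers, so the same number of keys needs fewer levels.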

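B+ tree

[Figure: example B+ tree. Root [100]; keys k < 100 go to internal node [30], keys k >= 100 go to internal node [120 150 180]. Leaves, left to right: [3 5 11] [30 35] [100 101 110] [120 130] [150 156 179] [180 200]; each leaf entry holds a data pointer.]

- Internal nodes of order p = 4, leaf nodes of order 3
  - i.e. internal and leaf nodes must have between 2 and 4 pointers
- Leaf nodes are chained using sequence pointers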


B+ tree internal nodes

Each internal node of a B+ tree is of the form

$\langle P_1, K_1, P_2, K_2, \ldots, P_{q-1}, K_{q-1}, P_q \rangle$

where $q \le p$; $p$ is called the order of the B+ tree (internal nodes and leaf nodes may have different orders).

- Each $P_i$ is a tree pointer: it points to another node in the B+ tree
- Within each internal node, $K_1 < K_2 < \cdots < K_{q-1}$
- For all search-key values $X$ in the subtree pointed to by $P_i$:
  - $X < K_1$ for $i = 1$
  - $K_{i-1} \le X < K_i$ for $1 < i < q$
  - $K_{q-1} \le X$ for $i = q$


B+ tree internal nodes (continued)

- Each internal node has at most $p$ pointers
- Each internal node has at least $\lceil p/2 \rceil$ pointers, except for the root
  - The root node has at least two pointers unless it is a leaf node
- An internal node with $q$ pointers has $q - 1$ search field values
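The internal-node rules above can be checked mechanically. This is a minimal sketch, assuming a node is given as a list of keys plus a list of child pointers (placeholders here); the function name `check_internal_node` is hypothetical.

```python
from math import ceil


def check_internal_node(keys, children, p, is_root=False):
    """Return the B+ tree internal-node rules that a node of order p breaks.

    keys: search values K1 .. K(q-1)
    children: tree pointers P1 .. Pq (any placeholder objects)
    An empty list means the node satisfies every rule.
    """
    q = len(children)
    problems = []
    if q > p:
        problems.append(f"{q} pointers exceeds the order p = {p}")
    min_ptrs = 2 if is_root else ceil(p / 2)
    if q < min_ptrs:
        problems.append(f"has {q} pointer(s), below the minimum of {min_ptrs}")
    if len(keys) != q - 1:
        problems.append(f"{q} pointers need {q - 1} keys, found {len(keys)}")
    if any(a >= b for a, b in zip(keys, keys[1:])):
        problems.append("keys are not strictly increasing")
    return problems


print(check_internal_node([120, 150, 180], ["P1", "P2", "P3", "P4"], p=4))  # []
print(check_internal_node([150, 120], ["P1", "P2", "P3"], p=4))
# ['keys are not strictly increasing']
```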


B+ trees internal node

[Figure: internal node [120 150 180] with four tree pointers, leading to keys K < 120, 120 <= K < 150, 150 <= K < 180 and K >= 180 respectively.]


B+ tree leaf nodes

Each leaf node is of the form

$\langle (K_1, Pr_1), (K_2, Pr_2), \ldots, (K_{q-1}, Pr_{q-1}), P_{next} \rangle$, where $q \le p$

- $P_{next}$ (the sequence pointer) points to the next leaf node of the B+ tree
- Each $Pr_i$ is a data pointer: it points to the record whose key value is $K_i$
  - or, if the search key is a nonkey field, to an indirect block of pointers to data records
- Within each leaf node, $K_1 < K_2 < \cdots < K_{q-1}$
- Each leaf node has at least $\lceil p/2 \rceil$ values
- All leaf nodes are at the same level
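The sequence pointer is what makes range searches cheap: once the leaf holding the low end of a range is found, the scan simply walks right. A minimal sketch, assuming a leaf is a small dataclass with parallel key and data-pointer lists; the names `Leaf`, `next_leaf` and `range_scan` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    """Leaf node: sorted keys, their data pointers, and the sequence pointer."""
    keys: list
    data_ptrs: list
    next_leaf: Optional["Leaf"] = None   # Pnext: the next leaf to the right


def range_scan(start_leaf, low, high):
    """Yield (key, data pointer) for low <= key <= high.

    start_leaf is the leaf where low would appear (found by a tree search);
    the scan then follows the sequence pointers to the right.
    """
    leaf = start_leaf
    while leaf is not None:
        for key, ptr in zip(leaf.keys, leaf.data_ptrs):
            if key > high:
                return                   # keys only grow from here on
            if key >= low:
                yield key, ptr
        leaf = leaf.next_leaf


# Three chained leaves from the example tree; "r150" etc. stand in for data pointers.
c = Leaf([150, 156, 179], ["r150", "r156", "r179"])
b = Leaf([120, 130], ["r120", "r130"], next_leaf=c)
a = Leaf([100, 101, 110], ["r100", "r101", "r110"], next_leaf=b)
print([k for k, _ in range_scan(a, 105, 156)])   # [110, 120, 130, 150, 156]
```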

B+ tree leaf node

[Figure: leaf node [3 5 11]. Each key has a data pointer to the record with that key (3, 5 and 11), and a final sequence pointer leads to the next leaf node.]


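B+ tree search

[Figure: the example B+ tree. Root [100]; keys k < 100 go to [30], keys k >= 100 go to [120 150 180]. Below [30], keys k < 30 go to leaf [3 5 11] and keys k >= 30 to leaf [30 35]; the remaining leaves are [100 101 110] [120 130] [150 156 179] [180 200], each entry holding a data pointer.]

- At each level, find the smallest key $K_i$ larger than the search key
- Follow the associated pointer $P_i$; if no such key is found, follow the last pointer in the node
- Repeat until a leaf node is reached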
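The search procedure above maps directly onto `bisect_right` from Python's standard library, which returns the position of the first key strictly greater than the search key. A minimal sketch, assuming nodes are simple dataclasses (`Internal`, `Leaf` and `search` are hypothetical names) and using the example tree with placeholder data pointers.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Internal:
    keys: list       # K1 < K2 < ... < K(q-1)
    children: list   # tree pointers P1 .. Pq


@dataclass
class Leaf:
    keys: list
    data_ptrs: list


def search(node, key):
    """Return the data pointer stored for key, or None if key is absent."""
    while isinstance(node, Internal):
        # bisect_right = 0-based position of the smallest Ki > key, which is
        # the child to follow; it is the last child when no Ki > key exists.
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.data_ptrs[i]
    return None


# The example tree from the slides; "r3", "r5", ... stand in for data pointers.
leaves = [Leaf(ks, [f"r{k}" for k in ks]) for ks in
          ([3, 5, 11], [30, 35], [100, 101, 110], [120, 130], [150, 156, 179], [180, 200])]
root = Internal([100], [Internal([30], leaves[:2]), Internal([120, 150, 180], leaves[2:])])
print(search(root, 156), search(root, 157))   # r156 None
```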

B+ tree built on a nonkey field

A B+ tree can be built on a search key that is not unique: many records may have the same search-key value.
- Leaf nodes usually contain record pointers
- If a search value matches more than one record, the leaf node entry stores a pointer to an indirect block
  - The block contains pointers to all records with that search key
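A minimal sketch of a lookup in one leaf of such an index, assuming an entry is either a single record pointer or a Python list standing in for the indirect block; the function name `lookup_all` and the placeholder pointers are hypothetical.

```python
from bisect import bisect_left


def lookup_all(leaf_keys, leaf_entries, key):
    """Return the record pointers for key in one leaf of a nonkey-field index.

    leaf_entries[i] is a single record pointer, or a list standing in for an
    indirect block when several records share the search value leaf_keys[i].
    """
    i = bisect_left(leaf_keys, key)
    if i == len(leaf_keys) or leaf_keys[i] != key:
        return []
    entry = leaf_entries[i]
    return list(entry) if isinstance(entry, list) else [entry]


# A leaf of an index on branch; "rec1" etc. stand in for record pointers.
keys = ["Aberdeen", "Bolton", "Croydon"]
entries = [["rec1", "rec2"], "rec3", "rec4"]
print(lookup_all(keys, entries, "Aberdeen"))   # ['rec1', 'rec2']
print(lookup_all(keys, entries, "Dublin"))     # []
```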


B+ tree operations: insert

Insert and delete are efficient but a bit complicated, because nodes may overflow or underflow.

Insert a data record with search key k (ignoring node overflow and underflow):
- Find the leaf node where k would appear
- If search key k is found:
  - Add the data record to the file; create an indirect block if there isn't one
  - Add the record pointer to the indirect block
- If search key k is not found:
  - Add the data record to the file
  - Insert (k, data pointer) in the leaf node such that all search keys in the leaf node remain in order
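Both branches of the leaf-level insert can be sketched in a few lines, assuming a leaf is a pair of parallel lists and a list entry plays the role of an indirect block; `leaf_insert` is a hypothetical name, and writing the record to the data file is outside the sketch.

```python
from bisect import bisect_left


def leaf_insert(keys, entries, key, record_ptr):
    """Add record_ptr for key to one leaf, ignoring overflow.

    keys/entries are parallel lists; a list entry plays the role of an
    indirect block. Writing the data record itself to the file is assumed
    to have happened already, so record_ptr points at the new record.
    """
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:        # search key k found
        if not isinstance(entries[i], list):    # create the indirect block
            entries[i] = [entries[i]]
        entries[i].append(record_ptr)           # add pointer to indirect block
    else:                                       # search key k not found
        keys.insert(i, key)                     # position i keeps keys in order
        entries.insert(i, record_ptr)


keys, entries = [30, 35], ["r30", "r35"]
leaf_insert(keys, entries, 42, "r42")    # new key
leaf_insert(keys, entries, 35, "r35b")   # duplicate key -> indirect block
print(keys, entries)   # [30, 35, 42] ['r30', ['r35', 'r35b'], 'r42']
```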



B+ tree operations: delete

Delete a data record with search key k (ignoring node overflow and underflow):
- Find the leaf node with search key k
- Find the data record pointer and delete the data record from the file
- If the entry has an indirect block, remove the record pointer from it
- Remove the (k, record pointer) entry from the leaf node if there is no indirect block associated with that entry, or if the indirect block becomes empty as a result of the deletion
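The same leaf model gives a sketch of the delete rules, again assuming parallel lists with a list entry standing in for an indirect block; `leaf_delete` is a hypothetical name and deleting the record from the data file is not shown.

```python
from bisect import bisect_left


def leaf_delete(keys, entries, key, record_ptr):
    """Remove record_ptr for key from one leaf, ignoring underflow.

    Returns True if the pointer was found and removed. The (key, pointer)
    entry is dropped when there is no indirect block, or when the indirect
    block becomes empty. Deleting the data record from the file is not shown.
    """
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return False
    entry = entries[i]
    if isinstance(entry, list):                 # indirect block
        if record_ptr not in entry:
            return False
        entry.remove(record_ptr)
        if entry:                               # other records still use this key
            return True
    elif entry != record_ptr:
        return False
    del keys[i], entries[i]                     # drop the (k, pointer) entry
    return True


keys, entries = [100, 101, 110], ["r100", ["r101a", "r101b"], "r110"]
leaf_delete(keys, entries, 110, "r110")    # plain entry: removed
leaf_delete(keys, entries, 101, "r101a")   # indirect block still non-empty
print(keys, entries)   # [100, 101] ['r100', ['r101b']]
```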


B+ insert

Four cases:
1. Simple case: there is space in the leaf node
2. Leaf node overflow
3. Internal node overflow
4. New root


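B+ tree insert case 1

[Figure: Insert 42 into a tree of order 3. Root [100]; keys k < 100 go to internal node [30], keys k >= 100 go to internal node [120]. Leaves: [... 11] [30 35 42] [100 101 110] [120 130], each entry holding a data pointer. Key 42 fits into the leaf [30 35], which had space.]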


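B+ tree insert case 2

[Figure: Insert 9 into the same tree of order 3: root [100]; internal nodes [30] (k < 100) and [120] (k >= 100); leaves [... 11] [30 35 42] [100 101 110] [120 130], each entry holding a data pointer. Key 9 belongs in the leftmost leaf (k < 30), which overflows.]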


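B+ tree insert case 2 (p2)

[Figure: Insert 9, continued: the overflowing leaf is split; labels shown include root [100], internal nodes [30] and [120], and leaf keys 11, 30 and 35.]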


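B+ tree insert case 3

[Figure: Insert 165 (order 3). Root [100]; internal nodes [30] and [120 150]; the leaves under [120 150] are [100 101 110] [120 130] [150 156 179]. Key 165 belongs in the full leaf [150 156 179].]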


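B+ tree insert case 3 (p2)

[Figure: Insert 165, continued: the leaf splits into [150 156] and [165 179], and 165 must be added to the parent [120 150], which is already full. Other leaves: [100 101 110] [120 130].]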


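B+ tree insert case 3 (p3)

[Figure: Insert 165, continued: the overflowing internal node is split into [120] and [165], and the middle key 150 moves up to the parent. Leaves: [100 101 110] [120 130] [150 156] [165 179].]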


B+ tree insert case 3 (p4)

[Figure: Insert 165, result: root [100 150]; internal nodes [30], [120] and [165]; leaves [100 101 110] [120 130] [150 156] [165 179].]


B+ tree insert case 4 (new root)

[Figure: Insert 170 (order 3). The root node [120 150] is full; its leaves are [100 101 110] [120 130] [150 156 179].]


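B+ tree insert case 4 (p2)

[Figure: Insert 170, continued: the leaf splits into [150 156] and [170 179], and 170 must be added to the full root [120 150]. Leaves: [100 101 110] [120 130] [150 156] [170 179].]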


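B+ tree insert case 4 (p3)

[Figure: Insert 170, continued: the overflowing root is split into [120] and [170], and the middle key 150 must move up, but there is no parent node. Leaves: [100 101 110] [120 130] [150 156] [170 179].]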


B+ tree insert case 4 (p4)

[Figure: Insert 170, result: a new root node [150] with children [120] and [170]; leaves [100 101 110] [120 130] [150 156] [170 179]. The tree grows by one level.]
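Pulling the four cases together, here is a compact Python sketch of B+ tree insertion. It is an illustration under stated assumptions, not the slides' own algorithm: it stores keys only (no data pointers or indirect blocks), `p` bounds the tree pointers in an internal node and `p_leaf` the keys in a leaf (both 3, as in the order-3 examples), and the class and method names are hypothetical. A full leaf splits into two halves and copies the right half's first key up (case 2); a full internal node splits around its middle key, which moves up (case 3); a split that reaches the root creates a new root (case 4). With these rules, [150 156 179] plus 170 becomes [150 156] and [170 179], and an overflowing [120 150 170] becomes [120] and [170] with 150 moving up, as in the examples.

```python
from bisect import bisect_right, insort


class Node:
    """A B+ tree node; leaves chain to their right neighbour via next_leaf."""

    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []         # sorted search keys
        self.children = []     # tree pointers (internal nodes only)
        self.next_leaf = None  # sequence pointer (leaves only)


class BPlusTree:
    """Insert-only B+ tree sketch storing keys only (data pointers omitted).

    p: maximum tree pointers in an internal node
    p_leaf: maximum keys in a leaf node
    """

    def __init__(self, p=3, p_leaf=3):
        self.p, self.p_leaf = p, p_leaf
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:                    # case 4: the root itself split
            separator, right = split
            new_root = Node(leaf=False)
            new_root.keys = [separator]
            new_root.children = [self.root, right]
            self.root = new_root                 # tree grows by one level

    def _insert(self, node, key):
        """Insert below node; return (separator, new right sibling) if node split."""
        if node.leaf:
            insort(node.keys, key)
            if len(node.keys) <= self.p_leaf:    # case 1: leaf had space
                return None
            mid = (len(node.keys) + 1) // 2      # case 2: leaf overflow
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.next_leaf, node.next_leaf = node.next_leaf, right
            return right.keys[0], right          # first key is copied up
        i = bisect_right(node.keys, key)         # child whose range holds key
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        separator, right_child = split
        node.keys.insert(i, separator)
        node.children.insert(i + 1, right_child)
        if len(node.children) <= self.p:
            return None
        mid = len(node.keys) // 2                # case 3: internal overflow
        up = node.keys[mid]                      # middle key moves up (not copied)
        right = Node(leaf=False)
        right.keys, right.children = node.keys[mid + 1:], node.children[mid + 1:]
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return up, right

    def levels(self):
        """Keys of every node, level by level, for inspection."""
        out, level = [], [self.root]
        while level:
            out.append([n.keys for n in level])
            level = [c for n in level if not n.leaf for c in n.children]
        return out


tree = BPlusTree(p=3, p_leaf=3)
for k in [100, 101, 110, 120, 130, 150, 156, 179, 170]:
    tree.insert(k)
for depth, nodes in enumerate(tree.levels()):
    print(depth, nodes)
# 0 [[130]]
# 1 [[110], [156]]
# 2 [[100, 101], [110, 120], [130, 150], [156, 170, 179]]
```

The demo sequence exercises simple inserts, leaf overflow and two root splits; the sequence pointers keep the leaves readable in key order at every step.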


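B+ tree delete

[Figure: Delete 110 from the tree built in insert case 4: root [150]; internal nodes [120] and [170]; leaves [100 101 110] [120 130] [150 156] [170 179]. Removing 110 leaves [100 101], which still has the minimum of two keys, so no underflow occurs.]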


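B+ tree delete

[Figure: Delete 130 from the same tree: root [150]; internal nodes [120] and [170]; leaves [100 101 110] [120 130] [150 156] [170 179]. Removing 130 leaves [120] with a single key, below the minimum of two for leaves of order 3, so the leaf underflows.]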


B+ tree operations: delete

Simple case
- Deletion does not cause underflow at the leaf

Underflow case: key redistribution
- Borrow one key from an adjacent node, or redistribute keys evenly between adjacent nodes
- Update the parent nodes
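For leaves, the redistribution step can be sketched as below, assuming each leaf is just a sorted list of keys and the two leaves share a parent; data pointers and internal-node redistribution (which rotates a key through the parent) are left out, and the function name `redistribute` is hypothetical. The example uses the Delete 130 situation: [120 130] loses 130, and its left sibling [100 101 110] under the same parent [120] has a key to spare.

```python
from math import ceil


def redistribute(left, right, min_keys):
    """Share keys evenly between adjacent sibling leaves (left precedes right).

    left and right are sorted key lists, changed in place. Returns the new
    separator key the parent must store between them (the first key of right),
    or None when the two leaves together hold too few keys for both to keep
    min_keys; the nodes must then be coalesced instead.
    """
    pooled = left + right
    if len(pooled) < 2 * min_keys:
        return None
    half = len(pooled) // 2
    left[:], right[:] = pooled[:half], pooled[half:]
    return right[0]


# Order-3 leaves must keep at least ceil(3/2) = 2 keys.
left, right = [100, 101, 110], [120]     # [120 130] after deleting 130
separator = redistribute(left, right, ceil(3 / 2))
print(left, right, separator)            # [100, 101] [110, 120] 110
```

With four keys between the two leaves, the even split amounts to borrowing one key (110), and the parent's separator changes from 120 to 110.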


B+ tree operations: delete

Underflow case: coalescing nodes
- Coalesce with a sibling node
- Update pointers at the parent node
- The parent node may underflow
  - Recursively apply the deletion procedure up the tree
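A minimal sketch of coalescing two sibling leaves and detecting whether the parent underflows in turn, assuming leaves are sorted key lists held in the parent's child list; sequence pointers and data pointers are not modelled, and `coalesce_leaves` is a hypothetical name. The example uses a hypothetical parent [120] whose leaves are [100 101] and an underflowing [120], so neither sibling can lend a key.

```python
from math import ceil


def coalesce_leaves(parent_keys, parent_children, i, p, parent_is_root=False):
    """Merge leaf child i+1 into leaf child i of a parent node.

    Leaves are sorted key lists; every key in child i+1 is >= those in child i.
    Removes the separator between them and the pointer to the emptied leaf,
    then returns True if the parent now underflows, i.e. the deletion
    procedure would have to be applied to the parent as well.
    """
    parent_children[i].extend(parent_children[i + 1])
    del parent_keys[i]                   # separator between the two leaves
    del parent_children[i + 1]           # pointer to the merged-away leaf
    min_ptrs = 2 if parent_is_root else ceil(p / 2)
    return len(parent_children) < min_ptrs


keys, children = [120], [[100, 101], [120]]
print(coalesce_leaves(keys, children, 0, p=3), keys, children)
# True [] [[100, 101, 120]]
```

Here the parent is left with a single pointer, below the minimum of two, which is exactly the situation where the procedure recurses up the tree.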


Index summary

B+ tree: a fast and efficient multi-level index
- Dynamic balanced data structure
- Efficient insert and delete, at the expense of some space overhead
- Supports equality and range searching

Dynamic hashing schemes
- Dynamic schemes that grow/shrink with the data file
- Support equality search, but no support for range queries

B+ trees are widely implemented in commercial DBMSs



Indices in SQL

Index optimisation is not a trivial task.
- Which field(s) should indices be created on?
- In principle, the DBMS should figure this out automatically, based on the cost of index maintenance and the benefit to the query workload
- Not quite automatic today, although some DBMSs provide tools to assist, e.g. an index wizard


Indices in SQL

Index creation via SQL DDL:

create index <index-name> on <table-name> (<attribute list>)

e.g. create index c-index on account (cust-city)

To declare that the search key is a candidate key:

create unique index <index-name> on <table-name> (<attribute list>)

- Index creation fails if the table contains duplicate values for the search key
- Once the index is created, insertions of duplicate values for that field are rejected
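A minimal sketch of these DDL statements using Python's built-in `sqlite3` module with an in-memory SQLite database (an assumption; the slides do not name a DBMS, and other systems report errors differently). Identifiers use underscores because hyphenated names such as `c-index` and `cust-city` are not valid unquoted SQL identifiers; the table layout and rows are illustrative, with values borrowed from the accounts example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, cust_city TEXT)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("1230001", "Aberdeen"), ("8220002", "Aberdeen"), ("5270008", "Bolton")],
)

# Ordinary index: duplicate search-key values are allowed.
conn.execute("CREATE INDEX c_index ON account (cust_city)")

# A unique index cannot be created while the column holds duplicates ...
try:
    conn.execute("CREATE UNIQUE INDEX city_unique ON account (cust_city)")
    unique_city_created = True
except sqlite3.IntegrityError as exc:
    unique_city_created = False
    print("create unique index failed:", exc)

# ... and once a unique index exists, duplicate insertions are rejected.
conn.execute("CREATE UNIQUE INDEX acct_unique ON account (account_number)")
try:
    conn.execute("INSERT INTO account VALUES ('1230001', 'Croydon')")
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("insert rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```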
