
B-Trees

Why B-Trees?

• Trees studied so far are for storing data in memory.
• B-Trees are better suited for storing data in memory AND on secondary storage.
• Better suited for balancing data than some other tree ADTs.
The Problem With Unbalanced Trees

The levels are sparsely filled, resulting in deep paths. This defeats the purpose of binary trees.

[Figure: unbalanced binary tree; surviving node labels: 1, 2, 5]
Possible Solutions To Unbalanced Trees
• Periodically balance the tree.
• Don’t let a tree get too unbalanced when inserting or deleting:
  - AVL Trees: sometimes called HB[1] trees. Invented by Adel’son-Vel’skii and Landis in the early 1960s.
  - B-Trees: proposed by R. Bayer & E. M. McCreight.



What Is A B-Tree?
• It is a type of “multiway” tree.
• It is NOT a binary search tree, nor is it a binary tree.
• It provides a fast way to index into a multi-level set of nodes.
• Each node in the B-Tree contains a sorted array of key values.
Motivation For Multiway Tree
• Secondary storage (e.g., disks) is typically divided into equal-sized blocks (e.g., 512, 1024, …, 4096, … bytes).
• The basic I/O operation reads and writes whole blocks, rather than single bytes, between secondary storage and memory.
• The goal is to devise a multiway search tree that minimizes file accesses by exploiting block-sized disk reads.
• Each access to secondary storage costs roughly as much as 250K instructions, depending on the speed of the CPU.
Multiway Search Tree (order m)

• A generalization of a binary search tree.
• Each node has at most m children.
• If k <= m is the number of children, then the node has exactly k-1 keys.
• The tree is ordered.

Multiway Search Tree (cont.)

[Figure: a node in a multiway tree holding the keys k1 k2 k3 k4 k5; each subtree covers one key range, e.g. keys < k1, k2 < keys < k3, k5 < keys]
Definition Of A B-Tree
• A B-Tree of order m is an m-way tree such that:
  - All leaves are on the same level.
  - All internal nodes except the root node are constrained to have at most m non-empty children and at least ⌈m/2⌉ non-empty children.
  - The root node has at most m non-empty children.
  - A leaf node must contain at least ⌈m/2⌉ - 1 keys.

Three Important Properties Of B-Trees
• All nodes in the B-Tree are at least half full (the root node is sometimes an exception).
• The B-Tree is always balanced. That is, an identical number of nodes must be read into memory to locate any key at a given level of the tree.
• A well-organized B-Tree has only a small number of levels relative to the number of nodes.
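
As a rough illustration of why the number of levels stays small, the sketch below counts how many entries fit in a tree whose nodes are all full. The function names and the figure of 200 entries per node are assumptions chosen for the example, not values from these slides, and a tree whose nodes are only half full would need more levels.

```python
def max_entries(levels: int, max_per_node: int) -> int:
    """Entries held by a B-Tree with `levels` levels when every node is full."""
    fanout = max_per_node + 1  # a full node has MAX entries and MAX + 1 subtrees
    nodes = sum(fanout ** depth for depth in range(levels))
    return nodes * max_per_node


def levels_needed(entries: int, max_per_node: int) -> int:
    """Smallest number of levels whose best-case capacity covers `entries`."""
    levels = 0
    while max_entries(levels, max_per_node) < entries:
        levels += 1
    return levels


# Best case only: one million entries at an assumed 200 entries per node.
print(levels_needed(1_000_000, 200))
```

Each level costs one block read, so the level count is what a lookup pays in disk accesses.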
Where Are B-Trees Used?
• B-Trees are commonly found in databases and file systems.
• B-Trees allow logarithmic-time insertions and deletions.
• They generally grow from the bottom upwards as elements are inserted, whereas most binary trees grow downward.
The Six Rules Governing B-Trees
• R1: A B-Tree may be empty. If it is not, every node holds at least some specified MINIMUM number of entries (the root may hold fewer).
• R2: The MAXIMUM number of entries in a node is twice the MINIMUM.
The Six Rules Governing B-Trees (cont)
• R3: The entries of each B-Tree node are stored in a partially filled array, sorted from the smallest entry (at index 0) to the largest entry (at the final used position of the array).

[Figure: a B-Tree node drawn as an array indexed 0 to n-1, holding the entries h, k, k*, n, ...]

The data in such an array can be stored in a block on a disk.
* B-Trees can support duplicate keys.
The Six Rules Governing B-Trees (cont)
• R4: The number of subtrees below a non-leaf node is always one more than the number of entries in the node.

Example: a non-leaf node with 4 entries (indices 0 to 3) and 5 subtrees.

  entries:    45   55   67   82

  subtree 0:  keys < 45
  subtree 1:  keys > 45 & < 55
  subtree 2:  keys > 55 & < 67
  subtree 3:  keys > 67 & < 82
  subtree 4:  keys > 82
The Six Rules Governing B-Trees (cont)
• R5: For any non-leaf node:
  - An entry at index i is greater than all the entries in subtree i of the node, and
  - An entry at index i is less than all the entries in subtree i+1 of the node.
• R6: Every leaf node in a B-Tree has the same depth (i.e., all leaves are at the same level).
Example B-Tree
MIN = 1, MAX = 2

                    [30 80]
     [20]           [50 60]            [90]
  [10] [25]   [35 40] [55] [72]   [82 85] [95]
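
The six rules can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `Node` class with a `keys` list and a `children` list (neither name comes from the slides) and assuming distinct keys. The tree built at the end is one reading of the example above, with 30 and 80 in the root.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list for a leaf


def check(node, minimum, maximum, is_root=True, lo=None, hi=None):
    """Return the depth of the leaves below `node`; raise AssertionError if a rule fails."""
    n = len(node.keys)
    assert maximum == 2 * minimum                     # R2
    assert n <= maximum                               # R2
    assert is_root or n >= minimum                    # R1 (root may hold fewer)
    assert node.keys == sorted(node.keys)             # R3
    assert all((lo is None or lo < k) and (hi is None or k < hi)
               for k in node.keys)                    # R5, enforced from the parent
    if not node.children:
        return 0
    assert len(node.children) == n + 1                # R4
    bounds = [lo] + node.keys + [hi]
    depths = {check(child, minimum, maximum, False, bounds[i], bounds[i + 1])
              for i, child in enumerate(node.children)}
    assert len(depths) == 1                           # R6
    return depths.pop() + 1


example = Node([30, 80], [
    Node([20], [Node([10]), Node([25])]),
    Node([50, 60], [Node([35, 40]), Node([55]), Node([72])]),
    Node([90], [Node([82, 85]), Node([95])]),
])
print(check(example, minimum=1, maximum=2))
```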
Searching For A Target In B-Trees
• Start with the root node and search for the target in the array at that node. If found, then done; return success.
• If the target is not in the root and there are no children, then also done, but return failure.
• If the target is not in the root node and there are children, then the target, if it exists, can only be in one subtree.
• Compare the target with the node's keys, scanning key_array from left to right up to data_count, and descend into the first subtree i for which target < key_array[i]. If there is no such i, descend into the last subtree (index data_count).
• Repeat the process with that subtree's root as the new root node.
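
The search steps above can be sketched as a short loop. This is an illustrative version, assuming a hypothetical `Node` class whose `keys` list plays the role of key_array and whose length plays the role of data_count. It scans linearly, as the slide describes; a binary search within the node would also work.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list for a leaf


def search(node, target):
    """Return True if `target` is stored in the B-Tree rooted at `node`."""
    while True:
        # Scan left to right for the first key that is not smaller than the target.
        i = 0
        while i < len(node.keys) and target > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == target:
            return True          # found in this node
        if not node.children:
            return False         # leaf reached without a match
        node = node.children[i]  # the only subtree that can hold the target


tree = Node([30, 80], [
    Node([20], [Node([10]), Node([25])]),
    Node([50, 60], [Node([35, 40]), Node([55]), Node([72])]),
    Node([90], [Node([82, 85]), Node([95])]),
])
print(search(tree, 72), search(tree, 73))
```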
Inserting Into A B-Tree

• Add the new key to the appropriate leaf node.
• Overflow?
  - No: done.
  - Yes: split the node into two nodes on the same level, and promote the median key.
Example
MIN = 1, MAX = 2

Before:
          [6 17]
   [4]    [12]    [19 22]

Insert 18:
          [6 17]
   [4]    [12]    [18 19 22]   <- excess entry (problem child)
Example (contd.)
MIN = 1, MAX = 2

Split the problem child, and promote the middle key (19) to the parent node. The parent still has an excess entry:

          [6 17 19]
   [4]   [12]   [18]   [22]

Fix the excess by repeating the process. Split the node and promote the middle key (17) to a new root node:

               [17]
        [6]            [19]
     [4]  [12]      [18]  [22]
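
The split-and-promote procedure from this example can be sketched recursively. This is one possible implementation, not necessarily the one the course uses: the `Node` class and the `insert` and `_insert` names are assumptions, and duplicate keys are simply stored rather than rejected.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list for a leaf


def _insert(node, key, maximum):
    """Insert below `node`; return (median, new_right_node) if `node` had to split."""
    i = bisect.bisect_left(node.keys, key)
    if not node.children:
        node.keys.insert(i, key)                # leaf: add the key in sorted position
    else:
        split = _insert(node.children[i], key, maximum)
        if split:                               # child split: absorb its median
            median, right = split
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) <= maximum:
        return None                             # no overflow
    mid = len(node.keys) // 2                   # overflow: split around the median
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right


def insert(root, key, maximum):
    """Insert `key` and return the (possibly new) root."""
    split = _insert(root, key, maximum)
    if split:                                   # root split: the tree grows upward
        median, right = split
        return Node([median], [root, right])
    return root


root = Node([6, 17], [Node([4]), Node([12]), Node([19, 22])])
root = insert(root, 18, maximum=2)
print(root.keys, [child.keys for child in root.children])
```

Because the only place a level is added is at the root, all leaves stay at the same depth.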
Insert In Class Exercise
MIN = 1, MAX = 2
• Insert 5, then insert 7 and 15.

          [6 17]
   [4]    [12]    [19 22]
Deleting From A B-Tree
Deletion (cont.)
• Case 1: The key is in a leaf that has more than the minimum number of keys. If subset[i] has extra entries, then just delete the key.
• Delete 21 (MIN = 2, MAX = 4)

Before:
            [6 17]
   [2 4]   [10 13]   [19 21 22]

After:
            [6 17]
   [2 4]   [10 13]   [19 22]


Deletion
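• Case 2: The key is in a leaf that has just the minimum number of keys. If the left sibling subset[i-1] has extra entries, then transfer an entry to subset[i]: the sibling's largest entry moves up into the parent, and the parent's separating entry moves down into subset[i].
• Delete 22 (MIN = 2, MAX = 4)

Before:
            [6 17]
   [2 4]   [10 12 15]   [19 22]

After:
            [6 15]
   [2 4]   [10 12]   [17 19]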


Deletion (cont.)
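• Case 3: If the right sibling subset[i+1] has extra entries, then transfer an entry to subset[i] (similar to Case 2): the sibling's smallest entry moves up into the parent, and the parent's separating entry moves down into subset[i].
• Delete 13 (MIN = 2, MAX = 4)

Before:
            [6 17]
   [2 4]   [10 13]   [19 21 22]

After:
            [6 19]
   [2 4]   [10 17]   [21 22]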
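
The transfers in Cases 2 and 3 both rotate one entry through the parent. A minimal sketch follows, assuming a hypothetical `Node` class and helper names that do not come from the slides. Each helper is meant to be called after the key has already been removed from the short leaf.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list for a leaf


def borrow_from_left(parent, i):
    """Case 2: move one entry from children[i-1] through the parent into children[i]."""
    left, child = parent.children[i - 1], parent.children[i]
    child.keys.insert(0, parent.keys[i - 1])     # separator moves down
    parent.keys[i - 1] = left.keys.pop()         # sibling's largest moves up
    if left.children:                            # non-leaf: move the subtree too
        child.children.insert(0, left.children.pop())


def borrow_from_right(parent, i):
    """Case 3: move one entry from children[i+1] through the parent into children[i]."""
    child, right = parent.children[i], parent.children[i + 1]
    child.keys.append(parent.keys[i])            # separator moves down
    parent.keys[i] = right.keys.pop(0)           # sibling's smallest moves up
    if right.children:
        child.children.append(right.children.pop(0))


# Case 2 example: 22 has already been removed from the last leaf.
case2 = Node([6, 17], [Node([2, 4]), Node([10, 12, 15]), Node([19])])
borrow_from_left(case2, 2)

# Case 3 example: 13 has already been removed from the middle leaf.
case3 = Node([6, 17], [Node([2, 4]), Node([10]), Node([19, 21, 22])])
borrow_from_right(case3, 1)
print(case2.keys, case3.keys)
```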


Deletion (cont.)
• Case 4: The key is in a leaf, and the leaf and its siblings have just the minimum number of keys. Combine subset[i] with subset[i-1], pulling the parent's separating entry down into the combined node.
• Delete 22

Before:
            [6 17]
   [2 4]   [10 12]   [19 22]

After:
            [6]
   [2 4]   [10 12 17 19]
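
The merge in Case 4 can be sketched in a few lines. The `Node` class and the `merge_with_left` name are assumptions for illustration. The parent loses one entry in the process, so a fuller implementation would then check whether the parent itself has fallen below the minimum.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list for a leaf


def merge_with_left(parent, i):
    """Case 4: fold children[i] and the parent's separator into children[i-1]."""
    left, child = parent.children[i - 1], parent.children[i]
    left.keys.append(parent.keys.pop(i - 1))  # separator moves down
    left.keys.extend(child.keys)              # then the short node's entries
    left.children.extend(child.children)      # and its subtrees, if it is not a leaf
    parent.children.pop(i)                    # the short node disappears


# Slide example: 22 has already been removed from the last leaf.
parent = Node([6, 17], [Node([2, 4]), Node([10, 12]), Node([19])])
merge_with_left(parent, 2)
print(parent.keys, [child.keys for child in parent.children])
```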


Deletion (cont.)
• Case 5: The key is in an internal node. Locate the child (leaf) node that holds the key's successor. If that node has more than the minimum number of entries, replace the key to be deleted with the successor and delete the successor from the leaf.

Delete 95

Before:
                          [62]
         [25 45]                         [85 95]
  [15 20] [30 40] [50 54]      [75 80] [90 92] [97 100 150]

Case 5 (contd.)

After:
                          [62]
         [25 45]                         [85 97]
  [15 20] [30 40] [50 54]      [75 80] [90 92] [100 150]
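
The successor replacement in Case 5 can be sketched as follows, again assuming a hypothetical `Node` class and function name. The sketch does not check that the leaf has spare entries; when it does not, the leaf would need one of the repairs from Cases 2 to 4 afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list for a leaf


def replace_with_successor(node, i):
    """Overwrite node.keys[i] with its in-order successor; return the leaf it came from."""
    leaf = node.children[i + 1]       # the successor lives in the subtree to the right
    while leaf.children:
        leaf = leaf.children[0]       # ... at the far left of that subtree
    node.keys[i] = leaf.keys.pop(0)   # copy the successor up and remove it from the leaf
    return leaf


# The right-hand internal node from the slide: delete 95 (index 1).
internal = Node([85, 95], [Node([75, 80]), Node([90, 92]), Node([97, 100, 150])])
leaf = replace_with_successor(internal, 1)
print(internal.keys, leaf.keys)
```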
