B Trees

B+ Tree
What is a B+ Tree
A B+ tree is a balanced tree in which every path from the root of the tree to a leaf is of the same length.
Each non-leaf node of the tree has between ⌈n/2⌉ and n children, where n is fixed for a particular tree.
It contains index pages and data pages.
The internal nodes store key values that serve as
place markers to guide the search for a record in a leaf
node.
Leaf nodes can store entire records. When a B+-tree is
used as an index file they store keys and pointers to
records.
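The two structural rules above can be checked mechanically. The following is a minimal sketch, assuming a made-up in-memory representation (a leaf is a plain list of keys, an internal node is a dict with "keys" and "children"). The name check_bplus and the sample tree are hypothetical, and the exemption that lets the root have as few as two children is an assumption that the slide does not state.

```python
import math

def check_bplus(node, n, is_root=True):
    """Return the height below `node`; raise ValueError if a rule is broken."""
    if isinstance(node, list):              # a leaf is just a list of keys
        return 0
    children = node["children"]
    # Assumption: the root is allowed to have as few as 2 children.
    low = 2 if is_root else math.ceil(n / 2)
    if not low <= len(children) <= n:
        raise ValueError(f"node {node['keys']} has {len(children)} children")
    # Every child must report the same height, i.e. all leaves share one depth.
    heights = {check_bplus(child, n, is_root=False) for child in children}
    if len(heights) != 1:
        raise ValueError("leaves are not all at the same depth")
    return heights.pop() + 1

# Hypothetical sample tree with n = 3.
tree = {
    "keys": [9],
    "children": [
        {"keys": [5], "children": [[1, 3], [5, 6]]},
        {"keys": [16, 30], "children": [[9, 12], [16, 17], [30, 40]]},
    ],
}
print(check_bplus(tree, n=3))  # 2
```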
Contd
In practice, plain B-trees are rarely used. Instead, a variant called the B+-tree is commonly implemented.
The main difference from B-trees is that records are stored only in the leaf nodes of a B+-tree.
Depending on the size of a record compared to a key, the leaf nodes may store more or fewer than m records, where m is the order of the tree (the n above).
The only requirement is that the leaf nodes store enough records to remain at least half full.
Example B+-tree
[Figure: example B+-tree. Index nodes: 9; 5; 16 30. Leaf/data nodes: 1 3; 5 6; 16 17; 30 40.]
B+ Trees
Dictionary pairs are in leaves only.

Leaves form a doubly or singly linked list.
Remaining nodes have the following structure:
j a_0 k_1 a_1 k_2 a_2 ... k_j a_j
j = number of keys in the node.
a_i is a pointer to a subtree.
k_i <= smallest key in subtree a_i and > largest key in subtree a_(i-1).
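Given the key ordering rule above, picking the subtree to descend into amounts to counting how many keys in the node are <= the search key. A minimal sketch, assuming the keys of one node are held in a sorted Python list (the function name child_index is hypothetical):

```python
from bisect import bisect_right

def child_index(keys, x):
    """Return i such that subtree a_i is the only one that may contain x.

    bisect_right counts the keys that are <= x, which is exactly i
    because k_i <= every key in a_i and k_(i+1) > every key in a_i.
    """
    return bisect_right(keys, x)

keys = [16, 30]                 # a node with j = 2 keys and 3 subtrees
print(child_index(keys, 9))     # 0
print(child_index(keys, 16))    # 1
print(child_index(keys, 40))    # 2
```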
B+ Tree Example with 4 keys
B+ Tree - Example
Question: Is this a valid B+ Tree?

Answer
Both trees, in slide 5 and slide 6, are valid.
How you store data in a B+ tree depends on how the algorithm is implemented.
As long as the amount of data in each leaf is balanced, it doesn't matter how many items you store in the leaves.
For example, in the previous question n can be 3 or 4, but it cannot be 5 or more.
Benefits of B+ Tree
Every data structure has advantages over others for particular problems.
The two main benefits of the B+ tree are:
By definition, it is easy to keep balanced.
For example:
Do you have to check your B+ tree's balance after you edit it?
No. The insertion and deletion procedures keep every B+ tree balanced, which makes it easy to manipulate the data.
Cont
Searching a B+ tree takes far fewer block reads than searching most other kinds of trees.
Range queries are also faster.
For example:
To search for one key among one million key-values, a balanced binary tree requires about 20 block reads; in contrast, a B+ tree requires only about 4.
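The figures in this example follow from counting levels. The sketch below is illustrative only: it assumes one block read per level and a B+ tree fan-out of 32, a value chosen here because it yields four levels; the slide does not state the fan-out it had in mind.

```python
import math

def binary_tree_levels(num_keys):
    """Levels of a perfectly balanced binary tree holding num_keys keys."""
    return math.ceil(math.log2(num_keys))

def bplus_levels(num_keys, fanout):
    """Smallest number of levels whose fan-out product reaches num_keys."""
    levels, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        levels += 1
    return levels

print(binary_tree_levels(1_000_000))        # 20
print(bplus_levels(1_000_000, fanout=32))   # 4  (fan-out 32 is an assumption)
```

A larger fan-out reduces the count further; with a fan-out of 100 the same function gives 3 levels.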
B+-tree overview
[Figure: a B-tree structure of guide keys on top of a linked-list structure of records.]
Properties
The leaf nodes of a B+-tree form a linked list of nodes.
The interior nodes of the B+-tree are treated as a B-tree structure that is searched to locate a certain key range.
Searching
All the keys in the left child of a key are less than that key, and all the keys in the right child are greater than or equal to that key.
Search the same way as for B-trees. The internal node keys guide the search to the record in a leaf node.
The leaf nodes in a B+-tree are linked together in a linked-list structure to facilitate range queries, as discussed earlier.
The key at the start of the range is searched for first. Subsequently, the leaf node linked list is traversed until the key at the end of the range is found.
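The point search and the range query described above can be sketched as follows. This is a minimal in-memory illustration, not a disk-based implementation; the class names Leaf and Index, the function names, and the hand-built sample tree are all hypothetical.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys        # sorted keys (records omitted for brevity)
        self.next = None        # link to the next leaf in key order

class Index:
    def __init__(self, keys, children):
        self.keys = keys        # guide keys
        self.children = children

def find_leaf(node, key):
    """Follow the guide keys down to the leaf that may hold `key`."""
    while isinstance(node, Index):
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(root, key):
    return key in find_leaf(root, key).keys

def range_query(root, lo, hi):
    """Find the leaf for `lo`, then walk the leaf list until a key exceeds `hi`."""
    leaf, found = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return found
            if k >= lo:
                found.append(k)
        leaf = leaf.next
    return found

# Hand-built sample tree (hypothetical data).
leaves = [Leaf([1, 3]), Leaf([5, 6]), Leaf([9, 12]), Leaf([16, 17]), Leaf([30, 40])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Index([9], [Index([5], leaves[0:2]), Index([16, 30], leaves[2:5])])

print(search(root, 17))           # True
print(search(root, 45))           # False
print(range_query(root, 4, 17))   # [5, 6, 9, 12, 16, 17]
```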
Searching
Since a search does not change the structure of a B+ tree, just compare the key value with the data in the tree and return the result.
For example: find the values 45 and 15 in the tree below.
For 45, the result is not found; for 15, return the position the pointer refers to.
B+ Tree Insertion
Insert at the bottom level.
If the leaf page can accommodate the key, simply insert it.
When the node receiving the insert is overfull, check the adjacent sibling.
If the leaf page overflows, split the page and copy the middle element to the next index page.
When a leaf node is split during insertion, unlike the procedure for B-trees, a copy of the middle key is promoted. The middle key itself is also kept in a leaf node, since the leaves contain the actual records that are stored in the tree.
The insertion of the promoted key into the parent is performed in the same way as insertion into a B-tree.
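A leaf split with the "copy up" behaviour can be sketched in a few lines. This is a minimal illustration on bare key lists; the function name split_leaf is hypothetical, and sending the larger half to the right is a choice made here to match the rule that keys >= the middle key go to the right page.

```python
def split_leaf(keys):
    """Split an overfull, sorted leaf; return (left, right, promoted_key).

    The promoted key is a copy: it stays in the right leaf as well,
    because leaves hold the actual records.
    """
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]

print(split_leaf([50, 55, 60, 65, 70]))   # ([50, 55], [60, 65, 70], 60)
print(split_leaf([1, 2, 3]))              # ([1], [2, 3], 2)
```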
Insertion into a B+-tree
Insertions into B+-trees are done the same way as into B-trees.
[Figure: three snapshots of a B+-tree. Initial tree: leaf keys 10, 12, 23, 33, 48 under the index key 33. After inserting 50. After several further insertions: leaf keys 10, 12, 15, 18, 20, 21, 23, 31, 33, 45, 47, 48, 50, 52.]
Insertion Example #1
Since inserting a value into a B+ tree may unbalance the tree, rearrange the tree if needed.
Insert 28 into the tree below.
[Figure: the leaf page containing 25 28 30.]
Insertion
Result:
Insertion
Insert 70 into the tree below.
Insertion
Process: split the leaf page.
[Figure: the leaf page 50 55 60 65 70 overflows and is split into 50 55 and 60 65 70.]
Insertion
Result: choose the middle key, 60, and place it in the index page between 50 and 75.
Summary of insertion in B+ Tree

Data page full: NO. Index page full: NO.
Action: Place the record in sorted position in the appropriate leaf page.

Data page full: YES. Index page full: NO.
Action:
1. Split the leaf page.
2. Place the middle key in the index page in sorted order.
3. The left leaf page contains records with keys below the middle key.
4. The right leaf page contains records with keys equal to or greater than the middle key.

Data page full: YES. Index page full: YES.
Action:
1. Split the leaf page.
2. Records with keys < middle key go to the left leaf page.
3. Records with keys >= middle key go to the right leaf page.
4. Split the index page.
5. Keys < middle key go to the left index page.
6. Keys > middle key go to the right index page.
7. The middle key goes to the next (higher level) index.
If the next level index page is full, continue splitting the index pages.
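The index-page split in the last case differs from a leaf split: the middle key moves up instead of being copied. A minimal sketch on bare lists, using the overfull index page 25 50 60 75 85 from the slides as sample data (the function name split_index and the child labels are hypothetical):

```python
def split_index(keys, children):
    """Split an overfull index page; return (left, middle_key, right).

    Each side is a (keys, children) pair. The middle key is removed from
    both halves and handed to the next (higher level) index.
    """
    mid = len(keys) // 2
    middle_key = keys[mid]
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, middle_key, right

keys = [25, 50, 60, 75, 85]
children = ["c0", "c1", "c2", "c3", "c4", "c5"]   # placeholder child pointers
left, up, right = split_index(keys, children)
print(left)    # ([25, 50], ['c0', 'c1', 'c2'])
print(up)      # 60
print(right)   # ([75, 85], ['c3', 'c4', 'c5'])
```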
Insertion
Insert the key value 95 into the tree below.
[Figure: the leaf page 75 80 85 90 95 overflows, and the index page 25 50 60 75 85 overflows as well; the figure shows it divided into 25 50 and 60 75 85.]
Insertion
Insert Example #2
[Figure: nodes shown: 9; 5; 1 2; 16 30; 5 6; 16 17; 30 40.]
Insert a pair with key = 3.
New pair goes into a 3-node.
Insert Into A 3-node
Insert the new pair so that the keys are in ascending order: 1 2 3.
Split into two nodes: 1 and 2 3.
Insert the smallest key in the new node (2) and a pointer to this new node into the parent.
Insert
[Figure: nodes shown: 9; 5; 16 30; 2 3; 5 6; 16 17; 30 40.]
Insert an index entry 2 plus a pointer into parent.
Insert
[Figure: nodes shown: 9; 17; 2 5; 2 3; 16 30; 5 6; 16; 17 18; 30 40.]
Now, insert a pair with key = 18.
Insert an index entry 17 plus a pointer into parent.
Insert
[Figure: nodes shown: 17; 2 5; 2 3; 16; 5 6; 30; 16; 17 18; 30 40.]
Insert an index entry 17 plus a pointer into parent.
Insert
[Figure: nodes shown: 9 17; 2 5; 2 3; 16; 5 6; 30; 16; 17 18; 30 40.]
B+ Tree Deletion
Delete the key and its data from the leaf page.
If the leaf page underflows, merge it with a sibling and delete the key in between them.
If the index page underflows, merge it with a sibling and move down the key in between them.
Deletion Example #1
As with insertion, the tree has to be rebuilt if the result of a deletion violates the rules of a B+ tree.
Delete 70 from the tree.
[Figure: the leaf page is left with 60 65. This is OK.]
Deletion
Result:
Deletion
Delete 25 from the tree below; note that 25 also appears in the index page.
[Figure: the leaf page is left with 28 30. This is OK.]
Deletion
Result: replace 25 with 28 in the index page.
Deletion
Delete 60 from the tree below.
[Figure: the leaf page is left with only 65, an underflow; the figure also shows a leaf page 50 55.]
Deletion
Result: delete 60 from the index page and combine the rest of the index pages.
Delete algorithm for B+ trees

Data page below fill factor: NO. Index page below fill factor: NO.
Action: Delete the record from the leaf page. Arrange the keys in ascending order to fill the void. If the key of the deleted record appears in the index page, use the next key to replace it.

Data page below fill factor: YES. Index page below fill factor: NO.
Action: Combine the leaf page and its sibling. Change the index page to reflect the change.

Data page below fill factor: YES. Index page below fill factor: YES.
Action:
1. Combine the leaf page and its sibling.
2. Adjust the index page to reflect the change.
3. Combine the index page with its sibling.
Continue combining index pages until you reach a page with the correct fill factor or you reach the root page.
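Repairing an underfull leaf can be sketched on two adjacent leaves. This is a minimal illustration that combines the two strategies appearing in these slides: borrow one key from a sibling that can spare it and update the parent key, otherwise merge the two leaves so that the in-between key in the parent can be deleted. The function name rebalance_leaves and the min_keys parameter are hypothetical.

```python
def rebalance_leaves(left, right, min_keys):
    """Repair an underfull leaf among two adjacent, sorted leaves.

    Returns (left, right, separator). After a merge, `right` and
    `separator` are None: the parent must drop the in-between key.
    """
    if len(left) < min_keys and len(right) > min_keys:
        # Borrow the smallest key of the right sibling.
        left, right = left + [right[0]], right[1:]
    elif len(right) < min_keys and len(left) > min_keys:
        # Borrow the largest key of the left sibling.
        left, right = left[:-1], [left[-1]] + right
    elif len(left) < min_keys or len(right) < min_keys:
        # The sibling cannot spare a key: merge the two leaves.
        return left + right, None, None
    # The parent key for the right leaf is its smallest key.
    return left, right, right[0]

print(rebalance_leaves([], [2, 3], min_keys=1))       # ([2], [3], 3)
print(rebalance_leaves([50, 55], [65], min_keys=2))   # ([50, 55, 65], None, None)
```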
Delete Example #2
[Figure: nodes shown: 9; 2 5; 2 3; 16 30; 5 6; 16 17; 30 40.]
Delete pair with key = 16.
Note: the pair to delete is always in a leaf.
Delete
[Figure: nodes shown: 9; 2 5; 2 3; 16 30; 5 6; 17; 30 40.]
Note: the pair to delete is always in a leaf.
Delete
[Figure: nodes shown: 9; 2 5; 2 3; 16 30; 5 6; 17; 30 40.]
Get >= 1 from adjacent sibling and update parent key.
Delete
[Figure: nodes shown: 9; 3 5; 16 30; 5 6; 17; 30 40.]
Get >= 1 from sibling and update parent key.
Delete
[Figure: nodes shown: 9; 3 5; 16 30; 5 6; 17; 30 40.]
Merge with sibling, delete in-between key in parent.
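Delete
[Figure: nodes shown: 9; 16 30; 5 6; 17; 30 40.]
Get >= 1 from sibling and update parent key.
Delete
[Figure: nodes shown: 9; 16 30; 6; 17; 30 40.]
Delete
[Figure: nodes shown: 9; 30; 17; 30 40.]
Delete
[Figure: nodes shown: 9; 16 30; 6; 17; 30 40.]
Delete
[Figure: nodes shown: 9; 16 30; 5; 17; 30 40.]
Index node becomes deficient.
Get >= 1 from sibling, move last one to parent, get parent key.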
Delete
[Figure: nodes shown: 16; 30; 17; 30 40.]
Delete 9.
Delete
[Figure: nodes shown: 16; 30; 5; 17; 30 40.]
Merge with sibling and in-between key in parent.
Delete
[Figure: nodes shown: 16 30; 17; 30 40.]
It's the root; discard.
Primary vs. secondary storage

Computer storage is classified into primary (main) memory and secondary (peripheral) storage.
Primary storage is usually RAM; ~10 ns access time.
Secondary storage is usually hard disk drives; ~10 ms access time.
Secondary storage is a lot cheaper than primary storage: 3/Mb vs. $1/Mb.
Secondary storage is persistent.
Databases are stored in secondary storage, and large databases are manipulated in secondary storage.
We need to minimise disk accesses when accessing and manipulating databases.
Typical Hard Disk Drive Layout
Organization of data on disk drives

A file on disk logically appears as a contiguous series of bytes.
Physically, the file is usually fragmented across various locations on the disk.
A collection of tracks (5 to 10) forms a cylinder.
Tracks are split into sectors (10 to 30), which are usually the smallest unit of data that can be read from or written to a disk.
A filesystem breaks a file into blocks, which consist of one or more sectors. A block (512 to 4096 bytes) is the smallest unit of allocation for a file.
Disk drive access times

There are 3 stages in reading/writing to a disk drive:
Seek time - head movement to the correct track;
~6ms.
Rotational latency - wait for the desired sector to
arrive under the head; ~4ms.
Data transfer - relatively small.
Therefore, reading from physically adjacent blocks
(called an extent) is relatively cheap.
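A rough calculation shows why an extent is cheap. The sketch below uses the ~6 ms seek and ~4 ms rotational latency quoted above; the 0.1 ms transfer time per block is an assumed figure chosen only to stand for "relatively small", and the function names are hypothetical. Times are kept in whole microseconds to avoid rounding noise.

```python
SEEK_US = 6_000        # ~6 ms seek time (from the slide)
ROTATION_US = 4_000    # ~4 ms rotational latency (from the slide)
TRANSFER_US = 100      # assumed: 0.1 ms to transfer one block

def scattered_read_us(num_blocks):
    """Every block pays seek + rotational latency + transfer."""
    return num_blocks * (SEEK_US + ROTATION_US + TRANSFER_US)

def extent_read_us(num_blocks):
    """One seek and one rotational wait, then the blocks stream past the head."""
    return SEEK_US + ROTATION_US + num_blocks * TRANSFER_US

print(scattered_read_us(100))   # 1010000 microseconds, about 1 second
print(extent_read_us(100))      # 20000 microseconds, about 20 ms
```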
Index files
What file structures can we use to organise a large collection
of records stored in secondary storage?
Hash tables provide outstanding performance for searches of the form "find a record with the key value k".
They are not suited for searches such as "find all records whose key value has the same property or lies within a given range".
There may be several keys for a record:
The primary key is a unique identifier for a record, e.g. student number.
A secondary key is some other record field which does not necessarily have unique values for each record, e.g. date of birth or G.P.A.
Most searches are done using secondary keys, e.g. find all students with a G.P.A. greater than 3.6.
Index files facilitate such search requirements.
[Figure: a primary key index, a secondary key index, and the main database; column labels shown: Student #, Rec #, GPA, Student #, Name, Addr, ...]
There is one primary index file and several secondary index files in
a database.
Sorted linear lists may be used for index files, but they are
unsuitable for databases with frequent updates. [Why?]
Tree indexing solves this problem.
B+-trees are the standard file organisation for applications requiring insertion, deletion and key range searches.
B+-trees are a variant of B-trees (described next).
Performance analysis (cont...)

A three-level B+-tree must have at least 1984 records (2 second-level nodes with 32 children, each containing 31 records), and can have up to 246,078 records (63 second-level nodes with 63 full children each).
A B+-tree of height 3 would be sufficient to store all the student records at IIT!
A search for a student record would require at most four disk accesses: three accesses to find the record in the index file, and one access to read the record from the main database.
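The record counts above can be reproduced with a short calculation. In this sketch the limits of 32 to 63 children per index node and 31 records per minimally filled leaf come from the text; the capacity of 62 records per full leaf is inferred here from 246,078 / (63 × 63) and is not stated on the slide. The function names are hypothetical.

```python
MIN_CHILDREN, MAX_CHILDREN = 32, 63   # children per index node (from the text)
MIN_LEAF_RECORDS = 31                 # from the text
MAX_LEAF_RECORDS = 62                 # inferred: 246,078 / (63 * 63)

def min_records(levels):
    """Fewest records in a tree with `levels` levels (levels >= 2).

    The root needs only 2 children; every other index node needs 32.
    """
    return 2 * MIN_CHILDREN ** (levels - 2) * MIN_LEAF_RECORDS

def max_records(levels):
    """Most records: every index node and every leaf is full."""
    return MAX_CHILDREN ** (levels - 1) * MAX_LEAF_RECORDS

print(min_records(3))   # 1984
print(max_records(3))   # 246078
```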
Unlike text files, binary files in C can be accessed randomly using a
combination of fread(), fwrite(), and fseek().
