A simple Binary Search Tree written in C#
Herbert Sauro, 18 Aug 2008 BSD



A simple Binary Search Tree written in C# that can be used to store and retrieve
large amounts of data quickly.
Download demo and source files - 27.1 KB

Introduction
In Computer Science, a binary tree is a hierarchical structure of nodes, each node
referencing at most two child nodes. Every binary tree has a root from which the
first two child nodes originate. Nodes that have no children are usually
termed leaves, and mark the extent of the tree structure.
A particular kind of binary tree, called the binary search tree, is very useful for
storing data for rapid access, storage, and deletion. Data in a binary search tree are
stored in tree nodes, and must have associated with them an ordinal value or key;
these keys are used to structure the tree such that the value of a left child node is
less than that of the parent node, and the value of a right child node is greater than
that of the parent node. Sometimes, the key and datum are one and the same.
Typical key values include simple integers or strings; the actual data for the key will
depend on the application. In this article, I describe a binary search tree that stores
string/double pairs. That is, the key is the string value and the data associated with
the key is a double value. Developers can search the tree using string values.

Background
There are a number of basic operations one can apply to a binary search tree, the
most obvious being insertion, searching, and deletion.
To insert a new node into a tree, the following method can be used. We start at
the root of the tree, and compare the ordinal value of the root to the ordinal value of
the node to be inserted. If the ordinal values are identical, then we have a duplicate
and we return to the caller indicating so. If the ordinal value is less than that of the
root, then we follow the left branch of the root; otherwise, we follow the right branch.
We now start the comparison again but at the branch we took, comparing the ordinal
value of the child with that of the node to be inserted. Traversal of the tree continues
in this manner until we reach a left or right node which is empty and we can go no
further. At this point, we insert the new node into this empty location. Note that new
nodes are always inserted as leaves into the tree, and strictly speaking, nodes are thus
appended rather than inserted.
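
The insertion steps described above can be sketched in a few lines of C#. This is a minimal illustration only, not the code shipped with the article: the names SketchNode and InsertSketch are hypothetical, and the use of ordinal string comparison is an assumption.

```csharp
using System;

// Hypothetical node type for illustration; not the article's TTreeNode.
class SketchNode
{
    public string Key;
    public double Value;
    public SketchNode Left, Right;

    public SketchNode(string key, double value)
    {
        Key = key;
        Value = value;
    }
}

static class InsertSketch
{
    // Inserts (key, value) into the subtree referenced by 'node'.
    // Returns false when the key is already present (a duplicate).
    // 'node' is passed by reference so that an empty slot can be filled in place.
    public static bool Insert(ref SketchNode node, string key, double value)
    {
        if (node == null)
        {
            node = new SketchNode(key, value);   // empty slot reached: append as a leaf
            return true;
        }

        int cmp = string.CompareOrdinal(key, node.Key);
        if (cmp == 0)
            return false;                               // duplicate: report to the caller
        if (cmp < 0)
            return Insert(ref node.Left, key, value);   // smaller keys go left
        return Insert(ref node.Right, key, value);      // larger keys go right
    }
}
```

Starting from a null root, each successful call fills exactly one empty slot, which is why new nodes always end up as leaves.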
Searching a binary search tree is almost identical to inserting a new node except that
we stop the traversal when we find the node we're looking for (during an insertion,
this would indicate a duplicate node in the tree). If the node is not located, then we
report this to the caller.
Both insertion and searching are naturally recursive and are, arguably, easier to
understand when considered in terms of their unit operation. A basic recursive
search algorithm will look like:

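node search (node, key) {
    // An empty subtree means the key is not in the tree
    if node is null then
        return null;
    if node.key = key then
        return node;
    // Compare keys (not the node itself) to choose a branch
    if key < node.key then
        return search (node.left, key);
    else
        return search (node.right, key);
}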

In the source code provided with this article, insertion is implemented recursively,
while searching uses an iterative approach.
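
For comparison with the recursive pseudocode, an iterative search might look like the sketch below. It is not the article's findSymbol() method; LookupNode and LookupSketch are hypothetical names, and ordinal string comparison is assumed.

```csharp
using System;

// Hypothetical node type for illustration only.
class LookupNode
{
    public string Key;
    public double Value;
    public LookupNode Left, Right;

    public LookupNode(string key, double value)
    {
        Key = key;
        Value = value;
    }
}

static class LookupSketch
{
    // Walks down from 'root' in a loop instead of recursing.
    // Returns the node holding 'key', or null when the key is absent.
    public static LookupNode Find(LookupNode root, string key)
    {
        LookupNode current = root;
        while (current != null)
        {
            int cmp = string.CompareOrdinal(key, current.Key);
            if (cmp == 0)
                return current;                              // found it
            current = cmp < 0 ? current.Left : current.Right; // descend one level
        }
        return null;                                         // fell off the tree
    }
}
```

The loop uses constant stack space regardless of tree depth, which is one reason to prefer it over recursion for lookups in deep trees.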
Deletion is a little more complicated but boils down to three rules. The three rules
refer to deleting nodes without any children, nodes with one child, and nodes with
two children. If a node has no children, then the node is simply deleted. If the node
has one child, then the node is deleted and the child node is brought forward to link
to the parent. The complication occurs when a node has two children. However, even
here, the rules are straightforward when stated. To delete a node with two children,
the next ordinal node (called the successor node) on the right branch is used to
replace the deleted node. The successor node is then deleted from its original
position. The successor node will always be the leftmost node on the right branch
(likewise, the predecessor node will be the rightmost node on the left branch). The
figure below illustrates the deletion rules.
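
The three deletion rules can be sketched as follows. This is an illustrative version under stated assumptions, not the article's delete() method: the names DelNode and DeleteSketch are hypothetical, keys are compared ordinally, and a missing key is silently ignored here rather than raising an exception.

```csharp
using System;

// Hypothetical node type for illustration only.
class DelNode
{
    public string Key;
    public double Value;
    public DelNode Left, Right;

    public DelNode(string key, double value)
    {
        Key = key;
        Value = value;
    }
}

static class DeleteSketch
{
    // Removes 'key' from the subtree rooted at 'node' and returns the new
    // root of that subtree. A missing key leaves the tree unchanged.
    public static DelNode Delete(DelNode node, string key)
    {
        if (node == null)
            return null;

        int cmp = string.CompareOrdinal(key, node.Key);
        if (cmp < 0)
            node.Left = Delete(node.Left, key);
        else if (cmp > 0)
            node.Right = Delete(node.Right, key);
        else
        {
            // Rules 1 and 2: with zero or one child, the (possibly null)
            // remaining child takes the node's place.
            if (node.Left == null) return node.Right;
            if (node.Right == null) return node.Left;

            // Rule 3: two children. The successor is the leftmost node
            // on the right branch.
            DelNode successor = node.Right;
            while (successor.Left != null)
                successor = successor.Left;

            // Copy the successor's data here, then delete the successor
            // from its original position in the right branch.
            node.Key = successor.Key;
            node.Value = successor.Value;
            node.Right = Delete(node.Right, successor.Key);
        }
        return node;
    }
}
```

Because the successor never has a left child, removing it from the right branch always falls under rule 1 or rule 2, so the recursion terminates after one extra descent.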

A common alternative to using a binary search tree is to use a hash table. Hash tables
have better search and insertion performance metrics. In theory, the time it takes to
insert or search for an item in a hash table is independent of the number of data
items stored. In contrast, the search time for a reasonably balanced binary search tree
scales with log(N), where N is the number of data items (still far better than a linear
search). The .NET libraries contain explicit support for hash tables.
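
As a point of comparison, the same kind of string/double pairs can be stored in the Hashtable class from System.Collections. The sketch below is only an illustration; HashtableSketch, Build, and TryFind are hypothetical helper names, and the sample values are borrowed from the usage example in this article.

```csharp
using System;
using System.Collections;

static class HashtableSketch
{
    // Builds a small table of name/magnitude pairs.
    public static Hashtable Build()
    {
        Hashtable table = new Hashtable();
        table.Add("Gamma Cancri", 4.66);
        table.Add("Phi Centauri", 3.83);
        return table;
    }

    // Returns true and sets 'value' when 'key' is present in the table.
    public static bool TryFind(Hashtable table, string key, out double value)
    {
        object boxed = table[key];   // the indexer yields null for a missing key
        if (boxed == null)
        {
            value = 0.0;
            return false;
        }
        value = (double)boxed;       // values are stored boxed as object
        return true;
    }
}
```

Unlike a binary search tree, a hash table keeps no ordering among its keys, so it cannot enumerate entries in sorted order without a separate sort.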

Balanced Trees
The time taken to insert or search for a specific item in a tree will be affected by a
tree's depth. Deep trees take longer to search, and the insertion order into a tree can
affect a tree's shape. A random insertion order will generally produce a bushier
and hence shallower tree compared to an ordered insert. Bushy trees are often called
balanced trees, and although not implemented here, balancing a tree is a highly
desirable feature for a binary search tree implementation. Certain variants such as
the red-black tree will auto-balance as the tree is constructed (see Red/Black tree
animation). The figure below shows three trees generated from the same data set
inserted in three different orders. The first is the most balanced and hence the
shallowest of the three examples.
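
The effect of insertion order on depth is easy to reproduce. The sketch below uses integer keys for brevity and hypothetical names (DepthNode, DepthSketch); it builds an unbalanced tree from an array of keys and reports its height.

```csharp
using System;

// Hypothetical node type for illustration only.
class DepthNode
{
    public int Key;
    public DepthNode Left, Right;

    public DepthNode(int key)
    {
        Key = key;
    }
}

static class DepthSketch
{
    // Plain (non-balancing) insertion; duplicates are ignored.
    public static DepthNode Insert(DepthNode node, int key)
    {
        if (node == null) return new DepthNode(key);
        if (key < node.Key) node.Left = Insert(node.Left, key);
        else if (key > node.Key) node.Right = Insert(node.Right, key);
        return node;
    }

    // Number of nodes on the longest path from 'node' down to a leaf.
    public static int Height(DepthNode node)
    {
        if (node == null) return 0;
        return 1 + Math.Max(Height(node.Left), Height(node.Right));
    }

    // Inserts the keys in the given order and returns the resulting height.
    public static int HeightAfterInserting(int[] keys)
    {
        DepthNode root = null;
        foreach (int key in keys)
            root = Insert(root, key);
        return Height(root);
    }
}
```

Inserting the keys 1 to 7 in ascending order gives a height of 7, because every node hangs off the right of the previous one. Inserting the same keys in the order 4, 2, 6, 1, 3, 5, 7 gives a height of 3.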

Implementing the search and insertion methods using a recursive approach has the
potential to yield poor performance, particularly when the trees are unbalanced.

Using the Code


Using the source code provided with this article is very easy. The class name of the
binary tree is TBinarySTree, and the individual nodes have class type TTreeNode.
The method insert() is used to insert new data, and the method findSymbol() is used
to locate and retrieve data. If findSymbol() fails to locate the data item, it returns null.
When successful, findSymbol() returns a TTreeNode object which has two properties,
name and value. The following code illustrates the instantiation of a new binary tree,
the insertion of data into the tree, and subsequent retrieval.

// Create a new binary tree
TBinarySTree bt = new TBinarySTree();

// Insert data
bt.insert ("Canis Minoris", 5.37);
bt.insert ("Gamma Cancri", 4.66);
bt.insert ("Phi Centauri", 3.83);
bt.insert ("Kappa Tauri", 4.21);

// Retrieve data
TTreeNode symbol = bt.findSymbol ("Phi Centauri");

// findSymbol() returns null when the key is not in the tree
if (symbol != null)
    Console.WriteLine ("Star {0} has magnitude = {1}", symbol.name, symbol.value);

Other methods of interest include:



bt.clear();
bt.count();

count returns the number of nodes in the tree.



bt.delete (key);

delete will delete the node with the given key. If the method fails to locate the node,
the method throws a simple exception.
The source code is licensed under the BSD license. The source should compile on C#
2.0.
To use the source code, unpack the source, load the binary tree solution
(binaryTree.sln) and compile the BinaryTree project. In your own project, add a
reference to BinaryTree.dll. This version was written using Visual Studio 2003.

Points of Interest
The following graphs compare the performance of the binary search tree with .NET's
built-in Hashtable class. Both implementations follow the expected scaling laws as
the number of stored data items increases. The X axis indicates the number of data
items stored, ranging from 1000 to 1,000,000 items (21 intervals in all). The Y axis
indicates the average time required to retrieve one data item (averaged over 20,000
retrieval attempts). The Hashtable follows roughly O(1); that is, the time taken to
retrieve data is independent of the number of data items stored. In contrast, the
binary search tree scales roughly as O(log(N)). However, this is far better than a linear
search, which would scale as O(N); that is, doubling the number of stored items
would double the average time taken to retrieve a single data item. The graph on the
left shows the data plotted on log axes.

Times were computed using the QueryPerformanceCounter() method. The code for
timing was derived from Tobi+C#=T#.
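
A measurement of this kind can be approximated with the Stopwatch class from System.Diagnostics, which on Windows is normally backed by the same high-resolution counter. The sketch below is not the timing code used for the graphs; TimingSketch and its method name are hypothetical, and it times only Hashtable lookups with synthetic keys.

```csharp
using System;
using System.Collections;
using System.Diagnostics;

static class TimingSketch
{
    // Returns the average time, in microseconds, of one Hashtable lookup
    // in a table holding 'itemCount' entries, averaged over 'lookups' attempts.
    public static double AverageLookupMicroseconds(int itemCount, int lookups)
    {
        Hashtable table = new Hashtable();
        for (int i = 0; i < itemCount; i++)
            table.Add("key" + i, (double)i);

        // Choose the keys to look up before starting the clock, so that
        // string building is not included in the measurement.
        Random rng = new Random(12345);
        string[] probes = new string[lookups];
        for (int i = 0; i < lookups; i++)
            probes[i] = "key" + rng.Next(itemCount);

        double checksum = 0.0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < lookups; i++)
            checksum += (double)table[probes[i]];
        watch.Stop();

        // Use the accumulated result so the lookups are not dead code.
        if (checksum < 0.0)
            throw new InvalidOperationException("unexpected checksum");

        return watch.Elapsed.TotalMilliseconds * 1000.0 / lookups;
    }
}
```

Absolute numbers from a sketch like this vary with hardware, runtime, and warm-up effects, so only the trend across different values of itemCount is meaningful.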

Other Possibilities
One should consider the implementation outlined here as the minimum practical
implementation. The project where this implementation originated did not require
any further sophistication. However, there are a number of areas where it could be
significantly improved. In particular, two areas warrant further work:
1. The current implementation is specific to storing name/value pairs. Ideally,
one would prefer a more generic implementation where a developer could employ
their own object type.
2. The implementation may suffer performance degradation when subjected to
large data sets if the trees become significantly unbalanced. Ideally, one could
implement the Red/Black variant to avoid this issue (see reference 1. at the end of
the article for details).
Other minor changes include using a property in place of the public method count,
adding further utility methods, and changing the naming convention on the classes
and methods to make them more consistent with .NET.
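
As a rough idea of what the first suggestion and the Count property might look like together, here is a minimal generic sketch. It is not part of the article's source: GenericTree, Insert, TryFind, and Count are hypothetical names, the tree is still unbalanced, and keys are assumed to be non-null and to implement IComparable<TKey>.

```csharp
using System;

// Hypothetical generic tree for illustration; not the article's TBinarySTree.
class GenericTree<TKey, TValue> where TKey : IComparable<TKey>
{
    private class Node
    {
        public TKey Key;
        public TValue Value;
        public Node Left, Right;

        public Node(TKey key, TValue value)
        {
            Key = key;
            Value = value;
        }
    }

    private Node root;
    private int count;

    // Exposed as a property rather than a method, following .NET conventions.
    public int Count
    {
        get { return count; }
    }

    // Adds the pair as a new leaf. Returns false when the key already exists.
    public bool Insert(TKey key, TValue value)
    {
        if (root == null)
        {
            root = new Node(key, value);
            count++;
            return true;
        }

        Node current = root;
        while (true)
        {
            int cmp = key.CompareTo(current.Key);
            if (cmp == 0)
                return false;                       // duplicate key
            if (cmp < 0)
            {
                if (current.Left == null)
                {
                    current.Left = new Node(key, value);
                    count++;
                    return true;
                }
                current = current.Left;
            }
            else
            {
                if (current.Right == null)
                {
                    current.Right = new Node(key, value);
                    count++;
                    return true;
                }
                current = current.Right;
            }
        }
    }

    // Returns true and sets 'value' when 'key' is present in the tree.
    public bool TryFind(TKey key, out TValue value)
    {
        Node current = root;
        while (current != null)
        {
            int cmp = key.CompareTo(current.Key);
            if (cmp == 0)
            {
                value = current.Value;
                return true;
            }
            current = cmp < 0 ? current.Left : current.Right;
        }
        value = default(TValue);
        return false;
    }
}
```

With this shape, a GenericTree<string, double> would cover the name/value case described in this article, while other key and value types could be used without changing the tree code.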
