Chapter 1
Introducing Algorithms and Data Structures

Computer science is a field of study that solves a variety of problems by using computers. The problem to be
solved could be as simple as performing the addition of two numbers, or it can be as complex as designing a
robot capable of making decisions in a real-time environment. To solve a given problem by using computers, you
need to design an algorithm. The nature of an algorithm often depends closely on the nature of the data on which
the algorithm works. Therefore, the study of algorithms also involves the study of the data structures that the
algorithms work on.

This chapter discusses the role of algorithms and data structures in problem solving through computers. It also
discusses some standard techniques that can be used to design algorithms. In addition, it explains the effect of the
selected algorithm on the efficiency of the solution.

Objectives

In this chapter, you will learn to:

Explore the role of algorithms and data structures in problem solving


Design algorithms and measure their efficiency

Explore the Role of Algorithms and Data Structures in Problem Solving


Problem solving is an essential part of every scientific discipline. In today's world, computers are widely used to
solve problems pertaining to various domains, such as banking, commerce, medicine, manufacturing, and
transport.

To solve a given problem by using a computer, you need to write a program. A program consists of two
components: algorithms and data structures.

Different algorithms can be used to solve the same problem. Similarly, different types of data structures can be
used to represent a problem in a computer.

To solve the problem in an efficient manner, you need to select a combination of algorithms and data structures
that provide maximum efficiency.

Role of Algorithms

The word, algorithm, is derived from the name of the Persian mathematician, Al Khwarizmi.

An algorithm can be defined as a step-by-step procedure for solving a problem. It helps the user to get the correct
result with a finite number of steps. Consider the following step-by-step procedure to display the first 10 natural
numbers:

1. Set the value of counter to 1.

2. Display counter.
3. Increment counter by 1.
4. If counter <= 10, go to step 2.

The preceding step-by-step procedure is an algorithm because it produces the correct result with a finite number
of steps.
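For illustration, the same procedure maps directly to code. The following is a minimal Python sketch of the preceding steps; the variable name counter simply mirrors the algorithm above.

# Display the first 10 natural numbers by following the preceding steps.
counter = 1                 # Step 1: set the value of counter to 1
while counter <= 10:        # Step 4: repeat while counter <= 10
    print(counter)          # Step 2: display counter
    counter += 1            # Step 3: increment counter by 1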

An algorithm has the following five important properties:

Finiteness: It terminates after a finite number of steps.


Definiteness: Each step in an algorithm is unambiguous. This means that the action specified by the
step cannot be interpreted in multiple ways and can be performed without any confusion.
Input: It accepts zero or more inputs.
Output: It produces at least one output.
Effectiveness: It consists of basic instructions that are attainable. This means that the instructions can be
performed by using the given inputs in a definite amount of time.

A problem can be solved by using a computer only if an algorithm can be written for it. In addition, the use of
algorithms provides the following benefits:

While writing an algorithm, you identify the step-by-step procedure, the major decision points, and the
variables necessary to solve the problem. This helps you in the development of the corresponding
program.
The procedure identification and decision points break the problem into a series of smaller problems of
more manageable size. Therefore, problems that would be difficult or impossible to solve as a whole
can be approached as a series of sub problems that are small and solvable.
With the use of an algorithm, the same specified steps are used for performing the task. This makes the
process more consistent and reliable.
With the use of a consistent process for problem solving, decision making becomes a more rational
process, which is not affected by human biases and misjudgments.

Role of Data Structures

Multiple algorithms can be designed to solve a particular problem. However, the algorithms may differ in the
extent of efficiency to which they can solve the problem. In such a situation, an algorithm that provides
maximum efficiency should be used for solving the problem. Here, efficiency means that the algorithm should
work in minimal time and use minimal memory.

One of the basic techniques for improving the efficiency of algorithms is to structure the data that they operate on
in such a way that the resulting operations can be efficiently performed.

The way in which the various data elements are organized in memory, with respect to each other, is called a data
structure.

Data can be organized in many different ways. Therefore, you can create as many data structures as you want.
However, there are some standard data structures that have proved useful over the years. These include arrays,
linked lists, stacks, queues, and trees. You will learn more about these data structures in the subsequent chapters.

All these data structures are designed to hold a collection of data items. However, the difference lies in the way
in which the data items are arranged with respect to each other and the operations that they allow. As the data
items are arranged in different ways, some data structures prove to be more efficient than others to solve a given
problem.

Suppose you have to write an algorithm that enables a printer to serve the requests of multiple users on a first-
come-first-served basis. In this case, using a data structure that stores and retrieves the requests in the order of
their arrival would be more efficient than a data structure that stores and retrieves the requests in a random order.

In addition to improving the efficiency of an algorithm, the use of appropriate data structures also allows you to
overcome the following programming challenges:

Simplifying complex problems


Creating standard and reusable code components
Creating programs that are easy to understand and maintain

To understand the use of an appropriate data structure, which helps in simplifying the solution to a problem, let
us consider an example where you have to find the maximum value in a set of 50 numbers. In such a case, you
can either use 50 variables or use a data structure, such as an array of size 50, to store the numbers. When 50
different variables are used to store the numbers, the following algorithm can be used to determine the maximum
value among the numbers:

1. Accept 50 numbers and store them in num1, num2, num3, ..., num50.
2. Set max = num1 .
3. If num2 > max then:
max = num2
4. If num3 > max then:
max = num3
5. If num4 > max then:
max = num4
.
.
6. If num50 > max then:
max = num50
7. Display max.

On the other hand, when an array of size 50 is used, the following algorithm can be used to determine the
maximum value among the elements in an array:

1. Set max = num[0].


2. Repeat step 3 varying i from 1 to 49.
3. If num[i] > max then:
max = num[i]
4. Display max.

From the preceding two algorithms, it can be seen that the algorithm that is using an array manipulates memory
more efficiently than the algorithm that is using 50 variables. In addition, the algorithm using an array involves
fewer steps and is, therefore, easier to understand and implement than the algorithm that uses 50
variables.
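For reference, the array-based algorithm translates almost line for line into code. The following is a minimal Python sketch; the list num and its 50 sample values are placeholders, since any 50 numbers would do.

# Find the maximum value in an array of 50 numbers.
num = list(range(1, 51))    # placeholder data: any 50 numbers can be used here
max_value = num[0]          # Step 1: set max to the first element
for i in range(1, 50):      # Step 2: repeat step 3 varying i from 1 to 49
    if num[i] > max_value:  # Step 3: update max when a larger element is found
        max_value = num[i]
print(max_value)            # Step 4: display max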

Data structures also enable the creation of reusable code components. Suppose you have created a class to
implement a data structure that stores and retrieves requests in the order of their arrival. Once the class is created,
the same class can be used in several different applications that need to serve the requests of multiple users on a
first-come-first-served basis.

This means that a data structure, once implemented, can be used as a standard component to provide standard
solutions to a specific set of problems. The use of standard components helps to simplify the maintenance
process. This is because the standard components are time-tested, and therefore, do not need much maintenance.

Types of Data Structures

Data structures can be classified under the following two categories:

Static: These are data structures whose size is fixed at compile time and does not grow or shrink at run
time. An example of a static data structure is an array. Suppose you declare an array of size 50, but
store only 5 elements in it. In this case, the memory space allocated for the remaining 45 elements will be
wasted. Similarly, if you have declared an array of size 50 but later want to store 20 more
elements, you will not be able to store these extra elements because of the fixed size of the
array.
Dynamic: These are data structures whose size is not fixed at compile time and that can grow and
shrink at run time to make efficient use of memory. An example of a dynamic data structure is
a list of items for which memory is not allocated in advance. As and when items are added to the list,
memory is allocated for those elements. Similarly, when items are removed from the list, memory
allocated to those elements is deallocated. Such a list is called a linked list.

Arrays and linked lists are basic data structures that are used to implement other data structures, such as
stacks, queues, and trees.

An array is always a static data structure, and a linked list is always a dynamic data structure. However, the
other data structures can be static or dynamic depending on whether they are implemented by using an array or a
linked list.
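As a rough illustration of this difference, the following Python sketch contrasts a fixed-size array (simulated here with a preallocated list of 50 slots, reusing the sizes from the array example above) with a dynamically growing list. The variable names are illustrative only.

# Static: 50 slots are reserved up front, even if only 5 are used.
fixed = [None] * 50
for i in range(5):
    fixed[i] = i + 1        # the remaining 45 slots stay allocated but unused

# Dynamic: memory grows only as items are added and shrinks as they are removed.
dynamic = []
for i in range(5):
    dynamic.append(i + 1)   # space is acquired on demand
dynamic.pop()               # space for the removed item can be reclaimed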

Activity 1.1: Problem Solving Using Algorithms

Designing Algorithms and Measuring their Efficiency


Designing an algorithm for a given problem is a difficult intellectual exercise. This is because there is no
systematic method for designing an algorithm. Moreover, there may be more than one algorithm to solve a given
problem. Writing an effective algorithm for a new problem, or writing a better algorithm for an existing one, is an
art as well as a science because it requires both creativity and insight.

Identifying Techniques for Designing Algorithms

Although there is no systematic method for designing an algorithm, there are some well-known techniques that
have proved to be quite useful in designing algorithms. The following two techniques are commonly used for
designing algorithms:

Divide and conquer approach


Greedy approach

Divide and Conquer Approach

The divide and conquer approach is an algorithm design technique that involves breaking down a problem
recursively into sub problems until the sub problems become so small and trivial that they can be easily solved.
The solutions to the sub problems are then combined to give a solution to the original problem.

Divide and conquer is a powerful approach for solving conceptually difficult problems. It simply requires you to
find a way of breaking the problem into sub problems, solving the trivial cases, and combining the solutions to
the sub problems to solve the original problem.

Divide and conquer often provides a natural way to design efficient algorithms.

Consider an example where you have to find the minimum value in a list of numbers. The list of numbers is as
shown in the following figure.

The List of Numbers

To find the minimum value, you can divide the list into two halves, as shown in the following figure.

The List Divided into Two Equal Parts

Again, divide each of the two lists into two halves, as shown in the following figure.

The List Divided into Four Equal Parts

Now, there are only two elements in each list. At this stage, compare the two elements in each list to find the
minimum of the two. The minimum value from each of the four lists is shown in the following figure.

The Minimum Values in the Four Lists

Again, compare the first two minimum values to determine their minimum. Also, compare the last two minimum
values to determine their minimum. The two minimum values thus obtained are shown in the following figure.

The Minimum Values in the Two Halves of the Original List

Again, compare the two final minimum values to obtain the overall minimum value, which is 1 in the preceding
example.
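A minimal recursive sketch of this divide and conquer strategy in Python is shown below. The sample numbers are placeholders, because the original list appears only in the figures; the overall minimum of 1 matches the result described above.

def find_min(values):
    # Trivial case: one or two elements can be compared directly.
    if len(values) == 1:
        return values[0]
    if len(values) == 2:
        return values[0] if values[0] < values[1] else values[1]
    # Divide the list into two halves, conquer each half, then combine the results.
    mid = len(values) // 2
    left_min = find_min(values[:mid])
    right_min = find_min(values[mid:])
    return left_min if left_min < right_min else right_min

print(find_min([9, 4, 7, 1, 8, 6, 3, 5]))   # prints 1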

Greedy Approach
The greedy approach is an algorithm design technique that selects the best possible option at a given time.
Algorithms based on the greedy approach are used for solving optimization problems where you need to
maximize profits or minimize costs under a given set of conditions. Some examples of optimization problems are:

Finding the shortest distance from an originating city to a set of destination cities, given the distances
between the pairs of cities.
Finding the minimum number of currency notes required for an amount, where an arbitrary number of
notes for each denomination is available.
Selecting items with a maximum value from a given set of items, where the total weight of the selected
items cannot exceed a given value.

Consider an example where you have to fill a bag of capacity 10 kg by selecting items from a given set of items,
whose weights and values are given in the following table.

The Weights and Values of Items

A greedy algorithm acts greedily, and therefore, selects the item with the maximum total value at each stage.
Therefore, first of all, the item, C, with a total value of $800 and weight of 4 kg will be selected. Next, the item,
E, with a total value of $500 and weight of 5 kg will be selected. The next item with the highest value is item, B,
with a total value of $450 and weight of 3 kg. However, if this item is selected, the total weight of the selected
items will be 12 kg (4 + 5 + 3), which is more than the capacity of the bag.

Therefore, we discard the item, B, and search for the item with the next highest value. The item with the next
highest value is item, A, which has a total value of $400 and a total weight of 2 kg. However, this item also
cannot be selected because if it is selected, the total weight of the selected items will be 11 kg (4 + 5 + 2). Now,
there is only one item left, that is, item, D, with a total value of $50 and weight of 1 kg. This item can be selected
as it makes the total weight equal to 10 kg.

The selected items and their total values and weights are listed in the following table.

The Items Selected by Using the Greedy Approach

For most problems, greedy algorithms usually fail to find the globally optimal solution. This is because
they usually do not operate exhaustively on all data and can commit to certain choices too early,
which prevents them from finding the best overall solution later.

This can be seen from the preceding example where the use of a greedy algorithm selects the items with a total
value of $1350 only. However, if the items were selected in the sequence depicted by the following table, the
total value would have been greater, with the weight being 10 kg only.

The Optimal Selection of Items

In the preceding example, you can observe that the greedy approach commits to item, E, very early. This prevents
it from determining the best overall solution, later. Nevertheless, the greedy approach is useful because it is quick
and easy to implement. Moreover, it often gives a good approximation to the optimal value.
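The greedy selection described above can be sketched as follows in Python. The item weights and values are taken from the walkthrough (A: $400/2 kg, B: $450/3 kg, C: $800/4 kg, D: $50/1 kg, E: $500/5 kg), and the 10 kg capacity is from the example; the dictionary layout itself is just an illustrative choice.

# Greedy selection: repeatedly pick the highest-value item that still fits in the bag.
items = {"A": (400, 2), "B": (450, 3), "C": (800, 4), "D": (50, 1), "E": (500, 5)}
capacity = 10

selected, total_value, total_weight = [], 0, 0
for name, (value, weight) in sorted(items.items(), key=lambda kv: kv[1][0], reverse=True):
    if total_weight + weight <= capacity:
        selected.append(name)
        total_value += value
        total_weight += weight

print(selected, total_value, total_weight)   # ['C', 'E', 'D'] 1350 10

For comparison, a quick exhaustive check of these weights and values shows that choosing items C, B, A, and D instead yields $1700 at exactly 10 kg, which is the better selection the preceding discussion refers to.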

Designing Algorithms Using Recursion

Recursion refers to the technique of defining a process in terms of itself. It is used to solve complex
programming problems that are repetitive in nature.

The basic idea behind recursion is to break a problem into smaller versions of its own, and then, build a solution
for the entire problem. This may sound similar to the divide and conquer technique. However, the recursion
technique is different from the divide and conquer technique. Divide and conquer is a theoretical concept that
may be implemented in a computer program with the help of recursion.

Recursion is implemented in a program by using a recursive procedure or function. A recursive procedure or
function is one that invokes itself.

Consider a function f(n), which is the sum of the first n natural numbers. This function can be defined in different
ways.

In mathematics, the function will be defined as:

f(n) = 1 + 2 + 3 + 4 + 5 + ... + n

However, the same function can be defined in a recursive manner as:

f(n) = f(n - 1) + n

where n > 1 and f(1) = 1

In this case, the recursive definition of the f(n) function calls the same function, but with its argument reduced
by one. Recursion will end when n = 1. In that case, f(1) = 1.

To understand this concept, consider a factorial function. A factorial function is defined as:

n! = 1 × 2 × 3 × 4 × ... × n

This same factorial function can be redefined in a recursive manner as:

n! = (n - 1)! × n

where n > 0 and 0! = 1

This definition of n! is recursive because it refers to itself when it uses (n - 1)!. The value of n! is explicitly
given when n = 0, and the value of n! for arbitrary n is defined in terms of the smaller value of n, which is closer
to the base value, 0.

If you have to calculate 3! by using recursion, you first define 3! in terms of 2! as:

3! = 3 × 2!

Now, you will define 2! in terms of 1! as:

3! = 3 × (2 × 1!)

Now, you will define 1! in terms of 0! as:

3! = 3 × (2 × (1 × 0!))

Now, 0! is defined as 1. Therefore, the expression becomes:

3! = 3 × (2 × (1 × 1))

3! = 3 × (2 × 1)

3! = 3 × 2

3! = 6

The recursive algorithm for determining the factorial of a number, n, can be written as:

Algorithm: Factorial(n)

1. If n = 0, then: // Terminating condition
   a. Return (1).
2. Return (n × Factorial(n - 1)).

Please note that every recursive algorithm should have a terminating condition. Otherwise, the algorithm will
keep on calling itself infinitely.
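The preceding algorithm translates almost directly into the following minimal Python sketch:

def factorial(n):
    if n == 0:              # terminating condition
        return 1
    return n * factorial(n - 1)

print(factorial(3))         # prints 6, as in the 3! example above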

The main advantage of recursion is that it is useful in writing clear, short, and simple programs. One of the most
common and interesting problems that can be solved by using recursion is the Tower of Hanoi problem.

Tower of Hanoi
Tower of Hanoi is a classical problem, which consists of n different sized disks and three pins over which these
disks can be mounted. All the disks are placed on the first pin with the largest disk at the bottom, and the
remaining disks are placed in decreasing order of their size, as shown in the following figure.

The Tower of Hanoi Problem

The objective of the game is to move all the disks from the first pin to the third pin in the least number of moves by
using the second pin as an intermediary.

To play this game, you need to follow these rules:

Only one disk can be moved at a time.


A larger disk cannot be placed over a smaller one.

Let n be the number of discs. If n = 3, it will require seven moves to transfer all the discs from pin 1 to pin
3, as shown in the following table.

The Sequence of Moves for n = 3

The moves given in the preceding table are illustrated in the following figure.

The Moves for Solving the Tower of Hanoi Problem

When n = 2, we first move the top disc from pin 1 to pin 2. Then, move the top disc from pin 1 to pin 3, and
later, move the top disc from pin 2 to pin 3.

The solution for n = 1 will be to move the disc from pin 1 to pin 3.

In general, to move n discs from pin 1 to pin 3 by using pin 2 as an intermediary, you first need to move the top
n - 1 disc(s) from pin 1 to pin 2 by using pin 3 as an intermediary.

The following algorithm can be used to move the top n discs from the first pin, START, to the final pin, FINISH,
through the temporary pin, TEMP:

1. MOVE (n, START, TEMP, FINISH)
2. When n = 1:
   a. MOVE a disc from START to FINISH
   b. Return
3. Move the top n - 1 discs from START to TEMP using FINISH as an intermediary
   [MOVE (n - 1, START, FINISH, TEMP)]
4. Move the top disc from START to FINISH
5. Move the top n - 1 discs from TEMP to FINISH using START as an intermediary
   [MOVE (n - 1, TEMP, START, FINISH)]

In general, this solution requires 2ⁿ - 1 moves for n discs.
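A minimal Python sketch of the MOVE procedure is shown below. Passing the pin names as plain strings is an illustrative choice rather than part of the original algorithm.

def move(n, start, temp, finish):
    # Terminating case: a single disc moves directly to the final pin.
    if n == 1:
        print("Move a disc from", start, "to", finish)
        return
    move(n - 1, start, finish, temp)    # top n - 1 discs: START to TEMP via FINISH
    print("Move a disc from", start, "to", finish)
    move(n - 1, temp, start, finish)    # n - 1 discs: TEMP to FINISH via START

move(3, "pin 1", "pin 2", "pin 3")      # prints the seven moves for n = 3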

Determining the Efficiency of an Algorithm

The greatest difficulty in solving programming problems is not how the problem is to be solved, but how
efficiently it is to be solved. Factors that affect the efficiency of a program include the speed of the
machine, the compiler, the operating system, the programming language, and the size of the input. However, in
addition to these factors, the way in which the program data is organized and the algorithm used for solving the
problem also have a significant impact on the efficiency of the program.

There can be cases where a number of methods and algorithms can be used to solve a problem. In such a situation, it
becomes difficult to decide which algorithm should be used.

When there are several different ways to organize data and devise algorithms, it becomes important to develop
criteria to recommend a choice. Therefore, you need to study the behavior of algorithms under various conditions
and compare their efficiency.

The efficiency of an algorithm can be computed by determining the amount of resources it consumes. The
primary resources that an algorithm consumes are time and space. Time refers to the CPU time required to
execute the algorithm, and space refers to the amount of memory used by the algorithm during execution. An
algorithm may be extremely time-efficient or extremely space-efficient, but often not both. This leads to a
situation wherein you can reduce the use of memory at the cost of slower program execution, or you can reduce
the running time at the cost of increased memory usage. This situation is known as the time/space tradeoff.

An example of a situation where time/space tradeoff can be applied is data storage. If data is stored in a
compressed form, the memory used is less because data compression reduces the amount of space required.
However, it is more time consuming because some additional time is required to run the compression algorithm.
Similarly, if data is stored in uncompressed form, the memory used is more and the running time is less.

Memory is generally perceived to be extensible because you can increase the volume of memory of your
computer. Time, however, is not extensible. Therefore, time considerations generally override memory
considerations.

Method for Determining Efficiency


Although the efficiency of an algorithm depends on how efficiently it uses time and memory space, the scope of
this course is limited to determining only the time efficiency of an algorithm.

To measure the time efficiency of an algorithm, you can write a program based on the algorithm, execute it, and
measure the time it takes to run. The execution time that you measure in this case will depend on the following
factors:

Speed of the machine


Compiler
Operating system
Programming language
Input data

However, to determine how efficiently an algorithm solves a given problem, you need to determine how the
execution time is affected by the nature of the algorithm. Therefore, you need to develop fundamental laws that
determine the efficiency of a program in terms of the nature of the underlying algorithm.

To understand how the nature of an algorithm affects the execution time, consider a simple example. Suppose the
assignment, comparison, write, and increment statements take a, b, c, and d time units to execute, respectively.
Now, consider the following code to display the elements stored in an array:

1. Set I = 0 // 1 assignment
2. While (I < n): // n comparisons
a. Display a[I] // n writes
b. Increment I by 1 // n increments

The execution time required for the preceding algorithm is given by:

T = a + b × n + c × n + d × n

T = a + n × (b + c + d)

Here, T is the total running time of the algorithm, which is expressed as a linear function of the number of
elements (n) in the array. From the preceding expression, it is clear that T is directly proportional to n.

In fact, the total running time, T, is directly proportional to the number of iterations involved in the algorithm.
The number of iterations can be determined by counting the number of comparisons involved in the algorithm.

In the preceding code, n comparisons are being made. Therefore, the total running time of the algorithm, T, is
directly proportional to n.

As T is directly proportional to n, an increase in the value of n will result in a proportional increase in the value
of T, as shown in the following figure.

The Rate of Change of T with an Increase in the Value of n

Now, consider the following algorithm:

1. Set I = 0. // 1 assignment
2. While (I < n): // n comparisons
   a. Set J = 0. // n assignments
   b. While (J < n): // n × n comparisons
      i. Display (a[I][J]). // n × n writes
      ii. Increment J by 1. // n × n increments
   c. Increment I by 1. // n increments

If you count the number of comparisons in the preceding code, it comes out to be (n² + n), which is a quadratic
function of n. Therefore, the total running time is directly proportional to n².

Although the number of comparisons is (n² + n), the value of n is small as compared to the value of n²
(especially when n is very large). Therefore, the value of n can be ignored for finding the approximate running
time.

As the running time is directly proportional to n², an increase in the value of n will result in a quadratic increase
in the running time. This means that if the value of n is doubled, the running time will increase four times. The
rate of change of T, with an increase in the value of n, is depicted in the following figure.

The Rate of Change of T with an Increase in the Value of n

From the preceding discussion, you can conclude that the running time of a program is a function of n, where n is
the size of the input data. The rate, at which the running time of an algorithm increases, as a result of an increase
in the volume of input data, is called the order of growth of the algorithm.

The order of growth of an algorithm is defined by using the big O notation. The big O notation has been accepted
as a fundamental technique for describing the efficiency of an algorithm.

The following table lists some possible orders of growth and their corresponding big O notations.

The Big O Notations

If an algorithm has a linear order of growth, the algorithm is said to be of the order, O (n). Similarly, if an
algorithm has a quadratic order of growth, the algorithm is said to be of the order, O (n²).

Selecting an Efficient Algorithm


Now that you know how the efficiency of a particular algorithm is determined, let us see how this knowledge can
be used to select an efficient algorithm.

According to their orders of growth, the big O notations can be arranged in an increasing order as:

O (1) < O (log n) < O (n) < O (n log n) < O (n²) < O (n³) < O (2ⁿ) < O (10ⁿ)

Therefore, if a problem can be solved by using algorithms of each of the preceding orders of growth, an
algorithm of the order, O (1), will be considered the best, and an algorithm of the order, O (10ⁿ), will be
considered the worst. The goal of algorithm development should be to make algorithms of the smallest possible
orders.

The following table depicts the orders of growth for the preceding big O notations. Each entry pairs a notation
with a figure showing its rate of growth.

Big O Notation    Order of Growth

O (1)             The Order of Growth of an O (1) Algorithm
O (log n)         The Order of Growth of an O (log n) Algorithm
O (n)             The Order of Growth of an O (n) Algorithm
O (n log n)       The Order of Growth of an O (n log n) Algorithm
O (n²)            The Order of Growth of an O (n²) Algorithm
O (n³)            The Order of Growth of an O (n³) Algorithm
O (2ⁿ)            The Order of Growth of an O (2ⁿ) Algorithm
O (10ⁿ)           The Order of Growth of an O (10ⁿ) Algorithm

The Orders of Growth

Now, consider that the assignment, comparison, write, and increment statements take the time units, a, b, c, and
d, to execute, respectively. In addition, suppose all arithmetic operations require e time units to execute. Now,
consider the following two algorithms to find the sum of the first n natural numbers:

Algorithm A

1. Set sum = 0. // 1 assignment


2. Set i = 0. // 1 assignment
3. While (i <= n): // n comparisons
a. Set sum = sum + i. // n arithmetic operations, n assignments
b. Increment i by 1. // n increments
4. Display (sum). // 1 write

Algorithm B

1. Set sum = (n × (n + 1))/2. // 3 arithmetic operations, 1 assignment


2. Display (sum). // 1 write

Both Algorithm A and Algorithm B perform the same task. This means that both determine the sum of the first n
natural numbers. Algorithm A adds each number iteratively to a variable, sum. However, Algorithm B uses a
formula to calculate the sum of the first n natural numbers.

The execution time required for Algorithm A is given by:

T = (n + 2) × a + n × b + 1 × c + n × d + n × e

T = an + 2a + bn + c + dn + en

T = 2a + c + n × (a + b + d + e)

As T is a linear function of n, the algorithm is of the order, O (n).

Now, determine the time required to execute Algorithm B:

T = 1 × a + 1 × c + 3 × e

T = a + c + 3e

Unlike Algorithm A, the time taken by Algorithm B is constant and does not depend on the value of n. Therefore,
the algorithm is of the order, O (1).

Because Algorithm A is of the order, O (n), the execution time of Algorithm A increases linearly with the value
of n. However, Algorithm B is of the order, O (1). Therefore, the execution time of Algorithm B is constant. This
means that an increase in the value of n does not have any impact on the execution time of the algorithm.
Therefore, no matter how large the problem is, Algorithm B solves it in the same amount of time.

Suppose for n = 10, both, Algorithm A and Algorithm B, take 10 nanoseconds (ns) to execute. However,
when n is increased to 100 , Algorithm A will take 100 ns to execute, but Algorithm B will take only 10 ns to
execute. Similarly, when n is increased to 1000 , Algorithm A will take 1000 ns to execute, but Algorithm B will
take only 10 ns.

This means that when the problem is huge, Algorithm B will prove to be more efficient than Algorithm A.
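Both algorithms can be written out in a few lines of Python; the sketch below mirrors the two pseudocode versions above, and the sample value of n is a placeholder.

def sum_iterative(n):        # Algorithm A: adds each number in turn, O (n)
    total = 0
    i = 0
    while i <= n:
        total = total + i
        i += 1
    return total

def sum_formula(n):          # Algorithm B: closed-form formula, O (1)
    return (n * (n + 1)) // 2

print(sum_iterative(1000), sum_formula(1000))   # both print 500500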

Best, Worst, and Average Case Efficiency


Suppose you have a list of names in which you have to search for a particular name. You have designed an
algorithm that searches the name in the list of n elements by comparing the name to be searched with each
element in the list, sequentially.

The best case, in this scenario, will be if the first element in the list matches the name to be searched. The
efficiency in that case will be expressed as O (1) because only one comparison was made.

Similarly, the worst case, in this scenario, will be if the complete list is traversed and the element is found at the
end of the list or is not found at all in the list. The efficiency in that case will be expressed as O (n) because n
comparisons were made.

Continuing with the same example, the average case efficiency can be obtained by finding the average number of
comparisons. Here,

Minimum number of comparisons = 1

Maximum number of comparisons = n

Therefore, average number of comparisons = (n + 1)/2

(n + 1)/2 is a linear function of n. Therefore, the average case efficiency will be expressed as O (n).

The worst case efficiency of the preceding search algorithm can be improved by using an alternative search
algorithm that provides better worst case efficiency. A search algorithm with a better worst case efficiency is
binary search, which provides an efficiency of O (log n) in the worst case. You will learn more about this algorithm in
the subsequent chapters.
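The sequential search described above can be sketched in Python as follows; the list of names is only a placeholder, and the comparison count is returned to make the best and worst cases visible.

def sequential_search(names, target):
    comparisons = 0
    for index, name in enumerate(names):
        comparisons += 1
        if name == target:               # best case: found at the first position
            return index, comparisons
    return -1, comparisons               # worst case: all n names were compared

names = ["Ann", "Raj", "Steve", "Mia"]
print(sequential_search(names, "Ann"))   # (0, 1)  -> best case, O (1)
print(sequential_search(names, "Zoe"))   # (-1, 4) -> worst case, O (n)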

Activity 1.2: Designing Algorithms and Measuring their Efficiency

Summary
In this chapter, you learned that:

An algorithm can be defined as a step-by-step procedure for solving a problem that produces the correct
result with a finite number of steps.
An algorithm has five important properties:
Finiteness
Definiteness
Input
Output
Effectiveness
An algorithm that provides maximum efficiency should be used for solving a problem.
Data structures can be classified into the following two categories:
Static
Dynamic
Two commonly used techniques for designing algorithms are:
Divide and conquer approach
Greedy approach
Recursion refers to a technique of defining a process in terms of itself. It is used to solve complex
programming problems that are repetitive in nature.
The primary resources that an algorithm consumes are time and space.
Time/space tradeoff refers to a situation where you can reduce the use of memory at the cost of slower
program execution. Or, you can reduce the running time at the cost of increased memory usage.
The total running time of an algorithm is directly proportional to the number of comparisons involved in
the algorithm.
The order of growth of an algorithm is defined by using the big O notation.

Reference Reading

Explore the Role of Algorithms and Data Structures in Problem Solving

Reference Reading: Books
Data Structures and Program Design by Robert L. Kruse

Reference Reading: URLs
http://en.wikipedia.org/wiki/Algorithm
http://en.wikipedia.org/wiki/Data_structure

Designing Algorithms and Measuring their Efficiency

Reference Reading: Books
Data Structures and Program Design by Robert L. Kruse

Reference Reading: URLs
http://en.wikipedia.org/wiki/Divide_and_conquer_algorithm
http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/Greedy/greedyIntro.htm
http://en.wikipedia.org/wiki/Algorithmic_efficiency

Chapter 2
Implementing Sorting Algorithms

Retrieving data quickly is one of the major tasks of an efficient data management system. Sorting helps in
retrieving data faster by storing data in a particular order. There are various sorting algorithms that can be used to
arrange data in a specific order.

This chapter discusses various sorting algorithms and their implementation. In addition, it compares the
efficiency of various sorting algorithms.

Objectives

In this chapter, you will learn to:

Sort data
Sort data by using bubble sort
Sort data by using insertion sort
Sort data by using quick sort

Sorting Data
Consider that you need to retrieve the telephone number of a person named Steve from a telephone directory
where the names are stored randomly.

To retrieve the desired record, you need to sequentially traverse the list of names one by one because the names
are not sorted. This is a time-consuming activity. When you have to retrieve a record from a huge volume of
data, the activity becomes even more difficult.

A simple solution to this problem is sorting. Sorting is the process of arranging data in some predefined order or
sequence. The order can be either ascending or descending.

If the data is sorted, you can directly go to the section that stores the names starting with S, thereby reducing the
number of records to be traversed.

There are different types of sorting algorithms that can help you sort data in a particular order. These sorting
algorithms may provide varying efficiency levels. However, even when two algorithms have the same efficiency,
there can be situations when one works better than the other.

Selecting a Sorting Algorithm

Since there are various sorting algorithms, it becomes important to understand which sorting algorithm to use in a
particular situation. To select an appropriate algorithm, you need to consider the following criteria in the
suggested order:

Execution time
Storage space
Programming effort

Consider a situation where the data that needs to be sorted is small in quantity. To sort this data, all the sorting
algorithms will use a reasonable amount of storage space. In addition, the sorting algorithms will be executed in a
reasonable amount of time.

Therefore, in this situation, the criterion for selecting the sorting algorithm will be the programming effort
involved. An algorithm that requires less programming effort will be preferred over an algorithm that requires
more programming effort.

Consider another situation where the data that needs to be sorted is large in size. In such a situation, the time
taken by different algorithms may differ drastically because of the difference in their orders of growth. For
example, when there are a large number of elements, an algorithm with a logarithmic order of growth will
execute faster than an algorithm with a quadratic order of growth.

In addition, with an increase in data, the space requirement for different algorithms also differs drastically.
Therefore, when the data is large, you need to select a sorting algorithm that makes most efficient use of time or
memory, depending upon the requirement.

Types of Sorting Algorithms

There are various sorting algorithms that are used to sort data. Some of these are:

Bubble sort
Insertion sort
Quick sort

Let us discuss the working of these algorithms in detail.

In addition to the preceding algorithms, there are several other algorithms that can be used for sorting
data. Some of these algorithms are Selection sort, Shell sort, Merge sort, Counting sort, Bucket sort, Comb sort,
Radix sort, and Heap sort. However, the scope of this course is limited to Bubble sort, Insertion sort, and Quick
sort.

Sorting Data by Using Bubble Sort


Bubble sort is one of the simplest sorting algorithms. This algorithm has a quadratic order of growth and is
therefore suitable for sorting small lists only. The algorithm works by repeatedly scanning through the list,
comparing adjacent elements, and swapping them if they are in the wrong order. The algorithm gets its name
from the way smaller elements bubble to the top of the list after being swapped with the greater elements.

Implementing the Bubble Sort Algorithm

To understand the implementation of the bubble sort algorithm, consider an unsorted list of numbers stored in an
array. Suppose there are n elements in the array.

To implement the bubble sort algorithm, you need to traverse the list multiple times. The process of traversing the
entire list once is called a pass. It can be said that sorting is performed in multiple passes.

Pass 1
In Pass 1, you compare the first two elements and interchange their values if the first number is greater than the
second number. Then, you compare the second and third elements, and interchange their values if they are not in
the correct order. You repeat this process till the (n - 1)th element is compared with the nth element. The total
number of comparisons in Pass 1 is, therefore, n - 1.

By the end of Pass 1, the largest element is placed at the nth position in the array. The number of comparisons is
one less than the total number of elements in the list.

Pass 2
In Pass 2, you repeat the same process as in Pass 1, but stop the comparison after comparing the element at the
(n - 2)th position with the element at the (n - 1)th position. This time, the number of comparisons required will be
one less than what is required in Pass 1. The total number of comparisons in Pass 2 is, therefore, n - 2.

By the end of Pass 2, the second largest number will be placed at the (n - 1)th position in the array.

Pass 3
In Pass 3, you repeat the same process as in Pass 2, and this time, there will be n - 3 comparisons. This means
that the comparison will stop after comparing the element at the (n - 3)th position with the element at the (n - 2)th
position. By the end of Pass 3, the third largest number will be placed at the (n - 2)th position in the array.

Pass n - 1
Continuing the same process in all subsequent passes, in the (n - 1)th pass, you will have to perform only one
comparison. After the completion of this pass, the list will be sorted in the ascending order.

To sort a list with n elements by using bubble sort, n - 1 passes are required.

Consider an example. You have an unsorted list containing the ranks of students based on the results of an
examination. You need to sort these ranks in the ascending order. The list of ranks is stored in an array, as shown
in the following figure.

The List of Ranks in an Array

There are five elements in the list. Therefore, four passes will be required to sort the list.

In Pass 1, you need to perform the following steps:

1. Compare arr[0] with arr[1]. In the given list, arr[0] is greater than arr[1]. Therefore, you need to
interchange the two values. The resultant list is shown in the following figure.

The List of Ranks in an Array

2. Compare arr[1] with arr[2]. Here, arr[1] is less than arr[2]. Therefore, the values remain unchanged.
3. Compare arr[2] with arr[3]. Here, arr[2] is less than arr[3]. Therefore, the values remain unchanged.
4. Compare arr[3] with arr[4]. Here, arr[3] is greater than arr[4]. Therefore, you need to interchange the
two values, as shown in the following figure.

The List of Ranks in an Array

At the end of Pass 1, the largest element is placed at the last index position. The preceding process will be
repeated in all the subsequent passes. However, the value at index 4 will not be compared in any of the
subsequent passes because it has already been placed at its correct position.

In Pass 2, you need to perform the following steps:

1. Compare arr[0] with arr[1]. In the given list, arr[0] is less than arr[1]. Therefore, the values remain
unchanged.
2. Compare arr[1] with arr[2]. Here, arr[1] is less than arr[2]. Therefore, the values remain unchanged.
3. Compare arr[2] with arr[3]. Here, arr[2] is greater than arr[3]. Therefore, you need to interchange the
two values, as shown in the following figure.

The List of Ranks in an Array

At the end of this pass, the second largest element is placed at its correct position in the list. You will repeat the
same process in the subsequent passes. However, the values at indexes, 3 and 4, will not be compared in the
subsequent passes, because they are already placed at their correct positions.

The following figure shows the result after Pass 3.

The List of Ranks in an Array

Similarly, the following figure shows the result after Pass 4.

The List of Ranks in an Array

After Pass 4, the list is completely sorted.

The following algorithm depicts the logic of bubble sort:

1. Set pass = 1.
2. Repeat step 3 varying j from 0 to n - 1 - pass.
3. If the element at index j is greater than the element at index j + 1, swap the two elements.
4. Increment pass by 1.
5. If pass <= n - 1, go to step 2.

The preceding algorithm sorts a list in the ascending order. After making a slight modification in step 3,
the same algorithm can be used for sorting a list in the descending order. Instead of checking whether the
element at index j is greater than the element at index j+1, you need to check whether the element at index j is
less than the element at index j+1.
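A minimal Python implementation of the preceding bubble sort algorithm is shown below. The sample list is a placeholder, since the ranks in the walkthrough appear only in the figures.

def bubble_sort(arr):
    n = len(arr)
    # Passes 1 to n - 1; after each pass the largest remaining element is in place.
    for pass_number in range(1, n):
        for j in range(0, n - pass_number):
            if arr[j] > arr[j + 1]:              # swap adjacent elements that are out of order
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

print(bubble_sort([8, 3, 5, 9, 2]))              # [2, 3, 5, 8, 9]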

Determining the Efficiency of the Bubble Sort Algorithm

The efficiency of a sorting algorithm is measured in terms of the number of comparisons. The number of
comparisons in bubble sort can be easily computed. In bubble sort, there are n - 1 comparisons in Pass 1, n - 2
comparisons in Pass 2, and so on. Therefore, the total number of comparisons will be (n - 1) + (n - 2) + (n - 3) +
... + 3 + 2 + 1, which is an arithmetic progression.

The formula for determining the sum of an arithmetic progression is:

Sum = n/2 × [2a + (n - 1) × d]

where,

n is the total number of elements in the arithmetic progression.

a is the first element in the arithmetic progression.

d is the step value, which is the difference in successive terms of an arithmetic progression.

For the preceding arithmetic progression,

n = n - 1

a = 1

d = 1

Therefore,

Sum = (n - 1)/2 × [2 × 1 + (n - 1 - 1) × 1]

Sum = (n - 1)/2 × [2 + (n - 2)]

Sum = (n - 1)/2 × n

Sum = n × (n - 1)/2

n × (n - 1)/2 is of the O(n²) order. Therefore, the bubble sort algorithm is of the order,
O(n²). This means that the time taken to execute the algorithm increases quadratically with an increase in the size
of the list.

Suppose it takes 100 ns to execute the algorithm on a list of 10 elements. Now, if the number of elements is
doubled, that is, the number of elements is increased to 20, the execution time will increase to 400 ns, which is
four times the time taken to execute the algorithm on 10 elements.

Activity 2.1: Sorting Data by Using the Bubble Sort Algorithm

Sorting Data by Using Insertion Sort


Similar to bubble sort, insertion sort has a quadratic order of growth and is, therefore, used for sorting small
lists only.

However, if the list that needs to be sorted is nearly sorted, insertion sort becomes more efficient than bubble sort.
This is because bubble sort always performs the same number of comparisons, regardless of the initial
ordering of the elements. In contrast, insertion sort performs a different number of comparisons depending on the
initial ordering of the elements. When the elements are already in the sorted order, insertion sort needs to make only
a few comparisons.

Implementing the Insertion Sort Algorithm

The insertion sort algorithm divides the list into two parts, sorted and unsorted. Initially, the sorted part contains
only one element. In each pass, one element from the unsorted list is inserted at its correct position in the sorted
list. As a result, the sorted list grows by one element and the unsorted list shrinks by one element in each pass.

Consider the following figure that shows an unsorted list stored in an array that needs to be sorted by using the
insertion sort algorithm.

The Unsorted List of Elements

To sort this list by using the insertion sort algorithm, you need to divide the list into two sub lists, sorted and
unsorted. Initially, the sorted list contains only the first element and the unsorted list contains the remaining four
elements, as shown in the following figure.

The List Divided in Two Parts

The list is not physically divided into two separate arrays. The division is only logical where the element at
index 0 is considered to be in the sorted list and the elements at indexes, 1 to 4, are considered to be in the
unsorted list.

Now, to further sort the unsorted list, you need to perform a number of passes:

1. In Pass 1, take the first element, 80, from the unsorted list, and store it at its correct position in the
sorted list. Here, 80 is greater than 70, therefore, it remains at array index 1. However, array index 1 is
now considered to be a part of the sorted list. This is shown in the following figure.

The Element Stored at its Correct Position in the Sorted List

The sorted list has two elements now and the unsorted list has three elements.
2. In Pass 2, take the first element, 30, from the unsorted list, and store it at its correct position in the
sorted list. Here, 30 is less than 70 and 80, therefore, it needs to be stored at array index 0. To store 30
at array index 0, 80 needs to be shifted to array index 2, and 70 needs to be shifted to array index 1.
This is shown in the following figure.

The Element Stored at its Correct Position in the Sorted List

The sorted list has three elements now and the unsorted list has two elements.
3. In Pass 3, take the first element, 10, from the unsorted list, and store it at its correct position in the
sorted list. Here, 10 is smaller than the three elements in the sorted list. Therefore, it needs to be stored
at array index 0. To store 10 at array index 0, 30 needs to be shifted to array index 1, 70 needs to be
shifted to array index 2, and 80 needs to be shifted to array index 3. This is shown in the following
figure.

The Element Stored at its Correct Position in the Sorted List

The sorted list has four elements now, and the unsorted list has one element.
4. In Pass 4, take the first element, 20, from the unsorted list, and store it at its correct position in the
sorted list. Here, 20 is greater than 10 and smaller than the other three elements in the sorted list.
Therefore, it needs to be stored at array index 1.

To store 20 at array index 1, 30 needs to be shifted to array index 2, 70 needs to be shifted to array index 3, and
80 needs to be shifted to array index 4. This is shown in the following figure.

The Element Stored at its Correct Position in the Sorted List

The unsorted list is now empty, and the sorted list contains all the elements. This means that the list is now
completely sorted.

The following algorithm depicts the logic of insertion sort:

1. Repeat steps 2, 3, 4, and 5 varying i from 1 to n - 1.


2. Set temp = arr[i].
3. Set j = i - 1.
4. Repeat until j becomes less than 0 or arr[j] becomes less than or equal to temp:
a. Shift the value at index j to index j + 1.
b. Decrement j by 1.
5. Store temp at index j + 1.
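The algorithm above corresponds to the following minimal Python sketch; the sample list reuses the values from the walkthrough (70, 80, 30, 10, 20).

def insertion_sort(arr):
    for i in range(1, len(arr)):
        temp = arr[i]                    # next element from the unsorted part
        j = i - 1
        while j >= 0 and arr[j] > temp:  # shift larger sorted elements one place right
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = temp                # insert the element at its correct position
    return arr

print(insertion_sort([70, 80, 30, 10, 20]))   # [10, 20, 30, 70, 80]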

Determining the Efficiency of the Insertion Sort Algorithm

Consider an unsorted list of numbers with n elements. To sort this unsorted list by using insertion sort, you need
to perform (n - 1) passes. In insertion sort, if the list is already sorted, you will have to make only one
comparison in each pass.

In n - 1 passes, you need to make n - 1 comparisons. This is the best case for insertion sort. Therefore, the best
case efficiency of insertion sort is of the order, O(n).

Now, consider a situation where, initially, the list is stored in the reverse order. In this case, you need to make
one comparison in Pass 1, two comparisons in Pass 2, three comparisons in Pass 3, and so on, up to n - 1
comparisons in the (n - 1)th pass.

The formula for determining the total number of comparisons in this case is:

Sum = 1 + 2 + . . . + (n - 1)

This is the same as bubble sort. Therefore, the worst case efficiency of insertion sort is of the order, O(n²).

Activity 2.2: Sorting Data by Using the Insertion Sort Algorithm

Sorting Data by Using Quick Sort


Various sorting algorithms, such as bubble sort and insertion sort, are useful for sorting small lists of data.
However, for larger lists, these algorithms prove inefficient.

Quick sort is one of the most efficient sorting algorithms useful for sorting large lists. This algorithm involves
successively dividing the problem into smaller problems, until the problems become so small that they can be
easily solved. The solutions to all the smaller problems are then combined to solve the complete problem.

Implementing the Quick Sort Algorithm

The quick sort algorithm works by selecting an element from the list called a pivot, and then partitioning the list
into two parts that may or may not be equal. The list is partitioned by rearranging the elements in such a way that
all the elements towards the left end of the list are smaller than the pivot, and all the elements towards the right
end of the list are greater than the pivot. The pivot is then placed at its correct position between the two sub lists.

This process is repeated for each of the two sub lists created after partitioning, and the process continues until
one element is left in each sub list.

To understand the concept behind the quick sort algorithm, consider an unsorted list of numbers stored in an
array named arr, as shown in the following figure.

The Unsorted List

You need to sort the preceding list by using the quick sort algorithm.

In the given list, you can take arr[0] as the pivot, as shown in the following figure.

The Selection of the Pivot Value

After selecting the pivot value, you need to perform the following steps:

1. Starting from the left end of the list (at index 1), and moving in the left-to-right direction, search for the
first element that is greater than the pivot value. Here, arr[1] is the first value greater than the pivot.
2. Similarly, starting from the right end of the list, and moving in the right-to-left direction, search for the
first element that is smaller than or equal to the pivot value. Here, arr[4] is the first value smaller than
the pivot.
The two searched values are depicted in the following figure.

The Array Depicting the Two Searched Values

In the preceding figure, the greater value is on the left hand side of the smaller value. This means that
the values are not in the correct order.
3. Interchange arr[1] with arr[4] so that the smaller value is placed on the left hand side and the greater
value is placed on the right hand side. The resultant array is shown in the following figure.

The Array after Interchanged Values

4. Starting from arr[2] and moving in the left-to-right direction, continue the search for an element greater
than the pivot. Here, arr[2] is found to be greater than the pivot.
5. Similarly, starting from arr[3], and moving in the right-to-left direction, continue the search for an
element smaller than or equal to the pivot. Here, arr[1] is found to be smaller than the pivot.
The two searched values are depicted in the following figure.

The Array Depicting the Two Searched Values

In the preceding figure, the smaller value is on the left hand side of the greater value. This indicates
that the values are in the right order. Therefore, the values need not be interchanged and the search
stops here. At this stage, the list can be divided into two sub lists, List 1 and List 2.

List 1 contains all the values less than or equal to the pivot, and List 2 contains all values greater than
the pivot, as shown in the following figure.

The Two Sub Lists

6. Interchange the pivot value with the last element of List 1, as shown in the following figure.

Interchanging the Pivot Value

The pivot value, 28, is now placed at its correct position in the list. All the elements towards the right
side of 28 are greater than 28, and all the elements towards the left side of 28 are smaller than or equal
to 28. Now the two sub lists, List 1 and List 2, need to be sorted.
7. Truncate the last element, that is, pivot from List 1 because it has already reached its correct position.
List 1 now has only one element. Therefore, nothing needs to be done to sort it.
8. Sort the second list, List 2 by following the same process as for the original list. The pivot in this case
will be arr[2], that is, 46, as shown in the following figure.



The Pivot Value for List 2

9. Starting from the left, arr[4] is greater than the pivot value. Similarly, starting from the right, arr[7] is
smaller than the pivot value. The greater value is on the left hand side of the smaller value. This means
that the values are not in the correct order.
Therefore, interchange the two values, as shown in the following figure.

The Interchanged Values

10. Starting from arr[5] and moving from left to right, continue the search for an element greater
than the pivot. Here, arr[5] is found to be greater than the pivot.
11. Similarly, starting from arr[6] and moving from right to left, continue the search for an
element smaller than or equal to the pivot. Here, arr[4] is found to be smaller than the pivot.
The two searched values are depicted in the following figure.

The Values Greater and Smaller than the Pivot

In the preceding figure, the smaller value is to the left hand side of the greater value. This indicates that
the values are in the right order. Therefore, the values need not be interchanged and the search stops
here. At this stage, the list can be divided into two sub lists, Sublist 1 and Sublist 2, in such a way that
Sublist 1 contains all the values less than or equal to the pivot, and Sublist 2 contains all the values
greater than the pivot, as shown in the following figure.

The List 2 Divided into Two Sub Lists

12. Interchange the pivot value with the last element of Sublist 1, as shown in the following figure.



The Interchange of the Pivot Value

The pivot value, 46, has now reached its correct position in the list. Now, you need to sort Sublist 1 and
Sublist 2.

13. Sort Sublist 1 and Sublist 2 by following the same procedure:


a. Select a pivot value.
b. Divide the sub list into two parts in a way that one part contains all elements less than or equal
to the pivot, and the other part contains all elements greater than the pivot.
c. Place the pivot at its correct position between the two parts of the list.
d. The preceding process will stop when there is a maximum of one element in each sub list. At
that point, the list will be completely sorted, as shown in the following figure.

The Sorted List

The quick sort algorithm recursively divides the list into two sub lists. Therefore, the algorithm for quick sort is
recursive in nature. The following algorithm depicts the logic of quick sort:

Algorithm: QuickSort(low, high)
// low is the index of the first element in the list and
// high is the index of the last element in the list

1. If (low > high):
    a. Return.
2. Set pivot = arr[low].
3. Set i = low + 1.
4. Set j = high.
5. Repeat step 6 until i > high or arr[i] > pivot.    // Search for an element
                                                      // greater than the pivot
6. Increment i by 1.
7. Repeat step 8 until j < low or arr[j] <= pivot.    // Search for an element
                                                      // smaller than or equal to the pivot
8. Decrement j by 1.
9. If i < j:    // If the greater element is on the left of the smaller element
    a. Swap arr[i] with arr[j].
10. If i <= j:
    a. Go to step 5.    // Continue the search
11. If low < j:
    a. Swap arr[low] with arr[j].    // Swap the pivot with the last element in
                                     // the first part of the list
12. QuickSort(low, j - 1).    // Apply quick sort to the sub list on the left of the pivot
13. QuickSort(j + 1, high).   // Apply quick sort to the sub list on the right of the pivot
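
The preceding algorithm can be translated into a short program. The following is a minimal C++ sketch of this logic; the array name, arr, its sample contents, and the use of the standard library swap function are illustrative assumptions and not part of the algorithm itself.

// Code in C++

#include <algorithm>
#include <iostream>
using namespace std;

void quickSort(int arr[], int low, int high)
{
    if (low >= high)                           // Zero or one element: nothing to sort
        return;
    int pivot = arr[low];                      // The first element is taken as the pivot
    int i = low + 1;
    int j = high;
    while (i <= j)
    {
        while (i <= high && arr[i] <= pivot)   // Search for an element greater than the pivot
            i++;
        while (j >= low && arr[j] > pivot)     // Search for an element smaller than or equal to the pivot
            j--;
        if (i < j)                             // Greater value lies to the left of the smaller value
            swap(arr[i], arr[j]);
    }
    if (low < j)
        swap(arr[low], arr[j]);                // Place the pivot at its correct position
    quickSort(arr, low, j - 1);                // Sort the sub list to the left of the pivot
    quickSort(arr, j + 1, high);               // Sort the sub list to the right of the pivot
}

int main()
{
    int arr[] = {28, 46, 33, 21, 14, 52, 39, 46};
    int n = sizeof(arr) / sizeof(arr[0]);
    quickSort(arr, 0, n - 1);
    for (int k = 0; k < n; k++)
        cout << arr[k] << " ";
    cout << endl;
    return 0;
}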

Determining the Efficiency of the Quick Sort Algorithm

The total time taken by this sorting algorithm depends on the position of the pivot value. Typically, the first
element is chosen as the pivot, but this choice leads to a worst case efficiency of O(n²).

The worst case occurs when the list is already sorted. In this case, the partitioning will always be unbalanced.
One of the two sub lists will always be empty and the other will contain all the elements.

In such a case, the first element requires n comparisons to recognize that it remains in the first position.

Similarly, the second element requires n - 1 comparisons to recognize that it remains in the second position.
Consequently, the total number of comparisons in this case is:

Number of comparisons = n + (n - 1) + (n - 2) + ... + 2 + 1

= n(n + 1)/2

= O(n²)



However, if you select the median of all values as the pivot, the efficiency will be O(n log n). This is because:

Partitioning the initial list places one element at its correct position and produces two sub lists of nearly
equal sizes.
Partitioning the two sub lists places two elements at their correct positions and produces four sub lists of
nearly equal sizes. This means that a total of three elements are placed at their correct positions in the
list.
Partitioning the four sub lists places four elements at their correct positions and produces eight sub lists
of nearly equal sizes. This means that a total of seven elements are placed at their correct positions in
the list.

This process continues until all the elements are placed at their correct positions. You can generalize the process
by saying that the partitioning in the kth step places a total of 2^k - 1 elements at their correct positions.

Suppose you require x partitions to sort the list completely. This means that after the xth partition, all n elements
are placed at their correct positions in the list. Therefore:

2^x - 1 = n

2^x = n + 1

x = log₂(n + 1) (by definition, if b^q = p, then log_b p = q)

This means that you require approximately log₂n reductions to sort a list of n elements. In each reduction, there
are a maximum of n comparisons. Therefore, the efficiency of quick sort is of the order, O(n log n).

Activity 2.3: Sorting Data by Using the Quick Sort Algorithm

Summary
In this chapter, you learned that:

Sorting is the process of arranging data in some predefined order or sequence. The order can be either
ascending or descending.
There are various sorting algorithms that are used to sort data. Some of these are:
Bubble sort
Insertion sort
Quick sort
To select an appropriate algorithm, you need to consider the following criteria in the suggested order:
Execution time
Storage space
Programming effort
Bubble sort is one of the simplest sorting algorithms. This algorithm has a quadratic order of growth
and is therefore suitable for sorting small lists only.
In bubble sort, there are n - 1 comparisons in Pass 1, n - 2 comparisons in Pass 2, and so on.



The insertion sort algorithm divides the list into two parts, sorted and unsorted.
Insertion sort performs a different number of comparisons depending on the initial ordering of elements.
When the elements are already in the sorted order, insertion sort needs to make few comparisons.
The quick sort algorithm recursively divides the list into two sub lists.
The worst case efficiency of the quick sort algorithm is O(n²).
The best case efficiency of the quick sort algorithm is O(n log n).

Reference Reading

Sorting Data
Reference Reading: Books
Data Structures and Program Design by Robert L. Kruse
Reference Reading: URLs
http://www.cs.auckland.ac.nz/~jmor159/PLDS210/sorting.html

Sorting Data by Using Bubble Sort
Reference Reading: Books
Data Structures and Program Design by Robert L. Kruse
Reference Reading: URLs
http://www.cprogramming.com/tutorial/computersciencetheory/sorting1.html
http://www.sorting-algorithms.com/bubble-sort

Sorting Data by Using Insertion Sort
Reference Reading: Books
Data Structures and Program Design by Robert L. Kruse
Reference Reading: URLs
http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/Sorting/insertionSort.htm
http://en.wikipedia.org/wiki/Insertion_sort

Sorting Data by Using Quick Sort
Reference Reading: Books
Data Structures and Program Design by Robert L. Kruse
Data Structures Using C and C++ by Aaron M. Tenenbaum
Reference Reading: URLs
http://www.cs.auckland.ac.nz/~jmor159/PLDS210/qsort.html
http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/Sorting/quickSort.htm
http://www.cse.iitk.ac.in/users/dsrkg/cs210/applets/sortingII/quickSort/quick.html


Chapter 3
Implementing Searching Algorithms

Information retrieval is one of the most important functions of computers. To retrieve any kind of information,
you need to search for it. There are different types of searching algorithms that can be used to search data. These
include linear search, binary search, and hashing.

This chapter covers the implementation of various searching algorithms and compares their efficiency.

Objectives

In this chapter, you will learn to:

Search data by using linear search technique


Search data by using binary search technique
Store and search data by using hashing

Performing Linear Search


Linear search is the simplest searching method that can be applied to a given collection of data. Given a list of
items and an item to be searched in the list, linear search will compare the item sequentially with the elements in
the list.

Because the elements in the list are compared sequentially with the item to be searched, this type of search is also
known as sequential search.

Implementing Linear Search

To understand the implementation of the linear search algorithm, consider an example where you need to search
the record of an employee, whose employee ID is 1420, from a list of employee records. Linear search will begin
by comparing the required employee ID with the first element in the list.

If the values do not match, the employee ID will be compared with the second element. Again, if the values do
not match, the employee ID will be compared with the third element. This process will continue until the desired
employee ID is found or the end of the list is reached.

The following algorithm depicts the logic to search an employee ID in an array by using linear search:

1. Read the employee ID to be searched .


2. Set i = 0.
3. Repeat step 4 until i = n or arr[i] = employee ID.
4. Increment i by 1.
5. If i = n:
Display Not Found.
Else
Display Found.
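
The following is a minimal C++ sketch of the preceding logic. The array of employee IDs and the ID being searched are illustrative assumptions.

// Code in C++

#include <iostream>
using namespace std;

int main()
{
    int arr[] = {1403, 1511, 1420, 1378, 1690};   // Sample employee IDs
    int n = sizeof(arr) / sizeof(arr[0]);
    int empID = 1420;                             // Employee ID to be searched

    int i = 0;
    while (i < n && arr[i] != empID)              // Compare the ID sequentially with each element
        i++;

    if (i == n)
        cout << "Not Found" << endl;
    else
        cout << "Found at index " << i << endl;
    return 0;
}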

Determining the Efficiency of Linear Search

The efficiency of a searching algorithm is determined by the running time of the algorithm. This running time is
proportional to the number of comparisons made for searching a record in a given list of records.

While performing linear search, if the desired record is found at the first position in the list, you will have to
make only one comparison. Therefore, the best case efficiency of linear search is O(1) .

However, if the desired record is stored at the last position in the list or does not exist in the list, you will have to
make n comparisons, where n is the number of records in the list. Therefore, the worst case efficiency of linear
search is O(n) .

The average number of comparisons for a linear search can be determined by finding the average of the number
of comparisons in the best and worst cases. This turns out to be (n+1)/2.

Suppose you have to search for an element in an array of size 15. In the best case, you will find the element in
only one comparison. However, in the worst case, you will either find the element or conclude that the element
does not exist in the list after 15 comparisons. Therefore, in an average case, the element will be found in
((15+1)/2), that is, 8 comparisons.

Activity 3.1: Performing Linear Search

Performing Binary Search


Linear search is a useful technique for searching small lists. However, it is an inefficient searching solution for
large lists. Suppose you have a list of 10,000 elements, and the item to be searched is the last item in the list. In
this case, you will have to make 10,000 comparisons to search the element. Therefore, linear search is not an
appropriate technique for searching large lists.

An alternate solution, which offers better efficiency for large lists, is the binary search algorithm. This searching
algorithm helps you to search data in fewer comparisons. To apply the binary search algorithm, you should ensure
that the list to be searched is sorted. If the list is not sorted, it needs to be sorted before binary search
can be applied to it.

Implementing Binary Search

Consider an example where you have to search the name, Steve, in a telephone directory that is sorted
alphabetically. In this case, you do not search the name sequentially. Instead, you open the telephone directory at
the middle to determine the half portion that contains the name.

You can open that half portion at the middle to determine the quarter of the directory that contains the name. You
repeat the process until the required name is found. In this way, each time, you reduce the number of pages to be
searched by half, and thus, find the required name quickly.

The binary search algorithm is based on the preceding approach for searching an item in a sorted list. Consider
another example. Suppose you have nine items in a sorted array, as shown in the following figure.

The Sorted Array

In the preceding list, you have to search the element, 13, by using the binary search algorithm. To search this
element, you need to perform the following sequence of steps:

1. Compare the element to be searched with the middlemost element of the list. You can determine the
index of the middlemost element with the help of the following formula:

Mid = (Lower bound + Upper bound)/2


Here, Lower bound (LB) is the index of the first element in the list and Upper bound (UB) is the index
of the last element in the list.

In the preceding list, LB is 0 and UB is 8.

Therefore,

Mid = (LB + UB)/2


Mid = (0 + 8)/2
Mid = 4
After determining the middlemost element, you need to check the following possible conditions:

Element to be searched = middle element: If this condition holds true, the desired element
is found at arr[mid] .
Element to be searched < middle element: If this condition holds true, you need to search
the item towards the left of the middle element.
Element to be searched > middle element: If this condition holds true, you need to search
the element towards the right of the middle element.
In the preceding example, the middle element is at the index, 4, as shown in the following figure.

The Middle Element



2. The element at the index, 4, is 25, which is greater than 13. Therefore, you will search the element
towards the left of the middle element, that is, in the list arr[0] to arr[3]. Here, LB is 0 and UB is
equal to mid - 1, that is, 3, as shown in the following figure.

The LB and UB Pointers

3. Again, determine the index of the middlemost element of the list arr[0] to arr[3].
Mid = (LB + UB)/2
Mid = (0 + 3)/2
Mid = 1
Therefore, the middle element will be at the index, 1, as shown in the following figure.

The Middle Element

The element at the index, 1, is 13, which is the desired element. Therefore, the element is found at the
index, 1, in the preceding list.

The following algorithm depicts the logic to search a desired element by using binary search:

1. Accept the element to be searched .


2. Set lowerbound = 0.
3. Set upperbound = n - 1.
4. Set mid = (lowerbound + upperbound)/2 .
5. If arr[mid] = desired element :
a. Display Found .
b. Go to step 10 .
6. If desired element < arr[mid] :
a. Set upperbound = mid - 1.
7. If desired element > arr[mid] :
a. Set lowerbound = mid + 1.
8. If lowerbound <= upperbound :
a. Go to step 4.
9. Display Not Found.
10. Exit .
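
The following is a minimal C++ sketch of the preceding algorithm. The sorted array corresponds to the nine-element example discussed earlier; its exact values are illustrative assumptions.

// Code in C++

#include <iostream>
using namespace std;

int main()
{
    int arr[] = {5, 13, 19, 21, 25, 30, 37, 41, 56};   // The list must be sorted
    int n = sizeof(arr) / sizeof(arr[0]);
    int desired = 13;                                  // Element to be searched

    int lowerbound = 0;
    int upperbound = n - 1;
    int position = -1;

    while (lowerbound <= upperbound)
    {
        int mid = (lowerbound + upperbound) / 2;
        if (arr[mid] == desired)            // Desired element found at arr[mid]
        {
            position = mid;
            break;
        }
        else if (desired < arr[mid])        // Search towards the left of the middle element
            upperbound = mid - 1;
        else                                // Search towards the right of the middle element
            lowerbound = mid + 1;
    }

    if (position == -1)
        cout << "Not Found" << endl;
    else
        cout << "Found at index " << position << endl;
    return 0;
}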



Determining the Efficiency of Binary Search

In the binary search algorithm, with every step, the search area is reduced to half. Therefore, it requires few
comparisons.

The best case for this algorithm will be the one where the element to be searched is present at the middlemost
position in the array. In this case, the desired element is found in just one comparison, and therefore, the
efficiency of the search process is O(1) .

However, the worst case will be the one where the desired element is not found in the array. In this case, the
process of dividing the list into sub lists continues until there is only one item left for the comparison.

After bisecting the list repeatedly, the following conditions hold for the worst case:

After the first bisection, the search space is reduced to n/2 elements, where n is the number of elements
in the original list.
After the second bisection, the search space is reduced to n/4, that is, n/2^2 elements.
After the ith bisection, the search space is reduced to n/2^i elements.

Suppose, after the ith bisection, the search space is reduced to one element. In this case:

n/2^i = 1

n = 2^i

i = log₂n (by definition, if a = 2^b, then b = log₂a)

This means that the list can be bisected a maximum of log₂n times. After each bisection, only one comparison is
made. Therefore, the total number of comparisons will be log₂n. This means that the worst case efficiency of
binary search is O(log n).



Activity 3.2: Performing Binary Search

Implementing Hashing
Of the two searching algorithms, linear search and binary search, binary search is the more efficient algorithm
for searching a desired element in a large list. However, the binary search algorithm also has some
disadvantages. The main disadvantage of the binary search algorithm is that it works only on sorted lists.

In addition, it requires a way to directly access the middle element of the list. If the list is stored as a linked list,
there is no way of accessing the middle element of the list. Therefore, binary search cannot be applied to a linked
list.

You will learn more about linked lists later in this book.

An alternate searching algorithm that overcomes these limitations and provides good efficiency is hashing. This
section explains the concept and implementation of hashing.

Defining Hashing



Suppose you have to search for the element corresponding to a given key value in a given list of elements. To
retrieve the desired element, you have to sequentially search through the elements until the element with the
desired key value is found. This method is time consuming, especially if the list is large.

In this case, an effective solution to search the element will be to calculate the offset address of the desired
element, and read the element at the resultant offset address.

The offset of an element is the relative address of the element from the beginning of the file.

Given the offset of an element, the element could easily be retrieved in a single disk access without wasting any
time in searching. For example, assume that the keys in a file are consecutive numbers starting from 0 to n - 1.
Then, given a key, the offset of the element corresponding to it can be easily calculated by the formula:

Offset = Key × Element length

However, in practical situations, keys have to be more meaningful than merely being consecutive integral
numbers. Fields like client codes, product codes, and even names are more likely to be used as keys. When such
fields are used as keys, a technique, called hashing, can help you to convert the key value to an offset address.

Hashing is one of the best methods of finding and retrieving information associated with a unique identifying
key. The fundamental principle of hashing is to convert a given key value to an offset address to retrieve an
element.

Conversion of a key to an address is done by a relation (formula), which is known as a hashing function. A
hashing function operates on a key to give its hash value, which is a number that represents the location
(position) at which the element can be found.

The following process is used to search an element by using hashing:

1. Given a key, the hash function converts it into a hash value (location) within the range of 1 to n, where
n is the size of the storage (address) space that has been allocated for the elements.
2. The element is then retrieved at the location generated.

The hash function is used to generate the location at which an element will be inserted. During retrieval,
the same hash function is used to find the location at which the element is stored. Alphanumeric keys are usually
converted into numeric keys before the hashing function can operate on them.

Limitations of Hashing

Although hashing is an efficient technique to search data, it also has some disadvantages. Suppose there are two
keys that generate the same hash values. In this case, the locations at which the elements corresponding to the
keys have to be stored would also be the same. Such a situation in which an attempt is made to store two keys at
the same position is known as collision. Consider the following hash function:

pos(key) = key % 4

By using the preceding hash function on the keys 3, 5, 8, and 10, the keys get scattered, as shown in the
following figure.

The Keys Mapped to their Addresses

This looks fine but a problem arises if the keys to be hashed are 3, 4, 8, and 10, as shown in the following figure.

The Keys Mapped to their Addresses

The keys, 4 and 8, hash to the same position and therefore result in a collision.

Another disadvantage of hashing is that the items in a hash table cannot be accessed sequentially. Hashing can
accelerate the process of searching significantly. However, certain operations that take place after hashing
may involve sequential access to the data items, and such operations may not be very efficient.

A hash table is a table where the data to be searched is stored.

The occurrence of a collision can be minimized by using a good hash function. Consider the example of
calculating the address of an element by applying the modulus operation on each key. This involves dividing the
key value by the size of the hash table to obtain the remainder of the division. The remainder is considered as the
address of the element corresponding to the key value.

When you use this method for implementing hashing, you should ensure that the size of the table is not a power
of two, else there will be more chances of collisions. You can minimize the collision of addresses by keeping the
size of the hash table equal to a prime number. Consider the following keys:

36475611, 47566933, 75669353, 34547579, 46483499

Assuming that the size of the table is 43, the addresses of the preceding keys will be calculated as:

36475611 mod 43 = 1

47566933 mod 43 = 32

75669353 mod 43 = 17



34547579 mod 43 = 3

46483499 mod 43 = 26
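
The following is a minimal C++ sketch that reproduces the preceding calculations. The prime table size, 43, and the sample keys are taken from the text; the function name, hashAddress, is an illustrative assumption.

// Code in C++

#include <iostream>
using namespace std;

int hashAddress(long key, int tableSize)
{
    return key % tableSize;                    // The remainder is used as the address
}

int main()
{
    long keys[] = {36475611, 47566933, 75669353, 34547579, 46483499};
    int tableSize = 43;                        // A prime table size helps minimize collisions

    for (long key : keys)
        cout << key << " mod " << tableSize << " = " << hashAddress(key, tableSize) << endl;
    return 0;
}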

Resolving Collision

Two elements cannot occupy the same position. Therefore, a collision situation needs to be detected and resolved. The
following methods can be used for collision processing:

Chaining
Open addressing

Chaining
In this method, you use links/pointers to resolve hash clashes. The following chaining techniques are used for
resolving a collision:

Coalesced chaining
Separate chaining

Coalesced Chaining
In this method, the storage area is divided into two parts, the prime area and the overflow area, as shown in the
following figure.

The Division of Storage Areas in a File



Initially, the hashing function is used to translate keys into addresses in the prime area. The first key value, which
is hashed, is then stored at a particular hash address in the prime area. All subsequent keys with the same hash
address result in a collision and are placed in the overflow area.

You need to link the two entries that have the same hash address. It is because of this fact that each such entry
requires a link field in addition to the key and data field. If a node is not linked to any other node, the link field is
given a NULL value indicating that there are no further key entries having the same hash value.

Consider the following hashing function:

hash(key) = key % 7

The following table displays the method of storing the keys, 22, 31, 67, 36, 29, and 60, by using coalesced
chaining.

A File Using Coalesced Chaining

In this way, whenever a collision occurs, an entry is added in the overflow area.

Separate Chaining
Coalesced chaining is an effective method to process collisions. However, it has one disadvantage. In this
method, the size of the hash table is fixed in advance. If the number of elements grows beyond the number of
available positions, the elements cannot be inserted without allocating more space.

To overcome this disadvantage, the separate chaining method can be used. In the separate chaining method, the
hash table is implemented in such a way that each slot in the hash table contains the header node of a linked list.
This means that each slot in the hash table contains the address of the first node of a distinct linked list. All
elements that hash to a particular slot in the hash table are stored in the linked list corresponding to that slot. This
means that each linked list is a list of elements whose keys have the same hash values.

Consider an example where the following hash function is used:

key mod 10

The preceding function will always generate values between 0 and 9. Therefore, a hash table of size 10 can be
used. Suppose the elements with the following keys need to be stored:

80, 92, 25, 76, 48, 150, 215, 216, and 38

The elements corresponding to the preceding keys can be stored as an array of linked lists, as shown in the
following figure.

An Array of Linked Lists

Each node in the linked lists contains INFO, as well as a pointer, which stores the address of the next node in the
list. INFO contains the key, K, and the Element, E.

When a record has to be retrieved, the hashing function converts the given key to yield a position (subscript) in
the array (hash table). The linked list that initiates at that position is then searched to retrieve the desired element.
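
The following is a minimal C++ sketch of separate chaining for the preceding example. It uses the hash function, key mod 10, and the sample keys from the text; the standard library list is used in place of a hand-written linked list purely for brevity.

// Code in C++

#include <iostream>
#include <list>
using namespace std;

const int TABLE_SIZE = 10;
list<int> hashTable[TABLE_SIZE];               // Each slot heads a distinct linked list

int hashFunction(int key)
{
    return key % TABLE_SIZE;
}

void insertKey(int key)
{
    hashTable[hashFunction(key)].push_back(key);   // Append the key to its slot's list
}

bool searchKey(int key)
{
    for (int k : hashTable[hashFunction(key)])     // Only the list for this slot is searched
        if (k == key)
            return true;
    return false;
}

int main()
{
    int keys[] = {80, 92, 25, 76, 48, 150, 215, 216, 38};
    for (int key : keys)
        insertKey(key);

    cout << "215 found: " << (searchKey(215) ? "Yes" : "No") << endl;
    cout << "100 found: " << (searchKey(100) ? "Yes" : "No") << endl;
    return 0;
}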

Open Addressing
In this method, elements that produce a collision are stored at an alternate position in the hash table. An alternate
location is obtained by searching the hash table until an unused position is found. This process is called probing.
The following probing sequence can be used to search an empty position in the hash table:

Linear probing
Quadratic probing



Double hashing

Linear Probing
In this method, whenever there is a collision, the record is stored at the next empty position in the hash table. The
hash table is considered to be a circular array so that after the last location, the search proceeds from the first
location of the table.

Although linear probing is a simple technique to resolve collisions, it has a disadvantage associated with it. When
the table becomes about half-full, there is a tendency towards clustering. This means that elements start appearing
in long strings of consecutive cells with gaps between the strings. Therefore, the sequential search for an empty
position becomes time consuming.

Quadratic Probing
This technique overcomes the problem of clustering by implementing an alternate probing sequence. For
example, if there is a collision at the hash address, i, then, in case of linear probing, the sequence of search for
an empty location is given by:

i + 1, i + 2, i + 3, ...

However, in case of quadratic probing, the sequence of search is given by:

i + 1, i + 4, i + 9, ...

Increasing the distance between the search locations decreases the problem of clustering.
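
The following is a minimal C++ sketch that contrasts the two probe sequences described above. The hash address at which the collision occurs (i = 5) and the table size (11) are illustrative assumptions; positions wrap around because the hash table is treated as a circular array.

// Code in C++

#include <iostream>
using namespace std;

int main()
{
    const int TABLE_SIZE = 11;
    int i = 5;                                       // Hash address where the collision occurred

    cout << "Linear probing   : ";
    for (int k = 1; k <= 5; k++)
        cout << (i + k) % TABLE_SIZE << " ";         // i + 1, i + 2, i + 3, ...
    cout << endl;

    cout << "Quadratic probing: ";
    for (int k = 1; k <= 5; k++)
        cout << (i + k * k) % TABLE_SIZE << " ";     // i + 1, i + 4, i + 9, ...
    cout << endl;
    return 0;
}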

Double Hashing
In this method, whenever there is a collision, a second hash function is applied to obtain an alternate position.
Keys, which collide at the first probe, are likely to have different values for the second hash function.



Determining the Efficiency of Hashing

Search becomes faster by using hashing as compared to any other searching method. This is because in hashing,
you can ideally access the desired element in just one comparison. Therefore, the efficiency of hashing is ideally
O(1) .

However, collisions reduce the efficiency of hashing. The efficiency of hashing in this case
depends on the quality of the hash function.

A hash function is considered good if it results in uniform distribution of elements in the hash table. On the other
hand, a poor hash function results in a lot of collisions. For example, if the hash function always returns the
value, 1, for all keys, then it is obvious that the associated hash table acts just like a linked list. The efficiency of
search in this case would be O(n) .

Summary
In this chapter, you learned that:

The best case efficiency of linear search is O(1), and the worst case efficiency of linear search is
O(n).
To apply the binary search algorithm, you should ensure that the list to be searched is sorted.
The best case efficiency of binary search is O(1), and the worst case efficiency of binary search is
O(log n).
The fundamental principle of hashing is to convert a given key value to an offset address to retrieve a
record.
In hashing, conversion of a key to an address is done by a relation (formula), which is known as a
hashing function.
The situation in which the hash function generates the same hash value for two or more keys is called
collision.
There are several methods of collision processing. These are:
Chaining
Open addressing

Reference Reading

Performing Linear Search
Reference Reading: Books
Data Structures and Program Design by Robert L. Kruse
Data Structures Using C and C++ by Aaron M. Tenenbaum
Reference Reading: URLs
http://en.wikipedia.org/wiki/Linear_search
http://en.wikipedia.org/wiki/Search_algorithm

Performing Binary Search
Reference Reading: Books
Data Structures Using C and C++ by Aaron M. Tenenbaum
Reference Reading: URLs
http://en.wikipedia.org/wiki/Binary_search
http://video.franklin.edu/Franklin/Math/170/common/mod01/binarySearchAlg.html

Implementing Hashing
Reference Reading: Books
An Introduction to Data Structures with Applications by Jean-Paul Tremblay and Paul G. Sorenson
Reference Reading: URLs
http://en.wikipedia.org/wiki/Hashing_algorithm
http://www.cs.princeton.edu/~rs/AlgsDS07/10Hashing.pdf



Chapter 4
Solving Programming Problems Using Linked Lists

A list is a set of items organized sequentially. Lists are commonly implemented in programs by using arrays.
However, there are certain limitations associated with the use of arrays in programs. You can overcome these
limitations by implementing a list as a linked list.

This chapter discusses the basic concepts of a linked list. It explains how linked lists overcome the limitations
imposed by arrays.

Objectives

In this chapter, you will learn to:

Identify the features of linked lists


Implement a singly-linked list
Implement a doubly-linked list

Introduction to Linked Lists


Consider a scenario where you have to write a program to generate and store all the prime numbers between one
and a million, and then display them. If you use an array to store the prime numbers, the size of the array needs
to be declared in advance. However, the number of prime numbers between 1 and 10,00,000 is not known in
advance. Therefore, to store all the prime numbers, you need to declare an arbitrarily large size array. In the
worst case, you may need to declare an array of the size, 10,00,000.

For instance, you declare an array of the size, n. Now, if the number of prime numbers between 1 and 10,00,000
is more than n, all the prime numbers cannot be stored. Similarly, if the number of prime numbers is less than n,
a lot of memory space will be wasted.

Therefore, you cannot use an array to store the numbers. What can you do in such a situation?

To solve such problems, you can use a dynamic data structure that does not require you to specify the size in
advance and allows memory to be allocated whenever required. An example of such a data structure is a linked
list.

Linked lists are flexible data structures that provide a convenient way to store data. You do not have to specify
the size of the list in advance. Memory is allocated dynamically whenever required. Linked lists are useful in
operations where frequent manipulation (insertion and deletion) of data is required. There are various types of
linked lists. Each has a unique feature. The choice of a particular type of linked list is based on the problem.

Dynamic Memory Allocation

Dynamic memory allocation refers to the process of allocating memory on the basis of need at runtime. It is
useful in cases where you do not know the amount of data to be stored in advance.

Consider a scenario where you need to write a program to store the first 10 prime numbers. In this case, the
number of prime numbers to be stored is known in advance. Therefore, you can declare an array and specify its
size as 10, as shown in the following code snippet:

// Code in C#

int [] prime = new int[10];

// Code in C++

int prime[10];

In the preceding code snippet, when the code is compiled, memory space is allocated to store 10 integer values.
This is called static memory allocation because the memory allocated to the array is fixed at the time of
compilation. It cannot increase or decrease at runtime.

The memory representation for the integer array is shown in the following figure.

The Static Memory Allocation

In the preceding figure, you can see that one contiguous block of memory is allocated for the array. You can
access the elements of an array by referring to their memory locations. If you know the address of the first
element of an array, you can easily calculate the addresses of the rest of the elements. The address of the first
element of the array (also known as the base address) is internally stored in the array variable. You can
calculate the address of any other element by using the following formula:

Address of the first element + (size of the element × index of the element)
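
For example, if the base address of an integer array is 1000 and each integer occupies 4 bytes, the element at the index, 3, is stored at the address 1000 + (4 × 3) = 1012. (The base address and element size used here are only illustrative.)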

Now, consider the scenario in which you are required to store all the prime numbers between 1 and 10,00,000. In
this case, the number of prime numbers to be stored is not known in advance. Therefore, you need an alternate
data structure that allows you to allocate memory at runtime, so that whenever a prime number is encountered,
memory is allocated for that prime number at runtime. When memory is allocated in this manner, the various
chunks of memory may not be contiguous.

They are spread randomly in the memory, as shown in the following figure.



The Dynamic Memory Allocation

In this case, if you know the address of the first element, you cannot calculate the address of the rest of the
elements. This is because all the elements are stored at random locations in the memory.

To solve this problem, each allocated memory block is divided into two halves. The first half holds the data and
the second half holds the address of the next block in the sequence. This gives a linked structure to the blocks of
memory where each block is linked to the next block in the sequence, as shown in the following figure.

The Linked Representation of Dynamically Allocated Memory

An example of a data structure that implements this concept is a linked list.

Defining Linked Lists

A linked list is a chain of elements in which each element consists of data, as well as a link to the next element.
The link stores the address of the next logically similar element in the list. Each such element of a linked list is
called a node, as shown in the following figure.



The Structure of a Node

Through the address field, one node logically references another node of the same type in the linked list. Due to
this property, a linked list is called a self-referential data structure.

The structure of a linked list is shown in the following figure.

The Structure of a Linked List

Each node in a linked list contains the address of the next node in the list. However, there is no node that contains
the address of the first node. To keep track of the first node of the list, a variable is used that stores the address of
the first node of the list. In the preceding example, a variable named START is used to store the address of the first
node of the list. When the list does not contain any node, START is set to the value, NULL .

The last node does not need to point to any other node. Therefore, the content of the address field of the last
node is set to NULL , so that the end of the list can be identified.

A linked list with no nodes is termed as an empty list.

Identifying Different Types of Linked Lists

Based on ways the various nodes are connected to each other, linked lists can be of the following types:

Singly-linked list: It is the simplest type of linked list where each node points to the next node. The
last node does not point to any other node in the list. Therefore, it points to NULL . This means that a
node pointing to NULL refers to the end of the list. The structure of a singly-linked list is shown in the
following figure.

The Singly-linked List

Doubly-linked list: In this type of linked list, each node contains a reference to the next node, as well
as the previous node. Therefore, in a doubly-linked list, it is possible to traverse in the reverse direction
also, which is not possible in a singly-linked list. The structure of a doubly-linked list is shown in the
following figure.

The Doubly-linked List

Circular-linked list: It is similar to a singly-linked list where each node points to the next node in the
list. The difference lies with the last node where the last node points to the first node instead of pointing
to NULL . Therefore, a circular-linked list has neither a beginning nor an end. The structure of the
circular-linked list is shown in the following figure.

The Circular-linked List

In a circular-linked list, usually a variable is used to store the address of the last node of the list. The
address of the first node of the list can be obtained from the last node of the list as the last node points
to the first node.

A doubly-linked list in which the last node contains the address of the first node and the first node
contains the address of the last node is called a doubly circular-linked list.



Implementing a Singly-linked List
A singly-linked list is a simple data structure that acts as a base for the various other data structures. It provides a
convenient way to organize and represent data for efficient manipulation in computer applications.

The various operations implemented on a linked list are insert, delete, traverse, and search.

Representing a Singly-linked List

A singly-linked list is represented in a program by defining the following classes:

A class that represents a node in a linked list: A node is the basic building block in a linked list. To
implement a linked list in a computer program, you can create a class named Node that represents a
node in a linked list. This class contains the data members of varying data types, which represent the
data to be stored in the linked list. In addition to this, it also contains the reference of the class type,
Node , to hold the reference of the next node in the sequence. Consider the following declaration of the
Node class in C# and C++:
// Code in C#

class Node
{
    public int info;
    public Node next;    // Variable containing the address of the
                         // next node in the sequence
}

// Code in C++

class Node
{
public:
    int info;
    Node *next;          // Pointer to the next node in the sequence
};

In C++, you need a special type of variable called a pointer to hold the address of an object. The
pointer is declared as:
<Type> *<variable_name>;
Here, the pointer variable, <variable_name>, can hold the address of an object of the type,
<Type>.
In C#, you do not need to use a pointer to store the address of an object. This is because the object name
implicitly refers to the address of the object.

In the preceding Node class declaration, the node contains only one data element, that is, an integer.
However, a node can also contain multiple data elements.

For example, to store the details of the students in a class, you can declare the Node class that contains
the details of a student, such as name, roll number, and marks. Consider the following declaration of a
class named Node in C# and C++:

// Code in C#

class Node
{
    public int roll_no;
    public string name;
    public float marks;
    public Node next;
}

// Code in C++

class Node
{
public:
    int roll_no;
    char name[50];
    float marks;
    Node *next;
};

The class that represents a node in a linked list may not necessarily be named as Node. You can
assign any name to such a class.

A class that represents a linked list: This class consists of a set of operations, which are implemented
on a linked list. These operations are insertion, deletion, search, and traversal. It also contains the
declaration of the variable/pointer, START , which always points to the first node in the list. When the list
is empty, START points to NULL . Consider the following C# and C++ declarations of the class named
List that implement the various operations on a linked list:
// Code in C#

class List
{
    private Node START;
    public List()
    {
        START = null;
    }
    public void addNode(int element)
    {
        /* statements */
    }
    public bool search(int element, ref Node previous, ref Node current)
    {
        /* statements */
    }
    public bool delNode(int element)
    {
        /* statements */
    }
    public void traverse()
    {
        /* statements */
    }
}

// Code in C++

class List
{
    Node *START;
public:
    List()
    {
        START = NULL;
    }
    void addNode(int element)
    {
        /* statements */
    }
    bool search(int element, Node *previous, Node *current)
    {
        /* statements */
    }
    bool delNode(int element)
    {
        /* statements */
    }
    void traverse()
    {
        /* statements */
    }
};

The class that represents a linked list may not necessarily be named as List. You can assign any
name to such a class.

Traversing a Singly-linked List

Traversing a singly-linked list refers to the process of visiting each node of the list starting from the beginning. A
singly-linked list allows traversal in one direction only.



The following algorithm depicts the logic of traversing a singly-linked list:

1. Make currentNode point to the first node in the list .


2. Repeat the steps, 3 and 4, until currentNode becomes NULL .
3. Display the information contained in the node marked as currentNode.
4. Make currentNode point to the next node in the sequence .
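
The following is a minimal C++ sketch of this traversal, assuming a Node class like the one declared earlier. The three nodes created in main() are illustrative data.

// Code in C++

#include <iostream>
using namespace std;

class Node
{
public:
    int info;
    Node *next;
};

void traverse(Node *START)
{
    Node *currentNode = START;              // Point to the first node in the list
    while (currentNode != NULL)
    {
        cout << currentNode->info << " ";   // Display the information in the current node
        currentNode = currentNode->next;    // Move to the next node in the sequence
    }
    cout << endl;
}

int main()
{
    Node third = {30, NULL};
    Node second = {20, &third};
    Node first = {10, &second};

    traverse(&first);                       // Output: 10 20 30
    return 0;
}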

Inserting a Node in a Singly-linked List

Insertion in a singly-linked list refers to the process of adding a new node in the list or creating a new linked list
if it does not exist. Therefore, to insert a node in a linked list, you need to first check whether the list is empty or
not.

If the linked list is empty, the following algorithm depicts the logic to insert a node in the linked list:

1. Allocate memory for the new node .


2. Assign a value to the data field of the new node .
3. Make the next field of the new node point to NULL .
4. Make START point to the new node .

The process of inserting a node in an empty list is illustrated in the following table.

Operation Illustration

Allocate memory for a new node.
Assign the value to the data field of the new node.
Make the next field of the new node point to NULL.
Make START point to the new node.

The Insert Operation in an Empty Singly-linked List

If the linked list is not empty, you may need to insert a node at any of the following positions in the list:

Beginning of the list



End of the list
Between two nodes in the list

The place where you insert an element in the list would depend on the problem at hand. For example, if you have
to write a program to generate and store a list of prime numbers between 1 and 10,00,000, and then display them
in the same order in which they were generated, you need to insert all the nodes at the end of the list.

However, if you have to display the prime numbers in the reverse order, you need to insert all the nodes at the
beginning of the list.

Again, consider that you are given a list of student records that needs to be stored in the ascending order of
marks. In this case, you may need to insert a new record at any position in the list, including the beginning of the
list, end of the list, or between any two nodes in the list.

Let us now write algorithms for inserting a node at the various positions in a linked list.

Inserting a Node at the Beginning of the List


The following algorithm depicts the logic to insert a node at the beginning of the linked list:

1. Allocate memory for the new node .


2. Assign a value to the data field of the new node .
3. Make the next field of the new node point to START (that is the first node in the
list) .
4. Make START point to the new node .

The sequence in which the steps of an algorithm are executed is important. For example, the preceding
algorithm will not work correctly if the sequence of the steps, 3 and 4, is reversed. This is because if you execute the
step, 4, first, START will point to the new node and the address of the first node in the linked list will be lost.
Therefore, you will not be able to link the new node to the first node in the linked list.

The process of inserting a node at the beginning of a list is illustrated in the following table.

Operation Illustration

Allocate memory and assign the value to the data field of the new node.
Make the next field of the new node point to the first node in the list.
Make START point to the new node.

The Insertion of a Node at the Beginning of a Singly-linked List
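
The following is a minimal C++ sketch of this insertion, assuming a Node class like the one declared earlier and a global START pointer. The values inserted in main() are illustrative data.

// Code in C++

#include <iostream>
using namespace std;

class Node
{
public:
    int info;
    Node *next;
};

Node *START = NULL;                 // NULL when the list is empty

void insertAtBeginning(int element)
{
    Node *newNode = new Node;       // Allocate memory for the new node
    newNode->info = element;        // Assign a value to the data field
    newNode->next = START;          // Make the new node point to the first node
    START = newNode;                // Make START point to the new node
}

int main()
{
    insertAtBeginning(30);
    insertAtBeginning(20);
    insertAtBeginning(10);          // The list is now: 10 -> 20 -> 30

    for (Node *current = START; current != NULL; current = current->next)
        cout << current->info << " ";
    cout << endl;
    return 0;
}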

Inserting a Node at the End of the List


The following algorithm depicts the logic to insert a node at the end of the linked list:

1. Allocate memory for the new node .


2. Assign a value to the data field of the new node .
3. If START is NULL, then:
a. Make START point to the new node .
b. Go to the step, 6.
4. Locate the last node in the list, and mark it as currentNode. To locate the last
node in the list, execute the following steps:
a. Mark the first node as currentNode.
b. Repeat the step, c, until the successor of currentNode becomes NULL .
c. Make currentNode point to the next node in the sequence .
5. Make the next field of currentNode point to the new node .
6. Make the next field of the new node point to NULL .

Successor of a node in a linked list refers to the next node in the sequence of the node in the linked list.

The process of inserting a node at the end of a list is illustrated in the following table.

Operation Illustration

Allocate memory and assign the value to the data field of the new node.
Locate the last node in the list, and mark it as currentNode.
Make the next field of currentNode point to the new node.
Make the next field of the new node point to NULL.

The Insertion of a Node at the End of a Singly-linked List

From the preceding algorithm, it is clear that inserting a node at the end of the linked list requires you to traverse
to the last node of the linked list. For instance, suppose you have to solve a problem that always requires you to insert
data at the end of the linked list. In such a case, whenever you have to insert a node at the end of the linked list,
you have to traverse to the last node. If the list is long, this can be time consuming.

In such a case, it would be useful to have the variable/pointer, LAST , which always contains the address of the
last node in the list.

This is shown in the following figure.

The Singly-linked List with LAST Variable/Pointer



The following algorithm is the modified algorithm that depicts the logic to insert a node at the end of the list:

1. Allocate memory for the new node .


2. Assign a value to the data field of the new node .
3. If START is NULL (if the list is empty), then:
a. Make START point to the new node .
b. Make LAST point to the new node .
c. Go to the step, 6.
4. Make the next field of LAST point to the new node .
5. Mark the new node as LAST .
6. Make the next field of the new node point to NULL .
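
The following is a minimal C++ sketch of the modified algorithm, assuming a Node class like the one declared earlier and global START and LAST pointers. The values inserted in main() are illustrative data.

// Code in C++

#include <iostream>
using namespace std;

class Node
{
public:
    int info;
    Node *next;
};

Node *START = NULL;                 // First node of the list
Node *LAST = NULL;                  // Last node of the list

void insertAtEnd(int element)
{
    Node *newNode = new Node;       // Allocate memory for the new node
    newNode->info = element;        // Assign a value to the data field
    newNode->next = NULL;           // The new node will be the last node

    if (START == NULL)              // The list is empty
    {
        START = newNode;
        LAST = newNode;
        return;
    }
    LAST->next = newNode;           // Make the next field of LAST point to the new node
    LAST = newNode;                 // Mark the new node as LAST
}

int main()
{
    insertAtEnd(10);
    insertAtEnd(20);
    insertAtEnd(30);                // The list is now: 10 -> 20 -> 30

    for (Node *current = START; current != NULL; current = current->next)
        cout << current->info << " ";
    cout << endl;
    return 0;
}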

Inserting a Node Between Two Nodes in the List


Consider that you have to store a set of student records in the increasing order of marks. In such a situation, you
may need to insert the new node at any position in the linked list. You have already seen how to insert a node at
the beginning and at the end of a linked list. Now, you will see how a node can be inserted between two nodes in
an ordered linked list.

The following algorithm depicts the logic for inserting a node between two nodes in an ordered linked list:

1. Allocate memory for the new node .


2. Assign a value to the data field of the new node .
3. If START is NULL (if the list is empty), then:
a. Make the next field of the new node point to NULL .
b. Make START point to the new node .
c. Exit .
4. Identify the nodes between which the new node is to be inserted. Mark them as
previous and current. To locate previous and current, execute the following steps:
a. Make current point to the first node .
b. Make previous point to NULL .
c. Repeat the steps, d and e, until current.info becomes greater than
newnode.info or current becomes equal to NULL .
d. Make previous point to current.
e. Make current point to the next node in the sequence .
5. Make the next field of the new node point to current .
6. If previous is not NULL:
a. Make the next field of previous point to the new node .
b. Exit .
7. Make START point to the new node .

Consider the linked list, as shown in the following figure.



The Singly-linked List Before Insertion of a Node

The process of inserting the node, 16, in the preceding list is illustrated in the following table.

Operation Illustration

Allocate memory and assign the value to the data field of the new node.
Identify the nodes between which the new node is to be inserted. Mark them as previous and current.
Make the next field of the new node point to current.
Make the next field of previous point to the new node.

The Insertion of a Node Between Two Nodes in a Singly-linked List

Before inserting a node between two nodes in a linked list, you need to implement a search operation to
place the previous and current pointers on the nodes between which the new node is to be inserted.
After the search operation, if the current is found to have a NULL value, it means that the new node is to be
inserted at the end of the list. The preceding algorithm will work in that case as well. This means that the
algorithm will insert the node at the end of the linked list. Therefore, if you have created a function for inserting a
node between two nodes in a list, you do not need to create a separate function for inserting a node at the end of
the list.
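
The following is a minimal C++ sketch of this insertion into an ordered list. It condenses the preceding algorithm slightly (the empty-list case is handled through the previous pointer), and it assumes a Node class like the one declared earlier; the values inserted in main() are illustrative data.

// Code in C++

#include <iostream>
using namespace std;

class Node
{
public:
    int info;
    Node *next;
};

Node *START = NULL;

void insertInOrder(int element)
{
    Node *newNode = new Node;                   // Allocate memory and assign the value
    newNode->info = element;

    Node *previous = NULL;                      // Locate the nodes between which the
    Node *current = START;                      // new node is to be inserted
    while (current != NULL && current->info <= element)
    {
        previous = current;
        current = current->next;
    }

    newNode->next = current;                    // Make the new node point to current
    if (previous != NULL)
        previous->next = newNode;               // Make previous point to the new node
    else
        START = newNode;                        // The new node becomes the first node
}

int main()
{
    insertInOrder(10);
    insertInOrder(30);
    insertInOrder(16);                          // Inserted between 10 and 30

    for (Node *current = START; current != NULL; current = current->next)
        cout << current->info << " ";           // Output: 10 16 30
    cout << endl;
    return 0;
}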



Deleting a Node from a Singly-linked List

Deletion of a node refers to the process of deleting a node from the linked list. Before implementing a delete
operation, you need to check whether the list is empty or not. If the list is empty, an error message needs to be
shown.

If the list is not empty, you need to first search the node to be deleted. If the specified node is not found in the
list, an error message needs to be shown.

You can delete a node from one of the following places in a linked list:

Beginning of the list


Between two nodes in the list
End of the list

Deleting a Node from the Beginning of the List


The following algorithm depicts the logic to delete a node from the beginning of a list:

1. Mark the first node in the list as current.


2. Make START point to the next node in the sequence .
3. Release the memory for the node marked as current.
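
A minimal C++ sketch of these steps is given below. As in the previous sketch, the Node class and the START pointer are assumed, and the empty-list check reflects the earlier note about showing an error message in that case; the message text is only illustrative.

// Code in C++ (illustrative sketch)
#include <iostream>
using namespace std;

class Node
{
public:
    int info;
    Node *next;
};

void deleteFromBeginning(Node *&START)
{
    if (START == NULL)                       // The list is empty: show an error message
    {
        cout << "List is empty" << endl;
        return;
    }
    Node *current = START;                   // 1. Mark the first node as current
    START = START->next;                     // 2. Make START point to the next node
    delete current;                          // 3. Release the memory of current
}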

The process of deleting a node from the beginning of a list involves the following operations:

1. Mark the first node in the list as current.
2. Make START point to the next node in the sequence.
3. Release the memory for the node marked as current.

The Deletion of a Node from the Beginning of a Singly-linked List

Deleting a Node Between Two Nodes in the List


To delete a node between two nodes in the list, you first need to search the node to be deleted. The following
algorithm depicts the logic to delete a node between two nodes in the list:

1. Locate the node to be deleted. Mark the node to be deleted as current and its
predecessor as previous. To locate current and previous, execute the following
steps:
a. Set previous = START .
b. Set current = START .
c. Repeat the steps, d and e, until the value of current matches the value to be
deleted or current becomes NULL .
d. Make previous point to current.
e. Make current point to the next node in the sequence .
2. If current is NULL:
a. Display Value not found in list .
b. Exit .
3. If current points to the first node of the list:
a. Make START point to the next node in the sequence .
b. Go to step 5.
4. Make the next field of previous point to the successor of current .
5. Release the memory for the node marked as current .
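
A C++ sketch of the preceding deletion logic is given below. As before, the Node class and the START pointer are assumed, and the function name, delNode, is illustrative.

// Code in C++ (illustrative sketch)
#include <iostream>
using namespace std;

class Node
{
public:
    int info;
    Node *next;
};

void delNode(Node *&START, int element)
{
    Node *previous = START;                  // 1a. Set previous = START
    Node *current = START;                   // 1b. Set current = START
    while (current != NULL && current->info != element)
    {
        previous = current;                  // 1d. Advance previous
        current = current->next;             // 1e. Advance current
    }

    if (current == NULL)                     // 2. The value was not found
    {
        cout << "Value not found in list" << endl;
        return;
    }

    if (current == START)                    // 3. The first node is being deleted
        START = START->next;
    else
        previous->next = current->next;      // 4. Bypass current

    delete current;                          // 5. Release the memory of current
}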

The process of deleting a node between two nodes in a list involves the following operations:

1. Locate the node to be deleted. Mark the node to be deleted as current and its predecessor as previous.
2. Make the next field of previous point to the successor of current.
3. Release the memory for the node marked as current.

The Deletion of a Node Between Two Nodes in a Singly-linked List

Before deleting a node between two nodes in the list, you need to perform a search operation to place the pointer, current, on the node to be deleted, and the pointer, previous, on the node preceding current. After the search operation, if current points to the last node in the list, the node to be deleted is the last node in the list. The preceding algorithm works in that case as well. Therefore, you do not need to create a separate algorithm for deleting a node from the end of the linked list.

The delete operation on a node differs slightly between C++ and C#.


In C++, the deletion of a node involves the following phases:

Logical deletion: It refers to the process of removing the node from the linked list in such a manner that the node to be deleted is not pointed to by any other node in the list. However, at this stage, the memory occupied by the node is not released.
Physical deletion: It refers to the process in which the memory occupied by the node is released.

In C#, there is no need to explicitly deallocate the memory occupied by the node. The garbage collector in C# frees the programmer from the task of explicitly releasing the memory of a node. The garbage collector periodically checks the objects that are being used by the application and automatically releases the memory of all those objects that are no longer referenced by any other object.

Activity 4.1: Implementing a Singly-linked List

Implementing a Doubly-linked List


Consider that you need to implement a singly-linked list to store the marks of students in the ascending order. To
display the marks of students in the ascending order, you can simply traverse the list starting from the first node.

Now, consider another case where you need to display these marks in the descending order. This problem could
be easily solved if you could traverse the list in the reverse direction. However, as each node in a singly-linked
list contains the address of the next node in the sequence, traversal is possible in the forward direction only.

To solve this problem, each node in a linked list can be made to hold the reference of the preceding node, in addition to its next node in the sequence. Such a linked list is known as a doubly-linked list. In a doubly-linked list, each node contains the address of its next node, as well as that of its previous node. This allows the flexibility to traverse the list in both directions.

The various operations in a doubly-linked list include insertion, deletion, search, and traversal.

Representing a Doubly-linked List

A doubly-linked list in a program can be represented by declaring the following classes:

A class that represents a node in a doubly-linked list: In a doubly-linked list, each node needs to
store:
Information
The address of the next node in the sequence
The address of the previous node
In order to represent each node of a doubly-linked list, you can create the class named Node that
contains data members of varying data types to store information associated with the node. In addition,
it contains the two variables/pointers, next and prev , to hold the address of the next node and the
previous node, respectively. The first node in the list contains NULL in its prev variable/pointer and the
address of the next node in its next variable/pointer. Similarly, the last node in the list contains NULL in
its next variable/pointer and the address of the previous node in its prev variable/pointer.
Refer to the following figure for the structure of a node in a doubly-linked list.

The Structure of a Node in a Doubly-linked List

Consider the following declaration of the class named Node in C# and C++:

// Code in C#

class Node
{
public int info;
public Node next;
public Node prev;
}
// Code in C++

class Node
{
public:
int info;
Node * next;
Node * prev;
};

The class that represents a node in a doubly-linked list may not necessarily be named as Node.
You can assign any name to such a class.

A class that represents a doubly-linked list: This class consists of a set of operations implemented on
a linked list. In addition, it declares the variable/pointer, START , which will always point to the first node
in the list. Consider the following declaration of the class named DoubleLinkedList in C# and C++:
// Code in C#

class DoubleLinkedList
{
Node START;
DoubleLinkedList(){}
public void addNode(int element){}
public bool search(int element, ref Node previous, ref Node current){}
public bool delNode(int element){}
public void traverse() {}
public void revtraverse(){}
}
// Code in C++

class DoubleLinkedList
{
Node *START;
public:
DoubleLinkedList(){}
void addNode(int element) {}
bool search(int element, Node *previous, Node *current) {}
bool delNode(int element) {}
void traverse(){}
void revtraverse(){}
};

The class that represents a doubly-linked list may not necessarily be named as DoubleLinkedList.
You can assign any name to such a class.

In addition to the variable/pointer, START, which holds the address of the first node in the list, you may also want to declare the variable/pointer, LAST, which holds the address of the last node. Whether you declare it depends on the problem at hand. If the problem requires you to frequently insert and delete elements at the end of the list, or to traverse the list in the reverse order, it is prudent to maintain the variable/pointer, LAST.

Traversing a Doubly-linked List

A doubly-linked list enables you to traverse the list in the forward direction, as well as in the backward direction.

The following algorithm depicts the logic for traversing a doubly-linked list in the forward direction:

1. Mark the first node in the list as currentNode.


2. Repeat the steps, 3 and 4, until currentNode becomes NULL .
3. Display the information contained in the node marked as currentNode.
4. Make currentNode point to the next node in the sequence .

Please note that the algorithm for traversing a doubly-linked list in the forward direction is the same as that for a singly-linked list.

The following algorithm depicts the logic for traversing a doubly-linked list in the backward direction:

1. Mark the last node in the list as currentNode.


2. Repeat the steps, 3 and 4, until currentNode becomes NULL .
3. Display the information contained in the node marked as currentNode.
4. Make currentNode point to the node preceding it .
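
The following C++ sketch shows possible definitions of the traverse and revtraverse members of the DoubleLinkedList class declared earlier; they would replace the empty placeholder bodies shown in that declaration, and they assume that iostream has been included with using namespace std.

// Code in C++ (illustrative sketch)
void DoubleLinkedList::traverse()            // Forward traversal
{
    Node *currentNode = START;               // 1. Start at the first node
    while (currentNode != NULL)              // 2. Repeat until currentNode is NULL
    {
        cout << currentNode->info << " ";    // 3. Display the information
        currentNode = currentNode->next;     // 4. Move to the next node
    }
}

void DoubleLinkedList::revtraverse()         // Backward traversal
{
    if (START == NULL)
        return;
    Node *currentNode = START;
    while (currentNode->next != NULL)        // 1. Walk forward to the last node
        currentNode = currentNode->next;
    while (currentNode != NULL)              // 2. Repeat until currentNode is NULL
    {
        cout << currentNode->info << " ";    // 3. Display the information
        currentNode = currentNode->prev;     // 4. Move to the preceding node
    }
}

Because the class declared earlier maintains only START, revtraverse first walks forward to reach the last node; if the variable/pointer, LAST, is maintained as suggested earlier, the backward traversal can start from it directly.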

Inserting a Node in a Doubly-linked List

Insertion involves adding a new node to an existing list or creating a new list if one does not exist. Therefore, to
insert a node in a doubly-linked list, you need to first check whether the list is empty or not.

If the list is empty, the following algorithm can be used to insert a node in the linked list:

1. Allocate memory for the new node .


2. Assign a value to the data field of the new node .
3. Make the next field of the new node point to NULL .
4. Make the prev field of the new node point to NULL .
5. Make START point to the new node .

Once the first node is inserted, the subsequent nodes can be inserted at any of the following positions:

Beginning of the list


Between two nodes in the list
End of the list

Now, you will learn to write algorithms for inserting a node at the various positions in a linked list.

Inserting a Node at the Beginning of the List


The following algorithm depicts the logic for inserting a node at the beginning of a doubly-linked list:

1. Allocate memory for the new node .


2. Assign a value to the data field of the new node .
3. Make the next field of the new node point to the first node in the list .
4. Make the prev field of START point to the new node .
5. Make the prev field of the new node point to NULL .
6. Make START point to the new node .
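
A C++ sketch of this insertion is given below; it assumes the Node class declared earlier (with info, next, and prev) and a non-empty list, because the empty list was handled by the previous algorithm. The function name is illustrative.

// Code in C++ (illustrative sketch)
void insertAtBeginning(Node *&START, int element)
{
    Node *newnode = new Node;        // 1. Allocate memory for the new node
    newnode->info = element;         // 2. Assign a value to its data field
    newnode->next = START;           // 3. Point next to the current first node
    START->prev = newnode;           // 4. Point prev of the old first node back
    newnode->prev = NULL;            // 5. The new node has no predecessor
    START = newnode;                 // 6. Make START point to the new node
}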

Inserting a Node Between Two Nodes in the List


To insert a node between two nodes, you first need to search the nodes between which the new node is to be
inserted and mark them as previous and current. The following algorithm depicts the logic for inserting a node
between two nodes in a doubly-linked list:

1. Allocate memory for the new node .


2. Assign a value to the data field of the new node .
3. If START is NULL (if the list is empty), then:
a. Make the next field of the new node point to NULL .
b. Make the prev field of the new node point to NULL .
c. Make START point to the new node .
4. Identify the nodes between which the new node is to be inserted. Mark them as
previous and current, respectively. To locate previous and current, execute the
following steps:

a. Make current point to the first node .
b. Make previous point to NULL .
c. Repeat the steps, d and e, until current.info > newnode.info or current =
NULL .
d. Make previous point to current .
e. Make current point to the next node in the sequence .
5. Make the next field of the new node point to current .
6. Make the prev field of the new node point to previous .
7. Make the prev field of current point to the new node .
8. If previous is not NULL:
a. Make the next field of previous point to the new node .
9. If previous is NULL:
a. Make START point to the new node .

The process of inserting a node between two nodes in a doubly-linked list involves the following operations:

1. Allocate memory for the new node, and assign the value to the data field of the new node.
2. Identify the nodes between which the new node is to be inserted. Mark them as previous and current.
3. Make the next field of the new node point to current.
4. Make the prev field of the new node point to previous.
5. Make the prev field of current point to the new node.
6. Make the next field of previous point to the new node.

The Insertion of a Node Between Two Nodes in a Doubly-linked List

If, after the initial search operation, current is found to be NULL, the node is to be inserted at the end of the list. In this case, step 7 will give an error because you cannot access the prev field of NULL. Therefore, the preceding algorithm cannot be used to insert a node at the end of the list.

However, you can modify the preceding algorithm to solve this problem. Consider the following algorithm:

1. Allocate memory for the new node .


2. Assign a value to the data field of the new node .
3. If START is NULL (if the list is empty):
a. Make the next field of the new node point to NULL .
b. Make the prev field of the new node point to NULL .
c. Make START point to the new node .
4. Identify the nodes between which the new node is to be inserted. Mark them as
previous and current, respectively. To locate previous and current, execute the
following steps:
a. Make current point to the first node .
b. Make previous point to NULL .
c. Repeat the steps, d and e, until current.info > newnode.info or current =
NULL .
d. Make previous point to current.
e. Make current point to the next node in the sequence .
5. Make the next field of the new node point to current.
6. Make the prev field of the new node point to previous .
7. If current is not NULL:
a. Make the prev field of current point to the new node .
8. If previous is not NULL:
a. Make the next field of previous point to the new node .
9. If previous is NULL:
a. Make START point to the new node .
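
The following C++ sketch implements this modified algorithm as the addNode member of the DoubleLinkedList class declared earlier (replacing its empty placeholder body). Ordering the nodes by their info values mirrors the search loop in step 4.

// Code in C++ (illustrative sketch)
void DoubleLinkedList::addNode(int element)
{
    Node *newnode = new Node;                    // 1. Allocate memory for the new node
    newnode->info = element;                     // 2. Assign a value to its data field

    if (START == NULL)                           // 3. The list is empty
    {
        newnode->next = NULL;
        newnode->prev = NULL;
        START = newnode;
        return;                                  // The remaining steps apply to a non-empty list
    }

    Node *previous = NULL;                       // 4. Locate previous and current
    Node *current = START;
    while (current != NULL && current->info <= newnode->info)
    {
        previous = current;
        current = current->next;
    }

    newnode->next = current;                     // 5. Point next of the new node to current
    newnode->prev = previous;                    // 6. Point prev of the new node to previous

    if (current != NULL)                         // 7. current exists
        current->prev = newnode;
    if (previous != NULL)                        // 8. Insert after previous
        previous->next = newnode;
    else
        START = newnode;                         // 9. The new node becomes the first node
}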

The preceding algorithm can now also be used to insert nodes at the end of the linked list. The process of inserting a node at the end of the linked list involves the following operations:

1. Allocate memory and assign the value to the data field of the new node.
2. Make the next field of the new node point to current, which is NULL in this case.
3. Make the prev field of the new node point to previous.
4. Make the next field of previous point to the new node.

The Insertion of a Node at the End of a Doubly-linked List

Inserting a Node at the End of the List


The algorithm for inserting a node between two nodes in a doubly-linked list can also be used to insert a node at
the end of the list. However, if you always have to insert a node at the end of the list only, it will be better to
have the variable/pointer, LAST , which contains the address of the last node in the list. You can then use the
following algorithm to insert a node at the end of the list:

1. Allocate memory for the new node .


2. Assign a value to the data field of the new node .
3. If LAST is not NULL:
a. Make the next field of the node marked as LAST point to the new node .
4. Make the prev field of the new node point to the node marked LAST .
5. Make the next field of the new node point to NULL .
6. Mark the new node as LAST .
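
A C++ sketch of this approach is given below. LAST is assumed to be an additional variable/pointer maintained along with START, as suggested in the earlier note; the extra assignment to START covers the case in which the list was empty.

// Code in C++ (illustrative sketch)
void insertAtEnd(Node *&START, Node *&LAST, int element)
{
    Node *newnode = new Node;        // 1. Allocate memory for the new node
    newnode->info = element;         // 2. Assign a value to its data field

    if (LAST != NULL)                // 3. The list already contains nodes
        LAST->next = newnode;
    else
        START = newnode;             // The new node is also the first node (additional step)

    newnode->prev = LAST;            // 4. Link the new node back to LAST
    newnode->next = NULL;            // 5. The new node has no successor
    LAST = newnode;                  // 6. Mark the new node as LAST
}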

Deleting Nodes from a Doubly-linked List

The delete operation in a doubly-linked list is a bit different from that of a singly-linked list. This is because in
contrast to a singly-linked list, each node in a doubly-linked list has an additional field pointing to its previous
node. Therefore, you need to adjust both the fields while performing the delete operation.

Before performing the delete operation, you first need to check whether the list is empty. If the list is empty, an
error message is shown.

However, if the list is not empty, you need to identify the position of the node to be deleted in the list. You can
delete a node from one of the following places in a doubly-linked list:

Beginning of the list


Between two nodes in the list
End of the list

Deleting a Node from the Beginning of the List


The following algorithm depicts the logic to delete a node from the beginning of the doubly-linked list:

1. Mark the first node in the list as current.


2. Make START point to the next node in the sequence .
3. If START is not NULL (if the deleted node was not the only node in the list), then:
a. Assign NULL to the prev field of the node marked as START .
4. Release the memory of the node marked as current.
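
A C++ sketch of these steps is shown below; the Node class and the START pointer declared earlier are assumed, and the empty-list check reflects the note made at the start of this topic.

// Code in C++ (illustrative sketch)
void deleteFromBeginning(Node *&START)
{
    if (START == NULL)               // The list is empty: nothing to delete
        return;

    Node *current = START;           // 1. Mark the first node as current
    START = START->next;             // 2. Make START point to the next node
    if (START != NULL)               // 3. The deleted node was not the only node
        START->prev = NULL;
    delete current;                  // 4. Release the memory of current
}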

The process of deleting a node from the beginning of the doubly-linked list involves the following operations:

1. Mark the first node in the list as current.
2. Make START point to the next node in the sequence.
3. Assign NULL to the prev field of the node marked as START.
4. Release the memory of the node marked as current.

The Deletion of a Node from the Beginning of the Doubly-linked List

Deleting a Node Between Two Nodes in the List


To delete a node between two nodes in the list, you first need to search the node to be deleted. Once found, mark
the node to be deleted as current, and its predecessor as previous. The following algorithm deletes a node from
the middle of the doubly-linked list:

1. Mark the node to be deleted as current and its predecessor as previous . To locate
previous and current, execute the following steps:
a. Make previous point to NULL . // Set previous = NULL
b. Make current point to the first node in the linked list (that is, set current = START).
c. Repeat the steps, d and e, until either the value of current is same as the
value to be deleted or current becomes NULL .
d. Make previous point to current.
e. Make current point to the next node in the sequence .
2. If current is NULL:
a. Display Value to be deleted not found in the list .
b. Exit .
3. If previous is not NULL:
a. Make the next field of previous point to the successor of current .
4. If previous is NULL:
a. Make START point to its successor .
5. Make the prev field of the successor of current point to previous .
6. Release the memory of the node marked as current.

The process of deleting a node between two nodes in the doubly-linked list involves the following operations:

1. Make the next field of previous point to the successor of current.
2. Make the prev field of the successor of current point to previous.
3. Release the memory of the node marked as current.

The Deletion of a Node Between Two Nodes in the Doubly-linked List

If, after the initial search operation, the next field of current is found to be NULL, the node to be deleted is the last node in the list. In this case, step 5 will give an error because the successor of current is NULL and you cannot access its prev field. Therefore, the preceding algorithm cannot be used to delete a node from the end of the list.

However, you can modify the preceding algorithm to solve this problem. Consider the following algorithm:

1. Mark the node to be deleted as current and its predecessor as previous .


2. If current is NULL:
a. Display Value to be deleted not found in the list .
b. Exit .
3. If previous is not NULL:
a. Make the next field of previous point to the successor of current.
4. If previous is NULL:
a. Make START point to its successor .
5. If the successor of current exists:
a. Make the prev field of the successor of current point to previous .
6. Release the memory of the node marked as current.
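
The following C++ sketch implements this modified algorithm as the delNode member of the DoubleLinkedList class declared earlier (replacing its empty placeholder body). The bool return value, true on a successful deletion, matches the declared return type; the output assumes iostream with using namespace std.

// Code in C++ (illustrative sketch)
bool DoubleLinkedList::delNode(int element)
{
    Node *previous = NULL;                       // 1. Locate previous and current
    Node *current = START;
    while (current != NULL && current->info != element)
    {
        previous = current;
        current = current->next;
    }

    if (current == NULL)                         // 2. The value was not found
    {
        cout << "Value to be deleted not found in the list" << endl;
        return false;
    }

    if (previous != NULL)                        // 3. Bypass current
        previous->next = current->next;
    else
        START = current->next;                   // 4. current was the first node

    if (current->next != NULL)                   // 5. A successor exists
        current->next->prev = previous;

    delete current;                              // 6. Release the memory of current
    return true;
}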

Summary
In this chapter, you learned that:

Dynamic memory allocation refers to the process of allocating memory on the basis of the need at
runtime.
A linked list is a chain of elements in which each element consists of data, as well as a link to the next
element.
Based on how the various nodes are connected to each other, linked lists can be of the following types:
Singly-linked list
Doubly-linked list
Circular-linked list
A singly-linked list is a simple data structure that acts as a base for the various other data structures.
A singly-linked list is represented in a program by defining the following classes:
A class that represents a node in a linked list
A class that represents a linked list
A doubly-linked list in a program can be represented by declaring the following classes:
A class that represents a node in a doubly-linked list
A class that represents a doubly-linked list

Reference Reading

Introduction to Linked Lists

Reference Reading: Books
Data Structures and Program Design by Robert L. Kruse
Data Structures Using C and C++ by Aaron M. Tenenbaum

Reference Reading: URLs
http://en.wikipedia.org/wiki/Linked_list
http://cse.iitkgp.ac.in/pds/semester/2009a/slides/l9-linkedlist.pdf

Implementing a Singly-linked List

Reference Reading: Books
Data Structures Using C and C++ by Aaron M. Tenenbaum

Reference Reading: URLs
http://en.wikipedia.org/wiki/Linked_list
http://www.brpreiss.com/books/opus5/html/page97.html

Implementing a Doubly-linked List

Reference Reading: Books
Data Structures and Program Design by Robert L. Kruse
Data Structures Using C and C++ by Aaron M. Tenenbaum
An Introduction to Data Structures with Applications by Jean-Paul Tremblay and Paul G. Sorenson

Reference Reading: URLs
http://en.wikipedia.org/wiki/Doubly_linked_list
http://staff.science.uva.nl/~heck/JAVAcourse/ch4/sss1_2_3.html

Chapter 5
Solving Programming Problems Using Stacks and Queues

At times, while solving programming problems, you may need to store or retrieve a list of items. Consider a situation where you need to maintain a list of student records in the ascending order of their names. To maintain such a list, you may need to insert or delete records at any position in the list. There can also be a situation where you want to implement a list in such a way that items can be inserted and deleted at only one end of the list. This kind of list can be implemented by using a stack.

Moreover, there are programming problems that require you to implement a list in such a way that items can be
retrieved in the same order in which they are inserted in the list. Such a list can be implemented by using a data
structure called queue.

This chapter discusses the concept and implementation of stacks and queues.

Objectives

In this chapter, you will learn to:

Solve programming problems by using stacks


Solve programming problems by using queues

Solving Programming Problems by Using Stacks


Consider an example of a card game called Rummy. To start the game, a stock pile and a discard pile are placed
in the center. A player can draw either the topmost card of the stock pile or the topmost card of the discard pile.
If the drawn card does not make a valid sequence in the player's hand, the player can discard the card by placing
it on the top of the discard pile.

The next player can then draw either the topmost card of the stock pile or the topmost card of the discard pile,
and so on.

To represent and manipulate such type of a discard pile in a computer program, you need a data structure that
allows insertion and deletion at only one end. It should ensure that the last item inserted is the first one to be
removed. A data structure that implements this concept is called a stack.

This section discusses the stack data structure and explains the operations that can be performed on a stack.

Defining a Stack

A stack is a collection of data items that can be accessed at only one end, which is called top. This means that the
items are inserted and deleted at the top. The last item that is inserted in a stack is the first one to be deleted.
Therefore, a stack is called a Last-In-First-Out (LIFO) data structure.

A stack is like a box of books that is just wide enough to hold the books in one pile. The books can be placed, as well as removed, only from the top of the box. The book most recently put in the box is the first one to be taken out. The book at the bottom is the first one to be put inside the box and the last one to be taken out.

The following figure shows a stack of books.

A Stack of Books

The characteristics of stacks are:

Data can only be inserted on the top of the stack.


Data can only be deleted from the top of the stack.
Data cannot be deleted from the middle of the stack. All the items from the top first need to be
removed.

Identifying the Operations on Stacks

The following two basic operations can be performed on a stack:

PUSH
POP

When you insert an item into a stack, you say that you have pushed the item into the stack. When you delete an
item from a stack, you say that you have popped the item from the stack.

The following sequence of operations depicts PUSH and POP on a stack:

1. Push book 1 into an empty stack.
2. Push book 2 into the stack.
3. Pop a book from the stack.
4. Push book 3 into the stack.
5. Push book 4 into the stack.
6. Pop book 3 from the stack: this is an invalid operation. To pop book 3, you need to first pop book 4 from the stack.
7. Pop a book from the stack.

The PUSH and POP Operations on a Stack

Implementing a Stack

A stack is simply a list in which insertion and deletion are allowed only at one end, which is known as the top of the stack. A stack can be implemented by using an array or a linked list.

The scope of this course is limited to implementing a stack by using a linked list.

When a stack is implemented by using a linked list, the stack is said to be dynamic. In this case, memory is
dynamically allocated to the stack, and the size of the stack can grow and shrink at runtime.

To represent a stack as a linked list, you need to first declare a class to represent a node in the linked list. The
following code snippets give the C# and C++ declarations of a class named Node that represents a node in a
linked list:

// Code in C#
class Node
{
public int info;
public Node next;
public Node(int i, Node n)
{
info = i;
next = n;
}
}
// Code in C++

class Node
{
public:
int info;
Node *next;
public:
Node(int i, Node* n)
{
info = i;
next = n;
}
};
After representing a node of a stack, you need to declare a class to implement operations on a stack. In this class,
you also need to declare a variable/pointer to hold the address of the topmost element in the stack and initialize
this variable/pointer to contain the value, NULL .

The following code snippets give the C# and C++ declarations of a class named Stack that implements the
operations on a stack:

// Code in C#

class Stack
{
Node top;
public Stack()
{
top = null;
}
bool empty()
// Returns true if the stack is empty, false otherwise
{
// Statements
}
public void push(int element)
{
// Statements
}
public void pop()
{
// Statements
}
}
// Code in C++

class Stack
{
private:
Node *top;
public:
Stack()
{
top = NULL;
}
int empty()
{
// Statements
}
void push( int element)
{
// Statements
}
int pop()
{
// Statements
}
};
After declaring the class to implement the operations on the stack, you need to implement the PUSH and POP
operations.

Implementing the PUSH Operation


To implement the PUSH operation, you need to insert a node at the beginning of the linked list. The following
algorithm depicts the logic for the PUSH operation:

1. Allocate memory for the new node .


2. Assign value to the data field of the new node .
3. Make the next field of the new node point to top .
4. Make top point to the new node .
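
A possible C++ body for the push member of the Stack class declared earlier is shown below; it would replace the // Statements placeholder. The Node constructor used here is the one defined above.

// Code in C++ (illustrative sketch)
void Stack::push(int element)
{
    Node *newnode = new Node(element, top);  // 1-3. Allocate, assign the value, and
                                             //      point its next field to top
    top = newnode;                           // 4. Make top point to the new node
}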

Implementing the POP Operation


To implement the POP operation, you need to delete a node from the beginning of the linked list. The following algorithm depicts the logic for the POP operation:

1. Make a variable/pointer tmp point to the topmost node .


2. Retrieve the value contained in the topmost node .
3. Make top point to the next node in sequence .
4. Release memory allocated to the node marked by tmp .

You can also implement the POP operation by simply making top point to the next node in sequence.
However, in this case, the memory will not be released after popping the element.

When deleting a node, you need to check whether the stack contains any element. If you attempt to pop an
element from an empty stack, there is an underflow. Therefore, before popping an element from the stack, you
need to check the stack empty condition. The condition for stack empty is:

top = NULL

If the stack empty condition is true, the POP operation should not be performed.

The following modified algorithm depicts the logic for the POP operation:

1. If top = NULL:
a. Display Stack Underflow: Cannot delete from an empty stack.
b. Exit .
2. Make tmp point to the topmost node .
3. Retrieve the value contained in the topmost node .
4. Make top point to the next node in sequence .
5. Release the memory allocated to the node marked as tmp .
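
A possible C++ body for the pop member of the Stack class declared earlier is shown below. The declaration above gives pop an int return type, so this sketch returns the popped value; returning -1 on underflow is only an illustrative choice, and the output assumes iostream with using namespace std.

// Code in C++ (illustrative sketch)
int Stack::pop()
{
    if (top == NULL)                         // 1. Stack underflow
    {
        cout << "Stack Underflow: Cannot delete from an empty stack" << endl;
        return -1;                           // Illustrative error value
    }
    Node *tmp = top;                         // 2. Mark the topmost node as tmp
    int value = tmp->info;                   // 3. Retrieve the value it contains
    top = top->next;                         // 4. Make top point to the next node
    delete tmp;                              // 5. Release the memory of tmp
    return value;
}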

Activity 5.1: Solving Programming Problems by Using Stacks

Solving Programming Problems by Using Queues


Consider the scenario of a bank where there is only one counter to resolve customer queries. Therefore,
customers have to stand in a queue to get their queries resolved. To avoid the inconvenience caused to the
customers, the bank has decided to set up a new system.

According to this system, whenever a customer visits the bank, a request entry will be made into the system and
the customer will be given a request number. The requests will be stored in the system in the order in which they
are received. The request number of the earliest request will be automatically flashed on the query counter to
indicate that the customer with that request number can come next to get his/her query resolved. When a
customer's request has been processed, the request will be removed from the system.

To implement such a system, you need a data structure that stores and retrieves the requests in the order of their
arrival. A data structure that implements this concept is a queue.

Defining Queues

A queue is a list of elements in which items are inserted at one end of the queue and deleted from the other end of the queue. You can think of a queue as an open-ended pipe, with elements being pushed in at one end and coming out at the other. The end at which elements are inserted is called the rear, and the end from which the
elements are deleted is called the front. A queue is also called a First-In-First-Out (FIFO) list because the first
element to be inserted in the queue is the first one to be deleted. The following figure represents a queue.

The Structure of a Queue

The queue data structure is similar to the queues in real life. Consider the scenario of a cafeteria where there is a
queue of customers at the counter. The customer standing first in the queue is served first. When new customers
arrive, they are made to stand at the rear end of the queue. After being served, a customer moves away from the
queue and the next person is there to be served.

Identifying the Various Operations on Queues

The following two types of operations can be performed on queues:

Insert: It refers to the addition of an item in the queue. Items are always inserted at the rear end of the
queue. Consider the queue, as shown in the following figure.

The Queue Before Insertion

In the preceding queue, there are five elements. The element, B, is labeled as FRONT to indicate that it is
the first element in the queue. Similarly, the element, D, is labeled as REAR to indicate that it is the last
element in the queue.
Now, suppose you want to add an item, F, in the queue. Since addition takes place at the rear end of the
queue, F will be inserted after D. Now, F becomes the rear end of the queue. Hence, we label F as REAR ,
as shown in the following figure.

The Queue After Insertion

Delete: It refers to the deletion of an item from a queue. Items are always deleted from the front of the
queue. When an item is deleted, the next item in the sequence becomes the front end of the queue.
Consider the queue, as shown in the following figure.

The Queue Before Deletion

On implementing a delete operation, the item, B, will be removed from the queue. Now, the item, A, will become the new front end of the queue. Therefore, the item, A, is labeled as FRONT, as shown in the
following figure.

The Queue After Deletion

Implementing a Queue

You can implement a queue by using an array or a linked list. A queue implemented in the form of a linked list is
known as a linked queue.

The scope of this course is limited to implementing a queue by using a linked list.

To represent a queue in the form of a linked list, you need to declare two classes:

A class to represent a node in the queue: This class represents a node in the linked queue, and is
similar to the node of a singly-linked list. Refer to the following C# and C++ declarations of a class
named Node that represents a node in a queue:
// Code in C#

class Node
{
public int data;
public Node next;
}
// Code in C++

class Node
{
public:
int data;
Node *next;
};
A class that represents the queue: This class implements all the operations on a queue, such as insert
and remove. In addition, it declares two variables/pointers, FRONT and REAR , which point to the first and
last elements in the queue, respectively. Initially, FRONT and REAR are made to point to NULL indicating
that the queue is empty. Refer to the following declarations of the LinkedQueue class in C# and C++:
// Code in C#

class LinkedQueue
{
Node FRONT, REAR;
public LinkedQueue()
{
FRONT = null;
REAR = null;
}
public void insert (int element){}
public void remove(){}
public void display(){}
}
// Code in C++

class LinkedQueue
{
Node * FRONT, * REAR;
public:
LinkedQueue()
{
FRONT = NULL;
REAR = NULL;
}
void insert(int element){}
void remove(){}
void display(){}
};
The structure of a linked queue is shown in the following figure.

The Linked Queue

After deciding the representation for the linked queue, you can start writing algorithms for inserting and deleting
elements in the queue.

Inserting an Element in a Linked Queue


An element is always inserted at the rear end of the queue. Therefore, you need to insert a new node at the rear
end of the linked queue. The following algorithm depicts the logic for inserting a node at the rear end of a linked
queue:

1. Allocate memory for the new node .


2. Assign value to the data field of the new node .
3. Make the next field of the new node point to NULL .
4. If the queue is empty, execute the following steps:
a. Make FRONT point to the new node .
b. Make REAR point to the new node .
c. Exit .
5. Make the next field of REAR point to the new node .
6. Make REAR point to the new node .
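
A possible C++ body for the insert member of the LinkedQueue class declared earlier is shown below; it would replace the empty placeholder body.

// Code in C++ (illustrative sketch)
void LinkedQueue::insert(int element)
{
    Node *newnode = new Node;        // 1. Allocate memory for the new node
    newnode->data = element;         // 2. Assign a value to its data field
    newnode->next = NULL;            // 3. The new node will be the last node

    if (FRONT == NULL)               // 4. The queue is empty
    {
        FRONT = newnode;
        REAR = newnode;
        return;
    }
    REAR->next = newnode;            // 5. Link the old rear node to the new node
    REAR = newnode;                  // 6. Make REAR point to the new node
}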

The process of inserting an element in a linked queue involves the following operations:

1. Allocate memory and assign the value to the data field of the new node.
2. Make the next field of the new node point to NULL.
3. Make the next field of REAR point to the new node.
4. Make REAR point to the new node.

The Insert Operation in a Linked Queue

Deleting an Element from a Linked Queue
An element is always deleted from the front end of a queue. Therefore, you need to delete a node from the front
end of the linked queue. The following algorithm depicts the logic for deleting a node from the front end of a
linked queue:

1. If the queue is empty: // FRONT = NULL


a. Display Queue empty.
b. Exit .
2. Mark the node marked FRONT as current.
3. Make FRONT point to the next node in its sequence .
4. Release memory for the node marked as current.
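
A possible C++ body for the remove member of the LinkedQueue class declared earlier is shown below. Resetting REAR when the queue becomes empty is not part of the steps above, but it keeps the two pointers consistent; the output assumes iostream with using namespace std.

// Code in C++ (illustrative sketch)
void LinkedQueue::remove()
{
    if (FRONT == NULL)               // 1. The queue is empty
    {
        cout << "Queue empty" << endl;
        return;
    }
    Node *current = FRONT;           // 2. Mark the node marked FRONT as current
    FRONT = FRONT->next;             // 3. Make FRONT point to the next node
    if (FRONT == NULL)               // The queue is now empty (additional step)
        REAR = NULL;
    delete current;                  // 4. Release the memory of current
}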

The process of deleting an element from a linked queue involves the following operations:

1. Mark the node marked FRONT as current.
2. Make FRONT point to the next node in its sequence.
3. Release memory for the node marked as current.

The Delete Operation in a Linked Queue

Activity 5.2: Solving Programming Problems by Using Queues

Summary
In this chapter, you learned that:

A stack is a collection of data items that can be accessed at only one end, which is called top.
The last item inserted in a stack is the first one to be deleted.
A stack is called a LIFO data structure.
The following two basic operations can be performed on stacks:
PUSH
POP
A stack can be implemented by using an array or a linked list.
A queue is a list of elements in which items are inserted at one end of the queue and deleted from the
other end of the queue.
The end at which elements are inserted is called the rear, and the end from which the elements are
deleted is called the front.
A queue is called a FIFO data structure.
The two types of operations that can be performed on a queue are insert and delete.
A queue implemented by using a linked list is called a linked queue.

Reference Reading

Solving Programming Problems by Using Stacks

Reference Reading: Books
Data Structures and Program Design by Robert L. Kruse
Data Structures Using C and C++ by Aaron M. Tenenbaum
An Introduction to Data Structures with Applications by Jean-Paul Tremblay and Paul G. Sorenson

Reference Reading: URLs
http://en.wikipedia.org/wiki/Stack_(data_structure)
http://www.cs.auckland.ac.nz/~jmor159/PLDS210/stacks.html

Solving Programming Problems by Using Queues

Reference Reading: Books
An Introduction to Data Structures with Applications by Jean-Paul Tremblay and Paul G. Sorenson

Reference Reading: URLs
http://en.wikipedia.org/wiki/Queue_(abstract_data_type)
http://www.cs.uregina.ca/Links/class-info/210/Queue/#IMPLEMENTATION

Chapter 6
Solving Programming Problems Using Trees

Many programming problems require data to be stored in a hierarchical fashion. This can be done by using a
data structure called tree. A tree not only helps you represent the hierarchical relationship among data, but also
provides an efficient mechanism for data storage and retrieval.

This chapter introduces you to the basic features of a tree as a data structure. It discusses the implementation of a
specific type of tree called binary tree. It also explains a common variant of a binary tree called binary search
tree.

Objectives

In this chapter, you will learn to:

Store data in a tree


Implement a binary tree
Implement a binary search tree

Storing Data in a Tree


Consider a scenario where you need to represent the directory structure of your operating system. The directory
structure contains various folders and files. A folder may further contain any number of sub folders and files.
Such an arrangement enables you to keep the related files and folders together. The directory structure of an
operating system is shown in the following figure.

The Directory Structure

Suppose you want to represent a similar directory structure in the memory. It is difficult to represent the structure linearly because all the items have a hierarchical relationship among themselves. To represent such a structure, it
is better to have a data storage mechanism that enables you to store data in a nonlinear fashion.

You can implement a tree to solve such a problem. Trees offer a lot of practical applications in the field of
computer science. For example, most of the modern operating systems have their file systems organized as a tree.

Defining Trees

A tree is a nonlinear data structure that represents the hierarchical relationship among the various data elements,
as shown in the following figure.

A Sample Tree Structure

Each data element in a tree is called a node. The topmost node of a tree is called a root. The nodes in a tree are
connected to other nodes through edges. The only way to get from one node to the other is to follow the path
along the edges.

Each node in a tree can have zero or more child nodes. However, each node in a tree has exactly one parent. An
exception to this rule is the root node, which has no parent. A node in a tree that does not have any child node is
called a leaf node.

In the tree structure that represents a hierarchical file system, the root directory is the root of the tree. The
directories that are one level below the root directory are the child nodes. Similarly, we can have various levels
of sub directories, thereby forming a hierarchical structure. Files, on the other hand, will have no children.
Therefore, they can be termed as the leaf nodes.

Tree Terminology

There are various terms that are frequently used while working with trees. Let us explain these terms by using a
tree, as shown in the following figure.

A Tree

The following terms are associated with a tree:

Leaf node: A node with no children is called a leaf node. They are also known as terminal nodes. In the
preceding figure, E, F, G, H, I, J, L, and M are the leaf nodes.
Subtree: A portion of a tree, which can be viewed as a tree in itself, is called a sub tree. In the
preceding figure, the tree starting at node B, containing nodes E, F, G, and H, is a sub tree of the
complete tree. A sub tree can also contain only one node, which is called the leaf node. In other words,
all the leaf nodes are sub trees, but not all the sub trees are leaf nodes.
Children of a node: The roots of the sub trees of a node are called the children of the node. In the
preceding figure, E, F, G, and H are the children of node B, and B is the parent of nodes: E, F, G, and
H. Similarly, J and K are the children of node D, and D is the parent of J and K.
Degree of a node: The number of sub trees of a node is called the degree of a node. In the preceding
figure:
Degree of node A is 3.
Degree of node B is 4.
Degree of node C is 1.
Degree of node D is 2.
Edge: A link from the parent to the child node is known as an edge. It is also known as a branch. A tree with n nodes has n - 1 edges.
Siblings/Brothers: Children of the same node are called siblings of each other. In the preceding figure:
Nodes, B, C, and D, are siblings of each other.
Nodes, E, F, G, and H, are siblings of each other.
Nodes, L and M, are siblings of each other.
Internal node: An intermediate node between the root node and the leaf node is called an internal node.
It is also known as a nonterminal node. In the preceding figure, nodes, B, C, D, and K, are internal
nodes.
Level of a node: The distance (in number of nodes) of a node from the root is called the level of a
node. The root always lies at level, 0. As you move down the tree, the level of a node increases in such
a way that if a node is at level, n, its children are at level, n + 1. In the preceding tree, the level of node,
A, is 0; level of nodes, B, C, and D, is 1; level of nodes, E, F, G, H, I, J, and K, is 2; and level of
nodes, L and M, is 3.
Depth of a tree: The maximum number of levels in a tree is called the depth of a tree. In other words,
the depth of a tree is one more than the maximum level of the tree. The depth of a tree is also known as
the height of a tree. In the preceding figure, the depth of the tree is 4.

The maximum number of nodes in a binary tree of depth, d, can be 2^d - 1.

Implementing a Binary Tree


A binary tree is a special type of tree that offers a lot of practical applications in the field of computer science.
This section discusses the concept and implementation of a binary tree.

Defining a Binary Tree

A binary tree is a specific type of tree in which each node can have a maximum of two children. These child
nodes are typically distinguished as the left and the right child. The structure of a binary tree is shown in the
following figure.

The Structure of a Binary Tree

In the preceding binary tree, node, B, is the left child of node, A, and node, C, is the right child of node, A.
Similarly, node, D, is the left child of node, B, and node, E, is the right child of node, B.

There are some variants of binary trees with some additional characteristics. These variants include:

Strictly binary tree: A binary tree is said to be strictly binary if every node, except the leaf nodes, has non-empty left and right children. An example of a strictly binary tree is shown in the following figure.

A Strictly Binary Tree

In the preceding binary tree, every non-leaf node has precisely two children. Therefore, it is a strictly binary tree.
Full binary tree: A binary tree of depth, d, is said to be a full binary tree if it has exactly 2^d - 1 nodes.
An example of a full binary tree is shown in the following figure.

A Full Binary Tree

Complete binary tree: A binary tree in which all levels, except possibly the deepest level, are completely filled and, at the deepest level, all nodes are placed as far left as possible.
An example of a complete binary tree is shown in the following figure.

A Complete Binary Tree

Representing a Binary Tree

Binary trees can be implemented by using an array or a linked list.

The scope of this course is limited to implementing a binary tree by using a linked list.

In a linked representation of a binary tree, you declare two classes:

A class to represent a node of the tree: This class represents the structure of each node in a binary
tree. A node in a binary tree consists of the following parts:
Information: It refers to the information held by each node in a binary tree.
Left child: It holds the reference of the left child of the node.
Right child: It holds the reference of the right child of the node.
Consider the structure of a node in a linked binary tree, as shown in the following figure.

The Structure of a Node in a Linked Binary Tree

If a node does not have a left child or a right child or both, the respective left or right child
field(s) of that node point to NULL.

Refer to the following C# and C++ declarations of a class named Node that represents a node of a tree:
// Code in C#
class Node
{
public int info;
public Node lchild;
public Node rchild;
}
// Code in C++
class Node
{
public:
int info;
Node *lchild;
Node *rchild;
};
A class to represent the binary tree: This class implements various operations on a binary tree, such
as insert, delete, and traverse. The class also provides the declaration of the variable/pointer root, which
contains the address of the root node of the tree. Refer to the following C# and C++ declarations of a
class named BinaryTree that represents a binary tree in a program:
// Code in C#
class BinaryTree
{
public Node ROOT;
public BinaryTree()
{
ROOT = null;
}
public void insert(int element) {}
public void find(int element, ref Node parent, ref Node currentNode) {}
public void inorder(ref Node ptr) {}
public void preorder(ref Node ptr) {}
public void postorder(ref Node ptr) {}
}
// Code in C++
class BinaryTree
{
public:
Node *ROOT;
BinaryTree()
{
ROOT = NULL;
}
void insert(int element);
void find(int element, Node **parent,Node **currentNode);
void inorder(Node *ptr);
void preorder(Node *ptr);
void postorder(Node *ptr);
};

Consider the linked representation of a binary tree, as shown in the following figure.

The Linked Representation of a Binary Tree

Traversing a Binary Tree

Traversal of nodes is one of the most common operations in a binary tree. Traversing a tree refers to the process
of visiting all the nodes of a tree once.

There are three ways in which you can traverse a binary tree. These are inorder traversal, preorder traversal, and
postorder traversal. The sequence of traversal in the three traversal techniques is different.

Consider the binary tree, as shown in the following figure.

The Binary Tree

Now, let us discuss the inorder, preorder, and postorder traversals by referring to the preceding binary tree.

Inorder Traversal
The following steps can be used for traversing a binary tree in the inorder sequence:

1. Traverse the left sub tree.


2. Visit the root.
3. Traverse the right sub tree.

For performing the inorder traversal of the given binary tree, you need to start at the root, and first traverse each node's left branch, then the node itself, and finally the node's right branch. This is a recursive process because each node's left and right branch is a tree in itself.

The root of the tree is A. Before visiting A, you must traverse the left sub tree of A. Therefore, you move to node
B. Now, before visiting B, you must traverse the left sub tree of B. Therefore, move to node, D. Now, before
visiting D, you must traverse the left sub tree of node, D. However, the left sub tree of D is empty. Therefore,
visit node, D.

After visiting node, D, you must traverse the right sub tree of node, D. Therefore, you move to node, H. Before
visiting H, you must traverse the left sub tree of H. However, H does not have a left sub tree. Therefore, visit
node, H.

After visiting node, H, you must traverse the right sub tree of node, H. The right sub tree of H is empty. Now the
traversal of the left sub tree of node, B, is complete. Therefore, visit node, B.

After visiting node, B, you must traverse the right sub tree of node, B. Therefore, move to node, E. Because E
does not have a left sub tree, visit E. Node, E, does not have a right sub tree either. Now, the traversal of the left
sub tree of A is complete. Therefore, visit node, A.

After visiting node, A, you must traverse the right sub tree of node, A. Therefore, move to node, C. Before visiting C, you must visit the left sub tree of C. Therefore, move to node, F. F does not have a left sub tree, so
visit F. F does not have a right sub tree either. At this stage, the traversal of the left sub tree of C is complete.
Therefore, visit node, C.

After visiting node, C, you must traverse the right sub tree of C. Therefore, move to node, G. Before, visiting
node, G, you must traverse the left sub tree of node, G. Therefore, move to node, I. I does not have a left sub
tree. Therefore, visit I.

After visiting I, you must traverse the right sub tree of I. However, I does not have a right sub tree. The traversal
of the left sub tree of G is now complete. Therefore, visit node, G. G does not have a right sub tree. At this stage,
the traversal of the right sub tree of C is complete. Also, the traversal of the right sub tree of A is complete.

Therefore, the following sequence specifies the inorder traversal:

DHBEAFCIG

The following figure depicts the sequence of traversal in the inorder traversal.

The Inorder Traversal

The following algorithm depicts the logic of inorder traversal:

Algorithm: Inorder (root)

1. If (root = NULL):
a. Exit .
2. Inorder (left child of root) . // Recursive call to Inorder for traversing
// the left sub tree

3. Visit (root).
4. Inorder (right child of root) . // Recursive call to Inorder for traversing
// the right sub tree
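
A possible C++ definition of the inorder member declared in the BinaryTree class shown earlier is given below; the output assumes iostream with using namespace std.

// Code in C++ (illustrative sketch)
void BinaryTree::inorder(Node *ptr)
{
    if (ptr == NULL)                 // 1. An empty sub tree: nothing to visit
        return;
    inorder(ptr->lchild);            // 2. Traverse the left sub tree
    cout << ptr->info << " ";        // 3. Visit the root of this sub tree
    inorder(ptr->rchild);            // 4. Traverse the right sub tree
}

Calling inorder(ROOT) on the preceding binary tree displays the sequence, D H B E A F C I G.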

Preorder Traversal
The following steps can be used for traversing a binary tree in the preorder sequence:

1. Visit the root.


2. Traverse the left sub tree.
3. Traverse the right sub tree.

For a preorder traversal, you must begin with the root node. The root of the tree is A. Therefore, visit node, A.
Now, move to the left sub tree of A. The root of the left sub tree of A is B. Hence, visit node, B. Move to the left
sub tree of B, and visit its root, D. Now, D does not have a left sub tree. Therefore, move to its right sub tree.
The root of its right sub tree is H. Therefore, visit H. Now move to the right sub tree of node, B, and visit E. This
finishes the traversal of the root and its left sub tree. Now, traverse the right sub tree of the root in a similar
fashion.

Therefore, the following sequence specifies the preorder traversal:

ABDHECFGI

The following figure depicts the sequence of traversal in the preorder traversal.

The Preorder Traversal

The following algorithm depicts the logic of preorder traversal:

Algorithm: Preorder (root)

1. If (root = NULL):
a. Exit .
2. Visit (root).
3. Preorder (left child of root) . // Recursive call to Preorder for
// traversing the left sub tree
4. Preorder (right child of root) . // Recursive call to Preorder for
// traversing the right sub tree

Postorder Traversal
The following steps can be used for traversing a binary tree in the postorder sequence:

1. Traverse the left sub tree.


2. Traverse the right sub tree.
3. Visit the root.

For postorder traversal, the left and right sub trees of a node are traversed first, and then the node itself.
Therefore, begin by traversing the left sub tree of the root node, A. The root of its left sub tree is B. Therefore,
move to node, B.

From node, B, you need to move further to its left sub tree. The root of its left sub tree is D. Node, D, does not
have a left sub tree, but has a right sub tree. Therefore, move to its right sub tree. The root of its right sub tree,
which is H, does not have left and right sub trees. Therefore, H will be the first node to be visited.

After having visited H, the traversal of the right sub tree of D is complete. Therefore, you need to visit node, D.
Until now, the left sub tree of node, B, has been traversed. Now, you need to traverse the right sub tree of B
before you visit B. The root of its right sub tree is E, which does not have left and right sub trees. Hence, visit E,
and then visit B. The left sub tree of node, A, has been traversed. You now need to traverse its right sub tree in a
similar fashion before you visit node, A.

Therefore, the following sequence specifies the postorder traversal:

H D E B F I G C A

The following figure depicts the sequence of traversal in the postorder traversal.

The Postorder Traversal
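
Although the text does not list a separate algorithm for postorder traversal, the same recursive pattern applies: both sub trees are traversed before the node is visited. The following C sketch, again reusing the assumed treeNode structure, illustrates this.

void postorder(struct treeNode *root)
{
    if (root == NULL)               /* Empty sub tree, exit */
        return;
    postorder(root->left);          /* Traverse the left sub tree */
    postorder(root->right);         /* Traverse the right sub tree */
    printf("%d ", root->data);      /* Visit the root last */
}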

Implementing a Binary Search Tree


Consider a scenario of a cellular phone company that maintains the information of millions of its customers
spread across the world. Each customer is assigned a unique identification number (ID). Individual customer
records can be accessed by referring to the respective ID. These IDs need to be stored in a sorted manner in such
a way that you can easily perform various transactions, such as search, insertion, and deletion.

Which data structure will you use to solve this problem? If you use an array that is kept sorted, you can search
quickly (for example, by using binary search). However, insertion and deletion will be slow because the existing
elements have to be shifted. If you use a linked list, insertion and deletion will be fast. However, the search
operation will be time consuming, especially if the data to be searched is near the end of the list.

In such a case, it is better to have a data storage mechanism that provides the advantages of both arrays and
linked lists. A special type of binary tree, known as a binary search tree, can be used in this case. A binary
search tree offers search speed comparable to that of a sorted array, and also supports efficient insertions and
deletions, as in the case of a linked list.

Defining a Binary Search Tree

A binary search tree is a binary tree in which the value of the left child of a node is always less than or equal to
the value of the node, and the value of the right child of a node is always greater than the value of the node.
Consider the binary search tree, as shown in the following figure.

The Binary Search Tree

The root node contains 52. All the nodes in the left sub tree of the root have a value less than 52. Similarly, all
the nodes in the right sub tree of the root node have values greater than 52. The same holds true for other sub
trees as well.

The inorder traversal of a binary search tree gives you a sorted list of elements.

Searching a Node in a Binary Search Tree

The search operation in a binary search tree refers to the process of searching a specific value in the tree. The
following algorithm depicts the logic of searching a particular value in a binary search tree:

1. Make currentNode point to the root node .
2. If currentNode is NULL:
a. Display Not found.
b. Exit .
3. Compare the value to be searched with the value of currentNode. Depending on the
result of the comparison, there can be three possibilities:
a. If the value is equal to the value of currentNode:
i. Display Found .
ii. Exit .
b. If the value is less than the value of currentNode:
i. Make currentNode point to its left child .
ii. Go to step 2.
c. If the value is greater than the value of currentNode:
i. Make currentNode point to its right child .
ii. Go to step 2.

In the preceding figure, suppose you want to search the value, 59. You begin the search from the root node, 52.
Compare 52 with 59. Since 59 > 52, move to the right child of 52, that is 68. Compare 59 with 68. Since 59 < 68,
move to the left child of 68, that is 59. Again, compare 59 with 59. The values are equal. This means that you
have found the node in the binary search tree.

From the preceding example, you can see that each comparison takes the search one level down the tree. In a
reasonably balanced binary search tree, this roughly halves the number of elements that remain to be searched. As a
result, binary search trees provide quick access to data.
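
The search steps above translate directly into an iterative C function. The sketch below reuses the assumed treeNode structure and simply reports whether the value exists in the tree.

/* Returns 1 if value is present in the tree rooted at root, 0 otherwise. */
int searchBST(struct treeNode *root, int value)
{
    struct treeNode *currentNode = root;        /* Step 1 */
    while (currentNode != NULL)                 /* Step 2: NULL means Not found */
    {
        if (value == currentNode->data)         /* Step 3a: Found */
            return 1;
        if (value < currentNode->data)          /* Step 3b: move to the left child */
            currentNode = currentNode->left;
        else                                    /* Step 3c: move to the right child */
            currentNode = currentNode->right;
    }
    return 0;
}

For example, searchBST(root, 59) on the tree in the preceding figure would follow the path 52, 68, 59 and return 1.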

Inserting Nodes in a Binary Search Tree

The insert operation involves insertion of a new node into a binary search tree. The new node must be inserted in
such a way that the keys remain properly ordered. Before implementing an insert operation, you first need to
check whether the tree is empty or not. If the tree is empty, the new node to be inserted becomes the root of the
tree. However, if the tree is not empty, then you need to locate the position of the new node to be inserted in the
binary search tree. A new node is always inserted as a leaf node.

Consider a binary search tree, which is initially empty. Suppose you want to insert nodes in a binary search tree
in the following order:

52 36 68 24 44 72

The process of inserting a new node in a binary search tree is shown in the following table.

Insert 52: The tree is empty; therefore, node, 52, becomes the root.

Insert 36: Compare the value, 36, with that of the root. Since 36 < 52, insert 36 as the left child of 52.

Insert 68: Compare the value, 68, with that of the root. Since 68 > 52, insert 68 as the right child of 52.

Insert 24: Compare the value, 24, with that of the root. Since 24 < 52, 24 will be inserted in the left sub tree of
the root. However, the left sub tree of the root is not empty. Therefore, move to the left child of 52, which is 36.
Compare the value, 24, with 36. Since 24 < 36, insert 24 as the left child of 36.

Insert 44: Compare the value, 44, with that of the root. Since 44 < 52, 44 will be inserted in the left sub tree of
the root. However, the left sub tree of the root is not empty. Therefore, move to the left child of the root, which
is node 36. Compare the value, 44, with 36. Since 44 > 36, insert 44 as the right child of 36.

Insert 72: Compare the value, 72, with that of the root. Since 72 > 52, 72 will be inserted in the right sub tree of
the root. However, the right sub tree of the root is not empty. Therefore, move to the right child of 52, which is
68. Compare the value, 72, with 68. Since 72 > 68, insert 72 as the right child of 68.

The Process of Inserting Nodes in a Binary Search Tree

The following algorithm depicts the logic of inserting a node in a binary search tree:

1. Allocate memory for the new node .
2. Assign the value to the data field of the new node .
3. Make the left and right child of the new node point to NULL .
4. Locate the node that will be the parent of the node to be inserted. Mark it as
parent. Execute the following steps to locate the parent of the new node to be
inserted:
a. Mark the root node as currentNode.
b. Make parent point to NULL .
c. Repeat steps d, e, and f, until currentNode becomes NULL .
d. Make parent point to currentNode.
e. If the value of the new node is less than or equal to that of currentNode:
i. Make currentNode point to its left child .
f. If the value of the new node is greater than that of currentNode:
i. Make currentNode point to its right child .
5. If parent is NULL (Tree is empty):
a. Mark the new node as ROOT .
b. Exit .
6. If the value in the data field of the new node is less than or equal to that of
parent:
a. Make the left child of parent point to the new node .
b. Exit .
7. If the value in the data field of the new node is greater than that of the parent:
a. Make the right child of parent point to the new node .
b. Exit .
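
The insert algorithm above can be sketched in C as follows. The function returns the (possibly new) root so that insertion into an empty tree is handled naturally; the names insertNode and ROOT, like the treeNode structure, are assumptions made for illustration, and error handling is omitted.

#include <stdlib.h>     /* for malloc */

struct treeNode *insertNode(struct treeNode *ROOT, int value)
{
    /* Steps 1-3: allocate the new node and initialise its fields */
    struct treeNode *newNode = malloc(sizeof(struct treeNode));
    newNode->data = value;
    newNode->left = NULL;
    newNode->right = NULL;

    /* Step 4: locate the parent of the node to be inserted */
    struct treeNode *currentNode = ROOT;
    struct treeNode *parent = NULL;
    while (currentNode != NULL)
    {
        parent = currentNode;
        if (value <= currentNode->data)         /* Step 4e */
            currentNode = currentNode->left;
        else                                    /* Step 4f */
            currentNode = currentNode->right;
    }

    if (parent == NULL)                         /* Step 5: the tree was empty */
        return newNode;                         /* the new node becomes the root */

    if (value <= parent->data)                  /* Step 6 */
        parent->left = newNode;
    else                                        /* Step 7 */
        parent->right = newNode;
    return ROOT;
}

Starting from an empty tree (ROOT = NULL), the sample sequence could be built with repeated calls such as ROOT = insertNode(ROOT, 52); and then printed in sorted order with the inorder sketch shown earlier.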

Consider a binary search tree, as shown in the following figure.

The Binary Search Tree Before Insert Operation

Suppose you want to insert a node, 55, in the given binary search tree. The process of inserting is shown in the
following table.

1. Allocate memory and assign a value to the data field of the new node. Make the left and right child fields of
the new node point to NULL.

2. Locate the node that will be the parent of the node to be inserted. Mark it as parent.

3. If the value in the data field of the new node is less than or equal to the value of parent, make the left child
of parent point to the new node.

The Process of Inserting a Node in a Binary Search Tree

Deleting Nodes from a Binary Search Tree

The delete operation in a binary search tree refers to the process of deleting a specified node from the tree. To
delete a node from a binary search tree, you first need to locate the node to be deleted and its parent node. The
following algorithm depicts the logic to locate the node to be deleted and its parent:

1. Make a variable/pointer currentNode point to the ROOT node .
2. Make a variable/pointer parent point to NULL .
3. Repeat steps a, b, and c until currentNode becomes NULL or the value of the node to
be searched becomes equal to that of currentNode:
a. Make parent point to currentNode.
b. If the value to be deleted is less than that of currentNode:
i. Make currentNode point to its left child .
c. If the value to be deleted is greater than that of currentNode:
i. Make currentNode point to its right child .

After executing the preceding algorithm, currentNode is positioned on the node to be deleted and parent is
positioned on currentNode's parent.

After executing the preceding algorithm, if currentNode becomes NULL, it means that the node to be
deleted does not exist.
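
As a rough C rendering of this locate step, the helper below returns the node to be deleted and reports its parent through an output parameter. The function name locateNode and the parentOut parameter are naming assumptions; the treeNode structure is the one assumed in the earlier sketches.

/* Returns the node containing value, or NULL if it does not exist.
   On return, *parentOut points to the parent of the returned node
   (NULL if the node is the root or the value is absent). */
struct treeNode *locateNode(struct treeNode *ROOT, int value,
                            struct treeNode **parentOut)
{
    struct treeNode *currentNode = ROOT;                        /* Step 1 */
    struct treeNode *parent = NULL;                             /* Step 2 */
    while (currentNode != NULL && currentNode->data != value)   /* Step 3 */
    {
        parent = currentNode;                                   /* Step 3a */
        if (value < currentNode->data)                          /* Step 3b */
            currentNode = currentNode->left;
        else                                                    /* Step 3c */
            currentNode = currentNode->right;
    }
    *parentOut = parent;
    return currentNode;   /* NULL here means the node to be deleted does not exist */
}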

Once you locate the node to be deleted and its parent, there can be the following cases (a combined C sketch
covering all three cases appears after the examples below):

Case I The node to be deleted is a leaf node: In this case, the left and right child fields of
currentNode are NULL. The following algorithm depicts the logic to implement a delete operation on a
leaf node:
1. If currentNode is the root node: // If parent is NULL
a. Make ROOT point to NULL .
b. Go to step 4.
2. If currentNode is the left child of parent:
a. Make the left child field of parent point to NULL .
b. Go to step 4.
3. If currentNode is the right child of parent:
a. Make the right child field of parent point to NULL .
b. Go to step 4.
4. Release the memory for currentNode.
Consider the binary search tree, as shown in the following figure.

The Binary Search Tree Before Deleting the Leaf Node

Suppose you want to delete node, 75. The process of deleting node, 75, from the binary search tree is
shown in the following table.

1. Locate the node to be deleted. Mark it as currentNode and its parent as parent.

2. If currentNode is the left child of parent, make the left child field of parent point to NULL.

3. Release the memory for currentNode.

The Process of Deleting a Leaf Node from a Binary Search Tree

Case II The node to be deleted has one child (left or right): In this case, currentNode can either
have a left child or a right child. Mark the only child of currentNode as child . To delete currentNode,
the left/right child field of parent is linked with the left/right child of currentNode. The following
algorithm depicts the logic to implement a delete operation on a node having one child:
1. If currentNode has a left child:
a. Mark the left child of currentNode as child .
b. Go to step 3.
2. If currentNode has a right child:
a. Mark the right child of currentNode as child .
b. Go to step 3.
3. If currentNode is the root node:
a. Mark child as ROOT .
b. Go to step 6.
4. If currentNode is the left child of parent:
a. Make left child field of parent point to child .
b. Go to step 6.
5. If currentNode is the right child of parent:
a. Make right child field of parent point to child .
b. Go to step 6.
6. Release the memory for currentNode.
Consider the binary search tree, as shown in the following figure.

The Binary Search Tree Before Deleting a Node Having One Child

Suppose you want to delete node, 80. The process of deleting node, 80, is shown in the following table.

1. Locate the node to be deleted. Mark it as currentNode and its parent as parent. If currentNode has a left
child, mark the left child of currentNode as child.

2. If currentNode is the right child of parent, make the right child field of parent point to child.

3. Release the memory for currentNode.

The Process of Deleting a Node Having One Child from a Binary Search Tree

Case III The node to be deleted has two children: In this case, currentNode has two children.
To delete currentNode from the tree, you need to replace it with its inorder successor. The inorder
successor of a node, x, refers to the next node after x in the inorder traversal of the tree. To locate the
inorder successor of a node, you need to locate the leftmost node in its right sub tree.
The following algorithm depicts the logic to implement the delete operation on a node having two
children:
1. Locate the inorder successor of currentNode. Mark it as Inorder_suc. Execute
the following steps to locate Inorder_suc:
a. Mark the right child of currentNode as Inorder_suc.
b. Repeat until the left child of Inorder_suc becomes NULL:
i. Make Inorder_suc point to its left child .
2. Replace the information held by currentNode with that of Inorder_suc.
3. If the node marked Inorder_suc is a leaf node:
a. Delete the node marked Inorder_suc by using the algorithm for Case I.
4. If the node marked Inorder_suc has one child:
a. Delete the node marked Inorder_suc by using the algorithm for Case II.
Because the node marked Inorder_suc is the leftmost node of the right sub tree of currentNode, it
cannot have a left child. Therefore, the node marked Inorder_suc has, at the most, one child. This
means that the node marked Inorder_suc can be deleted by executing the algorithm that was developed
for deleting a leaf node, or the one that was developed for deleting a node with one child.

Consider the binary search tree, as shown in the following figure.

The Binary Search Tree Before Deleting a Node Having Two Children

Suppose you want to delete node, 72. The process is shown in the following table.

1. Locate the node to be deleted. Mark it as currentNode. Locate the inorder successor of currentNode, which
will be the leftmost node in the right sub tree of currentNode. Mark the inorder successor of currentNode as
Inorder_suc.

2. Replace the information held by currentNode with that of Inorder_suc.

3. The node marked Inorder_suc is a leaf node. Therefore, delete it by using the algorithm for deleting a leaf
node.

The Process of Deleting a Node Having Two Children from a Binary Search Tree
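
Pulling the three cases together, the following C sketch deletes a value and returns the (possibly new) root of the tree. It builds on the treeNode structure and the locateNode helper sketched earlier; all names are illustrative assumptions. The two-children case is handled by copying the inorder successor's data into currentNode and then unlinking the successor, which has at most one child, so a single unlinking step at the end covers Cases I and II as well. As with the insert sketch, <stdlib.h> is required (here for free).

struct treeNode *deleteNode(struct treeNode *ROOT, int value)
{
    struct treeNode *parent = NULL;
    struct treeNode *currentNode = locateNode(ROOT, value, &parent);
    if (currentNode == NULL)                    /* the node to be deleted does not exist */
        return ROOT;

    /* Case III: two children - replace with the inorder successor,
       then delete the successor instead. */
    if (currentNode->left != NULL && currentNode->right != NULL)
    {
        struct treeNode *sucParent = currentNode;
        struct treeNode *inorderSuc = currentNode->right;  /* leftmost node of the right sub tree */
        while (inorderSuc->left != NULL)
        {
            sucParent = inorderSuc;
            inorderSuc = inorderSuc->left;
        }
        currentNode->data = inorderSuc->data;   /* copy the successor's information */
        currentNode = inorderSuc;               /* the successor has at most one child */
        parent = sucParent;
    }

    /* Cases I and II: currentNode now has at most one child. */
    struct treeNode *child =
        (currentNode->left != NULL) ? currentNode->left : currentNode->right;

    if (parent == NULL)                         /* deleting the root */
        ROOT = child;
    else if (parent->left == currentNode)
        parent->left = child;
    else
        parent->right = child;

    free(currentNode);                          /* release the memory */
    return ROOT;
}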

Activity 6.1: Implementing a Binary Search Tree

Summary
In this chapter, you learned that:

A tree is a nonlinear data structure that represents a hierarchical relationship among the various data
elements.
A binary tree is a specific type of tree in which each node can have a maximum of two children.
Binary trees can be implemented by using arrays, as well as linked lists, depending upon the
requirement.
Traversal of a tree is the process of visiting all the nodes of the tree once. There are three types of
traversals: inorder traversal, preorder traversal, and postorder traversal.
A binary search tree is a binary tree in which the value of the left child of a node is always less than or
equal to the value of the node, and the value of the right child of a node is always greater than the value of
the node.

Reference Reading

Storing Data in a Tree


Reference Reading: Books
Data Structures and Program Design by Robert L. Kruse
Data Structures Using C and C++ by Aaron M. Tenenbaum
An Introduction to Data Structures with Applications by Jean-Paul Tremblay and Paul G. Sorenson

Reference Reading: URLs
http://en.wikipedia.org/wiki/Tree_data_structure

Implementing a Binary Tree

Reference Reading: Books
Data Structures and Program Design by Robert L. Kruse
Data Structures Using C and C++ by Aaron M. Tenenbaum
An Introduction to Data Structures with Applications by Jean-Paul Tremblay and Paul G. Sorenson

Reference Reading: URLs
http://en.wikipedia.org/wiki/Binary_tree
http://www.cs.auckland.ac.nz/~jmor159/PLDS210/trees.html

Implementing a Binary Search Tree

Reference Reading: Books
An Introduction to Data Structures with Applications by Jean-Paul Tremblay and Paul G. Sorenson

Reference Reading: URLs
http://en.wikipedia.org/wiki/Binary_search_tree
http://www.informatics.susx.ac.uk/courses/dats/notes/html/node59.html
