
Chapter 10- Disjoint Set

Disjoint Set 10

10.1 Definition
10.2 Equivalence Relation
10.3 Examples of set
10.4 Dynamic Equivalence Problem
10.5 Disjoint Set ADT
10.6 Smart Union Algorithm
10.7 Path Compression
10.8 Summary
10.9 Key Terms
10.10 Review Questions


10.1 Definition

A disjoint-set data structure keeps track of a set of elements partitioned into a number of
disjoint (non-overlapping) subsets.
It is an efficient data structure for solving the dynamic equivalence problem.
A union-find algorithm is an algorithm that performs two useful operations on such a data
structure:
Find: Determine which subset a particular element is in. This can be used to determine
whether two elements are in the same subset.
Union: Join two subsets into a single subset.

10.2 Equivalence Relation

For example, consider an equivalence relation on a set of stamps in which similar stamps are
placed in the same bundle (equivalence class): no stamp is in two bundles, and no bundle is empty.
An equivalence relation is a relation that, loosely speaking, partitions a set so that every
element of the set is a member of one and only one cell of the partition. Two elements of the set
are considered equivalent (with respect to the equivalence relation) if and only if they are
elements of the same cell. The intersection of any two different cells is empty; the union of all
the cells equals the original set.
A given binary relation ~ on a set S is said to be an equivalence relation if and only if it is
reflexive, symmetric and transitive. Equivalently, for all a, b and c in S:
• Reflexivity: a ~ a
• Symmetry: if a ~ b then b ~ a
• Transitivity: if a ~ b and b ~ c then a ~ c
The <= relationship is not an equivalence relation.
It is reflexive, since x <= x, and transitive, since x <= y and y <= z imply x <= z, but it is not
symmetric, since x <= y does not imply y <= x.
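To make the three properties concrete, the short C sketch below checks them by brute force on a small finite set S = {0, 1, 2, 3}. The relation is passed as a function pointer. The helper names (is_reflexive, is_symmetric, is_transitive, is_equivalence) and the example relation same_parity ("a and b are both even or both odd") are hypothetical, chosen only for this illustration.

```c
#include <stdbool.h>

#define N 4   /* the example set S is {0, 1, 2, 3} */

/* A relation on S: rel(a, b) is true exactly when a ~ b. */
typedef bool (*Relation)(int a, int b);

bool is_reflexive(Relation rel)
{
    for (int a = 0; a < N; a++)
        if (!rel(a, a))                      /* need a ~ a for every a */
            return false;
    return true;
}

bool is_symmetric(Relation rel)
{
    for (int a = 0; a < N; a++)
        for (int b = 0; b < N; b++)
            if (rel(a, b) && !rel(b, a))     /* a ~ b must imply b ~ a */
                return false;
    return true;
}

bool is_transitive(Relation rel)
{
    for (int a = 0; a < N; a++)
        for (int b = 0; b < N; b++)
            for (int c = 0; c < N; c++)
                if (rel(a, b) && rel(b, c) && !rel(a, c))
                    return false;            /* a ~ b and b ~ c must imply a ~ c */
    return true;
}

bool is_equivalence(Relation rel)
{
    return is_reflexive(rel) && is_symmetric(rel) && is_transitive(rel);
}

/* Example relations (hypothetical, for illustration). */
bool less_equal(int a, int b)  { return a <= b; }
bool same_parity(int a, int b) { return a % 2 == b % 2; }
```

For instance, is_equivalence(less_equal) is false because the symmetry check fails (0 <= 1 holds but 1 <= 0 does not), while is_equivalence(same_parity) is true.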
Electrical connectivity, where all connections are by metal wires, is an equivalence relation.


It is clearly reflexive, since any component is connected to itself.


It is symmetric because if component a is connected to component b then b must be electrically
connected to a.
It is transitive, since if component a is connected to component b and b is connected to c then a is
connected to c.

10.3 Examples of set

(1) A = {tiger, lion, puma, cheetah, leopard, cougar, ocelot} (this is a set of large species of cats)
(2) A = {a, b, c, ..., z} (this is a set consisting of the lowercase letters of the alphabet)
(3) A = {-1, -2, -3, ...} (this is the set of negative integers)

10.4 Dynamic Equivalence Problem

The dynamic equivalence problem is to decide, for any a and b, whether a ~ b. The difficulty is
that the relation is usually defined implicitly rather than explicitly.
The equivalence class of an element a ∈ S is the subset of S that contains all the elements
that are related to a. To decide whether a ~ b, we need only check whether a and b are in the same
equivalence class. To solve the dynamic equivalence problem, the following strategy is adopted:
We want answers as soon as the relevant input data is available, so each piece of data is
processed immediately. The input is initially a collection of n sets, each with one element.
Suppose we have 1000 people; then we need 1000 sets, each containing only one person.
Are these people related to each other? No, because every person is in a different set. In this
initial representation all relations (except reflexive relations) are false: since we have made 1000
sets for 1000 people, only the reflexive relation (every person is related to himself) is true.
Mathematically speaking, each set has a different element, so Si ∩ Sj = Ø for i ≠ j, which makes
the sets disjoint. A person in one set has no relation with a person in another set; therefore their
intersection is empty. So we have 1000 sets, each containing only one person, only the reflexive
relation is true, and all 1000 sets are disjoint: the intersection of any two of them is the empty
set, i.e., they have no common member.
Two operations are permitted on these sets: find and union. In the find operation, we are
given one element (the name of a person) and asked which set it belongs to. Initially we have
1000 sets, so if asked which set person 99 is in, we can say that every person is in a separate set
and person 99 is in set 99. When we receive information about relationships between different
persons, we can start joining the sets together. This is the union operation. When we apply the
union operation to two sets, the members of both sets are combined to form a new set. The new set
contains no duplicate entries, because the two sets were disjoint. The definitions of find and
union are:


• Find returns the name of the set (equivalence class) that contains a given element, i.e.,
Si = find(a).
• Union merges two sets to create a new set Sk = Si ∪ Sj.

We give an element to the find method and it returns the name of its set. The union method
groups the members of two sets into a new set. These two operations make up the disjoint set
abstract data type. If we want to add the relation a ~ b, we first need to see whether a and b are
already related. Here a and b may be two persons, and a relation is given between them. The check
is done by performing find on both a and b: we send a to the find method and get the name of its
set, then do the same for b. If both names are the same, a and b belong to the same set, which
means they are already related.
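A minimal sketch of find, union and "add the relation a ~ b" for the 1000-person example, under our own assumption (not the chapter's) that every person simply stores the name of its set in an array. The names set_name, init_sets, find_set, union_sets and add_relation are hypothetical. With this representation find is O(1), but union has to scan all 1000 entries, which is the cost the tree representation of Section 10.5 avoids.

```c
#include <stdbool.h>

#define NUM_PEOPLE 1000              /* people are numbered 1..1000 */

/* set_name[p] is the name of the set containing person p (index 0 unused). */
static int set_name[NUM_PEOPLE + 1];

/* Initially every person is alone: person p is in set p. */
void init_sets(void)
{
    for (int p = 1; p <= NUM_PEOPLE; p++)
        set_name[p] = p;
}

/* Find: return the name of the set containing person p. */
int find_set(int p)
{
    return set_name[p];
}

/* Union: merge set sj into set si by relabelling every member of sj. */
void union_sets(int si, int sj)
{
    for (int p = 1; p <= NUM_PEOPLE; p++)
        if (set_name[p] == sj)
            set_name[p] = si;
}

/* Add the relation a ~ b. Returns false if a and b were already related. */
bool add_relation(int a, int b)
{
    int sa = find_set(a);
    int sb = find_set(b);
    if (sa == sb)
        return false;                /* same set: already related */
    union_sets(sa, sb);              /* different sets: merge them */
    return true;
}
```

After init_sets(), find_set(99) is 99; after add_relation(1, 2) and add_relation(2, 3), persons 1 and 3 are in the same set, so add_relation(3, 1) reports that they were already related.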

10.5 Disjoint Set ADT

An equivalence relation ~ over a set S can be viewed as a partitioning of S into disjoint
sets. Each set of the partition is called an equivalence class of ~ (all elements that are related).
We are given an input of n sets, each containing one element. Initially all sets are represented in
such a way that no two of them are related. This means that all the sets are disjoint:
Si ∩ Sj = Ø for i ≠ j.

10.5.1 Operations on Disjoint Set


The three permissible operations on disjoint sets are:
i. MAKE-SET: Creates a new set whose only member is the given element.
ii. FIND-SET: Returns the name of the set (equivalence class) containing a given
element.
iii. UNION: If we want to add the relation a ~ b, we first check whether a and b are already
related. This is done by performing finds on both a and b and checking whether they are
in the same equivalence class. If they are not, we apply union, which merges the two
equivalence classes containing a and b into a new equivalence class.
This is known as the union/find algorithm. The algorithm is dynamic because, as it
proceeds, the sets can change via the union operation.

A tree data structure can be used to represent a disjoint set ADT. Each set is represented
by a tree. The elements in the tree have the same root and hence the root is used to name the set.
The trees do not have to be binary since we only need a parent pointer.


Figure 10.1 - Tree representation of disjoint set ADT after Make-set operation

Figure 10.2 - Tree representation of disjoint set ADT after union (5,6)

Figure 10.3 - Tree representation of disjoint set ADT after union (7,8)

Figure 10.4 - Tree representation of disjoint set ADT after union (5,7)

Parent (0 = root):  0 0 0 0 0 5 5 7
Element:            1 2 3 4 5 6 7 8

Figure 10.5 - Element and its root node structure
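The parent array of Figure 10.5 can be reproduced with a few lines of C. This sketch assumes the elements are numbered 1 to 8 (index 0 unused) and that a 0 entry marks a root, matching the Initialize routine in Section 10.5.2; the names parent, make_sets, link_roots and print_parents are hypothetical.

```c
#include <stdio.h>

#define NUM_ELEMENTS 8               /* elements 1..8 as in Figures 10.1 to 10.5 */

/* parent[i] == 0 means i is a root; otherwise parent[i] is the parent of i. */
static int parent[NUM_ELEMENTS + 1];

/* Make-set for every element: each one becomes a one-node tree. */
void make_sets(void)
{
    for (int i = 1; i <= NUM_ELEMENTS; i++)
        parent[i] = 0;
}

/* Union of two roots: root2's tree becomes a subtree of root1. */
void link_roots(int root1, int root2)
{
    parent[root2] = root1;
}

/* Print the parent row above the element row, as in Figure 10.5. */
void print_parents(void)
{
    for (int i = 1; i <= NUM_ELEMENTS; i++)
        printf("%d ", parent[i]);
    printf("\n");
    for (int i = 1; i <= NUM_ELEMENTS; i++)
        printf("%d ", i);
    printf("\n");
}
```

Calling make_sets(), then link_roots(5, 6), link_roots(7, 8) and link_roots(5, 7) (mirroring Figures 10.2 to 10.4), and finally print_parents() prints the two rows of Figure 10.5: 0 0 0 0 0 5 5 7 above 1 2 3 4 5 6 7 8.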

10.5.2 Algorithms of Disjoint Set

Initially, after the Initialize operation, each set contains one element.


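#define Num_of_the_sets 8          // number of elements, numbered 1..Num_of_the_sets

int Disjset[Num_of_the_sets + 1];  // index 0 is unused
int Root1;
int Root2;

// Make every element a one-node tree; 0 marks a root.
void Initialize(int Disjset[])
{
    int i;
    for (i = Num_of_the_sets; i > 0; i--)
        Disjset[i] = 0;
}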

// Find returns the root of the tree containing the element x.
int Find(int x, int Disjset[])
{
    if (Disjset[x] <= 0)                    // x is a root
        return x;
    else
        return Find(Disjset[x], Disjset);   // follow the parent pointer
}

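The Find operation takes time proportional to the depth of x in its tree, which is inefficient
when the tree is unbalanced.

void Union(int Disjset[], int root1, int root2)
{
    Disjset[root2] = root1;                 // root2's tree becomes a subtree of root1
}

The union operation takes constant time, O(1), provided root1 and root2 are roots (for example,
values returned by Find).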
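To see why an unbalanced tree makes Find slow, the sketch below builds the worst case for the arbitrary Union above: performing Union(2, 1), Union(3, 2), ..., Union(8, 7) places each old root under a new singleton, so the tree degenerates into a chain. The helpers find_counting and build_chain are hypothetical; find_counting returns the same root as Find but also reports how many parent links it followed.

```c
#define N_ELEMS 8                    /* elements 1..8, index 0 unused */

static int Disjset[N_ELEMS + 1];     /* 0 marks a root, a positive entry is a parent */

/* Same result as Find, but also counts the parent links followed. */
int find_counting(int x, int *steps)
{
    *steps = 0;
    while (Disjset[x] > 0) {
        x = Disjset[x];
        (*steps)++;
    }
    return x;
}

/* State after Union(i + 1, i) for i = 1..7 with the arbitrary Union:
   each old root i becomes a child of the singleton i + 1. */
void build_chain(void)
{
    for (int i = 1; i <= N_ELEMS; i++)
        Disjset[i] = 0;              /* Initialize */
    for (int i = 1; i < N_ELEMS; i++)
        Disjset[i] = i + 1;          /* effect of Union(Disjset, i + 1, i) */
}
```

After build_chain(), a Find on element 1 follows 7 links (n - 1 for n = 8) to reach the root 8, so each such Find costs time proportional to n.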

10.6 Smart Union Algorithm

The unions in the basic tree representation were performed arbitrarily, by making the second
tree a subtree of the first. A basic improvement is to make the smaller tree a subtree of the
larger. We call this approach union-by-size.
If unions are done by size, the depth of any node is never more than log n. Note that a node is
initially at depth 0. When its depth increases as a result of a union, it is placed in a tree that is at
least twice as large as before. Thus, its depth can be increased at most log n times. This implies
that the running time for a find operation is O(log n), and a sequence of m operations takes O(m
log n).
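A sketch of union-by-size, assuming (as one common convention) that each root stores the negative of its tree's size in the same array, so a negative entry marks a root and no extra array is needed. The names find_root, union_by_size and depth are hypothetical. The test merges trees of equal size pairwise, which drives the depth up to the bound exactly: with 16 elements the deepest node ends at depth 4 = log2 16.

```c
#define N_ELEMS 16                   /* elements 1..16, index 0 unused */

/* A negative entry -s marks a root whose tree has s nodes;
   a positive entry is a parent pointer. */
static int Disjset[N_ELEMS + 1];

void init_by_size(void)
{
    for (int i = 1; i <= N_ELEMS; i++)
        Disjset[i] = -1;             /* every singleton has size 1 */
}

int find_root(int x)
{
    while (Disjset[x] > 0)
        x = Disjset[x];
    return x;
}

/* Union-by-size of two roots: the smaller tree goes under the larger one. */
void union_by_size(int root1, int root2)
{
    if (Disjset[root2] < Disjset[root1]) {   /* root2's tree is larger */
        Disjset[root2] += Disjset[root1];    /* new size = sum of old sizes */
        Disjset[root1] = root2;
    } else {                                 /* root1's tree is at least as large */
        Disjset[root1] += Disjset[root2];
        Disjset[root2] = root1;
    }
}

/* Number of parent links from x to its root. */
int depth(int x)
{
    int d = 0;
    while (Disjset[x] > 0) {
        x = Disjset[x];
        d++;
    }
    return d;
}
```

Because the size lives in the root's own entry, a union only compares and adds two numbers, which is why union-by-size costs no extra space.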


We need to keep track of the size of each tree. Let the root of each tree record the size of its
tree (initially 1 for every singleton; a common convention stores it as a negative number, -1, so
that a root can be told apart from a parent pointer). When a union is performed, check the sizes
and make the new size the sum of the old ones. Thus, union-by-size is not at all difficult to
implement and, because the size is kept in the root's own array entry, requires no extra space. It
is also fast on average: it has been shown that a sequence of m operations requires O(m) average
time if union-by-size is used. This is because when random unions are performed, small sets are
generally merged with large sets throughout the algorithm.
An alternative implementation, which also guarantees that all the trees will have depth at most
O(log n), is union-by-height (also called union-by-rank). We keep track of the height, instead of
the size, of each tree and perform unions by making the shallower tree a subtree of the deeper
tree. This is an easy algorithm, since the height of a tree increases only when two equally deep
trees are joined (and then the height goes up by one). Thus, union-by-height is a trivial
modification of union-by-size. These techniques are known as smart union algorithms.

Algorithm
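// Smart Union (union-by-height)
// A root stores 0 or a negative number, -(height of its tree); Initialize sets 0.
// root1 and root2 must both be roots.
void Union(int Disjset[], int root1, int root2)
{
    if (Disjset[root2] < Disjset[root1])        // root2's tree is deeper
        Disjset[root1] = root2;                 // make root2 the new root
    else
    {
        if (Disjset[root1] == Disjset[root2])   // equal heights: result is one level deeper
            Disjset[root1]--;
        Disjset[root2] = root1;                 // make root1 the new root
    }
}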

10.7 Path Compression

Path compression is quite simple and very effective. We use it during Find-set operations
to make each node on the find path point directly to the root. Path compression does not change
any ranks.
The Find procedure is a two-pass method: it makes one pass up the find path to find the
root, and it makes a second pass back down the find path to update each node so that it points
directly to the root.


int Find(int x, int Disjset[])
{
    if (Disjset[x] <= 0)                        // x is a root
        return x;
    else                                        // point x directly at the root
        return Disjset[x] = Find(Disjset[x], Disjset);
}
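The recursive Find above performs both passes implicitly: the recursion climbs to the root, and the assignments happen as the calls return. The iterative sketch below makes the two passes explicit. The name find_two_pass is hypothetical, and its second loop walks the path again from x towards the root (rather than downwards), which leaves every node on the path pointing at the root just the same.

```c
#define N_ELEMS 16                   /* elements 1..16, index 0 unused */

static int Disjset[N_ELEMS + 1];     /* entry <= 0 marks a root, > 0 is a parent */

int find_two_pass(int x)
{
    /* Pass 1: climb the find path to locate the root. */
    int root = x;
    while (Disjset[root] > 0)
        root = Disjset[root];

    /* Pass 2: walk the same path again, re-pointing each node at the root. */
    while (Disjset[x] > 0) {
        int next = Disjset[x];       /* remember the old parent before overwriting */
        Disjset[x] = root;
        x = next;
    }
    return root;
}
```

Starting from the chain 1 -> 2 -> ... -> 8, a single call find_two_pass(1) returns 8 and leaves elements 1 through 7 all pointing directly at 8, so later finds on any of them follow only one link.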

Figure 10.6 - Example Tree

After Find(15):

Figure 10.7 - Effect of Path Compression

10.8 Summary

One of the many applications of disjoint-set data structures arises in determining the
connected components of an undirected graph. Another application is Kruskal's algorithm, which
uses the disjoint set data structure to check whether an edge is safe (its endpoints lie in different
sets) and to eliminate unsafe edges (whose endpoints are already in the same set, so adding them
would create a cycle). Only the safe edges are added to the forest being built, which in the end
forms a minimum spanning tree.
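A compact sketch of Kruskal's algorithm built on the disjoint set operations of this chapter. The Edge type, MAX_V, by_weight and kruskal are hypothetical names; the graph is assumed to be connected, with vertices numbered 1..n and n <= MAX_V. Find uses path compression, and for brevity the union is the simple one of Section 10.5.2; a smart union would be the usual choice in practice.

```c
#include <stdlib.h>

/* An undirected edge between vertices u and v with weight w. */
typedef struct { int u, v, w; } Edge;

#define MAX_V 100

static int Disjset[MAX_V + 1];       /* 0 marks a root, as in this chapter */

/* Find with path compression. */
static int Find(int x)
{
    if (Disjset[x] <= 0)
        return x;
    return Disjset[x] = Find(Disjset[x]);
}

/* qsort comparator: ascending edge weight. */
static int by_weight(const void *a, const void *b)
{
    const Edge *ea = a, *eb = b;
    return (ea->w > eb->w) - (ea->w < eb->w);
}

/* Returns the total weight of a minimum spanning tree and stores its
   n - 1 edges in mst[]. Note: sorts edges[] in place. */
int kruskal(int n, Edge edges[], int m, Edge mst[])
{
    for (int i = 1; i <= n; i++)
        Disjset[i] = 0;                           /* n singleton sets */
    qsort(edges, m, sizeof(Edge), by_weight);     /* cheapest edges first */

    int total = 0, taken = 0;
    for (int i = 0; i < m && taken < n - 1; i++) {
        int r1 = Find(edges[i].u);
        int r2 = Find(edges[i].v);
        if (r1 != r2) {                           /* safe edge: different sets */
            Disjset[r2] = r1;                     /* union the two roots */
            mst[taken++] = edges[i];
            total += edges[i].w;
        }                                         /* else it would close a cycle */
    }
    return total;
}
```

For the 4-vertex graph with edges (1,2,1), (2,3,2), (1,3,3), (3,4,4), (1,4,5), written as (u, v, weight), kruskal returns 7; the edge (1,3,3) is rejected as unsafe because Find reports that 1 and 3 are already in the same set.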


10.9 Key Terms

Disjoint set, Equivalence Relation, Dynamic Equivalence Problem, Find, Smart Union
Algorithm, Path Compression Algorithm

10.10 Review Questions

1. Define Disjoint set.


2. Describe equivalence relations.
3. Explain smart union and path compression in detail.
