
The Disjoint Set ADT

Chapter 8

Disjoint Set ADT


This chapter introduces the disjoint set ADT. The disjoint set involves two basic operations, union and find, and so its algorithm is often called the union/find algorithm. The author introduces the disjoint set ADT with a discussion of equivalence relations and the dynamic equivalence problem.

Equivalence Relations
A relation R is defined on a set S if for every pair a, b in S, a R b is either true or false. An equivalence relation is a relation R that has 3 properties:
Reflexive: a R a for all a in S
Symmetric: a R b if and only if b R a
Transitive: a R b and b R c => a R c

Equivalence Relations
Is the <= operator an equivalence relation?
It is reflexive (a<=a).
It is transitive (a<=b and b<=c means a<=c).
But it is not symmetric (a<=b does not mean b<=a).

So <= is not an equivalence relation.
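
The three properties can be checked mechanically on a small finite set. The following is a minimal sketch, not part of the original slides: the class name RelationCheck, its method names, and the sameParity relation are hypothetical and chosen only for illustration. It tests a relation by brute force over the elements 0 to n-1.

```java
import java.util.function.BiPredicate;

class RelationCheck {
    // Each check is a brute-force scan over the elements 0..n-1.
    static boolean isReflexive(int n, BiPredicate<Integer, Integer> r) {
        for (int a = 0; a < n; a++)
            if (!r.test(a, a)) return false;
        return true;
    }

    static boolean isSymmetric(int n, BiPredicate<Integer, Integer> r) {
        for (int a = 0; a < n; a++)
            for (int b = 0; b < n; b++)
                if (r.test(a, b) != r.test(b, a)) return false;
        return true;
    }

    static boolean isTransitive(int n, BiPredicate<Integer, Integer> r) {
        for (int a = 0; a < n; a++)
            for (int b = 0; b < n; b++)
                for (int c = 0; c < n; c++)
                    if (r.test(a, b) && r.test(b, c) && !r.test(a, c)) return false;
        return true;
    }

    static boolean isEquivalence(int n, BiPredicate<Integer, Integer> r) {
        return isReflexive(n, r) && isSymmetric(n, r) && isTransitive(n, r);
    }

    public static void main(String[] args) {
        BiPredicate<Integer, Integer> lessOrEqual = (a, b) -> a <= b;
        // Hypothetical second relation: a and b are both even or both odd.
        BiPredicate<Integer, Integer> sameParity = (a, b) -> a % 2 == b % 2;
        System.out.println(isEquivalence(5, lessOrEqual)); // false: <= is not symmetric
        System.out.println(isEquivalence(5, sameParity));  // true
    }
}
```

A brute-force check like this only covers the finite set it is given; it illustrates the definitions rather than proving anything about a relation in general.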

Examples
Is electrical connectivity between components an equivalence relation?
It is reflexive since a component is connected to itself.
It is symmetric since if a is connected to b then b is connected to a.
It is transitive since if a connects to b and b connects to c then a has connectivity to c.

So, electrical connectivity is an equivalence relation.



Examples
Is travel between cities in a country an equivalence relation?
It is reflexive because you may travel from a city to itself.
It is symmetric, assuming all roads are two-way, because traveling from a to b implies travel is possible from b to a.
It is transitive because traveling from a to b and from b to c implies travel from a to c.

So, travel between cities in a country is an equivalence relation.



Dynamic Equivalence
We use ~ to denote an equivalence relation. We would like to decide whether a~b for any a, b. This could be done in constant time with a 2D array of Boolean values: looking up the entry for (a, b) gives either true or false, telling us whether a~b.

Dynamic Equivalence
A 2D array would contain all of the relation information explicitly. However, the data may not come to us in this form; the relation is often given implicitly. For example, a1~a2, a3~a4, a5~a1, a4~a2 implies that all pairs in {a1, a2, a3, a4, a5} are related. We would like to be able to determine this quickly.

Dynamic Equivalence
The equivalence class of an element a is the subset of S containing all elements related to a. The equivalence classes partition S: every element of S belongs to exactly one class. So, to know if a~b, we only need to check whether a and b belong to the same equivalence class.

Dynamic Equivalence
We start with a collection of N sets, each with one element, and no relation between distinct elements. Since no element appears in more than one set, the sets are disjoint. We then define two operations:
Find: returns the name of the set (equivalence class) containing a given element.
Union: merges two equivalence classes into one.

Dynamic Equivalence
The operations on the sets do not involve comparing the relative values of their elements. For this reason, the values of the elements in the sets are simply representative values and can be numbered 0 to N-1. Actual data items would need to be mapped to these values for an application to use them.
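
One way an application might do this mapping is with a hash table that hands out the next unused number the first time an item is seen. The sketch below is an illustration only: the class ElementIndex, its methods, and the component names are hypothetical and do not appear in the slides.

```java
import java.util.HashMap;
import java.util.Map;

class ElementIndex {
    private final Map<String, Integer> indexOf = new HashMap<>();

    // Returns the number for item, assigning the next unused number on first sight.
    int idFor(String item) {
        Integer id = indexOf.get(item);
        if (id == null) {
            id = indexOf.size(); // numbers are handed out as 0, 1, 2, ...
            indexOf.put(item, id);
        }
        return id;
    }

    // Number of distinct items seen so far.
    int size() {
        return indexOf.size();
    }

    public static void main(String[] args) {
        ElementIndex index = new ElementIndex();
        System.out.println(index.idFor("resistor"));  // 0
        System.out.println(index.idFor("capacitor")); // 1
        System.out.println(index.idFor("resistor"));  // 0 again
    }
}
```

The numbers returned by idFor can then be used directly as element values in the disjoint set structure.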

Dynamic Equivalence
The find operation returns the name of a set, but the name is somewhat arbitrary since we merely wish to know if find(a) == find(b).


Dynamic Equivalence
An array could be used by letting the index represent the element and the value represent the name of the set it belongs to. A find could then be done in O(1) time. However, a union would take O(N) time, since it would need to scan the array, changing every element of one set to the merged set's name.
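
The trade-off described above can be sketched as follows. This is a minimal illustration, not code from the slides; the class name QuickFindSets and the field name are assumptions.

```java
class QuickFindSets {
    private final int[] name; // name[i] is the name of the set containing element i

    QuickFindSets(int numElements) {
        name = new int[numElements];
        for (int i = 0; i < numElements; i++)
            name[i] = i; // each element starts in its own set
    }

    // O(1): a single array lookup.
    int find(int x) {
        return name[x];
    }

    // O(N): every element of b's set is renamed to a's set name.
    void union(int a, int b) {
        int from = name[b], to = name[a];
        if (from == to) return; // already in the same set
        for (int i = 0; i < name.length; i++)
            if (name[i] == from) name[i] = to;
    }

    public static void main(String[] args) {
        QuickFindSets sets = new QuickFindSets(5);
        sets.union(0, 1);
        sets.union(2, 3);
        sets.union(1, 3);
        System.out.println(sets.find(0) == sets.find(2)); // true
        System.out.println(sets.find(0) == sets.find(4)); // false
    }
}
```

Here the scan inside union is what makes a sequence of N-1 unions cost O(N^2) in total, which motivates the tree representation.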

Basic Data Structure


We can use a tree to represent a set. The root node can act as the name of the set. Other members of the set will be children of the root. We can store the tree in an array in the following way:
If i is a root, then let s[i] = -1.
If i is not a root, then let s[i] = parent of i.

Trees
Initially, 0, 1, 2, and 3 are all roots:

i:     0   1   2   3
s[i]: -1  -1  -1  -1

After Union(2,3), 3 has 2 as its parent:

i:     0   1   2   3
s[i]: -1  -1  -1   2

Trees
So, to perform a union of two sets, we merge two trees in the array by making one tree's root a child of the other tree's root. This takes O(1) time (constant time). Each set is stored as a separate tree. A collection of trees is called a forest.


Find(x)
The find(x) command can return the root of the tree containing x. Because a tree may be N-1 elements deep, the running time of find is O(N). So, a series of M operations could take O(MN).


public class DisjSets
{
    /**
     * Construct the disjoint sets object.
     * @param numElements the initial number of disjoint sets.
     */
    public DisjSets( int numElements )
    {
        s = new int [ numElements ];
        for( int i = 0; i < s.length; i++ )
            s[ i ] = -1;
    }

    /**
     * Union two disjoint sets.
     * Assume root1 and root2 are distinct and represent set names.
     * @param root1 the root of set 1.
     * @param root2 the root of set 2.
     */
    public void union( int root1, int root2 )
    {
        s[ root2 ] = root1;
    }

    /**
     * Perform a find. Error checks omitted again for simplicity.
     * @param x the element being searched for.
     * @return the set containing x.
     */
    public int find( int x )
    {
        if( s[ x ] < 0 )
            return x;
        else
            return find( s[ x ] );
    }

    private int [ ] s;
}
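
Because union expects two roots, a caller that starts from arbitrary elements has to call find on each element first. The sketch below applies this to the earlier example a1~a2, a3~a4, a5~a1, a4~a2, with a1 to a5 mapped to 0 to 4. It is a compact, self-contained illustration: the class EquivalenceDemo and the helpers relate and related are hypothetical names, not part of DisjSets.

```java
import java.util.Arrays;

class EquivalenceDemo {
    private final int[] s; // same layout as DisjSets: -1 for a root, else the parent

    EquivalenceDemo(int numElements) {
        s = new int[numElements];
        Arrays.fill(s, -1);
    }

    int find(int x) {
        return s[x] < 0 ? x : find(s[x]);
    }

    // Hypothetical helper: records a ~ b by unioning the roots of their sets.
    void relate(int a, int b) {
        int rootA = find(a), rootB = find(b);
        if (rootA != rootB)
            s[rootB] = rootA; // same assignment as union(rootA, rootB)
    }

    // Hypothetical helper: a ~ b exactly when both finds return the same root.
    boolean related(int a, int b) {
        return find(a) == find(b);
    }

    public static void main(String[] args) {
        EquivalenceDemo sets = new EquivalenceDemo(5); // a1..a5 mapped to 0..4
        sets.relate(0, 1); // a1 ~ a2
        sets.relate(2, 3); // a3 ~ a4
        sets.relate(4, 0); // a5 ~ a1
        sets.relate(3, 1); // a4 ~ a2
        System.out.println(sets.related(2, 4)); // true: a3 ~ a5 follows implicitly
    }
}
```

The check for rootA != rootB matters: passing the same root twice to the basic union would make a root its own parent and find would never terminate.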


Smart Union Algorithms


We can improve the union operation by making the tree with fewer nodes be a child of the tree having more nodes. This is called union-by-size. To make tree size comparisons easy, the array entry for a root can store the negative of the tree size rather than -1.

Union-by-size: union(2,4)

[Figure: the forest over elements 0 to 7 before and after union(2,4).]

Merge the tree with fewer nodes into the tree with more nodes.

Union-by-size:

i:     0   1   2   3   4   5   6   7
s[i]: -2   0   4   2  -6   4   4   6

0 and 4 are roots. -2 and -6 indicate 2 nodes and 6 nodes.


Smart Union Algorithms


Union by size will cause the depth of a node to be no more than log N. To see this, consider that every node is initially in a tree of depth 0. If a union causes a node's depth to grow, it is because its tree has been merged into a tree at least as large, so the node is now in a tree with at least twice as many nodes as before. Since a tree never has more than N nodes, this can happen at most log N times.

Watch the depth of element 7 as unions are performed:

union(0,1); union(2,3); union(4,5); union(6,7);
union(0,2); union(4,6);
union(0,4);

[Figure: the forest after each round of unions.]

There are 8 elements, and element 7 has depth 3: log 8 = 3
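
The sequence of unions above can be replayed in code. The following is a minimal sketch of union-by-size under the convention that a root stores the negative of its tree size; the class name SizedSets, the depth helper, and the choice to keep root1 as the root when sizes tie are assumptions made for illustration.

```java
import java.util.Arrays;

class SizedSets {
    private final int[] s; // root: negative tree size; otherwise: parent index

    SizedSets(int numElements) {
        s = new int[numElements];
        Arrays.fill(s, -1); // every element is the root of a 1-node tree
    }

    int find(int x) {
        return s[x] < 0 ? x : find(s[x]);
    }

    // Assumes root1 and root2 are distinct roots, as in the basic union.
    void union(int root1, int root2) {
        if (s[root2] < s[root1]) { // root2's tree is larger (more negative)
            s[root2] += s[root1];  // add the sizes
            s[root1] = root2;      // smaller tree goes under root2
        } else {                   // root1's tree is larger, or the sizes tie
            s[root1] += s[root2];
            s[root2] = root1;
        }
    }

    // Number of parent links between x and its root.
    int depth(int x) {
        int d = 0;
        while (s[x] >= 0) {
            x = s[x];
            d++;
        }
        return d;
    }

    public static void main(String[] args) {
        SizedSets sets = new SizedSets(8);
        sets.union(0, 1); sets.union(2, 3); sets.union(4, 5); sets.union(6, 7);
        sets.union(0, 2); sets.union(4, 6);
        sets.union(0, 4);
        System.out.println(sets.depth(7)); // 3
    }
}
```

Every union in this sequence joins two trees of equal size, which is the worst case for depth growth: each one adds a level to the tree that ends up underneath.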


Smart Union Algorithms


Another smart way to do the union is by height, which means making the shorter tree a subtree of the taller tree. This also keeps the tree depth at O(log N), and thus the find operation will be O(log N). Therefore, a series of M operations would take O(M log N) with either algorithm.
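
A minimal sketch of union-by-height follows, assuming a root stores -(height + 1) so that a one-node tree is stored as -1. The class name HeightSets and the toArray helper are hypothetical. The main method replays a sequence of unions ending in union(2,4) and prints the resulting array.

```java
import java.util.Arrays;

class HeightSets {
    private final int[] s; // root: -(height + 1); otherwise: parent index

    HeightSets(int numElements) {
        s = new int[numElements];
        Arrays.fill(s, -1); // every element is a root of height 0
    }

    int find(int x) {
        return s[x] < 0 ? x : find(s[x]);
    }

    // Assumes root1 and root2 are distinct roots.
    void union(int root1, int root2) {
        if (s[root2] < s[root1]) {
            s[root1] = root2;   // root2 is taller: it stays the root
        } else {
            if (s[root1] == s[root2])
                s[root1]--;     // equal heights: the result is one level taller
            s[root2] = root1;   // root1 is at least as tall: it stays the root
        }
    }

    // Copy of the underlying array, for inspection only.
    int[] toArray() {
        return s.clone();
    }

    public static void main(String[] args) {
        HeightSets sets = new HeightSets(8);
        sets.union(0, 1); sets.union(2, 3); sets.union(4, 5); sets.union(6, 7);
        sets.union(4, 6);
        sets.union(2, 4); // the tree rooted at 2 is shorter, so 2 goes under 4
        System.out.println(Arrays.toString(sets.toArray()));
        // [-2, 0, 4, 2, -3, 4, 4, 6]
    }
}
```

The height only changes when two trees of equal height are joined, so this version updates a stored value less often than union-by-size does.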

Union-by-height: union(2,4)

[Figure: the forest over elements 0 to 7 before and after union(2,4).]

Merge the tree with lesser height into the tree with greater height.

Union-by-height:

i:     0   1   2   3   4   5   6   7
s[i]: -2   0   4   2  -3   4   4   6

0 and 4 are roots. -2 and -3 indicate heights 1 and 2. The stored value is -(height + 1): a 1-node tree has height 0, and since 0 is not negative, -1 is used for it.

Path Compression
Path compression is done to make finds faster. When a find(x) is performed, every node on the path from x to the root is made a child of the root. Future finds on these nodes are thus faster. This turns out to be quite easy to do.


After Find(4)

[Figure: the tree before and after find(4). After the find, every node on the path from 4 to the root 0 is a child of 0.]

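find(4)

public int find(int x)
{
    if (s[x] < 0)
        return x;
    else
        return s[x] = find(s[x]);
}

find(4) calls find(3), find(2), find(1), and find(0), which returns 0. As the calls return:
s[1] = 0
s[2] = 0
s[3] = 0
s[4] = 0

Before find(4):

i:     0   1   2   3   4   5
s[i]: -1   0   1   2   3   4

After find(4):

i:     0   1   2   3   4   5
s[i]: -1   0   0   0   0   4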
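
The find(4) example can be checked directly on the six-element chain. The sketch below is an illustration only: it passes the array as a parameter so that it is self-contained, and the class name CompressionDemo and the iterative variant findIterative are assumptions, not part of the slides. The iterative variant is a two-pass alternative that produces the same result without recursion.

```java
import java.util.Arrays;

class CompressionDemo {
    // Recursive find with path compression, as in the slide's find.
    static int find(int[] s, int x) {
        if (s[x] < 0)
            return x;
        return s[x] = find(s, s[x]); // re-point x at the root as the call returns
    }

    // Two-pass alternative: locate the root, then re-point every node on the path.
    static int findIterative(int[] s, int x) {
        int root = x;
        while (s[root] >= 0)
            root = s[root];
        while (s[x] >= 0) {
            int next = s[x];
            s[x] = root;
            x = next;
        }
        return root;
    }

    public static void main(String[] args) {
        int[] s = {-1, 0, 1, 2, 3, 4}; // the chain 5 -> 4 -> 3 -> 2 -> 1 -> 0
        System.out.println(find(s, 4));         // 0
        System.out.println(Arrays.toString(s)); // [-1, 0, 0, 0, 0, 4]
    }
}
```

Element 5 is not on the path from 4 to the root, so it keeps 4 as its parent; a later find(5) would compress it as well.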

Performance
When path compression is used with a smart union algorithm, any sequence of M union/find operations takes O(M log* N) time, where M = Ω(N). log* N is the number of times the log (base 2) must be applied to N until the result is <= 1. Example: log* 65536 = 4, because log 65536 = 16, log 16 = 4, log 4 = 2, log 2 = 1.

Performance
log* 2^65536 = 5, and 2^65536 is a number with roughly 20,000 digits. So, log* N grows extremely slowly. Because log* N grows so slowly, the performance is almost linear across a series of operations.
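
The two values quoted above can be reproduced with a short program. This is a sketch under one stated assumption: it applies the ceiling of the base-2 logarithm at each step, computed exactly with BigInteger, which matches the real-valued definition whenever every intermediate value is a power of two, as in both examples. The class and method names are hypothetical.

```java
import java.math.BigInteger;

class LogStar {
    // Counts how many times ceil(log2) must be applied until the value is <= 1.
    static int logStar(BigInteger n) {
        int count = 0;
        while (n.compareTo(BigInteger.ONE) > 0) {
            // For n >= 2, (n - 1).bitLength() equals ceil(log2(n)).
            n = BigInteger.valueOf(n.subtract(BigInteger.ONE).bitLength());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(logStar(BigInteger.valueOf(65536)));     // 4
        System.out.println(logStar(BigInteger.ONE.shiftLeft(65536))); // 5, for 2^65536
    }
}
```

Exact integer arithmetic is used here instead of Math.log because floating-point rounding near 1 could add or drop an iteration.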


Maze Generation
Union/find operations can aid in the construction of a maze. Suppose the maze is to be created so that there is a path from the upper left cell to the lower right cell. Further suppose that there is a path to any cell, meaning all cells are connected (this results in many false paths).

Maze Generation
We could begin with each cell in a different set. We then randomly pick a wall between two adjacent cells. If the two cells are not yet connected (they are in different sets), we knock down the wall and union the two sets. We continue until all cells are connected, implying a path from the upper left to the lower right cell.
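
A minimal sketch of this procedure is shown below. It is an illustration under stated assumptions: cells are numbered row by row from 0, "picking walls at random" is implemented by shuffling the list of interior walls once, and the find uses path compression with a plain union. The class name MazeSketch, the generate method, and its seed parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class MazeSketch {
    // Find with path compression over the array s.
    static int find(int[] s, int x) {
        return s[x] < 0 ? x : (s[x] = find(s, s[x]));
    }

    // Returns the removed walls, each as a pair {cellA, cellB} of adjacent cells.
    static List<int[]> generate(int rows, int cols, long seed) {
        List<int[]> walls = new ArrayList<>();
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++) {
                int cell = r * cols + c;
                if (c + 1 < cols) walls.add(new int[] {cell, cell + 1});    // wall to the right
                if (r + 1 < rows) walls.add(new int[] {cell, cell + cols}); // wall below
            }
        Collections.shuffle(walls, new Random(seed)); // random order of walls

        int[] s = new int[rows * cols];
        Arrays.fill(s, -1); // each cell starts in its own set
        List<int[]> removed = new ArrayList<>();
        for (int[] wall : walls) {
            if (removed.size() == rows * cols - 1)
                break; // all cells are already in one set
            int rootA = find(s, wall[0]), rootB = find(s, wall[1]);
            if (rootA != rootB) {  // the two cells are not yet connected
                s[rootB] = rootA;  // union the two sets
                removed.add(wall); // knock down the wall
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<int[]> removed = generate(5, 5, 42L);
        System.out.println(removed.size()); // 24: one wall per union, N - 1 unions
    }
}
```

Since a wall is removed only when it joins two different sets, exactly N - 1 walls come down for N cells and there is a unique path between any two cells.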


[Figure: a maze grid with numbered cells, shown at successive stages as walls are knocked down and the sets of connected cells are merged.]

End of Slides
