Science Fair Project Encyclopedia
Disjoint-set data structure
Given a set of elements, it is often useful to break them up or partition them into a number of separate, nonoverlapping groups. A disjoint-set data structure is a data structure that keeps track of such a partitioning. A union-find algorithm is an algorithm that performs two useful operations on such a data structure:
- Find: Determine which group a particular element is in. Also useful for determining if two elements are in the same group.
- Union: Combine or merge two groups into a single group.
Because it supports these two operations, a disjoint-set data structure is sometimes called a merge-find set. The other important operation, MakeSet, which makes a group containing only a given element (a singleton), is generally trivial. With these three operations, many practical partitioning problems can be solved (see the Applications section).
In order to define these operations more precisely we need some way of representing the groups. One common approach is to select a fixed element of each group, called its representative, to represent the group as a whole. Then, Find(x) returns the representative of the group that x belongs to, and Union takes two group representatives as its arguments.
Disjoint-set linked lists
Perhaps the simplest approach to creating a disjoint-set data structure is to create a linked list for each group. We choose the element at the head of the list as the representative.
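MakeSet is obvious: it creates a list of one element. Union simply appends one list to the other, a constant-time operation provided each list keeps a pointer to its tail. Unfortunately, Find requires Ω(n), or linear, time in the worst case with this approach, because it may have to traverse the list to reach the head.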
We can avoid this by including in each linked list node a pointer to the head of the list; then Find takes constant time. However, we've now ruined the time of Union, which has to go through the elements of the list being appended to make them point to the head of the new combined list, requiring Ω(n) time.
We can ameliorate this by always appending the shorter list to the longer one, a strategy called the weighted-union heuristic. To do this efficiently, we must also keep track of the length of each list as we perform operations. With this heuristic, a sequence of m MakeSet, Union, and Find operations on n elements requires O(m + n log n) time, because an element's head pointer is updated only when the list containing it at least doubles in length, which can happen at most log n times. To make any further progress, we need to start over with a different data structure.
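The linked-list scheme with head pointers and weighted union can be sketched in Python as follows. This is only an illustration under stated assumptions: Python lists and dictionaries stand in for the linked lists and per-node head pointers, and the class name WeightedListSets and its fields are invented for this sketch.

```python
class WeightedListSets:
    """Disjoint sets kept as lists, with a head pointer stored per element.

    Hypothetical names; Python lists stand in for the linked lists.
    """

    def __init__(self):
        self.head = {}     # element -> representative (head of its list)
        self.members = {}  # representative -> elements of its list, head first

    def make_set(self, x):
        self.head[x] = x
        self.members[x] = [x]

    def find(self, x):
        # Constant time: every element stores its head directly.
        return self.head[x]

    def union(self, x, y):
        a, b = self.find(x), self.find(y)
        if a == b:
            return a
        # Weighted union: always relabel the elements of the shorter list.
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a
        for e in self.members[b]:
            self.head[e] = a
        self.members[a].extend(self.members.pop(b))
        return a
```

The cost of each union is proportional to the length of the shorter list, which is where the n log n term in the bound comes from.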
Disjoint-set forests
In a disjoint-set forest, each set is represented by a tree data structure where each node holds a reference to its parent node. The representative of each set is the root of that set's tree. Find simply follows parent nodes until it reaches the root. Union combines two trees into one by attaching one to the root of the other. One way of implementing these might be:
    function MakeSet(x)
        x.parent := null

    function Find(x)
        p := x
        while p.parent ≠ null
            p := p.parent
        return p

    function Union(x, y)
        y.parent := x
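As a rough illustration, the pseudocode above might be rendered in Python as follows. The Node class, its label field, and the depth helper are names assumed for this sketch. Unlike the pseudocode, union here looks up the two roots first, so it can be called on arbitrary elements rather than only on representatives.

```python
class Node:
    def __init__(self, label):
        self.label = label
        self.parent = None  # MakeSet: a new node is the root of its own tree


def find(x):
    # Follow parent references until the root is reached.
    while x.parent is not None:
        x = x.parent
    return x


def union(x, y):
    # Attach the root of y's tree beneath the root of x's tree.
    rx, ry = find(x), find(y)
    if rx is not ry:
        ry.parent = rx


def depth(x):
    # Number of parent links between x and its root (helper for this sketch).
    steps = 0
    while x.parent is not None:
        x = x.parent
        steps += 1
    return steps
```

Calling union(nodes[i], nodes[i - 1]) for i = 1, 2, ..., n - 1 on n fresh nodes builds a single chain, so a later find on nodes[0] has to follow n - 1 links.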
In this naive form, this approach is no better than the linked-list approach, because the tree it creates can be highly unbalanced, but it can be enhanced in two ways.
The first way, called union by rank, is to always attach the smaller tree to the root of the larger tree, rather than vice versa. To evaluate which tree is larger, we use a simple heuristic called rank: one-element trees have a rank of zero, and whenever two trees of the same rank are unioned together, the result has a rank one greater. A tree of rank r therefore contains at least 2^r elements, so no tree is taller than log n. Applying this technique alone yields a worst-case running time of O(log n) per MakeSet, Union, or Find operation. Here are the improved MakeSet and Union:
    function MakeSet(x)
        x.parent := null
        x.rank := 0

    function Union(x, y)
        if x.rank > y.rank
            y.parent := x
        else if x.rank < y.rank
            x.parent := y
        else if x.rank = y.rank
            y.parent := x
            x.rank := x.rank + 1
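A minimal Python sketch of union by rank is given below, assuming a hypothetical Node class with parent and rank fields. As an addition to the pseudocode, union locates the roots itself, so it accepts any two elements.

```python
class Node:
    def __init__(self, label):
        self.label = label
        self.parent = None  # MakeSet: root of a one-element tree
        self.rank = 0       # one-element trees have rank zero


def find(x):
    while x.parent is not None:
        x = x.parent
    return x


def union(x, y):
    rx, ry = find(x), find(y)
    if rx is ry:
        return rx
    # Make rx the root of higher (or equal) rank; it becomes the new root.
    if rx.rank < ry.rank:
        rx, ry = ry, rx
    ry.parent = rx
    # The rank grows only when two trees of equal rank are merged.
    if rx.rank == ry.rank:
        rx.rank += 1
    return rx
```

A sequence of unions that would produce a long chain without ranks, such as union(nodes[i], nodes[i - 1]) for increasing i, now keeps every node within one step of the root.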
The second improvement, called path compression, is a way of flattening the structure of the tree whenever we use Find on it. The idea is that each node we visit on our way to a root node may as well be attached directly to the root node; they all share the same representative. To effect this, we make one traversal up to the root node, to find out what it is, and then make another traversal, making this root node the immediate parent of all nodes along the path. The resulting tree is much flatter, speeding up future operations not only on these elements but on those referencing them, directly or indirectly. Here is the improved Find:
    function Find(x)
        p := x
        while p.parent ≠ null    // Find root
            p := p.parent
        root := p
        p := x
        while p ≠ root           // Update parents
            next := p.parent
            p.parent := root
            p := next
        return root
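The two-pass Find above, combined with union by rank, can be sketched in Python over the integer elements 0 to n - 1. The array layout is an assumption of this sketch: a root points at itself rather than at null, and the class name DisjointSet is illustrative only.

```python
class DisjointSet:
    """Array-based forest with union by rank and path compression (sketch)."""

    def __init__(self, n):
        self.parent = list(range(n))  # a root is its own parent
        self.rank = [0] * n

    def find(self, x):
        # First pass: walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx
```

After union(0, 1), union(2, 3), and union(0, 2), element 3 sits two links below the root 0; a single find(3) rewrites its parent to 0.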
These two techniques complement each other; applied together, the amortized time per operation is only O(α(n)), where α(n) is the inverse of the function f(n) = A(n, n) and A is the extremely fast-growing Ackermann function. Because f grows so quickly, its inverse α(n) is less than 5 for all remotely practical values of n. Thus, the amortized running time per operation is effectively a small constant; we couldn't ask for much better than this.
In fact, we can't get better than this: Fredman and Saks showed in 1989 that Ω(α(n)) words must be accessed by any disjoint-set data structure per operation on average.
Applications
Disjoint-set data structures arise naturally in many applications, particularly where some kind of partitioning or equivalence relation is involved, and this section discusses some of them.
Finding the connected components of an undirected graph
Initially, we assume that every vertex in the graph is in its own connected component, and is not connected to any other vertex. To represent this, we use MakeSet to initially make a set for each vertex containing only that vertex.
Next, we simply visit each vertex and use Union to merge its set with the sets of all its neighbors. Once this is done, we will have one group for each connected component, and can use Find to determine which connected component a particular vertex is in, or whether two vertices are in the same connected component.
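The procedure just described can be sketched in Python as follows. Several details are assumptions of this sketch rather than part of the description: vertices are numbered 0 to num_vertices - 1, the graph is given as a list of edges (visiting every edge is equivalent to visiting every vertex's neighbors), and find uses path halving, a one-pass shortcut, instead of the two-pass compression.

```python
def connected_components(num_vertices, edges):
    """Group the vertices 0..num_vertices-1 into connected components."""
    parent = list(range(num_vertices))  # MakeSet for every vertex

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving: skip one level
            v = parent[v]
        return v

    # Union the endpoints of every edge.
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[rv] = ru

    # Collect the vertices by representative.
    groups = {}
    for v in range(num_vertices):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())
```

For example, connected_components(6, [(0, 1), (1, 2), (3, 4)]) returns [[0, 1, 2], [3, 4], [5]].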
Computing shorelines of a terrain
When computing the contours of a 3D surface, one of the first steps is to compute the "shorelines," which surround local minima or "lake bottoms." We imagine we are sweeping a plane, which we refer to as the "water level," from below the surface upwards. We will form a series of contour lines as we move upwards, categorized by which local minima they contain. In the end, we will have a single contour containing all local minima.
Whenever the water level rises just above a new local minimum, it creates a small "lake," a new contour line that surrounds the local minimum; this is done with the MakeSet operation.
As the water level continues to rise, it may touch a saddle point, or "pass." When we reach such a pass, we follow the steepest downhill route from it on each side until we arrive at a local minimum. We use Find to determine which contours surround these two local minima, then use Union to combine them. Eventually, all contours will be combined into one, and we are done.
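The rising-water idea can be illustrated with a deliberately simplified, one-dimensional analogue; this is not the contour algorithm itself. The sketch assumes a row of cells with distinct heights, floods them in order of increasing height, and merges a newly flooded cell with any already flooded neighbor instead of tracing steepest-descent routes from a pass. The function name lakes_over_time is invented for the example.

```python
def lakes_over_time(heights):
    """Return (water_level, number_of_lakes) after each cell is flooded.

    Simplified 1D analogue; assumes all heights are distinct.
    """
    n = len(heights)
    parent = [None] * n  # None marks a cell that is still dry

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    lakes = 0
    history = []
    for i in sorted(range(n), key=lambda k: heights[k]):
        parent[i] = i  # MakeSet: the cell starts as its own small lake
        lakes += 1
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # Union: two lakes meet at this cell
                    lakes -= 1
        history.append((heights[i], lakes))
    return history
```

For the profile [3, 1, 4, 0, 5, 2], the number of lakes rises to three as the three local minima are reached and then falls back to one as the higher cells join them together.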
History
While the ideas used in disjoint-set forests have long been familiar, Robert Tarjan was the first to prove the upper bound (and a restricted version of the lower bound) in terms of the inverse Ackermann function. Until this time the best bound on the time per operation, proven by Hopcroft and Ullman, was O(log* n), the iterated logarithm of n, another slowly-growing function (but not quite as slow as the inverse Ackermann function). Tarjan and van Leeuwen also developed one-pass Find algorithms that are more efficient in practice. The algorithm was made well-known by the popular textbook Introduction to Algorithms, listed in the references.
References
- Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms, 2nd ed., Chapter 21. ISBN 0262032937.
External links
- Compaq Research: Zeus: Animation of Union-Find Algorithms
- Java applet: A Graphical Union-Find Implementation, by Rory L. P. McGuire
- The abstract data type Union-Find, a simple C implementation by Vašek Chvátal
- Wait-free Parallel Algorithms for the Union-Find Problem, a 1994 paper by Richard J. Anderson and Heather Woll describing a parallelized version of Union-Find that never needs to block
- Gont, a little-known programming language with Union-Find in its standard library
The contents of this article are licensed from www.wikipedia.org under the GNU Free Documentation License.