Disjoint-Set Data Structures and Union-Find Explained | Chapter 19 in Introduction to Algorithms
Disjoint-Set Data Structures and Union-Find Explained | Chapter 19 in introduction to algorithms
Disjoint-set data structures are essential tools for efficiently managing collections of non-overlapping sets, enabling fast set partitioning, connectivity checks, and core graph algorithms. Chapter 19 of introduction to algorithms explores the inner workings of union-find structures, their practical applications, and the performance-boosting heuristics that make them indispensable in computer science.

Watch the chapter summary below for an accessible breakdown of disjoint-set structures and their algorithmic uses. Subscribe to Last Minute Lecture for more introduction to algorithms tutorials!
Core Disjoint-Set Operations
- MAKE-SET(x): Creates a new set containing element x.
- FIND-SET(x): Returns a pointer to the representative of the set containing x.
- UNION(x, y): Merges the sets containing x and y.
A representative is any consistent member that uniquely identifies a set. These operations are the backbone of many graph algorithms, including connected components, minimum spanning trees, and offline lowest common ancestor (LCA) computations.
Implementations and Efficiency
-
Linked-List Representation: Each set is a linked list.
MAKE-SET
andFIND-SET
are O(1), butUNION
can be costly (up to O(n) per operation) unless using the weighted-union heuristic (append the shorter list to the longer, limiting pointer updates). -
Disjoint-Set Forests: Represent sets as trees, with each node pointing to its parent.
MAKE-SET
is O(1), whileFIND-SET
andUNION
leverage tree structure for efficiency. - Union by Rank: Attach the shorter tree to the taller, using a rank attribute to approximate height, keeping the tree shallow.
-
Path Compression: Flattens the tree during
FIND-SET
by making each visited node point directly to the root, further speeding up future queries.
Amortized Performance and Inverse Ackermann
With both union by rank and path compression, a sequence of m operations on n elements runs in O(m × α(n)) time, where α(n) is the extremely slow-growing inverse Ackermann function (less than 5 for any input size you'll ever encounter). This means disjoint-set operations are practically linear in all real-world scenarios.
Applications in Algorithms
- Connected Components: Quickly determine which vertices belong to the same component in a graph by processing edges with
UNION
andFIND-SET
. - Minimum Spanning Trees (MST): Used in Kruskal's algorithm to avoid cycles while building the MST.
- Offline Lowest Common Ancestors (Tarjan’s Algorithm): Computes LCAs efficiently for multiple queries.
- Offline Minimum Problem: Solve minimum extraction in a sequence using disjoint-set techniques.
Glossary of Key Terms
- Disjoint Set: Collection of non-overlapping sets
- MAKE-SET / FIND-SET / UNION: Core set operations
- Representative: Distinguished member identifying a set
- Union by Rank: Merge by tree height approximation
- Path Compression: Flattens trees to speed up future finds
- α(n): Inverse Ackermann function; grows slower than any practical function
- Tarjan’s Algorithm: Efficient LCA computation using disjoint sets
Conclusion: Powering Efficient Set Partitioning
Disjoint-set data structures, especially with union by rank and path compression, are key for solving connectivity and partitioning problems in graphs with near-linear efficiency. Their clever use of pointers and heuristics enables scalable, high-speed algorithms for networks, databases, and beyond. For more algorithmic deep-dives, subscribe to Last Minute Lecture and follow the full introduction to algorithms series!
Comments
Post a Comment