Disjoint-Set Data Structures and Union-Find Explained | Chapter 19 in introduction to algorithms

Disjoint-set data structures are essential tools for efficiently managing collections of non-overlapping sets, enabling fast set partitioning, connectivity checks, and core graph algorithms. Chapter 19 of introduction to algorithms explores the inner workings of union-find structures, their practical applications, and the performance-boosting heuristics that make them indispensable in computer science.

Watch the chapter summary below for an accessible breakdown of disjoint-set structures and their algorithmic uses. Subscribe to Last Minute Lecture for more introduction to algorithms tutorials!

Core Disjoint-Set Operations

MAKE-SET(x): Creates a new set containing element x.
FIND-SET(x): Returns a pointer to the representative of the set containing x.
UNION(x, y): Merges the sets containing x and y.

A representative is any consistent member that uniquely identifies a set. These operations are the backbone of many graph algorithms, including connected components, minimum spanning trees, and offline lowest common ancestor (LCA) computations.

Implementations and Efficiency

Linked-List Representation: Each set is a linked list. MAKE-SET and FIND-SET are O(1), but UNION can be costly (up to O(n) per operation) unless using the weighted-union heuristic (append the shorter list to the longer, limiting pointer updates).
Disjoint-Set Forests: Represent sets as trees, with each node pointing to its parent. MAKE-SET is O(1), while FIND-SET and UNION leverage tree structure for efficiency.
Union by Rank: Attach the shorter tree to the taller, using a rank attribute to approximate height, keeping the tree shallow.
Path Compression: Flattens the tree during FIND-SET by making each visited node point directly to the root, further speeding up future queries.

Amortized Performance and Inverse Ackermann

With both union by rank and path compression, a sequence of m operations on n elements runs in O(m × α(n)) time, where α(n) is the extremely slow-growing inverse Ackermann function (less than 5 for any input size you'll ever encounter). This means disjoint-set operations are practically linear in all real-world scenarios.

Applications in Algorithms

Connected Components: Quickly determine which vertices belong to the same component in a graph by processing edges with UNION and FIND-SET.
Minimum Spanning Trees (MST): Used in Kruskal's algorithm to avoid cycles while building the MST.
Offline Lowest Common Ancestors (Tarjan’s Algorithm): Computes LCAs efficiently for multiple queries.
Offline Minimum Problem: Solve minimum extraction in a sequence using disjoint-set techniques.

Glossary of Key Terms

Disjoint Set: Collection of non-overlapping sets
MAKE-SET / FIND-SET / UNION: Core set operations
Representative: Distinguished member identifying a set
Union by Rank: Merge by tree height approximation
Path Compression: Flattens trees to speed up future finds
α(n): Inverse Ackermann function; grows slower than any practical function
Tarjan’s Algorithm: Efficient LCA computation using disjoint sets

Conclusion: Powering Efficient Set Partitioning

Disjoint-set data structures, especially with union by rank and path compression, are key for solving connectivity and partitioning problems in graphs with near-linear efficiency. Their clever use of pointers and heuristics enables scalable, high-speed algorithms for networks, databases, and beyond. For more algorithmic deep-dives, subscribe to Last Minute Lecture and follow the full introduction to algorithms series!

Search This Blog

Last Minute Lecture