Data Structure and Algorithm
Disjoint Set Union (DSU) Data Structure with Practical Examples
 Introduction
 Operations in DSU
 DSU Optimizations
 Implementation of DSU in C
 Time Complexity of DSU
 Finding Connected Components in a Graph
 Applications of DSU
 Run C Programming Online Compiler
¶Introduction
The Disjoint Set Union (DSU
), also known as UnionFind, is a data structure that keeps track of a set of elements partitioned into a number of disjoint (nonoverlapping) subsets. It supports two main operations efficiently: finding the representative of a set and merging two sets. This structure is widely used in various computer science applications such as network connectivity, image processing, and Kruskal’s algorithm for finding Minimum Spanning Trees (MST).
¶Operations in DSU
We represent DSU
as a forest of trees, where each tree corresponds to one set, and the root of the tree acts as the representative or leader of that set. In this structure, every element points to its parent, and the root is the ultimate parent for all elements in the set. This hierarchical organization allows efficient management of the sets and enables quick determination of the representative element through traversal to the root. It supports three primary operations: Initializing a set, finding representative of a set and union of two sets. These operations are fundamental in managing the disjoint sets.
¶Initializing a set
This operation initializes a new set containing the single element v
. It sets the parent of v
to v
itself, indicating that v
is the representative (or leader) of its own set. The parent array is used to keep track of the representative of each set.
Implementation in C:
void make_set(int v) {
parent[v] = v; // Each element is its own parent
}
¶Finding representative
This operation returns the representative (or leader) of the set containing element v
. It can be used to check if two elements belong to the same set.
Implementation in C:
int find_set(int v) {
if (parent[v] != v) {
return find_set(parent[v]);
}
return v;
}
¶Union of two sets
This operation merges the sets containing elements a
and b
. It finds the representatives of both sets, if the roots are different connects one set to the other, effectively merging them into a single set.
Implementation in C:
void union_sets(int a, int b) {
int rootA = find_set(a);
int rootB = find_set(b);
if (rootA != rootB) {
parent[rootB] = rootA;
}
}
Time Complexity

make_set(v):
O(1)

find_set(v):
In the worst case,O(n)
if the tree is completely unbalanced. 
union_sets(a, b):
In the worst case,O(n)
if the tree is completely unbalanced.
¶DSU
Optimizations
¶Path Compression
This optimization technique flattens the structure of the tree whenever Find is called by making every node on the path point directly to the root. This helps in speeding up future Find operations, making them nearly constant time on average. Without path compression, the time complexity of Find can be as bad as O(n)
, but with path compression, it becomes O(log n)
.
Implementation in C:
int find(int x) {
if (parent[x] != x) {
parent[x] = find(parent[x]); // Path compression
}
return parent[x];
}
¶Union by Size / Rank
In the previous union implementation, the second tree always got attached to the first one. It can lead to trees containing chains of length O(n)
. This optimization ensures that the smaller tree is always attached under the root of the larger tree. The rank array keeps track of the depth of the trees to decide which tree to merge under which. This prevents the trees from becoming too tall, ensuring that the Find operation remains efficient. To use this optimization we need to introduce new array size
which keeps the size of the components.
Implementation in C:
void union_sets(int a, int b) {
a = find(a);
b = find(b);
if (a != b) {
if (size[a] < size[b]) {
// Swap to ensure that we attach the smaller tree to the larger one
int temp = a;
a = b;
b = temp;
}
parent[b] = a;
size[a] += size[b];
}
}
¶Implementation of DSU
in C
#include <stdio.h>
#include <stdlib.h>
#define MAXN 1000 // Adjust MAXN based on your needs
// Global arrays for parent and size
int parent[MAXN];
int size[MAXN];
// Function to initialize the DSU structure
void init_dsu(int n) {
for (int i = 0; i < n; ++i) {
parent[i] = i;
size[i] = 1;
}
}
// Find with path compression
int find(int x) {
if (parent[x] != x) {
parent[x] = find(parent[x]); // Path compression
}
return parent[x];
}
// Union by size
void union_sets(int a, int b) {
a = find(a);
b = find(b);
if (a != b) {
if (size[a] < size[b]) {
// Swap to ensure that we attach the smaller tree to the larger one
int temp = a;
a = b;
b = temp;
}
parent[b] = a;
size[a] += size[b];
}
}
// Example usage
int main() {
int n = 10; // Number of elements, adjust as needed
init_dsu(n);
// Example union operations
union_sets(1, 2);
union_sets(2, 3);
union_sets(4, 5);
// Example find operations
printf("Find(1): %d\n", find(1));
printf("Find(2): %d\n", find(2));
printf("Find(4): %d\n", find(4));
return 0;
}
Find(1): 1
Find(2): 1
Find(4): 4
¶Time Complexity of DSU
When combining the optimizations of path compression with union by size or rank in a Disjoint Set Union (DSU) data structure, we achieve nearly constant time complexity for queries. Specifically, the amortized time complexity is O(α(n))
, where α(n)
is the inverse Ackermann function. This function grows extremely slowly, so it does not exceed 4 for all practical values of n.
Amortized complexity refers to the average time per operation when considering a sequence of multiple operations. This means that while a single operation may take longer in some cases, the average time across many operations is very small. Without path compression, only using union by size or rank, the time complexity per query is O(log n)
.
¶Finding Connected Components in a Graph
One of the primary applications of the Disjoint Set Union (DSU) or UnionFind data structure is to manage and query connected components in a graph. Let’s consider the following problem,
we are given edges of a graph. We need to answer whether vertices
a
andb
are in the same connected component.
With DSU, we efficiently answer this in almost linear time by:
 Adding vertices and edges to the graph.
 Using union operations to merge connected components.
 Using find operations to check if two vertices are in the same component.
#include <stdio.h>
#include <stdlib.h>
// Define a maximum number of vertices
#define MAXN 1000
// Global arrays for parent and size
int parent[MAXN];
int size[MAXN];
// Function to initialize the DSU structure
void init_dsu(int n) {
for (int i = 0; i < n; ++i) {
parent[i] = i;
size[i] = 1;
}
}
// Function to swap two integers
void swap(int *a, int *b) {
int temp = *a;
*a = *b;
*b = temp;
}
// Find with path compression
int find(int x) {
if (parent[x] != x) {
parent[x] = find(parent[x]); // Path compression
}
return parent[x];
}
// Union by size
void union_sets(int a, int b) {
a = find(a);
b = find(b);
if (a != b) {
if (size[a] < size[b]) {
swap(&a, &b); // Swap to ensure we attach the smaller tree to the larger one
}
parent[b] = a;
size[a] += size[b];
}
}
// Function to check if two vertices are in the same connected component
int are_connected(int a, int b) {
return find(a) == find(b);
}
// Example usage
int main() {
int n = 10; // Number of vertices, adjust as needed
init_dsu(n);
// Adding edges
union_sets(1, 2);
union_sets(2, 3);
union_sets(4, 5);
union_sets(6, 7);
union_sets(5, 6);
// Querying if vertices are in the same connected component
printf("Are 1 and 3 connected? %s\n", are_connected(1, 3) ? "Yes" : "No");
printf("Are 1 and 4 connected? %s\n", are_connected(1, 4) ? "Yes" : "No");
printf("Are 4 and 7 connected? %s\n", are_connected(4, 7) ? "Yes" : "No");
return 0;
}
Are 1 and 3 connected? Yes
Are 1 and 4 connected? No
Are 4 and 7 connected? Yes
¶Applications of DSU

Finding Connected Components in a Graph : DSU helps efficiently track and merge connected components as edges are added. It can answer queries about whether two vertices belong to the same connected component.

Kruskal’s Minimum Spanning Tree Algorithm : DSU is used to detect cycles while adding edges in Kruskal’s algorithm. It ensures that adding an edge does not create a cycle, thus maintaining a forest that eventually becomes the MST.

Dynamic Connectivity in Networks : In network management, DSU can handle dynamic connectivity queries, allowing realtime updates and checks on the connectivity status of nodes.

Image Processing : DSU can be used to label connected components in a binary image, identifying and grouping pixels that are part of the same connected region.

Network Design and Optimization : DSU is used in network design algorithms to manage and optimize the connectivity of various components, ensuring efficient design and minimal cost while keeping components connected.

Tarjan’s LCA Algorithm : Tarjan’s algorithm uses DSU to efficiently answer multiple LCA queries after preprocessing the tree.
¶Run C Programming Online Compiler
To make your learning more effective, exercise the coding examples in the text editor below.
SingleSource Shortest Paths Algorithm  Dijkstra's Algorithm
Introduction to Binary Search with Practical Examples
All Tutorials in this playlist
Popular Tutorials
Categories

Artificial Intelligence (AI)
11

Bash Scripting
1

Bootstrap CSS
0

C Programming
14

C#
0

ChatGPT
1

Code Editor
2

Computer Engineering
3

CSS
28

Data Structure and Algorithm
18

Design Pattern in PHP
2

Design Patterns  Clean Code
1

EBook
1

Git Commands
1

HTML
19

Interview Prepration
2

Java Programming
0

JavaScript
12

Laravel PHP Framework
37

Mysql
1

Node JS
1

Online Business
0

PHP
28

Programming
8

Python
12

React Js
19

React Native
1

Redux
2

Rust Programming
15

Tailwind CSS
1

Typescript
10

Uncategorized
0

Vue JS
1

Windows Operating system
1

Woocommerce
1

WordPress Development
2
Tags
 Artificial Intelligence (AI)
 Bash Scripting
 Business
 C
 C Programming
 Csharp programming
 C++
 Code Editor
 Computer Engineering
 CSS
 Data Structure and Algorithm
 Database
 Design pattern
 Express JS
 git
 Git Commands
 github
 HTML
 Java
 JavaScript
 Laravel
 Mathematics
 MongoDB
 Mysql
 Node JS
 PHP
 Programming
 Python
 React Js
 Redux
 Rust Programming Language
 TypeScript
 Vue JS
 Windows terminal
 Woocommerce
 WordPress
 WordPress Plugin Development