Disjoint Set Union (DSU) Data Structure with Practical Examples

Introduction

The Disjoint Set Union (DSU), also known as Union-Find, is a data structure that keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It supports two main operations efficiently: finding the representative of a set and merging two sets. This structure is widely used in various computer science applications such as network connectivity, image processing, and Kruskal’s algorithm for finding Minimum Spanning Trees (MST).

Operations in DSU

We represent DSU as a forest of trees, where each tree corresponds to one set, and the root of the tree acts as the representative or leader of that set. In this structure, every element points to its parent, and the root is the ultimate parent for all elements in the set. This hierarchical organization allows efficient management of the sets and enables quick determination of the representative element through traversal to the root. It supports three primary operations: Initializing a set, finding representative of a set and union of two sets. These operations are fundamental in managing the disjoint sets.

Initializing a set

This operation initializes a new set containing the single element v. It sets the parent of v to v itself, indicating that v is the representative (or leader) of its own set. The parent array is used to keep track of the representative of each set.


Implementation in C:


void make_set(int v) {
    parent[v] = v; // Each element is its own parent
}

Finding representative

This operation returns the representative (or leader) of the set containing element v. It can be used to check if two elements belong to the same set.


Implementation in C:


int find_set(int v) {
    if (parent[v] != v) {
        return find_set(parent[v]);
    }
    return v;
}

Union of two sets

This operation merges the sets containing elements a and b. It finds the representatives of both sets, if the roots are different connects one set to the other, effectively merging them into a single set.


Implementation in C:


void union_sets(int a, int b) {
    int rootA = find_set(a);
    int rootB = find_set(b);

    if (rootA != rootB) {
        parent[rootB] = rootA;
    }
}

Time Complexity

  • make_set(v): O(1)
  • find_set(v): In the worst case, O(n) if the tree is completely unbalanced.
  • union_sets(a, b): In the worst case, O(n) if the tree is completely unbalanced.

DSU Optimizations

Path Compression

This optimization technique flattens the structure of the tree whenever Find is called by making every node on the path point directly to the root. This helps in speeding up future Find operations, making them nearly constant time on average. Without path compression, the time complexity of Find can be as bad as O(n), but with path compression, it becomes O(log n).


Implementation in C:


int find(int x) {
    if (parent[x] != x) {
        parent[x] = find(parent[x]); // Path compression
    }
    return parent[x];
}

Union by Size / Rank

In the previous union implementation, the second tree always got attached to the first one. It can lead to trees containing chains of length  O(n) . This optimization ensures that the smaller tree is always attached under the root of the larger tree. The rank array keeps track of the depth of the trees to decide which tree to merge under which. This prevents the trees from becoming too tall, ensuring that the Find operation remains efficient. To use this optimization we need to introduce new array size which keeps the size of the components.


Implementation in C:


void union_sets(int a, int b) {
    a = find(a);
    b = find(b);
    if (a != b) {
        if (size[a] < size[b]) {
            // Swap to ensure that we attach the smaller tree to the larger one
            int temp = a;
            a = b;
            b = temp;
        }
        parent[b] = a;
        size[a] += size[b];
    }
}

Implementation of DSU in C


#include <stdio.h>
#include <stdlib.h>

#define MAXN 1000  // Adjust MAXN based on your needs

// Global arrays for parent and size
int parent[MAXN];
int size[MAXN];

// Function to initialize the DSU structure
void init_dsu(int n) {
    for (int i = 0; i < n; ++i) {
        parent[i] = i;
        size[i] = 1;
    }
}

// Find with path compression
int find(int x) {
    if (parent[x] != x) {
        parent[x] = find(parent[x]); // Path compression
    }
    return parent[x];
}

// Union by size
void union_sets(int a, int b) {
    a = find(a);
    b = find(b);
    if (a != b) {
        if (size[a] < size[b]) {
            // Swap to ensure that we attach the smaller tree to the larger one
            int temp = a;
            a = b;
            b = temp;
        }
        parent[b] = a;
        size[a] += size[b];
    }
}

// Example usage
int main() {
    int n = 10; // Number of elements, adjust as needed
    init_dsu(n);

    // Example union operations
    union_sets(1, 2);
    union_sets(2, 3);
    union_sets(4, 5);

    // Example find operations
    printf("Find(1): %d\n", find(1));
    printf("Find(2): %d\n", find(2));
    printf("Find(4): %d\n", find(4));

    return 0;
}

Output:
Find(1): 1
Find(2): 1
Find(4): 4

Time Complexity of DSU

When combining the optimizations of path compression with union by size or rank in a Disjoint Set Union (DSU) data structure, we achieve nearly constant time complexity for queries. Specifically, the amortized time complexity is O(α(n)), where α(n) is the inverse Ackermann function. This function grows extremely slowly, so it does not exceed 4 for all practical values of n.
Amortized complexity refers to the average time per operation when considering a sequence of multiple operations. This means that while a single operation may take longer in some cases, the average time across many operations is very small. Without path compression, only using union by size or rank, the time complexity per query is O(log n).

Finding Connected Components in a Graph

One of the primary applications of the Disjoint Set Union (DSU) or Union-Find data structure is to manage and query connected components in a graph. Let’s consider the following problem,

we are given edges of a graph. We need to answer whether vertices  a  and  b  are in the same connected component.

With DSU, we efficiently answer this in almost linear time by:

  • Adding vertices and edges to the graph.
  • Using union operations to merge connected components.
  • Using find operations to check if two vertices are in the same component.

#include <stdio.h>
#include <stdlib.h>

// Define a maximum number of vertices
#define MAXN 1000  

// Global arrays for parent and size
int parent[MAXN];
int size[MAXN];

// Function to initialize the DSU structure
void init_dsu(int n) {
    for (int i = 0; i < n; ++i) {
        parent[i] = i;
        size[i] = 1;
    }
}

// Function to swap two integers
void swap(int *a, int *b) {
    int temp = *a;
    *a = *b;
    *b = temp;
}

// Find with path compression
int find(int x) {
    if (parent[x] != x) {
        parent[x] = find(parent[x]); // Path compression
    }
    return parent[x];
}

// Union by size
void union_sets(int a, int b) {
    a = find(a);
    b = find(b);
    if (a != b) {
        if (size[a] < size[b]) {
            swap(&a, &b); // Swap to ensure we attach the smaller tree to the larger one
        }
        parent[b] = a;
        size[a] += size[b];
    }
}

// Function to check if two vertices are in the same connected component
int are_connected(int a, int b) {
    return find(a) == find(b);
}

// Example usage
int main() {
    int n = 10; // Number of vertices, adjust as needed
    init_dsu(n);

    // Adding edges
    union_sets(1, 2);
    union_sets(2, 3);
    union_sets(4, 5);
    union_sets(6, 7);
    union_sets(5, 6);

    // Querying if vertices are in the same connected component
    printf("Are 1 and 3 connected? %s\n", are_connected(1, 3) ? "Yes" : "No");
    printf("Are 1 and 4 connected? %s\n", are_connected(1, 4) ? "Yes" : "No");
    printf("Are 4 and 7 connected? %s\n", are_connected(4, 7) ? "Yes" : "No");

    return 0;
}

Output:
Are 1 and 3 connected? Yes
Are 1 and 4 connected? No
Are 4 and 7 connected? Yes

Applications of DSU

  1. Finding Connected Components in a Graph : DSU helps efficiently track and merge connected components as edges are added. It can answer queries about whether two vertices belong to the same connected component.

  2. Kruskal’s Minimum Spanning Tree Algorithm : DSU is used to detect cycles while adding edges in Kruskal’s algorithm. It ensures that adding an edge does not create a cycle, thus maintaining a forest that eventually becomes the MST.

  3. Dynamic Connectivity in Networks : In network management, DSU can handle dynamic connectivity queries, allowing real-time updates and checks on the connectivity status of nodes.

  4. Image Processing : DSU can be used to label connected components in a binary image, identifying and grouping pixels that are part of the same connected region.

  5. Network Design and Optimization : DSU is used in network design algorithms to manage and optimize the connectivity of various components, ensuring efficient design and minimal cost while keeping components connected.

  6. Tarjan’s LCA Algorithm : Tarjan’s algorithm uses DSU to efficiently answer multiple LCA queries after preprocessing the tree.

Run C Programming Online Compiler

To make your learning more effective, exercise the coding examples in the text editor below.

Run C programming online

Previous
Single-Source Shortest Paths Algorithm - Dijkstra's Algorithm
Next
Introduction to Binary Search with Practical Examples