Introduction to Set Data Structure with Practical Examples

In mathematics, a set is a collection of distinct objects. Set provides a way to organize and categorize elements based on their common properties. Various operations such as union, intersection, and complement are used to manipulate sets, providing a powerful tool for reasoning and describing relationships between different collections of objects.

  • The set of natural numbers less than 10: {1, 2, 3, 4, 5, 6, 7, 8, 9}
  • The set of prime numbers: {2, 3, 5, 7, 11, …}
  • The set of vowels in the English alphabet: {a, e, i, o, u}

In computer programming, the data structure set serves as a mechanism to implement and work with the concepts of set theory from mathematics. It provides a practical and efficient means of managing collections of unique elements, offering operations that align with mathematical set operations

Set Data Structure

From a data structure and algorithmic perspective, a set is a collection of distinct elements, and the order of elements is not necessarily specified. Almost every widely-used programming language offers a built-in set data structure. Typically, sets are implemented using either hash tables or Binary Search Trees (BST) to optimize their performance in terms of insertion, retrieval, and deletion operations.
Depending on how they are implemented sets can be divided into two types: ordered sets and unordered sets. Regardless of these differences, basic operations like adding, finding, and removing elements from a set are quite similar across different programming languages.

Ordered Sets:

An ordered set holds a collection of unique elements while maintaining a specific order among them. Ordered sets are typically implemented using a data structure called a Binary Search Tree (BST).

  • Insertion: O(log n) - Adding an element involves finding the correct position in the ordered structure, typically implemented using a Binary Search Tree (BST).
  • Deletion: O(log n) - Removing an element while maintaining order requires navigating the BST, leading to logarithmic time complexity.
  • Search: O(log n) - Finding an element in an ordered set is efficient due to the organized nature of the BST.

Unordered Sets:

An unordered set holds a collection of unique elements without a specific order. Unordered sets are typically implemented using hash tables.

  • Insertion: O(1) average, O(n) worst-case - Adding an element to an unordered set is generally constant time on average due to hash table implementations. However, in the worst case (hash collisions), it may lead to linear time complexity.
  • Deletion: O(1) average, O(n) worst-case - Similar to insertion, removal is usually constant time on average, but in rare cases of hash collisions, it may degrade to linear time.
  • Search: O(1) average, O(n) worst-case - Searching for an element is constant time on average, but worst-case time complexity occurs when there are hash collisions, leading to a linear search.

Set C++ STL

In C++, the Standard Template Library (STL) provides a built-in data structure called set to efficiently manage a collection of unique elements. This container is particularly useful when the need is to maintain a sorted set of elements where duplicates are not allowed. It’s implementation as a self-balancing binary search tree (BST).
A binary search tree is a hierarchical data structure where each node has at most two children, referred to as the left and right child. It automatically adjusts its structure during operations to maintain balance. This ensures that the tree remains relatively balanced, preventing it from inefficient performance for certain operations. The set data structure is defined under the header file set in C++.

#include <set> // header file for set
set<dataType> mySet; // syntax for set template in c++ stl

Functions in C++ STL Set

  • insert() Inserts an element into the set. Complexity O(log n) on average, where n is the size of the set.
  • erase() Removes an element or a range of elements from the set. Complexity O(log n) on average for a single element. It allows us to remove an element using either its value or an iterator. For erasing by value, if the value is not present in the set, this operation has no effect.
  • find() Searches for an element in the set. Complexity O(log n) on average.
  • size() Returns the number of elements in the set. Complexity O(1) for single operation.
  • empty () Checks if the set is empty. Complexity O(1) for single operation
  • begin() Returns an iterator to the beginning of the set. Complexity O(1).
  • end() Returns an iterator past the end of the set. Complexity: O(1).
  • lower_bound() Returns an iterator to the first element that is not less than a specified value. Complexity O(log n).
  • upper_bound() Returns an iterator to the first element that is greater than a specified value. Complexity: O(log n).

Example of Erasing an Element from Set

#include <iostream>
#include <set>

int main() {
    std::set<int> mySet = {10, 20, 30, 40, 50};

    // Erasing an element with value 30
    mySet.erase(30);

    // Print the set after erasing by value
    std::cout << "After erasing by value 30: ";
    for (const auto& element : mySet) {
        std::cout << element << " ";
    }
    std::cout << std::endl;

    // Finding and erasing an element using an iterator
    auto it = mySet.find(40);
    if (it != mySet.end()) {
        mySet.erase(it);
    }

    // Print the final set after erasing by iterator
    std::cout << "After erasing by iterator pointing to 40: ";
    for (const auto& element : mySet) {
        std::cout << element << " ";
    }
    std::cout << std::endl;

    return 0;
}

Output:
After erasing by value 30: 10 20 40 50
After erasing by iterator pointing to 40: 10 20 50

Example of lower_bound and upper_bound in Set

#include <iostream>
#include <set>

int main() {
    std::set<int> mySet = {10, 20, 30, 40, 50};

    // Using lower_bound() and upper_bound()
    int key = 30;

    // Finding the lower_bound for key
    auto lowerBoundIt = mySet.lower_bound(key);

    // Finding the upper_bound for key
    auto upperBoundIt = mySet.upper_bound(key);

    // Displaying the results
    std::cout << "Set: ";
    for (const auto& element : mySet) {
        std::cout << element << " ";
    }
    std::cout << std::endl;

    std::cout << "Key: " << key << std::endl;
    std::cout << "Lower Bound: ";
    if (lowerBoundIt != mySet.end()) {
        std::cout << *lowerBoundIt;
    } else {
        std::cout << "Not found";
    }
    std::cout << std::endl;

    std::cout << "Upper Bound: ";
    if (upperBoundIt != mySet.end()) {
        std::cout << *upperBoundIt;
    } else {
        std::cout << "Not found";
    }
    std::cout << std::endl;

    return 0;
}

Output:
Set: 10 20 30 40 50
Key: 30
Lower Bound: 30
Upper Bound: 40

Iterator in Set

An iterator is an object that is used to traverse through the elements of a container. It acts as a pointer that points to an element in the container, and it provides a way to access, modify, and navigate the elements of the container. Iterators abstract the underlying details of the container’s implementation, allowing for a uniform way to interact with different types of containers.

Useful Iterators of set in c++ STL

  • begin() Returns an iterator pointing to the first element of the container.
  • end() Returns an iterator pointing one position after the last element of the container.
  • rbegin() Returns a reverse iterator pointing to the last element of the container.
  • rend() Returns a reverse iterator pointing one position before the first element of the container.

Applications of Set

  • Checking uniqueness and duplication: Sets enforce uniqueness, making them ideal for removing duplicate elements from a collection. This is commonly used when ensuring a unique set of items, such as unique user IDs, unique values in a dataset

  • Set Theory Operation Such as Intersection, Union, and Difference: Sets support operations like intersection, union, and difference, making them valuable in scenarios where you need to compare or combine multiple sets of elements.

  • Graph Algorithms: In graph algorithms, sets are often employed to maintain lists of visited nodes, unvisited nodes, or nodes with specific attributes. They are useful for efficiently tracking and manipulating subsets of nodes in a graph.

  • Database Operations: In database systems, sets are employed to perform set operations like union, intersection, and difference on tables or result sets. This is useful in querying and combining data from multiple sources.

  • Data Grouping and Categorization: Sets can be used for grouping and categorizing items based on specific criteria. For example, categorizing items into sets based on their attributes or characteristics.

Run C Programming Online Compiler

To make your learning more effective, exercise the coding examples in the text editor below.

[Run C programming online](https://devsenv.com/page/run-c-programming)
Previous
Introduction to Stack Data Structure with Practical Examples
Next
Introduction to Map Data Structure with Practical Examples