Introduction to Trie Data Structure with Practical Examples

Ever wondered how search engines suggest related words as you type? It’s all thanks to a clever data structure called Trie. Tries also play a vital role in implementing contact lists or address books, allowing for fast searching and retrieval of contact information. When you search for a contact by typing a few letters of their name, the trie efficiently narrows down the search space, making it quick and effortless to find the desired contact

Trie Data Structure

Trie is a tree-like data structure used for efficiently storing and retrieving strings or keys. Unlike traditional data structures such as arrays, linked lists, or hash tables, tries excel at operations involving strings and are particularly adept at tasks like autocomplete, spell checking, and searching for words in lengthy texts. In simpler terms, a trie is a tree structure that starts from a root and branches out into different paths. Each edge in this tree represents a letter, and when you add words or strings, you’re essentially creating new edges or extending existing ones. So, every point in the trie corresponds to a specific word, and you can follow the edges from the root to that point to spell out the word it represents.
Let’s see a simple example of trie data structure with the words “CAT”, “DOG”, “DOC” and “CAR”

Trie

If we traverse the path from the root to node 5 of the trie, we find the word “CAT” in the trie. This allows us to find words present in the trie.

Building Trie Data Structure

Building a Trie involves creating a tree-like data structure where each node represents a single character of a word. Each node holds a pointer to its children nodes, indicating possible next characters in the word. Edges between nodes represent characters, and they are created dynamically as new words are added to the Trie. They are created to efficiently store and retrieve words, as well as to facilitate operations like prefix matching, searching for words.

Node Structure of Trie

In a Trie, each node typically holds the following:

  1. Character data for the edge from parent to the current node.
  2. Pointer to the parent node for traversal and deletion operations.
  3. Array or hashmap of pointers to child nodes representing possible characters following the current character.
  4. Word ending indicator to mark the end of a complete word.
  5. Optional: Counter to track the frequency of word occurrences, useful for applications like autocomplete or spell check.
const int ALPHABET_SIZE = 26; // Number of characters in the alphabet

struct TrieNode {
    int childIndices[ALPHABET_SIZE]; // Array to store indices of child TrieNodes
    bool isEndOfWord; // Indicates if the node represents the end of a word

    TrieNode() {
        fill(begin(childIndices), end(childIndices), -1); // Initialize all elements of childIndices to -1
        isEndOfWord = false; // Initially, no word ends at this node
    }
};
  • const int ALPHABET_SIZE = 26 This constant defines the size of the alphabet, assuming lowercase English letters. It specifies the number of possible characters that can be stored in each TrieNode.

  • struct TrieNode This struct represents a node in the Trie data structure. Each node holds information about the characters in the words stored in the Trie.

  • int childIndices[ALPHABET_SIZE] This array holds the indices of child TrieNodes corresponding to each character in the alphabet. For example, childIndices[0] stores the index of the child node representing the character a, childIndices[1] stores the index for b, and so on.

  • bool isEndOfWord This boolean variable indicates whether the current node represents the end of a complete word. If isEndOfWord is true, it means that a word ends at this node in the Trie.

  • TrieNode() This is the constructor function for TrieNode. It initializes a new TrieNode.

  • isEndOfWord = false This line sets the isEndOfWord variable to false by default in the constructor. It ensures that when a new TrieNode is created, it is not considered the end of any word until explicitly marked otherwise.

Inserting into Trie

To insert a word into the trie, we begin at the root node of the Trie. For each character in the word, we calculate the index of the character based on its position in the alphabet.Then we check if there is a child node for this character at the current node. If not, we create a new child node for this character and move to it.

// Function to insert a word into the Trie
void insertWord(const string& word) {
    int currentNodeIndex = 0; // Start at the root node
    for (char ch : word) {
        int index = ch - 'a'; // Convert character to an index (0 to 25) based on its position in the alphabet
        if (trie[currentNodeIndex].childIndices[index] == -1) {
            trie[currentNodeIndex].childIndices[index] = trie.size(); // Create a new node
            trie.emplace_back(); // Add a new TrieNode to the trie vector
        }
        currentNodeIndex = trie[currentNodeIndex].childIndices[index]; // Move to the next node
    }
    trie[currentNodeIndex].isEndOfWord = true; // Mark the end of the word
}

The time complexity of inserting a word into a Trie is O(L), where L is the length of the word. This is because we iterate over each character of the word and perform constant-time operations to traverse the Trie and create new nodes as needed.

Searching in Trie

To search for a word in the trie, we start at the root node of the Trie. For each character in the target word, we calculate the index of the character based on its position in the alphabet. Then we check if there is a child node for this character at the current node. If there is no child node for the character, the word is not present in the trie, and we return false. If a child node exists, we move to it and continue the search process. Once all characters in the word have been processed, we reach the end of the word. If the current node is marked as the end of a word, we return true, indicating that the word is found in the trie. Otherwise, if the end-of-word marker is not set, the word is a prefix of another word in the trie, but not a complete word, so we return false.

// Function to search for a word in the Trie
bool searchWord(const string& word) {
    int currentNodeIndex = 0; // Start at the root node
    for (char ch : word) {
        int index = ch - 'a';
        if (trie[currentNodeIndex].childIndices[index] == -1) {
            return false; // Word not found
        }
        currentNodeIndex = trie[currentNodeIndex].childIndices[index];
    }
    return trie[currentNodeIndex].isEndOfWord;
}

The time complexity of searching for a word in a Trie is also O(L), where L is the length of the target word. Similar to insertion, we iterate over each character of the word and traverse the Trie to check if the word exists.

Trie Implementation Example

Lets try to solve the following problem with knowledge of trie data structure we have so far,

You are given a list of words [“apple”, “banana”, “application”, “bat”, “ball”] and a target word. Your task is to determine whether the target word exists in the list of words.

#include <iostream>
#include <string>
#include <array>
#include <vector>
using namespace std;

const int ALPHABET_SIZE = 26; // Number of characters in the alphabet

struct TrieNode {
    int childIndices[ALPHABET_SIZE]; // Array to store indices of child TrieNodes
    bool isEndOfWord; // Indicates if the node represents the end of a word

    TrieNode() {
        fill(begin(childIndices), end(childIndices), -1); // Initialize all elements of childIndices to -1
        isEndOfWord = false; // Initially, no word ends at this node
    }
};

vector<TrieNode> trie(1);

// Function to insert a word into the Trie
void insertWord(const string& word) {
    int currentNodeIndex = 0; // Start at the root node
    for (char ch : word) {
        int index = ch - 'a';
        if (trie[currentNodeIndex].childIndices[index] == -1) {
            trie[currentNodeIndex].childIndices[index] = trie.size(); // Create a new node
            trie.emplace_back();
        }
        currentNodeIndex = trie[currentNodeIndex].childIndices[index];
    }
    trie[currentNodeIndex].isEndOfWord = true; // Mark the end of the word
}

// Function to search for a word in the Trie
bool searchWord(const string& word) {
    int currentNodeIndex = 0; // Start at the root node
    for (char ch : word) {
        int index = ch - 'a';
        if (trie[currentNodeIndex].childIndices[index] == -1) {
            return false; // Word not found
        }
        currentNodeIndex = trie[currentNodeIndex].childIndices[index];
    }
    return trie[currentNodeIndex].isEndOfWord; // Check if the current node represents the end of a word
}

int main() {
    // Insert words into the Trie
    vector<string> words = {"apple", "banana", "application", "bat", "ball"};
    for (const string& word : words) {
        insertWord(word);
    }

    // Target word to search
    string targetWord = "banana";

    // Search for the target word in the Trie
    bool found = searchWord(targetWord);

    // Output the result
    if (found) {
        cout << "The word \"" << targetWord << "\" exists in the list of words." << endl;
    } else {
        cout << "The word \"" << targetWord << "\" does not exist in the list of words." << endl;
    }
    
    targetWord = "app";
    found = searchWord(targetWord);

    // Output the result
    if (found) {
        cout << "The word \"" << targetWord << "\" exists in the list of words." << endl;
    } else {
        cout << "The word \"" << targetWord << "\" does not exist in the list of words." << endl;
    }

    return 0;
}

Output:
The word "banana" exists in the list of words.
The word "app" does not exist in the list of words.

Trie in PBDS C++

PBDS stands for Policy-based Data Structures, and it’s a part of the GNU C++ Standard Template Library (STL) extensions. PBDS provides implementations of various data structures, such as sets, tries, maps, and heaps, with customizable policies.

Necessary Headers and Namespace for PBDS

To use Policy-based Data Structures (PBDS) in C++, we need to include the necessary header files and use the appropriate namespace.

// header files for  Policy-based Data Structures
#include <ext/pb_ds/assoc_container.hpp> 
#include <ext/pb_ds/tree_policy.hpp>
// Namespace for Policy-based Data Structures
using namespace __gnu_pbds; 

PBDS Trie Data Structure Implementation


#include <bits/stdc++.h>
using namespace std;

#include <ext/pb_ds/tree_policy.hpp>
#include <ext/pb_ds/assoc_container.hpp>
using namespace __gnu_pbds;

typedef trie<string, null_type, trie_string_access_traits<>, pat_trie_tag, trie_prefix_search_node_update> trie_type;


int main() {

    trie_type trie;
    // Insert words into the Trie
    trie.insert("hello");
    trie.insert("world");

    // Search for words in the Trie
    cout << "Search 'hello': " << (trie.find("hello") != trie.end() ? "Found" : "Not Found") << endl;
    trie.erase("hello");
    cout << "After erasing Search 'hello': " << (trie.find("hello") != trie.end() ? "Found" : "Not Found") << endl;

    return 0;
}

Functions in PBDS Trie Data Structure

  • insert() Inserts a key into the Trie. It takes the key as input and adds it to the Trie. If the key already exists, it does not insert a duplicate.

  • erase() Removes a key from the Trie if it exists. It takes the key as input and removes it from the Trie. If the key is not found, no action is taken.

  • find() Searches for a key in the Trie. It takes the key as input and returns an iterator pointing to the key in the Trie if found, otherwise returns end().

  • count() Counts the occurrences of a key in the Trie. It takes the key as input and returns the count of occurrences of the key in the Trie (0 or 1 for a standard Trie).

  • empty() Checks if the Trie is empty. Returns true if the Trie is empty, i.e., it contains no keys, otherwise returns false.

  • size() Returns the number of keys stored in the Trie.

  • clear() Clears the Trie, removing all the keys from it.

Application of Trie Data Structure

  • Auto-completion and Suggestion Systems
  • Spell Checking and Correction
  • IP Routing and Address Lookup
  • Text Mining and Pattern Matching
  • Phone Number Storage and Dialing
  • Genomic Sequence Analysis
  • Compression Algorithms
  • Dictionary and Lexicographic Ordering

Run C Programming Online Compiler

To make your learning more effective, exercise the coding examples in the text editor below.

[Run C programming online](https://devsenv.com/page/run-c-programming)
Previous
Priority Queue Data Structure with Practical Examples
Next
Introduction to Graph Data Structure with Practical Examples