# Tries

This is my first post in the Data Structures category. In this post, I discuss a Python implementation of a widely used, but less widely taught, data structure - *trie*. For a great introduction to tries, check out this video.

## The Data Structure

We begin by creating a class `Node`

that will represent each node in the trie. Each node has zero or more children.

```
class Node:
def __init__(self):
self.children = {}
self.is_complete_word = False
self.num_words = 0
```

The `Node`

class contains three members:

`children`

: To store a node’s children, we use a dictionary. This makes it easier to map a node’s value (which is a single character), to an object of class`Node`

.

**Tip**: Some implementations of trie may store more than one character in a single node to save space. In this implementation, we assume that only a single character will be stored in each node.

`is_complete_word`

: The boolean`is_complete_word`

is used to indicate whether a certain node represents a complete word. For example, consider the following trie: This trie contains three complete words - CAT, CAP and SEA - and seven nodes in total. For nodes containing values T, P and A,`is_complete_word`

will be set to`True`

to indicate that these nodes represent complete words.`num_words`

: The integer`num_words`

denotes the number of*complete*words that can be obtained by traversing all children of some node T. For example, for node`C`

in the trie above,`num_words = 2`

, because we obtain two*complete*words - CAT and CAP - by traversing all children of`C`

. To optimize this retrieval, instead of traversing all children of a node T to find the number of complete words starting with T, we store this information for each node whenever a complete word is inserted into the trie.

## Insert Word

To insert a word into our trie data structure, we create a recursive method `add_word`

. The input to this method is (1) `word`

- the word to be added, and (2) `trie`

- the trie in which the word is to be added.

1
2
3
4
5
6
7
8
9
10
11
12
13
14

def add_word(word, trie):
if not word:
return trie
elif len(word) == 1: # All letters have now been processed
trie.is_complete_word = True
if word[0] not in trie.children: # Letter doesn't exist, so create
trie.children[word[0]] = Node()
else: # Letter exists, increment word counter
trie.children[word[0]].num_words += 1
add_word(word[1:], trie.children[word[0]]) # Process remaining letters
return trie

Our base condition is if `word`

is empty, in which case we simply return the input `trie`

. Line 4 checks if `word`

only contains a single character, in which case all letters have been processed, so we mark the current node as a complete word. As the video by McDowell describes, when adding a new word into a trie, we examine each character in the word in turn, and traverse the trie depthwise until we find the insertion point where to insert the remaining characters. Line 7 checks if the current character `word[0]`

does not exist as a child of the current node `trie`

, and if true, creates a new `Node`

to represent this character. If, however, the character already exists, we simply increment `num_words`

to indicate that the number of complete words starting with `word[0]`

has now increased by one. On line 12, we call `add_word`

recursively to process the remaining characters. Finally, we return `trie`

in which the input `word`

has been added.

## Find Partial

The *find partial* operation enables us to find the number of words that start with a certain combination of letters. For example, in the trie discussed earlier (reproduced below), we may wish to find the number of words starting with CA (hint: the answer is 2).

Recall that, for each node, we store an integer value - `num_words`

- which represents the number of words starting with the character in that node. This significantly simplifies our `find_word`

method, shown below.

1
2
3
4
5
6
7

def find_word(word, trie):
if not word:
return trie.num_words
elif word[0] in trie.children:
return find_word(word[1:], trie.children[word[0]])
return 0

As before for `add_word`

, we write a recursive `find_word`

method that takes in two parameters: (1) `word`

- the word to find in the trie, and (2) `trie`

- the trie to find `word`

in. The output is an integer representing the number of words starting with `word`

.

## Conclusion

That completes my Python implementation of a basic trie! There are several optimizations that can be applied to this implementation, and this post may be updated in the future. Let me know what you think about the implementation in the comments below!

**Note:** This post is a work in progress, and I may modify and/or refine it over time.

## Comments