From 755a0530cb4d6be3c37aaf03bf5f31ad89f91d55 Mon Sep 17 00:00:00 2001 From: Deepak Sattiraju Date: Mon, 30 May 2016 13:54:57 +0530 Subject: [PATCH 01/11] First Draft of Data-Structures-Trie.md --- Data-Structures-Trie.md | 97 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 97 insertions(+) create mode 100644 Data-Structures-Trie.md diff --git a/Data-Structures-Trie.md b/Data-Structures-Trie.md new file mode 100644 index 0000000000..a27a3b1f81 --- /dev/null +++ b/Data-Structures-Trie.md @@ -0,0 +1,97 @@ +# Trie + +## Introduction to Trie + +The word trie is an inflix of the word "re**trie**val", because the trie can find a single word in a dictionary with only a prefix of the word. +Trie is an efficient data retrieval data structure, using trie, search complexities can be brought to an optimal limit, i.e. length of the string. +It is a multi-way tree structure useful for storing strings over an alphabet. +It has been used to store large dictionaries (do not confuse this with the dictionaries of python) of any language, say English, words in spell-checking programs. +However, the penalty on tries is the storage requirement. + +## What is a trie? + +A trie is a tree like data structure which stores strings, and helps you find the data associated with that string using the prefix of the string. +For example, say you plan on building a dictionary to store strings along with their meanings. You must be wondering why can't I simply use a hash table, to get the information. Yes, you obviously can get information using a hash table, but the worst time complexity for fetching data from a hash table is `O(n)`, where `n` is the number of strings stored. +But when you use a trie for fetching data, the time complexity to get data is `O(w)`, where `w` is the length of the string. + +## How to implement a trie? 
+ +A trie typically, looks something like this, + +![Trie](https://community.topcoder.com/i/education/alg_tries.png) + +The above image stores, assoc, algo, all, also, tree, trie. + +Let's implement a trie in python, for storing words with their meanings from english dictionary. + +```python +ALPHABET_SIZE = 26 # For English + +class TrieNode: + def __init__(self): + self.edges = [None]*(ALPHABET_SIZE) # Each index respective to each character. + self.meaning = None # Meaning of the word. + self.ends_here = False # Tells us if the word ends here. +``` +As you can see, edges are 26 in length, each index referring to each character in the alphabet. 'A' corresponding to 0, 'B' to 1, 'C' to 2 ... 'Z' to 25th index. If the character you are looking for is pointing to `None`, that implies the word is not there in the trie. + +A typical Trie should implement at least these two functions: + - add_word(word,meaning) + - search_word(word) +Additionally, one can also add something like + - get_all_words() + - get_all_words_with_prefix(prefix) + +#### Adding Word to the trie + +```python + def add_word(self,word,meaning): + if len(word)==0: + self.ends_here = True # Because we have reached the end of the word + self.meaning = meaning # Adding the meaning to that node + return + ch = word[0] # First character + # ASCII value of the first character (minus) the ASCII value of 'a'-> the first character of our ALPHABET gives us the index of the edge we have to look up. + index = ord(ch) - ord('a') + if self.edges[index] == None: + # This implies that there's no prefix with this character yet. 
+ new_node = TrieNode() + self.edges[index] = new_node + + self.edges[index].add(word[1:],meaning) #Adding the remaining word + +``` + +#### Retrieving data + +```python + def search_word(self,word): + if len(word)==0 and self.ends_here: + return True + ch = word[0] + index = ord(ch)-ord('a') + if self.edge[index]== None: + return False + else: + return self.edge[index].search_word(word[1:]) + +``` + +The `search_word` function will tell us if the word exists in the Trie or not. Since ours is a dictionary, we need to fetch the meaning as well, now lets declare a function to do that. + +```python + def get_meaning(self,word): + if len(word)==0 and self.ends_here: + return self.meaning + ch = word[0] + index = ord(ch) - ord('a') + if self.edges[index] == None: + return "Word doesn't exist in the Trie" + else: + return self.edges[index].get_meaning(word[1:]) +``` + +## Resources + +- For further reading, you can try this [topcoder](https://www.topcoder.com/community/data-science/data-science-tutorials/using-tries/) tutorial. +- Also, a tutorial from [geeksforgeeks](http://www.geeksforgeeks.org/trie-insert-and-search/) \ No newline at end of file From 4322f4ad3ead18cdaaebbf5c962ab6b657cdd524 Mon Sep 17 00:00:00 2001 From: Deepak Sattiraju Date: Mon, 30 May 2016 13:56:49 +0530 Subject: [PATCH 02/11] Added repl --- Data-Structures-Trie.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Data-Structures-Trie.md b/Data-Structures-Trie.md index a27a3b1f81..48149f0a11 100644 --- a/Data-Structures-Trie.md +++ b/Data-Structures-Trie.md @@ -91,6 +91,8 @@ The `search_word` function will tell us if the word exists in the Trie or not. S return self.edges[index].get_meaning(word[1:]) ``` +:rocket: [Run Code](https://repl.it/CWaJ) + ## Resources - For further reading, you can try this [topcoder](https://www.topcoder.com/community/data-science/data-science-tutorials/using-tries/) tutorial. 
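
Review sketch for PATCH 01/11 and 02/11: the snippets added so far declare `add_word` and `self.edges`, but the recursive call in `add_word` goes to `.add(...)` and `search_word` reads `self.edge[...]`, so both would raise `AttributeError` as written. The block below is one minimal way the three methods could fit together, assuming those names were meant to be `add_word` and `edges`. Returning `False` or `None` for a missing word (instead of the message string that `get_meaning` returns in the patch) and restricting input to lowercase `a`-`z` are assumptions of this sketch, not something the patches state.

```python
ALPHABET_SIZE = 26  # assumption carried over from the patch: lowercase English letters only


class TrieNode:
    def __init__(self):
        self.edges = [None] * ALPHABET_SIZE  # one slot per letter 'a'..'z'
        self.meaning = None                  # set only on nodes where a word ends
        self.ends_here = False               # True if a stored word ends at this node

    def add_word(self, word, meaning):
        if not word:
            # Every character has been consumed, so the word ends at this node.
            self.ends_here = True
            self.meaning = meaning
            return
        index = ord(word[0]) - ord('a')
        if self.edges[index] is None:
            # No stored word shares this prefix yet, so create the child node.
            self.edges[index] = TrieNode()
        self.edges[index].add_word(word[1:], meaning)

    def search_word(self, word):
        if not word:
            # A prefix of a stored word is not itself a stored word.
            return self.ends_here
        index = ord(word[0]) - ord('a')
        if self.edges[index] is None:
            return False
        return self.edges[index].search_word(word[1:])

    def get_meaning(self, word):
        if not word:
            return self.meaning if self.ends_here else None
        index = ord(word[0]) - ord('a')
        if self.edges[index] is None:
            return None
        return self.edges[index].get_meaning(word[1:])


root = TrieNode()
root.add_word("trie", "a prefix tree")
root.add_word("tree", "a woody plant")
print(root.search_word("trie"))   # stored word
print(root.search_word("tri"))    # only a prefix
print(root.get_meaning("tree"))
```

Characters outside lowercase `a`-`z` would index past the 26 slots, or wrap around through a negative index, so a caller would need to normalise input before using this sketch.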
From dec4b98184de14a2ab540ea072a9d24ba4dd5bfe Mon Sep 17 00:00:00 2001 From: Deepak Sattiraju Date: Mon, 30 May 2016 15:05:39 +0530 Subject: [PATCH 03/11] Implemented suggestions --- Data-Structures-Trie.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/Data-Structures-Trie.md b/Data-Structures-Trie.md index 48149f0a11..024d7b65f6 100644 --- a/Data-Structures-Trie.md +++ b/Data-Structures-Trie.md @@ -4,15 +4,15 @@ The word trie is an inflix of the word "re**trie**val", because the trie can find a single word in a dictionary with only a prefix of the word. Trie is an efficient data retrieval data structure, using trie, search complexities can be brought to an optimal limit, i.e. length of the string. -It is a multi-way tree structure useful for storing strings over an alphabet. -It has been used to store large dictionaries (do not confuse this with the dictionaries of python) of any language, say English, words in spell-checking programs. +It is a multi-way tree structure useful for storing strings over an alphabet, when we are storing them. +It has been used to store large dictionaries of English, say, words in spell-checking programs. However, the penalty on tries is the storage requirement. ## What is a trie? A trie is a tree like data structure which stores strings, and helps you find the data associated with that string using the prefix of the string. -For example, say you plan on building a dictionary to store strings along with their meanings. You must be wondering why can't I simply use a hash table, to get the information. Yes, you obviously can get information using a hash table, but the worst time complexity for fetching data from a hash table is `O(n)`, where `n` is the number of strings stored. -But when you use a trie for fetching data, the time complexity to get data is `O(w)`, where `w` is the length of the string. +For example, say you plan on building a dictionary to store strings along with their meanings. 
You must be wondering why can't I simply use a hash table, to get the information. +Yes, you obviously can get information using a hash table, but, the hash tables can only find data where the string exactly matches the one we've added. But trie will give us the capability to find strings with common prefixes, a missing character etc in lesser time, in comparison to a hash table. ## How to implement a trie? @@ -20,7 +20,7 @@ A trie typically, looks something like this, ![Trie](https://community.topcoder.com/i/education/alg_tries.png) -The above image stores, assoc, algo, all, also, tree, trie. +This is an image of a Trie, which stores the words {assoc, algo, all, also, tree, trie}. Let's implement a trie in python, for storing words with their meanings from english dictionary. @@ -36,11 +36,11 @@ class TrieNode: As you can see, edges are 26 in length, each index referring to each character in the alphabet. 'A' corresponding to 0, 'B' to 1, 'C' to 2 ... 'Z' to 25th index. If the character you are looking for is pointing to `None`, that implies the word is not there in the trie. 
A typical Trie should implement at least these two functions: - - add_word(word,meaning) - - search_word(word) + - `add_word(word,meaning)` + - `search_word(word)` Additionally, one can also add something like - - get_all_words() - - get_all_words_with_prefix(prefix) + - `get_all_words()` + - `get_all_words_with_prefix(prefix)` #### Adding Word to the trie From c47fd31d0b047297ed80dea161e4687a0739767c Mon Sep 17 00:00:00 2001 From: Deepak Sattiraju Date: Mon, 30 May 2016 15:20:45 +0530 Subject: [PATCH 04/11] Added delete_word --- Data-Structures-Trie.md | 36 ++++++++++++++++++++++++++++++++---- 1 file changed, 32 insertions(+), 4 deletions(-) diff --git a/Data-Structures-Trie.md b/Data-Structures-Trie.md index 024d7b65f6..73506eff3d 100644 --- a/Data-Structures-Trie.md +++ b/Data-Structures-Trie.md @@ -38,6 +38,7 @@ As you can see, edges are 26 in length, each index referring to each character i A typical Trie should implement at least these two functions: - `add_word(word,meaning)` - `search_word(word)` + - `delete_word(word)` Additionally, one can also add something like - `get_all_words()` - `get_all_words_with_prefix(prefix)` @@ -66,8 +67,11 @@ Additionally, one can also add something like ```python def search_word(self,word): - if len(word)==0 and self.ends_here: - return True + if len(word)==0: + if self.ends_here: + return True + else: + return "Word doesn't exist in the Trie" ch = word[0] index = ord(ch)-ord('a') if self.edge[index]== None: @@ -81,8 +85,11 @@ The `search_word` function will tell us if the word exists in the Trie or not. S ```python def get_meaning(self,word): - if len(word)==0 and self.ends_here: - return self.meaning + if len(word)==0 : + if self.ends_here: + return self.meaning + else: + return "Word doesn't exist in the Trie" ch = word[0] index = ord(ch) - ord('a') if self.edges[index] == None: @@ -91,6 +98,27 @@ The `search_word` function will tell us if the word exists in the Trie or not. 
S return self.edges[index].get_meaning(word[1:]) ``` +#### Deleting data + +By deleting data, you just need to change the variable `ends_here` to `False`. Doing that doesn't alter the prefixes, but stills deletes the meaning and the existence of the word from the trie. + +```python + def delete_word(self,word): + if len(word)==0: + if self.ends_here: + self.ends_here = False + self.meaning = None + return "Deleted" + else: + return "Word doesn't exist in the Trie" + ch = word[0] + index = ord(ch) - ord('a') + if self.edges[index] == None: + return "Word doesn't exist in the Trie" + else: + return self.edges[index].delete_word(word[1:]) +``` + :rocket: [Run Code](https://repl.it/CWaJ) ## Resources From 8fcb39069dc9267a1c20fe2b748129b562be4587 Mon Sep 17 00:00:00 2001 From: Deepak Sattiraju Date: Mon, 30 May 2016 15:23:12 +0530 Subject: [PATCH 05/11] Moved image to the upper section --- Data-Structures-Trie.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Data-Structures-Trie.md b/Data-Structures-Trie.md index 73506eff3d..2746d7ea4b 100644 --- a/Data-Structures-Trie.md +++ b/Data-Structures-Trie.md @@ -14,14 +14,14 @@ A trie is a tree like data structure which stores strings, and helps you find th For example, say you plan on building a dictionary to store strings along with their meanings. You must be wondering why can't I simply use a hash table, to get the information. Yes, you obviously can get information using a hash table, but, the hash tables can only find data where the string exactly matches the one we've added. But trie will give us the capability to find strings with common prefixes, a missing character etc in lesser time, in comparison to a hash table. -## How to implement a trie? - A trie typically, looks something like this, ![Trie](https://community.topcoder.com/i/education/alg_tries.png) This is an image of a Trie, which stores the words {assoc, algo, all, also, tree, trie}. +## How to implement a trie? 
+ Let's implement a trie in python, for storing words with their meanings from english dictionary. ```python From 050c81cc6b2ccca071f6dfc7adaaf48dfc24ee4d Mon Sep 17 00:00:00 2001 From: Deepak Sattiraju Date: Mon, 30 May 2016 15:25:24 +0530 Subject: [PATCH 06/11] Formatting corrections --- Data-Structures-Trie.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/Data-Structures-Trie.md b/Data-Structures-Trie.md index 2746d7ea4b..d457fa130b 100644 --- a/Data-Structures-Trie.md +++ b/Data-Structures-Trie.md @@ -13,7 +13,6 @@ However, the penalty on tries is the storage requirement. A trie is a tree like data structure which stores strings, and helps you find the data associated with that string using the prefix of the string. For example, say you plan on building a dictionary to store strings along with their meanings. You must be wondering why can't I simply use a hash table, to get the information. Yes, you obviously can get information using a hash table, but, the hash tables can only find data where the string exactly matches the one we've added. But trie will give us the capability to find strings with common prefixes, a missing character etc in lesser time, in comparison to a hash table. - A trie typically, looks something like this, ![Trie](https://community.topcoder.com/i/education/alg_tries.png) @@ -36,12 +35,15 @@ class TrieNode: As you can see, edges are 26 in length, each index referring to each character in the alphabet. 'A' corresponding to 0, 'B' to 1, 'C' to 2 ... 'Z' to 25th index. If the character you are looking for is pointing to `None`, that implies the word is not there in the trie. 
A typical Trie should implement at least these two functions: - - `add_word(word,meaning)` - - `search_word(word)` - - `delete_word(word)` + +- `add_word(word,meaning)` +- `search_word(word)` +- `delete_word(word)` + Additionally, one can also add something like - - `get_all_words()` - - `get_all_words_with_prefix(prefix)` + +- `get_all_words()` +- `get_all_words_with_prefix(prefix)` #### Adding Word to the trie From a5939d4031f10a6630ef114e91cb45393da095b9 Mon Sep 17 00:00:00 2001 From: Deepak Sattiraju Date: Mon, 30 May 2016 15:27:43 +0530 Subject: [PATCH 07/11] repl fix --- Data-Structures-Trie.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Data-Structures-Trie.md b/Data-Structures-Trie.md index d457fa130b..d5bddf10cc 100644 --- a/Data-Structures-Trie.md +++ b/Data-Structures-Trie.md @@ -121,7 +121,7 @@ By deleting data, you just need to change the variable `ends_here` to `False`. D return self.edges[index].delete_word(word[1:]) ``` -:rocket: [Run Code](https://repl.it/CWaJ) +:rocket: [Run Code](https://repl.it/CWbr) ## Resources From 9d59049415eef9ffcc76d1ff96f2365da0b9b679 Mon Sep 17 00:00:00 2001 From: Deepak Sattiraju Date: Tue, 31 May 2016 12:12:20 +0530 Subject: [PATCH 08/11] Added Data Structure prefix to the title --- Data-Structures-Trie.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/Data-Structures-Trie.md b/Data-Structures-Trie.md index d5bddf10cc..c075bfb683 100644 --- a/Data-Structures-Trie.md +++ b/Data-Structures-Trie.md @@ -1,4 +1,4 @@ -# Trie +# Data Structure Trie ## Introduction to Trie @@ -10,8 +10,8 @@ However, the penalty on tries is the storage requirement. ## What is a trie? -A trie is a tree like data structure which stores strings, and helps you find the data associated with that string using the prefix of the string. -For example, say you plan on building a dictionary to store strings along with their meanings. 
You must be wondering why can't I simply use a hash table, to get the information. +A trie is a tree like data structure which stores strings, and helps you find the data associated with that string using the prefix of the string. +For example, say you plan on building a dictionary to store strings along with their meanings. You must be wondering why can't I simply use a hash table, to get the information. Yes, you obviously can get information using a hash table, but, the hash tables can only find data where the string exactly matches the one we've added. But trie will give us the capability to find strings with common prefixes, a missing character etc in lesser time, in comparison to a hash table. A trie typically, looks something like this, @@ -41,7 +41,7 @@ A typical Trie should implement at least these two functions: - `delete_word(word)` Additionally, one can also add something like - + - `get_all_words()` - `get_all_words_with_prefix(prefix)` @@ -52,7 +52,7 @@ Additionally, one can also add something like if len(word)==0: self.ends_here = True # Because we have reached the end of the word self.meaning = meaning # Adding the meaning to that node - return + return ch = word[0] # First character # ASCII value of the first character (minus) the ASCII value of 'a'-> the first character of our ALPHABET gives us the index of the edge we have to look up. index = ord(ch) - ord('a') @@ -112,7 +112,7 @@ By deleting data, you just need to change the variable `ends_here` to `False`. D self.meaning = None return "Deleted" else: - return "Word doesn't exist in the Trie" + return "Word doesn't exist in the Trie" ch = word[0] index = ord(ch) - ord('a') if self.edges[index] == None: @@ -126,4 +126,4 @@ By deleting data, you just need to change the variable `ends_here` to `False`. D ## Resources - For further reading, you can try this [topcoder](https://www.topcoder.com/community/data-science/data-science-tutorials/using-tries/) tutorial. 
-- Also, a tutorial from [geeksforgeeks](http://www.geeksforgeeks.org/trie-insert-and-search/) \ No newline at end of file +- Also, a tutorial from [geeksforgeeks](http://www.geeksforgeeks.org/trie-insert-and-search/) From fcf03aafe969a269cdfa73153f5dfa5ca1a5e6d0 Mon Sep 17 00:00:00 2001 From: Deepak Sattiraju Date: Tue, 31 May 2016 13:25:02 +0530 Subject: [PATCH 09/11] Added link to Hashtables article --- Data-Structures-Trie.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Data-Structures-Trie.md b/Data-Structures-Trie.md index c075bfb683..8251bc76d9 100644 --- a/Data-Structures-Trie.md +++ b/Data-Structures-Trie.md @@ -12,7 +12,7 @@ However, the penalty on tries is the storage requirement. A trie is a tree like data structure which stores strings, and helps you find the data associated with that string using the prefix of the string. For example, say you plan on building a dictionary to store strings along with their meanings. You must be wondering why can't I simply use a hash table, to get the information. -Yes, you obviously can get information using a hash table, but, the hash tables can only find data where the string exactly matches the one we've added. But trie will give us the capability to find strings with common prefixes, a missing character etc in lesser time, in comparison to a hash table. +Yes, you obviously can get information using a hash table, but, the [hash tables](https://freecodecamp.github.io/wiki/en/hash-tables-and-hashing-functions/) can only find data where the string exactly matches the one we've added. But trie will give us the capability to find strings with common prefixes, a missing character etc in lesser time, in comparison to a hash table. 
A trie typically, looks something like this, ![Trie](https://community.topcoder.com/i/education/alg_tries.png) From e686adad82d36ff172e95eb2c1a225c9d74caf8e Mon Sep 17 00:00:00 2001 From: Deepak Sattiraju Date: Tue, 31 May 2016 13:29:16 +0530 Subject: [PATCH 10/11] Updated reference --- Data-Structures-Trie.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Data-Structures-Trie.md b/Data-Structures-Trie.md index 8251bc76d9..c49ecd6a9d 100644 --- a/Data-Structures-Trie.md +++ b/Data-Structures-Trie.md @@ -12,7 +12,7 @@ However, the penalty on tries is the storage requirement. A trie is a tree like data structure which stores strings, and helps you find the data associated with that string using the prefix of the string. For example, say you plan on building a dictionary to store strings along with their meanings. You must be wondering why can't I simply use a hash table, to get the information. -Yes, you obviously can get information using a hash table, but, the [hash tables](https://freecodecamp.github.io/wiki/en/hash-tables-and-hashing-functions/) can only find data where the string exactly matches the one we've added. But trie will give us the capability to find strings with common prefixes, a missing character etc in lesser time, in comparison to a hash table. +Yes, you obviously can get information using a hash table, but, the [hash tables](Hash-Tables-And-Hashing-Functions.md) can only find data where the string exactly matches the one we've added. But trie will give us the capability to find strings with common prefixes, a missing character etc in lesser time, in comparison to a hash table. 
A trie typically, looks something like this, ![Trie](https://community.topcoder.com/i/education/alg_tries.png) From 9bb14845d3340c36ab82c80f4eb9a64fcb1c2c5c Mon Sep 17 00:00:00 2001 From: Deepak Sattiraju Date: Tue, 31 May 2016 17:19:07 +0530 Subject: [PATCH 11/11] Updated --- Data-Structures-Trie.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Data-Structures-Trie.md b/Data-Structures-Trie.md index c49ecd6a9d..e9c1333490 100644 --- a/Data-Structures-Trie.md +++ b/Data-Structures-Trie.md @@ -12,7 +12,7 @@ However, the penalty on tries is the storage requirement. A trie is a tree like data structure which stores strings, and helps you find the data associated with that string using the prefix of the string. For example, say you plan on building a dictionary to store strings along with their meanings. You must be wondering why can't I simply use a hash table, to get the information. -Yes, you obviously can get information using a hash table, but, the [hash tables](Hash-Tables-And-Hashing-Functions.md) can only find data where the string exactly matches the one we've added. But trie will give us the capability to find strings with common prefixes, a missing character etc in lesser time, in comparison to a hash table. +Yes, you obviously can get information using a hash table, but, the [hash tables](Hash-Tables-And-Hashing-Functions) can only find data where the string exactly matches the one we've added. But trie will give us the capability to find strings with common prefixes, a missing character etc in lesser time, in comparison to a hash table. A trie typically, looks something like this, ![Trie](https://community.topcoder.com/i/education/alg_tries.png)
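
Review sketch for the series as a whole: the article lists `get_all_words_with_prefix(prefix)` as an optional method without implementing it, and the final wording argues that a trie handles common-prefix lookups better than a hash table. A rough illustration of that claim follows, using the word set from the image caption. The free functions, the iterative insert, and the `dict` comparison are illustrative assumptions rather than code from the patches.

```python
ALPHABET_SIZE = 26  # assumption: lowercase 'a'..'z' only, as in the article


class TrieNode:
    def __init__(self):
        self.edges = [None] * ALPHABET_SIZE
        self.ends_here = False


def add_word(root, word):
    """Insert word iteratively, creating child nodes as needed."""
    node = root
    for ch in word:
        index = ord(ch) - ord('a')
        if node.edges[index] is None:
            node.edges[index] = TrieNode()
        node = node.edges[index]
    node.ends_here = True


def get_all_words_with_prefix(root, prefix):
    """Return every stored word that starts with prefix, sorted."""
    # Step 1: walk down to the node that represents the prefix.
    node = root
    for ch in prefix:
        node = node.edges[ord(ch) - ord('a')]
        if node is None:
            return []  # no stored word has this prefix
    # Step 2: collect every word ending in the subtree below that node.
    found = []
    stack = [(node, prefix)]
    while stack:
        current, text = stack.pop()
        if current.ends_here:
            found.append(text)
        for index, child in enumerate(current.edges):
            if child is not None:
                stack.append((child, text + chr(ord('a') + index)))
    return sorted(found)


words = ["assoc", "algo", "all", "also", "tree", "trie"]
root = TrieNode()
for w in words:
    add_word(root, w)

# Hash-table version of the same query: every key has to be inspected.
table = dict.fromkeys(words, True)
from_table = sorted(w for w in table if w.startswith("al"))

from_trie = get_all_words_with_prefix(root, "al")
print(from_trie)
```

The trie query only visits nodes under the `al` branch, while the `dict` version has to test each stored key, which is the difference the hash-table paragraph is pointing at.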