Skip to content

Latest commit

 

History

History
164 lines (139 loc) · 5.12 KB

File metadata and controls

164 lines (139 loc) · 5.12 KB

Design a Hashmap

Design a HashMap without using any built-in hash table libraries.

Implement the MyHashMap class:

MyHashMap() initializes the object with an empty map.
void put(int key, int value) inserts a (key, value) pair into the HashMap. 
If the key already exists in the map, update the corresponding value.
int get(int key) returns the value to which the specified key is mapped, or -1 if this map contains no mapping for the key.
void remove(key) removes the key and its corresponding value if the map contains the mapping for the key.
# Capacity for internal array
INITIAL_CAPACITY = 50

# Node data structure - essentially a LinkedList node
class Node:
   def __init__(self, key, value):
   	self.key = key
   	self.value = value
   	self.next = None
   def __str__(self):
   	return "<Node: (%s, %s), next: %s>" % (self.key, self.value, self.next != None)
   def __repr__(self):
   	return str(self)
# Hash table with separate chaining
class HashTable:
   # Initialize hash table
   def __init__(self):
   	self.capacity = INITIAL_CAPACITY
   	self.size = 0
   	self.buckets = [None]*self.capacity
   # Generate a hash for a given key
   # Input:  key - string
   # Output: Index from 0 to self.capacity
   def hash(self, key):
   	hashsum = 0
   	# For each character in the key
   	for idx, c in enumerate(key):
   		# Add (index + length of key) ^ (current char code)
   		hashsum += (idx + len(key)) ** ord(c)
   		# Perform modulus to keep hashsum in range [0, self.capacity - 1]
   		hashsum = hashsum % self.capacity
   	return hashsum

   # Insert a key,value pair to the hashtable
   # Input:  key - string
   # 		  value - anything
   # Output: void
   def insert(self, key, value):
   	# 1. Increment size
   	self.size += 1
   	# 2. Compute index of key
   	index = self.hash(key)
   	# Go to the node corresponding to the hash
   	node = self.buckets[index]
   	# 3. If bucket is empty:
   	if node is None:
   		# Create node, add it, return
   		self.buckets[index] = Node(key, value)
   		return
   	# 4. Iterate to the end of the linked list at provided index
   	prev = node
   	while node is not None:
   		prev = node
   		node = node.next
   	# Add a new node at the end of the list with provided key/value
   	prev.next = Node(key, value)

   # Find a data value based on key
   # Input:  key - string
   # Output: value stored under "key" or None if not found
   def find(self, key):
   	# 1. Compute hash
   	index = self.hash(key)
   	# 2. Go to first node in list at bucket
   	node = self.buckets[index]
   	# 3. Traverse the linked list at this node
   	while node is not None and node.key != key:
   		node = node.next
   	# 4. Now, node is the requested key/value pair or None
   	if node is None:
   		# Not found
   		return None
   	else:
   		# Found - return the data value
   		return node.value

   # Remove node stored at key
   # Input:  key - string
   # Output: removed data value or None if not found
   def remove(self, key):
   	# 1. Compute hash
   	index = self.hash(key)
   	node = self.buckets[index]
   	prev = None
   	# 2. Iterate to the requested node
   	while node is not None and node.key != key:
   		prev = node
   		node = node.next
   	# Now, node is either the requested node or none
   	if node is None:
   		# 3. Key not found
   		return None
   	else:
   		# 4. The key was found.
   		self.size -= 1
   		result = node.value
   		# Delete this element in linked list
   		if prev is None:
   			self.buckets[index] = node.next # May be None, or the next match
   		else:
   			prev.next = prev.next.next # LinkedList delete by skipping over
   		# Return the deleted result 
   		return result

Solution Explanation

Question1: How the above solves the collisons?

Whenever two keys have the same hash value, it is considered a collision. What should our hash table do? If it just wrote the data into the location anyway, we would be losing the object that is already stored under a different key. With separate chaining, we create a Linked List at each index of our buckets array, containing all keys for a given index. When we need to look up one of those items, we iterate the list until we find the Node matching the requested key.

class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None

Each bucket will actually contain a LinkedList of nodes containing the objects stored at that index. This is one method of collision resolution.

Question2: How to handle uneven distribution of keys ?

image

The above is called as uneven distribution , as most of the keys are mapped to bucket 1. Because our hash function must be h(x) = 1. In order to maintain a proper distribution across the buckets , our hash function is like below:

def hash(self, key):
	hashsum = 0
	# For each character in the key

	for idx, c in enumerate(key):
		# Add (index + length of key) ^ (current char code)

		hashsum += (idx + len(key)) ** ord(c)
		# Perform modulus to keep hashsum in range [0, self.capacity - 1]

		hashsum = hashsum % self.capacity
	return hashsum

Screenshot from 2023-04-09 12-22-20