Understanding Hash Tables in Programming
Hash tables are efficient data structures used for storing and retrieving key-value pairs. This guide provides a detailed explanation of hash tables, their implementation in Java and C++, and popular interview problems related to hash tables.
Introduction to Hash Tables
A hash table, also known as a hash map, is a data structure that implements an associative array abstract data type, a structure that can map keys to values. It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.
Key features:
- Fast access for lookups, insertions, and deletions (average-case O(1) time complexity)
- Efficient for large datasets
- Widely used in database indexing, caches, and symbol tables in compilers
Hash Functions
A hash function is used to convert keys into array indices. An ideal hash function should:
- Be deterministic: Equal keys produce equal hash values
- Be efficient to compute
- Uniformly distribute keys across the hash table
Common hash functions include:
- Division method: h(k) = k mod m, where m is the table size
- Multiplication method: h(k) = ⌊m(kA mod 1)⌋, where A is a constant between 0 and 1
- Universal hashing: Choose a hash function randomly from a family of functions
Collision Resolution
Collisions occur when two different keys hash to the same index. There are two main approaches to handle collisions:
-
Chaining: Each bucket is independent, and has a list of entries with the same index. The worst-case scenario is when all the keys are mapped to the same index.
-
Open Addressing: All entry records are stored in the bucket array itself. When a collision occurs, the program probes for the next open slot in the array. Common probing methods include:
- Linear Probing: If a collision occurs at h(k), try h(k)+1, then h(k)+2, etc.
- Quadratic Probing: If a collision occurs at h(k), try h(k)+1^2, then h(k)+2^2, etc.
- Double Hashing: Use a second hash function to determine the probe sequence.
Load Factor and Resizing
The load factor of a hash table is the ratio of the number of stored entries to the size of the hash table. As the load factor increases, the probability of collisions increases.
To maintain good performance, the hash table should be resized (usually doubled in size) when the load factor exceeds a certain threshold (typically 0.75). Resizing involves:
- Creating a new, larger array
- Rehashing all existing entries into the new array
Implementation in Java and C++
Let's implement a basic hash table using chaining for collision resolution in both Java and C++.
Java Implementation
import java.util.LinkedList;
class HashTable<K, V> {
private static final int INITIAL_CAPACITY = 16;
private static final double LOAD_FACTOR = 0.75;
private LinkedList<Entry<K, V>>[] buckets;
private int size;
@SuppressWarnings("unchecked")
public HashTable() {
buckets = new LinkedList[INITIAL_CAPACITY];
size = 0;
}
private static class Entry<K, V> {
K key;
V value;
Entry(K key, V value) {
this.key = key;
this.value = value;
}
}
private int hash(K key) {
return Math.abs(key.hashCode()) % buckets.length;
}
public void put(K key, V value) {
if (key == null) throw new IllegalArgumentException("Key cannot be null");
if ((double) size / buckets.length >= LOAD_FACTOR) {
resize();
}
int index = hash(key);
if (buckets[index] == null) {
buckets[index] = new LinkedList<>();
}
for (Entry<K, V> entry : buckets[index]) {
if (entry.key.equals(key)) {
entry.value = value;
return;
}
}
buckets[index].add(new Entry<>(key, value));
size++;
}
public V get(K key) {
int index = hash(key);
if (buckets[index] != null) {
for (Entry<K, V> entry : buckets[index]) {
if (entry.key.equals(key)) {
return entry.value;
}
}
}
return null;
}
public void remove(K key) {
int index = hash(key);
if (buckets[index] != null) {
buckets[index].removeIf(entry -> entry.key.equals(key));
size--;
}
}
@SuppressWarnings("unchecked")
private void resize() {
LinkedList<Entry<K, V>>[] oldBuckets = buckets;
buckets = new LinkedList[oldBuckets.length * 2];
size = 0;
for (LinkedList<Entry<K, V>> bucket : oldBuckets) {
if (bucket != null) {
for (Entry<K, V> entry : bucket) {
put(entry.key, entry.value);
}
}
}
}
}
C++ Implementation
#include <vector>
#include <list>
#include <stdexcept>
#include <functional>
template<typename K, typename V>
class HashTable {
private:
static const int INITIAL_CAPACITY = 16;
static const double LOAD_FACTOR;
struct Entry {
K key;
V value;
Entry(const K& k, const V& v) : key(k), value(v) {}
};
std::vector<std::list<Entry>> buckets;
size_t size;
size_t hash(const K& key) const {
return std::hash<K>{}(key) % buckets.size();
}
void resize() {
std::vector<std::list<Entry>> oldBuckets = std::move(buckets);
buckets.resize(oldBuckets.size() * 2);
size = 0;
for (const auto& bucket : oldBuckets) {
for (const auto& entry : bucket) {
put(entry.key, entry.value);
}
}
}
public:
HashTable() : buckets(INITIAL_CAPACITY), size(0) {}
void put(const K& key, const V& value) {
if (static_cast<double>(size) / buckets.size() >= LOAD_FACTOR) {
resize();
}
size_t index = hash(key);
for (auto& entry : buckets[index]) {
if (entry.key == key) {
entry.value = value;
return;
}
}
buckets[index].emplace_back(key, value);
size++;
}
V get(const K& key) const {
size_t index = hash(key);
for (const auto& entry : buckets[index]) {
if (entry.key == key) {
return entry.value;
}
}
throw std::out_of_range("Key not found");
}
void remove(const K& key) {
size_t index = hash(key);
auto& bucket = buckets[index];
for (auto it = bucket.begin(); it != bucket.end(); ++it) {
if (it->key == key) {
bucket.erase(it);
size--;
return;
}
}
}
};
template<typename K, typename V>
const double HashTable<K, V>::LOAD_FACTOR = 0.75;
Popular Interview Problems
Let's look at some popular hash table-related interview problems and their solutions in both Java and C++.
1. Two Sum
Problem: Given an array of integers and a target sum, return indices of two numbers such that they add up to the target.
Java Solution
import java.util.HashMap;
class Solution {
public int[] twoSum(int[] nums, int target) {
HashMap<Integer, Integer> map = new HashMap<>();
for (int i = 0; i < nums.length; i++) {
int complement = target - nums[i];
if (map.containsKey(complement)) {
return new int[] { map.get(complement), i };
}
map.put(nums[i], i);
}
throw new IllegalArgumentException("No two sum solution");
}
}
C++ Solution
class Solution {
public:
vector<int> twoSum(vector<int>& nums, int target) {
unordered_map<int, int> map;
for (int i = 0; i < nums.size(); i++) {
int complement = target - nums[i];
if (map.find(complement) != map.end()) {
return {map[complement], i};
}
map[nums[i]] = i;
}
throw runtime_error("No two sum solution");
}
};
Explanation: We use a hash table to store each number and its index. For each number, we calculate its complement (target - number) and check if this complement exists in the hash table. If it does, we've found our pair. If not, we add the current number and its index to the hash table. This solution has O(n) time complexity.
2. First Non-Repeating Character
Problem: Given a string, find the first non-repeating character in it and return its index. If it doesn't exist, return -1.
Java Solution
class Solution {
public int firstUniqChar(String s) {
HashMap<Character, Integer> count = new HashMap<>();
for (char c : s.toCharArray()) {
count.put(c, count.getOrDefault(c, 0) + 1);
}
for (int i = 0; i < s.length(); i++) {
if (count.get(s.charAt(i)) == 1) {
return i;
}
}
return -1;
}
}
C++ Solution
class Solution {
public:
int firstUniqChar(string s) {
unordered_map<char, int> count;
for (char c : s) {
count[c]++;
}
for (int i = 0; i < s.length(); i++) {
if (count[s[i]] == 1) {
return i;
}
}
return -1;
}
};
Explanation: We use a hash table to count the occurrences of each character in the string. Then, we iterate through the string again, and for each character, we check its count in the hash table. The first character with a count of 1 is our answer. This solution has O(n) time complexity.
These problems demonstrate the power of hash tables in solving common algorithmic challenges efficiently. They showcase how hash tables can be used to trade space for time, often resulting in optimal time complexity solutions.
Remember, when solving hash table problems:
- Consider using hash tables when you need to store and quickly retrieve key-value pairs
- Be aware of the potential for collisions and how they're handled
- Think about the choice of hash function and its impact on performance
- Consider the load factor and when resizing might be necessary
By mastering these concepts and practicing various hash table problems, you'll be well-prepared for hash table-related interview questions.