Hashtable Collisions and String Hashcode
Hash based collections is an interesting aspect of Java language although many developers shy away from using them due to the complexities involved. Some of the complexities are in very subtle form that even experienced programmers tend to make mistakes while using them. In this post I will be talking about Hashtables and collisions while using hash table.
So what is a Hashtable ? Wikipedia defines hashtable like this:
“In computer science, a hash table or hash map is a data structure that uses a hash function to efficiently map certain identifiers or keys (e.g., person names) to associated values (e.g., their telephone numbers). The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought. ”
Now Consider this code snippet
import java.util.HashMap;
import java.util.Map;
public class HashFunction {
private int hashCode;
@SuppressWarnings("unchecked")
public static void main(String[] args) {
Map hashTable = new HashMap();
HashFunction key1 = new HashFunction();
HashFunction key2 = new HashFunction();
hashTable.put(key1, 1);
hashTable.put(key2, 2);
System.out.println("Size = " + hashTable.size());
System.out.println("Value when key1 is searched:"+hashTable.get(key1));
System.out.println("Value when key2 is searched:"+hashTable.get(key2));
}
public int hashCode() {
hashCode=10;
return hashCode;
}
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null || getClass() != obj.getClass())
return false;
HashFunction other = (HashFunction) obj;
if (hashCode != other.hashCode)
return false;
return true;
}
}
What will be the output of this program? See the output given below.
Size = 1
Value when key1 is searched:2
Value when key2 is searched:2
Is that result surprising? If it is not surprising to you then there is no need to read further as you are already a master of hashtable
. For the rest, here is the explanation for this behaviour.
Size=1 means there is only one value inside the hashtable. And when we searched for a key it returned only key2 value. So we can infer that key2 is the only value inside the hash table. But hang on. How come it returned key2 value when we were searching for key1. Why did this happen? For understanding that we need to look into how hashtable works.
Hashtable consists of key/value pairs which are internally stored in buckets/bins. The values are stored in appropriate buckets based on the hashValue computed for the key.
So far the concept is good, but why did our program behave differently
Now what happened in our code? When we put key2, the hashcodes of the key are same and so collision happens. When we put a value into a hastable, it calculates the hashcode of the key first and if there is already a value present then it calls the equals method and only if they are not equal, it will insert new value into the hashtable. If the Equals method is also returning true then the ‘put’ operation will replace the existing value and return the old value. See the documentation of put to know more about this operation
Hashtable put operation
Now you understand why people like Joshua Bloch and Brian Goetz reiterates the importance of implementing hashcode and equals properly. Please note that the hashcode calculation and equals method in the above program is just to show you this collision and you should use the proper way of implementation as mentioned here. HashCode and Equals
Now how can I correct this program? Replace the hashcode=10; line with the following code
hashcode= System.identityHashCode(this);
Now the output is the following
Size = 2
Value when key1 is searched:1
Value when key2 is searched:2
[update: System.identityHashcode is not guaranteed to produce unique hashcodes and in a 64 bit system there is high chance of collisions. One has to implement a proper equals method to maintain hashtable data integrity. My example is just to explain hashtable collisions and you should always use the proper way of equals implementation as mentioned earlier]
Now time for a bit of fun with string hashcode calculation. What is the output for following code snippet?
String string1 = new String("BB");
String string2 = new String("Aa");
System.out.println("BB hash Code:"+string1.hashCode()+
",Aa hash Code:"+string2.hashCode());
BB hash Code:2112,Aa hash Code:2112
So both of them are having same hashcode. Still there won’t be any collision when you insert them as the key. So let us add a bit more twist. What about the following code?
Hashtable hashtable = new Hashtable(); hashtable.put(string1.hashCode(), 1); hashtable.put(string2.hashCode(), 2);
We created another collision. Please note that we don’t normally put hashcode as the key and instead will put the object. Hashtable implementation will always calculate the hashcode of the key. In this case the key is an integer and so System will calculate hashcode of that integer. I did it in that way just to show that hashcodes of those two strings are same and hence the key as well as key’s hashcodes will be same.


May 14th, 2010 at 10:13 am
Program updated and entry rewritten. As usual your comments are welcome
December 2nd, 2010 at 4:23 am
Hi,
your implementation with the hashcode= System.identityHashCode(this); is still combersome! identityHashCode(this), which falls back to Object.hashCode is still a hashcode and may collide (to different Objects (here: key1 & key2) can have the same hashcode! This is rare but happens if you deal with loads of data). Since your equals method does not look into a value, but only operates with the hashcode, you would get a problem when you deal with several tens of thousands of objects (it might be, that on 64bit systems where the object pointer is a long collisions become extremly rare, but there is no guaranty for that!)
Best
g
January 9th, 2011 at 6:42 am
@hixxxxx . You are absolutely correct. System.identityHashcode is not guaranteed to produce unique hashcodes and in a 64 bit system there is high chance of collisions. One has to implement a proper equals method to maintain hashtable data integrity. My example was just to explain hashtable collisions and I made it clear that people should use the proper way of equals implementation as mentioned here http://www.ibm.com/developerworks/java/library/j-jtp05273.html