Java Object equals, hashCode, Hashing Collections and Hash Code Collision

Reading Time: 7 minutes

Introduction

In this blog post I’ll be focusing on something very fundamental and assumed as easy in the concept, but fairly the subject requires more focus and better apprehending. Besides that I am quite sure you’ll learn many interesting related to this topic.

Object equals and hashCode Methods

equals and hashCode methods are shipped with the java.lang.Object class as all the objects in Java extend this class even if you don’t implicitly extend it, the JVM will explicitly do favor for you. So what are these methods and the contracts we need to follow up.

The equals() Method

The method is a Boolean type that in the signature accepts another object that is used to compare the given object with the current object. The equals method can be called by collections explicitly or you ca use to compare two given objects. It is that simple. so after this point we need to follow the contacts while filling in our methodology. Before digging into that, I’d to highlight what happens if we don’t override the method.

The original method relies in the Object class checks only the are equal to each other this mainly falls into similarity of the same memory location.

public boolean equals(Object obj) {
        return (this == obj);
    }

The importance comes in handy when we really need to check “equality”. For instance for a given Person object, the equality means when the names and last names are important. So in this matter we need to override and provide our own implementation of the equals method. Before that we need to meet some contacts;

1. Check whether the given object is null,

2.Check whether the given object is the type of the actual object,

3.Check whether the objects are the same,

4.Then last of all check the attributes whether they are the same or not

Here is a proper way of overriding the method to meet the contract criteria;

@Override
        public boolean equals(Object o) {
            if(o == null) return false;
            if (!(o instanceof Person)) return false;
            if (this == o) return true;
            Person person = (Person) o;

            return getName().equals(person.getName());
        }

The hashCode() Method

The default implementation of the hashCode methods returns the memory location of the object, thus if we are going to work on the hashing collections, we need to provide a proper way of working hash code method. In the below “Hash Code Collision” chapter you will study more of this topic for proper implementations and other options. Furthermore in the “Hashing Collection” chapter you’ll also see why the hashCode method is vital to override.

Here is a proper way of overriding the method to meet the contract criteria;

@Override
       public int hashCode() {
           return getName().hashCode();
       }

Now I’d like to give out some examples to see in each situation how the code behaves;

1. Proper equals and hash code methods applied:

public class ProperImplementations {
    private static class Person {
        public Person(String name) {
            this.name = name;
        }

        private String name;

        public String getName() {
            return name;
        }

        @Override
        public boolean equals(Object o) {
            if(o == null) return false;
            if (!(o instanceof Person)) return false;
            if (this == o) return true;
            Person person = (Person) o;

            return getName().equals(person.getName());
        }

        @Override
        public int hashCode() {
            return getName().hashCode();
        }
    }

    public static void main(String[] args) {
        Person person = new Person("Tugrul");
        Person person2 = new Person("Oguz");
        Person person3 = new Person("Oguz");
        System.out.println("Person is equal to Person 2: " + person.equals(person2));
        System.out.println("Person 2 is equal to Person 3: " + person2.equals(person3));
        System.out.println("Person hashcode: " + person.hashCode());
        System.out.println("Person 2 hashcode: " + person2.hashCode());
        System.out.println("Person 3 hashcode: " + person3.hashCode());
    }
}

Console Output:

Person is equal to Person 2: false
Person 2 is equal to Person 3: true
Person hashcode: -1778884893
Person 2 hashcode: 2456221
Person 3 hashcode: 2456221

2. Methods are not overridden;

public class NoImplementation {
    private static class Person {
        public Person(String name) {
            this.name = name;
        }

        private String name;
    }

    public static void main(String[] args) {
        Person person = new Person("Tugrul");
        Person person2 = new Person("Oguz");
        Person person3 = new Person("Oguz");
        System.out.println("Person is equal to Person 2: " + person.equals(person2));
        System.out.println("Person 2 is equal to Person 3: " + person2.equals(person3));
        System.out.println("Person is hashcode: " + person.hashCode());
        System.out.println("Person 2 is hashcode: " + person2.hashCode());
        System.out.println("Person 3 is hashcode: " + person3.hashCode());
    }
}

Console Output:

Person is equal to Person 2: false
Person 2 is equal to Person 3: false
Person is hashcode: 1735600054
Person 2 is hashcode: 21685669
Person 3 is hashcode: 2133927002

Hashing Collections

So now let’s dive into one of the Hashing collection to see how we integrate the equals and hashCode methods with a real-life example. In this chapter I’ll be focusing on HashMap and its internals from Java Version 7, because some of hashing collections including HashMap have been enhanced by Oracle since the version 8. You can read this detailed technical article to see the improvements.

So the HashMap works on an Entry object of arrays not as key and value if you decompile it you will see this entry;

transient Entry[] table;

After this point we will need to dig into the Entry object and how the rest of the operations are handled;

static class Entry<K,V> implements Map.Entry<K,V> {
         final K key;
         V value;
         Entry<K,V> next;
         final int hash;
         // the rest is omitted
}

We know now the era of this data structure. Sometimes in the terminology, to make the understanding of the entry easier, the term “bucket” is used. The term corresponds to the Entry itself in this concept. Now it is time to see the mostly used methods to perform insertion and retrieval of objects in HashMap.

The internals of the put operation

public V put(K key, V value) {
       if (key == null)
           return putForNullKey(value);
       int hash = hash(key.hashCode());
       int i = indexFor(hash, table.length);
       for (Entry<K,V> e = table[i]; e != null; e = e.next) {
           Object k;
           if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
               V oldValue = e.value;
               e.value = value;
               e.recordAccess(this);
               return oldValue;
           }
       }

       modCount++;
       addEntry(hash, key, value, i);
       return null;
   }

Steps:

1.First checked if the key is null
1.1.For the null key: it’s placed in the 0 index,
1.2.If not then the hashCode() method of the Key which is the given object is called to retrieve the hash code,

2.the indexFor() method is then performed to identify the location of the object. It is calculated by the hashCode and the hash table length which is default 16,

3.If the index is empty then the object is inserted at this point. What if the index has another entry? this is the situation called “hash collision”. Then the equals() method is called to check whether the given objects are the same;
3.1. equals() returns true: the old value is replaced by the new value,
3.2. equals() returns false: the new object is linked to the existing object where you see the attribute “next” in the Entry object and the size of the HashMap is increased

The internals of the get operation

public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

Steps:

1.First the hash code of the key is calculated and the index is identified,
2.Then we are navigated to the index,
3.If no entry found in the index, the null value is returned,
4.if the entry exists;
4.1.The hashcode is copared,
4.2.1.Hash code matches: the equal methods is called for equality check, if returns true then the value is returned,
4.2.2.Hash code does not match: it will navigate to the next attribute and apply the same pattern in 4.2.1 step recursively, until the hashcode and equal methods return positive values.

Hash Code Collision

A collision happens when distinct keys produce the same hashCode() value that is not unique. In this case, typically the entries will be attached next to each other. Similar to hash code collision, you can also read about this SHA-1 collision security vulnerability found out by professionals. Furthermore, even String class survives with the same situation and will return the same hashCode for the given String values. Observe the below example for the prove;

import java.util.HashMap;

public class HashCodeCollisionExample {
    private static class Person {
        public Person(String name) {
            this.name = name;
        }

        private String name;

        public String getName() {
            return name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;

            Person person = (Person) o;

            return getName().equals(person.getName());
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    public static void main(String[] args) {
        Person person = new Person("Aa");
        Person person2 = new Person("BB");
        Person person3 = new Person("BB");
        System.out.println("Person is equal to Person 2: " + person.equals(person2));
        System.out.println("Person 2 is equal to Person 3: " + person2.equals(person3));
        System.out.println("Person hashcode: " + person.hashCode());
        System.out.println("Person 2 hashcode: " + person2.hashCode());
        System.out.println("Person 3 hashcode: " + person3.hashCode());
        HashMap<Person, Person> hashMap = new HashMap<>();
        hashMap.put(person, person);
        hashMap.put(person2, person2);
        hashMap.put(person3, person3);
        hashMap.keySet().stream().forEach(p -> System.out.println(p.getName()));
    }
}

Console Output:

Person is equal to Person 2: false
Person 2 is equal to Person 3: true
Person hashcode: 2112
Person 2 hashcode: 2112
Person 3 hashcode: 2112
Aa
BB

Here come some trivial questions in the sense of Hash Collision;

What do we want with uniqueness? Can we not produce random keys?

By using random keys we will never find the inserted values in the HashMap. We shall mind the fact that, the given hash must be re-calculable  at a later to be found at the given index unlike to Encryption mechanisms e.g. hashing passwords that is one way and we just check the hash for the correction in the simple username and password security mechanisms.

What if we get rid of uniqueness and return the same hash, wouldn’t it work that way?

This approach also meets the contact well, however, first we’ll have benefited from the nature of the Hashing collection, which is we can use a simple List collection and give it a go. Other than that, traversing a LinkedList node each time and having all the expensive operations are also another considerations. HashMap is fast when we have the unique numbers and less linked nodes attached to each other.

If you are stuck with the hash code generation here I listed some options you can benefit of;

  • Objects.hash shipped with Java 7 and can be used even higher versions,
  • Apache Commons Hashcode builder, it also does good job but
  • Google Guava Oracle’s new implementation has also inspired from this library as well,
  • Josh Bloch’s Effective Java in Item 9, he emphasizes on using prime numbers, generally Eclipse and Intellij creates hash code by following Josh’s contract.

Collection Interface and Data Structures in Java

Reading Time: 3 minutes

Collection Interface

Introduction

Collections know also as Data Structures are the fundamentals of each and every programming language. Java provides many of useful data structure for different scenarios and needs. In this blog post I’ll cover only mostly used collections as well as keep things simple, not dive into methods and their details. Because those can be easily assumed and Oracle offers very useful java documentations that explain everything in details, plus you can always go and decompile these classes yourself in your IDE.

Vector, ArrayList and LinkedList

Vector

Vector collection is not usually used by developers, however its internal structure is same as the ArrayList identically, however the major difference is that it is synchronized, thus it is thread-safe. Alternatively developers tend to use the “CopyOnWriteArrayList” collection type or call “Collections.synchronizedList()”.

ArrayList

ArrayList is so far mostly used Data Structure in Java. It internally employs a re-sizable array that grows while adding more element into it. Positional access is very fast in ArrayList because it maintains an array, adding is also fairly faster, but remove operation is fairly slower compared to the LinkedList.

transient Object[] elementData;

Properties:

  • Allows null and duplicate values,
  • Initial capacity is 10,
  • Not synchronized,
  • Maintains insertion order(comparable or comparator can be used)

LinkedList

LinkedList internally maintains a double linked list. This design leads to a hierarchy of linking all the objects to themselves, thus each node knows the previous and the next node. A graphical representation of this structure goes as follows;

Furthermore, the major pitfall of the LinkedList is when we need to perform positional access in the data structure which is a Linear time, because the whole structure must be traversed. Other than that it performs much faster in add and remove.

private static class Node<E> {
        E item;
        Node<E> next;
        Node<E> prev;

        Node(Node<E> prev, E element, Node<E> next) {
            this.item = element;
            this.next = next;
            this.prev = prev;
        }
    }

Properties:

  • Allows null and duplicate values,
  • Not synchronized,
  • Maintains insertion order(comparable or comparator can be used)

Performance Comparison

HashSet, TreeSet and LinkedHashSet Comparison

HashMap, Hashtable, TreeMap and LinkedHashMap Comparison