Java 7 HashMap Source Code Analysis

Word count: 2.3kReading time: 14 min

 2015/08/05

Linked lists and arrays allow elements to be arranged in an order of our choice. However, if you want to find a specific element but have forgotten its position, you must visit every element until you find it. This can consume significant time if the collection is large. A data structure that allows for rapidly finding objects is the hash table.

HashMap is an implementation of the Map interface based on a hash table. This implementation provides all optional mapping operations and permits null values and null keys. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

1. HashMap Data Structure

HashMap is composed of both an array and linked lists. The underlying structure is an array, where each element serves as the head of a linked list.

HashMap Structure

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;
    // ...
}

Entry is a static internal class within HashMap, package-private, and implements the Map.Entry<K,V> interface. It includes a pointer to the next element in the list.

2. Constructors

HashMap has four constructors:

HashMap(): Constructs an empty HashMap with the default initial capacity (16) and default load factor (0.75).
HashMap(int initialCapacity): Constructs an empty HashMap with a specified initial capacity and the default load factor (0.75).
HashMap(int initialCapacity, float loadFactor): Constructs an empty HashMap with a specified initial capacity and load factor.
HashMap(Map<? extends K,? extends V> m): Constructs a new HashMap with the same mappings as the specified Map.

In essence, these boil down to two types: one where you specify initial capacity and load factor, and another that generates a new HashMap from an existing map. Let’s look at the first type:

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    // Find a power of 2 >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;

    this.loadFactor = loadFactor;
    threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    useAltHashing = sun.misc.VM.isBooted() &&
            (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    init();
}

The parameters are simple: initialCapacity and loadFactor. initialCapacity defines the initial size of the array, while the product of loadFactor and capacity determines a threshold. The maximum threshold is (1 << 30) + 1. The capacity is always a power of 2, specifically the smallest power of 2 that is greater than or equal to the requested value. The default capacity is 16, and the default load factor is 0.75. When the number of elements in the map reaches or exceeds the threshold, the array capacity doubles, and elements are re-inserted into the new array. This is why HashMap does not guarantee constant ordering.

An IllegalArgumentException is thrown if the load factor is less than or equal to zero or is not a number.

3. put Operation

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

Since HashMap allows null keys, it first checks if the key is null and performs special handling.

/**
 * Offloaded version of put for null keys
 */
private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}

A null key is always inserted at index 0 of the array. If an entry already exists at that position with a null key, its value is replaced; otherwise, a new entry is added. We’ll examine the addEntry function shortly.

Question: If a null key is always inserted into table[0], why do we still need to traverse the linked list?

Returning to the put function: if the key is not null, its hash value is calculated, and the indexFor function determines its position in the table.

/**
 * Returns index for hash code h.
 */
static int indexFor(int h, int length) {
    return h & (length - 1);
}

The indexFor function is brief but clever. Typically, mapping a number to a fixed length uses the modulo operator (%), i.e., h % length. However, this implementation exploits the fact that table.length is always a power of 2. In binary, a power of 2 is a 1 followed by N zeros, and (length - 1) is a sequence of N ones. A bitwise AND (&) operation is much faster and more efficient than modulo. However, this efficiency depends on length being a power of 2; otherwise, collisions would increase significantly.

This also answers the previous question. Since indexFor returns values ranging from 0 to (length - 1), it’s possible for non-null keys to also map to table[0]. Thus, we must traverse the linked list to find the exact key.

Once the index is determined, we iterate through the linked list. If the key exists, we replace the value and return the old one. modCount++ increments the modification count, which is used for the fail-fast mechanism (discussed later). If the key is not found, we call addEntry to insert a new Entry.

/**
 * Adds a new entry with the specified key, value and hash code to
 * the specified bucket.  It is the responsibility of this
 * method to resize the table if appropriate.
 *
 * Subclass overrides this to alter the behavior of put method.
 */
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

This function takes the hash, key, value, and the bucket index. It first checks if the current size has reached the threshold AND if the bucket at the target index is not empty. If both conditions are met, resize(2 * table.length) is called. Resizing reduces collisions and maintains access efficiency. The check for an empty bucket before resizing is an optimization to delay resizing as long as possible; if the bucket is empty, adding one element won’t cause immediate performance degradation.

/**
 * Rehashes the contents of this map into a new array with a
 * larger capacity.  This method is called automatically when the
 * number of keys in this map reaches its threshold.
 */
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    boolean oldAltHashing = useAltHashing;
    useAltHashing |= sun.misc.VM.isBooted() &&
            (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    boolean rehash = oldAltHashing ^ useAltHashing;
    transfer(newTable, rehash);
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

resize first checks if the table has already reached its maximum capacity (1 << 30). If so, the threshold is set to Integer.MAX_VALUE to prevent further resizing. Otherwise, a new empty array is created with double the previous length. (Note: since capacities are powers of 2, newCapacity will correctly stay within bounds until MAXIMUM_CAPACITY is hit). The transfer function then migrates the elements.

/**
 * Transfers all entries from current table to newTable.
 */
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

This function moves every entry from the old table to the new one. Since the array length has changed, the elements’ indices also change, which is why HashMap does not maintain ordering. Resizing is a resource-intensive operation, so it should be minimized.

Back in addEntry, after potentially resizing, a new entry is created:

/**
 * Like addEntry except that this version is used when creating entries
 * as part of Map construction or "pseudo-construction" (cloning,
 * deserialization).  This version needn't worry about resizing the table.
 */
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

createEntry uses the same parameters as addEntry. Its logic is closely tied to the Entry constructor:

/**
 * Creates new entry.
 */
Entry(int h, K k, V v, Entry<K,V> n) {
    value = v;
    next = n;
    key = k;
    hash = h;
}

The constructor takes an Entry object (n) and sets it as the next value for the new entry. Effectively, createEntry takes the existing list at table[bucketIndex], attaches it to the end of the new entry, and then places the new entry at the head of table[bucketIndex].

Entry Insertion

This concludes the put function. If a new key-value pair is inserted, it returns null.

4. get Operation

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 */
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

Like put, it first handles null keys:

/**
 * Offloaded version of get() to look up null keys. Null keys map
 * to index 0.
 */
private V getForNullKey() {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null)
            return e.value;
    }
    return null;
}

For non-null keys, it calls getEntry:

/**
 * Returns the entry associated with the specified key in the
 * HashMap. Returns null if the HashMap contains no mapping
 * for the key.
 */
final Entry<K,V> getEntry(Object key) {
    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

It calculates the hash, finds the bucket index, and traverses the list. It returns the Entry if found, or null otherwise.

5. Fail-Fast Mechanism

/**
 * The number of times this HashMap has been structurally modified.
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash). This field is used to make iterators on Collection-views of
 * the HashMap fail-fast.
 */
transient int modCount;

java.util.HashMap is not thread-safe. If another thread modifies the map while an iterator is in use, a ConcurrentModificationException will be thrown. This is the fail-fast strategy.

This strategy is implemented using the modCount field, which ensures visibility of modifications across threads. modCount tracks the number of modifications made to the HashMap. During iterator initialization, this value is assigned to the iterator’s expectedModCount.

Note: Fail-fast behavior cannot be absolutely guaranteed. In an unsynchronized concurrent modification environment, it’s impossible to provide firm guarantees. Fail-fast iterators attempt to throw ConcurrentModificationException on a best-effort basis. Therefore, code should not rely on this exception for program correctness. Instead, fail-fast behavior should be used only to detect programming errors.

References

Core Java
Java API

About Me && Blog

Below are my personal details and links. I look forward to connecting and sharing knowledge with fellow developers!

About Me: Includes my WeChat and WeChat group links.
Blog Navigation: A guide to the content on this blog.
Curated Android Performance Articles: A collection of must-read performance optimization articles. Self-nominations/recommendations are welcome!
Android Performance Knowledge Planet: Join our community for more insights.