@ytatichno

According to https://issues.apache.org/jira/projects/COLLECTIONS/issues/COLLECTIONS-878

Summary:

  • Precompute the initial capacity of the new HashMap in MapUtils.invertMap to avoid a resize during the subsequent insertions.
  • The computation is the same one the JDK uses for its HashMap.newHashMap static factory method:

    // from HashMap (since 19)
    static int calculateHashMapCapacity(int numMappings) {
        return (int) Math.ceil(numMappings / (double) DEFAULT_LOAD_FACTOR);
    }

Why 0.75d:

  • It is DEFAULT_LOAD_FACTOR cast to double.
  • HashMap.calculateHashMapCapacity casts the load factor to double in the same way.
  • With float arithmetic, very large maps can occasionally lose precision in the division, which leads to an extra resize.
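
To illustrate the precision point, here is a small sketch that compares the two ways of doing the division. The class and method names are hypothetical and exist only for this comparison, and the sample value is one I picked because it sits just above 0.75 * 2^25 and cannot be represented exactly as a float.

```java
final class LoadFactorPrecision {

    // Capacity computed with a float load factor: the int operand is widened to float,
    // which can drop low-order bits once numMappings exceeds 2^24.
    static int capacityWithFloat(int numMappings) {
        return (int) Math.ceil(numMappings / 0.75f);
    }

    // Capacity computed with a double load factor: every int is exactly representable.
    static int capacityWithDouble(int numMappings) {
        return (int) Math.ceil(numMappings / 0.75d);
    }

    public static void main(String[] args) {
        // Assumed sample value: 0.75 * 2^25 + 1.
        int n = 25_165_825;
        System.out.println("float:  " + capacityWithFloat(n));
        System.out.println("double: " + capacityWithDouble(n));
    }
}
```

For this value the float version returns 33554432 (exactly 2^25), whose resize threshold of 25165824 is one below n, so the last insertion would still trigger a resize. The double version returns 33554434, which HashMap rounds up to the next power of two.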

The current code incurs an extra resize for about 50% of map sizes. For example, consider maps with sizes in [4, 16]:

  • Size 4: a table of capacity 4 is allocated, and the HashMap is resized when the 4th element is inserted.
  • Sizes 5 and 6: the capacity is rounded up to 8, and no resize occurs.
  • Sizes 7 and 8: the capacity is rounded up to 8, and a resize occurs when the 7th element is inserted.
  • Sizes 9 to 12: the capacity is rounded up to 16, and no resize occurs.
  • Sizes 13 to 16: the capacity is rounded up to 16, and a resize occurs when the 13th element is inserted.

To sum up:
For each power of two N, consider the initial sizes in the range (N/2, N], all of which round up to capacity N. Sizes up to 0.75 * N (DEFAULT_LOAD_FACTOR) fit without a resize, but sizes above 0.75 * N trigger an extra resize. Both sub-ranges have the same width, so a given size has roughly a 50% chance of causing an extra resize.
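
The case analysis above can be reproduced with a small simulation. This is only a sketch under stated assumptions: it does not inspect a real HashMap, but models its documented behavior (capacity rounded up to a power of two, resize once the entry count exceeds capacity * 0.75). All names are hypothetical, and the rounding helper ignores HashMap's maximum-capacity clamp, so it is meant for small sizes only.

```java
final class ResizeSimulation {

    // Models HashMap's rounding of a requested capacity to the next power of two (minimum 1).
    // Assumption: requested is small enough that the shift cannot overflow.
    static int tableSizeFor(int requested) {
        if (requested <= 1) {
            return 1;
        }
        return Integer.highestOneBit(requested - 1) << 1;
    }

    // True if inserting `size` distinct keys into a map created with `requested`
    // as its initial capacity would exceed the threshold for the default load factor.
    static boolean resizes(int requested, int size) {
        int capacity = tableSizeFor(requested);
        int threshold = (int) (capacity * 0.75f);
        return size > threshold;
    }

    // The capacity proposed in this PR.
    static int proposedCapacity(int size) {
        return (int) Math.ceil(size / 0.75d);
    }

    public static void main(String[] args) {
        for (int size = 4; size <= 16; size++) {
            System.out.printf("size=%2d current=%b proposed=%b%n",
                    size,
                    resizes(size, size),
                    resizes(proposedCapacity(size), size));
        }
    }
}
```

In this model the `current` column is true for sizes 4, 7, 8 and 13 to 16, matching the list above, while the `proposed` column stays false for every size in the range.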

@garydgregory
Member

garydgregory commented Oct 16, 2025

Hello @ytatichno

-1: You are confusing initial capacity and load factor. Step through the debugger for MapUtilsTest.testInvertMapDefault() and MapUtilsTest.testInvertMap() and you'll see that the resulting map of the invertMap() call has the same table size as the input and a load factor of 0.75. This is true even if the input table has a load factor of 1.00. The parameter to HashMap(int) is the initial capacity.

@ytatichno
Author

ytatichno commented Oct 16, 2025

Thank you for your instant feedback!

I fully agree that the constructor parameter of HashMap(int initialCapacity) is not the same as the table size and that the load factor remains constant (0.75) regardless of the input map's load factor.

The optimization I proposed is not related to the load factor value itself, but to avoiding a resize that can happen when the map is filled exactly up to its declared size.

For example, when using new HashMap<>(map.size()), the table capacity is map.size() rounded up to the next power of two, and the internal threshold becomes (int)(capacity * 0.75).
Therefore, while inserting all elements from the source map, the threshold is exceeded in about 50% of practical cases and an extra resize occurs.

Using (int) Math.ceil(map.size() / 0.75d) as the initial capacity avoids that resize and results in a table that is large enough to hold all entries without rehashing.

This is also consistent with the JDK’s approach in HashMap.calculateHashMapCapacity (introduced in Java 19).

Here it is:

    /**
     * Calculate initial capacity for HashMap based classes, from expected size and default load factor (0.75).
     *
     * @param numMappings the expected number of mappings
     * @return initial capacity for HashMap based classes.
     * @since 19
     */
    static int calculateHashMapCapacity(int numMappings) {
        return (int) Math.ceil(numMappings / (double) DEFAULT_LOAD_FACTOR);
    }

    /**
     * Creates a new, empty HashMap suitable for the expected number of mappings.
     * The returned map uses the default load factor of 0.75, and its initial capacity is
     * generally large enough so that the expected number of mappings can be added
     * without resizing the map.
     *
     * @param numMappings the expected number of mappings
     * @param <K>         the type of keys maintained by the new map
     * @param <V>         the type of mapped values
     * @return the newly created map
     * @throws IllegalArgumentException if numMappings is negative
     * @since 19
     */
    public static <K, V> HashMap<K, V> newHashMap(int numMappings) {
        if (numMappings < 0) {
            throw new IllegalArgumentException("Negative number of mappings: " + numMappings);
        }
        return new HashMap<>(calculateHashMapCapacity(numMappings));
    }
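
Applied to invertMap, the change could look roughly like the sketch below. This is not the actual Commons Collections source: `InvertMapSketch` and `capacityFor` are hypothetical names, and the 0.75 literal stands in for the JDK's private DEFAULT_LOAD_FACTOR so the code also compiles on Java versions before 19.

```java
import java.util.HashMap;
import java.util.Map;

final class InvertMapSketch {

    // Same arithmetic as calculateHashMapCapacity, with the load factor written as a literal.
    static int capacityFor(int numMappings) {
        return (int) Math.ceil(numMappings / 0.75d);
    }

    // Hypothetical stand-in for MapUtils.invertMap: swaps keys and values
    // into a map that is presized for the source map's entry count.
    static <K, V> Map<V, K> invertMap(Map<K, V> map) {
        Map<V, K> out = new HashMap<>(capacityFor(map.size()));
        for (Map.Entry<K, V> entry : map.entrySet()) {
            out.put(entry.getValue(), entry.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> source = new HashMap<>();
        source.put("one", 1);
        source.put("two", 2);
        Map<Integer, String> inverted = invertMap(source);
        System.out.println(inverted.get(1) + " " + inverted.get(2)); // prints "one two"
    }
}
```

The only difference from passing map.size() directly is the constructor argument, so the behavior of the returned map is unchanged and only the initial table size differs.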

I can add a short test showing the extra resize if you think it would clarify my intentions.
