Saturday, August 21, 2010

Collection Initialization

Early in my career, fresh out of college, I inherited the code for one of our tools that ran on a nightly timer. I was in charge of enhancing part of the application to add some functionality.

Fresh out of college, I remembered that initializing a hash map to just over the expected number of entries leads to better hashing behavior. Without proper sizing the table starts out too small, so many entries collide into the same bucket's linked list, which defeats the purpose of a hash map, and the map has to keep growing and rehashing as it fills.

This tool was scheduled to run at night and took around 8 hours per run.
So you might think the following:
1) So what if it is more efficient?
2) It resizes itself, so why bother doing it myself?

After I initialized the hash map to the appropriate size, the tool went from running 8 hours to running in less than an hour. That's a 7-hour gain.
After this I realized that:
1) Collections take time to dynamically resize.
2) Collections will keep resizing as they run out of room.
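
To get a feel for how often a map resizes, here is a rough sketch. It assumes Java's java.util.HashMap with its documented defaults (initial capacity 16, load factor 0.75, capacity doubling whenever the entry count passes capacity * load factor); the post's tool may well have used something else. The class and method names are made up for illustration.

```java
// Hypothetical helper: estimates how many times a default-sized
// java.util.HashMap doubles its table while being filled.
class ResizeCount {

    // Assumes the documented defaults: capacity 16, load factor 0.75.
    static int resizesNeeded(int entries) {
        long capacity = 16;   // long so repeated doubling cannot overflow
        int resizes = 0;
        // The map grows once the entry count exceeds capacity * 0.75.
        while (entries > capacity * 0.75) {
            capacity *= 2;    // each growth step rehashes every existing entry
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println(resizesNeeded(100));        // prints 4
        System.out.println(resizesNeeded(1_000_000));  // prints 17
    }
}
```

Every one of those doublings re-places all the entries already in the map, so the cost grows with the size of the collection.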

Now that the tool took less than an hour, it could be run a couple of times during the day ad hoc. When changes were needed to the tool, you could verify the results in a manageable time period. When the job failed for whatever reason, the whole task could be restarted and do two days of work in 2 hours instead of 16 hours.

Suggestions as to when you should initialize collections:
1) If you are going to put more than 100 objects in it.
2) If you are a cautious/defensive developer like me.

I'm not too worried about hitting the optimal size, as long as you are relatively close.
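
As a minimal sketch of "relatively close", again assuming Java's java.util.HashMap and its default load factor of 0.75: divide the expected number of entries by the load factor and pass the result to the constructor. PresizedMaps, capacityFor and newMap are hypothetical names, not part of any library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for creating a HashMap sized for a known entry count.
class PresizedMaps {

    // Smallest initial capacity that holds expectedEntries without a resize,
    // assuming the default load factor of 0.75.
    static int capacityFor(int expectedEntries) {
        long needed = (long) Math.ceil(expectedEntries / 0.75);
        return (int) Math.min(1 << 30, needed); // HashMap caps capacity at 2^30
    }

    static <K, V> Map<K, V> newMap(int expectedEntries) {
        // HashMap rounds the requested capacity up to a power of two itself.
        return new HashMap<>(capacityFor(expectedEntries));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = newMap(100);
        for (int i = 0; i < 100; i++) {
            counts.put("key" + i, i);
        }
        System.out.println(counts.size()); // prints 100
    }
}
```

Since the map rounds the capacity up on its own, an estimate that is off by a bit usually lands on the same table size anyway.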
