Thanks to everybody who participated! Let's dig into this puzzle.
We have a HashSet<StringBuilder> here (which has a HashMap inside), we fill it with some objects, then we add one more StringBuilder (sb), and it's correctly added (contains() returns 'true'). Next, we modify the sb contents and contains() returns 'false' now. How it's possible?
Usually, it's a bad idea to mutate the objects used as hash table keys if the hashCode depends on the changed state. In this case, the updated hashCode will point to the wrong bucket, and the hash table won't be able to find the corresponding entry.
However, StringBuilder never overrides equals/hashCode, so it has the identity equals and the identity hashCode, which doesn't depend on content. This way, regardless of the updates, we will always find the proper bucket, and contains() will always return true. Isn't it?
Well, since Java 8 the HashMap implementation was updated to build a red-black tree if we have 8 or more elements inside the single bucket to avoid linear scan. For this, it should be possible to compare elements.…
To compare elements, HashMap uses first complete hashCode (not only bits used to identify the bucket), then if elements are Comparable and have the same class it uses compareTo method. Then it uses runtime class name and identityHashCode as the last resort.
This procedure was still stable for StringBuilder in Java 8..10, as StringBuilder was not comparable. However, in Java 11 it was improved and it actually implements a Comparable interface that compares two StringBuilders by content.…
So, if we actually have many collisions in HashSet/HashMap that uses StringBuilder as keys, it may rely on compareTo() and fail if the StringBuilder content was changed since the StringBuilder was added to the collection, as we may take the wrong route inside the r-b tree.
To make this happen, we need to generate some collisions and force HashMap to treeify the bucket. The identity hash codes are just random numbers, so you need to create a number of StringBuilders until you find one with the desired bucket number. Image
hc ^ (hc >>> 16) is the formula used in HashMap implementation to improve the bucket distribution when low bits change not very often. A bucket number is just several lower bits, hence & 0x3F…
So this code will generate StringBuilder objects that always fall into the first bucket inside the HashMap having the capacity of 64 (0x40). It's the minimal treeification capacity.…
So if we generate about 40 of such StringBuilders, we will have a pretty treeified HashMap. However, compareTo() won't be called, unless we have exactly duplicate hashCode(). Luckily, we need only one exact duplicate and it can collide with any of 40 objects we already have.
Putting this all together, we can generate a list of 40 StringBuilders that go to the same bucket. Let's also sort them by hashCode to simplify the search. Image
Next, let's find a single collision: Image
That's it. We initialize the first 40 StringBuilders with "a" to make them lexicographically bigger than the empty one. However, when we append "oops", their order changes, and compareTo() will return the different result. Here's the complete solution…
A similar solution was proposed by @AndreiPangin who just generated 10,000,000 StringBuilder objects and find the collision between them. His solution works faster!
Another solution was proposed by @quydxnguyen but he looks for collision with exact StringBuilder hash, rather than any hash which makes the things slower. But he parallelized the search for speedup!
One more solution is submitted by @heinzkabutz who used a bigger table than me and did a trick with an anonymous class static initializer to write some imperative code. Good job, Heinz!
The main question that this puzzle poses: is this ok to mutate StringBuilder objects used as HashMap keys/HashSet elements in production code? As @VladimirSitnikv suggests here, you should test your code with -XX:+UnlockExperimentalVMOptions -XX:hashCode=2
This option makes the identity hashCode of all the objects the same. So with this option, your code will fail without special preparation of StringBuilders. On the other hand, is it a big chance that many StringBuilders will actually fall into a single bucket?
After all, when designing programs we often assume that something unlikely is impossible (the whole Git thing relies on the absence of SHA1 collisions). Can we consider the program to be correct if it uses mutable StringBuilders in hash tables? How long will it run until failure?

• • •

Missing some Tweet in this thread? You can try to force a refresh

Keep Current with Tagir Valeev

Tagir Valeev Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!


Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

Did Thread Reader help you today?

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!


0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy


3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us on Twitter!