We’ve pushed the silicon so hard that silent data corruptions (SDCs) are no longer a theoretical problem.
Mercurial Cores are terrifying because they don’t hard-fail; they produce rare, but *incorrect* computations!
*When* exactly the problem occurred is hard to pinpoint.
The possibility was brought up at the Dependable Systems and Networks conference in 2008.
The first real SDC disclosure happened in 2021 with Meta. Google and Alibaba also confirmed later.
Perhaps more terrifying is that cores can *become* mercurial over time.
Chips are pushed so hard that electromigration aging can make compute “more wrong”.
No one knows for sure which process node started the phenomenon...but it's statistically likely to be 14nm or 7nm.
For decades SDCs were considered a myth, only caused by cosmic rays and such.
We’ve only just entered the magic era where:
1. silicon is being pushed HARD
2. hyperscalers are getting SO big the issue is statistically visible
Humanity is *just* starting to enter an era of needing to accept potentially imperfect compute.
Current metrics imply roughly 1 in 1000 CPUs are mercurial. Every indicator points towards the issue getting worse in the future.
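To see why scale makes this "statistically visible", here's a rough back-of-the-envelope sketch. It takes the 1-in-1000 figure at face value and assumes each CPU is independently mercurial (a simplification). The fleet sizes and function names are made up for illustration, not taken from any of the published reports.

```python
# Back-of-the-envelope sketch: how fleet size changes the odds of owning
# at least one mercurial CPU. Assumes the ~1-in-1000 rate quoted above and
# that CPUs are affected independently of each other (a simplification).

MERCURIAL_RATE = 1 / 1000  # rate quoted in the thread


def expected_mercurial(fleet_size: int, rate: float = MERCURIAL_RATE) -> float:
    """Expected number of mercurial CPUs in a fleet of the given size."""
    return fleet_size * rate


def p_at_least_one(fleet_size: int, rate: float = MERCURIAL_RATE) -> float:
    """Probability that the fleet contains at least one mercurial CPU."""
    return 1.0 - (1.0 - rate) ** fleet_size


if __name__ == "__main__":
    # Hypothetical fleet sizes: a hobbyist rack, a mid-size company, a hyperscaler.
    for n in (10, 1_000, 1_000_000):
        print(
            f"{n:>9} CPUs: expect {expected_mercurial(n):8.2f} mercurial, "
            f"P(at least one) = {p_at_least_one(n):.4f}"
        )
```

With a handful of machines you will almost certainly never meet one. With a million, you should expect on the order of a thousand, which is why the hyperscalers were the first to notice.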
A good (broad) overview about the subject is available from IEEE Computer magazine here, although I’d also encourage you to search the Meta, Google, and Alibaba research on the subject: computer.org/csdl/magazine/…
• • •
If you take a picture of a Raspberry Pi 2 with a strong flash it will reboot.
A specific power regulator (U16) was chip-scale packaged to save on cost and die space.
Since the silicon is basically naked, a xenon flash can cause a massive (but very short) current spike.
Naked silicon (specifically, WLCSP) isn’t “bad” per se; it’s heavily used in mobile phones.
The thing is…phones are usually sealed. The Pi is an exposed development board.
Don't blame the engineers too hard; Apple actually had a similar issue with the iPhone 4 (back glass).
The fix for the RPi is a bit obvious of course.
Either:
1. don’t do that (take pictures with a high-powered flash inches away)
2. if you must…put a little Blu Tack, nail polish, or other opaque inert substance on U16