Bartłomiej Cupiał
May 24 · 14 tweets · 3 min read
So here's a story of, by far, the weirdest bug I've encountered in my CS career.

Along with @maciejwolczyk we've been training a neural network that learns how to play NetHack, an old roguelike game (see the screenshot in the original thread). Recently, something unexpected happened.
We use a model by @JensTuyls that clones expert behavior on NetHack, and we improve it using RL methods. That model gets 5000 points and we finetune it in the game so that the score improves. However, suddenly in a recent run, Jens' model only got 3000 points. Quite a drop.
This problem is consistent across seeds, so it's not just a fluke. Well, we probably screwed up something in the code for loading the model in the recent commit. Let's revert, no biggie. Except that after reverting to a version of the code from a few days back, we still get 3000.
Revert code a few weeks back? Still 3000 points. Luckily, the server we run our experiments on saves the files from the previous runs. We find the files corresponding to a run that previously got 5000 points, we re-run, and, well, it gets 3000. Nothing about the code changed.
We start suspecting our software stack. Thankfully, we use Singularity which means that our whole environment is in a single, self-contained file. That file hasn't changed for a few months, so that shouldn't be the problem. However, the container loads one thing from the server.
Namely, the CUDA libraries that allow us to compute things quickly on GPU. So we suspect that maybe something about these libraries changed that degraded the model. Because what else could have? And yes, recently the version was changed from 11.8 to 12.4.
The CUDA mismatch probably shouldn't impact the results in this particular way, but we see no other explanation. We override the version to 11.8 - we still get 3000 points. We build a new environment from scratch, for CUDA 12.4 - 3000 points. Welp.
We repeat the evaluation on a personal laptop. This is slow and expensive without the specialized hardware, but we make it work. Again, 3000 points. We disable multithreading, GPU, and some other things that have at least a conceivable chance of causing the problem - 3000 points.
By this point we've spent several hours on this, and it's 7 PM. I am starting to feel like a madman. I can't even watch a TV show without constantly thinking about the bug. Before going to sleep I decide to ask @JensTuyls, the author of the model, if he knows what might be broken.
The next morning I see a lot of messages on Slack. Jens replied: "Oh yes, it's probably a full moon today."

What.
I check a moon phase calendar, and yes, it's a full moon today. Hands shaking, I start a new NetHack game, and the message says "You are lucky! Full moon tonight."

What.
So apparently NetHack has a mechanic that slightly changes how the game plays whenever it's a full moon according to your system clock: the player character is luckier, werewolves appear in their animal form, and the dogs howl ominously. nethackwiki.com/wiki/Time
It doesn't make the game harder, but the model hasn't seen full moon data in its training set, so the score drops. In this particular case, it drops from 5k points to 3k points. We override the time so it's not a full moon, we evaluate the model - and it's 5k points again.
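For anyone who wants to guard their own evaluations against this, here is a minimal sketch of a moon-phase check. The arithmetic is a Python port of the `phase_of_the_moon` routine as I remember it from NetHack's `hacklib.c`, so treat it as an approximation and check it against the source for your NetHack version. The helper names `is_full_moon` and `next_non_full_moon` are made up for illustration. The thread doesn't say how we overrode the clock, and this sketch doesn't either: it only tells you which dates to avoid or to pick.

```python
from datetime import date, timedelta

FULL_MOON = 4  # phases run 0-7: 0 is a new moon, 4 is a full moon


def phase_of_the_moon(day: date) -> int:
    """Approximate moon phase (0-7), ported from memory of NetHack's hacklib.c."""
    diy = day.timetuple().tm_yday - 1    # C's tm_yday is 0-based, Python's is 1-based
    goldn = (day.year - 1900) % 19 + 1   # C's tm_year counts years since 1900
    epact = (11 * goldn + 18) % 30
    if (epact == 25 and goldn > 11) or epact == 24:
        epact += 1
    return ((((diy + epact) * 6 + 11) % 177) // 22) & 7


def is_full_moon(day: date) -> bool:
    """Hypothetical helper: would the game treat this date as a full moon?"""
    return phase_of_the_moon(day) == FULL_MOON


def next_non_full_moon(day: date) -> date:
    """Hypothetical helper: first date on or after `day` that is not a full moon."""
    while is_full_moon(day):
        day += timedelta(days=1)
    return day


if __name__ == "__main__":
    today = date.today()
    if is_full_moon(today):
        print("Full moon today: scores may differ from other days.")
        print("Next safe date:", next_non_full_moon(today))
    else:
        print("No full moon today, moon phase:", phase_of_the_moon(today))
```

Running a check like this at the start of an evaluation, and logging the phase next to the score, would at least make a lunar regression show up in the logs instead of in a multi-hour debugging session.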
The moral is, if you encounter an unexpected bug, be sure to consult a lunar calendar. Big thanks to @JensTuyls for solving this for us!
