Profile picture
Gravis McElroy @gravislizard
, 39 tweets, 8 min read Read on Twitter
OKAY
SO
I DIAGNOSED THE SNES9X NETPLAY CRASH BUG
THE ROOT CAUSE IS FUCKING DUMB
first, I will regale you with some details to titillate your mind.
so, when writing your own layer 7 network protocol, the procedure is:
1) take three shots of whiskey
2) finish writing your will
3) get out the binary calculator
because you gotta get everything perfect
with, say, HTTP, the work has been done for you. if you're writing SOAP or whatever you're past layer 7. you're writing layer 7+7. you're doing everything with unlimited length strings.
I've worked on a few C programs before that had their own protocols. The problem is you don't know when you're done getting input; recv requires you to specify how many bytes you're /expecting/. How can you know that without having already received data successfully?
Now - I'm not a professional. I barely know what I'm talking about. Trouble is, neither do a lot of other people, and unlike me they actually start projects and release them, as is the case here. So snes9x has Problems with packet size.
So the initial failure mode was this: I would start the game server and connect three other clients. In server mode, the host opens a session *to itself* (quake style), so that's a total of four clients. This worked perfectly.
Then I would connect a fourth client (so five connections including the host) and the following would happen:
- the host window would disappear
- all four other windows would report "Bad magic number in server heart beat"
I am not a professional programmer, but I have been coding on and off since I was twelve. I have a decent amount of expertise in narrowing down a problem. I knew this was a protocol error, because it was crashing both ends. The server had to be doing something wrong.
If the server had a simple local bug, like an unhandled exception causing it to crash, the clients wouldn't report that error. They would hang for an indefinite amount of time, the socket would time out, and they'd shut down the connection. Maybe crash at that point.
It's possible the server could have sent a malformed packet due to memory corruption and then exited, but unlikely. My presumption is that the server was working "correctly", sending data the clients couldn't understand.
The fact the server was closing was unusual - why was there no error? Well, as soon as i ran the server with a debugger attached, Visual Studio notified me of heap corruption. Stared at the server code and realized they underallocated a buffer. Easy fix there.
After I applied that, the server didn't crash anymore but the clients were still crashing. So now I turned around and attached the debugger to the client. No exceptions in the client - whatever the issue was, it was being handled. So, breakpoints
I was stunned at how patient the process was. The server just let my client sit, not responding to TCP, for *minutes* while I single-stepped through the code. It was actually really cool to debug like that. And I found my problem
Mind you, there is NO protocol documentation, so i had to reverse engineer it. This protocol is custom so Wireshark wasn't a ton of help, just generic TCP packets with "Data". I set a breakpoint in the "receive data" routine and then stepped through as each packet came in
Each time a packet was received I copied the buffer contents to a text file, then stepped through the client code to see what code paths it followed for that packet, and labeled each one in turn until I understood the connect and sync procedure.
The process goes:
establish TCP connection
server sends one packet instructing client to load a ROM
server sends one packet containing the SRAM dump for the game
server sends one packet telling the client to reset the game
server begins sending joypad updates
The status of each joypad is packed in 4 bytes, each bit representing the state of up, down, left, right, a b x y l r select start and I think a little more for mouse, light gun etc. So the joypad update packet length is 4 bytes times the number of possible players
NP_MAX_CLIENTS is 8. I noticed the packets in wireshark changed size if you added more players, and that instantly tells us that the client /has to be/ reading a header value to find out how long the packet is supposed to be in order to know how many bytes to receive.
I stared at the packets in wireshark for a long time, and the packet headers I'd dissected in notepad++ for a long time, and nothing was coming clear. I literally spent... 12 hours on this? kept chasing completely incorrect conclusions.
Finally, I'm staring at the code, and - I swear to god, you know that bit about the dude running into a tree in the desert, the /only tree in a ten mile radius?/
this is one of, literally, three comments in the entire netplay.cpp. I had no idea how to parse it earlier because I didn't know where the opcode sat in the packet so I was just ignoring it. But I had figured that out, so I stared at this a little longer and YOU PROBABLY SEE IT
joypad data count
is the top two bit
the top
TWO
BITS
WHICH STORES A MAXIMUM
OF FOUR VALUES

MEANING AT FIVE PLAYERS IT FUCKING OVERFLOWS AND WRAPS AROUND
they fucking didn't allot enough room in the protocol format for more than four players worth of joypad data. and that's why, when i looked at the packet that was killing my client, it inexplicably had joypad data instead of a header. it was reading the REST of the first packet.
this can't be practically fixed. the only way to do it is to completely rewrite the protocol, which I wasn't willing to do. well, i could also have done something like extended the length of the server reset packet and packed in a client count but you know what's easier
hardcoded both ends. server always sends 8 structs, client always decodes 8. who cares. done. christ almighty. works now. having a bug due to an off by one i have to fix but i had it working perfectly with five players earlier.
in 2018 we don't need to save 27 bytes in each packet It'll Be Fine
I guess if you want a moral to this story: never assume the person before you did a reasonable job. There could be gaping holes in their logic. Nothing precludes it, so don't let yourself believe assumptions like "the protocol is well-defined and fundamentally works"
Another thought though: tcp delivery order is guaranteed afaik, so, why didn't they choose to define a "set player count" opcode and send that packet only during state changes? That would have saved effort in several places and introduces no flaws I can think of.
SORRY SORRY I HAVE ANOTHER BUG TO SHARE, SORRY
So, alright. Here's the original code that packs the joypad info into each update packet. For each connected client n, push one Joypad struct into the data buffer.
So, we patch this to always send eight structs. It takes me four tries of course because - yes, I FULLY understand why we have to zero-index, but this is the oldest programming problem and nobody will ever be proof against it.
But curiously, I find this crashes, complaining that the data object (to which *ptr is a pointer) was corrupted on the heap. That's odd. That was the first thing I fixed. See, it originally looked like this:
I updated it to this, which I know to be defined as 8. So what gives?

You probably already see it.
<adds two parens>
unroll
THERE'S EIGHT SIIIIIMULTANEOUS KONGS
Missing some Tweet in this thread?
You can try to force a refresh.

Like this thread? Get email updates or save it to PDF!

Subscribe to Gravis McElroy
Profile picture

Get real-time email alerts when new unrolls are available from this author!

This content may be removed anytime!

Twitter may remove this content at anytime, convert it as a PDF, save and print for later use!

Try unrolling a thread yourself!

how to unroll video

1) Follow Thread Reader App on Twitter so you can easily mention us!

2) Go to a Twitter thread (series of Tweets by the same owner) and mention us with a keyword "unroll" @threadreaderapp unroll

You can practice here first or read more on our help page!

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just three indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member and get exclusive features!

Premium member ($3.00/month or $30.00/year)

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!