Last weekend was the Tekken 8 beta, and many gamers were talking about rollback netcode (Tekken finally has it). I have written network code in the past, including rollback netcode, so I will explain how it works in fighting games...
I will limit the discussion to deterministic games running at a fixed step rate (60 simulation steps per second is the most popular choice). Deterministic simulation means that all clients agree 100% on the game state every frame, except during a rollback of course...
Let's start with deterministic netcode without rollback. To simulate one frame, you need all player inputs (commands) for that frame. Since there's network latency between player devices, neither device can know the other player's commands immediately.
To make deterministic network play possible, we delay each command. The delay must be at least equal to the network latency. Both players send their commands to the other player immediately. Each command has a time stamp (frame index). The command is also delayed locally by the same amount.
Now both players execute their own and their opponent's commands on the same frame. If you have 50 ms latency in a 60 fps game, you need to delay each command by 50 ms / 16.6 ms ≈ 3 frames. This is still playable. Even single player game engines tend to have 1-2 frames of display lag.
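Here is a minimal sketch of the delay-based scheme described above. It makes several assumptions: two players, 60 Hz, and a toy `Command` type. The names `delayFramesForLatency` and `LockstepSession` are hypothetical and not taken from any engine. Local input is stamped `inputDelay` frames into the future. A frame only ticks once both players' commands for it exist. A missing remote command means waiting.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy command: a bitmask of pressed buttons (hypothetical layout).
struct Command { uint32_t buttons = 0; };

// Input delay in whole 60 Hz frames needed to cover a latency, rounded up.
// Integer math avoids float rounding: frames = ceil(latencyMs * 60 / 1000).
int delayFramesForLatency(int latencyMs) {
    assert(latencyMs >= 0);
    return (latencyMs * 60 + 999) / 1000;
}

// Minimal delay-based (lockstep) session for two players; a sketch, not a real engine.
class LockstepSession {
public:
    explicit LockstepSession(int inputDelay) : inputDelay_(inputDelay) {
        assert(inputDelay >= 0);
        // Nobody can have input for the first `inputDelay` frames, so both sides
        // pre-fill them with empty commands.
        for (int f = 0; f < inputDelay; ++f) { local_[f] = Command{}; remote_[f] = Command{}; }
    }
    // Local input sampled now is scheduled `inputDelay` frames ahead. The returned
    // frame stamp is what would be sent to the other player with the command.
    int submitLocal(Command c) {
        int target = currentFrame_ + inputDelay_;
        local_[target] = c;
        return target;
    }
    void receiveRemote(int frame, Command c) { remote_[frame] = c; }
    // Ticks only when both commands for the current frame exist. Returning false
    // means the game has to wait, which players see as stutter / slowdown.
    bool tryAdvance() {
        auto l = local_.find(currentFrame_);
        auto r = remote_.find(currentFrame_);
        if (l == local_.end() || r == remote_.end()) return false;
        // simulate(l->second, r->second) would run here, identically on both devices.
        local_.erase(l);
        remote_.erase(r);
        ++currentFrame_;
        return true;
    }
    int frame() const { return currentFrame_; }
private:
    int inputDelay_;
    int currentFrame_ = 0;
    std::map<int, Command> local_, remote_;
};
```

With 50 ms latency, `delayFramesForLatency(50)` gives 3. The first 3 frames tick on the pre-filled commands. From then on, every frame stalls until the remote command stamped for it arrives.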
Issues start when the latency increases. Higher latency means more input lag, and the game becomes harder to play. Combo timings change and reacting to moves gets difficult due to the added lag.
Rollback means that player commands can be added to past frames. The game engine must be able to return the game state to that frame, add the command and then catch up. Catching up means running N logic frames immediately within one rendered frame. This must be fast to avoid frame drops.
Let's say a command arrives 4 frames late. The local game state is reverted to the 4-frame-old state and the command is added. Then the game immediately ticks 4 logic frames to catch up. Now the state is corrected and matches the other player's. Both have identical game state.
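The revert, add, and catch-up steps can be sketched as below. The sketch makes several assumptions:

- The game is a toy deterministic one (`State`, `Input` and `step` are hypothetical).
- The rollback window is fixed at 8 frames.
- Remote inputs are delivered in order.
- The remote player is predicted to keep repeating their last known input. This is one common prediction choice, not something the thread specifies.

A snapshot is stored at the start of every frame in a ring buffer.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy deterministic game (hypothetical): two players moving on a line.
struct Input { int8_t move = 0; };             // -1 left, 0 idle, +1 right
struct State { int p1x = 0; int p2x = 100; };

// Same state + same inputs -> same result on every client.
State step(State s, Input p1, Input p2) {
    s.p1x += p1.move;
    s.p2x += p2.move;
    return s;
}

constexpr int kMaxRollback = 8;  // assumed rollback window in frames

class RollbackSession {
public:
    // Runs one frame immediately with the local input and a predicted remote
    // input (here: repeat the last remote input received).
    void tick(Input local) {
        Slot& slot = slots_[frame_ % kWindow];
        slot.state = state_;  // snapshot taken at the start of this frame
        slot.local = local;
        slot.remote = lastRemote_;
        state_ = step(state_, slot.local, slot.remote);
        ++frame_;
    }
    // The real remote input for past frame f arrived. Assumes in-order delivery
    // and that f is still inside the rollback window.
    void onRemoteInput(int f, Input remote) {
        assert(f < frame_ && frame_ - f <= kMaxRollback);
        lastRemote_ = remote;
        if (slots_[f % kWindow].remote.move == remote.move) return;  // prediction held
        State s = slots_[f % kWindow].state;  // revert to the snapshot of frame f
        for (int g = f; g < frame_; ++g) {     // catch up: frame_ - f ticks right now
            Slot& slot = slots_[g % kWindow];
            slot.state = s;
            slot.remote = remote;  // confirmed for f, re-predicted for later frames
            s = step(s, slot.local, slot.remote);
        }
        state_ = s;
    }
    const State& state() const { return state_; }
private:
    static constexpr int kWindow = kMaxRollback + 1;
    struct Slot { State state; Input local; Input remote; };
    std::array<Slot, kWindow> slots_{};
    State state_{};
    Input lastRemote_{};
    int frame_ = 0;
};
```

In the 4-frames-late case, `onRemoteInput(0, ...)` arrives after 4 ticks. It restores the frame 0 snapshot and runs 4 logic ticks in one go. The result equals a simulation that had both inputs from the start.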
Without rollback, we must delay all commands by N frames, where N must be greater than or equal to the latency. If the latency estimate is wrong, the other player must wait for that command to arrive, which you see as slowdown / stutter.
With rollback, we don't need to add any delay. Your own character feels identical to a local single player game. Just frame inputs work perfectly. Animation looks perfect. There is no slowdown, stutter or timing issue...
But rollback is not magic. The latency is still there. The other player will receive your commands late. With 50 ms latency, they receive your commands 3 frames late and insert them into a past frame. This causes various issues...
The opponent will see their own movements perfectly too, but your movements will start 3 frames late on their screen. The most trivial rollback implementation simply skips the first 3 frames of the opponent's animations due to catch-up. Enemy moves look worse. If latency is high, you see warping.
Fighting games allow buffering moves. During the recovery frames of the previous attack or movement (dash, step), you can input the next move so that it comes out frame tight. This works well with rollback. As long as you buffered the input at least 3 frames before the previous move ended, it looks perfect.
-> Opponent combos tend to look perfect, unless the combos require timing-specific cancels or similar inputs that can't be buffered. Spamming attacks back to back also looks perfect. In Tekken, an opponent pressing a jab (+1 on block) and then buffering a df1 (i13 mid) looks perfect.
Most of the rollback warping issues in Tekken are caused by movement. Tekken allows canceling dash and sidestep animations instantly with a crouch, dash or step. These commands arrive late to the opponent and cause visible warping (depending on latency).
However, this is mostly a visual issue in Tekken, because a dash/step can always be instantly cancelled into block or crouch block. People can't commit to punishing movement, as movement is hazy and can be instantly cancelled into block.
Also, movement in Tekken tends to look glitchy even offline, because people cancel movement animations repeatedly to maximize their movement speed. A couple of frames of animation warping doesn't really matter. It's visually a bit worse, but...
But the game logic still works the same. You see a dash and you have to guess whether the opponent cancels it into block or step or not. You simply see the result a couple of frames late. In Tekken the fastest attacks are 10 frames. 10 frames is 166 ms, which is already a massive latency...
In the majority of cases, the opponent's block/step/crouch rollback happens before your 10 frame jab hits, so you see the correct block animation. However, if they block in the last frames before the jab connects, you actually see hit sparks, and then rollback immediately removes the hit sparks and you see the block animation.
This can be problematic in some scenarios. For example, some sequences require hit confirming, and Tekken players use hit sparks for that. Your brain registers the hit spark and you input your followup. But due to rollback, the opponent actually blocked the move...
For example, the Mishima 1,1,2 jab string is hit confirmable. You input 1,1 and input 2 only if you see the hit spark of the first jab. The last hit is launch punishable on block. If you hit confirm wrong, you can lose 50% of your life. It's a big deal.
To make rollback less glitchy, game developers usually still combine it with input delay. You delay the input by a small number of frames. Let's say a 2 frame delay is still unnoticeable to players. You pick that as your assumed network latency, and use rollback to handle longer latencies.
This still gives both players predictable latency for all their inputs. It's just a bit more latency than in single player. Combos feel the same all the time and timing doesn't change radically. And there are no stutters or slowdowns.
If the network latency is less than the input delay, then there are zero glitches. If the network latency is more, then you see some rollback, but 2 frames less of it (since we added a constant delay of 2 frames). It's a good compromise.
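The compromise above boils down to one subtraction. Below is a tiny sketch (the helper name `rollbackDepth` is hypothetical). A command is stamped `inputDelayFrames` into the future. Only the part of the latency that exceeds that delay has to be rolled back.

```cpp
#include <algorithm>
#include <cassert>

// With a constant input delay, each command is stamped for a frame
// `inputDelayFrames` in the future. If it reaches the other player within that
// delay it is applied on time; otherwise only the remainder is rolled back.
int rollbackDepth(int latencyFrames, int inputDelayFrames) {
    assert(latencyFrames >= 0 && inputDelayFrames >= 0);
    return std::max(0, latencyFrames - inputDelayFrames);
}
```

With a 2 frame delay, a 1 or 2 frame latency causes no rollback at all. A 5 frame latency is rolled back by 3 frames instead of 5.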
There are techniques to reduce the visual glitches of rollback. The most common is to interpolate the animations. Instead of skipping the first N frames of the opponent's animation, you play the start of that animation a bit faster to catch up with the delay.
For example, say you have a 20 frame animation and roll back 3 frames. Now your animation must complete in 17 frames. You could play the animation at double rate for the first 3 displayed frames (covering 6 animation frames). This avoids the visible warp. Players can barely see the difference.
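Both approaches can be written as a mapping from "displayed frames since the late move became visible" (`t`) to the animation frame shown. The first is the trivial skip. The second is the double-rate catch-up from the 20 frame / 3 frame example. Both helper names are hypothetical, and the double-rate rule is just the simple scheme described here, not a general interpolation method.

```cpp
#include <algorithm>
#include <cassert>

// `lag`: how many frames ago the move really started. `length`: animation length.
// Trivial rollback: jump straight to the true frame (the first `lag` frames are skipped).
int animFrameSkip(int t, int lag, int length) {
    return std::min(t + lag, length);
}

// Catch-up: advance 2 animation frames per displayed frame for the first `lag`
// displayed frames, then play at normal rate. At t == lag both mappings agree.
int animFrameCatchUp(int t, int lag, int length) {
    int f = (t < lag) ? 2 * t : t + lag;
    return std::min(f, length);
}
```

For a 20 frame animation with a 3 frame lag, the skip version starts at animation frame 3. The catch-up version starts at frame 0 and meets the true timeline at displayed frame 3 (animation frame 6). Both finish at displayed frame 17.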
Let's discuss some special situations in Tekken. The opponent whiffs a big move and you whiff punish. Let's say their move is 20 frames to impact + 30 recovery frames = 50 frames. You play with a very large 10 frame lag (166 ms).
You see the move 10 frames late. The game plays the start of the animation faster to catch up. You have 40 frames to react and land your whiff punisher. This is tight, but still doable. If the latency is even higher and the move is faster, then it becomes impossible of course.
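The whiff punish budget from this example is written out below. It assumes the punisher has to connect before the opponent's 30 recovery frames run out, at 60 frames per second.

```latex
t_{\text{react}} = (\underbrace{20}_{\text{startup}} + \underbrace{30}_{\text{recovery}}) - \underbrace{10}_{\text{lag}} = 40\ \text{frames} = \tfrac{40}{60}\,\text{s} \approx 667\ \text{ms}
```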
Let's talk about seeable lows (snake edges in Tekken terminology). These low attacks are usually 27-30 frames and designed to be reactable. In Tekken 7 they are hard to react to online, which causes people to complain about them.
Rollback netcode doesn't really help with seeable low attacks. Let's say network latency is 3 frames. With traditional delay netcode, the opponent's low attack starts 3 frames later. You see it fully and try to crouch block it. But your crouch block is also delayed 3 frames.
This means that a 27 frame seeable low must be blocked by frame 24, which is very hard to do. Rollback netcode works differently. The opponent does the low attack, you receive it 3 frames late (warped or interpolated), and now you have 24 frames to react. But your reaction is instant.
The result is that in both cases you only have 24 frames to react to the seeable low attack. These "Snake Edge" attacks in Tekken are usually launchers that lead to big combos. Rollback netcode doesn't solve this, so Snake Edges continue to be hated by the online community.
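The two cases can be put side by side as a small model. Frames are counted from the first frame of the attack's startup on the shared simulation timeline. The model follows the reasoning above and ignores the block's own activation frames. The struct and function names are hypothetical.

```cpp
#include <cassert>

struct ReactionWindow {
    int visibleFrom;    // first frame the defender can see the attack
    int inputDeadline;  // roughly the last frame a block input still takes effect in time
};

// Delay netcode: the attack is visible from its first frame, but the defender's
// own block command also needs `latency` frames before it takes effect.
ReactionWindow delayNetcode(int startup, int latency) { return {0, startup - latency}; }

// Rollback netcode: the attack shows up `latency` frames late (warped or
// interpolated), but the defender's block takes effect immediately.
ReactionWindow rollbackNetcode(int startup, int latency) { return {latency, startup}; }

int framesToReact(ReactionWindow w) { return w.inputDeadline - w.visibleFrom; }
```

For a 27 frame low and 3 frames of latency, the two windows sit in different places. They have the same length: 24 frames either way.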
The next situation is close range poking: players press jabs, fast mids and plus-on-block moves, and they buffer followups. This is all watertight with rollback netcode and looks perfect, even in high latency situations. It feels like playing offline.
An extension to the above: you can press a move and then step. A step can't be buffered in Tekken. However, with rollback netcode the step still looks perfect locally. The opponent sees the step animation a few frames late (warp/interpolate)...
The opponent seeing the step a few frames late doesn't really matter. If they pressed a linear attack, they whiff and you can punish them. Steps are too fast in Tekken to react to, and even if you could react to a long step, the opponent can always block. You can't commit.
Dash block in Tekken works perfectly with rollback too. For the player doing it, it feels like offline. The receiving player sees the block a few frames late (+small warp). If they pressed a button, it gets blocked if the opponent cancelled early enough. The timing mixup works.
That's mostly it. If you have any questions, please ask.
Addition: projectiles and particle effects. These are difficult for rollback. The worst offender in Tekken is hit sparks. Some characters also have projectiles. You have to be able to track which event spawned particles/projectiles so that you can delete them too.
Rollback deletes a particle effect or projectile and then recreates it if that effect was spawned during the rollback period. Effects must also be stepped multiple times per frame to catch up, which might be a performance concern. If you don't do it, you see more glitches.
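Here is a sketch of the bookkeeping this needs. Each effect records the simulation frame (and an illustrative event id) that spawned it. A rollback to frame F deletes everything spawned at or after F and rewinds the age of the survivors. Resimulation then re-spawns whatever still happens and steps effects once per simulated frame. `EffectSystem` and its members are hypothetical. Effect expiry is left out; expired effects would also have to be kept around until they leave the rollback window.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Effect {
    int spawnFrame;    // frame whose simulation spawned the effect
    uint32_t eventId;  // illustrative id of the spawning event (e.g. a hit)
    int age;           // simulated frames since spawn
};

class EffectSystem {
public:
    void spawn(int frame, uint32_t eventId) { effects_.push_back({frame, eventId, 0}); }
    // Called at the end of every simulated frame, including resimulated ones,
    // so catching up N frames steps the effects N times.
    void step() { for (Effect& e : effects_) ++e.age; }
    // Rewind to the start of `frame`: delete effects spawned at or after it
    // (resimulation re-spawns those that still happen) and rewind the others' age.
    void rollbackTo(int frame) {
        effects_.erase(std::remove_if(effects_.begin(), effects_.end(),
                                      [frame](const Effect& e) { return e.spawnFrame >= frame; }),
                       effects_.end());
        for (Effect& e : effects_) e.age = frame - e.spawnFrame;
    }
    std::size_t count() const { return effects_.size(); }
    int ageOf(std::size_t i) const { return effects_[i].age; }
private:
    std::vector<Effect> effects_;
};
```

Take a hit spark spawned on frame 11 that turns out to be a block once a late input arrives. It disappears on `rollbackTo(11)`. The spark from frame 10 stays, with its age restored to 1.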
Rollback is often only used in games with a small amount of state and simple game logic, such as fighting games. Rollback requires running the whole game logic multiple times per frame (it could be 10 times), which means that the game logic code must be very fast.
I implemented rollback in an RTS game. It's definitely doable for more complex games too. We had up to a 30 frame rollback window. You just need very well optimized deterministic game logic. Rollback netcode makes writing game logic harder and limits what you can do.
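To put "very fast" into numbers, here is a rough worst-case bound. In the worst case, one rendered frame may have to run the whole rollback window of resimulated ticks plus the regular tick. The sketch assumes the entire 60 Hz frame goes to game logic and ignores rendering, so real budgets are tighter. The helper name is hypothetical.

```cpp
#include <cassert>

// Upper bound on one logic tick's cost if a rendered frame may have to run
// `rollbackFrames` resimulated ticks plus the regular one.
double maxLogicTickMs(double frameMs, int rollbackFrames) {
    assert(frameMs > 0.0 && rollbackFrames >= 0);
    return frameMs / (rollbackFrames + 1);
}
```

At 60 Hz, 10 rollback frames leave about 1.5 ms per logic tick. A 30 frame window leaves about 0.54 ms, before any rendering cost.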
If you want deterministic rollback netcode, you have to design your game logic for it from the beginning. Adding it later is hard. You need very good debug tooling to detect the causes of desyncs. It's even harder to keep in sync than plain deterministic netcode.
Nowadays most RTS games use time boxing instead of rollback. It's much simpler. In time boxing, you measure latency and set the time box size = latency. You always queue commands to the next time box. That's good enough for RTS, but in fighting games you need per-frame time stamps.
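Here is a minimal reading of the time boxing rule as a sketch. `TimeBoxing` is hypothetical, and the thread doesn't say exactly when a box's commands are applied. This sketch assumes they are applied when that box ends. Under that assumption, a command queued into the next box always has at least one full box (≥ latency) to reach every client.

```cpp
#include <cassert>
#include <cstdint>

struct TimeBoxing {
    int64_t boxMs;  // chosen from measured latency

    // A command issued during box k is always queued to box k + 1.
    int64_t targetBox(int64_t issueMs) const { return issueMs / boxMs + 1; }
    // Assumption: a box's commands are applied when that box ends.
    int64_t applyTimeMs(int64_t issueMs) const { return (targetBox(issueMs) + 1) * boxMs; }
};
```

Commands are quantized to box boundaries rather than carrying exact frame stamps. That is why this scheme suits RTS but not fighting games.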
And now some technical info for programmers: if you have a lot of state, you can still implement rollback. Instead of storing a snapshot of the whole game state every frame (N-buffering everything), you separate the state of each game object and keep a list of states per object.
For all active objects, you store their state every frame. This allows you to roll back to any frame and return the active objects to that state. The same works for destructible environments: they have state change commands too and remember multiple states (for the rollback duration).
Worth noting: if the state doesn't change every frame (or there's a trivial animation that can be reset to frame N), you only store the actual state changes. Then you roll back to the last of those states and set the animation frame correctly. This saves a lot of storage.
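A per-object history that stores only changes might look like the sketch below. Each change is tagged with the frame it takes effect. A rollback pops changes made at or after the target frame. The animation frame is recomputed from the change's frame instead of being stored. `Change`, `ObjectHistory` and `animFrame` are hypothetical names, and `mode` stands in for whatever state the object really has.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

struct Change {
    int frame;  // the new state is valid from this frame on
    int mode;   // illustrative state, e.g. 0 = intact, 1 = breaking, 2 = broken
};

class ObjectHistory {
public:
    explicit ObjectHistory(int initialMode) { changes_.push_back({0, initialMode}); }
    // Frames must be recorded in increasing order.
    void record(int frame, int mode) { changes_.push_back({frame, mode}); }
    // Undo changes made at or after `frame`; the last remaining change is the
    // state in effect at `frame`.
    Change rollbackTo(int frame) {
        while (changes_.size() > 1 && changes_.back().frame >= frame) changes_.pop_back();
        return changes_.back();
    }
    // Drop history older than the rollback window, keeping the change still in effect.
    void trim(int oldestRollbackFrame) {
        while (changes_.size() > 1 && changes_[1].frame <= oldestRollbackFrame) changes_.pop_front();
    }
    std::size_t size() const { return changes_.size(); }
private:
    std::deque<Change> changes_;
};

// Animation frame restored from the change frame; no per-frame storage needed.
int animFrame(const Change& c, int frame) { return frame - c.frame; }
```

An object that changed state on frames 5 and 12 needs only three entries, however long the window is. Rolling back to frame 10 restores the frame 5 state with the animation at frame 5 of that state.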
@marcsh Of course, you could cheat by running a script that automatically blocks all enemy low attacks, for example. But that's just augmenting the player. The script must make proper game inputs too, otherwise it will desync.
@marcsh Also, with rollback you could write a script that automatically counter hits all opponent moves. You see the opponent's move coming, and the script inserts a faster attack command just before it, assuming the rollback window allows that.
@marcsh But that can also be done with delay netcode. The script sees a move coming and inputs a faster move, or an evasive (crush) move that beats that particular move, or a parry or similar. A script with superhuman reactions that makes normal game inputs will always win.
@marcsh Tekken has a reporting system for cheaters, and you can definitely implement automation for this too. Of course, script kiddies can improve their scripts to make them more human-like (pro player reaction times instead of immediate ones, and failing randomly). It's an arms race.
@marcsh In Tekken the big tournaments are all offline. If you want to make money, you can't cheat. Online is just training for Tekken professionals. All that matters is offline performance in big tournaments. Cheating in training mode isn't really that big of a deal.
• • •
When I was designing a V-buffer style renderer in 2015, I was a bit concerned about having to run the vertex shader 3 times per pixel. People might say that this is fine if you are targeting 1:1 pixel:triangle density like Nanite does, but that's cutting corners...
If you look at a generic triangle grid (like terrain or a highly tessellated object surface), you have N*N vertices and (N-1) * (N-1) * 2 triangles. Shading vertices once and sharing the results costs N^2. Shading 3 vertices per triangle costs (N-1)^2 * 6, roughly 6x more for large N. That's a significant cost.
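The ratio can be checked directly. The sketch below counts vertex shader invocations for an N x N vertex grid in both cases. It assumes idealized sharing where every vertex is shaded exactly once; real post-transform caches reuse less than perfectly. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Shared: each of the N*N vertices is shaded exactly once (idealized reuse).
uint64_t sharedVertexInvocations(uint64_t n) { return n * n; }

// Per triangle (V-buffer style): 3 vertex invocations for each of the
// (N-1)^2 * 2 triangles.
uint64_t perTriangleVertexInvocations(uint64_t n) {
    assert(n >= 2);
    return (n - 1) * (n - 1) * 2 * 3;
}
```

For N = 1001 this is 1,002,001 versus 6,000,000 invocations, a ratio of about 5.99. The ratio approaches 6 as N grows.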
Let's talk about fast draw calls on the bottom 50% of mobile devices. And let's ignore the bottom 10%, so that we can assume Vulkan 1.0 support = compute shaders.
Why are we not doing GPU-driven rendering? Why is instancing not always a win?
The first thing I want to talk about is memory loads on old GPUs. Before Nvidia Turing and AMD RDNA1, all buffer loads on these vendors' GPUs went through the texture samplers. Texture samplers have high latency (~100 cycles). The only exceptions were uniform buffer & scalar loads.
In Pascal and below, an unfiltered texture load of RGBA with 32-bit channels was half rate. If you load a 4x4 fp32 matrix from memory, you basically pay the equivalent TMU cost of sampling eight RGBA8 textures. In addition, you need 16 vector registers to store the load results.
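The matrix numbers follow from simple arithmetic. The back-of-envelope model below uses the figures above. Data arrives as 16-byte RGBA32 texels, each load runs at half rate (the cost of two RGBA8 samples), and every 32-bit value needs one vector register. It is a model, not a hardware measurement, and the names are hypothetical.

```cpp
#include <cassert>

struct LoadCost {
    int rgba32Loads;            // 16-byte texel fetches
    int rgba8SampleEquivalent;  // half rate: each fetch costs two RGBA8 samples
    int vectorRegisters;        // one 32-bit register per loaded value
};

LoadCost texturePathLoadCost(int bytes) {
    assert(bytes > 0);
    int loads = (bytes + 15) / 16;
    return {loads, 2 * loads, (bytes + 3) / 4};
}
```

A 4x4 fp32 matrix is 64 bytes. That gives 4 loads, the cost of 8 RGBA8 samples, and 16 registers.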
Splatting gaussians instead of ray-marching reminds me of particle based renderer experiments. It will be interesting to see whether gather or scatter algorithms win this round.
C++20 designated initializers, C++11 default member values and a custom span type (with support for initializer lists) are a good combination for graphics resource creation.
Declaring default values with C++11 default member initializers is super clean. All the API code you need is the struct itself. No need to implement builders or other code bloat that you need to maintain.
C++20 std::span doesn't support initializer lists, so you have to create your own. This is because an initializer list's lifetime is very short, so it's easy to use a dead list. I use "const &&" in the resource creation APIs to force a temporary object.
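Here is a sketch of the combination. Every name (`Span`, `Format`, `TextureDesc`, `RenderPassDesc`, `createRenderPass`) is hypothetical and only illustrates the pattern described. Default member values let call sites name only what differs. The span accepts a braced list. The `const&&` parameter only binds to temporaries, so the list backing the span stays alive until the call returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Minimal read-only span that can also view a braced initializer list.
template <typename T>
class Span {
public:
    constexpr Span() = default;
    constexpr Span(const T* data, std::size_t size) : data_(data), size_(size) {}
    // Only valid while the list is alive, i.e. within the full expression of the call.
    constexpr Span(std::initializer_list<T> list) : data_(list.begin()), size_(list.size()) {}
    constexpr const T* begin() const { return data_; }
    constexpr const T* end() const { return data_ + size_; }
    constexpr std::size_t size() const { return size_; }
private:
    const T* data_ = nullptr;
    std::size_t size_ = 0;
};

enum class Format : uint8_t { RGBA8, RGBA16F, D32F };

// Default member values (C++11); call sites override fields by name (C++20).
struct TextureDesc {
    const char* debugName = "";
    uint32_t width = 1;
    uint32_t height = 1;
    uint32_t mipCount = 1;
    Format format = Format::RGBA8;
};

struct RenderPassDesc {
    Span<Format> colorFormats;
    Format depthFormat = Format::D32F;
};

// const&& rejects named lvalues, so the braced list inside `desc` is a temporary
// that outlives the call. Returns a dummy handle in this sketch.
uint32_t createRenderPass(const RenderPassDesc&& desc) {
    return static_cast<uint32_t>(desc.colorFormats.size());
}
```

A call like `createRenderPass({.colorFormats = {Format::RGBA8, Format::RGBA16F}})` compiles. Passing a named `RenderPassDesc` variable does not, which is the point of the `const&&`.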
I managed to generate binding ids for the generated GLSL shader for GLES3/WebGL2 using the SPIRV-Cross API.
GLES doesn't have descriptor sets, so I must generate a flat contiguous range of binding ids per set and store the set's start index. The runtime binds N slots at a time (bind groups).
I also must dump a mapping table for combined samplers in the shader, because our renderer has separate samplers and images.
Our textures and bind groups are both immutable, so I can just store 2x GLint per combined sampler. That's 64 bits per combined sampler. It's easy to offset-allocate them all in one big buffer. Each bind group has a start offset and count.
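Below is a sketch of the two runtime data structures described above. It deliberately avoids the SPIRV-Cross API and makes these assumptions:

- `GLint` is aliased locally so the code compiles without GL headers.
- The two GLints per combined sampler are assumed to be the GL texture and sampler names, resolved when the immutable bind group is created. The thread doesn't say exactly what they hold.
- All type names are hypothetical.

Per-set start indices are prefix sums of the per-set binding counts. Combined sampler entries are appended once into one big array, and each bind group keeps an offset and count.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using GLint = int32_t;  // local stand-in for the GL typedef

// Flat GLES binding slots: set s occupies [setStart[s], setStart[s] + count).
struct FlatBindings {
    std::vector<uint32_t> setStart;
    explicit FlatBindings(const std::vector<uint32_t>& bindingsPerSet) {
        uint32_t next = 0;
        for (uint32_t count : bindingsPerSet) { setStart.push_back(next); next += count; }
    }
    uint32_t flatIndex(uint32_t set, uint32_t binding) const { return setStart[set] + binding; }
};

// Assumed contents of one combined sampler entry: 2x GLint = 64 bits.
struct CombinedSamplerEntry { GLint texture; GLint sampler; };
static_assert(sizeof(CombinedSamplerEntry) == 8, "64 bits per combined sampler");

struct BindGroupRange { uint32_t offset; uint32_t count; };

// Bind groups are immutable, so entries are appended once and never move.
class CombinedSamplerTable {
public:
    BindGroupRange allocate(const std::vector<CombinedSamplerEntry>& entries) {
        BindGroupRange r{static_cast<uint32_t>(entries_.size()), static_cast<uint32_t>(entries.size())};
        entries_.insert(entries_.end(), entries.begin(), entries.end());
        return r;
    }
    // At bind time the runtime would walk these entries and bind each
    // texture/sampler pair to its flat slot.
    const CombinedSamplerEntry& at(BindGroupRange r, uint32_t i) const {
        assert(i < r.count);
        return entries_[r.offset + i];
    }
private:
    std::vector<CombinedSamplerEntry> entries_;
};
```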
I talked about the GPU persistent data model of the new DOTS hybrid renderer 2 years ago at SIGGRAPH. We calculated object inverse matrices in the data upload shader, because that was practically free: ALU is free in shaders that mostly just copy data around.
On mobile, memory bandwidth is a big bottleneck, and using it wastes a lot of power. Thus I prefer to pack my data and unpack it in the shader. That's usually just a few extra ALU instructions, but you get big bandwidth gains. Performance improves, and so does perf/watt.
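Here is one concrete example of pack-on-CPU, unpack-in-shader. It stores two values in [-1, 1] as 16-bit signed normalized integers in one 32-bit word, 4 bytes instead of 8. The thread doesn't say which formats are used; this is just an illustration. The rounding follows the GLSL `packSnorm2x16` rules. On the GPU side, GLSL ES 3.00 `unpackSnorm2x16` does the unpack in one call. The C++ unpack here only mirrors it for testing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// CPU-side pack: round(clamp(v, -1, 1) * 32767) per component, x in the low 16 bits.
uint32_t packSnorm2x16(float x, float y) {
    auto quantize = [](float v) -> uint32_t {
        long q = std::lround(std::clamp(v, -1.0f, 1.0f) * 32767.0f);
        return static_cast<uint16_t>(static_cast<int16_t>(q));
    };
    return quantize(x) | (quantize(y) << 16);
}

// CPU mirror of the shader-side unpack: clamp(bits / 32767, -1, 1).
void unpackSnorm2x16(uint32_t packed, float& x, float& y) {
    auto dequantize = [](uint32_t bits) {
        float v = static_cast<int16_t>(static_cast<uint16_t>(bits)) / 32767.0f;
        return std::max(v, -1.0f);
    };
    x = dequantize(packed & 0xFFFFu);
    y = dequantize(packed >> 16);
}
```

The shader pays a few ALU instructions for the unpack. In exchange, every fetch of this data moves half the bytes.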