Tonight, we'll discuss some of the little-known low-level optimizations made to async/await in .NET Core 3.x! By now, most of you know what asynchronous programming is; if not, look at docs.microsoft.com/en-us/dotnet/c… #dotnet
As a primer, remember I can write asynchronous logic in a sequential manner. The async/await keywords originated in C# 5.0 eight years ago, though there were inspirations in other languages that led to this final syntax. I get sad when people think JavaScript invented async/await ;)
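For illustration, a minimal sketch of what "sequential-looking asynchronous logic" means (the method and names here are hypothetical, not from the thread):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

class Primer
{
    // Reads a 4-byte length prefix from a stream. The code reads top to
    // bottom like synchronous code, but execution may pause and resume
    // at each await.
    public static async Task<int> ReadLengthAsync(Stream stream)
    {
        var buffer = new byte[4];
        int total = 0;
        while (total < 4)
        {
            int read = await stream.ReadAsync(buffer, total, 4 - total);
            if (read == 0) throw new EndOfStreamException();
            total += read;
        }
        return BitConverter.ToInt32(buffer, 0);
    }
}
```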
To see what the compiler generates, we decompile the async state machine back into C# to compare sharplab.io/#v2:D4AQTAjAsA…. The state machine keeps track of local state and all of the required context to pause and resume at each await point.
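The decompiled output is roughly this shape (heavily simplified; the sharplab link shows the real thing, and DoWorkAsync is a stand-in for the awaited call):

```csharp
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Heavily simplified shape of the compiler-generated state machine
// for an async method with a single await point.
struct ExampleStateMachine : IAsyncStateMachine
{
    public int state;                            // which await point we're at (-1 = not started)
    public AsyncTaskMethodBuilder<int> builder;  // produces the returned Task<int>
    private TaskAwaiter<int> awaiter;            // hoisted so it survives the pause

    public void MoveNext()
    {
        if (state != 0)
        {
            awaiter = DoWorkAsync().GetAwaiter();
            if (!awaiter.IsCompleted)
            {
                state = 0;
                // Suspend: hand ourselves to the awaiter and return.
                builder.AwaitUnsafeOnCompleted(ref awaiter, ref this);
                return;
            }
        }
        // Resume (or synchronous completion): finish up.
        builder.SetResult(awaiter.GetResult());
    }

    public void SetStateMachine(IAsyncStateMachine sm) => builder.SetStateMachine(sm);

    private static Task<int> DoWorkAsync() => Task.FromResult(42);
}
```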
For my non .NET followers, Task = Promise. It's the return type used to represent a value that will be produced in the future.
The state machine starts off as a struct to avoid heap allocations if the method runs completely synchronously. Because this method returns a Task<int>, though, it still needs to allocate the result on the heap.
Allocations like these can add up, causing more GC pressure, which leads to poor application performance. To reduce these allocations for synchronous completion, we introduced ValueTask. This allows async methods that complete synchronously to avoid allocating on the heap.
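For example (a hypothetical cache; the names are illustrative), ValueTask<int> avoids the Task allocation on the synchronous hit path:

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Cache
{
    private readonly ConcurrentDictionary<string, int> _cache =
        new ConcurrentDictionary<string, int>();

    public ValueTask<int> GetValueAsync(string key)
    {
        if (_cache.TryGetValue(key, out int value))
        {
            // Synchronous completion: the result rides in the struct,
            // no Task<int> allocated on the heap.
            return new ValueTask<int>(value);
        }
        // Asynchronous path: wrap the real Task.
        return new ValueTask<int>(LoadAndCacheAsync(key));
    }

    private async Task<int> LoadAndCacheAsync(string key)
    {
        await Task.Delay(10);            // stand-in for real I/O
        return _cache[key] = key.Length; // hypothetical "load"
    }
}
```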
But what about allocations in the asynchronous case? Why do we care? Consider the code below, which parses some incoming payload off a socket and handles a message.
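The referenced code didn't survive the unroll; a sketch of the kind of loop being described (ParseMessage and HandleMessageAsync are hypothetical placeholders) looks like:

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

class ConnectionHandler
{
    // Runs for the life of the connection; with thousands of concurrent
    // connections, per-iteration allocations add up fast.
    public async Task ProcessConnectionAsync(Socket socket)
    {
        var buffer = new byte[4096];
        while (true)
        {
            int read = await socket.ReceiveAsync(
                new ArraySegment<byte>(buffer), SocketFlags.None);
            if (read == 0) break; // connection closed
            var message = ParseMessage(buffer, read);
            await HandleMessageAsync(message);
        }
    }

    private object ParseMessage(byte[] buffer, int count) => null;          // hypothetical
    private Task HandleMessageAsync(object message) => Task.CompletedTask;  // hypothetical
}
```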
This is a long lived operation that lives for as long as a connection lasts. We want to reuse as much memory as we can because there will be a large number of these running concurrently (see the C10K problem en.wikipedia.org/wiki/C10k_prob…).
Let's look at how we optimized each of these allocations for long-running asynchronous operations. First, the easy part: avoiding an allocation per socket read. You can read this epic post by Stephen Toub devblogs.microsoft.com/dotnet/underst… but TL;DR: we gave ValueTask super powers.
It's now possible to reuse state to remove allocations for repeated, non-overlapping operations. When you issue a Socket.ReceiveAsync, there's a new overload that will return ValueTask<int> instead of Task<int>, but there's more...
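Concretely, that's the Memory<byte>-based overload; as long as reads don't overlap, the socket can recycle its internal awaitable state instead of allocating a Task<int> per read:

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

class Reads
{
    public async Task ReadLoopAsync(Socket socket)
    {
        Memory<byte> buffer = new byte[4096];
        while (true)
        {
            // ValueTask<int>-returning overload (.NET Core 2.1+): repeated,
            // non-overlapping reads reuse the same underlying state object.
            int read = await socket.ReceiveAsync(buffer, SocketFlags.None);
            if (read == 0) break;
        }
    }
}
```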
Let's go back to the state machine. What happens when you go asynchronous? The struct now needs to exist on the heap, since the asynchronous operation is going to be kicked off and the current thread can be used for something else.
When the method goes async (that is, the awaiter isn't completed), there's a call into AwaitUnsafeOnCompleted, passing in the awaiter and the state machine (how often do you see ref this?!)
As an aside, "ref this" is to avoid copying this giant struct around. The compiler hoists locals into fields of the state machine when they cross an await boundary, so the initial struct can get big, and copying big structs around is expensive.
Each state machine has an Async*MethodBuilder field that's responsible for a few things:
- Moving the state machine from the stack to the heap (boxing)
- It needs to capture the execution context (remember this!?)
- It needs to give the "awaiter" the continuation. This allows the awaiter to resume execution of the state machine when it's ready.

Side note: C# allows you to write your own Task-like types (invent your own promise type) and you can make any object awaitable.
Let's talk about awaiters for a minute. In C#, you can make something awaitable by exposing a couple of methods on a type. You can see this by trying to await a number: the compiler looks for a method called GetAwaiter (and there are a couple more members the awaiter itself needs to expose).
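For fun, you really can make a number awaitable with a GetAwaiter extension method (a well-known trick; here "await n" is given the meaning "delay n milliseconds"):

```csharp
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

static class IntAwaitable
{
    // Now "await 250" compiles: the compiler finds this GetAwaiter and
    // gets back Task.Delay's awaiter.
    public static TaskAwaiter GetAwaiter(this int milliseconds)
        => Task.Delay(milliseconds).GetAwaiter();
}

class Demo
{
    static async Task Main() => await 250; // waits 250ms
}
```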
Awaiters look like the following. This is the contract that the compiler generated code and the framework use to interact with the awaiter. Somebody feeds a continuation action to the awaiter and at some point in the future, the awaiter will invoke it to resume execution.
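The awaiter image didn't make it into the unroll; the contract looks like this (the struct here is an illustrative always-completed awaiter, not from the thread):

```csharp
using System;
using System.Runtime.CompilerServices;

// The members the compiler binds against: IsCompleted, GetResult, and
// OnCompleted (via INotifyCompletion; high-performance awaiters also
// implement ICriticalNotifyCompletion's UnsafeOnCompleted).
struct CompletedAwaiter : INotifyCompletion
{
    private readonly int _value;
    public CompletedAwaiter(int value) => _value = value;

    public bool IsCompleted => true;                                // no need to suspend
    public void OnCompleted(Action continuation) => continuation(); // resume immediately
    public int GetResult() => _value;                               // produce the value (or throw)
}
```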
You can use it manually like this. But who wants to write callbacks? This is all machinery for async/await to use. The key point here is that awaiters are responsible for scheduling continuations. We'll see why that's important to the optimizations that were made.
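Driving an awaiter by hand looks roughly like this (this is what the generated code does for you; Run and handle are illustrative names):

```csharp
using System;
using System.Threading.Tasks;

class ManualAwait
{
    static void Run(Task<int> task, Action<int> handle)
    {
        var awaiter = task.GetAwaiter();
        if (awaiter.IsCompleted)
        {
            handle(awaiter.GetResult()); // fast path: already done
        }
        else
        {
            // Hand the awaiter a callback; it invokes it when the task
            // completes, and GetResult then yields the value (or throws).
            awaiter.OnCompleted(() => handle(awaiter.GetResult()));
        }
    }
}
```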
How does this all tie together? Well the state machine contains some code with awaits and each awaiter will stash a reference to the state machine and call MoveNext whenever it's supposed to continue execution.
That code is a simplified version of what the compiler generates (that probably has bugs) but it shows the interaction. Now where does the overhead come from? There's a bunch of hidden code in the framework to make this all work.
First, we introduce a FrameworkMoveNext that tries to capture the execution context and execute MoveNext under that execution context. This is what makes async locals work. We also update the original code to call into FrameworkMoveNext as the continuation.
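Simplified, that wrapper looks something like this (the names are illustrative; the real runner lives inside the framework):

```csharp
using System.Runtime.CompilerServices;
using System.Threading;

class MoveNextRunner
{
    private readonly ExecutionContext _context;      // captured at suspension time
    private readonly IAsyncStateMachine _stateMachine;

    public MoveNextRunner(ExecutionContext context, IAsyncStateMachine stateMachine)
    {
        _context = context;
        _stateMachine = stateMachine;
    }

    // The continuation handed to the awaiter points here, not at
    // MoveNext directly, so AsyncLocal<T> values flow across the await.
    public void Run()
    {
        if (_context == null)
            _stateMachine.MoveNext();
        else
            ExecutionContext.Run(_context,
                s => ((IAsyncStateMachine)s).MoveNext(), _stateMachine);
    }
}
```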
Historically, each async method made four allocations: a Task object for the result, a box to put the state machine on the heap, a delegate to pass to the awaiter as the continuation, and another allocation when scheduling the continuation to the thread pool.
Those first three allocations were per state machine: made once and reused. The fourth was per queued work item (in this case, every second). Let's talk about how we made the first three a single allocation in the 99% case:
In .NET Core 3.x we use the resulting Task allocation to store all of the state. We make a derived Task type that stores the result and the state machine (without a separate box), and we don't allocate the delegate up front... it was made lazy.
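A rough sketch of the shape (the real type is the runtime's internal AsyncStateMachineBox<TStateMachine>; this elides most of the Task plumbing):

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// One allocation holds everything: it IS the returned Task, it holds
// the state machine unboxed as a field, and the delegate is only
// created if something actually asks for it.
sealed class StateMachineBox<TStateMachine> : Task<int>
    where TStateMachine : IAsyncStateMachine
{
    // Placeholder base call; the real runtime uses internal Task ctors.
    public StateMachineBox() : base(() => 0) { }

    public TStateMachine StateMachine;   // no separate box allocation
    public ExecutionContext Context;     // captured execution context

    private Action _moveNextAction;
    public Action MoveNextAction => _moveNextAction ??= MoveNext; // lazy delegate

    public void MoveNext() => StateMachine.MoveNext();
}
```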
Remember this contract between the state machine and awaiters? Well, the framework will detect well-known awaiters and use a different code path. The generic continuation path shown above isn't what actually happens when you await a Task or ValueTask. The runtime does a sleight of hand to avoid the delegate overhead.
So what happens? When you await a Task within an async Task method, the continuation object is set to this special internal object that holds all of the state. This same object can be scheduled directly to the thread pool without allocating; it becomes the thread pool work item!
This allows the "state machine box" to flow all the way to the thread pool directly without incurring a new allocation. What we have is a schedulable object that maintains the state and can be reused over and over. Pseudo code looks like this:
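The pseudo-code image didn't make it into the unroll; the idea, roughly (IThreadPoolWorkItem is public as of .NET Core 3.0; this is a sketch, not the runtime's actual code):

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Pseudo-code: the box is simultaneously the Task, the state-machine
// holder, and the thread pool work item, so queueing a continuation
// allocates nothing new.
sealed class AsyncStateMachineBox<TStateMachine> : Task, IThreadPoolWorkItem
    where TStateMachine : IAsyncStateMachine
{
    // Placeholder base call; the real runtime uses internal Task ctors.
    public AsyncStateMachineBox() : base(() => { }) { }

    public TStateMachine StateMachine;
    public ExecutionContext Context;

    // ThreadPool.UnsafeQueueUserWorkItem(IThreadPoolWorkItem, bool) can
    // queue this object directly; the pool calls Execute to resume it.
    void IThreadPoolWorkItem.Execute()
    {
        if (Context == null)
            StateMachine.MoveNext();
        else
            ExecutionContext.Run(Context,
                s => ((AsyncStateMachineBox<TStateMachine>)s).StateMachine.MoveNext(),
                this);
    }
}
```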
This optimization also works for custom IValueTaskSource implementations and Task.Yield(). If you combine this all together, you get:
- Reusable allocations for things like socket reads and writes
- Reusable state machine for scheduling
- Fewer allocations per state machine
Performance tip: big state machines that are reused over and over are much better (at least for now) than lots of small, finer-grained state machines github.com/dotnet/runtime…
Custom schedulers and sync contexts disable the scheduling optimization (and also result in a delegate allocation)! Avoid them in your high-performance code.
aaaaand I'm done. That was way too long
Keep Current with David Fowler #BlackLivesMatter