I think people really don't appreciate just how incomplete Linux kernel API docs are, and how Rust solves the problem.

I wrote a pile of Rust abstractions for various subsystems. For practically every single one, I had to read the C source code to understand how to use its API.
Simply reading the function signature and associated doc comment (if any) or explicit docs (if you're lucky and they exist) almost never fully tells you how to safely use the API. Do you need to hold a lock? Does a ref counted arg transfer the ref or does it take its own ref?
When a callback is called are any locks held or do you need to acquire your own? What about free callbacks, are they special? What's the intended locking order? Are there special cases where some operations might take locks in some cases but not others?
Is a NULL argument allowed and valid usage, or not? What happens to reference counts in the error case? Is a returned ref counted pointer already incremented, or is it an implied borrow from a reference owned by a passed argument?
Is the return value always a valid pointer? Can it be NULL? Or maybe it's an ERR_PTR? Maybe both? What about pointers returned via indirect arguments, are those cleared to NULL on error or left alone? Is it valid to pass a NULL ** if you don't need that return pointer?
Sometimes these requirements were reasonable, just unwritten. Sometimes they were a bit too flexible/wild and I had to make some opinionated decisions when writing the Rust abstractions to narrow it down to a safe usage.
Sometimes I had to add extra locking inside the abstraction in order to make it practical to make safe. Sometimes I had to make some small changes to the C side to make it more orthogonal or logical and usable, e.g. to expose an unlocked function to be used with a lock taken.
Sometimes the locking was subtle enough that while I was able to write a safe Rust abstraction, it came with a big doc comment explaining how you have to be careful with usage and drop order to avoid deadlocks (deadlocks are "safe", Rust doesn't inherently protect against them).
Sometimes it was a lost cause without making the C side more reasonable (drm_sched only, really).
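One of the patterns above, an "unlocked" function that may only be called with a lock already taken, can be sketched in plain userspace Rust. This is only an illustration and not the actual kernel abstraction: it uses `std::sync::Mutex` as a stand-in for the kernel's lock types, and `Queue`, `push`, and `push_locked` are hypothetical names. The point is that the `_locked` variant takes the guard as a parameter, so calling it without holding a lock does not type-check.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical queue whose pending jobs may only be touched with the lock held.
pub struct Queue {
    inner: Mutex<Vec<u32>>,
}

impl Queue {
    pub fn new() -> Self {
        Queue { inner: Mutex::new(Vec::new()) }
    }

    /// Take the lock explicitly, e.g. to batch several operations.
    pub fn lock(&self) -> MutexGuard<'_, Vec<u32>> {
        self.inner.lock().unwrap()
    }

    /// Locking variant: acquires the lock itself, callers need nothing.
    pub fn push(&self, job: u32) {
        let mut pending = self.lock();
        Self::push_locked(&mut pending, job);
    }

    /// "Unlocked" variant: the guard argument is proof that a lock is held.
    /// (In this simplified sketch the guard is not tied to one specific queue.)
    pub fn push_locked(pending: &mut MutexGuard<'_, Vec<u32>>, job: u32) {
        pending.push(job);
    }
}
```

A caller can batch work with `let mut g = q.lock(); Queue::push_locked(&mut g, 2);`, while there is simply no way to reach the inner `Vec` without going through a guard.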

However, most of the time the compromises made when writing the Rust abstraction point at issues with the C side design and how it could be improved.
In general the approach is "write the Rust side making as few changes as possible to the C side first to avoid conflict, then maybe propose changes to the C side based on lessons learned" (we haven't really gotten to the second part yet at all).
But the end result of all this is that you CAN, in fact, just look at the Rust API and know how to use it correctly for the most part. You never have to worry about reference counts, about NULL pointers, about forgetting to check results, about dropping refs in error cases.
You never have to worry about holding the right locks, about accidentally forgetting to take a ref or dropping it twice. You never have to wonder how error returns are encoded.

Because if you make a mistake with these things, your code won't compile.
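As a rough illustration of how a return type can answer the "valid pointer, NULL, or ERR_PTR?" question, here is a minimal userspace sketch. It assumes the usual kernel convention that the top 4095 addresses encode a negative errno; `Errno`, `decode_ptr`, and `describe` are hypothetical names, not the kernel crate's API. Once the raw value is decoded into `Result<Option<NonNull<T>>, Errno>`, the caller has to handle every case before it can touch the pointer.

```rust
use std::ptr::NonNull;

/// Hypothetical errno wrapper (a stand-in for a real kernel error type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Errno(pub i32);

/// Largest errno value that can be encoded in a pointer (kernel convention).
const MAX_ERRNO: usize = 4095;

/// Decode a C-style return value that may be a valid pointer, NULL, or an ERR_PTR.
pub fn decode_ptr<T>(raw: *mut T) -> Result<Option<NonNull<T>>, Errno> {
    let addr = raw as usize;
    // Addresses in the top MAX_ERRNO bytes of the address space are -errno values.
    if addr >= MAX_ERRNO.wrapping_neg() {
        return Err(Errno(-(addr as isize) as i32));
    }
    // NonNull::new returns None for NULL, Some(ptr) otherwise.
    Ok(NonNull::new(raw))
}

/// The three cases are visible in the type, and `match` forces handling all of them.
pub fn describe<T>(raw: *mut T) -> &'static str {
    match decode_ptr(raw) {
        Ok(Some(_)) => "valid pointer",
        Ok(None) => "NULL (no object)",
        Err(_) => "error",
    }
}
```

A real abstraction would go further and wrap the `NonNull` in an owning or borrowing type, so the reference-count question is answered by the signature as well.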
Of course you can still misuse APIs, but the worst that will happen is that you'll get an error return, or maybe a deadlock (deadlocks are easy to debug with lockdep and I wrote a really neat Arc<> integration to catch potential drop/decref related locking errors).
Even with APIs that mostly are fairly rigorously documented (OpenFirmware/Device Tree comes to mind), following all the rules in C is often tedious and error prone. Look at some random OF code in a driver and there's a good chance it leaks references.
(This doesn't really matter for most systems since they don't compile kernels with OF_DYNAMIC so ref counts are ignored, so this never gets noticed and fixed.)

But with my OF Rust abstractions? They do ref counting for you. You can just forget about it.
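The general idea can be shown with a small userspace sketch. This is not the actual OF abstraction code: `RawNode`, `NodeRef`, and `probe` are hypothetical names, and a simple counter stands in for the C object's reference count. Taking a reference is the "get", `Drop` is the "put", and early returns release everything automatically.

```rust
use std::cell::Cell;

/// Stand-in for a C object with a manually managed reference count.
pub struct RawNode {
    refs: Cell<u32>,
}

impl RawNode {
    pub fn new() -> Self {
        RawNode { refs: Cell::new(0) }
    }

    /// Current number of outstanding references.
    pub fn refs(&self) -> u32 {
        self.refs.get()
    }
}

/// One owned reference: creating it is the "get", dropping it is the "put".
pub struct NodeRef<'a> {
    raw: &'a RawNode,
}

impl<'a> NodeRef<'a> {
    pub fn get(raw: &'a RawNode) -> Self {
        raw.refs.set(raw.refs.get() + 1);
        NodeRef { raw }
    }
}

impl Clone for NodeRef<'_> {
    /// Cloning takes an additional reference.
    fn clone(&self) -> Self {
        NodeRef::get(self.raw)
    }
}

impl Drop for NodeRef<'_> {
    /// Dropping releases exactly the one reference this value owns.
    fn drop(&mut self) {
        self.raw.refs.set(self.raw.refs.get() - 1);
    }
}

/// Both the success and the error path release `node` and `extra` with no manual cleanup.
pub fn probe(raw: &RawNode, fail: bool) -> Result<u32, &'static str> {
    let node = NodeRef::get(raw);
    let extra = node.clone();
    if fail {
        return Err("probe failed");
    }
    // Two references are alive at this point.
    Ok(extra.raw.refs())
}
```

After `probe` returns, on either path, `raw.refs()` is back to 0. The equivalent C code needs a matching put on every exit path, which is exactly the kind of thing that gets forgotten.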
In the end, coding kernel code in Rust is a huge change from coding C. With C you have two options:

- Wing it, and either hope reviewers catch it or suffer debugging subtle oopses.
- Spend hours understanding the code before you dare use it, and hope you caught everything.

This adds extra reviewer and maintainer workload too! It means that they need to review submissions to ensure they follow all these hidden rules that aren't documented. Sometimes they miss things. Sometimes the problem is major enough the code needs a big refactor.
All that just goes away with Rust. Poof. Gone. If it compiles it's safe and won't oops or leak references (except unsafe code, but then you only have to review THAT and the rule is it has to be carefully documented).
Of course we still need code reviews, and help from experts in specific subsystems. Rust doesn't magically make code perfect.

But it does get rid of all the silly low level problems and mistakes, so you can focus on the high level ones.
To be clear, I don't blame Linux developers for the incomplete docs. For better or worse, the Linux kernel is very complex and has to deal with a lot of subtlety. Most userspace APIs have much simpler rules you have to follow to use them safely. Kernels are hard!
Even experienced kernel developers get these things wrong all the time. It's not a skill issue. It's simply not possible for humans to keep all of these complex rules in their head and get them right, every single time. We are not built for that.

We need tooling to help us.
The solution is called Rust. Encode all the rules in the code and type system once, and never have to worry about them again.

Just like the solution to coding style arguments is to encode all the rules in an auto formatter and never have to worry about them again (hint hint! ^^)
And then we can stop worrying about all the low-level safety, ownership, and locking problems, and start worrying about more important things like high-level driver and subsystem design.

Thread by Asahi Lina / 朝日リナ 🐘 @lina@vt.social 🦋 @lina.yt

