There's a lot of weird debate about whether Rust in the kernel is useful or not... in my experience, it's way more useful than I could've ever imagined!
I went from 1st render to a stable desktop that can run games, browsers, etc. in about two days of work on my driver (!!!)
All the concurrency bugs just vanish with Rust! Memory gets freed when it needs to be freed! Once you learn to make Rust work with you, I feel like it guides you into writing correct code, even beyond the language's safety promises. It's seriously magic! ✨
There is absolutely no way I wouldn't have run into race conditions, UAFs, memory leaks, and all kinds of badness if I'd been writing this in C.
In Rust? Just some logic bugs and some core memory management issues. Once those were fixed, the rest of the driver just worked!!
I tried kmscube, and it was happily rendering frames. Then I tried to start a KDE session, and it crashed after a while, but you know what didn't cause it? 3 processes trying to use the GPU at the same time, allocating and submitting commands in parallel. In parallel!!
Once things work single-threaded, having all the locking and threading just work as intended, with no weird races or things stepping on top of each other, is, as far as I'm concerned, completely unheard of for a driver this complex.
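(In case you're wondering what makes that possible: here's a tiny userspace sketch, not driver code, of why unsynchronized sharing simply doesn't compile in Rust. Shared state has to live behind a lock before threads can touch it at all.)

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Toy sketch, nothing from the actual driver: three "processes" submitting
// in parallel. Forgetting the lock here is a compile error, not a race.
fn main() {
    let queue = Arc::new(Mutex::new(Vec::<u32>::new()));

    let handles: Vec<_> = (0..3)
        .map(|id| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || {
                // Each thread must take the lock to push a "submission".
                queue.lock().unwrap().push(id);
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    println!("submissions: {:?}", queue.lock().unwrap());
}
```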
And then all the memory management just... happens as if by magic. A process using the GPU exits, and all the memory and structs it was using get freed. Dozens of lines in my log of everything getting freed properly. I didn't write any of that glue, Rust did it all for me!
(Okay, I wrote the part that hooks up the DRM subsystem to Rust, including things like dropping a File struct when the file is closed, which is what triggers all that memory management to happen... but then Rust does the rest!)
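(Roughly how that cascade works, as a sketch with made-up types, not the real drm/asahi ones: the File owns its allocations, so dropping the File drops everything underneath it.)

```rust
use std::sync::Arc;

// Hypothetical ownership chain: when the File goes away, every buffer it
// held is dropped, and each Drop impl frees (here: logs) the object.
struct GpuBuffer(&'static str);

impl Drop for GpuBuffer {
    fn drop(&mut self) {
        println!("freed {}", self.0);
    }
}

struct File {
    buffers: Vec<Arc<GpuBuffer>>,
}

fn main() {
    let file = File {
        buffers: vec![Arc::new(GpuBuffer("cmdbuf")), Arc::new(GpuBuffer("tvb"))],
    };
    // Process exits -> File is dropped -> all its buffers are freed,
    // exactly like those dozens of log lines. No cleanup glue written.
    drop(file);
}
```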
I actually spent more time tracking down a single forgotten `*` in the DCP driver (written in C by Alyssa and Janne, already tested) that was causing heap overflows than I spent, in total, tracking down CPU-side safety issues (in unsafe code) in my brand-new Rust driver.
Even things like handling ERESTARTSYS properly: Linux Rust encourages you to use Result<T> everywhere (the kernel variant where Err is an errno), and then you just stick a ? after wherever you're sleeping/waiting on a condition (like the compiler tells you) and it all just works!
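(A sketch of what that looks like, with an errno-style Result and a made-up wait helper, not the real submission path:)

```rust
// Illustrative only: the kernel's Rust Result uses errno values for Err,
// so an interrupted sleep propagates with a single `?`.
const ERESTARTSYS: i32 = 512; // the kernel's "restart this syscall" errno

type Result<T> = core::result::Result<T, i32>;

fn wait_for_completion() -> Result<()> {
    // Pretend a signal arrived mid-sleep.
    Err(ERESTARTSYS)
}

fn submit_and_wait() -> Result<u32> {
    wait_for_completion()?; // interrupted? ERESTARTSYS bubbles up right here
    Ok(42)
}

fn main() {
    assert_eq!(submit_and_wait(), Err(ERESTARTSYS));
}
```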
Seriously, there is a huuuuuuge difference between C and Rust here. The Rust hype is real! Fearless concurrency is real! And having to have a few unsafe {} blocks does not in any way negate Rust's advantages!
Some people seem to be misunderstanding the first tweet in this thread... I didn't write a driver in 2 days, I debugged a driver in 2 days! The driver was already written by then!
What I'm saying is that Rust stopped many classes of bugs from existing. Sorry if I wasn't clear!
There was also a bit of implementation work in those 2 days, though: buffer sharing in particular wasn't properly implemented when I got first renders, so that was part of it, but the bulk of the driver was already done.
Apparently I have to clarify again?
I did write the driver myself (and the DRM kernel abstractions I needed). The 2 days were the debugging once the initial implementation was done. The Rust driver took 7 weeks, and I started reverse engineering this GPU 6 months ago...
This was the first stream where I started evaluating Rust to write the driver (I'd been eyeing the idea for a while, but this was the first real test).
Initially it was just some userspace experiments to prove the concept, then moved onto the kernel.
I mostly worked on stream, and it's been 12 streams plus the debugging one, so I guess writing the driver took about 12 (long) days of work, plus a bit extra (spread out over 7 weeks, because I stream twice per week and took one week off).
So just to be totally clear:
Reverse engineering and prototype driver: ~4 calendar months, ~20 (long) days of work
Rust driver development (including abstractions): ~7 calendar weeks, ~12 (long) days of work
Debugging to get a stable desktop: 5 calendar days, 2 days of work.
Almost every thread about Rust for Linux ends up with someone asking "why not Zig instead?" And usually the answer is just "it's less mature" or "nobody pushed it".
I didn't know anything about Zig, so I decided to take a look today... and I'm not very impressed ^^;;
The things Rust has that Zig doesn't are major reasons why I chose Rust for the drm/asahi driver...
It sounds like Zig is trying to be "modern C"... but the whole point of R4L is to not get stuck with C!
All those things Rust has that Zig doesn't are important for the things I'm doing.
Destructors/RAII are fundamental to how the driver tracks and cleans up firmware structures safely and reliably when needed. If I had to write "defer" everywhere it would be a bug-prone mess...
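(A minimal sketch of the difference, nothing like the real firmware structures: the cleanup is attached to the type, so every owner and every exit path gets it for free.)

```rust
// Minimal sketch, not actual drm/asahi code: with Drop, every early return
// and error path frees the firmware object exactly once. With `defer`,
// every function touching one of these would have to remember to write it.
struct FwObject {
    gpu_addr: u64,
}

impl Drop for FwObject {
    fn drop(&mut self) {
        // The real driver would return this to the firmware heap.
        println!("freeing firmware object at {:#x}", self.gpu_addr);
    }
}

fn main() {
    let _obj = FwObject { gpu_addr: 0xffff_8000 };
    // Scope ends -> Drop runs. No defer, no goto-cleanup ladders.
}
```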
Honestly, I'm kind of sad about Wedson leaving RfL. He developed a huge part of the foundation that made Rust for Linux possible.
I'll still work on DRM (except sched) and driver upstreaming when the core stuff is in place, but I don't know about other subsystems.
At the rate things are going, I wouldn't be surprised if upstreaming the drm/asahi driver isn't possible until 2026 at the earliest. I had hopes for things to move much faster, but that's not possible without active cooperation from existing maintainers, and we aren't getting it.
Reading upstreaming mailing list threads is painful. Every second comment is "why is this not like C" or "do it like C". Nobody is putting any effort into understanding why Rust exists and why it works. It's just superficial "this code is scary and foreign" type reactions.
I think people really don't appreciate just how incomplete Linux kernel API docs are, and how Rust solves the problem.
I wrote a pile of Rust abstractions for various subsystems. For practically every single one, I had to read the C source code to understand how to use its API.
Simply reading the function signature and associated doc comment (if any) or explicit docs (if you're lucky and they exist) almost never fully tells you how to safely use the API. Do you need to hold a lock? Does a ref counted arg transfer the ref or does it take its own ref?
When a callback is called are any locks held or do you need to acquire your own? What about free callbacks, are they special? What's the intended locking order? Are there special cases where some operations might take locks in some cases but not others?
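In Rust, a lot of those answers end up in the signature itself. A userspace sketch (hypothetical names, with std types standing in for the kernel ones):

```rust
use std::sync::{Arc, Mutex, MutexGuard};

struct Buffer;

// Taking Arc<Buffer> by value says "this call consumes one reference";
// taking &Buffer says "borrowed, the caller keeps its ref". The refcount
// question above is answered by the types, not by a doc comment.
fn adopt(_buf: Arc<Buffer>) {}
fn inspect(_buf: &Buffer) {}

// Requiring the guard as a parameter proves at compile time that the
// caller holds the lock; you can't even call this without taking it.
fn update_locked(guard: &mut MutexGuard<'_, u32>) {
    **guard += 1;
}

fn main() {
    let buf = Arc::new(Buffer);
    inspect(&buf);
    adopt(buf);

    let counter = Mutex::new(0u32);
    update_locked(&mut counter.lock().unwrap());
}
```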
Regretfully, I completely understand Wedson's frustrations.
A subset of C kernel developers just seem determined to make the lives of the Rust maintainers as difficult as possible. They don't see Rust as having value and would rather it just went away. lore.kernel.org/lkml/202408282…
When I tried to upstream the DRM abstractions last year, it was all blocked on basic support for the concept of a "Device" in Rust. Even just a stub wrapper for struct device would have been enough.
That simple concept only recently finally got merged, over one year later.
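(For context, "even just a stub wrapper" really is almost nothing. Something like this sketch, illustrative and not the merged kernel::device API: an opaque handle, so other abstractions can at least name the type in their signatures.)

```rust
use core::ptr::NonNull;

// Illustrative stub, not the actual merged code: an opaque, non-null
// pointer to the C `struct device`, with no methods yet. That alone lets
// other abstractions take a &Device in their signatures.
#[repr(transparent)]
pub struct Device(
    // Stand-in for a pointer to the C struct (bindings::device in-tree).
    NonNull<core::ffi::c_void>,
);
```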
When I wrote the DRM scheduler abstractions, I ran into many memory safety issues caused by bad design of the underlying C code. The lifetime requirements were undocumented and boiled down to "design your driver like amdgpu to make it work, or else".
🎉🎉🎉 My Linux M1 GPU driver passes >99% of the dEQP-GLES2 compliance tests!!!!! 🎉🎉🎉
Most of this is thanks to @alyssarzg's prior work on macOS, but now I can replicate it on Linux! ^^
Got some hints from Alyssa, now at 99.3%!
The projected tests are known broken according to her, and the etc1 ones look like some weird rounding thing (they actually pass at 128x128?).
So really, it's down to one weird compiler issue, one weird rounding issue, the projection thing, and whatever is up with those last 2.
I'm honestly not too confident about this one... it feels like every time I look at the problem, it looks like something else! Maybe it's time to investigate some related issues and see if they shed any light? ^^;;
Things that might be worth doing:
- Implement tracepoints for GPU stuff instead of printk
- Hook up GPU stats & ktrace to tracepoints
- Look closer at ASIDs
- Write the firmware heap allocator so we can stop leaking firmware objects as a workaround
Tracepoints are a fun one, because it's basically a bunch of C macros and the entire tracepoint.h has to be rewritten for Rust! But it's something I really want to start using soon... so maybe that's a good thing to work on today?