Pierre H. 🔥🌸
present: security (zalloc, kalloc_type, IPC, VM, …) | past: GCD, synchro, objc_direct, perf… | timeless: 🇫🇷 snark | @madcoder@infosec.exchange
Jul 20, 2022 • 4 tweets • 1 min read
iOS 15.6 (and aligned macOS trains) has some cool stuff, in the vm_kern subsystem!

github.com/apple-oss-dist…

The core of vm_kern is now kmem_{alloc,free,...}_guard; kmem_alloc, kernel_memory_allocate… are all wrappers around it.

It has 2 cool features… one is that we ported Sad Feng Shui to the kernel VM, by having more “pointer ranges”. When the allocation comes from kalloc_type_var, it uses types to determine the layout. For nameless allocations, it uses a seeded hash of the allocation backtrace.
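To make the "seeded hash of the allocation backtrace" idea concrete, here is a minimal user-space sketch. It is not XNU's code: `RANGE_COUNT`, `mix64`, `backtrace_hash` and `range_for_backtrace` are hypothetical names, and the mixing function is a generic splitmix64-style finalizer chosen only for illustration. The point it shows is that the same call site always lands in the same range for a given boot seed, while a different seed reshuffles the mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of "pointer ranges"; not XNU's real value. */
#define RANGE_COUNT 4u

/* Generic 64-bit mixer (splitmix64-style finalizer), used for illustration only. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Fold the return addresses of a backtrace into a hash, starting from a seed. */
static uint64_t backtrace_hash(const uintptr_t *frames, size_t count, uint64_t seed)
{
    uint64_t h = seed;
    for (size_t i = 0; i < count; i++) {
        h = mix64(h ^ (uint64_t)frames[i]);
    }
    return h;
}

/* Pick the range that an allocation made from this backtrace would use. */
static unsigned range_for_backtrace(const uintptr_t *frames, size_t count, uint64_t seed)
{
    assert(frames != NULL || count == 0);
    return (unsigned)(backtrace_hash(frames, count, seed) % RANGE_COUNT);
}
```

With a fixed seed the choice is deterministic per call site, which is what lets allocations from the same code path share a range; an attacker who does not know the seed cannot predict which range that is.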
Jun 7, 2022 • 5 tweets • 1 min read
[thread] Oh. By the way. iOS 16 is the time to pour one out for good old friends that left.

I’ll add tweets as people find them. And I’ll start with a freebie: the venerable voucher user data attribute is no longer. I know it had already gone out of favor recently, but now it’s really entirely gone.

🍻 Let’s pour one out for voucher user data 🍻
Jun 5, 2022 • 6 tweets • 2 min read
I think I vastly disagree. From a defensive PoV, I know ideas aren’t “new”, but making them practical and shippable is actually at least as much work as coming up with the ideas if not more.

Commonly referred to as the “second 90% of the project”. It’s not as sexy and flashy as coming up with the idea, but it is often an engineering feat, the kind that actually deserves papers.

I remember @bcantrill lamenting at a Usenix keynote a few years ago that practitioners do not get to publish anymore and it was a loss. I agree.
May 21, 2022 • 6 tweets • 2 min read
So this is aligned with iOS 15.5, which had a couple of new exciting things for memory safety. PGZ (Probabilistic Guard Zalloc) is a cool feature which has a mode to find out-of-bounds bugs passively: github.com/apple-oss-dist…

It is a cute cheap thing that aligns zone chunks “rightward” and can slide the last allocation, followed by a guard page.
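As an illustration of the "rightward" placement idea, here is a user-space approximation using POSIX `mmap`/`mprotect`. It is a sketch under assumptions, not PGZ itself: the `guarded_slot_*` names are hypothetical, and it uses one data page plus one guard page rather than real zone chunks. An allocation is slid so that it ends as close to the guard page as its alignment allows, so a linear overflow faults almost immediately.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical slot: one data page followed by one inaccessible guard page. */
struct guarded_slot {
    uint8_t *base;  /* start of the data page */
    size_t   page;  /* page size in bytes */
};

/* Map two pages and revoke all access to the second one. Returns 0 on success. */
static int guarded_slot_init(struct guarded_slot *s)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        return -1;
    }
    void *p = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    if (mprotect((uint8_t *)p + page, (size_t)page, PROT_NONE) != 0) {
        munmap(p, 2 * (size_t)page);
        return -1;
    }
    s->base = p;
    s->page = (size_t)page;
    return 0;
}

/* Place an allocation so it ends as close to the guard page as alignment allows. */
static void *guarded_slot_place(const struct guarded_slot *s, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);  /* power of two */
    if (size == 0 || size > s->page || align > s->page) {
        return NULL;
    }
    uintptr_t guard = (uintptr_t)s->base + s->page;    /* first byte of the guard page */
    uintptr_t addr = (guard - size) & ~(uintptr_t)(align - 1);
    return (void *)addr;
}

/* Unmap both pages. */
static void guarded_slot_destroy(struct guarded_slot *s)
{
    munmap(s->base, 2 * s->page);
}
```

When the size is a multiple of the alignment, the last byte of the allocation sits directly before the guard page; otherwise a gap smaller than the alignment remains, which is the usual trade-off of this kind of scheme.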
Mar 30, 2022 • 5 tweets • 2 min read
I’m not trying to mock them too hard either, but 20+% memory cost is not “low perf overhead”. I think the problem is that academia has decided that the goal is “shippable in products”, which is IMO invalid.

The best products/shippable ideas are a refinement, a compromise and a subtle balance between tons of ideas, typically by making them simpler in a way that academia would not accept (for example making a mitigation probabilistic with a less than 99.9% protection rate, or other “impure” simplifications)

...
Mar 13, 2022 • 4 tweets • 1 min read
🧵 15.2 aligned XNU had 2 really important memory safety mitigations.

The first one is the use of the read-only (PPL-protected) allocator, which has been used to protect the chain from current_thread() to the various creds and labels.

See e.g. github.com/apple-oss-dist…

... But really, I want to present an extremely elegant (I can say so because it wasn't my idea) memory safety mitigation in the heap that landed in that kernel.

A colleague of mine keeps saying that well-thought-out mitigations compose and reinforce each other…
Feb 2, 2022 • 4 tweets • 3 min read
@_saagarjha @Catfish_Man I do love C bashing but in this instance it's not a C issue. Most fancier data structures (and arrays count as "fancy" here) require external storage, and in a kernel doing allocations is not a given, which makes them rare data structures.

Concurrency similarly needs its objects (or the path to them) not to be relocated all the time, which requires functional data structures, and well, you end up with lists fairly quickly.
Nov 29, 2021 • 5 tweets • 4 min read
@jsherma100 @s1guza That was a really nice write-up of the zone allocator work, in passing. Ty.

Btw: zone_pva_t is 32 bits so that the metadata is 16 bytes per page (down from 24 in iOS 13), and so that zone require inlines with LTO and does no stinkin’ multiplication (so that we can sprinkle it all over)
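To show why a 16-byte record avoids a multiplication, here is a small sketch. Everything in it is an assumption made for illustration: `struct page_meta`, its fields, `PAGE_SHIFT_SKETCH` and `meta_for_address` are hypothetical and do not reflect XNU's actual metadata layout. With a power-of-two record size, going from a page index to its metadata is a shift; with a 24-byte record it would be a multiply by 24.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte per-page metadata record; fields are illustrative only. */
struct page_meta {
    uint32_t prev_ref;    /* 32-bit compact reference to another page */
    uint32_t next_ref;    /* 32-bit compact reference to another page */
    uint32_t free_count;
    uint32_t flags;
};
_Static_assert(sizeof(struct page_meta) == 16, "metadata must stay 16 bytes");

/* Assumed page size of 16 KiB for this sketch. */
#define PAGE_SHIFT_SKETCH 14u

/* Map an address inside the range starting at range_base to its metadata slot. */
static struct page_meta *meta_for_address(struct page_meta *meta_base,
                                          uintptr_t range_base, uintptr_t addr)
{
    assert(addr >= range_base);
    uintptr_t page_index = (addr - range_base) >> PAGE_SHIFT_SKETCH;
    /* Byte offset is page_index << 4 because each record is 16 bytes. */
    return (struct page_meta *)((uintptr_t)meta_base + (page_index << 4));
}
```

A lookup that compiles down to a subtract and two shifts is cheap enough to inline at every check site, which is the property the tweet is after.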
Nov 25, 2021 • 13 tweets • 8 min read
@Gok @lazytyped I’ll bite and see if @lazytyped trained me well.

In general, post-exploitation mitigations are “bad” because they fight a lost battle. So you want to make sure that they do not introduce insane complexity or costs.

In the instance here, per-function ASLR (I suppose, as I didn’t look at the implementation) requires your functions to be page aligned so that you can relocate them. It will introduce padding and cost you wired memory, and will possibly hurt inlining.
Nov 19, 2021 • 5 tweets • 1 min read
There’s an obvious simple reason why: context switches.

In more words: a weak CAS is an ll/sc. If you get context switched between an ldrex and a strex, well, it’ll fail. There’s a more fundamental reason, which is that on Apple SoCs there’s a decrementer programmed to generate the equivalent of an SEV regularly. But even without that, the context switch is reason enough. I wouldn’t even be surprised if the HW is allowed to do it for other reasons too.
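Because a weak CAS may fail even when the value has not changed, the usual pattern is to call it in a retry loop. The sketch below uses the standard C11 `atomic_compare_exchange_weak_explicit`; the `counter_add` helper is a hypothetical name introduced for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomically add `delta` to `*counter` with a weak CAS and return the new value. */
static uint32_t counter_add(_Atomic uint32_t *counter, uint32_t delta)
{
    uint32_t old = atomic_load_explicit(counter, memory_order_relaxed);
    /*
     * On any failure, spurious or not, `old` is reloaded with the current
     * value, so the loop simply retries with fresh data.
     */
    while (!atomic_compare_exchange_weak_explicit(counter, &old, old + delta,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
    }
    return old + delta;
}
```

Code that cannot loop (a single try-once attempt) should use the strong variant instead, since it only fails when the comparison really fails.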
Aug 12, 2021 • 4 tweets • 2 min read
@spendergrsec We typically do not comment on future roadmaps so I won’t go into details. But yes, unlike autoslab, we rely on manual adoption and it isn’t thorough yet, and it covers zones (~= sub-page) for now.

We are indeed type based, which gives us precise free sites (the free site pins types).

Overall, from what I gather from autoslab, we made significantly different choices in the details but it is completely the same spirit.

Eventually this will be open sourced and I can’t wait to show our work :)
Jun 30, 2020 • 14 tweets • 3 min read
Today is dedicated to the last thread about obj-c optimizations...

IMP Caches, the crazy stuff: precomputing them (memory)

This is what the new codepath in objc_msgSend() is about. The whole explanation just doesn't fit, it would be a talk on its own. Several engineers worked on that.
Jun 28, 2020 • 13 tweets • 5 min read
And now some work from this release, mostly focused on memory.

Unlike yesterday, this touches on optimizing at the scale of an operating system rather than for a single process.

Again, this is the work of several people.

Optimizing class_rw_t (memory)

@mikeash / @AirspeedSwift's talk covers this. This structure holds the writable runtime metadata that classes need to work at runtime. But only half of that 8-word structure was commonly used.
Jun 28, 2020 • 12 tweets • 3 min read
Here is to the last 2 years of optimizing the Obj-C Runtime for speed and memory. A great many people were involved.

Some of it was covered already, and I will post several of these threads over the next few days... and tonight we'll start with last year's work.

The Objective-C *refs (speed, some memory)

In Obj-C, Classes, Protocols and Selectors all have a canonical unique pointer.

The runtime will unique those when your image is loaded. References to those objects use a pointer indirection called a {class,proto,sel}ref.
Feb 19, 2020 • 17 tweets • 4 min read
This is a question that was asked soooo many times, and is actually the _wrong_ question to ask, so let's make a thread:

propagating QoS and priorities, what does that mean (1/...)

QoS is a label; its rules of propagation are semi complex, but DO NOT depend on the state of the system.

It's propagated by only 2 mechanisms (and anything built atop them), and one secondary obsolete subsystem.
Nov 20, 2019 • 18 tweets • 3 min read
About objc_direct, a thread.

I should probably have anticipated that people would raise eyebrows and spent more time explaining the point in the LLVM commit, so here it is...

The Obj-C dynamic dispatch comes with many costs, this is common "knowledge". However the details of it are rarely known.

Beside the obvious cost of the h-lookup, it comes with 4 other kinds of costs:
- codegen size
- optimization barrier
- static metadata
- runtime metadata
Jun 4, 2019 • 6 tweets • 2 min read
apple.com/ios/ios-13-pre… All is now live in Beta 6!

You need to reinstall your apps (restore from iCloud is the easiest way) to get the repackaged app version.

Then have fun with 2nd/3rd party apps. And yes, the speed improvement happens on all devices (from the 6s to the Xs, iPads too).
Jan 5, 2019 • 8 tweets • 2 min read
Re the last discussions, dispatch_async() can be used for 3 different things:

(1) asynchronous state machines (onto the same queue hierarchy), which is a way to address C10k and is fast

(2) getting concurrency (a better pthread_create())

(3) parallelism (dispatch_apply())

(1) provided you use dispatch_async_f for the shortest things to avoid allocating blocks, dispatch is fast, and it's great almost whatever the size of your workitem (assuming you do something meaningful).