The basic problem we're looking at in this paper is: if you buy some embedded/IoT device, it may come with a bunch of features that you don't use (say, Bluetooth) that nonetheless require driver support and expose unnecessary attack surface.
Maybe you're a company deploying a fleet of Meraki routers, you don't need the Bluetooth Low Energy localization stuff, and you're worried about vulnerabilities like this one arstechnica.com/information-te…
So we want to disable some hardware functionality by rewriting the firmware. The problem is, it's hard to do this in a way that generalizes to the myriad embedded OSes and different hardware platforms out there.
Key idea: almost all input from the outside world is delivered to an embedded system when some peripheral raises an interrupt, or IRQ. By focusing on automatically reverse engineering the IRQ handling for a device, we can enumerate IRQ sources and disable the ones we don't want!
IRQ handling usually looks something like this. The CPU has a small number of dedicated pins (often just one or two) connected to an interrupt controller that multiplexes interrupts from individual peripherals. To support more peripherals, interrupt controllers can be chained together.
By exploring the interrupt handling code starting from the top-level handler, we can eventually discover the individual handlers for each peripheral! And once found, we can replace the ones we don't want with a small stub that just ignores the interrupt and returns.
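As a rough illustration of that stubbing step (this is not the paper's actual tooling, just a sketch): once a per-peripheral handler has been located at some offset in a raw ARM firmware image, it can be neutralized by overwriting its first instruction with an immediate return (`BX LR`, encoding 0xE12FFF1E, stored little-endian).

```python
# BX LR (return) in ARM mode, as it appears little-endian in the image.
ARM_BX_LR = bytes.fromhex("1EFF2FE1")

def stub_out_handler(image: bytearray, handler_offset: int) -> None:
    """Overwrite the handler's first instruction so it returns immediately.

    Hypothetical helper for a flat ARM-mode firmware image; a Thumb-mode
    handler would need the 2-byte Thumb encoding of BX LR instead.
    """
    image[handler_offset:handler_offset + 4] = ARM_BX_LR
```

In practice a real stub may also need to acknowledge or clear the pending interrupt at the controller, or the same IRQ will just fire again.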
We initially tried to do this with symbolic execution, but the liberal use of function pointers and privileged instructions (which most sym exec engines don't support) made this approach impractical. Instead, we use snapshot-based fuzzing.
We start by collecting a system-level snapshot (RAM and CPU regs) from the target embedded device. We then load this snapshot in an emulator (PANDA) and trigger a top-level interrupt. From there, we fuzz the interrupt handling by responding to memory-mapped I/O reads with fuzzed inputs.
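A toy model of that fuzzing loop (the real system drives PANDA over a full memory/register snapshot; the little step-event interface here is invented purely for illustration): run from the snapshot, answer every MMIO read with a fuzzer-chosen value, and record the PC trace for later analysis.

```python
import random

def fuzz_one_run(emu, seed: int, max_steps: int = 10_000):
    """One fuzzing run from the interrupt entry point.

    `emu` is a hypothetical emulator whose step() yields one of:
      ("exec", pc)         - an instruction executed at pc
      ("mmio_read", addr)  - the guest read a device register
      ("done", None)       - the interrupt handler returned
    """
    rng = random.Random(seed)
    trace = []
    for _ in range(max_steps):
        kind, val = emu.step()
        if kind == "exec":
            trace.append(val)              # record PC for later trace diffing
        elif kind == "mmio_read":
            emu.feed(rng.getrandbits(32))  # fuzzed device-register value
        else:
            break                          # handler returned; run complete
    return trace
```

Different seeds steer execution down different MMIO-dependent branches, which is what produces the diverse trace collection described next.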
This fuzzing gives us a big collection of traces. Somewhere in there should be the actual handlers we want. How do we find them? The key observation is that the traces will differ from one another (diverge) when they branch off into the individual handlers–so we can "diff" them!
We borrow and extend a technique called execution indexing (Xin et al., PLDI '08) to compare traces and precisely identify handlers. Then all we have to do is disable each handler one at a time until we find the one that corresponds to the functionality we want to get rid of.
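A stripped-down version of the diffing intuition (the real analysis uses execution indexing to align traces robustly across loops and calls; this naive first-difference scan is only meant to convey the idea): the point where two PC traces split is a candidate dispatch into a peripheral-specific handler.

```python
def divergence_point(trace_a, trace_b):
    """Return (step index, pc_a, pc_b) at the first differing step.

    Returns None if one trace is a prefix of the other (no divergence
    within the compared region).
    """
    for i, (a, b) in enumerate(zip(trace_a, trace_b)):
        if a != b:
            return i, a, b
    return None
```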
The big challenge with this paper is evaluation – there are tons of different embedded platforms running all sorts of OSes. We ended up choosing seven different devices, running four OSes across two CPU architectures (ARM and MIPS).
This is also why I have a giant pile of embedded junk sitting around my apartment now :p
Fuzzing turns out to be very effective at uncovering the handlers! We were able to discover all handlers on our devices, except for two on the SABRE Lite that turned out to be already disabled. Most handlers are discovered within 3 hours of fuzzing.
We also did a detailed attack surface analysis on the Steam Link by looking at CVEs reported in the Linux kernel over the past 5 years and matching them to drivers on the device. We found that up to 44 kernel CVEs could have been mitigated depending on which IRQs were disabled.
To wrap up this thread, I want to thank my co-author Zhenghao (@highw4y2h3ll) who put in tons of work doing RE to establish ground truth for our evaluation and implementing the binary analyses we used. I'm very lucky to have him as my advisee!
@highw4y2h3ll This project took a *long* time to come to fruition—we started working on it in 2016! So I must also thank the undergrads and master's students I've worked with during that time, nibbling away at small pieces of the problem until we understood it well enough to build this.
@highw4y2h3ll You can find all our code and data (including sample memory and CPU snapshots for the evaluated devices) on GitHub: github.com/messlabnyu/irq…
And that's all, folks (though there are of course many more details in the paper)! You can also read this thread all in one place with ThreadReaderApp: threadreaderapp.com/thread/1473695…
Hmm, this is actually much less impressive than I expected as far as inverting PhotoDNA (based only on reading @hackerfactor's blog post) reddit.com/r/MachineLearn…
@hackerfactor @matthew_d_green perhaps of interest if you haven't seen it yet and want to take a break from fighting with half of CS twitter about NFTs ;)
Ah, I see, it's taking a pure black box ML approach to try and learn the inverse straight from the hashes. OK, that is pretty impressive!
So, with Broadcom's acquisition of Symantec, it seems like the source code for PGP Desktop (aka Symantec Encryption Desktop) is nowhere on the internet? I have a copy but I'm pretty sure I can't host it anywhere:
Seems like a loss for archival and data recovery work! :(
FWIW, the version I have is:
MD5 (PGPDesktop10.0.1_Source.zip) = c9193850f923cda995e6d4b9f45fcbdf
Probably getting old, I opted to just pay for a janky conversion utility rather than try to RE the Microsoft Outlook 15 message format :(
(I may still RE it)
The format is a pain in the ass: it stores messages in 3 undocumented binary parts: metadata, message body, and attachments. It has an SQLite database, but that just points you to the metadata file.
Also, everything is referenced by GUIDs, which are in a mix of
- Raw binary GUID data
- ASCII GUIDs
- UTF-16-LE GUIDs
- Base64-encoded blobs that contain GUIDs
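For the curious, here's how those four encodings relate to one another, sketched in Python (the little-endian field order is standard Windows GUID behavior via `bytes_le`; nothing here is Outlook's actual schema, and `decode_any` is a hypothetical helper):

```python
import base64
import uuid

def decode_any(blob: bytes) -> uuid.UUID:
    """Try each of the four encodings in turn and return the recovered GUID."""
    attempts = (
        lambda b: uuid.UUID(bytes_le=b),                            # raw binary
        lambda b: uuid.UUID(b.decode("ascii")),                     # ASCII text
        lambda b: uuid.UUID(b.decode("utf-16-le")),                 # UTF-16-LE text
        lambda b: uuid.UUID(bytes_le=base64.b64decode(b, validate=True)),  # base64 blob
    )
    for attempt in attempts:
        try:
            return attempt(blob)
        except (ValueError, UnicodeDecodeError):
            continue
    raise ValueError("not a GUID in any known encoding")
```

One and the same GUID round-trips through all four forms, which is handy when correlating the metadata, body, and attachment files.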
The camera-ready version of our @IEEESSP 2022 paper evaluating the security of code generated by GitHub CoPilot is now up on arXiv! arxiv.org/abs/2108.09293
@IEEESSP We designed 89 different scenarios for Copilot to complete based on MITRE's "Top 25 Most Dangerous Software Weaknesses" (cwe.mitre.org/top25/archive/…), and then had Copilot generate completions for each scenario, creating 1,689 programs.
@IEEESSP This is too many to check by hand, so we used CodeQL with a combination of built-in queries and our own custom queries to check the resulting code for the relevant vulnerability. Surprisingly (at least to me), ~40% of the suggestions overall were vulnerable!
Quite neat: they hooked GPT-3 up to the web and let it search for sources using a text-based web browser & used RL+human feedback to improve the truthfulness of its answers! It can even cite its sources: openai.com/blog/improving…
Although I imagine the restriction to sites that actually have any usable content without JavaScript changes the quality of info - might even make it more accurate :p
The next obvious step is to give it the ability to ask questions on Quora/StackOverflow ;)