My Authors
Read all threads
Welcome back to the NDSS Bar livetweet! In our afternoon session we're looking at obfuscation and how to defeat it. First up is Vaibhav Sharma, with "It Doesn’t Have to Be So Hard: Efficient Symbolic Reasoning for CRC"
Some papers, including some of the anti-fuzzing work from USENIX Security last year, have argued that CRCs are good for slowing down symbolic execution. Sharma et al. ask: is this true?
What do CRC implementations look like? One technique used in Fuzzitication is this
The problem is that each side of the comparison if ((int)(crc ^ byte) < 0) is feasible, so the analysis has to branch on every byte of the input. This causes state explosion.
The authors point out that there is already a good technique that deals with this kind of path explosion: path merging, which represents both side of the branch in a single symbolic formula.
The other major class of CRC implementation uses a lookup table indexed by the input byte
But the authors note that this table has a special structure: T[i XOR j] = T[i] XOR T[j]. This lets them develop a more efficient representation.
The linear structure of the table means that you use just a few values (the "spine" of the table) and then represent the rest of the table in terms of that spine. This makes it much more efficient to solve via SMT.
So now they have techniques that can hopefully deal with both kinds of implementation. Are these effective at defeating CRCs? Time to evaluate.
Left: the loop-based bytewise implmentation, right: the table lookup implementation. In both cases although the naive approach is very slow, more tuned techniques can solve them effectively.
So, myth busted: CRCs are not really effective at foiling symbolic execution, because they have a nice mathematical structure that is efficiently solved once that structure can be exposed to the solver.
Up next is a talk on defeating opaque predicates by Yu-Jye Tung: "A Heuristic Approach to Detect Opaque Predicates that Disrupt Static Disassembly"
Opaque predicates are essentially branches in the program that are guaranteed to always go one direction—but determining that statically (i.e., without running the program) is very difficult. This leads to more dead code to explore, hindering analysis.
Lots of prior work has looked at identifying which predicates are opaque using techniques like abstract interpretation/VSA and dynamic symbolic execution.
This work takes a different approach. They note that when opaque predicates are used to insert dead code, that dead code is often invalid in many ways. So instead of trying to detect the predicate, they try to identify the junk code.
They develop a set of heuristics, including rules like "bad memory address", "abrupt basic block end" that identify code that, if it were run, would have obvious errors. This code is likely to be junk.
The nice thing about this approach is that it's agnostic to how clever the opaque predicate is—even very complicated predicates that use IPC, etc. will be detected as long as they guard junk code.
More heuristics: registers set without being used, use of privileged instructions in user-mode code
They evaluate on code obfuscated using Tigress and look at false positives and negatives. Results against Tigress are promising and don't depend on the hardness of the predicates used.
Last talk of the session will have not just a slideshow but a live demo! Robin David presents "QSynth - A Program Synthesis based approach for Binary Code Deobfuscation"
Starts by introducing two types of obfuscation: control flow obfuscation and data flow obfuscation. Both types of obfuscation are hard to undo!
In prior work they've used dynamic symbolic execution to defeat constraints, but it's limited because difficulty of symbolic execution scales with *syntactic* complexity of the code (not just semantic complexity).
Here instead they use program synthesis. They can look at I/O samples of the obfuscated code and the try to synthesize a smaller expression that does the same thing. Downside is that there is a big search space of possible programs.
They use Offline Enumerative Search: generate a set of programs offline using a grammar and index them according to I/O behavior. This is unsound, but conveniently can be reused across programs.
Evaluation uses Tigress 2.2 with EncodeArithmetic, EncodeData, and Virtualization obfuscations. Dataset is 500 functions per obfuscation. Lookup tables are 3.3M expressions.
Compared to Syntia (prior work using program synthesis for deobfuscation), they are able to deobfuscate more expressions, faster
Audience question: how many inputs do you need to distinguish? Answer: (backup slide!) they tested this experimentally, and found that ~15 inputs was enough to get very low FP rate on their dataset.
Demo time: they start off by generating a trace of the program and then load the trace into IDA. IDA plugin shows all the concrete values along the trace too (nice!)
Demo works (pretty fast!) and shows transformation of a really gnarly program down to a tiny expression. Now going to go through the steps in more detail.
Showing off a nice python interface to the traces. Uses @capstone_engine capstone for disassembly, makes concrete register and memory values available, can quickly jump anywhere in the trace.
@capstone_engine Working with the symbolically executed trace in Python
Here’s the expression we want to simplify
After loading the lookup tables we can look at the expressions generated from the grammar
The ASTs can be evaluated which is what the synthesizer will use
Successfully deobfuscated!
Authors are still working on integrating it all into IDA; goal is to be able to just specify start/stop in IDA, hit "deobfuscate", and get back a nice simplified expression.
Missing some Tweet in this thread? You can try to force a refresh.

Enjoying this thread?

Keep Current with Brendan Dolan-Gavitt

Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

Twitter may remove this content at anytime, convert it as a PDF, save and print for later use!

Try unrolling a thread yourself!

how to unroll video

1) Follow Thread Reader App on Twitter so you can easily mention us!

2) Go to a Twitter thread (series of Tweets by the same owner) and mention us with a keyword "unroll" @threadreaderapp unroll

You can practice here first or read more on our help page!

Follow Us on Twitter!

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just three indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3.00/month or $30.00/year) and get exclusive features!

Become Premium

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!