12,399 views

Brendan Dolan-Gavitt

@moyix

, 36 tweets, 7 min read

My Authors

Welcome back to the NDSS Bar livetweet! In our afternoon session we're looking at obfuscation and how to defeat it. First up is Vaibhav Sharma, with "It Doesn’t Have to Be So Hard: Efficient Symbolic Reasoning for CRC"

Some papers, including some of the anti-fuzzing work from USENIX Security last year, have argued that CRCs are good for slowing down symbolic execution. Sharma et al. ask: is this true?

What do CRC implementations look like? One technique used in Fuzzitication is this

The problem is that each side of the comparison if ((int)(crc ^ byte) < 0) is feasible, so the analysis has to branch on every byte of the input. This causes state explosion.

The authors point out that there is already a good technique that deals with this kind of path explosion: path merging, which represents both side of the branch in a single symbolic formula.

The other major class of CRC implementation uses a lookup table indexed by the input byte

But the authors note that this table has a special structure: T[i XOR j] = T[i] XOR T[j]. This lets them develop a more efficient representation.

The linear structure of the table means that you use just a few values (the "spine" of the table) and then represent the rest of the table in terms of that spine. This makes it much more efficient to solve via SMT.

So now they have techniques that can hopefully deal with both kinds of implementation. Are these effective at defeating CRCs? Time to evaluate.

Left: the loop-based bytewise implmentation, right: the table lookup implementation. In both cases although the naive approach is very slow, more tuned techniques can solve them effectively.

So, myth busted: CRCs are not really effective at foiling symbolic execution, because they have a nice mathematical structure that is efficiently solved once that structure can be exposed to the solver.

Up next is a talk on defeating opaque predicates by Yu-Jye Tung: "A Heuristic Approach to Detect Opaque Predicates that Disrupt Static Disassembly"

Opaque predicates are essentially branches in the program that are guaranteed to always go one direction—but determining that statically (i.e., without running the program) is very difficult. This leads to more dead code to explore, hindering analysis.

Lots of prior work has looked at identifying which predicates are opaque using techniques like abstract interpretation/VSA and dynamic symbolic execution.

This work takes a different approach. They note that when opaque predicates are used to insert dead code, that dead code is often invalid in many ways. So instead of trying to detect the predicate, they try to identify the junk code.

They develop a set of heuristics, including rules like "bad memory address", "abrupt basic block end" that identify code that, if it were run, would have obvious errors. This code is likely to be junk.

The nice thing about this approach is that it's agnostic to how clever the opaque predicate is—even very complicated predicates that use IPC, etc. will be detected as long as they guard junk code.

More heuristics: registers set without being used, use of privileged instructions in user-mode code

They evaluate on code obfuscated using Tigress and look at false positives and negatives. Results against Tigress are promising and don't depend on the hardness of the predicates used.

Last talk of the session will have not just a slideshow but a live demo! Robin David presents "QSynth - A Program Synthesis based approach for Binary Code Deobfuscation"

Starts by introducing two types of obfuscation: control flow obfuscation and data flow obfuscation. Both types of obfuscation are hard to undo!

In prior work they've used dynamic symbolic execution to defeat constraints, but it's limited because difficulty of symbolic execution scales with *syntactic* complexity of the code (not just semantic complexity).

Here instead they use program synthesis. They can look at I/O samples of the obfuscated code and the try to synthesize a smaller expression that does the same thing. Downside is that there is a big search space of possible programs.

They use Offline Enumerative Search: generate a set of programs offline using a grammar and index them according to I/O behavior. This is unsound, but conveniently can be reused across programs.

Evaluation uses Tigress 2.2 with EncodeArithmetic, EncodeData, and Virtualization obfuscations. Dataset is 500 functions per obfuscation. Lookup tables are 3.3M expressions.

Compared to Syntia (prior work using program synthesis for deobfuscation), they are able to deobfuscate more expressions, faster

Audience question: how many inputs do you need to distinguish? Answer: (backup slide!) they tested this experimentally, and found that ~15 inputs was enough to get very low FP rate on their dataset.

Demo time: they start off by generating a trace of the program and then load the trace into IDA. IDA plugin shows all the concrete values along the trace too (nice!)

Demo works (pretty fast!) and shows transformation of a really gnarly program down to a tiny expression. Now going to go through the steps in more detail.

@capstone_engine

@capstone_engine

Showing off a nice python interface to the traces. Uses @capstone_engine capstone for disassembly, makes concrete register and memory values available, can quickly jump anywhere in the trace.

@capstone_engine

@capstone_engine

@capstone_engine Working with the symbolically executed trace in Python

Here’s the expression we want to simplify

After loading the lookup tables we can look at the expressions generated from the grammar

The ASTs can be evaluated which is what the synthesizer will use

Successfully deobfuscated!

Authors are still working on integrating it all into IDA; goal is to be able to just specify start/stop in IDA, hit "deobfuscate", and get back a nice simplified expression.

Enjoying this thread?

Keep Current with Brendan Dolan-Gavitt

Stay in touch and get notified when new unrolls are available from this author!

This Thread may be Removed Anytime!

Twitter may remove this content at anytime, convert it as a PDF, save and print for later use!

Try unrolling a thread yourself!

1) Follow Thread Reader App on Twitter so you can easily mention us!

2) Go to a Twitter thread (series of Tweets by the same owner) and mention us with a keyword "unroll" @threadreaderapp unroll

You can practice here first or read more on our help page!

Enjoying this thread?

Try unrolling a thread yourself!

More from @moyix see all

Related threads

Trending hashtags

Embed code for your website

Did Thread Reader help you today?