According to this recent SoK by Weissberg et al., recently modified code and sanitizer instrumentation seem to be among the most effective heuristics for target selection in directed #fuzzing. LLMs show much promise for target selection, too.
But in an interesting twist, the authors find that choosing functions by their complexity might be even better at retrieving functions that contained vulnerabilities in the past.
Now, this analysis is hypothetical and w.r.t. the discovery of vulnerabilities across the *entire history* of a repository. Since the objective of a directed fuzzer is to find (unknown) vulns in the *current version*, I would be excited to see this hypothesis put to the test.
When we actually run fuzzers implementing each of these heuristics on the most recent version of a program, do we expect the cyclomatic-complexity-guided fuzzer to outperform the other heuristics?
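To make that concrete, here is a minimal sketch (mine, not from the SoK) of the complexity heuristic: rank the functions in a Python file by a simple McCabe-style cyclomatic complexity count, the kind of score a complexity-guided fuzzer could use to pick targets.

```python
import ast
import sys

# Node types counted as decision points (one simple McCabe-style variant).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.And, ast.Or,
                  ast.ExceptHandler, ast.IfExp, ast.comprehension)

def cyclomatic_complexity(func: ast.AST) -> int:
    """1 + number of decision points below `func` in the AST."""
    return 1 + sum(isinstance(n, DECISION_NODES) for n in ast.walk(func))

def rank_targets(source: str) -> list[tuple[int, str]]:
    """Return (complexity, function name) pairs, most complex first."""
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    return sorted(((cyclomatic_complexity(f), f.name) for f in funcs),
                  reverse=True)

if __name__ == "__main__":
    for score, name in rank_targets(open(sys.argv[1]).read()):
        print(f"{score:3d}  {name}")
```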
- Human artifacts (documentation) as oracles.
- How to infer oracles, e.g. from JavaDoc comments? What about false positives? Treat them as a signal for the user.
- The oracle problem affects how well deduplication works.
- Metamorphic testing. Explore it in other domains, e.g. performance testing! (See the sketch after this list.)
- Mine assertions and use them in a fuzzer feedback loop
- Assertions are the best way to build oracles into the code
- Hyperproperties are free oracles (differential testing).
- ML to detect vuln patterns. Use them as oracles.
- Bugs as deviant behavior (Dawson Engler)
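A minimal sketch of two of the "free" oracle styles above, with an insertion sort standing in for the code under test: a metamorphic relation (needs no reference implementation) and a differential oracle (a second implementation, agreement being the checked hyperproperty).

```python
import random

def insertion_sort(xs):
    """Implementation under test (stand-in)."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

def random_list():
    return [random.randint(-50, 50) for _ in range(random.randint(0, 20))]

def check_metamorphic(trials=1000):
    # Metamorphic relation: permuting the input must not change the output.
    for _ in range(trials):
        xs = random_list()
        ys = xs[:]
        random.shuffle(ys)
        assert insertion_sort(xs) == insertion_sort(ys), (xs, ys)

def check_differential(trials=1000):
    # Differential oracle: built-in sorted() acts as a second implementation.
    for _ in range(trials):
        xs = random_list()
        assert insertion_sort(xs) == sorted(xs), xs

if __name__ == "__main__":
    check_metamorphic()
    check_differential()
    print("all oracle checks passed")
```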
- Bi-abductive symbolic execution
- Infer ran "symbolic execution" on the changed part of every commit/diff (see the diff-time sketch after this list).
- Moving from post-land analysis to diff-time analysis changed the fix rate from 0% to 70%. Why?
* Cost of context switch
* Relevance to developer
- Deploying a static analysis tool is an interaction with the developers.
- Devs would rather accept false positives and work with the team to "fit" the tool to the project.
- Audience matters!
* Dev vs SecEng
* Speed tolerance
* FP/FN tolerance
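A sketch of the diff-time loop, assuming a git checkout and some file-level analyzer; `run_analyzer` is a hypothetical hook (Infer analyzes the compiled change, this only shows the shape of the idea).

```python
import subprocess

def changed_files(base: str = "origin/main") -> list[str]:
    """Files touched by the current change, via git diff."""
    out = subprocess.run(["git", "diff", "--name-only", base],
                         capture_output=True, text=True, check=True)
    return out.stdout.splitlines()

def run_analyzer(path: str) -> list[str]:
    """Hypothetical: invoke your analyzer on one file, return findings."""
    ...

def diff_time_analysis(base: str = "origin/main") -> None:
    for path in changed_files(base):
        for finding in run_analyzer(path) or []:
            # Surfaced at review time: relevant to the author, no context switch.
            print(f"{path}: {finding}")
```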
Security tooling
- An ideal solution mitigates entire classes of bugs.
- Performance is important.
- Adoption is critical!
- It has to work with the ecosystem.
Rewriting in a memory-safe language (e.g. Swift)
- View new code as green islands in a blue ocean of memory-unsafe code.
- Objective: Turn blue to green.
- We need solutions with low adoption cost.
Motivation
- Keeping dependencies up to date is not easy.
- Breaking changes are problematic for dependents.
- They are informally specified and difficult to check against your project.
- General-purpose tools don't assist with such changes.
Research challenges
- We fully trust the dependency ecosystem.
- The supply chain is reported to be full of vulnerabilities; how does a maintainer interpret this? 95% false positives?
Can we use LLMs for bug detection?
- Compiler testing: generate programs.
- "Like" static analyzers:
* What is wrong, and how do we fix it?
* This is wrong; how do we fix it?
- Current challenge: limited prompt size.
- Reasoning power? #Dagstuhl
Q: Isn't it the *unusual* and the *unlikely* that makes us find bugs?
A: You can increase the temperature. Make it hallucinate more.
C: LLMs can't be trusted. Instead of bug finding, we should find use cases where we don't *need* to trust them. Maybe use them as fuzzer guidance? (Sketch below.)
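A sketch of that last comment: the LLM only *proposes* inputs, and a coverage check decides what enters the corpus, so nothing needs to trust it. `propose` is a hypothetical hook for any completion API; coverage is approximated with sys.settrace.

```python
import sys

def trace_lines(fn, data):
    """Run fn(data), recording the set of executed (file, line) pairs."""
    covered = set()
    def tracer(frame, event, arg):
        if event == "line":
            covered.add((frame.f_code.co_filename, frame.f_lineno))
        return tracer
    sys.settrace(tracer)
    try:
        fn(data)
    except Exception:
        pass  # a crash here is a finding, not a failure of the loop
    finally:
        sys.settrace(None)
    return covered

def llm_guided_fuzz(target, seeds, propose, rounds=10):
    """propose(prompt) -> list[str] is the untrusted LLM hook."""
    corpus, seen = list(seeds), set()
    for data in corpus:
        seen |= trace_lines(target, data)
    for _ in range(rounds):
        for cand in propose(f"Suggest 5 tricky inputs for: {target.__doc__}"):
            new = trace_lines(target, cand)
            if new - seen:  # keep only inputs that provably add coverage
                corpus.append(cand)
                seen |= new
    return corpus
```

A hallucinated input simply fails the coverage check and is dropped.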
"Coverage-guided fuzzing is probably the most widely used bug finding tool in industry. You can tell by the introductory slides everyone presented this morning".
--Dmitry Vyukov
In the future, we need more practical, simple, and sound techniques for bug finding:
- Find bugs in production
- Find new types of bugs
- Develop better dynamic tools
- Develop better static tools
- Require less human time
- Report bugs in a way that improves the fix rate!
Q: Should we add assertions to make fuzzers more effective at finding bugs?
A: We can, but people do not even fix memory corruption bugs. The number of critical bugs found is not currently the bottleneck.
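A toy illustration of the Q (not of anyone's tool): an explicit assertion turns a silent logic bug into something an input generator can trip over. Hypothesis stands in for the fuzzer; `dedup_keep_order` is deliberately buggy, and running the test should quickly print a shrunken counterexample.

```python
from hypothesis import given, strategies as st

def dedup_keep_order(xs):
    """Intended: drop duplicates, keep first-occurrence order."""
    return list(set(xs))  # bug: set() forgets the order

@given(st.lists(st.integers()))
def test_dedup_keep_order(xs):
    expected = []
    for x in xs:
        if x not in expected:
            expected.append(x)
    # The assertion is the oracle: without it, the wrong order stays silent.
    assert dedup_keep_order(xs) == expected

if __name__ == "__main__":
    test_dedup_keep_order()
```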