Can we use LLMs for bug detection?
- compiler testing: generate programs
- "like" static analyzers:
* ask: what is wrong, and how do we fix it? (prompt sketch below)
* or: this is wrong; how do we fix it?
- current challenge: limited prompt size
- reasoning power? #Dagstuhl
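A rough sketch of the "LLM as static analyzer" prompting idea; build_review_prompt() and the buggy snippet are purely illustrative, and the resulting prompt would be sent to whatever model is available.

```python
# Sketch: prompt an LLM like a static analyzer -- "what is wrong, and how to fix it?"
# build_review_prompt() and the example function are illustrative, not a real tool.

def build_review_prompt(code: str) -> str:
    # The whole snippet has to fit into the prompt -- the limited-prompt-size challenge above.
    return (
        "Act as a static analyzer. For the following C function, state "
        "(1) what is wrong and (2) how to fix it. If nothing is wrong, say so.\n\n"
        + code
    )

buggy = """
int sum(int *a, int n) {
    int s = 0;
    for (int i = 0; i <= n; i++)  /* off-by-one: reads a[n] */
        s += a[i];
    return s;
}
"""
print(build_review_prompt(buggy))  # send this to the model of your choice
```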
Q: Isn't it the *unusual* and the *unlikely* that makes us find bugs?
A: You can increase temperature. Make it hallucinate more.
C: LLMs can't be trusted. Instead of bug finding, we should find use cases where we don't *need* to trust it. Maybe use it as fuzzer guidance? (sketch below)
C: LLMs are good at dialogue. Maybe use it in an interactive manner?
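One way to act on the "no need to trust it" comment is to use the model only as a mutation engine inside a coverage-guided loop: coverage feedback, not the model, decides what survives. The sketch below assumes hypothetical llm_propose_mutants() and coverage_of() helpers (placeholders here); a bad suggestion just wastes one execution and cannot produce a false result.

```python
# Sketch: LLM as *untrusted* fuzzer guidance inside a coverage-guided loop.
import random

def llm_propose_mutants(seed: bytes) -> list[bytes]:
    # Hypothetical: ask a model for interesting variations of this input.
    return [seed + bytes([random.randrange(256)])]  # placeholder behaviour

def coverage_of(data: bytes) -> set[int]:
    # Hypothetical: run the instrumented target and return the covered edges.
    return {len(data) % 7}                          # placeholder behaviour

corpus, seen = [b"seed"], set()
for _ in range(1000):
    parent = random.choice(corpus)
    for child in llm_propose_mutants(parent):
        cov = coverage_of(child)
        if not cov <= seen:      # keep the input only if it reaches new coverage
            seen |= cov
            corpus.append(child)
```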
- Human artifacts (documentation) as oracles.
- How to infer oracles, e.g. from JavaDoc comments? What about false positives? Consider them as a signal for the user.
- Oracle problem impacts how good deduplication works.
- Metamorphic testing. Explore it in other domains, e.g. performance testing! (see the sketch after this list)
- Mine assertions and use them in a fuzzer feedback loop
- Assertions are the best way to build oracles into the code
- Hyperproperties are free oracles (differential testing)
- ML to detect vulnerability patterns; use the predictions as oracles
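A minimal sketch of the metamorphic and differential ideas above for a toy sort routine; my_sort() stands in for the implementation under test. Neither check needs a written specification: the metamorphic relation compares the program with itself on transformed inputs, and the differential check compares it with an independent implementation.

```python
# Sketch: metamorphic relation + differential comparison as "free" oracles.
import random

def my_sort(xs):
    return sorted(xs)  # placeholder for the implementation under test

for _ in range(1000):
    xs = [random.randint(-100, 100) for _ in range(random.randint(0, 20))]

    # Metamorphic relation: permuting the input must not change the output.
    shuffled = xs[:]
    random.shuffle(shuffled)
    assert my_sort(xs) == my_sort(shuffled)

    # Differential oracle: an independent reference implementation must agree.
    assert my_sort(xs) == sorted(xs)
```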
- Bugs as deviant behavior (Dawson)
- Bi-abductive symbolic execution
- Infer ran "symbolic execution" on the changed parts of every commit/diff.
- Moving from post-land analysis to diff-time analysis changed the fix rate from 0% to 70%. Why?
* Cost of context switch
* Relevance to developer
- Deploying a static analysis tool is an interaction with the developers.
- Devs would rather accept false positives and work with the team to "fit" the tool to the project.
- Audience matters!
* Dev vs SecEng
* Speed tolerance
* FP/FN tolerance
Security tooling
- ideal solution mitigates entire classes of bugs
- performance is important.
- adoption is critical!
- works with the ecosystem
Rewriting in memory-safe language (e.g. Swift)
- View new code as green islands in a blue ocean of memory-unsafe code.
- Objective: Turn blue to green.
- We need solutions with low adoption cost.
Motivation
- Keeping dependencies up to date is not easy.
- Breaking changes are problematic for dependants.
- Breaking changes are informally specified and difficult to check against your project.
- General tools don't assist with such changes.
Research challenges
- We fully trust the dependency ecosystem.
- The supply chain is reported to be full of vulnerabilities; how does a maintainer interpret this? 95% false positives?
"Coverage-guided fuzzing is probably the most widely used bug finding tool in industry. You can tell by the introductory slides everyone presented this morning".
--Dmitry Vyukov
In the future, we need more practical, simple, and sound techniques for bug finding:
- Find bugs in production
- Find new types of bugs
- Develop better dynamic tools
- Develop better static tools
- Require less human time
- Report bugs in a way that improves the fix rate!
Q: Should we add assertions to make fuzzers more effective at finding bugs?
A: Can do, but people do not even fix memory corruption bugs. The number of critical bugs found is not currently a problem.
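To make the assertion idea concrete: the made-up parse_length() below carries its invariant as an assert, so even a naive random loop (a real coverage-guided fuzzer would do far better) turns the missing validation into a crash it can report.

```python
# Sketch: an assertion as an in-code oracle that a fuzzer can trip over.
import random

def parse_length(data: bytes) -> int:
    # Made-up parser: the first two bytes declare a payload length.
    length = int.from_bytes(data[:2], "big") if len(data) >= 2 else 0
    # Invariant the parser should enforce; the assert makes its violation fail loudly.
    assert length <= len(data), "declared length exceeds available data"
    return length

# Naive random driver standing in for a real fuzzer.
for _ in range(10_000):
    data = bytes(random.randrange(256) for _ in range(random.randrange(0, 8)))
    try:
        parse_length(data)
    except AssertionError as exc:
        print("bug found:", data, exc)
        break
```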
The FUZZING'22 Workshop is organized by
* Baishakhi Ray (@Columbia)
* Cristian Cadar (@imperialcollege)
* László Szekeres (@Google)
* Marcel Böhme (#MPI_SP)
Baishakhi (@baishakhir) is well-known for her investigation of AI4Fuzzing & Fuzzing4AI. Generally, she wants to improve software reliability & developers' productivity. Her research excellence has been recognized by an NSF CAREER award, several Best Paper awards, and industry awards.
Cristian Cadar (@c_cadar) is the world-leading researcher in symbolic execution and super well-known for @kleesymex. Cristian is an ERC Consolidator grantee, an ACM Distinguished Member, and has received many, many awards, most recently the IEEE CS TCSE New Directions Award.