Marcel Böhme👨‍🔬 Profile picture
Mar 27 4 tweets 1 min read Twitter logo Read on Twitter
Can we use LLMs for bug detection?
- compiler testing: generate programs
- "like" static analyzers:
* what is wrong, how to fix it?
* this is wrong, how to fix it?
- cur. challenge: limited prompt size
- reasoning power?
#Dagstuhl
Q: Isn't it the *unusual* and the *unlikely* that makes us find bugs?
A: You can increase temperature. Make it hallucinate more.
C: LLMs can't be trusted. Instead of bug finding, we should find use cases where we don't *need* to trust it. Maybe use it as a fuzzer guidance?
C: LLMs are good at dialogue. Maybe use it in an interactive manner?

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Marcel Böhme👨‍🔬

Marcel Böhme👨‍🔬 Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @mboehme_

Mar 28
After oracles for memory-safety, what's next?

- generic correctness prop.
- dataflow based properties
- "unusually large" resource consumptions

* Program-specific vs generic oracles
* One input (e.g., crash) vs distribution (e.g. performance)
* Ref implementation(s)

#Dagstuhl
- Human artifacts (documentation) as oracles.
- How to infer oracles, e.g. from JavaDoc comments? What about false pos? Consider them as signal for user.
- Oracle problem impacts how good deduplication works.
- Metamorphic testing. Explore in other domains, e.g. perf. testing!
- Mine assertions and use them in a fuzzer feedback loop
- Assertions are the best way to build oracles into the code
- hyperproperties are free oracles (differential testing)
- ML to detect vuln patterns. Use as oracles
- Bugs as deviant behavior (Dawson)
Read 5 tweets
Mar 28
Peter O'Hearn (@PeterOHearn12) on "Hits and Misses from a decade of program analysis in industry".

#Dagstuhl
- Bi-abductive symbolic execution
- Infer ran "symbolic execution" on changed part of every commit/diff
- Post-land analysis versus diff-time analysis changed fix rate from 0% to 70%. Why?
* Cost of context switch
* Relevance to developer
- Deploying a static analysis tool is an interaction with the developers.
- Devs would accept false positives and work with the team to "fit" the tool to the project rather.
- Audience matters!
* Dev vs SecEng
* Speed tolerance
* FP/FN tolerance
Read 5 tweets
Mar 28
Anna Zaks on "From Bug Detection to Mitigation and Elimination".

- Static and dynamic analysis.
- Hard to ensure coverage at scale!

#Dagstuhl
Security tooling
- ideal solution mitigates entire classes of bugs
- performance is important.
- adoption is critical!
- works with the ecosystem
Rewriting in memory-safe language (e.g. Swift)
- View new code as green islands in a blue ocean of memory-unsafe code.
- Objective: Turn blue to green.
- We need solutions with low adoption cost. Image
Read 4 tweets
Mar 28
Anders Møller (@amoellercsaudk) on "Dependencies Everywhere".

#Dagstuhl Image
Motivation
- Keeping dependencies up2date is not easy.
- Breaking changes are problematic for dependants.
- Informally specified and difficult to check against your project
- general tools don't assist with changes.
Research challenges
- we fully trust the dependencies ecosystem.
- supply chain is reported to be full of vulnerabilities, how does a maintainer interpret this? 95% false positives? Image
Read 5 tweets
Mar 27
"Coverage-guided fuzzing is probably the most widely used bug finding tool in industry. You can tell by the introductory slides everyone presented this morning".
--Dmitry Vyukov
In the future, we need more practical, simple, and sound techniques for bug finding,
- Find bugs in production
- Find new types of bugs
- Develop better dynamic tools
- Develop better static tools
- Require less human time
- Reports bugs in a way to improve fix rate!
Q: Should we add assertions to make fuzzers more effective at finding bugs?
A: Can do, but people do not even fix memory corruption bugs. The number of critical bugs found is not currently a problem.

syzkaller.appspot.com/upstream
Read 6 tweets
Apr 23, 2022
The FUZZING'22 Workshop is organized by
* Baishakhi Ray (@Columbia)
* Cristian Cadar (@imperialcollege)
* László Szekeres (@Google)
* Marcel Böhme (#MPI_SP)

Artifact Evaluation Committee Chair (Stage 2)
* Yannic Noller (@NUSComputing)
Baishakhi (@baishakhir) is well-known for her investigation of AI4Fuzzing & Fuzzing4AI. Generally, she wants to improve software reliability & developers' productivity. Her research excellence has been recognized by an NSF CAREER, several Best Papers, and industry awards. Image
Cristian Cadar (@c_cadar) is the world leading researcher in symbolic execution and super well-known for @kleesymex. Cristian is an ERC Consolidator, an ACM Distinguished Member, and received many, many awards, most recently the IEEE CS TCSE New Directions Award. Image
Read 5 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us on Twitter!

:(