YES! We need to present our plots on a log-x-scale. Why? mboehme.github.io/paper/FSE20.Em…
Two fuzzers. Both achieve the same coverage eventually. Yet, one performs really well at the beginning while the other performs really well in the long run. (What is a reasonable time budget? 🤔)
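Here's a minimal sketch of such a plot. The two coverage curves are synthetic, invented purely for illustration (not data from the paper):

```python
# Sketch: two fuzzers' coverage over time on a log-scaled x-axis, so both
# the first seconds and the long run are visible in one figure.
import matplotlib.pyplot as plt
import numpy as np

t = np.logspace(0, 5, 200)             # 1 second .. ~28 hours (synthetic)
cov_a = 900 * (1 - np.exp(-t / 50))    # fast early, plateaus (invented curve)
cov_b = 900 * (1 - np.exp(-t / 5000))  # slow early, catches up late (invented)

plt.plot(t, cov_a, label="Fuzzer A")
plt.plot(t, cov_b, label="Fuzzer B")
plt.xscale("log")                      # the log x-scale in question
plt.xlabel("time (seconds)")
plt.ylabel("branch coverage")
plt.legend()
plt.show()
```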
Nice! I agree: comparing *time-to-same-coverage* provides more information about fuzzer efficiency than comparing coverage-at-a-given-time.
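A tiny sketch of how time-to-same-coverage could be computed from coverage traces. The function name and the traces are hypothetical; real traces would come from campaign logs:

```python
# Sketch: first timestamp at which a fuzzer's coverage reaches a target level.
def time_to_coverage(times, coverage, target):
    """Return the first time at which coverage >= target, else None."""
    for t, c in zip(times, coverage):
        if c >= target:
            return t
    return None

# Invented traces: Fuzzer A hits 500 branches at t=30s, Fuzzer B at t=4000s.
times_a, cov_a = [1, 10, 30, 100], [100, 400, 500, 520]
times_b, cov_b = [1, 10, 1000, 4000], [50, 90, 400, 500]
print(time_to_coverage(times_a, cov_a, 500))  # -> 30
print(time_to_coverage(times_b, cov_b, 500))  # -> 4000
```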
For my new followers, my research group is interested in techniques that make machines attack other machines with maximal efficiency. All our tools are open-source, so people can use them to identify security bugs before they are exploited.
This is how it all started.
My first technical paper introduced a technique that could, in principle, *prove* that no bug was introduced by a new code commit [ICSE'13]. It was also the first in a series of symbolic-execution-based whitebox fuzzers [FSE'13, ASE'16, ICSE'20].
Yet, something was amiss. Even a simple random input generator could outperform my most effective whitebox fuzzer if it generated inputs fast enough. To understand why, we modelled fuzzing as a sampling process and proved some bounds [FSE'14, TSE'15].
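A hedged back-of-the-envelope of that effect, with invented numbers (not the actual bounds from FSE'14/TSE'15):

```python
# If a blackbox fuzzer merely samples inputs fast enough, its expected
# time-to-bug can beat a whitebox fuzzer that pays heavy per-input analysis.
p_bug        = 1e-6     # assumed probability a random input triggers the bug
blackbox_rps = 10_000   # assumed random inputs executed per second
whitebox_spi = 10.0     # assumed seconds of symbolic analysis per input
whitebox_n   = 1_000    # assumed inputs the whitebox fuzzer needs

t_blackbox = 1 / (p_bug * blackbox_rps)  # expected seconds (geometric): 100
t_whitebox = whitebox_spi * whitebox_n   # seconds of analysis: 10,000
print(f"blackbox: {t_blackbox:.0f}s, whitebox: {t_whitebox:.0f}s")
```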
Kostya's keynote: libFuzzer hasn't found new bugs in <big software company>'s library. We didn't know why. Later we got a note that they now use libFuzzer during regression testing in CI and that it has prevented 3 vulns from reaching production.
In Chrome, libFuzzer found 4k bugs and 800 vulns. In OSS-Fuzz, libFuzzer found 2.4k bugs (AFL found 500 bugs) over the last three years.