Marcel Böhme (@mboehme_), 10 tweets, 4 min read

[#Fuzzing Evaluation] How do we know which fuzzer finds the largest number of important bugs within a reasonable time in software that we care about?

A commentary on @gamozolabs' perspective.
(Verdict: Strong accept).

https://twitter.com/gamozolabs/status/1293156877564436480

YES! We need to present our plots on a log-x-scale. Why? mboehme.github.io/paper/FSE20.Em…
Two fuzzers. Both achieve the same coverage eventually. Yet, one performs really well at the beginning while the other performs really well in the long run. (What is a reasonable time budget? 🤔)
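To make the time-budget question concrete, here is a minimal Python sketch that ranks two fuzzers at log-spaced time budgets. The coverage numbers are hypothetical. They are made up only to mimic the situation described above: one fuzzer is strong early, the other is strong late, and both end at the same coverage. Coverage is treated as a step function between samples.

```python
import bisect

def coverage_at(samples, t):
    """Coverage reached by time t, given (time, coverage) samples sorted by time.

    Coverage is treated as a step function: the last sample at or before t counts.
    """
    times = [time for time, _ in samples]
    i = bisect.bisect_right(times, t)
    return samples[i - 1][1] if i > 0 else 0

def winner_per_budget(a, b, budgets):
    """Map each time budget to the fuzzer with more coverage ("A", "B", or "tie")."""
    result = {}
    for t in budgets:
        ca, cb = coverage_at(a, t), coverage_at(b, t)
        result[t] = "A" if ca > cb else "B" if cb > ca else "tie"
    return result

# Hypothetical (seconds, branches covered) samples: A is fast early, B catches up late.
fuzzer_a = [(1, 100), (10, 400), (100, 600), (1000, 700), (10000, 800), (100000, 900)]
fuzzer_b = [(1, 20), (10, 150), (100, 450), (1000, 750), (10000, 880), (100000, 900)]

# Log-spaced budgets: 10 s, 100 s, ..., 100,000 s (about 28 hours).
budgets = [10 ** k for k in range(1, 6)]
print(winner_per_budget(fuzzer_a, fuzzer_b, budgets))
```

With these made-up numbers, A wins at 10 s and 100 s, B wins at 1,000 s and 10,000 s, and the two tie at 100,000 s. The verdict depends entirely on the chosen budget.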

Nice! I agree, comparing *time-to-same-coverage* provides more information about fuzzer efficiency than comparing coverage-at-a-given-time.
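Here is a minimal sketch of time-to-same-coverage. It assumes each fuzzer's coverage is logged as (seconds, branches) pairs sorted by time. The function name and the data are hypothetical and only illustrate the comparison.

```python
def time_to_coverage(samples, target):
    """First time at which coverage reaches `target`, or None if it never does.

    samples: (time, coverage) pairs sorted by time.
    """
    for t, cov in samples:
        if cov >= target:
            return t
    return None

# Hypothetical (seconds, branches covered) samples for two fuzzers.
fuzzer_a = [(1, 100), (10, 400), (100, 600), (1000, 700), (10000, 800), (100000, 900)]
fuzzer_b = [(1, 20), (10, 150), (100, 450), (1000, 750), (10000, 880), (100000, 900)]

for target in (600, 880):
    t_a = time_to_coverage(fuzzer_a, target)
    t_b = time_to_coverage(fuzzer_b, target)
    print(f"{target} branches: A at {t_a} s, B at {t_b} s")
```

For 600 branches, A needs 100 s and B needs 1,000 s. For 880 branches, A needs 100,000 s and B needs 10,000 s. Comparing coverage only at 100,000 s would report a tie.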

On a log-x-scale, you might be able to see convergence. You can then confidently extrapolate by one or two orders of magnitude. Running the campaign for 10 or 100 days instead of 1 day is not practical, though.
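Below is a minimal sketch of such an extrapolation. It assumes that coverage grows roughly linearly in log(time), so that the curve looks like a straight line on a log-x-scale. The fit is ordinary least squares on log10(time). The model, the function names, and the data are illustrative assumptions, not a method from the thread.

```python
import math

def fit_log_linear(samples):
    """Ordinary least-squares fit of coverage = a + b * log10(time); returns (a, b)."""
    xs = [math.log10(t) for t, _ in samples]
    ys = [cov for _, cov in samples]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def extrapolate(a, b, t):
    """Predicted coverage at time t under the fitted log-linear model."""
    return a + b * math.log10(t)

# Hypothetical campaign that ran for 10,000 s (about 2.8 hours).
observed = [(10, 400), (100, 500), (1000, 600), (10000, 700)]
a, b = fit_log_linear(observed)
for t in (10 ** 5, 10 ** 6):  # one and two orders of magnitude beyond the campaign
    print(f"t = {t} s: ~{extrapolate(a, b, t):.0f} branches")
```

Here the campaign ends at 10,000 s. The fit predicts about 800 branches at 100,000 s and about 900 at 1,000,000 s, which stays within the one or two orders of magnitude mentioned above.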

In papers, we should prefer coverage plots over tables.

Excellent point! This is part of the answer to "How do we compare techniques instead of their implementations?"

(More on that in my talk at #FuzzConEurope2020).

I agree: scalability is an important property of a fuzzer, but it is "orthogonal" to other properties. E.g., if I work on a power schedule, I could compare my schedule against the baseline on a single core and assume perfect scaling. Scaling is an independent research question.
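Under the assumption of perfect (linear) scaling, a single-core time-to-coverage result can be projected to n cores by dividing by n. A measured multi-core run can then be scored against that ideal. Here is a minimal sketch; the function names and numbers are hypothetical.

```python
def ideal_time(single_core_time, cores):
    """Projected time to reach a coverage target on `cores` cores, assuming perfect linear scaling."""
    return single_core_time / cores

def scaling_efficiency(single_core_time, measured_time, cores):
    """Ideal time divided by measured time: 1.0 is perfect linear scaling, below 1.0 is sublinear."""
    return ideal_time(single_core_time, cores) / measured_time

# Hypothetical measurements for one coverage target.
single_core = 3600.0  # seconds on 1 core
eight_cores = 600.0   # seconds measured on 8 cores
print(ideal_time(single_core, 8))                       # 450.0
print(scaling_efficiency(single_core, eight_cores, 8))  # 0.75
```

In this made-up example, a target reached in 3,600 s on one core is reached in 600 s on 8 cores. The ideal would be 450 s, so the scaling efficiency is 0.75.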

I fully agree that we should study all kinds of fuzzer properties. The problem with subject-specific fuzzers is that you can't just throw them at the next random subject. It's like trying to build a custom car for everyone who needs a car. It would just not be scalable or practical.

In summary, go read @gamozolabs' blog post!

The FuzzBench team has been doing an awesome job! For them, there are some nice feature requests (e.g., use a log-x-scale by default, report coverage over fuzz cases, evaluate scalability). For us, fuzzer evaluation is an open challenge.
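Reporting coverage over fuzz cases (executions) instead of wall-clock time separates a fuzzer's throughput from the quality of the inputs it generates. Here is a minimal sketch. It assumes a hypothetical log format of (elapsed seconds, total executions, coverage) records, and the fuzzer data are invented for illustration.

```python
def coverage_at(records, budget, axis):
    """Coverage reached within `budget`, measured along `axis`.

    records: (elapsed_seconds, total_execs, coverage) tuples, sorted by time.
    axis: 0 measures the budget in wall-clock seconds, 1 in fuzz cases (executions).
    """
    cov = 0
    for record in records:
        if record[axis] > budget:
            break
        cov = record[2]
    return cov

# Hypothetical logs: "fast" runs ~1,000 execs/s, "slow" ~100 execs/s.
fast = [(60, 60_000, 500), (600, 600_000, 650)]
slow = [(60, 6_000, 400), (600, 60_000, 560)]

print("at 600 s:", coverage_at(fast, 600, 0), coverage_at(slow, 600, 0))
print("at 60,000 execs:", coverage_at(fast, 60_000, 1), coverage_at(slow, 60_000, 1))
```

The fast fuzzer wins per unit of time, with 650 vs. 560 branches at 600 s. The slow fuzzer wins per fuzz case, with 560 vs. 500 branches at 60,000 executions.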

libFuzzer, AFL++, and Honggfuzz went through major performance improvements, enabled by FuzzBench.

(From the FuzzBench team via github.com/google/fuzzben…)
