A commentary on @gamozolabs' perspective.
(Verdict: Strong accept).
Two fuzzers. Both eventually reach the same coverage. Yet one performs much better early on, while the other pulls ahead in the long run. (What is a reasonable time budget? 🤔)
(More on that in my talk at #FuzzConEurope2020).
The FuzzBench team has been doing an awesome job! For them, there are a few nice feature requests (e.g., use a log-scaled x-axis by default, report coverage over fuzz cases, evaluate scalability). For us, fuzzer evaluation remains an open challenge.