I asked #AcademicChatter about incentives & processes behind paper machines (i.e., researchers publishing top-venue papers at unusually high rates).
This is what I learned 🧵
TL;DR: Any incentive emerges from our community values. It is not "them" who needs to change. It is us.
It was tremendously exciting to get perspectives from so many junior and senior researchers across different disciplines. This was only a random curiosity of mine, but it seemed to hit a nerve. I loved the positive, constructive tone in the thread.
Let's get started.
2/12
Some of you raised serious concerns about academic misconduct. However, to keep the discussion constructive, let's assume researcher integrity. We'll explore alternative explanations and processes below.
3/12
Everyone has their own aspirations and goals. Most of you focus on your own growth, try to make a positive impact & benefit society. We should resist metrics-induced pressure and stop glorifying paper machines. We won't be remembered for doing well in a metric.
4/12
Why paper machines? We evaluate academic performance based on things we can count (h-index) while research significance is clearly uncountable. Papers become products, an end in themselves rather than a means. Our focus on metrics stalls scientific progress.
5/12 en.wikipedia.org/wiki/Goodhart%…
Whatever incentives exist, they emerge from our own community values.
On grant proposals. On recruitment and promotion committees. When inducting fellows into societies. When granting awards.
6/12
If you find this focus on metrics awful, write recommendation letters and assessor reports with a *qualitative* assessment of research significance. When you serve on recruiting, promotion, or grant committees, read the candidate's papers and evaluate the contents carefully.
7/12
In terms of processes, the most cited enabler of paper machines was a large network or a deep hierarchy. Indeed, the profile we discussed had 250+ co-authors only 6 years post-PhD. An open question is: how do well-networked individuals handle reviewing conflicts?
8/12
In a network, the researcher could be a member of several working groups or of an association of collaborators. Or the researcher might work in a big consortium where authors from different institutions need to appear on every paper to represent the consortium (e.g., CERN).
8.1/12
In a hierarchy, the researcher is in a position of power. This could be a professor managing a large group of PostDocs and PhD students who actually do the work. Or this could be someone holding a lot of funding, like the Principal Investigator in a large project.
8.2/12
The researcher might work in an area with a higher maximum reasonable publication rate. Some research topics are simply better suited than others for rapid publication (e.g., when an abundance of data is efficiently accessible, or when an existing tool gives a competitive advantage).
9/12
The researcher might choose to artificially inflate the quantity of publications using salami slicing, where a meaningful paper is split into several "least-publishable units". en.wikipedia.org/wiki/Least_pub…
10/12
The researcher might be extremely obsessed with research and publishing, and simply work much harder and longer than anyone else. This could also make them better writers, which increases the probability of getting their papers accepted.
11/12
Finally, imo, we should not police each other and decide what a reasonable publication rate is (not too much & not too little). Instead, we should facilitate, support, and enable each other to become better researchers. Here is Parnas' '07 call to action:
Stop the numbers game!
The Devil's Guide by @AndreasZeller is a satirical perspective on how members of our research community are systematically cheating to maximize their publication rate. Let's build a community that doesn't need the Devil's Guide.
The Cherry Shuffle would be identified as a problem in the experimental methodology, assuming peer review is effective. Any comparison of a random algorithm to a baseline requires a statistical assessment of significance and effect size. (3/11)
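As a concrete illustration (not from any specific paper; the coverage numbers below are invented), here is a minimal Python sketch of such an assessment: a Mann-Whitney U test for statistical significance plus the Vargha-Delaney A12 effect size over repeated runs of two fuzzers.

```python
# Minimal sketch (invented data): significance + effect size when
# comparing a randomized fuzzer A against a baseline B over repeated runs.
from scipy.stats import mannwhitneyu

coverage_a = [1200, 1310, 1280, 1255, 1190, 1340, 1295, 1270]  # per-run branch coverage
coverage_b = [1100, 1150, 1090, 1175, 1120, 1160, 1105, 1140]

# Mann-Whitney U test: is the difference between A and B statistically significant?
_, p_value = mannwhitneyu(coverage_a, coverage_b, alternative="two-sided")

def a12(xs, ys):
    """Vargha-Delaney A12: probability that a random run of A beats a random run of B."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)
    return wins / (len(xs) * len(ys))

print(f"p-value = {p_value:.4f}, A12 = {a12(coverage_a, coverage_b):.2f}")
```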
YES! We need to present our plots on a log-x-scale. Why? mboehme.github.io/paper/FSE20.Em…
Two fuzzers. Both achieve the same coverage eventually. Yet, one performs really well at the beginning while the other performs really well in the long run. (What is a reasonable time budget? 🤔)
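For illustration, a small Python/matplotlib sketch with two synthetic coverage curves (the data and curve shapes are made up): both reach roughly the same coverage eventually, but only a log-x-scale makes the early-phase difference and the long-run convergence visible in one plot.

```python
# Illustration with synthetic data: two fuzzers that reach about the same
# coverage eventually, plotted on a log-x-scale.
import numpy as np
import matplotlib.pyplot as plt

t = np.logspace(0, 5, 200)                    # 1 second .. ~28 hours
fast_start = 1000 * (1 - np.exp(-t / 50))     # strong early, flattens out
slow_burner = 1000 * t / (t + 5000)           # weak early, catches up late

plt.plot(t, fast_start, label="fuzzer A (fast start)")
plt.plot(t, slow_burner, label="fuzzer B (slow burner)")
plt.xscale("log")                             # log-x-scale shows both regimes
plt.xlabel("time (seconds, log scale)")
plt.ylabel("branch coverage")
plt.legend()
plt.show()
```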
Nice! I agree, comparing *time-to-same-coverage* provides more information about fuzzer efficiency than comparing coverage-at-a-given-time.
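A sketch of the idea, using hypothetical (time, coverage) traces: instead of comparing coverage at a fixed time budget, compare how long each fuzzer takes to reach the same target coverage.

```python
# Hypothetical traces: (timestamp in seconds, coverage reached so far).
trace_a = [(60, 400), (600, 920), (3600, 980), (86400, 1000)]
trace_b = [(60, 100), (600, 500), (3600, 900), (86400, 1000)]

def time_to_coverage(trace, target):
    """First timestamp at which `target` coverage is reached, or None."""
    for timestamp, coverage in trace:
        if coverage >= target:
            return timestamp
    return None  # target not reached within the campaign

target = 900  # coverage that both fuzzers eventually achieve
print("fuzzer A:", time_to_coverage(trace_a, target), "s")   # 600 s
print("fuzzer B:", time_to_coverage(trace_b, target), "s")   # 3600 s
```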
For my new followers, my research group is interested in techniques that make machines attack other machines with maximal efficiency. All our tools are open-source, so people can use them to identify security bugs before they are exploited.
This is how it all started.
My first technical paper introduced a technique that could, in principle, *prove* that no bug was introduced by a new code commit [ICSE'13]. This was also the first of several symbolic execution-based whitebox fuzzers [FSE'13, ASE'16, ICSE'20].
Yet, something was amiss. Even a simple random input generator could outperform my most effective whitebox fuzzer if it generated inputs fast enough. To understand why, we modelled fuzzing as a sampling process and proved some bounds [FSE'14, TSE'15].
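This is not the model from [FSE'14, TSE'15], but a toy geometric-model calculation (all numbers invented) shows why generation speed matters: a "dumb" generator with a tiny per-input chance of hitting a bug can still beat a "smart" but slow one.

```python
# Toy comparison (invented numbers, not the published model): expected time
# until the first bug-triggering input under a geometric model.
def expected_time_to_bug(prob_per_input, seconds_per_input):
    return seconds_per_input / prob_per_input

# Random fuzzer: 10,000x cheaper per input, 1,000x less likely to hit the bug.
t_random   = expected_time_to_bug(prob_per_input=1e-6, seconds_per_input=1e-4)
t_whitebox = expected_time_to_bug(prob_per_input=1e-3, seconds_per_input=1.0)

print(f"random fuzzer:   ~{t_random:,.0f} s")    # ~100 s
print(f"whitebox fuzzer: ~{t_whitebox:,.0f} s")  # ~1,000 s
```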
Kostya's keynote: libFuzzer hasn't found new bugs in <big software company>'s library. We didn't know why. Later we got a note that they are now using libFuzzer during regression testing in CI and that it prevented 3 vulns from reaching production.
In Chrome, libFuzzer found 4k bugs and 800 vulns. In OSS-Fuzz, libFuzzer found 2.4k bugs (AFL found 500 bugs) over the last three years.