@nlpnoah: NLP research and practice ask fundamentally different questions
/1
@nlpnoah: NLP practice asks whether X improves the outcome. NLP research tries to fill in the gaps in the knowledge map.
/2
@nlpnoah: Leaderboards are the dominant frame for presenting research findings. That frame by its very nature puts the winner at the top and pushes everything else out of focus:
/3
@nlpnoah: let's admit to ourselves that "sota" is a verb. and it should be lowercased.
/4
@nlpnoah: depressingly many parallels between leaderboard-driven research and competitive sports
/5
@nlpnoah: The very notion of "negative results" presupposes that very same sports-like frame! Useful as leaderboards are, they are not the only thing we need.
(with follow-up comment from Margot Mieskes: the Insights workshop should be renamed!)
/6
@nlpnoah: here are some alternative frames that might be useful in NLP research.
/7
@nlpnoah: a bonus from focusing on gaining knowledge and not on sota-ing is improved mental health. If you're trying to answer a question, whatever answer you get is a result.
/8
Question: in an exploding field, isn't the simple leaderboard frame partly a coping mechanism for authors trying to reach a broader audience? @nlpnoah: the NLP community is not all that homogeneous. Let's be brave; non-mainstream papers may find a wider audience than we think.
/9
Question: one reason for leaderboards is to enable people to easily compare with prior work. How about we just publish multiple metrics, for maximal reach to future work? @nlpnoah: Might work. We could also release raw system outputs.
/10
#EMNLP2021 ends, but Insights from Negative Results is coming tomorrow! The workshop is hybrid: virtual posters, talks by/for a mix of on-site & online speakers & attendees. Hosts: @JoaoSedoc @shabnamt1 @arumshisky @annargrs
Really proud of the program this year🧵:
8:45 Opening remarks
9:00 🗣️ Invited talk by Bonnie Webber: The Reviewers & the Reviewed: Institutional Memory & Institutional Incentives
A highlight from the fascinating #EMNLP2021 keynote by @StevenBird:
NLP often comes with a set of assumptions about the needs of communities with low-resource languages. But we need to learn what they *actually* need; they may have a completely different epistemology.
/1
AR: this is such a thought-provoking talk, pointing at the missing bridges between language tech and the social sciences, esp. anthropology. As a computational linguist lucky enough to spend a year at @CPH_SODAS, I still don't think I even see the depth of everything we're missing.
/2
An audience question (@bonadossou from @MasakhaneNLP?): how do we increase the volume of NLP research on low-resource languages when such work is not as incentivized? @StevenBird: keep submitting. I've had many rejections. The ACL2022 theme track will be language diversity.
/3
It can't even work, since peer review is only reliable for the clearly bad papers. Decisions on borderline papers are as good as random. This won't "raise the bar"; it will only reinforce the AC/SAC preferences, and likely improve the chances of preprinted papers by famous people.
/2
TLDR: In its current form, peer review is a poorly defined task with apples-to-oranges comparisons and unrealistic expectations. /1
Reviewers resort to heuristics such as reject-if-not-SOTA to cope with uncertainty, so the only way to change that is to reduce the uncertainty. That is at least partly doable: better paper-reviewer matching, unambiguous evaluation criteria, fine-grained tracks, better review forms, etc. /2
Which criteria and forms, exactly? Each field has to find out for itself, through iterative development and experiments. Except that in NLP such work would be hard to publish, so there are no incentives to do it, and no mechanisms to test and compare any solutions. /3
TLDR for those who missed the prior discussion: non-anonymous preprints systematically disadvantage the unknown labs and/or underrepresented communities.
My previous post: hackingsemantics.xyz/2020/anonymity/ /1
To summarize both posts, the unknown/underrepresented authors face the following trade-off:
* anonymous preprints: better acceptance chances;
* arXiv: lower acceptance chances, but more opportunities to promote unpublished work and get invited for talks and interviews.
/3