The fact that there's been total silence from the @aclmeeting channel about the wave of technical issues that has derailed much of the conference is pretty bizarre. Especially when there's no other working channel for technical support or rapid updates.
For anyone tempted to boycott future ACL events, be reassured (?) that this event is put together almost from scratch, by an almost-entirely-new group of volunteer researchers, every year, for some reason.
So, (i) I don't envy the Virtual Infrastructure Committee right now—they're presumably just figuring this out themselves—and (ii) they won't have this job next time.
Can we please hire some professionals next time (beyond our one permanent staffer, whose job doesn't include this kind of event planning)? Or at least have longer terms of service for these roles?
Neat negative result spotted at #ACL2021:
I've seen a number of efforts that try to use MNLI models to do other classification tasks by checking whether the input entails statements like 'this is a negative review'. (1/...)
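For concreteness, the recipe these efforts use looks roughly like the sketch below. It assumes the Hugging Face `transformers` zero-shot-classification pipeline and an off-the-shelf MNLI checkpoint; the model name and labels are illustrative, not drawn from any particular paper.

```python
# Minimal sketch of the zero-shot-via-NLI recipe described above.
# Assumption: an off-the-shelf MNLI checkpoint (facebook/bart-large-mnli
# here, chosen for illustration) used through the zero-shot pipeline.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
)

review = "The plot was predictable and the acting was wooden."
result = classifier(
    review,
    candidate_labels=["negative", "positive"],
    # Each label is slotted into a hypothesis; the input text is
    # treated as the premise, so the model is asked whether the review
    # entails "This is a negative review." / "This is a positive review."
    hypothesis_template="This is a {} review.",
)
print(result["labels"][0], result["scores"][0])
```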
This never really made sense. The data collection process behind SNLI/MNLI was meant to capture the relationship between two things that the same speaker could have said in the same situation.
That means strings like 'the text' or 'the premise' or 'the author' are rare in MNLI, and when they appear, they refer to something _that was referred to in the premise_, not to the premise itself or to its author.
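If you want to check that rarity claim yourself, a rough sketch (assuming the `multi_nli` dataset on the Hugging Face Hub) is below. It only counts surface matches in the hypotheses; deciding what those strings actually refer to still requires reading the examples.

```python
# Rough check of how often strings like "the text", "the premise", or
# "the author" appear in MNLI hypotheses. Assumption: the multi_nli
# dataset as hosted on the Hugging Face Hub.
from datasets import load_dataset

mnli = load_dataset("multi_nli", split="train")
targets = ["the text", "the premise", "the author"]

counts = {t: 0 for t in targets}
for example in mnli:
    hyp = example["hypothesis"].lower()
    for t in targets:
        if t in hyp:
            counts[t] += 1

print(counts, "out of", len(mnli), "hypotheses")
```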
In my experience with *ACL events, reviewer and AC expectations don't differ in any significant or predictable way across tracks. (Plus, many other AI/ML conferences don't use tracks, and it doesn't seem like the dynamics at these conferences are meaningfully different.)
So, adding/removing/renaming tracks doesn't, on its own, seem likely to make any predictable change in outcomes.
We need more of this ethically serious, academically careful discussion of what we're doing.
(I'm on record arguing against model-in-the-loop data collection for test sets, which Chris mentions, and I still think it's incompatible with the goals stated here, but the Dyna*Board*-style leaderboard design that Chris focuses on more directly in the talk is important/exciting.)
I've been thinking a lot lately about what we can do to keep pushing progress on language understanding once we start to reach the scaling limits of self-supervised pretraining... (🚨 new paper, thread 🚨)
Grounding and embodiment are obviously one promising direction, but there's a lot that will be difficult or impossible to learn that way under anything that resembles current technology.
How about we just create or use *annotated* data to teach our models the skills they aren't already learning well through pretraining? We already know that this works, but we don't know much about when or why...