Neat negative result spotted at #ACL2021:
I've seen a number of efforts that try to use MNLI models to do other classification tasks by checking whether the input entails statements like 'this is a negative review'. (1/...)
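For concreteness, here's roughly what that recipe looks like with an off-the-shelf MNLI model. This is just my own minimal sketch using the Hugging Face zero-shot-classification pipeline; the model choice, example text, and labels are only for illustration:

```python
# Sketch of the zero-shot-via-NLI recipe under discussion: the input text is
# treated as the premise, and each candidate label is turned into a hypothesis
# like "This is a negative review." via the hypothesis template.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The plot was predictable and the acting was wooden.",
    candidate_labels=["positive", "negative"],
    hypothesis_template="This is a {} review.",
)
print(result["labels"][0], result["scores"][0])  # highest-scoring label
```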
This never really made sense. The data collection process behind SNLI/MNLI was meant to capture the relationship between two things that the same speaker could have said in the same situation.
That means strings like 'the text' or 'the premise' or 'the author' are rare in MNLI, and when they appear, they refer to something _that was referred to in the premise_, not to the premise itself or to its author.
So, examples like the one in the screenshot seem broken: that kind of hypothesis doesn't make sense in the context of most premises, and we should expect models to be confused and to behave erratically.
And they do! Tingting Ma et al. from MSR find that when using this kind of prompting, you'll actually do _better_ if you use a next-sentence-prediction model rather than an MNLI model, ...
... suggesting that the limited successes we've seen here have to do with a vague ability for BERT-style models to recognize topic/style similarity, and that the specific abilities that a model learns from MNLI aren't really helping. The world makes sense!
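To make the comparison concrete, here's a rough sketch of the alternative the thread describes: scoring each label prompt with BERT's next-sentence-prediction head instead of an MNLI entailment head. This is my own illustration of the general idea, not the authors' exact setup:

```python
# Score each candidate prompt by how plausibly it "follows" the input text,
# using the pretrained next-sentence-prediction head (no MNLI fine-tuning).
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

text = "The plot was predictable and the acting was wooden."
prompts = {"positive": "This is a positive review.",
           "negative": "This is a negative review."}

scores = {}
for label, prompt in prompts.items():
    inputs = tokenizer(text, prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # index 0 = "sentence B plausibly continues sentence A"
    scores[label] = torch.softmax(logits, dim=-1)[0, 0].item()

print(max(scores, key=scores.get), scores)
```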
The total silence from @aclmeeting about the wave of technical issues that has derailed much of the conference is pretty bizarre, especially when there's no other working channel for technical support or rapid updates.
For anyone tempted to boycott future ACL events, be reassured (?) that this event is put together almost from scratch by an almost-entirely-new group of volunteer researchers every year for some reason.
So, (i) I don't envy the Virtual Infrastructure Committee right now—they're presumably just figuring this out themselves—and (ii) they won't have this job next time.
In my experience with *ACL events, reviewer and AC expectations don't differ in any significant or predictable way across tracks. (Plus, many other AI/ML conferences don't use tracks, and it doesn't seem like the dynamics at these conferences are meaningfully different.)
So, adding/removing/renaming tracks doesn't, on its own, seem likely to make any predictable change in outcomes.
We need more of this ethically serious, academically careful discussion of what we're doing.
(I'm on record arguing against model-in-the-loop data collection for test sets, which Chris mentions, and I still think it's incompatible with the goals stated here, but the Dyna*Board*-style leaderboard design that Chris focuses on more directly in the talk is important/exciting.)
I've been thinking a lot lately about what we can do to keep pushing progress on language understanding once we start to reach the scaling limits of self-supervised pretraining... (🚨 new paper, thread 🚨)
Grounding and embodiment are obviously one promising direction, but there's a lot that will be difficult or impossible to learn that way under anything that resembles current technology.
How about we just create or use *annotated* data to teach our models the skills they aren't already learning well through pretraining? We already know that this works, but we don't know much about when or why...