Anna Ivanova · Dec 6 · 13 tweets · 5 min read
My co-lead @KaufCarina and I present: an in-depth investigation of event plausibility judgments in language models.

A 🧵 1/

arxiv.org/abs/2212.01488
Knowledge of event schemas is a vital component of world knowledge. How much of it can be acquired from text corpora via the word-in-context prediction objective?

2/
We test this question using a controlled minimal-pairs paradigm with simple event descriptions. We manipulate plausibility by swapping the agent and the patient (The teacher bought the laptop / The laptop bought the teacher) or by changing the patient (The actor won the award/battle). 3/
LLMs are almost perfect when assigning likelihood to possible vs. impossible events (The teacher bought the laptop / The laptop bought the teacher) but aren’t as good when it comes to likely vs. unlikely events (The nanny tutored the boy / The boy tutored the nanny).

4/
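(Not from the thread, but here is a minimal sketch of the kind of minimal-pairs comparison described above, assuming a causal LM scored via HuggingFace transformers. The choice of GPT-2 and sum-of-token-log-probabilities scoring is an illustrative assumption; the paper's models and scoring details may differ.)

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Illustrative model choice; the paper evaluates its own set of LLMs.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability of a sentence under the LM."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # labels=ids gives the mean cross-entropy over the (len - 1) predicted tokens
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

plausible = "The teacher bought the laptop."
implausible = "The laptop bought the teacher."
# The minimal-pair test: does the LM assign higher likelihood to the plausible member?
print(sentence_logprob(plausible) > sentence_logprob(implausible))
```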
In follow-up tests, we show that
- LLM scores depend on both plausibility and surface-level factors like word frequency (meaning that the score distributions for plausible and implausible sentences overlap heavily)

5/
- LLMs generalize very well between active and passive versions of the same sentence BUT not as well as humans for synonymous sentences (The teacher bought the laptop / The instructor purchased the computer).

6/
- explicit plausibility information emerges in the middle LLM layers and then stays high
- as in the behavioral tests, animate-inanimate (AI; impossible) events are easier than animate-animate non-reversible (AAN; unlikely) events

7/
Minor but fun:
- a probe trained on both active and passive voice sentences is as successful as a within-voice probe (but a probe trained on only one voice type fails to generalize, especially in early layers)

8/
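(Again not from the thread: a toy sketch of the probing idea in tweets 7-8, assuming mean-pooled hidden states from GPT-2 and a logistic-regression probe trained on one voice and tested on the other. The model, layer index, pooling, and example sentences are all assumptions for illustration; the paper's probing setup will differ.)

```python
import torch
from transformers import GPT2Model, GPT2TokenizerFast
from sklearn.linear_model import LogisticRegression

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def layer_embedding(sentence: str, layer: int):
    """Mean-pooled hidden state of one layer for one sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[layer]  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Toy stimuli only; the study uses controlled minimal-pair stimulus sets.
active = [("The teacher bought the laptop.", 1),
          ("The laptop bought the teacher.", 0),
          ("The nanny tutored the boy.", 1),
          ("The boy tutored the nanny.", 0)]
passive = [("The laptop was bought by the teacher.", 1),
           ("The teacher was bought by the laptop.", 0)]

layer = 6  # an arbitrary middle layer, where plausibility information reportedly emerges
X_train = [layer_embedding(s, layer) for s, _ in active]
y_train = [y for _, y in active]
X_test = [layer_embedding(s, layer) for s, _ in passive]
y_test = [y for _, y in passive]

# Train on active-voice sentences, test on passive-voice ones:
# the cross-voice gap is what the within- vs. cross-voice probes measure.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(probe.score(X_test, y_test))
```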
Check out the paper for an interpretation of these results, including a discussion of selectional restrictions, reporter bias, and more!

9/
This is an international collaboration brought together by @ev_fedorenko and @AlexLenci1966, with vital contributions from @g_rambelli and @EmmanueleChers1 (and our undergrads Selena She & Zawad Chowdhury).

10/
It's been a crazy run. We started in early 2020, had Zoom (or Skype?) calls during the 2020 COVID lockdowns, and coordinated many meetings between Boston, Italy, Hong Kong, and sometimes Germany and Russia. Glad the project has finally come to fruition!

11/end
Bonus: a preliminary exploration of ChatGPT responses shows that it might also have an impossible-implausible gap (although a more detailed investigation is of course needed).

12/11
This is not an #EMNLP2022 submission but tagging this crowd anyways :D

arxiv.org/abs/2212.01488
