Discover and read the best of Twitter Threads about #acl2022nlp

Most recent (4)

Excited to share RankGen, a 1.2B param contrastive encoder mapping prefixes & model generations to vectors.

✅ large improvements over nucleus/typical sampling
✅ score & rank generations from any LM
✅ human eval with writers
✅ HuggingFace ckpts, code👇
arxiv.org/abs/2205.09726
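As a concrete illustration of the "score & rank generations from any LM" use case, here is a minimal reranking sketch. It is not the released RankGen API: the encoder functions below are hypothetical placeholders standing in for the HuggingFace checkpoints, and only the dot-product ranking step follows the description in the thread.

```python
# Minimal sketch of RankGen-style reranking (not the official API).
# `encode_prefix` / `encode_suffix` are hypothetical placeholders for the
# released encoder; they just return deterministic unit vectors.
import torch

def encode_prefix(text: str, dim: int = 768) -> torch.Tensor:
    # Placeholder: the real model maps a prefix to a dense vector.
    torch.manual_seed(hash(text) % (2**31))
    v = torch.randn(dim)
    return v / v.norm()

def encode_suffix(text: str, dim: int = 768) -> torch.Tensor:
    # Placeholder: the real model maps a candidate continuation to a vector.
    torch.manual_seed(hash("sfx" + text) % (2**31))
    v = torch.randn(dim)
    return v / v.norm()

def rank_generations(prefix: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Score each candidate by dot product with the prefix embedding and
    return candidates sorted from best to worst."""
    p = encode_prefix(prefix)
    scored = [(c, float(p @ encode_suffix(c))) for c in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    prefix = "The hikers reached the summit just before sunset,"
    candidates = [
        " and watched the valley turn gold below them.",
        " because bananas are the best source of potassium.",
    ]
    for cand, score in rank_generations(prefix, candidates):
        print(f"{score:+.3f}  {cand}")
```

In practice the placeholder encoders would be swapped for the released checkpoints; the ranking step itself is just a dot product between the prefix vector and each candidate vector.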
Despite great progress, text generation still falls short: even large LMs produce text with hallucinations, poor continuity, etc.

Part of the issue is that LMs are trained to predict only the next token given the ground-truth prefix, encouraging reliance on local context.
To tackle this we build RankGen, which maps prefixes close to their gold continuation, but away from other continuations in the same document, as well as from generations produced by a large LM.

We train RankGen using large-scale contrastive learning with a minibatch size of 3K.
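A minimal sketch of the kind of in-batch contrastive objective described above (gold continuations pulled toward their prefix, all other continuations in the batch acting as negatives). This is an InfoNCE-style stand-in, not the authors' training code; the batch contents and the absence of a temperature term are simplifying assumptions.

```python
# In-batch contrastive loss sketch for prefix/continuation embeddings.
import torch
import torch.nn.functional as F

def contrastive_loss(prefix_vecs: torch.Tensor, suffix_vecs: torch.Tensor) -> torch.Tensor:
    """prefix_vecs, suffix_vecs: (batch, dim) embeddings where row i of
    suffix_vecs is the gold continuation for row i of prefix_vecs.
    Every other row in the batch (other continuations from the same or
    different documents, or model generations) acts as a negative."""
    prefix_vecs = F.normalize(prefix_vecs, dim=-1)
    suffix_vecs = F.normalize(suffix_vecs, dim=-1)
    logits = prefix_vecs @ suffix_vecs.T          # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0))        # gold continuation sits on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage with random embeddings standing in for encoder outputs.
batch, dim = 8, 768
loss = contrastive_loss(torch.randn(batch, dim), torch.randn(batch, dim))
print(float(loss))
```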
🌸 The @BigScienceLLM BLOOM 176B-parameter model training has just passed 230B tokens: that’s more than a million books in two months!

🤔 But how did we decide what model to train with our one million GPU hours?

⬇️ Thread time! #acl2022nlp
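Rough sanity check of the "more than a million books" figure, using assumed per-book numbers that are not from the thread:

```python
# Back-of-the-envelope check of the "more than a million books" claim.
# The per-book numbers below are rough assumptions, not from the thread.
tokens_trained = 230e9          # tokens seen so far
words_per_book = 90_000         # assume a typical novel-length book
tokens_per_word = 1.3           # rough subword-to-word ratio
tokens_per_book = words_per_book * tokens_per_word
print(tokens_trained / tokens_per_book)   # ~2 million books, i.e. "more than a million"
```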
🏅 We had five main considerations: it needed to be proven, scalable, efficient, multilingual, and to exhibit emergent capabilities (e.g. zero-shot generalization)

⏰ At the >100B scale, every inefficiency matters! We can’t afford an unoptimized setup…
🤗 Thanks to a generous grant from @Genci_fr on #JeanZay, we had plenty of compute to benchmark our dream architecture.

📈 We ran our experiments with 1.3B models, pretraining on 100-300B tokens, to increase the likelihood our findings would transfer to the final >100B model.
How can agents infer what people want from what they say?

In our new paper at #acl2022nlp w/ @dan_fried, Dan Klein, and @ancadianadragan, we learn preferences from language by reasoning about how people communicate in context.

Paper: arxiv.org/abs/2204.02515
[1/n]
We’d like AI agents that not only follow our instructions (“book this flight”), but learn to generalize to what to do in new contexts (know what flights I prefer from our past interactions and book on my behalf) — i.e., learn *rewards* from language. [2/n]
The challenge is that language only reveals partial, context-dependent information about our goals and preferences (when I tell a flight booking agent I want “the jetblue flight,” I don’t mean I always want a jetblue flight — just in this particular case!). [3/n]
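A toy sketch of the general idea of inferring rewards from an utterance by reasoning about a speaker in context; it is not the paper's model. The flight features, the grid of candidate reward weights, and the crude speaker likelihood below are all hypothetical.

```python
# Toy Bayesian inference over reward weights given an utterance and a context.
import itertools

FEATURES = ["jetblue", "cheap", "nonstop"]

def utility(weights, flight):
    return sum(w * flight[f] for w, f in zip(weights, FEATURES))

def speaker_likelihood(mentioned_feature, weights, context, noise=0.05):
    """Crude speaker model: a speaker with these reward weights mentions a
    feature of their highest-utility flight in the current context."""
    best = max(context, key=lambda f: utility(weights, f))
    return 1.0 if best[mentioned_feature] else noise

def posterior_over_rewards(mentioned_feature, context):
    """Bayesian update over a small grid of candidate weight vectors,
    assuming a uniform prior."""
    candidates = list(itertools.product([0, 1], repeat=len(FEATURES)))
    scores = [speaker_likelihood(mentioned_feature, w, context) for w in candidates]
    z = sum(scores)
    return {w: s / z for w, s in zip(candidates, scores)}

# A context in which the only JetBlue flight also happens to be the only nonstop one.
context = [
    {"jetblue": 1, "cheap": 0, "nonstop": 1},
    {"jetblue": 0, "cheap": 1, "nonstop": 0},
]
posterior = posterior_over_rewards("jetblue", context)
for w, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(dict(zip(FEATURES, w)), round(p, 3))
```

In this context, hearing "the jetblue flight" leaves as much posterior mass on a reward that cares about nonstop flights as on one that cares about the airline, which is exactly the context-dependence described above.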
#acl2022nlp
What happens inside a multilingual neural cognate prediction model?
We show that predicting cognates between modern Romance languages latently teaches the model about their proto-forms, allowing reconstruction without fine-tuning encoders on the task!🧵
In layman's terms, learning to predict special words (cognates) between related languages (French, Italian, Spanish, Portuguese, Galician, Catalan, Occitan, Romanian, and Aromanian) gives the model 'intuitive' knowledge about their parent, Latin!
How? We have no idea! The model does latently learn a phonetic "language model" (similar phones appear close to one another across languages), but apparently not phonotactic information (sound order)?
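One illustrative way to probe the "similar phones appear close to one another across languages" claim, not the authors' analysis: compare embeddings of the same phone across languages. The embedding table below is a random placeholder for vectors that would be extracted from a trained cognate-prediction model.

```python
# Probe: average cross-lingual cosine similarity of "the same" phone.
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings keyed by (language, phone); in practice these would
# come from the model's input embedding table or encoder states.
phone_embedding = {
    (lang, phone): rng.normal(size=64)
    for lang in ["fr", "it", "es", "pt", "ro"]
    for phone in ["p", "b", "t", "d", "k", "g", "a", "e", "i", "o", "u"]
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cross_lingual_similarity(phone):
    """Average cosine similarity of one phone's embedding across language pairs."""
    langs = sorted({l for (l, p) in phone_embedding if p == phone})
    pairs = [(a, b) for i, a in enumerate(langs) for b in langs[i + 1:]]
    return np.mean([cosine(phone_embedding[(a, phone)], phone_embedding[(b, phone)])
                    for a, b in pairs])

for ph in ["p", "a", "k"]:
    print(ph, round(cross_lingual_similarity(ph), 3))
```

With these random placeholders the similarities hover near zero; the thread's finding predicts that embeddings taken from the actual model would show noticeably higher same-phone similarity across languages.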
