Discover and read the best of Twitter Threads about #acl2022nlp

Most recent (4)

Excited to share RankGen, a 1.2B param contrastive encoder mapping prefixes & model generations to vectors.

✅ large improvements over nucleus/typical sampling
✅ score & rank generations from any LM
✅ human eval with writers
✅ HuggingFace ckpts, code👇
arxiv.org/abs/2205.09726
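As a concrete illustration of the "score & rank generations from any LM" use case, here is a minimal reranking sketch. It is not the released RankGen API: the encoder functions below are hypothetical placeholders standing in for the HuggingFace checkpoints, and only the dot-product ranking step follows the description in the thread.

```python
# Minimal sketch of RankGen-style reranking (not the official API).
# `encode_prefix` / `encode_suffix` are hypothetical placeholders for the
# released encoder; they just return deterministic unit vectors.
import torch

def encode_prefix(text: str, dim: int = 768) -> torch.Tensor:
    # Placeholder: the real model maps a prefix to a dense vector.
    torch.manual_seed(hash(text) % (2**31))
    v = torch.randn(dim)
    return v / v.norm()

def encode_suffix(text: str, dim: int = 768) -> torch.Tensor:
    # Placeholder: the real model maps a candidate continuation to a vector.
    torch.manual_seed(hash("sfx" + text) % (2**31))
    v = torch.randn(dim)
    return v / v.norm()

def rank_generations(prefix: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Score each candidate by dot product with the prefix embedding and
    return candidates sorted from best to worst."""
    p = encode_prefix(prefix)
    scored = [(c, float(p @ encode_suffix(c))) for c in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    prefix = "The hikers reached the summit just before sunset,"
    candidates = [
        " and watched the valley turn gold below them.",
        " because bananas are the best source of potassium.",
    ]
    for cand, score in rank_generations(prefix, candidates):
        print(f"{score:+.3f}  {cand}")
```

In practice the placeholder encoders would be swapped for the released checkpoints; the ranking step itself is just a dot product between the prefix vector and each candidate vector.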
Despite great progress, text generation still falls short: even large LMs produce text with hallucinations, poor continuity, etc.

Part of the issue is that LMs are trained to predict only the next token given the ground-truth prefix, encouraging reliance on local context.
To tackle this we build RankGen, which maps prefixes close to their gold continuation, but away from other continuations in the same document, as well as from generations produced by a large LM.

We train RankGen using large-scale contrastive learning with a minibatch size of 3K.
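A minimal sketch of the kind of in-batch contrastive objective described above (gold continuations pulled toward their prefix, all other continuations in the batch acting as negatives). This is an InfoNCE-style stand-in, not the authors' training code; the batch contents and the absence of a temperature term are simplifying assumptions.

```python
# In-batch contrastive loss sketch for prefix/continuation embeddings.
import torch
import torch.nn.functional as F

def contrastive_loss(prefix_vecs: torch.Tensor, suffix_vecs: torch.Tensor) -> torch.Tensor:
    """prefix_vecs, suffix_vecs: (batch, dim) embeddings where row i of
    suffix_vecs is the gold continuation for row i of prefix_vecs.
    Every other row in the batch (other continuations from the same or
    different documents, or model generations) acts as a negative."""
    prefix_vecs = F.normalize(prefix_vecs, dim=-1)
    suffix_vecs = F.normalize(suffix_vecs, dim=-1)
    logits = prefix_vecs @ suffix_vecs.T          # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0))        # gold continuation sits on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage with random embeddings standing in for encoder outputs.
batch, dim = 8, 768
loss = contrastive_loss(torch.randn(batch, dim), torch.randn(batch, dim))
print(float(loss))
```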
🌸 The @BigScienceLLM BLOOM 176B-parameter model training has just passed 230B tokens: that’s more than a million books in two months!

🤔 But how did we decide what model to train with our one million GPU hours?

⬇️ Thread time! #acl2022nlp
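Rough sanity check of the "more than a million books" figure, using assumed per-book numbers that are not from the thread:

```python
# Back-of-the-envelope check of the "more than a million books" claim.
# The per-book numbers below are rough assumptions, not from the thread.
tokens_trained = 230e9          # tokens seen so far
words_per_book = 90_000         # assume a typical novel-length book
tokens_per_word = 1.3           # rough subword-to-word ratio
tokens_per_book = words_per_book * tokens_per_word
print(tokens_trained / tokens_per_book)   # ~2 million books, i.e. "more than a million"
```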
🏅 We had five main considerations: it needed to be proven, scalable, efficient, multilingual, and to exhibit emergent capabilities (e.g. zero-shot generalization)

⏰ At the >100B scale, every inefficiency matters! We can’t afford an unoptimized setup…
🤗 Thanks to a generous grant from @Genci_fr on #JeanZay, we had plenty of compute to benchmark our dream architecture.

📈 We ran our experiments with 1.3B models, pretraining on 100-300B tokens, to increase the likelihood our findings would transfer to the final >100B model.
How can agents infer what people want from what they say?

In our new paper at #acl2022nlp w/ @dan_fried, Dan Klein, and @ancadianadragan, we learn preferences from language by reasoning about how people communicate in context.

Paper: arxiv.org/abs/2204.02515
[1/n]
We’d like AI agents that not only follow our instructions (“book this flight”), but learn to generalize to what to do in new contexts (know what flights I prefer from our past interactions and book on my behalf) — i.e., learn *rewards* from language. [2/n]
The challenge is that language only reveals partial, context-dependent information about our goals and preferences (when I tell a flight booking agent I want “the jetblue flight,” I don’t mean I always want a jetblue flight — just in this particular case!). [3/n]
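A toy sketch of the general idea of inferring rewards from an utterance by reasoning about a speaker in context; it is not the paper's model. The flight features, the grid of candidate reward weights, and the crude speaker likelihood below are all hypothetical.

```python
# Toy Bayesian inference over reward weights given an utterance and a context.
import itertools

FEATURES = ["jetblue", "cheap", "nonstop"]

def utility(weights, flight):
    return sum(w * flight[f] for w, f in zip(weights, FEATURES))

def speaker_likelihood(mentioned_feature, weights, context, noise=0.05):
    """Crude speaker model: a speaker with these reward weights mentions a
    feature of their highest-utility flight in the current context."""
    best = max(context, key=lambda f: utility(weights, f))
    return 1.0 if best[mentioned_feature] else noise

def posterior_over_rewards(mentioned_feature, context):
    """Bayesian update over a small grid of candidate weight vectors,
    assuming a uniform prior."""
    candidates = list(itertools.product([0, 1], repeat=len(FEATURES)))
    scores = [speaker_likelihood(mentioned_feature, w, context) for w in candidates]
    z = sum(scores)
    return {w: s / z for w, s in zip(candidates, scores)}

# A context in which the only JetBlue flight also happens to be the only nonstop one.
context = [
    {"jetblue": 1, "cheap": 0, "nonstop": 1},
    {"jetblue": 0, "cheap": 1, "nonstop": 0},
]
posterior = posterior_over_rewards("jetblue", context)
for w, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(dict(zip(FEATURES, w)), round(p, 3))
```

In this context, hearing "the jetblue flight" leaves as much posterior mass on a reward that cares about nonstop flights as on one that cares about the airline, which is exactly the context-dependence described above.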
#acl2022nlp
What happens inside a multilingual neural cognate prediction model?
We show that predicting cognates between modern Romance languages latently teaches the model about their proto-forms, allowing reconstruction without fine-tuning encoders on the task!🧵
In layman's terms, learning to predict special words (cognates) between related languages (French, Italian, Spanish, Portuguese, Galician, Catalan, Occitan, Romanian, and Aromanian) gives the model 'intuitive' knowledge about their parent, Latin!
How? We have no idea! The model does latently learn a phonetic "language model" (similar phones appear close to one another across languages), but apparently not phonotactic information (sound order)?
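One illustrative way to probe the "similar phones appear close to one another across languages" claim, not the authors' analysis: compare embeddings of the same phone across languages. The embedding table below is a random placeholder for vectors that would be extracted from a trained cognate-prediction model.

```python
# Probe: average cross-lingual cosine similarity of "the same" phone.
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings keyed by (language, phone); in practice these would
# come from the model's input embedding table or encoder states.
phone_embedding = {
    (lang, phone): rng.normal(size=64)
    for lang in ["fr", "it", "es", "pt", "ro"]
    for phone in ["p", "b", "t", "d", "k", "g", "a", "e", "i", "o", "u"]
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cross_lingual_similarity(phone):
    """Average cosine similarity of one phone's embedding across language pairs."""
    langs = sorted({l for (l, p) in phone_embedding if p == phone})
    pairs = [(a, b) for i, a in enumerate(langs) for b in langs[i + 1:]]
    return np.mean([cosine(phone_embedding[(a, phone)], phone_embedding[(b, phone)])
                    for a, b in pairs])

for ph in ["p", "a", "k"]:
    print(ph, round(cross_lingual_similarity(ph), 3))
```

With these random placeholders the similarities hover near zero; the thread's finding predicts that embeddings taken from the actual model would show noticeably higher same-phone similarity across languages.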
