Kalpesh Krishna
May 24, 2022 · 8 tweets · 5 min read
Excited to share RankGen, a 1.2B param contrastive encoder mapping prefixes & model generations to vectors.

✅ large improvements over nucleus/typical sampling
✅ score & rank generations from any LM
✅ human eval with writers
✅ HuggingFace ckpts, code👇
arxiv.org/abs/2205.09726
Despite great progress, text generation still falls short. Even large LMs generate text with hallucinations, poor continuity, etc.

Part of the issue is that LMs are trained to predict only the next token given the ground-truth prefix, which encourages reliance on local context.
To tackle this, we build RankGen, which maps prefixes close to their gold continuation and away from other continuations in the same document, as well as from generations of a large LM.

We train RankGen using large-scale contrastive learning with a minibatch size of 3K.
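A minimal sketch of the kind of contrastive objective described above, assuming dot-product scores between prefix and continuation vectors and a standard InfoNCE-style softmax loss. The function names and toy vectors are hypothetical, and this is not the RankGen training code. Each prefix must score its own gold continuation above the other gold continuations in the minibatch (in-batch negatives) and above extra hard negatives, which stand in for same-document continuations and LM generations. For simplicity, the hard negatives here are shared across the whole batch.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Score a (prefix, continuation) pair as the dot product of their vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(prefixes: List[Vector], golds: List[Vector],
                     hard_negatives: List[Vector]) -> float:
    """InfoNCE-style loss (assumed form); golds[i] is the true continuation of prefixes[i].

    Candidates for every prefix are all gold continuations in the batch
    (the others act as in-batch negatives) plus the shared hard negatives.
    """
    candidates = golds + hard_negatives
    total = 0.0
    for i, prefix in enumerate(prefixes):
        scores = [dot(prefix, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidate scores.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Cross-entropy: negative log-softmax probability of the gold (index i).
        total += log_z - scores[i]
    return total / len(prefixes)


# Toy batch: two prefixes, their gold continuations, one hard negative.
prefixes = [[1.0, 0.0], [0.0, 1.0]]
golds = [[5.0, 0.0], [0.0, 5.0]]
negatives = [[0.5, 0.5]]

aligned = contrastive_loss(prefixes, golds, negatives)        # close to 0
swapped = contrastive_loss(prefixes, golds[::-1], negatives)  # above 5
```

The loss is low only when each prefix vector points toward its own continuation, which is the "close to gold, away from negatives" geometry the thread describes.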
Since RankGen considers the relationship between two long sequences, it learns non-local dependencies well.

During inference, RankGen can be efficiently incorporated into a beam-search setup with any large LM. We open-source checkpoints & code compatible with HuggingFace LMs.
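To make the beam-search idea concrete, here is a hedged sketch of reranker-guided beam search, assuming an LM that can sample short continuation chunks and a scorer that rates (prefix, continuation) pairs. `sample_fn`, `score_fn`, and the default beam settings are hypothetical placeholders, not the interface of the released code. At each step, every beam is extended with several sampled chunks, all candidates are scored against the original prefix, and only the top `beam_size` survive.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces, not the API of the released RankGen code:
#   sample_fn(text, n) -> n candidate continuation chunks for `text` from some LM
#   score_fn(prefix, continuation) -> float, higher means a better continuation
SampleFn = Callable[[str, int], List[str]]
ScoreFn = Callable[[str, str], float]


def rerank_beam_search(prefix: str, sample_fn: SampleFn, score_fn: ScoreFn,
                       beam_size: int = 2, samples_per_beam: int = 4,
                       steps: int = 3) -> str:
    """Grow a continuation chunk by chunk, keeping the best-scoring beams."""
    beams = [""]  # generated text for each beam, excluding the prefix
    for _ in range(steps):
        pool: List[Tuple[float, str]] = []
        for beam in beams:
            for chunk in sample_fn(prefix + beam, samples_per_beam):
                candidate = beam + chunk
                # Score the full generation so far against the original prefix.
                pool.append((score_fn(prefix, candidate), candidate))
        # Keep the top `beam_size` candidates (Python's sort is stable for ties).
        pool.sort(key=lambda item: item[0], reverse=True)
        beams = [text for _, text in pool[:beam_size]]
    return beams[0]


# Toy stand-ins: a "LM" that always proposes the same chunks and a scorer
# that simply counts how often "b" appears in the continuation.
def toy_sample(text: str, n: int) -> List[str]:
    return [" a", " b", " c", " d"][:n]


def toy_score(prefix: str, continuation: str) -> float:
    return float(continuation.count("b"))


result = rerank_beam_search("Once upon a time", toy_sample, toy_score)
```

Because the LM is only used to propose candidates, any sampler (nucleus, typical, etc.) could be plugged in as `sample_fn`, which is in line with the thread's point that the reranker works alongside any LM.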
We evaluate RankGen using MAUVE and compare it against decoding strategies such as greedy decoding, nucleus sampling, and typical sampling, as well as other re-ranking methods.

RankGen significantly outperforms all baselines (85.0 vs 77.3 MAUVE). For human evaluation...
We hire English writers and teachers on Upwork to evaluate our models in A/B tests, and ask them to briefly explain their preference.

Writers prefer RankGen outputs over nucleus sampling 74.5% of the time, citing better relevance and continuity with the prefix.
Besides text generation, we find that RankGen is a strong zero-shot retriever. RankGen achieves new state-of-the-art results on two recent literary retrieval benchmarks (RELiC & ChapterBreak).

Retrieval-augmented generation with RankGen is a promising direction for future work.
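Zero-shot retrieval with an encoder like this reduces to nearest-neighbour search: embed the query prefix and each candidate passage, then sort candidates by score. A minimal sketch, assuming dot-product scoring and precomputed vectors. The toy vectors below are made up, and `recall_at_k` is a common retrieval metric used here for illustration, not necessarily the exact metric of either benchmark.

```python
from typing import List, Sequence


def rank_candidates(query: Sequence[float],
                    candidates: Sequence[Sequence[float]]) -> List[int]:
    """Return candidate indices ordered from highest to lowest dot-product score."""
    scores = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def recall_at_k(ranking: List[int], gold: int, k: int) -> float:
    """1.0 if the gold candidate appears in the top k of the ranking, else 0.0."""
    return 1.0 if gold in ranking[:k] else 0.0


# Toy vectors standing in for encoder outputs (made up for illustration).
query = [1.0, 0.0]
passages = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
ranking = rank_candidates(query, passages)  # scores are 0, 2, 1
```

No training on the retrieval task is involved: the same prefix/continuation scorer is reused as-is, which is what makes the setup zero-shot.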
This work was done as a student researcher in @GoogleAI with @johnwieting2, @MohitIyyer, @YapeiChang.

code / HuggingFace checkpoints: github.com/martiansideoft…

I’m currently at #acl2022nlp in Dublin! Come chat with me if you’d like to know more! (or DM / email me) #NLProc
