Kalpesh Krishna

@kalpeshk2011

May 24, 2022 • 8 tweets • 5 min read

Excited to share RankGen, a 1.2B param contrastive encoder mapping prefixes & model generations to vectors.

✅ large improvements over nucleus/typical sampling
✅ score & rank generations from any LM
✅ human eval with writers
✅ HuggingFace ckpts, code👇
arxiv.org/abs/2205.09726

Despite great progress, text generation continues to underperform. Even large LMs generate text that suffers from hallucination, poor continuity, etc.

Part of the issue is that LMs are trained to predict the next token given the ground-truth prefix, encouraging reliance on local context.

To tackle this we build RankGen, which maps prefixes close to their gold continuations but away from other continuations in the same document, as well as from model generations from a large LM.

We train RankGen using large-scale contrastive learning with a minibatch size of 3K.
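A minimal sketch of what a contrastive objective of this kind can look like, assuming a softmax over prefix-continuation dot products in which every other continuation in the minibatch acts as a negative. The thread does not spell out the exact loss, so the function name `in_batch_contrastive_loss` and the toy vectors are illustrative assumptions, not the released training code.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def in_batch_contrastive_loss(prefix_vecs, continuation_vecs):
    """Mean cross-entropy over the batch (hypothetical helper).

    continuation_vecs[i] is treated as the positive for prefix_vecs[i];
    every other continuation in the batch is treated as a negative.
    """
    losses = []
    for i, p in enumerate(prefix_vecs):
        scores = [dot(p, c) for c in continuation_vecs]
        # Numerically stable log-sum-exp over all continuations in the batch.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of the matching continuation.
        losses.append(log_z - scores[i])
    return sum(losses) / len(losses)

# Toy batch: each prefix points the same way as its own continuation.
prefixes = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[5.0, 0.0], [0.0, 5.0]]
swapped = [[0.0, 5.0], [5.0, 0.0]]
print(in_batch_contrastive_loss(prefixes, aligned))
print(in_batch_contrastive_loss(prefixes, swapped))
```

With a larger batch each prefix sees more negatives per step, which is one reason a large minibatch is attractive in this setup.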

Since RankGen considers the relationship between two long sequences, it learns non-local dependencies well.

During inference, RankGen can be efficiently incorporated in a beam-search setup with any large LM. We open-source checkpoints & code compatible with HuggingFace LMs.
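One way to picture a ranker-guided beam search is the sketch below. It assumes two hypothetical callables that are not part of any real API: `generate(text, n)` returns `n` sampled continuation chunks from an LM, and `score(prefix, continuation)` returns a ranker score. The beam size, samples per beam, and step count are placeholder values, not the settings used in the paper.

```python
def rerank_beam_search(prefix, generate, score,
                       beam_size=2, samples_per_beam=4, steps=3):
    """Grow continuations chunk by chunk, keeping the top-scoring beams.

    generate(text, n) -> list of n continuation chunks (hypothetical LM hook)
    score(prefix, continuation) -> float (hypothetical ranker hook)
    """
    beams = [("", 0.0)]  # (continuation so far, ranker score)
    for _ in range(steps):
        candidates = []
        for text, _ in beams:
            # Sample several next chunks for this partial continuation.
            for chunk in generate(prefix + text, samples_per_beam):
                new_text = text + chunk
                # Score the whole continuation against the original prefix.
                candidates.append((new_text, score(prefix, new_text)))
        # Keep only the highest-scoring partial continuations.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams[0][0]
```

Because the ranker only needs text in and a score out, the LM behind `generate` can be swapped freely, which matches the claim that the approach works with any large LM.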

We evaluate RankGen using MAUVE and compare it against several other decoding strategies (greedy decoding, nucleus sampling, typical sampling) as well as other re-ranking methods.

RankGen significantly outperforms all baselines (85.0 vs 77.3 MAUVE). For human evaluation...

We hire English writers and teachers on Upwork to evaluate our models in A/B tests, and ask for a short explanation for their preference.

Writers prefer outputs from RankGen 74.5% of the time over nucleus sampling, mentioning improvements in relevance and continuity with the prefix.

Besides text generation, we find that RankGen is a strong zero-shot retriever. RankGen achieves new state-of-the-art results on two recent literary retrieval benchmarks (RELiC & ChapterBreak).
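Zero-shot retrieval with an encoder like this can be sketched as ranking candidate vectors by their dot product with the query vector. The snippet assumes the vectors have already been produced by the encoder; `retrieve` and the toy numbers are hypothetical and only illustrate the ranking step.

```python
def retrieve(query_vec, candidate_vecs, k=1):
    """Return indices of the k candidates with the highest dot-product score."""
    scored = [
        (sum(q * c for q, c in zip(query_vec, vec)), idx)
        for idx, vec in enumerate(candidate_vecs)
    ]
    # Highest score first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy example: candidate 1 is closest to the query, then candidate 2.
top = retrieve([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1], [0.2, 0.8]], k=2)
print(top)
```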

Retrieval-augmented generation with RankGen is a promising direction for future work.

This work was done while I was a student researcher at @GoogleAI, with @johnwieting2, @MohitIyyer, @YapeiChang.

code / HuggingFace checkpoints: github.com/martiansideoft…

I’m currently at #acl2022nlp in Dublin! Come chat with me if you’d like to know more! (or DM / email me) #NLProc
