Dense retrieval models (e.g. DPR) achieve SOTA on various datasets. Does this really mean dense models are better than sparse models (e.g. BM25)?
No! Our #EMNLP2021 paper shows that dense retrievers fail even on simple entity-centric questions.

arxiv.org/abs/2109.08535 (1/6)
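
To make the dense-vs-sparse contrast concrete, here is a minimal sketch of the two retrieval paradigms. It assumes the public Hugging Face DPR checkpoints and the rank_bm25 package; neither is specified in the thread, and the example passages are made up for illustration.

```python
# Minimal sketch contrasting dense (DPR-style) and sparse (BM25) retrieval.
# Assumes `transformers` and `rank_bm25` are installed; the model names are
# the public Facebook DPR checkpoints, not necessarily the paper's setup.
import torch
from transformers import (
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
    DPRContextEncoder, DPRContextEncoderTokenizer,
)
from rank_bm25 import BM25Okapi

passages = [
    "Arve Furset is a Norwegian composer born in Askvoll, Norway.",
    "Oslo is the capital and most populous city of Norway.",
]
question = "Where was Arve Furset born?"

# Sparse retrieval: BM25 scores passages by weighted lexical overlap.
bm25 = BM25Okapi([p.lower().split() for p in passages])
sparse_scores = bm25.get_scores(question.lower().split())

# Dense retrieval: the score is the dot product of learned embeddings.
q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
c_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
c_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

with torch.no_grad():
    q_emb = q_enc(**q_tok(question, return_tensors="pt")).pooler_output                 # (1, 768)
    p_emb = c_enc(**c_tok(passages, return_tensors="pt", padding=True)).pooler_output  # (2, 768)
dense_scores = (q_emb @ p_emb.T).squeeze(0)

print("BM25:", sparse_scores)   # exact match on "Arve Furset" favors passage 0
print("DPR :", dense_scores)    # dense scores depend entirely on the learned embeddings
```

BM25 is guaranteed to reward the rare entity tokens; the dense model only retrieves correctly if its embeddings happen to encode that entity, which is the failure mode the thread describes.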
We construct EntityQuestions, a set of simple, entity-rich questions such as “Where was Arve Furset born?”. We find that dense retrieval models drastically underperform sparse models on these questions! (2/6)
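
For intuition, questions of this shape can be generated by applying templates to (subject, relation, object) facts. The relations and templates below are illustrative stand-ins, not the paper's exact set.

```python
# Illustrative sketch of building templated entity-centric questions from
# (subject, relation, object) facts, in the spirit of EntityQuestions.
# Relation names and templates here are examples, not the paper's actual ones.
TEMPLATES = {
    "place_of_birth": "Where was {subject} born?",
    "author":         "Who is the author of {subject}?",
}

facts = [
    ("Arve Furset", "place_of_birth", "Askvoll"),
    ("War and Peace", "author", "Leo Tolstoy"),
]

dataset = [
    {"question": TEMPLATES[rel].format(subject=subj), "answer": obj}
    for subj, rel, obj in facts
    if rel in TEMPLATES
]

for ex in dataset:
    print(ex)  # e.g. {'question': 'Where was Arve Furset born?', 'answer': 'Askvoll'}
```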
We decouple two distinct aspects of these questions: the entities and the question patterns. We find that dense retrieval models only generalize to common entities or to question patterns observed during training. (3/6)
We fine-tune DPR on these simple questions and find that updating the passage encoder is crucial for good results. Our visualization also shows that gold passage vectors for these questions are clustered together, making them difficult to discriminate. (4/6)
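
A rough outline of what such fine-tuning looks like, with a switch for whether the passage encoder is updated. This is an assumption-laden sketch of the standard DPR objective (in-batch negatives), not the paper's actual training code; `q_encoder` and `p_encoder` are assumed to be callables mapping a batch of texts to (B, d) tensors.

```python
# Sketch of DPR-style contrastive fine-tuning with in-batch negatives.
# `update_passage_encoder` toggles the choice the tweet highlights:
# freezing vs. updating the passage encoder during fine-tuning.
import torch
import torch.nn.functional as F

def finetune_step(q_encoder, p_encoder, optimizer, questions, gold_passages,
                  update_passage_encoder=True):
    """One step: each question's gold passage is the positive; the other
    passages in the batch serve as in-batch negatives."""
    for p in p_encoder.parameters():
        p.requires_grad = update_passage_encoder

    q_emb = q_encoder(questions)      # (B, d) question embeddings
    p_emb = p_encoder(gold_passages)  # (B, d) passage embeddings

    # scores[i, j] = <q_i, p_j>; the diagonal holds the positive pairs.
    scores = q_emb @ p_emb.T
    labels = torch.arange(scores.size(0), device=scores.device)
    loss = F.cross_entropy(scores, labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

If the gold passage vectors are clustered together, the diagonal of `scores` is hard to separate from the rest of each row, which is why updating the passage encoder (spreading those vectors apart) helps.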
We study two simple techniques aimed at fixing the issue. We find that (1) data augmentation cannot consistently improve performance on new questions; (2) fixing a robust passage index and specializing the question encoder enables memory-efficient transfer to new domains. (5/6)
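
A sketch of the second idea: embed and index the corpus once with a frozen passage encoder, then adapt only a small question encoder per domain. The FAISS usage here is my assumption; the paper's exact setup may differ.

```python
# Sketch of "fixed passage index + specialized question encoder":
# the corpus is encoded once and shared; only the question encoder
# changes per domain, so transfer costs one small encoder, not a new index.
import faiss
import numpy as np

def build_fixed_index(p_encoder, passages):
    """Encode the corpus once with a frozen passage encoder."""
    p_emb = p_encoder(passages).astype(np.float32)  # (N, d) passage embeddings
    index = faiss.IndexFlatIP(p_emb.shape[1])       # inner-product search
    index.add(p_emb)
    return index

def retrieve(index, q_encoder, question, k=5):
    """Search the shared index with a domain-specialized question encoder."""
    q_emb = q_encoder([question]).astype(np.float32)  # (1, d)
    scores, ids = index.search(q_emb, k)
    return ids[0], scores[0]
```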
Please check out our paper for details!
The code/dataset is available on GitHub: github.com/princeton-nlp/…

Joint work with @cdsciavolino, @leejnhk, @danqi_chen
(6/6)
