Jo Kristian Bergum
Chief Scientist @vespaengine. Tweets about Vespa, search, recommendation, ranking, and IR. CET. #StandWithUkraine 💙💛
Nov 19, 2024 8 tweets 3 min read
🚀 The Rise of Vision RAG!

Launching a complete RAG app that you can deploy to production in minutes!

- Hybrid fusion of ColPali + BM25 with @vespaengine
- Gemini 1.5 Flash-8B
- FastHTML frontend
- Runs on Huggingface Spaces

Interpretable SERP with snippets + patch highlights!

RAG with ColPali doesn't need to be sluggish.

Huge s/o to the team that built it @thomas_thoresen @andreer @ldalves Please give it a try over at @huggingface spaces huggingface.co/spaces/vespa-e…

Also, read this comprehensive blog post by the team blog.vespa.ai/visual-rag-in-…
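The hybrid fusion idea can be sketched in a few lines. This is my illustration, not the app's actual code: in the real app Vespa computes the fusion server-side, and the page ids and rankings below are made up. A common fusion approach is reciprocal rank fusion (RRF), assumed here:

```python
# Reciprocal rank fusion (RRF): combine a BM25 ranking and a ColPali
# MaxSim ranking without having to calibrate the two score scales.
# All page ids and orderings are toy illustration data.

def rrf(rankings, k=60):
    """Fuse several ranked lists of doc ids with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["page_3", "page_1", "page_7"]      # lexical retrieval
colpali_ranking = ["page_1", "page_3", "page_9"]   # visual late interaction

fused = rrf([bm25_ranking, colpali_ranking])
print(fused)  # page_1 and page_3 share the top two spots
```

RRF only looks at ranks, which is why it is a popular default for hybrid retrieval: no per-query score normalization is needed.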
Sep 7, 2024 6 tweets 2 min read
A new Vespa + ColPali notebook just dropped!

We demonstrate how to scale ColPali (and MaxSim) to large collections of PDF pages.

- HNSW index over binary patch embeddings
- Efficient candidate retrieval over HNSW
- A set of re-ranking steps, from coarse to fine precision, implementing late-interaction MaxSim

github.com/vespa-engine/p…

Colab link: colab.research.google.com/github/vespa-e…
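A rough sketch of the coarse binary phase (my illustration, not the notebook's code): pack each patch embedding into its sign bits and score pages by Hamming distance, before any float re-ranking. Toy 4-dim vectors stand in for real patch embeddings:

```python
# Coarse candidate scoring over binarized patch embeddings:
# 1) binarize(): keep only the sign bit per dimension,
# 2) coarse_score(): each query patch finds its closest binary doc patch
#    by Hamming distance, and the distances are summed (lower = better).

def binarize(vec):
    """Pack a float vector into an int bitstring, one sign bit per dim."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits

def hamming(a, b):
    return bin(a ^ b).count("1")

def coarse_score(query_patches, doc_patches):
    return sum(
        min(hamming(binarize(q), binarize(d)) for d in doc_patches)
        for q in query_patches
    )

query = [[0.9, -0.1, 0.4, 0.2], [-0.3, 0.8, -0.2, 0.5]]
pages = {
    "page_a": [[1.0, -0.2, 0.5, 0.1], [-0.4, 0.9, -0.1, 0.6]],
    "page_b": [[-0.8, 0.1, -0.6, -0.2], [0.2, -0.7, 0.3, -0.5]],
}
candidates = sorted(pages, key=lambda p: coarse_score(query, pages[p]))
print(candidates)  # page_a is the closer candidate
```

The payoff of binarization: a 128-dim float32 patch vector shrinks from 512 bytes to 16 bytes, which is what makes an HNSW index over very large patch collections affordable; the float re-ranking steps then restore precision for the surviving candidates.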
Jul 15, 2024 4 tweets 2 min read
ColPali is probably one of the most significant innovations in complex document retrieval.

Visual embeddings allow the inclusion of complex elements such as charts, tables, and figures without the complexity of OCR and ad-hoc extraction routines.

In this blog, I demonstrate how to use ColPali with @vespaengine.

blog.vespa.ai/retrieval-with… If you want to jump straight to the notebook:

pyvespa.readthedocs.io/en/latest/exam…
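The scoring idea behind ColPali is late-interaction MaxSim (inherited from ColBERT): every query token embedding picks its best-matching page patch embedding, and the per-token maxima are summed. A minimal sketch with toy 2-dim vectors, not real model output:

```python
# Late-interaction MaxSim: sum over query embeddings of the max
# dot product against the page's patch embeddings.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_embs, patch_embs):
    return sum(max(dot(q, p) for p in patch_embs) for q in query_embs)

query = [[1.0, 0.0], [0.0, 1.0]]            # two query token embeddings
page_with_chart = [[0.9, 0.1], [0.2, 0.8]]  # patches covering a relevant chart
page_of_boilerplate = [[0.1, 0.2], [0.3, 0.1]]

print(maxsim(query, page_with_chart))       # 1.7
print(maxsim(query, page_of_boilerplate))   # 0.5
```

Because each query token matches patches independently, the per-patch maxima also tell you *where* on the page the match happened, which is what enables patch highlighting in the SERP.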
May 12, 2024 6 tweets 3 min read
Great list of low-hanging fruit to improve RAG.

Allow me to do a short thread on how to address some of these with @vespaengine

1) Synthetic data can be used both for evaluation and to improve the retrieval function with fine-tuning. In this example, we trained a cross-encoder model using synthetic data. blog.vespa.ai/improving-text…
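To make the synthetic-data idea concrete, here is a hypothetical sketch (not the blog post's code) of assembling cross-encoder training pairs: each generated query is paired with its source passage as a positive and with another passage as a random negative. In practice an LLM generates the queries; all strings and ids below are invented:

```python
# Build (query, passage, label) pairs for cross-encoder fine-tuning
# from synthetic queries. Positives come from the source passage,
# negatives are sampled from the rest of the corpus.
import random

passages = {
    "d1": "Vespa supports hybrid retrieval combining BM25 and embeddings.",
    "d2": "HNSW indexes enable fast approximate nearest neighbor search.",
    "d3": "Cross-encoders score a query and a passage jointly.",
}
# In practice an LLM would generate these; hard-coded for illustration.
synthetic_queries = {
    "d1": "how does vespa do hybrid retrieval",
    "d2": "what is hnsw used for",
    "d3": "how do cross encoders work",
}

random.seed(0)
training_pairs = []
for doc_id, query in synthetic_queries.items():
    training_pairs.append((query, passages[doc_id], 1))    # positive
    neg_id = random.choice([d for d in passages if d != doc_id])
    training_pairs.append((query, passages[neg_id], 0))    # random negative

print(len(training_pairs))  # 6
```

Random negatives are the simplest choice; mining hard negatives with a first-stage retriever usually trains a stronger re-ranker.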
Apr 27, 2023 6 tweets 2 min read
Tensor and vector databases will replace most legacy databases in this decade. A disruption fueled by natural language interfaces and deep neural representations. In other words:

Natural query languages (NQL) will replace structured query languages (SQL). How developers interface with data through structured query languages will become legacy. Most of the new data created is unstructured: text, images, video, sensory data. All data that we can derive meaning from using deep neural representations.
Dec 21, 2022 5 tweets 1 min read
The effectiveness versus efficiency debate in the IR community is interesting, but it also misses that most search deployments are tiny. Most have fewer than 10M documents and less than 10 QPS. The deployment cost is a tiny fraction of engineering costs.

For example, many criticize ColBERT v1 because of the storage footprint with a vector per token. That is not a significant cost driver for small-sized deployments.
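Back-of-envelope arithmetic for the ColBERT v1 objection, with illustrative numbers I'm assuming (10M documents, 100 tokens per document, 128 dims, float16):

```python
# Storage footprint of a vector-per-token index at "small deployment" scale.
docs = 10_000_000
tokens_per_doc = 100      # assumed average
dims = 128
bytes_per_value = 2       # float16

total_bytes = docs * tokens_per_doc * dims * bytes_per_value
print(total_bytes / 1e9, "GB")  # 256.0 GB
```

A few hundred GB of disk is cheap next to the engineering time spent on ranking quality, which is the point of the tweet.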
Mar 17, 2022 11 tweets 5 min read
We have updated our news search and recommendation tutorial for Vespa.ai - A small thread to highlight some awesome Vespa features which are demonstrated in the tutorial 🧵1/10

The first part is basically kicking the tires and indexing a hello world example. All examples use the awesome new vespa-CLI tool, which lets you deploy locally using Docker or to Vespa Cloud. 🧵2/10

docs.vespa.ai/en/tutorials/n…
Jan 26, 2022 5 tweets 2 min read
Alternative framing

We introduce this dense model which can be used unsupervised. Don't worry that it's worse than plain BM25 in that setting, but hey, we beat some other dense model in an unsupervised setting.

And the large model has 6B params and embedding dim 4096 😂
Jan 14, 2022 23 tweets 6 min read
I don't understand why organizations use Elasticsearch for search in 2022 if their business depends on search ranking quality. I get the analytics part with the ELK stack, but search ranking is beyond me. So let me explain my take in a short thread 🧵

It's not elastic in any sensible way, so the name Elasticsearch is highly inaccurate. First, you need to determine the number of shards to partition the data volume. Changing shard count cannot be done in place, so you need to size a new index. 2/🧵
Dec 30, 2021 14 tweets 4 min read
Tired of 2022 predictions already? Well, I'm sorry.
Here are my 2022 predictions for search, vector search, and NLP in this small thread. 1/14🧵

We will hear a lot more from @quickwit in 2022. Their solution for keyword search over immutable data using inexpensive cloud storage is novel, and I bet they will take a significant piece of the Log Search/Analytics space. 2/🧵
Apr 24, 2021 9 tweets 3 min read
Coming back to this work with my take. A thread 1/N

1) Lexical/sparse BM25 is a must-have for any technology that is marketed as a search engine. BM25 provides a strong zero-shot baseline without any fuss.

2) Dense retrievers (aka vector search) outperform BM25 significantly on data-rich tasks, but generalization suffers when applied out of domain.
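To make the "strong baseline" concrete, here is a minimal self-contained Okapi BM25 scorer (my sketch, toy corpus; using the common non-negative "+1 inside the log" idf variant and the usual k1=1.2, b=0.75 defaults):

```python
# Okapi BM25: idf-weighted, length-normalized term frequency scoring.
import math

corpus = {
    "d1": "vespa hybrid search engine".split(),
    "d2": "vector search with embeddings".split(),
    "d3": "bm25 is a strong lexical baseline for search".split(),
}

N = len(corpus)
avgdl = sum(len(d) for d in corpus.values()) / N

def idf(term):
    df = sum(1 for d in corpus.values() if term in d)
    return math.log(1 + (N - df + 0.5) / (df + 0.5))

def bm25(query, doc, k1=1.2, b=0.75):
    score = 0.0
    for term in query:
        tf = doc.count(term)
        if tf == 0:
            continue
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        score += idf(term) * norm
    return score

query = "lexical baseline".split()
best = max(corpus, key=lambda d: bm25(query, corpus[d]))
print(best)  # d3
```

No training data, no embeddings, no tuning: that zero-shot robustness is exactly why it remains the baseline dense retrievers are measured against.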
Dec 14, 2020 13 tweets 4 min read
I'm thrilled about the features we have added to @vespaengine this year while working from home. 1/n

- Implemented ANN support, allowing fast dense retrieval, e.g. using representation models built on pre-trained language models (PLMs), which are blowing up leaderboards in both ranking and QA. 2/n

- Integrated with ONNX-RT so that we can run large PLM models more efficiently and support a wider range of ML models. blog.vespa.ai/stateful-model…

- Integrated a BERT tokenizer so users don't need any python stack or dependencies (unless they want to).