Jo Kristian Bergum
Chief Scientist @vespaengine. Tweets about Vespa, search, recommendation, ranking, and IR. CET. #StandWithUkraine 💙💛
Nov 19, 2024 8 tweets 3 min read
🚀 The Rise of Vision RAG!

Launching a complete RAG app that you can deploy to production in minutes!

- Hybrid fusion of ColPali + BM25 with @vespaengine
- Gemini 1.5 Flash-8B
- FastHTML frontend
- Runs on Huggingface Spaces

Interpretable SERP with snippets + patch highlights!

RAG with ColPali doesn't need to be sluggish.

Huge s/o to the team that built it @thomas_thoresen @andreer @ldalves Please give it a try over at @huggingface spaces huggingface.co/spaces/vespa-e…

Also, read this comprehensive blog post by the team blog.vespa.ai/visual-rag-in-…
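The hybrid fusion idea can be sketched in a few lines. This is my illustration, not the app's actual code: in the real app Vespa computes the fusion server-side, and the page ids and rankings below are made up. A common fusion approach is reciprocal rank fusion (RRF), assumed here:

```python
# Reciprocal rank fusion (RRF): combine a BM25 ranking and a ColPali
# MaxSim ranking without having to calibrate the two score scales.
# All page ids and orderings are toy illustration data.

def rrf(rankings, k=60):
    """Fuse several ranked lists of doc ids with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["page_3", "page_1", "page_7"]      # lexical retrieval
colpali_ranking = ["page_1", "page_3", "page_9"]   # visual late interaction

fused = rrf([bm25_ranking, colpali_ranking])
print(fused)  # page_1 and page_3 share the top two spots
```

RRF only looks at ranks, which is why it is a popular default for hybrid retrieval: no per-query score normalization is needed.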
Sep 7, 2024 6 tweets 2 min read
A new Vespa + ColPali notebook just dropped!

We demonstrate how to scale ColPali (and MaxSim) to large collections of PDF pages.

- HNSW index over binary patch embeddings
- Efficient candidate retrieval over HNSW
- A set of re-ranking steps, from coarse to fine precision, implementing late-interaction MaxSim

github.com/vespa-engine/p…

Colab link: colab.research.google.com/github/vespa-e…
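A rough sketch of the coarse binary phase (my illustration, not the notebook's code): pack each patch embedding into its sign bits and score pages by Hamming distance, before any float re-ranking. Toy 4-dim vectors stand in for real patch embeddings:

```python
# Coarse candidate scoring over binarized patch embeddings:
# 1) binarize(): keep only the sign bit per dimension,
# 2) coarse_score(): each query patch finds its closest binary doc patch
#    by Hamming distance, and the distances are summed (lower = better).

def binarize(vec):
    """Pack a float vector into an int bitstring, one sign bit per dim."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits

def hamming(a, b):
    return bin(a ^ b).count("1")

def coarse_score(query_patches, doc_patches):
    return sum(
        min(hamming(binarize(q), binarize(d)) for d in doc_patches)
        for q in query_patches
    )

query = [[0.9, -0.1, 0.4, 0.2], [-0.3, 0.8, -0.2, 0.5]]
pages = {
    "page_a": [[1.0, -0.2, 0.5, 0.1], [-0.4, 0.9, -0.1, 0.6]],
    "page_b": [[-0.8, 0.1, -0.6, -0.2], [0.2, -0.7, 0.3, -0.5]],
}
candidates = sorted(pages, key=lambda p: coarse_score(query, pages[p]))
print(candidates)  # page_a is the closer candidate
```

The payoff of binarization: a 128-dim float32 patch vector shrinks from 512 bytes to 16 bytes, which is what makes an HNSW index over very large patch collections affordable; the float re-ranking steps then restore precision for the surviving candidates.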
Jul 15, 2024 4 tweets 2 min read
ColPali is probably one of the most significant innovations in complex document retrieval.

Visual embeddings allow the inclusion of complex elements such as charts, tables, and figures without the complexity of OCR and ad-hoc extraction routines.

In this blog, I demonstrate how to use ColPali with @vespaengine.

blog.vespa.ai/retrieval-with… If you want to jump straight to the notebook:

pyvespa.readthedocs.io/en/latest/exam…
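The scoring idea behind ColPali is late-interaction MaxSim (inherited from ColBERT): every query token embedding picks its best-matching page patch embedding, and the per-token maxima are summed. A minimal sketch with toy 2-dim vectors, not real model output:

```python
# Late-interaction MaxSim: sum over query embeddings of the max
# dot product against the page's patch embeddings.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_embs, patch_embs):
    return sum(max(dot(q, p) for p in patch_embs) for q in query_embs)

query = [[1.0, 0.0], [0.0, 1.0]]            # two query token embeddings
page_with_chart = [[0.9, 0.1], [0.2, 0.8]]  # patches covering a relevant chart
page_of_boilerplate = [[0.1, 0.2], [0.3, 0.1]]

print(maxsim(query, page_with_chart))       # 1.7
print(maxsim(query, page_of_boilerplate))   # 0.5
```

Because each query token matches patches independently, the per-patch maxima also tell you *where* on the page the match happened, which is what enables patch highlighting in the SERP.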
May 12, 2024 6 tweets 3 min read
Great list of low-hanging fruit to improve RAG.

Allow me to do a short thread on how to address some of these with @vespaengine

1) Synthetic data can be used both for evaluation and to improve the retrieval function with fine-tuning. In this example, we trained a cross-encoder model using synthetic data. blog.vespa.ai/improving-text…
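To make the synthetic-data idea concrete, here is a hypothetical sketch (not the blog post's code) of assembling cross-encoder training pairs: each generated query is paired with its source passage as a positive and with another passage as a random negative. In practice an LLM generates the queries; all strings and ids below are invented:

```python
# Build (query, passage, label) pairs for cross-encoder fine-tuning
# from synthetic queries. Positives come from the source passage,
# negatives are sampled from the rest of the corpus.
import random

passages = {
    "d1": "Vespa supports hybrid retrieval combining BM25 and embeddings.",
    "d2": "HNSW indexes enable fast approximate nearest neighbor search.",
    "d3": "Cross-encoders score a query and a passage jointly.",
}
# In practice an LLM would generate these; hard-coded for illustration.
synthetic_queries = {
    "d1": "how does vespa do hybrid retrieval",
    "d2": "what is hnsw used for",
    "d3": "how do cross encoders work",
}

random.seed(0)
training_pairs = []
for doc_id, query in synthetic_queries.items():
    training_pairs.append((query, passages[doc_id], 1))    # positive
    neg_id = random.choice([d for d in passages if d != doc_id])
    training_pairs.append((query, passages[neg_id], 0))    # random negative

print(len(training_pairs))  # 6
```

Random negatives are the simplest choice; mining hard negatives with a first-stage retriever usually trains a stronger re-ranker.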
Apr 27, 2023 6 tweets 2 min read
Tensor and vector databases will replace most legacy databases in this decade. A disruption fueled by natural language interfaces and deep neural representations. In other words:

Natural query languages (NQL) will replace structured query languages (SQL). How developers interface with data through structured query languages will become legacy. Most of the new data created is unstructured: text, images, video, sensory data. All data that we can derive meaning from using deep neural representations.
Dec 21, 2022 5 tweets 1 min read
The effectiveness versus efficiency debate in the IR community is interesting, but it also misses that most search deployments are tiny. Most have fewer than 10M documents and less than 10 QPS. The deployment cost is a tiny fraction of engineering costs.

For example, many criticize ColBERT v1 because of the storage footprint with a vector per token. That is not a significant cost driver for small-sized deployments.
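Back-of-envelope arithmetic for the ColBERT v1 objection, with illustrative numbers I'm assuming (10M documents, 100 tokens per document, 128 dims, float16):

```python
# Storage footprint of a vector-per-token index at "small deployment" scale.
docs = 10_000_000
tokens_per_doc = 100      # assumed average
dims = 128
bytes_per_value = 2       # float16

total_bytes = docs * tokens_per_doc * dims * bytes_per_value
print(total_bytes / 1e9, "GB")  # 256.0 GB
```

A few hundred GB of disk is cheap next to the engineering time spent on ranking quality, which is the point of the tweet.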
Mar 17, 2022 11 tweets 5 min read
We have updated our news search and recommendation tutorial for Vespa.ai - A small thread to highlight some awesome Vespa features which are demonstrated in the tutorial 🧵1/10

The first part is basically kicking the tires and indexing a hello world example. All examples use the awesome new vespa-CLI tool, which lets you deploy locally using Docker or to Vespa Cloud. 🧵2/10

docs.vespa.ai/en/tutorials/n…
Jan 26, 2022 5 tweets 2 min read
Alternative framing

We introduce this dense model which can be used unsupervised. Don't worry that it's worse than plain BM25 in that setting, but hey, we beat some other dense model in an unsupervised setting.

And the large model has 6B params and embedding dim 4096 😂
Jan 14, 2022 23 tweets 6 min read
I don't understand why organizations use Elasticsearch for search in 2022 if their business depends on search ranking quality. I get the analytics part with the ELK stack, but search ranking is beyond me. So let me explain my take in a short thread 🧵

It's not elastic in any sensible way, so the name Elasticsearch is highly inaccurate. First, you need to determine the number of shards to partition the data volume. Changing shard count cannot be done in place, so you need to size a new index. 2/🧵
Dec 30, 2021 14 tweets 4 min read
Tired of 2022 predictions already? Well, I'm sorry.
Here are my 2022 predictions for search, vector search, and NLP in this small thread. 1/14🧵

We will hear a lot more from @quickwit in 2022. Their solution for keyword search over immutable data using inexpensive cloud storage is novel, and I bet they will take a significant piece of the Log Search/Analytics space. 2/🧵
Apr 24, 2021 9 tweets 3 min read
Coming back to this work with my take. A thread 1/N

1) Lexical/sparse BM25 is a must-have for any technology that is marketed as a search engine. BM25 provides a strong zero-shot baseline without any fuss.

2) Dense retrievers (aka vector search) outperform BM25 significantly on data-rich tasks, but generalization suffers when applied out of domain.
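To make the "strong baseline" concrete, here is a minimal self-contained Okapi BM25 scorer (my sketch, toy corpus; using the common non-negative "+1 inside the log" idf variant and the usual k1=1.2, b=0.75 defaults):

```python
# Okapi BM25: idf-weighted, length-normalized term frequency scoring.
import math

corpus = {
    "d1": "vespa hybrid search engine".split(),
    "d2": "vector search with embeddings".split(),
    "d3": "bm25 is a strong lexical baseline for search".split(),
}

N = len(corpus)
avgdl = sum(len(d) for d in corpus.values()) / N

def idf(term):
    df = sum(1 for d in corpus.values() if term in d)
    return math.log(1 + (N - df + 0.5) / (df + 0.5))

def bm25(query, doc, k1=1.2, b=0.75):
    score = 0.0
    for term in query:
        tf = doc.count(term)
        if tf == 0:
            continue
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        score += idf(term) * norm
    return score

query = "lexical baseline".split()
best = max(corpus, key=lambda d: bm25(query, corpus[d]))
print(best)  # d3
```

No training data, no embeddings, no tuning: that zero-shot robustness is exactly why it remains the baseline dense retrievers are measured against.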
Dec 14, 2020 13 tweets 4 min read
I'm thrilled about the features we have added to @vespaengine this year while working from home. 1/n

- Implemented ANN support, allowing fast dense retrieval, e.g. using representation models built on pre-trained language models (PLMs), which are blowing up leaderboards in both ranking and QA. 2/n

- Integrated with ONNX-RT so that we can run large PLM models more efficiently and support a wider range of ML models. blog.vespa.ai/stateful-model…

- Integrated a BERT tokenizer so users don't need any python stack or dependencies (unless they want to).