elvis (@omarsar0)
Apr 30 · 7 tweets
Universal RAG

RAG is dead, they said.

Then you see papers like this and it gives you a better understanding of the opportunities and challenges ahead.

Lots of great ideas in this paper. I've summarized a few below:
What is it?

UniversalRAG is a framework that overcomes the limitations of existing RAG systems confined to single modalities or corpora. It supports retrieval across modalities (text, image, video) and at multiple granularities (e.g., paragraph vs. document, clip vs. video).
Modality-aware routing

To counter modality bias in unified embedding spaces (where queries often retrieve same-modality results regardless of relevance), UniversalRAG introduces a router that dynamically selects the appropriate modality (e.g., image vs. text) for each query.
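As a rough sketch of the idea (not the paper's code): routing acts as a classifier in front of modality-specific corpora, so each query searches only one index instead of a shared embedding space. The keyword heuristic below is a toy stand-in for the trained T5 / zero-shot GPT-4o routers.

```python
# Illustrative sketch of modality-aware routing (toy heuristic, not the
# paper's implementation).

CORPORA = {"text": [], "image": [], "video": []}  # modality-specific indices

def route(query: str) -> str:
    """Pick the retrieval modality for a query (toy keyword heuristic)."""
    q = query.lower()
    if any(w in q for w in ("photo", "picture", "look like")):
        return "image"
    if any(w in q for w in ("clip", "scene", "footage")):
        return "video"
    return "text"

def retrieve(query: str):
    modality = route(query)
    # Search only the selected corpus instead of one unified index,
    # which sidesteps the modality bias of a shared embedding space.
    return modality, CORPORA[modality]
```

The point is structural: retrieval quality no longer depends on a single embedding space ranking images against paragraphs.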
Granularity-aware retrieval

Each modality is broken into granularity levels (e.g., paragraphs vs. documents for text, clips vs. full-length videos). This lets queries retrieve content that matches their complexity -- factual queries use short segments, while complex reasoning accesses long-form data.
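A minimal illustration of the same corpus indexed at two granularities; the word-count test below is an assumed stand-in for the paper's learned complexity routing:

```python
# Toy sketch of granularity-aware retrieval for the text modality.
# The same corpus is indexed twice, at different granularities.

paragraph_index = {}   # fine-grained: individual paragraphs
document_index = {}    # coarse-grained: full documents

def pick_granularity(query: str) -> str:
    # Short factual questions -> paragraph index;
    # longer multi-hop questions -> document index.
    # (Word count is a crude proxy for the router's complexity estimate.)
    return "paragraph" if len(query.split()) <= 8 else "document"
```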
Flexible routing

It supports both training-free (zero-shot GPT-4o prompting) and trained (T5-Large) routers. Trained routers perform better on in-domain data, while GPT-4o generalizes better to out-of-domain tasks. An ensemble router combines both for robust performance.
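One way to sketch the ensemble (the weighting rule here is my assumption, not necessarily the paper's): blend the trained router's class probabilities with the zero-shot router's one-hot vote.

```python
# Hedged sketch of an ensemble router: a trained router emits a probability
# distribution over targets, a zero-shot LLM router emits a single choice,
# and the two are mixed with a weight.

def ensemble_route(trained_probs: dict, zero_shot_choice: str,
                   weight: float = 0.5) -> str:
    choices = set(trained_probs) | {zero_shot_choice}
    scores = {}
    for c in choices:
        one_hot = 1.0 if c == zero_shot_choice else 0.0
        scores[c] = weight * trained_probs.get(c, 0.0) + (1 - weight) * one_hot
    return max(scores, key=scores.get)
```

With `weight` closer to 1 the trained router dominates (good in-domain); closer to 0 the zero-shot router dominates (good out-of-domain).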
Performance

UniversalRAG outperforms modality-specific and unified RAG baselines across 8 benchmarks spanning text (e.g., MMLU, SQuAD), image (WebQA), and video (LVBench, VideoRAG). With the trained T5-Large router, it achieves the highest average score across modalities.
Case study

In WebQA, UniversalRAG correctly routes a visual query to the image corpus (retrieving an actual photo of the event), while TextRAG and VideoRAG fail. Similarly, on HotpotQA and LVBench, it chooses the right granularity, retrieving documents or short clips.

Overall, this is a great paper showing the importance of considering modality and granularity in a RAG system.

Paper: arxiv.org/abs/2504.20734


More from @omarsar0

May 1
Small reasoning models are here!

Microsoft just released Phi-4-Mini-Reasoning to explore small reasoning language models for math.

Let's find out how this all works:
Phi-4-Mini-Reasoning

The paper introduces Phi-4-Mini-Reasoning, a 3.8B parameter small language model (SLM) that achieves state-of-the-art mathematical reasoning performance, rivaling or outperforming models nearly TWICE its size.
Unlocking Reasoning

They use a systematic, multi-stage training pipeline to unlock strong reasoning capabilities in compact models, addressing the challenges posed by their limited capacity.

It uses large-scale distillation, preference learning, and RL with verifiable rewards.
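"Verifiable rewards" for math can be made concrete with a checker that compares the model's final answer against the reference. The `\boxed{}` extraction convention below is an assumption for illustration; the thread doesn't show Phi-4-Mini-Reasoning's actual reward function.

```python
# Sketch of a verifiable reward signal for math RL: reward is 1.0 only when
# the completion's final boxed answer exactly matches the gold answer.

import re

def verifiable_reward(completion: str, gold: str) -> float:
    m = re.search(r"\\boxed\{([^}]*)\}", completion)
    if m is None:
        return 0.0  # no parseable final answer -> no reward
    return 1.0 if m.group(1).strip() == gold.strip() else 0.0
```

Because the reward is computed by a program rather than a learned judge, it can't be gamed by fluent-but-wrong reasoning chains.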
Apr 29
Building Production-Ready AI Agents with Scalable Long-Term Memory

Memory is one of the most challenging bits of building production-ready agentic systems.

Lots of goodies in this paper.

Here is my breakdown:
What does it solve?

It proposes a memory-centric architecture for LLM agents to maintain coherence across long conversations and sessions, solving the fixed-context window limitation.
The solution:

Introduces two systems: Mem0, a dense, language-based memory system, and Mem0g, an enhanced version with graph-based memory to model complex relationships.

Both aim to extract, consolidate, and retrieve salient facts over time efficiently.
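A toy sketch of that extract → consolidate → retrieve loop (my simplification; Mem0 uses an LLM for extraction and dense retrieval for lookup):

```python
# Minimal memory store illustrating the Mem0-style loop: salient facts are
# consolidated per subject so the store stays compact instead of growing
# with the raw transcript.

class MemoryStore:
    def __init__(self):
        self.facts = {}  # subject -> most recent fact about that subject

    def consolidate(self, subject: str, fact: str):
        # Newer facts about the same subject overwrite stale ones.
        self.facts[subject] = fact

    def retrieve(self, query: str):
        # Naive substring match stands in for dense/graph retrieval.
        q = query.lower()
        return [f for s, f in self.facts.items() if s.lower() in q]
```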
Apr 29
A Survey of Efficient LLM Inference Serving

This one provides a comprehensive taxonomy of recent system-level innovations for efficient LLM inference serving.

Great overview for devs working on inference.

Here is what's included:
Instance-Level Methods

Techniques like model parallelism (pipeline, tensor, context, and expert parallelism), offloading (e.g., ZeRO-Offload, FlexGen, TwinPilots), and request scheduling (inter- and intra-request) are reviewed...
Novel schedulers like FastServe, Prophet, and INFERMAX optimize decoding with predicted request lengths. KV cache optimization covers paging, reuse (lossless and semantic-aware), and compression (e.g., 4-bit quantization, compact encodings).
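To make the compression idea concrete, here is a simplified per-tensor 4-bit absmax quantizer of the kind the survey covers for KV caches. Real serving systems quantize per-channel or per-token with fused kernels; this is only a sketch.

```python
# Illustrative 4-bit quantization of a KV-cache block: values are scaled
# into the signed 4-bit range [-8, 7] and stored with one scale factor.

def quantize_4bit(values):
    scale = max(abs(v) for v in values) / 7.0 or 1.0  # avoid zero scale
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    # Reconstruction is lossy: each value is recovered to within ~scale/2.
    return [x * scale for x in q]
```

Storing 4 bits per entry instead of 16 cuts KV-cache memory roughly 4x, which is what lets servers batch more concurrent requests.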
Apr 27
265 pages of everything you need to know about building AI agents.

5 things that stood out to me about this report:
1. Human Brain and LLM Agents

It's great for understanding what differentiates LLM agents from human/brain cognition, and what inspiration we can draw from the way humans learn and operate.
2. Definitions

There is a nice, detailed, and formal definition for what makes up an AI agent. Most of the definitions out there are too abstract.
Apr 16
BREAKING: OpenAI introduces new o-series models

o3 and o4-mini

OpenAI claims these models can produce novel and useful ideas.

Here is all you need to know:
They are rolling out starting today in ChatGPT and the API.

These reasoning models have gotten much better at using tools to solve very complex tasks.
These models can navigate large codebases and generate novel ideas.

Tool use makes these models a lot more useful.

The o-series of models is now combined with their full suite of tools.
Apr 9
NEW: Google announces Agent2Agent

Agent2Agent (A2A) is a new open protocol that lets AI agents securely collaborate across ecosystems regardless of framework or vendor.

Here is all you need to know:
Universal agent interoperability

A2A allows agents to communicate, discover each other’s capabilities, negotiate tasks, and collaborate even if built on different platforms. This enables complex enterprise workflows to be handled by a team of specialized agents.
Built for enterprise needs

The protocol supports long-running tasks (e.g., supply chain planning), multimodal collaboration (text, audio, video), and secure identity/auth flows (matching OpenAPI-grade auth). Agents share JSON-based “Agent Cards” for capability discovery, negotiate UI formats, and sync task state with real-time updates.
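An Agent Card might look something like the sketch below; the field names are illustrative rather than the exact A2A schema, and the endpoint URL is hypothetical.

```python
# Hedged sketch of a JSON "Agent Card" that an A2A agent could publish for
# capability discovery. Fields are illustrative, not the official schema.

import json

agent_card = {
    "name": "inventory-planner",
    "description": "Plans restocking across warehouses",
    "url": "https://agents.example.com/inventory",  # hypothetical endpoint
    "capabilities": {"streaming": True, "long_running_tasks": True},
    "skills": [
        {"id": "forecast-demand", "modalities": ["text"]},
        {"id": "negotiate-supply", "modalities": ["text", "audio"]},
    ],
}

card_json = json.dumps(agent_card, indent=2)  # what a peer agent would fetch
```

A client agent fetches the card, checks the advertised skills and modalities, and then negotiates a task with the remote agent over the protocol.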
