Jerry Liu
May 3 · 7 tweets · 3 min read
There are a ton of techniques for LLM-based QA over your data, each suited to a different use case.
- Top-k semantic search
- Summarization
- Temporal recency weighting
- Document comparisons
- etc.

We propose a simple “router” that can unify every technique under a *single* interface 👇🧵
A “router” concept is at the core of an LLM agent/tool abstraction: given a set of tools, pick the best tool for the task.

Having a “router” (or even multiple) can be especially powerful for enhancing LLM retrieval + QA systems!

We added this abstraction into @gpt_index.
We’ve defined a “RouterQueryEngine” that can take as input a set of underlying query engines.

Each query engine could be defined over a specific view of your data + use case.

E.g. a query engine for top-k vector search, one for summarization, one for doc comparisons.
Each query engine can be treated as a “tool” with a text description attached to it.

You can think of the “RouterQueryEngine” as a zero-shot engine that can reason over different query engines.

At query time, it can choose the right candidate query engine to execute.
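To make this concrete, here's a minimal sketch of wiring up a RouterQueryEngine over a vector index (top-k search) and a list index (summarization). Import paths and class names follow the 0.6.x-era API and may differ in your version; the "data" directory and tool descriptions are placeholders.

```python
from llama_index import GPTListIndex, GPTVectorStoreIndex, SimpleDirectoryReader
from llama_index.query_engine import RouterQueryEngine
from llama_index.tools.query_engine import QueryEngineTool

# Build two views over the same data.
documents = SimpleDirectoryReader("data").load_data()
vector_index = GPTVectorStoreIndex.from_documents(documents)  # top-k semantic search
list_index = GPTListIndex.from_documents(documents)           # summarization

# Wrap each query engine as a "tool" with a text description;
# the router reasons over these descriptions at query time.
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),
    description="Useful for answering specific questions over the documents.",
)
summary_tool = QueryEngineTool.from_defaults(
    query_engine=list_index.as_query_engine(response_mode="tree_summarize"),
    description="Useful for summarization questions over the documents.",
)

query_engine = RouterQueryEngine.from_defaults(
    query_engine_tools=[vector_tool, summary_tool],
)
print(query_engine.query("Give me a summary of these documents."))
```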
Tutorials/Notebooks:

- Ask questions over specific docs, but also compare/contrast documents 🍎🍊: gpt-index.readthedocs.io/en/stable/guid…
- Ask both summarization and semantic search questions 💡: github.com/jerryjliu/llam…
We’ve written about this before.

Composing an index (🌲 tree index, vector index) is one way of defining a “router” over the rest of your data.

But we believe the RouterQueryEngine provides a more general abstraction that should be easier to use.

Last but not least, we put the tutorials on @GoogleColab so they’re easy to access - come check them out!

Unified query interface: colab.research.google.com/drive/1KH8XtRi…

Joint QA / Summarization: colab.research.google.com/drive/1Asq_obA…

More from @jerryjliu0

May 2
At a high level, retrieval-augmented generation (RAG) involves the interplay between retrieval and synthesis over your data.

It’s helpful to think about the base abstractions here (from data to compute), since they help us generalize RAG beyond existing approaches.

A short 🧵
Some basic background:

A “retrieval” model takes in a natural language input and retrieves texts from the knowledge base.

A synthesis model takes in those texts and calls the LLM to synthesize a response.

The data itself is a set of text chunks (Nodes).

An index is a view over this data.
Let’s start with the “data” layer. A common paradigm is to process source documents into text chunks with a text splitter.

This data can be stored in a KV store or a vector store.
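As a rough sketch of that data layer in @gpt_index (0.6.x-era API; names may differ in your version, and "data" is a placeholder directory):

```python
from llama_index import SimpleDirectoryReader, StorageContext
from llama_index.node_parser import SimpleNodeParser

# Process source documents into text chunks (Nodes).
documents = SimpleDirectoryReader("data").load_data()
nodes = SimpleNodeParser().get_nodes_from_documents(documents)

# Persist the chunks in a KV-backed document store.
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)
```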
Apr 30
An implication of designing any LLM app over your data is that you’re adding “state” (data) to a “stateless” module (the LLM).

Stateful apps are hard and require good storage abstractions.

We’ve thought hard about this with @gpt_index 🧵
Hacking together an initial retrieval-augmented LLM app is super easy: take some documents, chunk them up, put them in a vector db.

But thinking about production data requirements makes this more challenging.

It’s one thing to build a demo over 5 docs. What about GBs of data across different sources?
Some questions:

How do we store source Documents? Once we split them, how do we store the text chunks?

How do we store metadata, including indices over your data?

How do we store vectors with vector DBs?
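A hedged sketch of how those pieces fit together (0.6.x-era API, using the simple in-memory defaults; swap in real backends as needed, and assume `documents` is loaded as above):

```python
from llama_index import GPTVectorStoreIndex, StorageContext
from llama_index.storage.docstore import SimpleDocumentStore
from llama_index.storage.index_store import SimpleIndexStore
from llama_index.vector_stores import SimpleVectorStore

# Each storage concern gets its own abstraction: a docstore for text chunks,
# an index store for index metadata, and a vector store for embeddings.
storage_context = StorageContext.from_defaults(
    docstore=SimpleDocumentStore(),
    index_store=SimpleIndexStore(),
    vector_store=SimpleVectorStore(),
)
index = GPTVectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
```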
Apr 29
LlamaIndex now makes it super easy for you to define custom retrieval for LLMs 💡

Hybrid search is a popular extension of semantic search; let’s walk through an example of how you can define your *own* (simplified) hybrid search with @gpt_index 👇

github.com/jerryjliu/llam…
At its core, hybrid search is a mix of keyword lookup and semantic search.

We show how you can define a custom retriever that can take the intersection of retrieved nodes from the two techniques above.
First, we split a document into nodes and add the nodes to the docstore.

We then define two indexes over the data: a keyword lookup index and a vector index.

Note: defining multiple indexes does not duplicate the data.
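A condensed sketch of such a custom retriever, loosely based on the linked example (0.6.x-era API; attribute names like `get_doc_id` vary across versions, and `nodes` is assumed to come from the data-layer step above):

```python
from llama_index import (
    GPTSimpleKeywordTableIndex,
    GPTVectorStoreIndex,
    QueryBundle,
    StorageContext,
)
from llama_index.retrievers import (
    BaseRetriever,
    KeywordTableSimpleRetriever,
    VectorIndexRetriever,
)

# Both indexes share one docstore, so the nodes are not duplicated.
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)
vector_index = GPTVectorStoreIndex(nodes, storage_context=storage_context)
keyword_index = GPTSimpleKeywordTableIndex(nodes, storage_context=storage_context)

class CustomRetriever(BaseRetriever):
    """Simplified hybrid search: intersect vector and keyword results."""

    def __init__(self, vector_retriever, keyword_retriever):
        self._vector_retriever = vector_retriever
        self._keyword_retriever = keyword_retriever

    def _retrieve(self, query_bundle: QueryBundle):
        vector_nodes = self._vector_retriever.retrieve(query_bundle)
        keyword_nodes = self._keyword_retriever.retrieve(query_bundle)
        keyword_ids = {n.node.get_doc_id() for n in keyword_nodes}
        # Keep only the nodes retrieved by *both* techniques.
        return [n for n in vector_nodes if n.node.get_doc_id() in keyword_ids]

retriever = CustomRetriever(
    VectorIndexRetriever(index=vector_index, similarity_top_k=5),
    KeywordTableSimpleRetriever(index=keyword_index),
)
```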
Apr 29
It’s official…LlamaIndex 0.6.0.alpha1 is out. And it’s basically a completely new product 🔥

We fundamentally rewrote two main areas:
- 🔍Query interface
- 🗃️Storage abstractions

Full blog post: medium.com/@jerryjliu98/l…

Way too much for one tweet thread but we’ll try! 🧵
[1] There are BIG changes in the following core areas:
- 🔍Decoupling state from compute: separate index (state) from retriever/query (compute)
- 🧱Progressive Disclosure of Complexity: high-level API -> low-level API
- 🫙Principled Storage Abstractions
[2] Decoupling state from compute:
- An index manages state: abstracts away underlying storage, exposes view over processed data
- A Retriever fetches Nodes from an index
- A QueryEngine can synthesize a response from Nodes
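In code, the decoupling looks roughly like this (a sketch against the 0.6.x API; `nodes` assumed to come from your processing pipeline):

```python
from llama_index import GPTVectorStoreIndex
from llama_index.query_engine import RetrieverQueryEngine

index = GPTVectorStoreIndex(nodes)  # state: a view over processed data

# High-level API: one line.
response = index.as_query_engine().query("What does the doc say about X?")

# Low-level API: compose the compute explicitly.
retriever = index.as_retriever(similarity_top_k=5)        # fetches Nodes
query_engine = RetrieverQueryEngine.from_args(retriever)  # synthesizes a response
response = query_engine.query("What does the doc say about X?")
```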
Apr 26
Evaporate (@simran_s_arora et al.) is an awesome paper on structured data extraction 🙌

Key insight: “function extraction”; synthesize an “extract” fn with the LLM, then apply it across your data at scale!

arxiv.org/abs/2304.09433

We added an initial module in @gpt_index! 🛠️👇
The paper proposes two strategies for structured extraction:
- ➡️Evaporate-Direct: LLM directly extracts values from docs (similar to @gpt_index SQL support)
- 🤖Evaporate-Code: LLM synthesizes fn, applies it to docs at scale
We implement a super basic version of Evaporate-Code in @gpt_index, with the following steps from the paper:

1. Schema Identification: extract attributes from docs
2. Function Synthesis: Given attributes, synthesize functions
3. Run functions across docs to get structured data
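To illustrate the core idea behind Evaporate-Code (a conceptual sketch only, not the actual @gpt_index module API; `llm_complete` is a hypothetical completion callable):

```python
def synthesize_extract_fn(llm_complete, attribute: str, sample_docs: list[str]):
    """Ask the LLM to write an extraction function once, from a few samples."""
    prompt = (
        f"Write a Python function `extract(text: str) -> str` that returns "
        f"the value of '{attribute}' from documents like these:\n\n"
        + "\n---\n".join(sample_docs[:3])
    )
    code = llm_complete(prompt)  # one (expensive) LLM call
    namespace: dict = {}
    exec(code, namespace)        # trusting generated code; sandbox in practice
    return namespace["extract"]

# The synthesized function then runs over the whole corpus with zero further
# LLM calls, which is what makes the approach cheap at scale:
#   extract = synthesize_extract_fn(llm_complete, "publication date", docs)
#   values = [extract(doc) for doc in docs]
```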
Apr 24
💬 Conversational Agent Simulations 🤖

Over this weekend I hacked on getting AI agents to talk in different settings:
🥂 First date
🥼 Doctor checkup
🧑‍💻 Software eng interview

Used some core @gpt_index data structs. Check it out on Llama Lab 🧪! 👇

github.com/run-llama/llam…
Each actor is represented by a simple “ConvoAgent” class, containing short-term and long-term memory.

Long-term memory uses our vector index. Short-term memory is just a deque.

Each agent can 1) store incoming messages 🗃️, and 2) generate messages 🗣️
@gpt_index makes this really easy to do with the following:
- Abstractions for storing/querying long-term memory
- Synthesis over short-term and long-term memory *without* worrying about context limits
- Easily customizable prompts to present different settings
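A minimal sketch of that memory design (illustrative names, not the exact llama_lab ConvoAgent class):

```python
from collections import deque
from llama_index import Document, GPTVectorStoreIndex

class ConvoAgent:
    def __init__(self, name: str, st_memory_size: int = 10):
        self.name = name
        self.st_memory = deque(maxlen=st_memory_size)  # short-term: recent messages
        self.lt_memory = GPTVectorStoreIndex([])       # long-term: vector index

    def store_message(self, message: str) -> None:
        self.st_memory.append(message)
        self.lt_memory.insert(Document(message))  # older context stays retrievable

    def generate_message(self) -> str:
        # Query long-term memory with the recent exchange, so the prompt stays
        # within context limits no matter how long the conversation runs.
        recent = " ".join(self.st_memory)
        response = self.lt_memory.as_query_engine().query(
            f"Given the recent conversation: {recent}\nRespond as {self.name}."
        )
        return str(response)
```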