There are a ton of techniques for LLM-based QA over your data, each suited to different use cases.
- Top-k semantic search
- Summarization
- Temporal recency weighting
- Document comparisons
- etc.
We propose a simple “router” that can unify every technique under a *single* interface 👇🧵
The “router” concept is at the core of the LLM agent/tool abstraction: given a set of tools, pick the best tool for the task.
Having a “router” (or even multiple) can be especially powerful for enhancing LLM retrieval + QA systems!
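Here's what a router can look like in code - a minimal sketch assuming a recent llama_index release, where `vector_index` and `summary_index` stand in for indices you've already built:

```python
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

# One tool per QA technique; the description is what the router reasons over.
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),
    description="Top-k semantic search over the docs",
)
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(),
    description="Summarization over the full document set",
)

# The selector picks the best tool for each incoming query.
router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[vector_tool, summary_tool],
)
response = router.query("Summarize the onboarding guide")
```

Distinct tool descriptions are the main lever here - they're all the selector sees when routing.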
LlamaIndex now makes it super easy for you to define custom retrieval for LLMs 💡
Hybrid search is a popular extension of semantic search; let’s walk through an example of how you can define your *own* (simplified) hybrid search with @gpt_index 👇
Way too much for one tweet thread but we’ll try! 🧵
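Here's one way a (simplified) hybrid retriever can look - a sketch following the library's custom-retriever pattern, assuming recent llama_index imports:

```python
from llama_index.core import QueryBundle
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore

class HybridRetriever(BaseRetriever):
    """Simplified hybrid search: intersect vector + keyword results."""

    def __init__(self, vector_retriever, keyword_retriever):
        self._vector_retriever = vector_retriever
        self._keyword_retriever = keyword_retriever
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> list[NodeWithScore]:
        vector_nodes = self._vector_retriever.retrieve(query_bundle)
        keyword_nodes = self._keyword_retriever.retrieve(query_bundle)
        # AND-style hybrid: keep only nodes surfaced by *both* retrievers
        keyword_ids = {n.node.node_id for n in keyword_nodes}
        return [n for n in vector_nodes if n.node.node_id in keyword_ids]
```

Swap the intersection for a union (or a weighted score merge) and you get a different hybrid flavor from the same skeleton.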
[1] There are BIG changes in the following core areas:
- 🔍Decoupling state from compute: separate index (state) from retriever/query (compute)
- 🧱Progressive Disclosure of Complexity: high-level API -> low-level API
- 🫙Principled Storage Abstractions
[2] Decoupling state from compute:
- An index manages state: abstracts away underlying storage, exposes view over processed data
- A Retriever fetches Nodes from an index
- A QueryEngine can synthesize a response from Nodes
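In code, the split looks roughly like this (a sketch assuming a recent llama_index release; `documents` stands in for your loaded data):

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import RetrieverQueryEngine

# State: the index abstracts away storage of processed Nodes
index = VectorStoreIndex.from_documents(documents)

# High-level API: one line to query
response = index.as_query_engine().query("What changed in v0.6?")

# Low-level API: compose a Retriever (fetch Nodes) + QueryEngine (synthesize)
retriever = index.as_retriever(similarity_top_k=5)
query_engine = RetrieverQueryEngine.from_args(retriever)
response = query_engine.query("What changed in v0.6?")
```

Both paths hit the same machinery - you only drop down to the low-level API when you need control.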
The paper proposes two strategies for structured extraction:
- ➡️Evaporate-Direct: LLM directly extracts values from docs (similar to @gpt_index SQL support)
- 🤖Evaporate-Code: LLM synthesizes a function, applies it to docs at scale
We implement a super basic version of Evaporate-Code in @gpt_index, with the following steps from the paper:
1. Schema Identification: extract attributes from docs
2. Function Synthesis: given attributes, synthesize functions
3. Run functions across docs to get structured data
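A minimal, library-agnostic sketch of steps 2-3 (the `llm_complete` helper and prompt are hypothetical stand-ins, not the actual @gpt_index implementation):

```python
def synthesize_extractor(attribute: str, sample_doc: str) -> str:
    """Step 2: ask the LLM to write an extraction function (one LLM call)."""
    prompt = (
        f"Write a Python function `extract(text: str)` that returns the value "
        f"of '{attribute}' from documents like this:\n{sample_doc[:500]}\n"
        "Return only the code."
    )
    return llm_complete(prompt)  # hypothetical LLM completion call

def apply_extractor(fn_code: str, docs: list[str]) -> list:
    """Step 3: run the synthesized function across all docs (zero LLM calls)."""
    namespace: dict = {}
    exec(fn_code, namespace)
    return [namespace["extract"](doc) for doc in docs]
```

The whole point of Evaporate-Code is the cost profile: a few LLM calls to write the function, then cheap plain-Python execution at scale.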
Each actor is represented by a simple “ConvoAgent” class, containing short-term and long-term memory.
Long-term memory uses our vector index. Short-term memory is just a deque.
Each agent can 1) store incoming messages 🗃️, and 2) generate messages 🗣️
@gpt_index makes this really easy to do with the following:
- Abstractions for storing/querying long-term memory
- Response synthesis over short-term and long-term memory *without* worrying about context limits
- Easily customizable prompts to simulate different settings
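Putting it together, a ConvoAgent can look roughly like this - a sketch where the class shape and the `llm_complete` helper are assumptions, not the exact implementation:

```python
from collections import deque
from llama_index.core import Document, VectorStoreIndex

class ConvoAgent:
    def __init__(self, name: str, st_memory_size: int = 10):
        self.name = name
        self.st_memory = deque(maxlen=st_memory_size)         # short-term: recent messages
        self.lt_memory = VectorStoreIndex.from_documents([])  # long-term: vector index

    def store_message(self, message: str) -> None:
        # 1) store incoming messages in both memories
        self.st_memory.append(message)
        self.lt_memory.insert(Document(text=message))

    def generate_message(self, topic: str) -> str:
        # 2) generate a message: retrieve relevant long-term memories,
        #    then combine them with recent short-term context
        lt_nodes = self.lt_memory.as_retriever().retrieve(topic)
        context = "\n".join(
            [n.node.get_content() for n in lt_nodes] + list(self.st_memory)
        )
        return llm_complete(  # hypothetical LLM completion call
            f"{context}\nYou are {self.name}. Respond to the topic: {topic}"
        )
```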