Hacking together an initial retrieval-augmented LLM app is super easy: take some documents, chunk them up, put them in a vector db.
But thinking about production data requirements makes this much more challenging.
It’s one thing to build a demo over 5 docs. What about GBs of data from different sources?
Some questions:
How do we store source Documents? Once we split them, how do we store the text chunks?
How do we store metadata, including indices over our data?
How do we store vectors with vector db’s?
The new release of @gpt_index (0.6.0) takes a stab at addressing this:
- We define an underlying KV store abstraction
- We can store Nodes (raw data chunks) and indices in KV store
- In parallel, we maintain vector store abstractions
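Rough sketch of how these pieces fit together (0.6.x-era names; exact signatures may differ by version):

```python
# rough sketch, assuming the 0.6.x-era API; import paths/signatures may differ by version
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, StorageContext

documents = SimpleDirectoryReader("./data").load_data()

# StorageContext bundles the docstore + index store (both sit on the KV store
# abstraction) and the vector store; defaults are simple in-memory stores
storage_context = StorageContext.from_defaults()

# Nodes (raw chunks) go into the docstore, index metadata into the index store,
# embeddings into the vector store
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)
```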
There is now a rich ecosystem of vector db providers. Many vector dbs (e.g. @pinecone, @trychroma, @weaviate_io) allow storage of both vectors and docs.
For now they’re separate from our docstore; we have a TODO to explore how the two overlap.
A key concept is to decouple the raw data from the indexes that we define at the top-level.
An Index in @gpt_index is just a lightweight view over your data, each solving a different retrieval use case.
You can/should define multiple indices over your data.
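Continuing the sketch above, here’s roughly what multiple indices over one shared docstore could look like (names from the 0.6.x-era docs; may differ by version):

```python
from llama_index import GPTVectorStoreIndex, GPTSimpleKeywordTableIndex
from llama_index.node_parser import SimpleNodeParser

# parse source documents into Nodes and register them in the shared docstore
nodes = SimpleNodeParser().get_nodes_from_documents(documents)
storage_context.docstore.add_documents(nodes)

# each index is just a lightweight view over the same Nodes,
# solving a different retrieval use case
vector_index = GPTVectorStoreIndex(nodes, storage_context=storage_context)
keyword_index = GPTSimpleKeywordTableIndex(nodes, storage_context=storage_context)
```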
Interested in contributing? We’d LOVE to have your help in building way more document store abstractions: S3, GCS, HDFS, and more.
LlamaIndex now makes it super easy for you to define custom retrieval for LLMs 💡
Hybrid search is a popular extension of semantic search; let’s walk through an example of how you can define your *own* (simplified) hybrid search with @gpt_index 👇
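Here’s a simplified sketch of what that custom hybrid retriever could look like (0.6.x-era classes; the node-id accessor and import paths may differ across versions):

```python
# simplified sketch: keep only nodes returned by BOTH vector + keyword retrieval
from llama_index.retrievers import (
    BaseRetriever,
    VectorIndexRetriever,
    KeywordTableSimpleRetriever,
)

class HybridRetriever(BaseRetriever):
    """AND-style hybrid search over a vector index and a keyword index."""

    def __init__(self, vector_retriever, keyword_retriever):
        self._vector_retriever = vector_retriever
        self._keyword_retriever = keyword_retriever

    def _retrieve(self, query_bundle):
        vector_nodes = self._vector_retriever.retrieve(query_bundle)
        keyword_nodes = self._keyword_retriever.retrieve(query_bundle)
        # node-id accessor varies across versions (get_doc_id() vs node_id)
        keyword_ids = {n.node.get_doc_id() for n in keyword_nodes}
        # AND semantics: intersect the two result sets
        return [n for n in vector_nodes if n.node.get_doc_id() in keyword_ids]

retriever = HybridRetriever(
    VectorIndexRetriever(index=vector_index),
    KeywordTableSimpleRetriever(index=keyword_index),
)
```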
Way too much for one tweet thread but we’ll try! 🧵
[1] There are BIG changes in the following core areas:
- 🔍Decoupling state from compute: separate index (state) from retriever/query (compute)
- 🧱Progressive Disclosure of Complexity: high-level API -> low-level API
- 🫙Principled Storage Abstractions
[2] Decoupling state from compute:
- An Index manages state: it abstracts away the underlying storage and exposes a view over your processed data
- A Retriever fetches Nodes from an index
- A QueryEngine can synthesize a response from Nodes
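In code, that split looks roughly like this (0.6.0 API; `index` is any built index):

```python
# state lives in the index; retrievers/query engines are the compute on top of it
retriever = index.as_retriever()        # compute: fetches Nodes from the index (state)
nodes = retriever.retrieve("my query")

query_engine = index.as_query_engine()  # compute: retrieval + response synthesis
response = query_engine.query("my query")
```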
The Evaporate paper proposes two strategies for structured extraction:
- ➡️Evaporate-Direct: LLM directly extracts values from docs (similar to @gpt_index SQL support)
- 🤖Evaporate-Code: LLM synthesizes a function, applies it to docs at scale
We implement a super basic version of Evaporate-Code in @gpt_index, following these steps from the paper:
1. Schema identification: extract attributes from docs
2. Function synthesis: given the attributes, synthesize extraction functions
3. Run the functions across docs to get structured data
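A purely illustrative sketch of the Evaporate-Code pattern (NOT the actual @gpt_index implementation; assumes a hypothetical `llm(prompt) -> str` callable):

```python
def evaporate_code(docs, llm):
    # 1. schema identification: ask the LLM which attributes appear in a sample doc
    attributes = llm(
        f"List the attributes (one per line) present in this document:\n{docs[0]}"
    ).splitlines()

    # 2. function synthesis: ask the LLM to write a cheap Python extractor per attribute
    fns = {}
    for attr in attributes:
        code = llm(
            f"Write a Python function `extract(text)` that returns the value of "
            f"'{attr}' from a document like:\n{docs[0]}"
        )
        namespace = {}
        exec(code, namespace)  # caution: sandbox LLM-generated code in practice
        fns[attr] = namespace["extract"]

    # 3. run the synthesized functions across all docs (no further LLM calls needed)
    return [{attr: fn(doc) for attr, fn in fns.items()} for doc in docs]
```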
Each actor is represented by a simple “ConvoAgent” class, containing short-term and long-term memory.
Long-term memory uses our vector index. Short-term memory is just a deque.
Each agent can 1) store incoming messages 🗃️, and 2) generate messages 🗣️
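A rough, hypothetical sketch of such a ConvoAgent (not the exact implementation; `llm` is an assumed `prompt -> str` callable, and index class names may differ by version):

```python
from collections import deque
from llama_index import Document, GPTVectorStoreIndex

class ConvoAgent:
    def __init__(self, name: str, llm, st_memory_size: int = 10):
        self.name = name
        self.llm = llm                                            # hypothetical prompt -> str callable
        self.st_memory = deque(maxlen=st_memory_size)             # short-term memory: recent messages
        self.lt_memory = GPTVectorStoreIndex.from_documents([])   # long-term memory: vector index

    def add_message(self, message: str) -> None:
        # 1) store incoming messages in both memories
        self.st_memory.append(message)
        self.lt_memory.insert(Document(text=message))

    def generate_message(self) -> str:
        # 2) generate a reply: recent context + retrieved long-term memories
        recent = "\n".join(self.st_memory)
        relevant = self.lt_memory.as_query_engine().query(recent)
        return self.llm(
            f"Recent conversation:\n{recent}\n\nRelevant memories:\n{relevant}\n\n{self.name}:"
        )
```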
@gpt_index makes this really easy to do with the following:
- Abstractions for storing/querying long-term memory
- Synthesis over short-term and long-term memory *without* worrying about context limits
- Easily customizable prompts to present different settings
@gpt_index makes AGI projects straightforward to build. 🧵
At a high level, @gpt_index is a great tool for AGI dev 🛠️
📕 Index a knowledge corpus
✅ Index a set of tasks
🧠 Use it as a memory module
📝 Synthesize responses from data.
Here’s an example w/ llama_agi. Suppose you want to set a goal: “What steps can I take to live longer?”
llama_agi can repeatedly reason over and store tasks until the goal is complete!
Sample tasks: develop a plan for medical advice, research effects of diet, research effects of stress, etc.
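A heavily simplified, hypothetical sketch of that loop (NOT the actual llama_agi API; `llm` is an assumed `prompt -> str` callable), using a vector index as the memory module:

```python
from llama_index import Document, GPTVectorStoreIndex

def run_agi(objective: str, llm, max_iters: int = 10):
    tasks = [f"Make a plan to achieve: {objective}"]          # task list
    memory = GPTVectorStoreIndex.from_documents([])           # long-term memory of results

    for _ in range(max_iters):
        if not tasks:
            break
        task = tasks.pop(0)

        # pull relevant prior results from memory as context for this task
        context = memory.as_query_engine().query(task)
        result = llm(f"Objective: {objective}\nTask: {task}\nContext: {context}\nResult:")
        memory.insert(Document(text=f"{task}: {result}"))

        # let the LLM create / reprioritize tasks based on the latest result
        new_tasks = llm(
            f"Objective: {objective}\nLast result: {result}\n"
            f"Remaining tasks: {tasks}\nList any new tasks, one per line:"
        ).splitlines()
        tasks.extend(t.strip() for t in new_tasks if t.strip())

    return memory
```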