Dumping updates from I/O in this 🧵
Gemini 1.5 Flash - a faster Gemini with a 1M-token context window
Nov 7, 2023 • 7 tweets • 2 min read
I’ve seen a few threads on “Do we still need LlamaIndex / framework” given @OpenAI dev day updates
Short answer: Yes.
Some high-level takes addressing two points 🧵: 1) why vendor-agnostic LLM orchestration matters, and 2) how good the OpenAI API is
[1] Why LLM orchestration
If you’re completely locked into a single ecosystem (e.g. OpenAI, or Anthropic), there’s less need for orchestration.
But the whole point of an orchestration framework is that you can easily combine any modules w/o writing a ton of boilerplate.
Oct 5, 2023 • 4 tweets • 2 min read
I’m excited for @OpenAI’s new support for function calling fine-tuning! (@stevenheidel)
Help gpt-3.5 better structure outputs + reason/plan 🤖
Dropping a day-0 release supporting fn fine-tuning + distilling GPT-4 w/ Pydantic in @llama_index ⚡️👇: github.com/run-llama/llam…
Our default way of using @OpenAI function calling is through our Pydantic programs: simply specify the Pydantic schema, and we use the endpoint to extract a structured output with that schema.
We can now log these results and collect them as a dataset.
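Here’s roughly what that flow looks like (a sketch assuming the @llama_index API of this era; module paths have moved in later versions, and the Song schema is just an illustration):

```python
# Sketch: extract a structured output via a Pydantic program.
from pydantic import BaseModel
from llama_index.program import OpenAIPydanticProgram


class Song(BaseModel):
    """A song with a title and a length in seconds."""
    title: str
    length_seconds: int


program = OpenAIPydanticProgram.from_defaults(
    output_cls=Song,
    prompt_template_str="Generate an example song from the movie {movie_name}.",
    verbose=True,
)
song = program(movie_name="The Shining")  # returns a Song instance
```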
Sep 13, 2023 • 5 tweets • 2 min read
Fine-tuning embeddings is a great way to improve RAG performance - it’s even better if you can freeze your doc embeddings ❄️
We now allow you to fine-tune an arbitrary transformation to your queries: linear, deep NN, or custom!
Full guide showing all 3: gpt-index.readthedocs.io/en/latest/exam…
For instance, we can fine-tune a 2-layer neural net that takes in the query embedding as input and outputs a transformed embedding.
The best part is that this will work on top of any black-box embedding model; anything from sentence_transformers to text-embedding-ada-002.
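A minimal sketch of the idea in plain PyTorch (not the exact @llama_index finetuning API): a 2-layer adapter transforms query embeddings while the doc embeddings stay frozen ❄️.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QueryAdapter(nn.Module):
    """2-layer NN that maps a query embedding to a transformed embedding."""

    def __init__(self, dim: int, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, q: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the adapter close to identity at init.
        return q + self.net(q)


def train_step(adapter, opt, query_emb, pos_doc_emb):
    # Pull the transformed query toward its positive (frozen) doc embedding.
    opt.zero_grad()
    loss = 1 - F.cosine_similarity(adapter(query_emb), pos_doc_emb).mean()
    loss.backward()
    opt.step()
    return loss.item()
```

Because only the adapter trains, the doc index never needs re-embedding - that’s the whole point of freezing.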
Sep 4, 2023 • 4 tweets • 2 min read
Here’s a simple trick to improve retrieval for RAG 💡:
Embed “references” to each text chunk instead of the chunk itself (e.g. smaller chunks, summaries). Leads to ~10-20% improvement 📈
Possible with @llama_index recursive retrieval. 💫
Full guide 📗: gpt-index.readthedocs.io/en/latest/exam…
The core intuition is related to the idea of decoupling embeddings from the raw text chunks (we’ve tweeted about this).
By embedding smaller chunks, summaries, or questions, we can first fetch relevant references -> then fetch original chunk for LLM synthesis.
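A rough sketch of the setup (assumed @llama_index API of this era; paths may differ in later versions): small child chunks are embedded as IndexNodes that point back to the parent chunk.

```python
from llama_index import VectorStoreIndex
from llama_index.retrievers import RecursiveRetriever
from llama_index.schema import IndexNode, TextNode

parent = TextNode(text="...long original chunk...", id_="parent-0")
children = [
    IndexNode(text=sub, index_id=parent.node_id)  # reference -> parent
    for sub in ["small sub-chunk 1", "small sub-chunk 2"]
]

index = VectorStoreIndex(children)  # only the small references get embedded
retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": index.as_retriever(similarity_top_k=2)},
    node_dict={parent.node_id: parent},  # retrieval resolves back to the parent
)
nodes = retriever.retrieve("my query")
```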
Sep 3, 2023 • 10 tweets • 4 min read
We fine-tuned a gpt-3.5 ReAct agent to be better at chain-of-thought 💭
A big gripe with gpt-3.5-turbo is that its reasoning is worse than gpt-4, causing unreliable agent behavior 🤔
By distilling gpt-4, these agents can do better search/retrieval 🔥🧵
Example Q: “What is the total fair value of Uber's financial assets as of March 31, 2022?”
gpt-3.5 (left) returns an inaccurate response. Finetuning (right) keeps CoT going until it finds the actual answer.
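Rough sketch of the distillation loop (assumed @llama_index callback API of this era): log the GPT-4 agent’s calls with a fine-tuning handler, then export them as gpt-3.5 training data.

```python
from llama_index.callbacks import CallbackManager, OpenAIFineTuningHandler
from llama_index.llms import OpenAI

handler = OpenAIFineTuningHandler()
gpt4_llm = OpenAI(model="gpt-4", callback_manager=CallbackManager([handler]))

# ... run a ReAct agent backed by gpt4_llm over your training questions ...

handler.save_finetuning_events("finetuning_events.jsonl")  # feed to OpenAI finetuning
```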
Aug 25, 2023 • 5 tweets • 2 min read
One major way to improve your RAG system is to fine-tune your embedding model ⚙️
We’ve created a full repo/guide (@disiok) on fine-tuning embeddings over any unstructured text (no labels needed) 🌟
5-10% improvement 📈 in evals + runs on your MacBook!
github.com/run-llama/fine…
This is motivated by the insight that pre-trained embedding models may not be suited to your specific retrieval task.
Fine-tuning solves that: train the model on positive example text pairs from your data domain.
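Sketch of the repo’s flow (assumed @llama_index finetuning helpers of this era; `train_nodes` is your parsed chunks, and the model id is just one common choice): synthesize (question, chunk) pairs with an LLM, then fine-tune a local model.

```python
from llama_index.finetuning import (
    SentenceTransformersFinetuneEngine,
    generate_qa_embedding_pairs,
)

train_dataset = generate_qa_embedding_pairs(train_nodes)  # LLM-generated "labels"
finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="BAAI/bge-small-en",
    model_output_path="finetuned_model",
)
finetune_engine.finetune()
embed_model = finetune_engine.get_finetuned_model()  # drop into your RAG stack
```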
Aug 23, 2023 • 5 tweets • 2 min read
We successfully made gpt-3.5-turbo output GPT-4 quality responses in an e2e RAG system 🔥
Stack: automated training dataset creation in @llama_index + new @OpenAI finetuning + ragas (@Shahules786) eval
We did this in a day ⚡️
Full Colab notebook here: colab.research.google.com/drive/1vWeJBXd…
The key intuition: gpt-3.5-turbo (even after fine-tuning) is much cheaper than GPT-4 on a marginal-token basis. 💡
If we’re willing to incur a fixed cost through finetuning, then we can distill GPT-4 outputs to gpt-3.5-turbo in a cheaper package over your data.
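The last mile is just a fine-tune job (sketch with the openai 0.x SDK of that era; newer SDKs use `client.fine_tuning` instead): upload the distilled GPT-4 events and kick off a gpt-3.5-turbo fine-tune.

```python
import openai

uploaded = openai.File.create(
    file=open("finetuning_events.jsonl", "rb"), purpose="fine-tune"
)
job = openai.FineTuningJob.create(
    training_file=uploaded["id"], model="gpt-3.5-turbo"
)
print(job["id"])  # poll this job until the fine-tuned model is ready
```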
Aug 18, 2023 • 4 tweets • 2 min read
Adding document metadata helps to improve retrieval for your LLM chatbots 📣
To drive home the point, @wenqi_glantz shows concrete examples in her latest blog w/ @llama_index metadata extractors 🧨
Only with metadata do you get the correct answer ✅. betterprogramming.pub/building-produ…
Here’s another example: given the question “what percentage of gov revenue came from taxes”, top-k retrieval over raw text chunks doesn’t return the right answer.
Adding metadata, however, yields the correct answer.
Aug 8, 2023 • 6 tweets • 2 min read
There are too many options for building information retrieval:
- Chunk size
- Query strategy (top-k, hybrid, MMR)
Idea: What if we ensembled *all of the options* + let an LLM prune the pooled results? 👇
✅ More general retriever (though more 💰)
✅ Benchmark diff strategies
The key intuition here is that different retrieval parameters (chunk size, top-k, etc.) work better in different situations.
If we “ensemble” the results, we can have an LLM/reranker decide relevance for each query.
This creates a more general retrieval interface.
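Hedged sketch of the ensemble (module paths follow older @llama_index versions, and `documents` is assumed to be your already-loaded docs): build retrievers at several chunk sizes, pool their results, prune with an LLM reranker.

```python
from llama_index import QueryBundle, VectorStoreIndex
from llama_index.indices.postprocessor import LLMRerank
from llama_index.node_parser import SimpleNodeParser

retrievers = []
for chunk_size in (128, 256, 512, 1024):
    parser = SimpleNodeParser.from_defaults(chunk_size=chunk_size)
    nodes = parser.get_nodes_from_documents(documents)
    retrievers.append(VectorStoreIndex(nodes).as_retriever(similarity_top_k=4))

query = "my question"
# Pool + dedupe results across all retrievers, then let the LLM decide relevance.
pooled = {n.node.node_id: n for r in retrievers for n in r.retrieve(query)}
reranked = LLMRerank(top_n=5).postprocess_nodes(
    list(pooled.values()), QueryBundle(query)
)
```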
Aug 7, 2023 • 5 tweets • 3 min read
The SpanMarker model (by tomaarsen on @huggingface) is incredible for NER 🖋️
Can use it on top of powerful LMs like BERT/RoBERTa
Use it to automate entity extraction in @llama_index -> increase the retrieval performance of your RAG system! 🚀👇
gpt-index.readthedocs.io/en/latest/exam…
The model can extract a lot of entities from any unstructured text: PER (person), ORG (organization), LOC (location), and much more.
This means that it’s perfectly suited for extracting entities from documents and setting them as metadata for each document.
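Using span_marker directly is a couple of lines (the checkpoint name below is one public example that emits PER/ORG/LOC labels):

```python
from span_marker import SpanMarkerModel

model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-mbert-base-multinerd")
entities = model.predict(
    "Amelia Earhart flew her Lockheed Vega 5B across the Atlantic."
)
# -> list of dicts with "span", "label" (e.g. PER, ORG, LOC), and "score"
for ent in entities:
    print(ent["span"], ent["label"])
```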
Jul 21, 2023 • 4 tweets • 2 min read
A big failure case I've found when building data agents is that the LLM fails to "use" the tool with the right parameters:
🚫 Wrong type
🚫 Missing required params
Protip: Make your tools tolerant of partial inputs - easier for the agent to use and increases reliability! 👇
Take the example above of having the agent use a tool to draft an email 📤
The Gmail API *requires* all of the above values when updating a draft, but you can build the function interface such that everything else (`to`, `subject`, `message`) can be inferred from the draft_id.
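A hypothetical sketch of what that tolerant interface looks like (`fetch_draft` / `gmail_update_draft` are illustrative helpers, not a real API):

```python
from typing import Optional


def update_draft(
    draft_id: str,
    to: Optional[str] = None,
    subject: Optional[str] = None,
    message: Optional[str] = None,
) -> str:
    """Update a Gmail draft. Only draft_id is required; any missing
    field is re-used from the existing draft."""
    draft = fetch_draft(draft_id)  # hypothetical helper wrapping the Gmail API
    return gmail_update_draft(    # hypothetical helper - fills required fields
        draft_id,
        to=to or draft["to"],
        subject=subject or draft["subject"],
        message=message or draft["message"],
    )
```

The agent can now call the tool with just a `draft_id` and whatever fields it’s confident about - no more failed calls from missing params.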
Jul 14, 2023 • 4 tweets • 2 min read
There have been two main stacks for LLM-based QA over data:
- Unstructured: vector db + RAG
- Structured: Text-to-SQL
We now have full native support for a THIRD stack (s/o @wey_gu): build knowledge graph w/ graph db, then query with text-to-Cypher! 🕸️🔎
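A rough sketch of the query side (assumed @llama_index API of this era; NebulaGraph shown since that’s @wey_gu’s integration, and its nGQL is a Cypher dialect):

```python
from llama_index import ServiceContext, StorageContext
from llama_index.graph_stores import NebulaGraphStore
from llama_index.query_engine import KnowledgeGraphQueryEngine

graph_store = NebulaGraphStore(space_name="my_space")
storage_context = StorageContext.from_defaults(graph_store=graph_store)

query_engine = KnowledgeGraphQueryEngine(
    storage_context=storage_context,
    service_context=ServiceContext.from_defaults(),
    verbose=True,  # prints the generated graph query
)
response = query_engine.query("Tell me about Peter Quill?")
```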
You can use the following extractors (sketch after this list):
- `SummaryExtractor`
- `QuestionsAnsweredExtractor`
- `TitleExtractor`
- `KeywordExtractor`
- `MetadataFeatureExtractor`
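A minimal sketch of wiring these into node parsing (assumed module paths for this era of @llama_index; they’ve moved in later versions, and `documents` is your loaded docs):

```python
from llama_index.node_parser import SimpleNodeParser
from llama_index.node_parser.extractors import (
    KeywordExtractor,
    MetadataExtractor,
    QuestionsAnsweredExtractor,
    TitleExtractor,
)

metadata_extractor = MetadataExtractor(
    extractors=[
        TitleExtractor(nodes=5),
        QuestionsAnsweredExtractor(questions=3),
        KeywordExtractor(keywords=5),
    ]
)
node_parser = SimpleNodeParser.from_defaults(metadata_extractor=metadata_extractor)
nodes = node_parser.get_nodes_from_documents(documents)  # nodes carry metadata
```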
Jul 1, 2023 • 7 tweets • 3 min read
What are the main use cases for knowledge graphs/graph db’s and how can they play a role in LLM RAG stacks?
Thread on some key insights that @wey_gu shared on our webinar 🧵
There’s a TON of stuff in the slides, and it’d be a crime not to share: siwei.io/talks/graph-ra…
[1] Graph DBs are much better suited for queries that require complex joins over data.
Much of the world’s knowledge is graph-based 🕸️🌎
Use cases: Fraud Detection, Social Networks, and also AI-enabled use cases like knowledge graphs (NLP, LLMs)
Jun 21, 2023 • 9 tweets • 4 min read
With @OpenAI function agents, it’s time to think about best practices for designing Tool API interfaces.
Simple Tool API - takes in a simple type (str, int, float)
Complex Tool API - takes in a complex type (Pydantic obj)
There are pros and cons to both! 🧵
In a Simple Tool API - the Tool function signature takes in a simple type, like a string (or int)
Pro: this makes it WAY easier for agents to call! Much easier for agents to infer the value of simple types, especially natural language strings.
Easier for “dumber” LLMs to use.
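Sketch contrasting the two styles with FunctionTool (assumed-era @llama_index API; `run_search` is a hypothetical backend call):

```python
from pydantic import BaseModel
from llama_index.tools import FunctionTool


# Simple Tool API: one natural-language string - easy for weaker LLMs to fill in.
def search(query: str) -> str:
    """Search the knowledge base for a query."""
    return run_search(query)  # hypothetical backend call


simple_tool = FunctionTool.from_defaults(fn=search)


# Complex Tool API: a Pydantic object - more structure, but harder to infer.
class SearchRequest(BaseModel):
    query: str
    top_k: int = 5


def structured_search(request: SearchRequest) -> str:
    """Search with structured parameters."""
    return run_search(request.query, top_k=request.top_k)


complex_tool = FunctionTool.from_defaults(fn=structured_search)
```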
Jun 20, 2023 • 7 tweets • 3 min read
We’ve now made agents 🤖 capable of sophisticated query planning over your data, with @OpenAI Function API + Pydantic + @jxnlco’s intellect 🔥
Agents take in a full Pydantic graph via the fn signature of our query plan Tool - the plan is then executed over sub-tools.
We know that @OpenAI’s function API can extract structured data. But @jxnlco’s work takes it to the next-level: recursive Pydantic objects 🪆
This allows the API to infer far more complex schemas. We were super inspired to add an in-house guide on this 📗👇
gpt-index.readthedocs.io/en/latest/exam…
The example we incorporated from @jxnlco parses a directory tree: a tree contains recursive Node objects representing files/folders.
The key trick: wrap a recursive Pydantic model (Node) with a non-recursive one (DirectoryTree) to work with the function API.
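Sketch of the trick (Pydantic v1 style, matching that era):

```python
from typing import List
from pydantic import BaseModel


class Node(BaseModel):
    """A file or folder; folders contain child Nodes."""
    name: str
    children: List["Node"] = []


Node.update_forward_refs()  # resolve the self-reference


class DirectoryTree(BaseModel):
    """Non-recursive wrapper: gives the function-calling API a flat entry point."""
    root: Node
```

`DirectoryTree` can then be used as the output class of a Pydantic program, and the function API fills in the recursive structure underneath.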