Jerry Liu
co-founder/CEO @llama_index Careers: https://t.co/EUnMNmbCtx Enterprise: https://t.co/Ht5jwxSrQB
May 14 9 tweets 3 min read
Google vs OpenAI is the beef we’re all here for

Dumping updates from I/O in this 🧵

Gemini 1.5 Flash - a faster Gemini with 1M context
Nov 7, 2023 7 tweets 2 min read
I’ve seen a few threads on “Do we still need LlamaIndex / framework” given @OpenAI dev day updates

Short answer: Yes.

Some high-level takes addressing two points 🧵: 1) why vendor-agnostic LLM orchestration, and 2) how good the OpenAI API is

[1] Why LLM orchestration

If you’re completely locked into a single ecosystem (e.g. OpenAI, or Anthropic), there’s less need for orchestration.

But the whole point of an orchestration framework is that you can easily combine any modules w/o writing a ton of boilerplate.
Oct 5, 2023 4 tweets 2 min read
I’m excited for @OpenAI’s new support for function calling fine-tuning! (@stevenheidel)

Help gpt-3.5 better structure outputs + reason/plan 🤖

Dropping a day 0 release of supporting fn fine-tuning + distilling GPT-4 w/ Pydantic in @llama_index ⚡️👇: github.com/run-llama/llam…


Our default way of using @OpenAI function calling is through our pydantic programs: simply specify the pydantic schema, and we use the endpoint to extract a structured output with that schema.

We can now log these results and collect them as a dataset.
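To make the pattern concrete, here's a minimal sketch of a Pydantic program (exact import paths and class names have shifted between LlamaIndex versions, so treat this as illustrative):

```python
# Minimal sketch of the Pydantic-program pattern.
from pydantic import BaseModel
from llama_index.program import OpenAIPydanticProgram


class Song(BaseModel):
    title: str
    length_seconds: int


class Album(BaseModel):
    name: str
    artist: str
    songs: list[Song]


# The program turns the Pydantic schema into an OpenAI function definition,
# calls the function-calling endpoint, and validates the response.
program = OpenAIPydanticProgram.from_defaults(
    output_cls=Album,
    prompt_template_str="Generate an example album inspired by {movie_name}.",
)

album = program(movie_name="The Shining")  # returns a validated Album instance

# Each (prompt, structured output) pair can be logged to build a fine-tuning dataset.
print(album.json())
```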
Sep 13, 2023 5 tweets 2 min read
Fine-tuning embeddings is a great way to improve RAG performance - it’s even better if you can freeze your doc embeddings ❄️

We now allow you to fine-tune an arbitrary transformation to your queries: linear, deep NN, or custom!

Full guide showing all 3: gpt-index.readthedocs.io/en/latest/exam…
For instance, we can fine-tune a 2-layer neural net that takes in the query embedding as input and outputs a transformed embedding.

The best part is that this will work on top of any black-box embedding model; anything from sentence_transformers to text-embedding-ada-002.
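A rough sketch of the idea in plain PyTorch (not the exact LlamaIndex fine-tuning API; the adapter shape and loss are illustrative):

```python
# A small 2-layer network that transforms query embeddings while document
# embeddings stay frozen.
import torch
import torch.nn as nn


class QueryAdapter(nn.Module):
    def __init__(self, dim: int, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, query_emb: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the adapter close to identity at init.
        return query_emb + self.net(query_emb)


def train_step(adapter, optimizer, query_embs, doc_embs):
    # query_embs / doc_embs: matched (query, relevant chunk) embeddings from
    # any black-box embedding model; only the adapter's weights are updated.
    transformed = adapter(query_embs)
    loss = 1 - nn.functional.cosine_similarity(transformed, doc_embs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```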
Sep 4, 2023 4 tweets 2 min read
Here’s a simple trick to improve retrieval for RAG 💡:

Embed “references” to each text chunk instead of the chunk itself (e.g. smaller chunks, summaries). Leads to ~10-20% improvement 📈

Possible with @llama_index recursive retrieval. 💫

Full guide 📗: gpt-index.readthedocs.io/en/latest/exam…
The core intuition is related to the idea of decoupling embeddings from the raw text chunks (we’ve tweeted about this).

By embedding smaller chunks, summaries, or questions, we can first fetch relevant references -> then fetch original chunk for LLM synthesis.
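A minimal sketch of that flow, using a hypothetical `reference_index.search` interface rather than the actual LlamaIndex retriever classes:

```python
# Embed small "reference" chunks, but hand the LLM the parent chunk they point to.
from dataclasses import dataclass


@dataclass
class Reference:
    text: str        # small chunk / summary / question that gets embedded
    parent_id: str   # id of the full chunk to use at synthesis time


def retrieve(query_emb, reference_index, parent_chunks, top_k=3):
    """reference_index: any vector index over Reference.text embeddings (hypothetical)."""
    hits = reference_index.search(query_emb, top_k=top_k)  # match against references
    parent_ids = {hit.parent_id for hit in hits}            # dedupe shared parents
    return [parent_chunks[pid] for pid in parent_ids]       # full chunks go to the LLM
```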
Sep 3, 2023 10 tweets 4 min read
We fine-tuned a gpt-3.5 ReAct agent to be better at chain-of-thought 💭

A big gripe with gpt-3.5-turbo is that its reasoning is worse than gpt-4, causing unreliable agent behavior 🤔

By distilling gpt-4, these agents can do better search/retrieval 🔥🧵

gpt-index.readthedocs.io/en/latest/exam…
The fine-tuned model does better than base gpt-3.5 at CoT reasoning.

Example Q: “What is the total fair value of Uber's financial assets as of March 31, 2022?”

The base gpt-3.5 model returns an inaccurate response. The fine-tuned model keeps its CoT going until it finds the actual answer.
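A rough sketch of the distillation recipe (the OpenAI calls use the 0.28-era Python SDK; the trace-logging helper stands in for the notebook's callback handler):

```python
# Run a gpt-4 ReAct agent over training questions, log the full reasoning traces,
# then fine-tune gpt-3.5-turbo on those traces.
import json
import openai  # openai-python 0.28-era API


def log_trace(messages: list[dict], out_path: str = "traces.jsonl"):
    # `messages` is the gpt-4 agent's full chat history: thoughts, tool calls, answers.
    with open(out_path, "a") as f:
        f.write(json.dumps({"messages": messages}) + "\n")


# After collecting traces for all training questions:
training_file = openai.File.create(file=open("traces.jsonl", "rb"), purpose="fine-tune")
job = openai.FineTuningJob.create(training_file=training_file.id, model="gpt-3.5-turbo")
print(job.id)
```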
Aug 25, 2023 5 tweets 2 min read
One major way to improve your RAG system is to fine-tune your embedding model ⚙️

We’ve created a full repo/guide (@disiok) on fine-tuning embeddings over any unstructured text (no labels needed) 🌟

5-10% improvement 📈 in evals + runs on your MacBook!

github.com/run-llama/fine…
This is motivated by the insight that a pre-trained embedding model may not be suited to your specific retrieval task.

Fine-tuning is a way to solve that: train the model on positive text pairs drawn from your data domain.
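A minimal sketch of that recipe with sentence-transformers, assuming you've already generated a synthetic question per chunk with an LLM (model name and hyperparameters are illustrative):

```python
# Fine-tune an open-source embedding model on (question, chunk) positive pairs
# with an in-batch negatives loss -- no human labels needed.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

pairs = [
    ("What was total revenue in Q1?", "Total revenue for the first quarter was ..."),
    # ...one (synthetic question, source chunk) pair per chunk in your corpus
]

model = SentenceTransformer("BAAI/bge-small-en")  # any local embedding model
examples = [InputExample(texts=[question, chunk]) for question, chunk in pairs]
loader = DataLoader(examples, batch_size=16, shuffle=True)
loss = losses.MultipleNegativesRankingLoss(model)  # other pairs in the batch act as negatives

model.fit(train_objectives=[(loader, loss)], epochs=2, warmup_steps=50)
model.save("finetuned-embedding-model")
```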
Aug 23, 2023 5 tweets 2 min read
We successfully made gpt-3.5-turbo output GPT-4 quality responses in an e2e RAG system 🔥

Stack: automated training dataset creation in @llama_index + new @OpenAI finetuning + ragas (@Shahules786) eval

We did this in a day ⚡️

Full Colab notebook here: colab.research.google.com/drive/1vWeJBXd…
The key intuition: gpt-3.5-turbo (even after fine-tuning) is much cheaper than GPT-4 on a marginal token basis. 💡

If we’re willing to incur a fixed cost through finetuning, then we can distill GPT-4 outputs to gpt-3.5-turbo in a cheaper package over your data.
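A sketch of the dataset-creation step; `questions` and `gpt4_query_engine` are placeholders for the question list and GPT-4-backed query engine built in the notebook:

```python
# Turn GPT-4 RAG responses into a chat-format fine-tuning dataset for gpt-3.5-turbo.
import json

rows = []
for q in questions:                        # questions generated over your docs
    response = gpt4_query_engine.query(q)  # RAG answer produced by GPT-4
    rows.append({
        "messages": [
            {"role": "user", "content": q},
            {"role": "assistant", "content": str(response)},
        ]
    })

with open("finetune_events.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# This file is then submitted to OpenAI fine-tuning, and the resulting
# gpt-3.5-turbo model is swapped into the same RAG pipeline.
```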
Aug 18, 2023 4 tweets 2 min read
Adding document metadata helps to improve retrieval for your LLM chatbots 📣

To drive home the point, @wenqi_glantz shows concrete examples in her latest blog w/ @llama_index metadata extractors 🧨

Only with metadata do you get the correct answer ✅. betterprogramming.pub/building-produ…
Here’s another example: given question “what percentage of gov revenue came from taxes” - top-k retrieval over raw text chunks doesn’t return the right answer.

Adding metadata, however, provides a concrete answer.
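A small sketch of attaching metadata to a document (field names and the sample text are illustrative):

```python
# Attach structured metadata to each chunk so it travels with the raw text.
from llama_index import Document

doc = Document(
    text="Taxes accounted for roughly 45% of government revenue in 2022 ...",
    metadata={
        "source": "annual_budget_report_2022.pdf",
        "section": "Revenue breakdown",
        "topic": "government revenue",
    },
)
# The metadata is included alongside the chunk text for embedding and LLM synthesis,
# so a query like "what percentage of gov revenue came from taxes" can match on the
# section/topic fields as well as the body text.
```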
Aug 8, 2023 6 tweets 2 min read
There are too many options for building information retrieval:
- Chunk size
- Query strategy (top-k, hybrid, MMR)

Idea: What if we ensembled *all of the options* + let an LLM prune the pooled results? 👇
✅ More general retriever (though more 💰)
✅ Benchmark diff strategies

The key intuition here is that different retrieval parameters (chunk size, top-k, etc.) work better in different situations.

If we “ensemble” the results, we can have an LLM/reranker decide relevance for each query.

This creates a more general retrieval interface.
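A sketch of the ensembling idea; the retriever and reranker objects are placeholders for whatever implementations you use:

```python
# Run several retrievers built with different chunk sizes / top-k, pool their
# results, and let an LLM-based reranker keep only the most relevant nodes.
def ensemble_retrieve(query: str, retrievers: list, reranker, final_top_k: int = 5):
    pooled = {}
    for retriever in retrievers:                 # e.g. chunk sizes 128/256/512/1024
        for node in retriever.retrieve(query):
            pooled[node.node_id] = node          # dedupe by node id
    # The reranker (e.g. an LLM scoring each candidate against the query)
    # prunes the pooled candidates down to a final list.
    return reranker.rerank(query, list(pooled.values()))[:final_top_k]
```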
Aug 7, 2023 5 tweets 3 min read
The SpanMarker model (by tomaarsen on @huggingface) is incredible for NER 🖋️

Can use it on top of powerful LMs like BERT/RoBERTa

Use it to automate entity extraction in @llama_index -> increase the retrieval performance of your RAG system! 🚀👇

gpt-index.readthedocs.io/en/latest/exam…
The model can extract a lot of entities from any unstructured text: PER (person), ORG (organization), LOC (location), and much more.

This means that it’s perfectly suited for extracting entities from documents and setting them as metadata for each document.
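A sketch of using SpanMarker for this (the checkpoint name is one example; any SpanMarker NER checkpoint should work):

```python
# Extract entities with SpanMarker and attach them as document metadata.
from span_marker import SpanMarkerModel

ner_model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-mbert-base-multinerd")


def extract_entity_metadata(text: str) -> dict:
    entities = ner_model.predict(text)  # list of dicts with "span" and "label"
    grouped = {}
    for ent in entities:
        grouped.setdefault(ent["label"], set()).add(ent["span"])
    # e.g. {"PER": ["Jerry Liu"], "ORG": ["LlamaIndex"], ...}
    return {label: sorted(spans) for label, spans in grouped.items()}
```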
Jul 21, 2023 4 tweets 2 min read
A big failure case I've found when building data agents is that the LLM fails to "use" the tool with the right parameters:
🚫 Wrong type
🚫 Missing required params

Protip: Make your tools tolerant of partial inputs - easier for the agent to use and increases reliability! 👇
Take the example of having the agent use a tool to draft an email 📤

The Gmail API *requires* all of the above values when updating a draft, but you can build the function interface such that everything else (`to`, `subject`, `message`) can be inferred from the draft_id.
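A sketch of what that tolerant interface can look like; the `gmail_client` helpers are hypothetical placeholders, not the real Gmail API:

```python
# Only `draft_id` is required; anything the agent omits is pulled from the existing draft.
from typing import Optional


def update_draft(
    draft_id: str,
    to: Optional[str] = None,
    subject: Optional[str] = None,
    message: Optional[str] = None,
) -> str:
    existing = gmail_client.get_draft(draft_id)   # hypothetical helper
    gmail_client.update_draft(                    # hypothetical helper
        draft_id=draft_id,
        to=to or existing["to"],
        subject=subject or existing["subject"],
        message=message or existing["message"],
    )
    return f"Updated draft {draft_id}"
```

Because missing params fall back to the existing values, the agent can call the tool with just the fields it wants to change instead of reconstructing the whole draft.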
Jul 14, 2023 4 tweets 2 min read
There have been two main stacks for LLM-based QA over data:
- Unstructured: vector db + RAG
- Structured: Text-to-SQL

We now have full native support for a THIRD stack (s/o @wey_gu): build knowledge graph w/ graph db, then query with text-to-Cypher! 🕸️🔎

gpt-index.readthedocs.io/en/latest/exam…
LlamaIndex has the tools to build a Knowledge Graph from any unstructured data source.

You can store this graph in a graph db (@NebulaGraph)

Graph db’s already have rich query interfaces, with languages such as Cypher. We can now provide a natural language interface on top.
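A sketch of the text-to-Cypher loop; the prompt, `llm.complete`, and `run_cypher` calls are placeholders for whatever LLM wrapper and graph client you use (LlamaIndex wraps this pattern in its knowledge graph query engine):

```python
# Translate a natural language question into Cypher, run it, then synthesize an answer.
CYPHER_PROMPT = """You write Cypher queries for a graph with this schema:
{schema}

Question: {question}
Cypher:"""


def text_to_cypher_query(question: str, schema: str, llm, graph_db):
    cypher = llm.complete(CYPHER_PROMPT.format(schema=schema, question=question))
    rows = graph_db.run_cypher(str(cypher))  # execute against e.g. NebulaGraph
    # Feed the rows back to the LLM to produce a natural language answer.
    return llm.complete(f"Question: {question}\nGraph results: {rows}\nAnswer:")
```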
Jul 13, 2023 5 tweets 2 min read
Giving LLM agents access to data is more complex than simply letting them use a GET endpoint. ⚠️

If the endpoint returns too much data (e.g. an entire @wikipedia page), it will overflow the context window!

Instead: “cache” data and then search 💡. Available in @llama_index 👇

Let’s assume we want to give an agent access to Wikipedia.

We have a Tool that can load entire Wikipedia pages: llama-hub-ui.vercel.app/l/tools-wikipe…

The main problem is that a Wikipedia page is oftentimes way too big for a default context window (e.g. 4k).
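A sketch of the "cache then search" pattern as two agent tools (the class here is illustrative, not the actual LlamaHub tool spec):

```python
# Tool 1 loads the page into a local vector index; tool 2 searches that cache,
# so the full page never has to fit in the agent's context window.
from llama_index import VectorStoreIndex, Document


class WikipediaLoadAndSearch:
    def __init__(self, loader):
        self.loader = loader  # any callable that fetches a full Wikipedia page as text
        self.index = None

    def load(self, page: str) -> str:
        """Agent tool #1: fetch the page and cache it in a vector index."""
        text = self.loader(page)
        self.index = VectorStoreIndex.from_documents([Document(text=text)])
        return f"Loaded '{page}'. Use search() to query it."

    def search(self, query: str) -> str:
        """Agent tool #2: answer questions over the cached page, not the raw dump."""
        return str(self.index.as_query_engine().query(query))
```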
Jul 12, 2023 14 tweets 4 min read
Today we’re incredibly excited to launch Data Agents in @llama_index: LLM-powered knowledge workers that can read and write over your data 🤖🗃️

✅ Full agent + tool abstractions
✅ LlamaHub Tools Repo - 15+ Tools AND tutorials!

Full blog/thread 🧵: medium.com/@jerryjliu98/d…

[1] Context

Our core mission with @llama_index is to unlock the capabilities of LLMs over your data.

So far we’ve mostly focused on search/retrieval…
Jul 11, 2023 7 tweets 3 min read
Any dev framework needs to provide enough customizability to allow the target user to write core business logic.

We’ve worked hard to make @llama_index serve not only beginners, but also allow experts to easily create rich custom workflows over their data.

Our docs show that 🧵

Our quickstart tutorial allows you to get started building QA in 3 lines of code:

Right after, you may have some immediate needs for customization (use a vector store, modify top-k). Our customization guide helps cover that: gpt-index.readthedocs.io/en/latest/gett…
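For reference, the quickstart plus a couple of the customization hooks the guide covers, written against the 2023-era `llama_index` package (newer versions split these imports):

```python
# The 3-line quickstart: load docs, build an index, ask a question.
from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("your question here"))

# Immediate customizations: different top-k, response mode, etc.
query_engine = index.as_query_engine(similarity_top_k=5, response_mode="tree_summarize")
```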
Jul 8, 2023 7 tweets 3 min read
Adding metadata to text can help w/ disambiguation and boost retrieval performance for LLM QA systems.

Manually adding metadata is time-consuming ⏰

Instead, let’s use LLMs to automate metadata extraction - extract rich context to augment each chunk 💡

gpt-index.readthedocs.io/en/latest/exam…
We now have these capabilities in LlamaIndex, with our MetadataExtractor modules.

You can use the following:
- `SummaryExtractor`
- `QuestionsAnsweredExtractor`
- `TitleExtractor`
- `KeywordExtractor`
- `MetadataFeatureExtractor`
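Roughly how this looked in 2023-era LlamaIndex (module paths and constructor arguments have moved since, so treat this as a sketch; `documents` is a placeholder list of loaded documents):

```python
# Run LLM-based extractors during node parsing so each chunk carries extra metadata.
from llama_index.node_parser import SimpleNodeParser
from llama_index.node_parser.extractors import (
    MetadataExtractor,
    TitleExtractor,
    QuestionsAnsweredExtractor,
)

metadata_extractor = MetadataExtractor(
    extractors=[
        TitleExtractor(nodes=5),                  # infer a document title from the first nodes
        QuestionsAnsweredExtractor(questions=3),  # questions each chunk can answer
    ],
)
node_parser = SimpleNodeParser.from_defaults(metadata_extractor=metadata_extractor)
nodes = node_parser.get_nodes_from_documents(documents)  # nodes now carry the extracted metadata
```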
Jul 1, 2023 7 tweets 3 min read
What are the main use cases for knowledge graphs/graph db’s and how can they play a role in LLM RAG stacks?

Thread on some key insights that @wey_gu shared on our webinar 🧵

There’s a TON of stuff in the slides, and it’d be a crime not to share: siwei.io/talks/graph-ra…

[1] Graph DBs are much better suited for queries that require complex joins over data.

Much of the world’s knowledge is graph-based 🕸️🌎

Use cases: Fraud Detection, Social Networks, and also AI-enabled use cases like knowledge graphs (NLP, LLMs)

Jun 21, 2023 9 tweets 4 min read
With @OpenAI function agents, it’s time to think about best practices for designing Tool API interfaces.

Simple Tool API - takes in a simple type (str, int, float)
Complex Tool API - takes in a complex type (Pydantic obj)

There are pros and cons to both! 🧵

In a Simple Tool API, the Tool function signature takes in a simple type, like a string (or int).

Pro: this makes it WAY easier for agents to call! Much easier for agents to infer the value of simple types, especially natural language strings.

Easier for “dumber” LLMs to use.
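A sketch of the two styles side by side (`FunctionTool` as in 2023-era LlamaIndex; the contrast between the signatures is the point, not the wrapper):

```python
from pydantic import BaseModel
from llama_index.tools import FunctionTool


# Simple Tool API: one natural-language string -- easy for any LLM to fill in.
def search_docs(query: str) -> str:
    """Search the document index for the query."""
    return f"results for {query}"


simple_tool = FunctionTool.from_defaults(fn=search_docs)


# Complex Tool API: a Pydantic object -- more expressive, but the LLM must emit
# a full nested schema correctly, which weaker models get wrong more often.
class SearchRequest(BaseModel):
    query: str
    filters: dict[str, str] = {}
    top_k: int = 5


def search_docs_structured(request: SearchRequest) -> str:
    """Search with filters and an explicit top-k."""
    return f"results for {request.query} (top {request.top_k})"


complex_tool = FunctionTool.from_defaults(fn=search_docs_structured)
```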
Jun 20, 2023 7 tweets 3 min read
We’ve now made agents 🤖 capable of sophisticated query planning over your data, with @OpenAI Function API + Pydantic + @jxnlco’s intellect 🔥

Agents input full Pydantic graph in fn signature of our query plan Tool - plan is executed over sub-tools.

📗: gpt-index.readthedocs.io/en/latest/exam…

How did this start? @jxnlco put out this PR on the openai_function_call repo: github.com/jxnl/openai_fu…

It explored using the Function API to generate query plans, in the form of nested Pydantic objects.

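A sketch of a query-plan schema in that spirit (field names and the naive executor are illustrative):

```python
# The agent fills in an entire nested plan as one function call; each node is
# then executed over the matching sub-tool once its dependencies have resolved.
from pydantic import BaseModel, Field


class QueryNode(BaseModel):
    id: int
    question: str = Field(description="Sub-question answerable by a single tool")
    tool_name: str
    dependencies: list[int] = Field(default_factory=list, description="ids this node needs first")


class QueryPlan(BaseModel):
    nodes: list[QueryNode]


def execute(plan: QueryPlan, tools: dict):
    # Assumes the plan lists nodes in dependency order (a real executor would topo-sort).
    answers = {}
    for node in plan.nodes:
        context = {dep: answers[dep] for dep in node.dependencies}
        answers[node.id] = tools[node.tool_name](node.question, context)
    return answers
```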
Jun 19, 2023 4 tweets 3 min read
We know that @OpenAI’s function API can extract structured data. But @jxnlco’s work takes it to the next-level: recursive Pydantic objects 🪆

Allows API to infer way more complex schemas. We were super inspired to add an in-house guide on this 📗👇

gpt-index.readthedocs.io/en/latest/exam…

The example we incorporated from @jxnlco parses a directory tree. The tree contains recursive Node objects representing files/folders.

The key trick: wrap a recursive Pydantic model (Node) with a non-recursive one (DirectoryTree) to work with the function API.
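A minimal sketch of that trick in plain Pydantic (v1-style forward-ref resolution):

```python
# The recursive model (Node) is nested inside a non-recursive top-level model
# (DirectoryTree) that is handed to the function-calling API as the schema.
from enum import Enum
from pydantic import BaseModel, Field


class NodeType(str, Enum):
    FILE = "file"
    FOLDER = "folder"


class Node(BaseModel):
    name: str
    node_type: NodeType
    children: list["Node"] = Field(default_factory=list)


Node.update_forward_refs()  # resolve the self-reference (pydantic v1)


class DirectoryTree(BaseModel):
    """Top-level, non-recursive wrapper passed as the function-call schema."""
    root: Node
```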