Claude already 'works' over Excel, but in a naive way - it writes raw Python/openpyxl code to analyze a sheet cell by cell, and it generally lacks a semantic understanding of the content. The coding abstractions are simply too low-level for the agent to do more sophisticated analysis accurately.
Our new LlamaSheets API automatically segments and structures complex Excel sheets into well-formatted 2D tables. This both gives Claude Code immediate semantic awareness of the sheet and lets it run Pandas/SQL over well-structured dataframes.
We've written a guide showing exactly how to use LlamaSheets with coding agents!
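As a rough sketch of the workflow (the client class, method names, and response fields below are hypothetical placeholders, not the actual LlamaSheets API - the guide has the real calls):

```python
import pandas as pd

# Hypothetical client/import - the real LlamaSheets API may differ.
from llama_cloud import LlamaSheets  # placeholder name

client = LlamaSheets(api_key="...")

# Segment a messy workbook into well-formatted 2D tables (placeholder call).
tables = client.extract_tables("quarterly_report.xlsx")

# Each table loads straight into a dataframe, so the coding agent can run
# Pandas/SQL instead of low-level cell-by-cell openpyxl code.
for table in tables:
    df = pd.DataFrame(table.rows, columns=table.columns)  # placeholder fields
    print(df.head())
```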
I’m excited for @OpenAI’s new support for function calling fine-tuning! (@stevenheidel)
Help gpt-3.5 better structure outputs + reason/plan 🤖
Dropping a day-0 release supporting fn fine-tuning + distilling GPT-4 w/ Pydantic in @llama_index ⚡️👇: github.com/run-llama/llam…
Our default way of using @OpenAI function calling is through our pydantic programs: simply specify the pydantic schema, and we use the endpoint to extract a structured output matching that schema.
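A minimal pydantic program looks roughly like this (exact import paths vary across @llama_index versions):

```python
from typing import List
from pydantic import BaseModel
from llama_index.program import OpenAIPydanticProgram

# Target schema for the extraction.
class Song(BaseModel):
    title: str
    length_seconds: int

class Album(BaseModel):
    name: str
    artist: str
    songs: List[Song]

# Under the hood this hits the OpenAI function calling endpoint and
# parses the response directly into the pydantic object.
program = OpenAIPydanticProgram.from_defaults(
    output_cls=Album,
    prompt_template_str="Generate an example album from the movie {movie_name}.",
    verbose=True,
)
album = program(movie_name="The Shining")
print(album.name, [song.title for song in album.songs])
```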
We can now log these results and collect them as a dataset.
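Concretely, attaching a fine-tuning callback handler captures every prompt/completion pair in OpenAI's chat-messages format (API names as of the 0.8.x-era guide; they may have moved since):

```python
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, OpenAIFineTuningHandler
from llama_index.llms import OpenAI

# Log every call made through this LLM.
finetuning_handler = OpenAIFineTuningHandler()
callback_manager = CallbackManager([finetuning_handler])

gpt4_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-4", temperature=0.3),
    callback_manager=callback_manager,
)

# ... run the pydantic programs / agent queries with gpt4_context ...

# Dump the collected calls as a JSONL fine-tuning dataset.
finetuning_handler.save_finetuning_events("finetuning_events.jsonl")
```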
This is *very* WIP - we’re excited to use function calling fine-tuning to explore better agentic reasoning capabilities as well as better RAG systems (we recently added support for structured outputs!)
The fine-tuned model does better than base gpt-3.5 at CoT reasoning.
Example Q: “What is the total fair value of Uber's financial assets as of March 31, 2022?”
gpt-3.5 (left) returns an inaccurate response. The fine-tuned model (right) keeps the CoT going until it finds the actual answer.
Our comprehensive guide (linked above) shows you how to do this.
At a high level, we auto-generate a set of questions over Uber 10-Q filings.
We then log the prompt inputs + outputs for each LLM call made by a GPT-4 agent.
We use this data to finetune gpt-3.5.
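Sketching those three steps with the 0.8.x-era APIs from the guide (names may have shifted since; `uber_10q_docs` is a stand-in for your loaded filings):

```python
from llama_index.evaluation import DatasetGenerator
from llama_index.finetuning import OpenAIFinetuneEngine

# 1. Auto-generate questions over the Uber 10-Q documents
#    (uber_10q_docs is a placeholder for your loaded filings).
dataset_generator = DatasetGenerator.from_documents(uber_10q_docs)
questions = dataset_generator.generate_questions_from_nodes(num=40)

# 2. Answer the questions with a GPT-4 agent that has the
#    OpenAIFineTuningHandler attached (see above), logging each call.

# 3. Fine-tune gpt-3.5 on the logged GPT-4 traces.
finetune_engine = OpenAIFinetuneEngine(
    "gpt-3.5-turbo",
    "finetuning_events.jsonl",
)
finetune_engine.finetune()
ft_llm = finetune_engine.get_finetuned_model(temperature=0.3)
```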