Jerry Liu
Dec 1 · Read on X
Claude Code over Excel++ 🤖📊

Claude already 'works' over Excel, but naively: it writes raw Python/openpyxl to analyze a sheet cell by cell, and generally lacks a semantic understanding of the content. The coding abstractions in play are too low-level for the agent to do more sophisticated analysis accurately.

Our new LlamaSheets API automatically segments and structures complex Excel sheets into well-formatted 2D tables. This gives Claude Code immediate semantic awareness of the sheet, and lets it run Pandas/SQL over well-structured dataframes.
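To illustrate the difference, here's a minimal sketch: the dataframe below stands in for one segmented table as LlamaSheets might return it (the table contents and column names are mocked; the only real API used is pandas):

```python
import pandas as pd

# Mock of a segmented region: a clean 2D table with a proper header row,
# instead of a raw grid of cells that must be walked with openpyxl.
revenue = pd.DataFrame(
    {"region": ["NA", "EMEA", "APAC"], "q1": [120, 95, 80], "q2": [135, 99, 91]}
)

# With a well-structured dataframe, an analysis is one idiomatic expression
# rather than a cell-by-cell traversal.
growth = ((revenue["q2"] - revenue["q1"]) / revenue["q1"]).round(3)
print(growth.tolist())
```

The same dataframe can also be registered with an in-process SQL engine (e.g. DuckDB) if the agent prefers SQL over pandas.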

We've written a guide showing you how specifically to use LlamaSheets with coding agents!

Guide: developers.llamaindex.ai/python/cloud/l…

Sign up to LlamaCloud: cloud.llamaindex.ai

More from @jerryjliu0

May 14, 2024
Google vs OpenAI is the beef we’re all here for

Dumping updates from I/O in this 🧵
Gemini 1.5 Flash - a faster Gemini with a 1M-token context window
Project Astra - Google's answer to the GPT-4o demo.

Very impressive: multimodal video reasoning with memory (?!), though it's unclear how much was prerecorded
Nov 7, 2023
I’ve seen a few threads on “Do we still need LlamaIndex / a framework?” given @OpenAI dev day updates

Short answer: Yes.

Some high-level takes addressing two points 🧵: 1) why vendor-agnostic LLM orchestration, and 2) how good the OpenAI API is
[1] Why LLM orchestration

If you’re completely locked into a single ecosystem (e.g. OpenAI, or Anthropic), there’s less need for orchestration.

But the whole point of an orchestration framework is that you can easily combine any modules w/o writing a ton of boilerplate.
[1, continued] This includes LLMs, embedding models, multi-modal models, vector db’s, other storage systems, and much more.

Yes, a single LLM provider can add this. But the space is still so competitive, there’s still a huge demand for open-source models.
Oct 5, 2023
I’m excited for @OpenAI’s new support for function calling fine-tuning! (@stevenheidel)

Help gpt-3.5 better structure outputs + reason/plan 🤖

Dropping a day 0 release of supporting fn fine-tuning + distilling GPT-4 w/ Pydantic in @llama_index ⚡️👇: github.com/run-llama/llam…


Our default way of using @OpenAI function calling is through our Pydantic programs: simply specify the Pydantic schema, and we use the endpoint to extract a structured output with that schema.

We can now log these results and collect them as a dataset.
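A minimal sketch of the Pydantic side, assuming Pydantic v2; the LLM response is mocked, and the schema names are illustrative:

```python
from pydantic import BaseModel

# A Pydantic schema describing the structured output we want the LLM to emit.
class Song(BaseModel):
    title: str
    length_seconds: int

class Album(BaseModel):
    name: str
    artist: str
    songs: list[Song]

# In a Pydantic-program setup, the function-calling endpoint is asked to fill
# this schema; here we mock the endpoint's JSON payload and validate it.
mock_llm_output = {
    "name": "Echoes",
    "artist": "Example Band",
    "songs": [{"title": "Intro", "length_seconds": 95}],
}
album = Album.model_validate(mock_llm_output)
print(album.songs[0].title)
```

Each validated `(prompt, schema output)` pair is exactly the kind of record you can log and later collect into a fine-tuning dataset.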
This is *very* WIP - we’re excited to use function fine-tuning to explore better agentic reasoning capabilities as well as better RAG systems (we recently added support for structured outputs!)

More coming soon.
Sep 13, 2023
Fine-tuning embeddings is a great way to improve RAG performance - it’s even better if you can freeze your doc embeddings ❄️

We now allow you to fine-tune an arbitrary transformation to your queries: linear, deep NN, or custom!

Full guide showing all 3: gpt-index.readthedocs.io/en/latest/exam…
For instance, we can fine-tune a 2-layer neural net that takes in the query embedding as input and outputs a transformed embedding.

The best part is that this will work on top of any black-box embedding model; anything from sentence_transformers to text-embedding-ada-002.
If you are familiar with @PyTorch, we allow you to also define query transformations as *arbitrary* neural nets as well.
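A toy sketch of such a query-side adapter in NumPy (this is not the @llama_index API; the weights are random stand-ins for parameters you would actually train on query/relevant-doc pairs):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # toy embedding dimension; real models use 384-1536+

# Frozen document embeddings (never retrained) and one query embedding.
doc_embs = rng.normal(size=(5, dim))
query = rng.normal(size=(dim,))

# A 2-layer adapter applied only to the query side; initialized near
# identity so the untrained transform roughly preserves the embedding.
W1 = np.eye(dim) + 0.01 * rng.normal(size=(dim, dim))
W2 = np.eye(dim) + 0.01 * rng.normal(size=(dim, dim))

def transform(q):
    hidden = np.maximum(0.0, W1 @ q)  # ReLU
    return W2 @ hidden

# Dot-product retrieval against the frozen doc embeddings.
scores = doc_embs @ transform(query)
print(scores.shape)
```

Because only the query is transformed, the document index never needs to be re-embedded when the adapter is retrained.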
Sep 4, 2023
Here’s a simple trick to improve retrieval for RAG 💡:

Embed “references” to each text chunk instead of the chunk itself (e.g. smaller chunks, summaries). Leads to ~10-20% improvement 📈

Possible with @llama_index recursive retrieval. 💫

Full guide 📗: gpt-index.readthedocs.io/en/latest/exam…
The core intuition is related to the idea of decoupling embeddings from the raw text chunks (we’ve tweeted about this).

By embedding smaller chunks, summaries, or questions, we can first fetch relevant references -> then fetch original chunk for LLM synthesis.
NOTE: you can define multiple references for a given text chunk!

In our guide for a given “large” text chunk (512), we define multiple smaller text chunks (128, 256) that refer to it.

If multiple retrieved nodes reference the same original node, we perform deduplication.
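A minimal sketch of the retrieve-references-then-dedupe flow, using a toy in-memory index (the names and substring matching are illustrative stand-ins for vector search, not the @llama_index API):

```python
# Parent chunks: the "large" (e.g. 512-token) chunks used for LLM synthesis.
parent_chunks = {"p1": "Large chunk about revenue...",
                 "p2": "Large chunk about expenses..."}

# Small reference chunks (smaller splits, summaries, or questions), each
# pointing back to the parent chunk it was derived from.
references = [
    {"text": "revenue grew", "parent": "p1"},
    {"text": "quarterly revenue", "parent": "p1"},
    {"text": "operating costs", "parent": "p2"},
]

def retrieve(query_terms):
    # Step 1: match small reference chunks (stand-in for embedding search).
    hits = [r for r in references if any(t in r["text"] for t in query_terms)]
    # Step 2: follow references to parents, deduplicating shared parents.
    seen, parents = set(), []
    for r in hits:
        if r["parent"] not in seen:
            seen.add(r["parent"])
            parents.append(parent_chunks[r["parent"]])
    return parents

# Two references hit, but they share a parent, so one chunk is returned.
print(len(retrieve(["revenue"])))
```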
Sep 3, 2023
We fine-tuned a gpt-3.5 ReAct agent to be better at chain-of-thought 💭

A big gripe with gpt-3.5-turbo is that its reasoning is worse than gpt-4, causing unreliable agent behavior 🤔

By distilling gpt-4, these agents can do better search/retrieval 🔥🧵

gpt-index.readthedocs.io/en/latest/exam…
The fine-tuned model does better than base gpt-3.5 at CoT reasoning.

Example Q: “What is the total fair value of Uber's financial assets as of March 31, 2022?”

gpt-3.5 (left) returns an inaccurate response. Finetuning (right) keeps CoT going until it finds the actual answer.
Our comprehensive guide (linked above) shows you how to do this.

At a high level: we auto-generate a set of questions over Uber 10-Q filings, log the prompt inputs + outputs of each LLM call made by the GPT-4 agent, and use this data to fine-tune gpt-3.5.
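The logging-and-distillation step can be sketched as reshaping logged (prompt, completion) pairs into chat-format JSONL records (the record shape follows OpenAI's chat fine-tuning format; the logged call below is mocked):

```python
import json

# Logged (prompt, completion) pairs captured from the GPT-4 agent's calls.
logged_calls = [
    {"prompt": "What was Uber's Q1 2022 revenue?",
     "completion": "Thought: search the 10-Q... Answer: <value from filing>"},
]

# Reshape each logged call into one chat-format fine-tuning record.
records = [
    {"messages": [
        {"role": "user", "content": call["prompt"]},
        {"role": "assistant", "content": call["completion"]},
    ]}
    for call in logged_calls
]

# One JSONL line per logged call, ready to upload as a fine-tuning file.
jsonl = "\n".join(json.dumps(r) for r in records)
print(len(jsonl.splitlines()))
```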
