Rohan
Oct 9 · 10 tweets · 4 min read
Finetuning the embedding model can yield more meaningful embedding representations, leading to better retrieval performance.

@llama_index has an abstraction for finetuning sentence transformers embedding models that makes this process quite seamless.

Let's see how it works 👇
Finetuning means updating the model weights themselves on a corpus of data to make the model work better for specific use-cases.

E.g. for embedding ArXiv papers, we want the embeddings to align semantically with the concepts and not with filler words like “This paper is…”.
@llama_index has guides on how to finetune embeddings in different ways:

- finetune the embedding model itself (only sentence transformers)
- finetune an adapter over any black-box embedding model (stay tuned for this one 🔥)
3 steps for finetuning embeddings:

1. Prepare the data via generate_qa_embedding_pairs()
2. Finetune the model via SentenceTransformersFinetuneEngine
3. Evaluate the model
Here we use two PDFs: lyft.pdf and uber.pdf.

We'll use lyft.pdf to create the training set, and evaluate the finetuned model on an evaluation set created from uber.pdf.

We create nodes from both the training and evaluation PDFs.
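Roughly, the loading step looks like this — a minimal sketch assuming the PDFs sit in the working directory and a pre-0.10 llama_index layout (import paths may differ in newer releases):

```python
from llama_index import SimpleDirectoryReader
from llama_index.node_parser import SimpleNodeParser

def load_corpus(files):
    # Load the PDFs and split them into nodes (chunks)
    docs = SimpleDirectoryReader(input_files=files).load_data()
    parser = SimpleNodeParser.from_defaults()
    return parser.get_nodes_from_documents(docs)

train_nodes = load_corpus(["lyft.pdf"])  # training corpus
val_nodes = load_corpus(["uber.pdf"])    # evaluation corpus
```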
Next we use generate_qa_embedding_pairs() to create the training and evaluation datasets from the nodes.

For each chunk, synthetic queries are generated by an LLM.

Each pair of (generated question, text chunk used as context) becomes a datapoint in the dataset.
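As a sketch (the helper lives under llama_index.finetuning; the filenames below are just illustrative):

```python
from llama_index.finetuning import generate_qa_embedding_pairs

# An LLM writes synthetic questions for every node; each question is paired
# with the chunk it was generated from.
train_dataset = generate_qa_embedding_pairs(train_nodes)
val_dataset = generate_qa_embedding_pairs(val_nodes)

# The datasets can be persisted and reloaded later
train_dataset.save_json("train_dataset.json")
val_dataset.save_json("val_dataset.json")
```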
Next we use the SentenceTransformersFinetuneEngine to create the finetuning engine.

We pass it the necessary parameters: the train and evaluation datasets, the model to finetune, the path to output the finetuned model, etc.
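Something along these lines — the base model id and output path are example values:

```python
from llama_index.finetuning import SentenceTransformersFinetuneEngine

finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="BAAI/bge-small-en",         # base sentence-transformers model (example)
    model_output_path="finetuned_model",  # where the finetuned weights get written
    val_dataset=val_dataset,
)
```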
Finally, we finetune it using the engine created before and evaluate the new model, comparing it with the base model and OpenAI embeddings.
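In code, roughly — the hit-rate evaluation below is a simplified sketch of the approach in the docs: index the evaluation corpus with the embedding model under test and check whether the chunk a question was generated from shows up in the top-k results:

```python
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.schema import TextNode

finetune_engine.finetune()
embed_model = finetune_engine.get_finetuned_model()

def hit_rate(dataset, embed_model, top_k=2):
    # Index the evaluation corpus with the embedding model under test
    nodes = [TextNode(id_=nid, text=text) for nid, text in dataset.corpus.items()]
    ctx = ServiceContext.from_defaults(embed_model=embed_model)
    retriever = VectorStoreIndex(nodes, service_context=ctx).as_retriever(similarity_top_k=top_k)

    hits = 0
    for query_id, query in dataset.queries.items():
        retrieved_ids = {r.node.node_id for r in retriever.retrieve(query)}
        expected_id = dataset.relevant_docs[query_id][0]  # the chunk the question came from
        hits += expected_id in retrieved_ids
    return hits / len(dataset.queries)

print(hit_rate(val_dataset, embed_model))
```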
On the hit rate metric, the new model performs significantly better than the base model it was finetuned from, and almost as well as the OpenAI embedding model.

More on finetuning and the evaluation method in the official docs:
gpt-index.readthedocs.io/en/latest/exam…
Thanks for reading.

I write about AI, LLMs, RAG etc. and try to make complex topics as easy as possible.

Stay tuned for more! 🔥 #AI #RAG

More from @clusteredbytes

Oct 2
Multi Document Agent architecture (v0) in @llama_index, a step beyond naive top-k RAG.

It allows answering a broader set of questions over multiple documents, which weren't possible with basic RAG.

Let's break down the agent architecture and see how it works 👇🧵
Architecture:

- For each document, a VectorIndex is created for semantic search, and a SummaryIndex is created for summarization

- Then we create a QueryEngine for each of these indices

- Next the QueryEngines are wrapped as QueryEngineTools
These tools are passed to an OpenAIAgent. This is the document agent.

Each document has an agent like this that chooses whether to perform summarization or semantic search within that document.
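A rough sketch of the per-document agent, following the pattern above (the doc_nodes and doc_name arguments, and the tool names/descriptions, are illustrative):

```python
from llama_index import VectorStoreIndex, SummaryIndex
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.agent import OpenAIAgent

def build_doc_agent(doc_nodes, doc_name):
    # One index for semantic search, one for summarization
    vector_index = VectorStoreIndex(doc_nodes)
    summary_index = SummaryIndex(doc_nodes)

    tools = [
        QueryEngineTool(
            query_engine=vector_index.as_query_engine(),
            metadata=ToolMetadata(
                name="vector_tool",
                description=f"Answers specific questions about {doc_name}",
            ),
        ),
        QueryEngineTool(
            query_engine=summary_index.as_query_engine(),
            metadata=ToolMetadata(
                name="summary_tool",
                description=f"Summarizes {doc_name}",
            ),
        ),
    ]
    # The agent decides which tool (search vs. summarize) fits a given question
    return OpenAIAgent.from_tools(tools)
```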
Sep 29
We've seen that smaller chunks are good for capturing semantic meaning and larger ones are good for providing better context.

@llama_index AutoMergingRetriever takes it one step further by keeping the chunks in a tree structure and dynamically choosing the chunk length. 🧵👇
The first step here is parsing via the HierarchicalNodeParser.

It stores the nodes in a tree structure, where deeper nodes are smaller chunks and shallower nodes are larger chunks.

We can specify how many layers of nodes we want and the splitter size for each layer.
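For example (the chunk sizes are illustrative, and `docs` is assumed to be loaded elsewhere):

```python
from llama_index.node_parser import HierarchicalNodeParser, get_leaf_nodes

# Three layers: 2048-token parents, 512-token children, 128-token leaves
node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=[2048, 512, 128])
nodes = node_parser.get_nodes_from_documents(docs)
leaf_nodes = get_leaf_nodes(nodes)  # only the smallest chunks
```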
All nodes are stored in a docstore, and only the leaf nodes are stored in a vectorstore.

At first, the vectorstore retriever is called to get the initial leaf nodes.

From there, the retriever auto-merges retrieved leaf nodes into their parents, ending up with chunks of the appropriate size.
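A minimal sketch of wiring this up (the merge decision — replacing leaves with their parent when enough siblings are retrieved — happens inside AutoMergingRetriever; the query string and top-k are illustrative):

```python
from llama_index import StorageContext, VectorStoreIndex
from llama_index.retrievers import AutoMergingRetriever
from llama_index.storage.docstore import SimpleDocumentStore

# All nodes (every layer of the tree) go into the docstore...
docstore = SimpleDocumentStore()
docstore.add_documents(nodes)
storage_context = StorageContext.from_defaults(docstore=docstore)

# ...but only the leaf nodes are embedded into the vector index
base_index = VectorStoreIndex(leaf_nodes, storage_context=storage_context)
base_retriever = base_index.as_retriever(similarity_top_k=6)

# Wraps the base retriever and merges retrieved leaves into their parent
# when enough sibling leaves show up in the results
retriever = AutoMergingRetriever(base_retriever, storage_context, verbose=True)
results = retriever.retrieve("What does the document say about growth?")
```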
Aug 26
Previously we've seen @LangChainAI ParentDocumentRetriever, which creates smaller chunks from a document and links them back to the original documents during retrieval.

MultiVectorRetriever is a more customizable version of that. Let's see how to use it 🧵👇
ParentDocumentRetriever automatically creates the small chunks and links them to their parent document id.

If we want to create additional vectors for each document, other than smaller chunks, we can do that and then retrieve through them using MultiVectorRetriever.
We can customize how these additional vectors are created for each parent document. Here are some ways @LangChainAI mentions in their documentation (see the sketch below):

- smaller chunks
- store the summary vector of each document
- store the vectors of hypothetical questions for each document
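For instance, the summary variant might look roughly like this — `docs` (the parent documents) and `summaries` (one LLM-generated summary string per document) are assumed to exist already:

```python
import uuid

from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.schema import Document
from langchain.storage import InMemoryStore
from langchain.vectorstores import Chroma

id_key = "doc_id"
retriever = MultiVectorRetriever(
    vectorstore=Chroma(collection_name="summaries", embedding_function=OpenAIEmbeddings()),
    docstore=InMemoryStore(),  # holds the full parent documents
    id_key=id_key,
)

doc_ids = [str(uuid.uuid4()) for _ in docs]
summary_docs = [
    Document(page_content=summary, metadata={id_key: doc_ids[i]})
    for i, summary in enumerate(summaries)
]

retriever.vectorstore.add_documents(summary_docs)  # search happens over summaries...
retriever.docstore.mset(list(zip(doc_ids, docs)))  # ...full documents get returned
```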
Aug 14
While splitting raw text for Retrieval Augmented Generation (RAG), what should be the ideal length of each chunk? What’s the sweet spot?

Strike a balance between small and large chunks using @LangChainAI ParentDocumentRetriever.

Let's see how to use it 👇🧵
The issue:

- smaller chunks reflect more accurate semantic meaning after creating embeddings

- but they sometimes lose the bigger picture and can sound out of context, making it difficult for the LLM to properly answer the user's query with limited context per chunk.
@LangChainAI ParentDocumentRetriever addresses this issue by creating embeddings from the smaller chunks only, as they capture semantic meaning better.

But when plugging into the LLM input, it uses the larger parent chunks with better context.
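In code, roughly (the splitter sizes are illustrative, and `docs` is assumed to be loaded elsewhere):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# Small chunks get embedded; larger parent chunks are what the LLM eventually sees
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)

retriever = ParentDocumentRetriever(
    vectorstore=Chroma(collection_name="chunks", embedding_function=OpenAIEmbeddings()),
    docstore=InMemoryStore(),
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
retriever.add_documents(docs)

# Search happens over the small chunks, but parent chunks are returned
relevant_parents = retriever.get_relevant_documents("some user query")
```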
