Sahas
Jun 9 · 7 tweets
Wrote my first custom chain in @LangChainAI as part of improving the AI search on stockinsights.ai.

ModifyDocumentsChain adds extra context to the docs retrieved from the vector store, based on their metadata, and runs before StuffDocumentsChain. #buildinpublic
In many use cases, we want to add more context to the chunks retrieved from a vector store such as @pinecone before passing them to OpenAI, to improve GPT's accuracy.

ModifyDocumentsChain has a chunkModifier method that modifies each doc's content based on its metadata.
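The modify-then-stuff flow can be sketched in plain Python. This is not the author's ModifyDocumentsChain and does not use LangChain's API. The `Document` dataclass only mirrors the `page_content`/`metadata` shape of a LangChain document, and `chunk_modifier`, `modify_documents`, `stuff_documents` and the metadata keys `company` and `doc_type` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    # Mirrors the shape of a LangChain document: text plus a metadata dict.
    page_content: str
    metadata: dict = field(default_factory=dict)


def chunk_modifier(doc: Document) -> Document:
    # Hypothetical modifier: prepend a context line built from assumed
    # metadata keys ("company", "doc_type"). Returns a new Document.
    company = doc.metadata.get("company", "an unknown company")
    doc_type = doc.metadata.get("doc_type", "report")
    header = f"This is a chunk of the {doc_type} of {company}."
    return Document(f"{header}\n{doc.page_content}", dict(doc.metadata))


def modify_documents(
    docs: list[Document], modifier: Callable[[Document], Document]
) -> list[Document]:
    # The "modify" step: runs on retrieved docs, before they are stuffed.
    return [modifier(d) for d in docs]


def stuff_documents(docs: list[Document], question: str) -> str:
    # Simplified stand-in for a "stuff" step: concatenate all docs into one prompt.
    context = "\n\n".join(d.page_content for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


retrieved = [
    Document("(chunk text)", {"company": "TataSteel", "doc_type": "earnings transcript"}),
]
prompt = stuff_documents(
    modify_documents(retrieved, chunk_modifier),
    "What did management say about demand?",
)
print(prompt)
```

The stored chunk itself is left untouched; only the copy that goes into the prompt gets the extra header.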
At stockinsights.ai, we let users leverage GPT over the financial data (e.g. earnings reports) of Indian companies. Our goal is to generate accurate answers to users' queries based only on these reports.

The Problem: Splitting large docs into chunks of meaningful context
There's no simple solution to this seemingly innocuous problem. The requirements are:
1. Chunks should contain enough context to answer the user's query.
2. Relevant data can lie in multiple reports, so larger chunks crowd out other meaningful chunks that we pass to OpenAI. Simply making chunks bigger is not the answer.
3. A user can query with a company name and expects results exclusively for that company.
4. Chunks shouldn't all contain the same context, as that can push up the similarity search score of a non-relevant chunk.
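The crowding-out effect in requirement 2 can be illustrated with a fixed context budget. This is a toy sketch, not the thread's code: `pack_chunks` is a hypothetical helper, and word count stands in for the real tokenizer the model would use.

```python
def pack_chunks(chunks: list[str], budget: int) -> list[str]:
    # Keep chunks in rank order and stop at the first one that no longer
    # fits the (approximate) token budget. Word count is a crude proxy
    # for tokens.
    packed, used = [], 0
    for chunk in chunks:
        cost = len(chunk.split())
        if used + cost > budget:
            break
        packed.append(chunk)
        used += cost
    return packed


small = ["word " * 50] * 6   # six 50-word chunks
large = ["word " * 150] * 6  # six 150-word chunks
print(len(pack_chunks(small, 300)))  # 6
print(len(pack_chunks(large, 300)))  # 2
```

With the same budget, tripling the chunk size cuts the number of chunks (and so the number of reports) that can reach the model from six to two.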

We are trying different approaches to handle this issue.
Approach 1: Add context to each report chunk at the time it is persisted to the vector store. For example, every chunk of TataSteel's report would contain "This is the earnings transcript chunk of TataSteel company".

Cons: All chunks carry the same context, which conflicts with point 4.
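Why a shared prefix hurts (point 4) can be shown with a toy similarity score. This sketch uses bag-of-words cosine similarity as a stand-in for embedding similarity, which is an assumption; real embeddings behave differently in detail, but the shared words still pull unrelated chunks toward any query mentioning the company. The query and chunk texts are made up for illustration.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    # Bag-of-words cosine similarity: a toy stand-in for embedding similarity.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


prefix = "this is the earnings transcript chunk of tatasteel company "
query = "tatasteel earnings transcript capex plans"
irrelevant = "the board approved the dividend record date"

print(round(cosine(query, irrelevant), 3))           # 0.0
print(round(cosine(query, prefix + irrelevant), 3))  # 0.286
```

The chunk about dividends says nothing about capex, yet once the persisted prefix is added it matches the query on "tatasteel", "earnings" and "transcript".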
Approach 2: Add the context to the retrieved docs just before sending them to OpenAI.
1. This keeps the stored chunks distinct, so only highly relevant chunks are chosen.
2. OpenAI receives much better context too.

ModifyDocumentsChain prepends this context to each chunk before it is sent to OpenAI.
