Harrison Chase Profile picture
Jul 5 8 tweets 5 min read Twitter logo Read on Twitter
📄Documents x LLMs📄

Combining documents with LLMs is a key part of retrieval and chaining

We've improved our @LangChainAI reference documentation across the 5 major CombineDocumentsChains and helper functions to help with clarity and understanding of how these work

🧵





📄 `format_document`

Want to control which metadata keys show up in the prompt?

This helper function is rarely exposed, but is key to combining documents with LLMs

It takes a Document and formats it into a string using a PromptTemplate

Docs: https://t.co/Xrl5HtvFlvapi.python.langchain.com/en/latest/sche…
🧸Stuff Documents Chain

The most basic CombineDocumentsChain, this takes N documents, formats them into a string using a PromptTemplate and `format_document`, and then combines them into a single prompt and passes them to an LLM

Docs: https://t.co/NCvUNEbAVYapi.python.langchain.com/en/latest/chai…
🟥 ReduceDocumentsChain

But what if you have too many documents to fit into a single a prompt? That's where ReduceDocumentsChain comes into play

It recursively combines documents together

Docs: https://t.co/0VdYZm7WhXapi.python.langchain.com/en/latest/chai…
🗺️ Map Reduce Chain

This builds on top of the ReduceDocumentsChain

It takes an LLMChain and a ReduceDocumentsChain. It first applies the LLMChain to each document, and then passes all the results to the ReduceDocumentsChain

Docs: https://t.co/5s3IZ3XPbOapi.python.langchain.com/en/latest/chai…
👨‍🚒Refine Documents Chain

This chain uses the first document to get an initial response

It then loops over the remaining docs, making a call to the language model to combining the response with the next document

Docs: https://t.co/iONLv4ooo2api.python.langchain.com/en/latest/chai…
🌭Map Rerank

Finally, the Map Rerank Chain calls an LLM on each document, asking not only for an answer but also a score

It then sorts the responses by the score and returns the highest one

Docs: https://t.co/mdaswUaYw8api.python.langchain.com/en/latest/chai…
These chains form the backbone of several of the most popular use cases (question answering, summarization, etc)

We hope the updated reference documentation helps with understanding how these chains work and how to use them

Please let us know what other documentation we can add

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Harrison Chase

Harrison Chase Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @hwchase17

Jul 6
💬ConversationalRetrievalChain Upgrades

One of our more popular chains is the ConversationalRetrievalChain, which allows you to create a retrieval augmented generation chatbot

We've introduced some small but impactful quality of life changes:

🧵
📃Improved Reference Docs

We beefed up our reference documentation to include better docstrings and a more end-to-end example

There's a lot of toggles to play with, hopefully this helps make it more clear what all the parameters are

Docs: api.python.langchain.com/en/latest/chai…
❓Rephrase Question Flag

The conversational retrieval chain first condenses the chat history and the new message into a standalone question to use for retrieval

This flag controls whether that new question is also used for generation as well

Docs: api.python.langchain.com/en/latest/chai…
Read 5 tweets
Jun 19
⭐️Using `functions` to structure output⭐️

We're starting to add more chains that rely on functions to structure output

Here's a quick overview of how we're doing that, which chains we've added so far, how to contribute, and additional resources

🧵
Although we first incorporated `functions` into agents, an almost more important ability of `functions` is to structure output from ChatGPT

This is extremely useful when you want to use the output of ChatGPT in a particular way
You can do this by not only passing in `functions` parameter, but also passing in the `function_call` parameter

The `function_call` parameter forces it to respond using a particular function - allowing you to guarantee the output in a specific format
Read 9 tweets
Jun 16
The new @OpenAI functions are good for other things besides agents

Another killer use case is extracting structured information from unstructured docs

We've adding support for extraction AND tagging in @LangChainAI - thanks to @fpingham for code and @jxnlco for review

🧵
✂️Extraction

Specify a schema - either a dictionary or a Pydantic model - and then extract entities from a piece of text with the same schema

This will return a list of objects with that schema

Docs: python.langchain.com/en/latest/modu… ImageImage
⚡️Tagging

Specify a schema and tag a document with those attributes

As opposed to Extraction, this extracts only one instance of that schema so its more useful for classification of attributes pertaining to the text as a whole

Docs: python.langchain.com/en/latest/modu… ImageImage
Read 4 tweets
Jun 5
⭐️Composable Prompts⭐️

Wouldn't it be nice if there was a way to compose prompts together, reusing pieces across prompts?

In the newest Python and JS release there now is with `Pipeline Prompt`!

Links 👇 ImageImage
The way this works is you define a `PipelinePrompt` with two components:

- FinalPrompt: the final prompt template to be formatted
- PipelinePrompts: a sequence of tuples of (name, PromptTemplate)

The `name` argument is how the formatted prompt will be passed to future prompts
When `.format` is called, the PipelinePrompts are first formatted in order, and are then used in future formatting steps with their respective `name` arguments

Finally, the FinalPrompt.format is called using any previously formatted values as neccesary
Read 4 tweets
May 31
✂️15+ Code Specific Text Splitters✂️

Just used one of @LangChainAI 's 100+ Document Loaders?

Next step: split data into embeddable chunks.

We now have support for splitting 15+ different coding languages in the optimal way

🧵 ImageImage
A underrated part of the preprocessing pipeline, proper splitting of text allows for maintaining semantically meaningful chunks

This is crucial when doing retrieval augmented generation in order to ensure the proper context is inserted into the prompt
One way to do this is split on semantically meaningful characters.

Theoretically, these characters are used to denote the start and end of sequences that make the most sense to be together.

For example "\n\n" is generically good to split on as that signals a new paragraph
Read 5 tweets
May 30
How to speed up "chat-your-data" applications while retaining final answer accuracy?

🫙Use a cheaper/faster model (gpt-3.5) to create the condensed question
💬Use a better but more expensive model (gpt-4) for final response

Thanks to @cristobal_dev for highlighting!

🧵 Image
Most "chat-your-data" applications involve three steps:

1⃣Condense the chat history into a standalone question
2⃣Retrieve relevant docs based on the standalone question
3⃣Generate a final answer based on the retrieved documents

This involves two total calls to the LLM!
But these calls are not created equal

Condensing the chat history is a relatively easy (and less important) step, while generating the final answer can be trickier and more important to get right

With @LangChainAI you can easily use a different LLM for each step
Read 4 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us on Twitter!

:(