Harrison Chase
Jul 5, 2023 · 8 tweets · 5 min read
📄Documents x LLMs📄

Combining documents with LLMs is a key part of retrieval and chaining

We've improved our @LangChainAI reference documentation for the 5 major CombineDocumentsChains and their helper functions, to make it clearer how they work

🧵





📄 `format_document`

Want to control which metadata keys show up in the prompt?

This helper function is rarely exposed, but is key to combining documents with LLMs

It takes a Document and formats it into a string using a PromptTemplate

Docs: https://t.co/Xrl5HtvFlv (api.python.langchain.com/en/latest/sche…)
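
A minimal sketch of the idea in plain Python, not the library's implementation. `Doc` and `format_doc` are hypothetical stand-ins for a Document and for `format_document`, and an ordinary format string stands in for the PromptTemplate. The template decides which metadata keys show up:

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_doc(doc: Doc, template: str) -> str:
    """Fill the template with the page content and any metadata it names."""
    values = {**doc.metadata, "page_content": doc.page_content}
    # Keys the template does not mention (here: "page") are simply ignored.
    return template.format(**values)


doc = Doc("LangChain is a framework.", {"source": "intro.md", "page": 1})
formatted = format_doc(doc, "Source: {source}\n\n{page_content}")
print(formatted)
```

Only `source` reaches the prompt here because the template never mentions `page`.
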
🧸Stuff Documents Chain

The most basic CombineDocumentsChain: it takes N documents, formats each one into a string using a PromptTemplate and `format_document`, combines the strings into a single prompt, and passes that prompt to an LLM

Docs: https://t.co/NCvUNEbAVY (api.python.langchain.com/en/latest/chai…)
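
Roughly, "stuffing" looks like the sketch below. This is an illustration in plain Python under stated assumptions, not the chain's actual code: `stuff_documents` and `fake_llm` are hypothetical names, documents are plain dicts, and the LLM is any callable that maps a prompt string to a string.

```python
def stuff_documents(docs, llm, doc_template="{page_content}",
                    prompt_template="Summarize the following:\n\n{context}",
                    separator="\n\n"):
    """Format every document, join the strings, and make ONE LLM call."""
    formatted = [
        doc_template.format(page_content=doc["page_content"], **doc["metadata"])
        for doc in docs
    ]
    prompt = prompt_template.format(context=separator.join(formatted))
    return llm(prompt)


prompts_seen = []


def fake_llm(prompt):
    """Hypothetical LLM stub that records the prompt it receives."""
    prompts_seen.append(prompt)
    return f"(summary of a {len(prompt)}-character prompt)"


docs = [
    {"page_content": "Cats purr.", "metadata": {"source": "a.txt"}},
    {"page_content": "Dogs bark.", "metadata": {"source": "b.txt"}},
]
answer = stuff_documents(docs, fake_llm, doc_template="[{source}] {page_content}")
print(prompts_seen[0])
```

However many documents go in, the model is called exactly once, which is why this approach breaks down when the combined prompt no longer fits in the context window.
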
🟥 ReduceDocumentsChain

But what if you have too many documents to fit into a single prompt? That's where ReduceDocumentsChain comes into play

It recursively combines documents together

Docs: https://t.co/0VdYZm7WhX (api.python.langchain.com/en/latest/chai…)
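
One way to picture the recursion, as a sketch under loud assumptions: word counts stand in for tokens, `combine` is any callable that turns a list of texts into one shorter text (in practice an LLM call), and `reduce_documents`, `token_max` and `fake_combine` are hypothetical names rather than the library's API.

```python
def word_count(texts):
    """Crude stand-in for a token counter: total whitespace-separated words."""
    return sum(len(text.split()) for text in texts)


def reduce_documents(texts, combine, token_max=6):
    """Collapse texts batch by batch until they fit, then combine once more."""
    while word_count(texts) > token_max and len(texts) > 1:
        batches, current = [], []
        for text in texts:
            # Start a new batch when adding this text would exceed the limit.
            if current and word_count(current + [text]) > token_max:
                batches.append(current)
                current = []
            current.append(text)
        batches.append(current)
        if len(batches) == len(texts):
            break  # nothing could be grouped; stop instead of looping forever
        texts = [combine(batch) for batch in batches]
    return combine(texts)


calls = []


def fake_combine(batch):
    """Hypothetical LLM stub: records its input and returns a one-word summary."""
    calls.append(list(batch))
    return f"S{len(calls)}"


result = reduce_documents(["a b c", "d e f", "g h i", "j k l"], fake_combine)
print(result, calls)
```

In this run the four texts are collapsed into two intermediate summaries, and a final call combines those two, for three calls in total.
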
🗺️ Map Reduce Chain

This builds on top of the ReduceDocumentsChain

It takes an LLMChain and a ReduceDocumentsChain. It first applies the LLMChain to each document, and then passes all the results to the ReduceDocumentsChain

Docs: https://t.co/5s3IZ3XPbO (api.python.langchain.com/en/latest/chai…)
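
The two steps can be sketched like this. The names are hypothetical and the "LLMs" are stubs: `map_llm` stands in for the per-document LLMChain and `reduce_step` for the ReduceDocumentsChain.

```python
def map_reduce(texts, map_llm, reduce_step):
    """Apply map_llm to each text independently, then reduce all the results."""
    mapped = [map_llm(text) for text in texts]  # one call per document
    return reduce_step(mapped)                  # combine the per-document outputs


def first_sentence(text):
    """Stub for the map step: 'summarize' by keeping the first sentence."""
    return text.split(".")[0]


def join_summaries(summaries):
    """Stub for the reduce step: join the per-document summaries."""
    return " | ".join(summaries)


texts = ["Cats purr. They nap.", "Dogs bark. They fetch."]
summary = map_reduce(texts, first_sentence, join_summaries)
print(summary)
```

Because the map calls do not depend on each other, they can run in parallel; only the reduce step needs to see all the results.
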
👨‍🚒Refine Documents Chain

This chain uses the first document to get an initial response

It then loops over the remaining docs, making a call to the language model to combine the current response with the next document

Docs: https://t.co/iONLv4ooo2 (api.python.langchain.com/en/latest/chai…)
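
As a sketch, the loop looks something like the following. `refine`, `initial_llm` and `refine_llm` are hypothetical names, and both "LLMs" are stubs that just build strings so the order of calls is visible.

```python
def refine(texts, initial_llm, refine_llm):
    """Answer from the first text, then update that answer one text at a time."""
    if not texts:
        raise ValueError("need at least one document")
    answer = initial_llm(texts[0])
    for text in texts[1:]:
        # Each call sees the running answer plus exactly one new document.
        answer = refine_llm(answer, text)
    return answer


def fake_initial(text):
    """Stub for the first LLM call."""
    return f"notes({text})"


def fake_refine(answer, text):
    """Stub for each follow-up LLM call."""
    return f"{answer}+{text}"


final_answer = refine(["A", "B", "C"], fake_initial, fake_refine)
print(final_answer)
```

Unlike the map step in Map Reduce, these calls are sequential: each one depends on the answer produced by the previous one.
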
🌭Map Rerank

Finally, the Map Rerank Chain calls an LLM on each document, asking not only for an answer but also a score

It then sorts the responses by the score and returns the highest one

Docs: https://t.co/mdaswUaYw8 (api.python.langchain.com/en/latest/chai…)
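
A rough sketch of the rerank step, assuming the per-document LLM call has already been parsed into an `(answer, score)` pair. `map_rerank` and `fake_scoring_llm` are hypothetical names, and the scores are made up for illustration.

```python
def map_rerank(texts, scoring_llm):
    """Get an (answer, score) pair per text and return the top-scoring pair."""
    results = [scoring_llm(text) for text in texts]
    ranked = sorted(results, key=lambda pair: pair[1], reverse=True)
    return ranked[0]


def fake_scoring_llm(text):
    """Stub for 'What is the capital of France?': confident only if Paris appears."""
    if "Paris" in text:
        return ("Paris", 95)
    return ("I don't know", 10)


texts = [
    "Berlin is in Germany.",
    "Paris is the capital of France.",
    "Rome is old.",
]
best_answer, best_score = map_rerank(texts, fake_scoring_llm)
print(best_answer, best_score)
```

No answers are merged here: the documents compete, and the single best-scored response wins.
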
These chains form the backbone of several of the most popular use cases (question answering, summarization, etc.)

We hope the updated reference documentation helps with understanding how these chains work and how to use them

Please let us know what other documentation we can add
