Lance Martin
langchain. past: robots 🚘 🤖, phd @stanford 🧪
Mar 22, 2024 11 tweets 6 min read
Gave this short talk on RAG vs long context LLMs at a few meetups recently. It tries to pull together threads from a few recent projects + papers I really like.

Just put it on YT; a few highlights w/ papers below ...
1/ Can long context LLMs retrieve & reason over multiple facts as a RAG system does? @GregKamradt and I dug into this w/ multi-needle-in-a-haystack on GPT4. Retrieval is not guaranteed: worse for more needles, worse at doc start, worse w/ reasoning.
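The multi-needle setup can be sketched in a few lines: plant facts ("needles") at random depths in a long filler document, then score what fraction of them the model's answer recovers. A minimal harness, with the actual GPT4 call replaced by a simulated answer (all names here are illustrative, not from the original eval code):

```python
import random

def build_haystack(needles, filler_sentence, n_sentences=1000, seed=0):
    """Plant each needle at a random depth in a long filler document."""
    random.seed(seed)
    doc = [filler_sentence] * n_sentences
    positions = sorted(random.sample(range(n_sentences), len(needles)))
    for pos, needle in zip(positions, needles):
        doc.insert(pos, needle)
    return " ".join(doc), positions

def score_retrieval(answer, needles):
    """Fraction of planted facts that appear in the model's answer."""
    return sum(n.lower() in answer.lower() for n in needles) / len(needles)

needles = [
    "The secret ingredient is figs.",
    "The meeting is on Tuesday.",
    "The password is swordfish.",
]
haystack, where = build_haystack(needles, "The sky was a clear blue that day.")

# In the real eval the haystack + question go to a long-context LLM;
# here we simulate a model that only recalled two of the three facts.
simulated_answer = "The secret ingredient is figs. The meeting is on Tuesday."
print(score_retrieval(simulated_answer, needles))  # 2 of 3 needles recovered
```

Sweeping needle count and depth with a harness like this is what surfaces the pattern above: recall degrades with more needles and with needles placed early in the document.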
Aug 25, 2023 14 tweets 6 min read
Check out these new guides for 13 popular LLM use-cases. Part of a major community effort to improve the @LangChainAI docs + add CoLabs prototyping.

1/13: Open source LLMs
How to use many open source LLMs on your device
python.langchain.com/docs/guides/lo…
2/13: Agents
How to quickly test various types of agents
python.langchain.com/docs/use_cases…
Aug 23, 2023 9 tweets 4 min read
GPT-3.5 and LLaMA2 fine-tuning guides 🪄

Considering LLM fine-tuning? Here are two new CoLab guides for fine-tuning GPT-3.5 & LLaMA2 on your data, using LangSmith for dataset management and eval. We also share our lessons learned in a blog post here:

blog.langchain.dev/using-langsmit…
1/ When to fine-tune? Fine-tuning is not advised for teaching an LLM new knowledge (see references from @OpenAI and others in our blog post). It's best for tasks (e.g., extraction) focused on "form, not facts":
anyscale.com/blog/fine-tuni…
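"Form, not facts" is easiest to see in the training data itself: each example teaches the output shape, not new knowledge. A minimal sketch of an OpenAI-style chat fine-tuning file for an extraction task (the schema and example content are illustrative):

```python
import json

# Each example teaches the output *format* (strict JSON extraction);
# the knowledge needed to answer is already in the prompt.
examples = [
    {
        "messages": [
            {"role": "system",
             "content": 'Extract people as JSON: {"names": [...]}'},
            {"role": "user",
             "content": "Ada met Grace at the conference."},
            {"role": "assistant",
             "content": '{"names": ["Ada", "Grace"]}'},
        ]
    },
]

# OpenAI-style fine-tuning files are JSON Lines: one example per line.
jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl.splitlines()[0][:60])
```

A few dozen examples like this can nail down a strict output format, where the same examples would do little to teach the model facts it doesn't already have.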
Aug 12, 2023 6 tweets 3 min read
Code understanding 🖥️🧠

LLMs excel at code analysis / completion (e.g., Copilot, Code Interpreter, etc). Part 6 of our initiative to improve @LangChainAI docs covers code analysis, building on contributions from @cristobal_dev + others:
https://t.co/2DsxdjbYey
python.langchain.com/docs/use_cases…
1/ Copilot and related tools (e.g., @codeiumdev) have dramatically accelerated dev productivity and shown that LLMs excel at code understanding / completion
Aug 8, 2023 8 tweets 4 min read
Text-to-SQL 📒

LLMs unlock a natural language interface to structured data. Part 4 of our initiative to improve @LangChainAI docs shows how to use LLMs to write / execute SQL queries w/ chains and agents. Thanks to @manuelsoria_ for work on the docs:
https://t.co/CyOqp5I3TM
python.langchain.com/docs/use_cases…
1/ Text-to-SQL is an excellent LLM use-case: many ppl can describe what they want in natural language, but have difficulty mapping that to a specific SQL query. LLMs can bridge this gap, e.g., see:
https://t.co/b0NMkHPe9x
arxiv.org/pdf/2204.00498…
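The text-to-SQL loop is: show the LLM the schema + question, get SQL back, execute it, return the result. A minimal sketch with the LLM replaced by a stub (`fake_llm` and the toy schema are illustrative, not the LangChain chain itself):

```python
import sqlite3

def fake_llm(question, schema):
    """Stand-in for the LLM call: a real chain would prompt the model
    with the table schema + the natural-language question."""
    return "SELECT COUNT(*) FROM employees WHERE dept = 'eng';"

# Toy database to run the generated SQL against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("ada", "eng"), ("grace", "eng"), ("alan", "ops")])

schema = "employees(name TEXT, dept TEXT)"
sql = fake_llm("How many people are in engineering?", schema)
result = conn.execute(sql).fetchone()[0]
print(sql, "->", result)  # -> 2
```

The agent variants in the docs add a recovery loop on top of this: if execution errors, the error message goes back to the LLM so it can repair the query.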
Aug 5, 2023 10 tweets 5 min read
Extraction 📚➡️🗒️

Getting structured LLM output is hard! Part 3 of our initiative to improve @LangChainAI docs covers this w/ functions and parsers (see @GoogleColab ntbk). Thanks to @fpingham for improving the docs on this:

https://t.co/bMjFmCSZM3
python.langchain.com/docs/use_cases…
1/ Getting LLMs to produce structured (e.g., JSON) output is a challenge, often requiring tedious prompt engineering:
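The parser side of the problem can be sketched without any LLM call: model replies often wrap the JSON in prose or a markdown fence, so a robust parser extracts the object, parses it, and validates the keys. A minimal stand-alone version (the reply string and key names are illustrative):

```python
import json
import re

def parse_structured(raw, required_keys):
    """Pull the first JSON object out of an LLM reply and validate it."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

# Typical messy reply: prose + a fenced JSON block.
reply = 'Sure! Here you go:\n```json\n{"name": "Ada", "title": "engineer"}\n```'
print(parse_structured(reply, ["name", "title"]))
```

Function calling sidesteps much of this by having the model emit arguments against a declared schema, but an output parser like the above is still a useful fallback for models without it.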
Aug 3, 2023 6 tweets 3 min read
LLM Use Case: Summarization 📚🧠

We've kicked off a community driven effort to improve @LangChainAI docs, starting w/ popular use cases. Here is the new use case doc on Summarization w/ @GoogleColab notebook for easy testing ...
https://t.co/e6QYl8pEsH
python.langchain.com/docs/use_cases…
1/ Context window stuffing: adding full documents to the LLM context window for summarization is the easiest approach, and it is increasingly feasible as LLMs (e.g., @AnthropicAI Claude w/ 100k token window) get larger context windows (fitting hundreds of pages).
https://t.co/aClREUqtPd
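The "stuff" approach is literally string concatenation plus a budget check. A minimal sketch, assuming a rough characters-per-token heuristic as the budget proxy (the numbers and prompt wording are illustrative):

```python
def stuff_documents(docs, question, max_context_chars=400_000):
    """Concatenate whole documents into one prompt, if they fit.

    100k tokens is very roughly 400k characters of English text,
    so characters serve as a cheap budget proxy here.
    """
    body = "\n\n".join(docs)
    if len(body) > max_context_chars:
        raise ValueError("documents exceed the context budget; "
                         "fall back to map-reduce summarization")
    return f"Summarize the following documents.\n\n{body}\n\nQuestion: {question}"

docs = ["First document text.", "Second document text."]
prompt = stuff_documents(docs, "What are the key points?")
print(len(prompt))
```

When the budget check fails, the docs cover the map-reduce alternative: summarize chunks independently, then summarize the summaries.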
Aug 2, 2023 5 tweets 4 min read
Recent updates to the @LangChainAI data ecosystem 🦜⛓️: 3 new loaders, 2 new storage options, and a new loader / retriever for web research ... a great addition from @RubenBarraganP connects files in @Dropbox to the LangChain ecosystem:


... similarly, @Huawei unstructured data storage can be connected:
https://t.co/Ir3HLgtgAg
python.langchain.com/docs/integrati…
python.langchain.com/docs/integrati…
Jul 26, 2023 5 tweets 3 min read
Web research is a great LLM use case. @hwchase17 and I are releasing a new retriever to automate web research that is simple, configurable (can run in private mode w/ Llama v2, GPT4All, etc), & observable (use LangSmith to see what it's doing). Blog:
https://t.co/LU0PWDmrBE
blog.langchain.dev/automating-web…
Projects like @assaf_elovic's gpt-researcher are a great example of research agents; we started with an agent, but landed on a simple retriever that executes LLM-generated search queries in parallel, indexes the loaded pages, and retrieves relevant chunks. LangSmith trace:
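The retriever's pipeline above can be sketched end-to-end with the LLM and search steps stubbed out (every `fake_*` function and the filter logic here are illustrative stand-ins, not the real retriever's API):

```python
def fake_generate_queries(question):
    """Stand-in for the LLM that writes several search queries."""
    return [f"{question} site:docs", f"{question} tutorial"]

def fake_search_and_load(query):
    """Stand-in for search + page loading; returns page text."""
    return [f"Result text for: {query}. LangChain retrievers fetch documents."]

def web_research(question):
    # 1) LLM writes multiple queries, 2) pages are loaded for each query,
    # 3) loaded text is chunked/indexed, 4) relevant chunks are returned.
    pages = []
    for q in fake_generate_queries(question):
        pages.extend(fake_search_and_load(q))
    # A crude keyword filter stands in for embedding-based retrieval here.
    return [p for p in pages if "retriever" in p.lower()]

chunks = web_research("how do retrievers work")
print(len(chunks))
```

Running the queries in parallel (rather than one agent step at a time) is what makes this shape faster and more predictable than a full research agent.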
Jul 20, 2023 5 tweets 3 min read
Possible tip on prompting Llama-2. Try the special tokens from Llama's generation code (<<SYS>>, <</SYS>>, [INST], [/INST]). Answers seem better w/ them.

LangSmith trace w/o tokens linked (also, image left):

w/ tokens (right):
smith.langchain.com/public/a4de67a…
smith.langchain.com/public/54ed8ae…

h/t @disiok for flagging this: I passed the tokens to GPT4 and asked it to design a prompt for retrieval using the system message and instruction tokens. I just used the resulting GPT4-designed prompt (image below) :P ...
generation.py
github.com/facebookresear…
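A minimal sketch of the single-turn prompt layout those tokens imply, following the format in generation.py (note: the real tokenizer also adds BOS/EOS tokens around each turn; the retrieval-style system/user content below is illustrative):

```python
def llama2_prompt(system, user):
    """Format a single-turn Llama-2 chat prompt using the special tokens
    from facebookresearch/llama's generation.py."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = llama2_prompt(
    "Answer using only the provided context.",
    "Context: ...\n\nQuestion: What is task decomposition?",
)
print(prompt)
```

Since the model was fine-tuned on exactly this layout, omitting the tokens puts the prompt out of distribution, which is a plausible reason answers degrade without them.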
Jun 30, 2023 4 tweets 2 min read
Document splitting is common for vector storage / retrieval, but useful context can be lost. @LangChainAI has 3 new "context-aware" text splitters that keep metadata about where each split came from. Works for code (py, js) c/o @cristobal_dev, PDFs c/o @CorranMac, and Markdown ..

.. the newest @LangChainAI release (v0.0.220) has a contribution from @CorranMac that uses Grobid for context-aware splitting of PDFs; great for scientific articles or large docs. Each text chunk retains the section of the paper it came from. See here ..
https://t.co/tqKedGTwLC
python.langchain.com/docs/modules/d…
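The core idea of context-aware splitting is small: track the enclosing heading while chunking, and attach it as metadata to every chunk. A minimal markdown version (the function, chunk-size heuristic, and example doc are illustrative, not the LangChain splitters themselves):

```python
def split_markdown_with_sections(text, chunk_size=200):
    """Split markdown into chunks, tagging each chunk with the heading
    it came from so retrieval results keep their context."""
    chunks, section, buf = [], "preamble", []

    def flush():
        if buf:
            chunks.append({"section": section, "text": " ".join(buf)})
            buf.clear()

    for line in text.splitlines():
        if line.startswith("#"):
            flush()
            section = line.lstrip("#").strip()  # new enclosing heading
        elif line.strip():
            buf.append(line.strip())
            if sum(len(b) for b in buf) > chunk_size:
                flush()
    flush()
    return chunks

doc = "# Intro\nSome intro text.\n# Methods\nGrobid parses the PDF layout."
for c in split_markdown_with_sections(doc):
    print(c["section"], "->", c["text"])
```

At retrieval time the section label can be shown to the LLM alongside the chunk text, or used as a metadata filter.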
Jun 14, 2023 6 tweets 6 min read
@karpathy's YouTube course is one of the best educational resources on LLMs. In this spirit, I built a Q+A assistant for the course and open sourced the repo, which shows how to use @LangChainAI to easily build and evaluate LLM apps:
karpathy-gpt.vercel.app
github.com/rlancemartin/k…

1/ @LangChainAI has a new document loader for YouTube urls. Simply pass in urls and get the resulting text back (using the @OpenAI Whisper API). The repo shows how to use this to get the text for all @karpathy course videos in a few lines of code ...
Jun 7, 2023 4 tweets 3 min read
YouTube is a great source of content for LLM chat / Q+A apps. I recently added a @LangChainAI document loader to simplify this: pass in YouTube video urls, get back text documents that can be easily embedded for retrieval QA or chat (see below)🪄
github.com/hwchase17/lang…

@karpathy inspired this work a while ago w/ Whisper transcriptions of the @lexfridman pod. I used a similar pipeline to build a Q+A app, lex-gpt. The @OpenAI Whisper API simplified the pipeline, so I wrapped it all in an easy-to-use @LangChainAI doc loader ..

May 31, 2023 7 tweets 5 min read
Retrieval for QA systems is hard. I'm open sourcing a tool I've been using to easily evaluate custom and/or advanced retrievers (e.g., SelfQueryRetriever). It runs locally as a lightweight app using @LangChainAI. Here are some things I've used it for ...
github.com/langchain-ai/a…

... there are a lot of newer retriever architectures, such as SelfQueryRetriever. As mentioned by @hwchase17, this will use an LLM to extract: 1) the `query` string to use for vector search, and 2) a metadata filter to pass in as well ...
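The self-query pattern described above can be sketched with the LLM step stubbed out: one call splits the question into a semantic query plus a structured filter, then the filter narrows the candidates before similarity search. Everything below (`fake_llm_parse`, the toy docs, the keyword match standing in for vector search) is an illustrative stand-in for the real SelfQueryRetriever:

```python
def fake_llm_parse(question):
    """Stand-in for the LLM step: split a question into a search query
    plus a structured metadata filter."""
    # e.g. for "comedy movies after 2010 about dinosaurs":
    return {"query": "dinosaurs", "filter": {"genre": "comedy", "year_gt": 2010}}

docs = [
    {"text": "dinosaurs roam a theme park", "genre": "comedy", "year": 2015},
    {"text": "dinosaurs roam a theme park", "genre": "horror", "year": 2015},
    {"text": "a romance in paris", "genre": "comedy", "year": 2012},
]

def self_query_retrieve(question, docs):
    parsed = fake_llm_parse(question)
    return [
        d for d in docs
        if d["genre"] == parsed["filter"]["genre"]        # metadata filter
        and d["year"] > parsed["filter"]["year_gt"]
        and parsed["query"] in d["text"]                  # stand-in for vector search
    ]

print(self_query_retrieve("comedy movies after 2010 about dinosaurs", docs))
```

Plain vector search would rank all three docs by text similarity alone; the extracted filter is what lets the retriever respect constraints like genre and year.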
May 30, 2023 7 tweets 7 min read
There are a lot of questions abt smaller, open source LLMs vs larger, closed models for tasks like question answering. So, we added @MosaicML MPT-7B & @lmsysorg Vicuna-13b to the @LangChainAI auto-evaluator. You can test them on your own Q+A use-case ...
autoevaluator.langchain.com/playground

... great pod w/ @jefrankle @swyx @abhi_venigalla on MPT-7B, so I used the auto-evaluator to benchmark it on a test set of 5 Q+A pairs from the GPT3 paper. Results are close to larger models and it's very fast (kudos to the @MosaicML inference team!) ...
May 16, 2023 5 tweets 3 min read
I've seen questions about @AnthropicAI's 100k context window: can it compete w/ vectorDB retrieval? We added Claude-100k to the @LangChainAI auto-evaluator app so you can compare for yourself (details showing Claude-100k results below). App is here:
autoevaluator.langchain.com/playground

.. there are many retrieval approaches for Q+A that fetch docs relevant to a question, followed by LLM answer synthesis. But as the LLM context window grows, retrieval may not be needed, since you can just stuff the full doc(s) into the prompt (red in the diagram) ..
May 1, 2023 5 tweets 4 min read
Here's a free-to-use, open-source app for evaluating LLM question-answer chains. Assemble modular LLM QA chain components w/ @LangChainAI. Use LLMs to generate a test set and grade the chain.
Built by 🛠️ - me, @sfgunslinger, @thebengoldberg
Link - autoevaluator.langchain.com

Inspired by 1) @AnthropicAI - model-written eval sets and 2) @OpenAI - model-graded evaluation. This app combines both ideas into a single workspace, auto-generating a QA test set for a given input doc and auto-grading the result of the user-specified QA chain.
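The two-step loop (model-written eval set, then model-graded answers) can be sketched with both LLM calls stubbed out; the `fake_*` functions, the containment-based grader, and the trivial chain below are illustrative stand-ins, not the app's actual prompts:

```python
def fake_qa_generator(doc):
    """Stand-in for the model-written-evals step: QA pairs from a doc."""
    return [{"question": "What color is the sky?", "answer": "blue"}]

def fake_grader(question, reference, prediction):
    """Stand-in for model-graded eval; a real grader prompts an LLM to
    compare prediction vs reference. Here: simple containment."""
    return "CORRECT" if reference.lower() in prediction.lower() else "INCORRECT"

def evaluate_chain(doc, chain):
    grades = []
    for pair in fake_qa_generator(doc):
        pred = chain(pair["question"])
        grades.append(fake_grader(pair["question"], pair["answer"], pred))
    return grades

# A trivial "chain" under test.
chain = lambda q: "The sky is blue."
print(evaluate_chain("some document text", chain))
```

The value of combining the two steps is that the whole eval runs from just an input doc: no hand-written test set, and any chain configuration can be dropped into `chain` and scored the same way.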
Apr 16, 2023 4 tweets 4 min read
I'm open-sourcing a tool I use to auto-evaluate LLM Q+A chains: given input docs, the app will use an LLM to auto-generate a Q+A eval set, run it on a user-selected chain (model, retriever, etc) built w/ @LangChainAI, use an LLM to grade, and store each expt.
github.com/PineappleExpre…

There are many model (@OpenAI, @AnthropicAI, @huggingface), retriever (SVM, vectorstores), and parameter (chunk size, etc) options. This lets you easily assemble combinations and evaluate them for Q+A (scoring and latency) on your docs of interest ...
Mar 29, 2023 6 tweets 6 min read
Finally got GPT4 API access, so I built an app to test it: here's a Q+A assistant for all 121 episodes of the @theallinpod. You can ask any question abt the shows. It uses the @OpenAI whisper model for audio -> text, @pinecone, and @LangChainAI. App is here:
besties-gpt.fly.dev

There is a perf vs latency trade-off for GPT4 vs ChatGPT (3.5-turbo). I used @LangChainAI to generate a QA eval set of 52 questions (w/ manual curation) and used an LLM to score them. GPT4 is better, but they are close (left below) and GPT4 is ~2x slower (right, w/ k = sim search docs)
Mar 20, 2023 6 tweets 7 min read
I built an app that uses ChatGPT for question-answering over all 365 episodes of the @lexfridman podcast. It uses the @OpenAI Whisper model for audio-to-text and @LangChainAI. All code is open source (linked below). App:
lex-gpt.fly.dev

I used @karpathy's Whisper transcriptions for the first 325 episodes and generated the rest. I used @LangChainAI for splitting transcriptions / writing embeddings to @pinecone, LangChainJS for VectorDBQA, and @mckaywrigley's UI template. Some notes below ...
Mar 6, 2023 4 tweets 3 min read
Tested ChatVectorDB yesterday. Pretty cool: below is code for a lightweight / open source ChatGPT enabled chat app over any uploaded PDF. Retains conversational history naturally, as expected ...
... using @streamlit, which is useful for fast prototyping / deployment w/ minimal code here:
github.com/PineappleExpre…
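The "retains conversational history" behavior of a ChatVectorDB-style chain comes from one extra LLM step: rewrite the follow-up question into a standalone question using the chat history, then retrieve and answer as usual. A minimal sketch with every LLM call stubbed out (`fake_condense` and the lambda `retrieve`/`answer` stand-ins are illustrative, not the chain's real prompts):

```python
def fake_condense(history, question):
    """Stand-in for the LLM that rewrites a follow-up into a
    standalone question using the prior turns."""
    if history and question.lower().startswith("and"):
        last_q, _ = history[-1]
        return f"{last_q} {question}"
    return question

def chat(history, question, retrieve, answer):
    standalone = fake_condense(history, question)  # 1) condense w/ history
    docs = retrieve(standalone)                    # 2) vector retrieval
    reply = answer(standalone, docs)               # 3) answer synthesis
    history.append((question, reply))              # 4) extend the history
    return reply

history = []
retrieve = lambda q: ["relevant chunk from the uploaded PDF"]
answer = lambda q, docs: f"Answer to: {q}"

chat(history, "What is chapter 1 about?", retrieve, answer)
print(chat(history, "And chapter 2?", retrieve, answer))
```

The condense step is the key design choice: retrieval sees a self-contained question, so follow-ups like "And chapter 2?" still fetch the right chunks.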