Lance Martin
Jun 7 · 4 tweets · 3 min read
YouTube is a great source of content for LLM chat / Q+A apps. I recently added a @LangChainAI document loader to simplify this: pass in YouTube video URLs, get back text documents that can be easily embedded for retrieval QA or chat (see below) 🪄
github.com/hwchase17/lang…
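A minimal sketch of the loader (API as of the ~v0.0.192 langchain release; assumes OPENAI_API_KEY is set and yt_dlp is installed, and the video URL is a placeholder):

```python
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader

urls = ["https://youtu.be/<video-id>"]  # placeholder YouTube URL

# Download the audio locally, then transcribe it with the OpenAI Whisper API
loader = GenericLoader(
    YoutubeAudioLoader(urls, save_dir="~/Downloads/YouTube"),
    OpenAIWhisperParser(),
)
docs = loader.load()  # one text Document per video
```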
@karpathy inspired this work a while ago w/ Whisper transcriptions of the @lexfridman pod. I used a similar pipeline to build a Q+A app, lex-gpt. The @OpenAI Whisper API simplified the pipeline, so I wrapped it all in an easy-to-use @LangChainAI doc loader ..

.. see this notebook for an example going from YouTube URLs to a chat app in ~10 lines of code (sketch below). You can find this feature in the latest @LangChainAI releases (> v0.0.192).
github.com/rlancemartin/l…
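Continuing from the docs loaded above, the rest of the pipeline is roughly this (a sketch assuming FAISS and OpenAI embeddings; the notebook may use different components):

```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Split the transcripts, embed the chunks, and stand up a retrieval-QA chain
splits = RecursiveCharacterTextSplitter(
    chunk_size=1500, chunk_overlap=150
).split_documents(docs)
vectordb = FAISS.from_documents(splits, OpenAIEmbeddings())
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(temperature=0),
    retriever=vectordb.as_retriever(),
)
print(qa.run("What is discussed in the video?"))
```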
I'll be spending a lot of time on Document Loaders at @LangChainAI and welcome any ideas / feedback: 1) what document loaders / integrations are missing? 2) what tutorials are missing? etc.

More from @RLanceMartin

May 30
There are a lot of questions about smaller, open-source LLMs vs. larger, closed models for tasks like question answering. So we added @MosaicML MPT-7B & @lmsysorg Vicuna-13b to the @LangChainAI auto-evaluator. You can test them on your own Q+A use case ... autoevaluator.langchain.com/playground
... great pod w/ @jefrankle @swyx @abhi_venigalla on MPT-7B, so I used the auto-evaluator to benchmark it on a test set of 5 Q+A pairs from the GPT-3 paper. Results are close to the larger models and it's very fast (kudos to the @MosaicML inference team!) ...
... @sfgunslinger also deployed Vicuna-13b on @replicatehq and it achieves performance parity w/ the larger models on this test set. Prompt engineering may further improve this (v helpful discussion w/ the folks at @replicatehq / @JoeEHoover); we are looking into improving latency ...
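The hosted app wires these models up for you; to try an open-source model in your own @LangChainAI chain, something like this works (a sketch: the model id, task, and kwargs are assumptions rather than the app's exact config, and MPT needs a GPU):

```python
from langchain.llms import HuggingFacePipeline

# Load MPT-7B-Instruct locally via transformers; MPT ships custom model
# code, so trust_remote_code must be enabled (an assumption about the setup)
llm = HuggingFacePipeline.from_model_id(
    model_id="mosaicml/mpt-7b-instruct",
    task="text-generation",
    model_kwargs={"trust_remote_code": True},
)
print(llm("What dataset was GPT-3 trained on?"))
```

Any LLM loaded this way can be dropped into the same RetrievalQA chain sketched above.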
May 16
I've seen questions about @AnthropicAI's 100k context window: can it compete w/ vectorDB retrieval? We added Claude-100k to the @LangChainAI auto-evaluator app so you can compare for yourself (details showing Claude-100k results below). App is here:
autoevaluator.langchain.com/playground
.. there are many retrieval approaches for Q+A that fetch docs relevant to a question, followed by LLM answer synthesis. But as the LLM context window grows, retrieval may not be needed, since you can just stuff the full doc(s) into the prompt (red in the diagram) ..
.. we tested on Q+A eval sets from the GPT-3 paper and the SF Building Codes (75- and 51-page PDFs). @AnthropicAI 100k was impressively close in performance to the various retrieval methods, but does have higher latency. See details here:
blog.langchain.dev/auto-evaluatio…
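For reference, the "just stuff it" path is only a few lines in @LangChainAI (a sketch; the PDF path is a placeholder and the Claude model name is the one current at the time, which may have since changed):

```python
from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatAnthropic
from langchain.document_loaders import PyPDFLoader

# Load the full PDF; no splitting, no vectorstore, no retriever
docs = PyPDFLoader("sf_building_codes.pdf").load()  # placeholder path

# Stuff every page straight into Claude's 100k-token context window
chain = load_qa_chain(ChatAnthropic(model="claude-v1-100k"), chain_type="stuff")
answer = chain.run(input_documents=docs, question="What are the key requirements?")
```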
May 1
Here's a free-to-use, open-source app for evaluating LLM question-answer chains. Assemble modular LLM QA chain components w/ @LangChainAI. Use LLMs to generate a test set and grade the chain.
Built by 🛠️ - me, @sfgunslinger, @thebengoldberg
Link - autoevaluator.langchain.com
Inspired by 1) @AnthropicAI - model-written eval sets and 2) @OpenAI - model-graded evaluation. This app combines both of these ideas into a single workspace, auto-generating a QA test set for a given input doc and auto-grading the result of the user-specified QA chain.
You can use it in two ways: 1) Demo mode: pre-loaded w/ the @karpathy episode from the @lexfridman pod and a test set. 2) Playground mode: upload your own doc and / or test set. In both cases, you can test QA chain configs and compare results (in a table and visually).
Apr 16
I'm open-sourcing a tool I use to auto-evaluate LLM Q+A chains: given input docs, the app will use an LLM to auto-generate a Q+A eval set, run it on a user-selected chain (model, retriever, etc.) built w/ @LangChainAI, use an LLM to grade, and store each experiment. github.com/PineappleExpre…
There are many model (@OpenAI, @AnthropicAI, @huggingface), retriever (SVM, vectorstores), and parameter (chunk size, etc.) options. This lets you easily assemble combinations and evaluate them for Q+A (scoring and latency) on your docs of interest ...
It uses an LLM to generate the eval set and an LLM as a grader. The prompts can be easily tuned (see code below) and you can ask the LLM grader to explain itself. It uses ideas from some helpful discussion w/ @jerryjliu0 on retrieval scoring ...
github.com/PineappleExpre…
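The two LLM roles look roughly like this (a sketch of the chains involved; the input path is a placeholder, qa_chain stands in for whatever chain is under test, and the prompts here are the library defaults rather than the repo's tuned ones):

```python
from langchain.chains import QAGenerationChain
from langchain.chat_models import ChatOpenAI
from langchain.evaluation.qa import QAEvalChain

llm = ChatOpenAI(temperature=0)
doc_text = open("my_doc.txt").read()  # placeholder input document

# 1) An LLM writes the eval set from the raw document text
eval_set = QAGenerationChain.from_llm(llm).run(doc_text)  # [{"question": ..., "answer": ...}]

# 2) Answer each question with the chain under test (any QA chain works;
#    qa_chain here is assumed to be something like the RetrievalQA sketched earlier)
predictions = [
    {**pair, "result": qa_chain.run(pair["question"])} for pair in eval_set
]

# 3) An LLM grades the predictions against the generated answers
graded = QAEvalChain.from_llm(llm).evaluate(
    eval_set, predictions,
    question_key="question", answer_key="answer", prediction_key="result",
)
```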
Mar 29
Finally got GPT-4 API access, so I built an app to test it: here's a Q+A assistant for all 121 episodes of @theallinpod. You can ask any question about the shows. It uses the @OpenAI Whisper model for audio -> text, @pinecone, and @LangChainAI. App is here: besties-gpt.fly.dev
There is a performance vs. latency trade-off for GPT-4 vs. ChatGPT (3.5-turbo). I used @LangChainAI to generate a QA eval set of 52 questions (w/ manual curation) and used an LLM to score them. GPT-4 is better, but they are close (left below) and GPT-4 is ~2x slower (right, w/ k = # of similarity-search docs retrieved).
@LangChainAI eval tooling is v useful. Notebooks to generate the QA eval set based on the pod episodes and score them (using an LLM as a grader) are below. V interested in further ideas on eval and have been discussing w/ @hwchase17. Thoughts welcome! github.com/PineappleExpre…
Mar 20
I built an app that uses ChatGPT for question-answering over all 365 episodes of the @lexfridman podcast. It uses the @OpenAI Whisper model for audio-to-text and @LangChainAI. All code is open source (linked below). App: lex-gpt.fly.dev
I used @karpathy's Whisper transcriptions for the first 325 episodes and generated the rest. I used @LangChainAI for splitting transcriptions / writing embeddings to @pinecone, LangChainJS for VectorDBQA, and @mckaywrigley's UI template. Some notes below ...
1/ Chunk size has an influence on performance. I used the @LangChainAI QAGenerationChain to create an eval set on the @karpathy episode and QAEvalChain to evaluate across chunk sizes (sketch below). Interested in ideas to address this, e.g., @gpt_index @jerryjliu0.
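The split-and-embed step with the chunk size exposed looks something like this (a sketch; the index name, Pinecone environment, and transcript path are placeholders, and in the app this runs over every episode):

```python
import pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Pinecone

pinecone.init(api_key="...", environment="us-east1-gcp")  # placeholders

# Placeholder transcript; one Document per podcast episode
transcripts = [Document(page_content=open("episode_326.txt").read())]

# chunk_size is the knob evaluated in note 1/: re-split, re-embed, and
# re-grade the same eval set for each setting you want to compare
splits = RecursiveCharacterTextSplitter(
    chunk_size=1500, chunk_overlap=100
).split_documents(transcripts)
index = Pinecone.from_documents(splits, OpenAIEmbeddings(), index_name="lex-gpt")
```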
