Lance Martin
Aug 5 · 10 tweets · 5 min read
Extraction 📚➡️🗒️

Getting structured LLM output is hard! Part 3 of our initiative to improve the @LangChainAI docs covers this w/ functions and parsers (see the @GoogleColab notebook). Thanks to @fpingham for improving the docs on this:

https://t.co/bMjFmCSZM3
python.langchain.com/docs/use_cases…
1/ Getting LLMs to produce structured (e.g., JSON) output is a challenge, often requiring tedious prompt engineering:
2/ Functions (e.g., using OpenAI models) have been a great way to tackle this problem, as shown by the work of @jxnlco and others. The LLM calls a function and returns output that follows a specified schema.
wandb.ai/jxnlco/functio…
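The mechanics can be sketched without any LLM call: a function spec describes its parameters as a JSON schema, and the model's "function call" comes back as JSON arguments that we parse and validate. A minimal stdlib sketch (the fields below are illustrative, not the docs' exact schema):

```python
import json

# Illustrative function spec in the style of OpenAI function calling: the
# model is asked to "call" this function, so its output must follow the
# parameters schema instead of free-form text.
information_extraction = {
    "name": "information_extraction",
    "description": "Extract people mentioned in a passage.",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
        },
        "required": ["name"],
    },
}

def parse_function_call(arguments_json: str) -> dict:
    """Parse and lightly validate the JSON arguments a model returns."""
    args = json.loads(arguments_json)
    for field in information_extraction["parameters"]["required"]:
        if field not in args:
            raise ValueError(f"missing required field: {field}")
    return args

print(parse_function_call('{"name": "Ada", "age": 36}'))
```

Because the schema travels with the request, the model's output is constrained up front instead of being coaxed out by prompt wording.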
3/ @LangChainAI supports function calling for extraction, requiring only a specification of the desired output schema (e.g., a JSON schema or Pydantic class).
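As a sketch of what the docs describe, you pass a schema of properties to the extraction chain. The chain call needs an OpenAI key, so it is left in comments (import paths may differ across LangChain versions), and the field names below are made up:

```python
# Illustrative schema in the shape the extraction docs use: a dict of
# properties plus the required fields.
schema = {
    "properties": {
        "person_name": {"type": "string"},
        "person_height": {"type": "integer"},
        "person_hair_color": {"type": "string"},
    },
    "required": ["person_name"],
}

# The chain call needs an OpenAI API key, so it is sketched in comments:
# from langchain.chat_models import ChatOpenAI
# from langchain.chains import create_extraction_chain
# chain = create_extraction_chain(schema, ChatOpenAI(temperature=0))
# chain.run("Alex is 5 feet tall and has blond hair.")

# Sanity check: every required field is a declared property.
assert set(schema["required"]) <= set(schema["properties"])
```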
4/ We can look under the hood with a LangSmith trace: the prompt instructs the LLM (in this case, OpenAI) to call the function "information_extraction", defined here:

Trace:
https://t.co/gf6DW3r8RM
https://t.co/JAvzqpEh8s
github.com/langchain-ai/l…
smith.langchain.com/public/72bc320…
5/ The docs provide some examples for using functions, which shine in cases where we don't specify a priori all the fields we want to extract. Whereas a parser requires enumeration of each attribute, functions work w/ open-ended fields such as "give me extra information":
6/ The docs also cover parsers, which are especially useful for LLMs that don't yet support function calling. We can use a LangSmith trace to see that parsers use few-shot prompting under the hood:
https://t.co/ZMLQpdquTR
smith.langchain.com/public/8e3aa85…
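The parser pattern can be sketched in plain Python: put format instructions (often with a one-shot example) into the prompt, then parse the completion back into a typed object. LangChain's built-in parsers do this more robustly; the `Person` fields and instruction text below are illustrative:

```python
import json
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

# Format instructions injected into the prompt, with a one-shot example --
# the same trick the LangSmith trace shows the built-in parsers using.
FORMAT_INSTRUCTIONS = """Return ONLY a JSON object with keys "name" (string)
and "age" (integer). Example: {"name": "Ada", "age": 36}"""

def parse_completion(completion: str) -> Person:
    """Parse the model's completion into a typed object."""
    # Models often wrap JSON in prose; grab the first {...} span.
    start, end = completion.find("{"), completion.rfind("}") + 1
    data = json.loads(completion[start:end])
    return Person(name=data["name"], age=int(data["age"]))

print(parse_completion('Sure! {"name": "Grace", "age": 45}'))
```

No function-calling support is needed: any LLM that can follow the instructions and emit JSON works.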
7/ It's worth noting that more LLMs are gaining support for function calling (e.g., @AnthropicAI):


Llama2 has been fine-tuned to support it as well:
https://t.co/ISAY1MsnF8
8/ For a more in-depth look, see past webinars on parsing, extraction, and function calling w/ @GregKamradt, @jerwelborn, @veryboldbagel, @fpingham, @jxnlco:


https://t.co/uRbFgMnxW1
9/ And for more on the community initiative to improve the docs, see Part 2 on summarization.

More from @RLanceMartin

Aug 3
LLM Use Case: Summarization 📚🧠

We've kicked off a community-driven effort to improve the @LangChainAI docs, starting w/ popular use cases. Here is the new use case doc on Summarization, w/ a @GoogleColab notebook for easy testing ...
https://t.co/e6QYl8pEsH
python.langchain.com/docs/use_cases…
1/ Context window stuffing: adding full documents to the LLM context window is the easiest approach to summarization, and it's increasingly feasible as LLMs get larger context windows (e.g., @AnthropicAI Claude w/ a 100k-token window fits hundreds of pages).
https://t.co/aClREUqtPd
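Whether stuffing fits is a token-budget question. A rough sketch using the common ~4-characters-per-token heuristic for English text (a real count should use the model's tokenizer; the numbers here are assumptions):

```python
def fits_context(docs: list[str], context_tokens: int = 100_000,
                 chars_per_token: float = 4.0, reserve: int = 2_000) -> bool:
    """Rough check: can we stuff every doc into a single prompt?

    ~4 chars/token is a common heuristic for English text; use the model's
    tokenizer for a real count. `reserve` leaves room for the instructions
    and the generated summary.
    """
    est_tokens = sum(len(d) for d in docs) / chars_per_token
    return est_tokens <= context_tokens - reserve

# ~150 pages at ~2,000 characters per page is roughly 75k tokens:
pages = ["x" * 2_000] * 150
print(fits_context(pages))  # True: fits in a 100k-token window
```

If the check fails, fall back to a map-reduce or embed-cluster-sample approach.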
2/ Embed-cluster-sample: @GregKamradt demoed a cool approach w/ @LangChainAI to chunk, embed, cluster, and sample representative chunks that are passed to the LLM context window. A nice way to save cost by reducing the tokens sent to the LLM.
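The embed-cluster-sample idea can be sketched end-to-end with a tiny k-means over stand-in 2-d "embeddings" (a real pipeline would use an embedding model and a clustering library like scikit-learn; everything here is illustrative):

```python
import math
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Tiny k-means: returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest center.
        for i, v in enumerate(vectors):
            labels[i] = min(range(k), key=lambda c: math.dist(v, centers[c]))
        # Move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centers

def representative_chunks(chunks, embeddings, k):
    """Pick one chunk per cluster: the one closest to the cluster center."""
    labels, centers = kmeans(embeddings, k)
    picks = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            best = min(members, key=lambda i: math.dist(embeddings[i], centers[c]))
            picks.append(chunks[best])
    return picks

# Stand-in 2-d "embeddings": two obvious groups of chunks.
emb = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
chunks = ["intro a", "intro b", "results a", "results b"]
print(representative_chunks(chunks, emb, k=2))
```

Only the picked chunks go to the LLM, which is where the token savings come from.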
Aug 2
Recent updates to the @LangChainAI data ecosystem 🦜⛓️: 3 new loaders, 2 new storage options, and a new loader / retriever for web research ...
... a great addition from @RubenBarraganP connects files in @Dropbox to the LangChain ecosystem:


... similarly, @Huawei unstructured data storage can be connected:
https://t.co/Ir3HLgtgAg
python.langchain.com/docs/integrati…
python.langchain.com/docs/integrati…
... there's a new loader for Etherscan transactions. Folks like @punk9059 may have a pulse on applications w/in the larger crypto community. Always interesting to learn about:
python.langchain.com/docs/integrati…
Jul 26
Web research is a great LLM use case. @hwchase17 and I are releasing a new retriever to automate web research that is simple, configurable (can run in private-mode w/ llamav2, GPT4all, etc), & observable (use LangSmith to see what it's doing). Blog:
https://t.co/LU0PWDmrBE
blog.langchain.dev/automating-web…
Image
Projects like @assaf_elovic's gpt-researcher are a great example of research agents; we started with an agent, but landed on a simple retriever that executes LLM-generated search queries in parallel, indexes the loaded pages, and retrieves relevant chunks. LangSmith trace:
The retriever is compatible w/ private workflows. Here's a trace running on my laptop (~50 tok/sec) w/ Llama-v2, @nomic_ai GPT4all embeddings, and @trychroma: the LLM generates the search queries and is also used for final answer generation. See docs:
https://t.co/I5V51LVdOF
python.langchain.com/docs/modules/d…
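The retriever's flow (generate several queries with the LLM, run the searches in parallel, then dedupe and index the loaded pages) can be sketched with the stdlib; both `generate_queries` and `search` below are stubs standing in for the real LLM and search-API calls:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_queries(question: str) -> list[str]:
    """Stub for the LLM step that writes several related search queries."""
    return [question, f"{question} tutorial", f"{question} explained"]

def search(query: str) -> list[str]:
    """Stubbed search API: returns fake page texts for a query."""
    return [f"page about {query} #{i}" for i in range(2)]

def web_research(question: str) -> list[str]:
    # Execute the generated queries in parallel, as the retriever does.
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(search, generate_queries(question)))
    # Dedupe the loaded pages; a real pipeline would chunk and embed them
    # here, then retrieve the relevant chunks for answer generation.
    seen, pages = set(), []
    for batch in batches:
        for page in batch:
            if page not in seen:
                seen.add(page)
                pages.append(page)
    return pages

print(len(web_research("vector databases")))  # 6 unique stub pages
```

Swapping the LLM stub for a local model is what makes the private-mode workflow possible.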
Jun 30
Document splitting is common for vector storage / retrieval, but useful context can be lost. @LangChainAI has 3 new "context-aware" text splitters that keep metadata about where each split came from. Works for code (py, js) c/o @cristobal_dev, PDFs c/o @CorranMac, and Markdown ..
.. the newest @LangChainAI release (v0.0.220) has a contribution from @CorranMac that uses Grobid for context-aware splitting of PDFs; great for scientific articles or large docs. Each text chunk retains the section of the paper it came from. See here ..
https://t.co/tqKedGTwLC
python.langchain.com/docs/modules/d…
.. earlier this week, @cristobal_dev added context-aware splitting for .js and .py, which keeps the class or function that each split comes from. He also added helpful documentation on usage here ..
python.langchain.com/docs/modules/d…
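The context-aware idea is easiest to see for Markdown: each chunk keeps the header path it sits under as metadata. LangChain ships a more robust MarkdownHeaderTextSplitter; this is a simplified stdlib sketch of the same idea:

```python
def split_markdown(text: str) -> list[dict]:
    """Split on headers, keeping the header path as metadata per chunk."""
    chunks, path, lines = [], {}, []

    def flush():
        if lines:
            chunks.append({"content": "\n".join(lines).strip(),
                           "metadata": dict(path)})
            lines.clear()

    for line in text.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            # A new section drops any deeper headers from the path.
            path = {k: v for k, v in path.items() if k < level}
            path[level] = line.lstrip("# ").strip()
        else:
            lines.append(line)
    flush()
    return chunks

doc = "# Paper\n## Methods\nwe did X\n## Results\nX worked"
for c in split_markdown(doc):
    print(c["metadata"], "->", c["content"])
```

At retrieval time, the metadata tells you which section of the document a hit came from, which would otherwise be lost by naive fixed-size splitting.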
Jun 14
@karpathy's YouTube course is one of the best educational resources on LLMs. In this spirit, I built a Q+A assistant for the course and open-sourced the repo, which shows how to use @LangChainAI to easily build and evaluate LLM apps:
karpathy-gpt.vercel.app
github.com/rlancemartin/k…
1/ @LangChainAI has a new document loader for YouTube urls. Simply pass in urls and get the resulting text back (using the @OpenAI Whisper API). The repo shows how to use this to get the text for all @karpathy course videos in a few lines of code ...
2/ With the text, the repo then shows how to use @LangChainAI auto-evaluator to prototype different chains / parameters w/o any code. You can use the hosted app for this: autoevaluator.langchain.com/playground
Also, all code is open source for this tool:
github.com/langchain-ai/a…
Jun 7
YouTube is a great source of content for LLM chat / Q+A apps. I recently added a @LangChainAI document loader to simplify this: pass in YouTube video urls, get back text documents that can be easily embedded for retrieval QA or chat (see below)🪄
github.com/hwchase17/lang…
@karpathy inspired this work a while ago w/ Whisper transcriptions of the @lexfridman pod. I used a similar pipeline to build a Q+A app, lex-gpt. @OpenAI Whisper API simplified the pipeline, so I wrapped it all in an easy-to-use @LangChainAI doc loader ..

.. see this notebook for an example going from YouTube urls to a chat app in ~10 lines of code. You can find this feature in the latest @LangChainAI releases (> v0.0.192).
github.com/rlancemartin/l…
