Lance Martin
Aug 8, 2023 · 8 tweets · 4 min read
Text-to-SQL 📒

LLMs unlock a natural language interface with structured data. Part 4 of our initiative to improve @LangChainAI docs shows how to use LLMs to write / execute SQL queries w/ chains and agents. Thanks @manuelsoria_ for work on the docs:
https://t.co/CyOqp5I3TM (python.langchain.com/docs/use_cases…)
1/ Text-to-SQL is an excellent LLM use-case: many ppl can describe what they want in natural language, but have difficulty mapping that to a specific SQL query. LLMs can bridge this gap, e.g., see:
https://t.co/b0NMkHPe9x (arxiv.org/pdf/2204.00498…)
2/ create_sql_query_chain() maps from natural language to a SQL query: pass the question and the database into the chain, and get SQL out. Run the query on the database easily:
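A rough sketch of that flow (question + database in, SQL out, then run it), using only Python's built-in sqlite3. This is not the LangChain implementation: `fake_llm` and `write_sql_query` are hypothetical names, the stub returns a canned query where a real model call would go, and the `employees` table is made-up sample data.

```python
import sqlite3


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns a canned query."""
    return "SELECT COUNT(*) FROM employees;"


def write_sql_query(question: str, conn: sqlite3.Connection, llm=fake_llm) -> str:
    """Build a prompt from the db schema plus the question; return the SQL text."""
    schema = "\n".join(
        row[0]
        for row in conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
        )
    )
    prompt = f"Schema:\n{schema}\n\nQuestion: {question}\nSQL query:"
    return llm(prompt).strip()


# Made-up sample database, held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employees (name) VALUES (?)", [("Ada",), ("Grace",), ("Linus",)]
)

# Step 1: natural language in, SQL out.
query = write_sql_query("How many employees are there?", conn)
# Step 2: run the generated query against the same database.
result = conn.execute(query).fetchall()
print(query)
print(result)
```

Keeping "write the query" separate from "run the query" leaves room to inspect or validate the SQL before it touches the db.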
3/ The LangSmith trace is a great way to see that the chain employs ideas from the paper above: give the LLM a CREATE TABLE description for each table and three example rows in a SELECT statement. This gives the LLM context about the db structure:
https://t.co/Pqu86RFcJP (smith.langchain.com/public/c8fa52e…)
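To make the "CREATE TABLE + three example rows" idea concrete, here is a minimal sketch that builds that kind of context string from a SQLite database. `table_info` is a hypothetical helper written for this example (the exact text the chain sends may differ), and the table and rows are made-up sample data.

```python
import sqlite3


def table_info(conn: sqlite3.Connection, sample_rows: int = 3) -> str:
    """Return each table's CREATE TABLE statement followed by a few sample rows."""
    blocks = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for name, create_sql in tables:
        cursor = conn.execute(f'SELECT * FROM "{name}" LIMIT {int(sample_rows)}')
        header = "\t".join(col[0] for col in cursor.description)
        rows = "\n".join("\t".join(str(value) for value in row) for row in cursor)
        blocks.append(
            f"{create_sql}\n\n/*\n{sample_rows} rows from {name} table:\n"
            f"{header}\n{rows}\n*/"
        )
    return "\n\n".join(blocks)


# Made-up sample database with four rows, so the LIMIT is visible.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employees (name) VALUES (?)",
    [("Ada",), ("Grace",), ("Linus",), ("Margaret",)],
)

context = table_info(conn)
print(context)
```

The sample rows show the LLM what values actually look like (formats, casing), which the schema alone does not convey.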
4/ Extending this, SQLDatabaseChain will generate the query, execute it, and also synthesize the result in natural language. This creates a natural language wrapper around a SQL DB w/ input and output:
https://t.co/avT4kRVIio (smith.langchain.com/public/7f202a0…)
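The three steps (generate, execute, synthesize) can be sketched as below. This is an illustration under stated assumptions, not SQLDatabaseChain itself: `answer_question` and `fake_llm` are hypothetical names, and the stub just echoes the SQL result where a real model would phrase an answer.

```python
import sqlite3


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call (no real LLM is involved)."""
    if prompt.startswith("Write a SQL query"):
        return "SELECT COUNT(*) FROM employees"
    # For the synthesis prompt, echo the result line back as the "answer".
    result_line = next(
        line for line in prompt.splitlines() if line.startswith("SQL result:")
    )
    return "The database returned " + result_line.split(": ", 1)[1] + "."


def answer_question(question: str, conn: sqlite3.Connection, llm=fake_llm) -> dict:
    # 1. Generate the query from the question.
    sql = llm(f"Write a SQL query for: {question}")
    # 2. Execute it on the database.
    rows = conn.execute(sql).fetchall()
    # 3. Ask the model to turn the raw result into a natural language answer.
    answer = llm(
        f"Question: {question}\nSQL query: {sql}\nSQL result: {rows}\nAnswer:"
    )
    return {"query": sql, "rows": rows, "answer": answer}


# Made-up sample database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employees (name) VALUES (?)", [("Ada",), ("Grace",), ("Linus",)]
)

out = answer_question("How many employees are there?", conn)
print(out["answer"])
```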
5/ Finally, SQL agents can be used for more complex tasks (multi-query) and can recover from errors. The trace shows how a ReAct agent can use a toolkit of SQL operations (read table, write query, run query):
https://t.co/zUrXja5bzV (smith.langchain.com/public/a86dbe1…)
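A toy version of the tool loop with error recovery, loosely in the spirit of the agent described above. Everything here is an assumption made for illustration: the tool names, `make_tools`, `run_agent`, and especially `scripted_policy`, which is a hard-coded stand-in for the LLM deciding the next action. It is not the LangChain agent or toolkit.

```python
import sqlite3


def make_tools(conn: sqlite3.Connection) -> dict:
    """Hypothetical toolkit: list tables, read a table's schema, run a query."""

    def list_tables(_: str = "") -> str:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return ", ".join(row[0] for row in rows)

    def read_schema(table: str) -> str:
        row = conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        return row[0] if row else f"Error: no table named {table}"

    def run_query(sql: str) -> str:
        try:
            return str(conn.execute(sql).fetchall())
        except sqlite3.Error as exc:
            # Errors come back as observations so the agent can react to them.
            return f"Error: {exc}"

    return {"list_tables": list_tables, "read_schema": read_schema, "run_query": run_query}


def scripted_policy(history: list) -> tuple:
    """Stand-in for the LLM: choose the next (tool, input) from past steps."""
    if not history:
        return ("run_query", "SELECT COUNT(*) FROM staff")  # deliberately wrong table
    last_tool, last_input, last_obs = history[-1]
    if last_tool == "run_query" and last_obs.startswith("Error"):
        return ("list_tables", "")
    if last_tool == "list_tables":
        return ("read_schema", last_obs.split(", ")[0])
    if last_tool == "read_schema":
        return ("run_query", f"SELECT COUNT(*) FROM {last_input}")
    return ("finish", last_obs)


def run_agent(conn: sqlite3.Connection, policy, max_steps: int = 6):
    tools = make_tools(conn)
    history = []  # (tool, input, observation) triples
    for _ in range(max_steps):
        tool, tool_input = policy(history)
        if tool == "finish":
            return tool_input, history
        history.append((tool, tool_input, tools[tool](tool_input)))
    return None, history


# Made-up sample database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employees (name) VALUES (?)", [("Ada",), ("Grace",), ("Linus",)]
)

answer, history = run_agent(conn, scripted_policy)
for step in history:
    print(step)
print(answer)
```

The first query fails on a table that does not exist; the error observation triggers a look at the available tables and schema, and the second query succeeds.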
6/ For more, see blog post and webinar w/ @fpingham and @JonZLuo:

https://t.co/dtZQY4gAML (blog.langchain.dev/llms-and-sql/)
7/ Finally, for more on the community initiative to improve the docs, see Part 3 on extraction:
