Considering LLM fine-tuning? Here are two new CoLab guides for fine-tuning GPT-3.5 & LLaMA2 on your own data, using LangSmith for dataset management and eval. We also share our lessons learned in a blog post here:
... 1/ When to fine-tune? Fine-tuning is not advised for teaching an LLM new knowledge (see references from @OpenAI and others in our blog post). It's best for tasks (e.g., extraction) focused on "form, not facts": anyscale.com/blog/fine-tuni…
... 2/ With this in mind, we fine-tuned LLaMA-7b-chat & GPT-3.5-turbo for knowledge graph triple extraction (see details in blog post and CoLab). Notebooks here:
LLaMA CoLab:
GPT-3.5-turbo CoLab:
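For reference, a minimal sketch of kicking off the GPT-3.5-turbo fine-tuning job (not the exact CoLab code; the JSONL file name is a placeholder, and this assumes the pre-1.0 openai Python SDK):

```python
import openai  # pre-1.0 SDK style

# Upload a JSONL file of chat-formatted training examples
# (one {"messages": [...system/user/assistant...]} object per line)
train_file = openai.File.create(
    file=open("triples_train.jsonl", "rb"),  # placeholder file name
    purpose="fine-tune",
)

# Kick off the fine-tuning job; poll until it finishes, then call the resulting model
job = openai.FineTuningJob.create(
    training_file=train_file.id,
    model="gpt-3.5-turbo",
)
print(job.id)
```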
... 3/ We used LangSmith for managing / cleaning the train & test sets and for eval, using a GPT4 grader. All code is shared in the CoLabs. Results comparing few-shot GPT4 and GPT3.5 vs fine-tuning are shown below, with grades from 0% (worst) to 100% (best).
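As a rough sketch, an eval run like this can be set up as below (not the exact CoLab code: the dataset name and chain factory are placeholders, and the exact LangChain / LangSmith eval APIs vary by version):

```python
from langchain.chat_models import ChatOpenAI
from langchain.smith import RunEvalConfig, run_on_dataset
from langsmith import Client

# GPT4 acts as the grader, comparing generations to the labeled triples
eval_config = RunEvalConfig(
    evaluators=["qa"],
    eval_llm=ChatOpenAI(model="gpt-4", temperature=0),
)

run_on_dataset(
    client=Client(),
    dataset_name="kg-triples-test",   # placeholder dataset name
    llm_or_chain_factory=make_chain,  # placeholder factory returning the chain under test
    evaluation=eval_config,
)
```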
... 4/ Lesson 1: always consider approaches like few-shot prompting or RAG before fine-tuning. Few-shot prompting of GPT4 scored better than any fine-tuning (w/ a small 1.5k instruction dataset / 7b base model).
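For context, few-shot prompting here just means packing labeled examples into the prompt. A minimal sketch (the example triples are made up, not from the datasets in the blog post):

```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

# One in-context example of the desired (subject, relation, object) output format
prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract (subject, relation, object) triples from the text."),
    ("human", "Paris is the capital of France."),
    ("ai", "(Paris, capital of, France)"),
    ("human", "{text}"),
])

chain = prompt | ChatOpenAI(model="gpt-4", temperature=0)
print(chain.invoke({"text": "Marie Curie discovered radium."}))
```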
... 5/ Lesson 2: but, we find that fine-tuning a small (7b) base model can outperform a larger generalist (GPT-3.5) w/ few-shot prompting, a result also shown recently by @anyscalecompute and others. anyscale.com/blog/fine-tuni…
... 6/ Lesson 3: dataset collection and cleaning are often the most challenging part. We iterated through several public datasets. LangSmith automatically logs project generations w/ a queryable interface to select and fix poor-quality examples for fine-tuning.
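A minimal sketch of pulling a LangSmith dataset down for cleaning (the dataset name and the "triples" output key are placeholders):

```python
from langsmith import Client

client = Client()

# Pull examples from a LangSmith dataset for review
examples = list(client.list_examples(dataset_name="kg-triples"))

# e.g., drop examples with empty labels before exporting them for fine-tuning
clean = [ex for ex in examples if ex.outputs and ex.outputs.get("triples")]
print(f"kept {len(clean)} / {len(examples)} examples")
```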
... 7/ Lesson 4: eval is challenging. We used LangSmith to run eval and inspect generations. We found that base models w/o fine-tuning were verbose / chatty, and in one case hallucinated Homer Simpson as the subject (whereas the fine-tuned LLMs extracted triples much closer to the label format):
Overall, fine-tuning is a powerful tool, but it should be weighed against prompt engineering / RAG. LangSmith can help w/ fine-tuning pain points (data capture / cleaning / eval) and works well w/ fine-tuning recipes (e.g., via @huggingface / @maximelabonne, @OpenAI). mlabonne.github.io/blog/posts/Fin…
LLMs excel at code analysis / completion (e.g., Copilot, Code Interpreter, etc.). Part 6 of our initiative to improve @LangChainAI docs covers code analysis, building on contributions of @cristobal_dev + others:
python.langchain.com/docs/use_cases…
1/ Copilot and related tools (e.g., @codeiumdev) have dramatically accelerated dev productivity and shown that LLMs excel at code understanding / completion
2/ But, RAG for QA/chat on codebases is challenging b/c text splitters may break up elements (e.g., fxns, classes) and fail to preserve context about which element each code chunk comes from.
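One mitigation is language-aware splitting, which keeps functions / classes together where possible. A minimal sketch for Python source (chunk sizes and the file name are placeholders):

```python
from langchain.text_splitter import Language, RecursiveCharacterTextSplitter

# Split Python source along def / class boundaries where possible,
# rather than at arbitrary character offsets
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=2000,
    chunk_overlap=200,
)
docs = python_splitter.create_documents([open("some_module.py").read()])
```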
LLMs unlock a natural language interface with structured data. Part 4 of our initiative to improve @LangChainAI docs shows how to use LLMs to write / execute SQL queries w/ chains and agents. Thanks @manuelsoria_ for work on the docs:
python.langchain.com/docs/use_cases…
1/ Text-to-SQL is an excellent LLM use case: many ppl can describe what they want in natural language, but have difficulty mapping that to a specific SQL query. LLMs can bridge this gap, e.g., see:
arxiv.org/pdf/2204.00498…
2/ create_sql_query_chain() maps from natural language to a SQL query: pass the question and the database into the chain, and get SQL out. Run the query on the database easily:
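Roughly, per the use-case docs (the Chinook SQLite sample DB stands in for your own database):

```python
from langchain.chains import create_sql_query_chain
from langchain.chat_models import ChatOpenAI
from langchain.utilities import SQLDatabase

# Connect to a local SQLite database
db = SQLDatabase.from_uri("sqlite:///Chinook.db")

# Natural language question in, SQL query out
chain = create_sql_query_chain(ChatOpenAI(temperature=0), db)
query = chain.invoke({"question": "How many employees are there?"})

# Execute the generated query against the database
print(db.run(query))
```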
Getting structured LLM output is hard! Part 3 of our initiative to improve @LangChainAI docs covers this w/ functions and parsers (see @GoogleColab ntbk). Thanks to @fpingham for improving the docs on this:
2/ Functions (e.g., using OpenAI models) have been a great way to tackle this problem, as shown by the work of @jxnlco and others. The LLM calls a function and returns output that follows a specified schema. wandb.ai/jxnlco/functio…
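A hedged sketch of the functions approach with a Pydantic schema (the Person schema is a made-up example, and the exact helper / import path varies across LangChain versions):

```python
from langchain.chains.openai_functions import create_structured_output_chain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

# Target schema the LLM output must follow
class Person(BaseModel):
    name: str = Field(description="The person's name")
    age: int = Field(description="The person's age")

prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract the person mentioned in the text."),
    ("human", "{text}"),
])

chain = create_structured_output_chain(Person, ChatOpenAI(temperature=0), prompt)
person = chain.run(text="Ana is 29 years old.")  # -> Person(name='Ana', age=29)
```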
We've kicked off a community-driven effort to improve @LangChainAI docs, starting w/ popular use cases. Here is the new use case doc on Summarization w/ a @GoogleColab notebook for easy testing ...
python.langchain.com/docs/use_cases…
1/ Context window stuffing: adding full documents into the LLM context window for summarization is the easiest approach and is increasingly feasible as LLMs (e.g., @AnthropicAI Claude w/ 100k token window) get larger context windows that fit hundreds of pages.
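A minimal sketch of the stuffing approach (the loader URL is just an example long document; swap in your own docs and model):

```python
from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatAnthropic
from langchain.document_loaders import WebBaseLoader

# Load a long web page and pass it to a large-context model in one shot
docs = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/").load()

llm = ChatAnthropic(model="claude-2", temperature=0)  # ~100k-token context window
chain = load_summarize_chain(llm, chain_type="stuff")
print(chain.run(docs))
```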
2/ Embed-cluster-sample: @GregKamradt demoed a cool approach w/ @LangChainAI to chunk, embed, cluster, and sample representative chunks that are passed to the LLM context window. A nice approach to save cost by reducing tokens sent to the LLM.
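A rough sketch of that embed-cluster-sample idea (chunk sizes, cluster count, and the input file are placeholders; this is not @GregKamradt's exact code):

```python
import numpy as np
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from sklearn.cluster import KMeans

long_text = open("book.txt").read()  # placeholder: the long document to summarize

# 1. Chunk and embed
chunks = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200).split_text(long_text)
vectors = np.array(OpenAIEmbeddings().embed_documents(chunks))

# 2. Cluster the chunk embeddings
kmeans = KMeans(n_clusters=8, random_state=0).fit(vectors)

# 3. Keep the chunk closest to each cluster center as a representative sample
representative_idx = sorted(
    {int(np.argmin(np.linalg.norm(vectors - center, axis=1))) for center in kmeans.cluster_centers_}
)
representative_chunks = [chunks[i] for i in representative_idx]
# representative_chunks are then passed to the LLM for the final summary, instead of the whole doc
```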
... there's a new loader for Etherscan transactions. Folks like @punk9059 may have a pulse on applications w/in the larger crypto community. Always interesting to learn about: python.langchain.com/docs/integrati…
Web research is a great LLM use case. @hwchase17 and I are releasing a new retriever to automate web research that is simple, configurable (can run in private mode w/ Llama-v2, GPT4all, etc.), & observable (use LangSmith to see what it's doing). Blog:
blog.langchain.dev/automating-web…
Projects like @assaf_elovic's gpt-researcher are a great example of research agents; we started with an agent, but landed on a simple retriever that executes LLM-generated search queries in parallel, indexes the loaded pages, and retrieves relevant chunks. LangSmith trace:
The retriever is compatible w/ private workflows. Here's a trace running on my laptop (~50 tok/sec) w/ Llama-v2 and @nomic_ai GPT4all embeddings + @trychroma: the LLM generates the search queries and is also used for the final answer generation. See docs: python.langchain.com/docs/modules/d…
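Roughly, wiring up the retriever looks like this (the vector store, embeddings, and question are placeholders; the OpenAI / Google Search defaults shown can be swapped for the local Llama-v2 / GPT4all / Chroma stack above):

```python
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.web_research import WebResearchRetriever
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.vectorstores import Chroma

# Vector store for indexing fetched pages, LLM for generating search queries,
# and a search wrapper (needs GOOGLE_API_KEY / GOOGLE_CSE_ID env vars)
vectorstore = Chroma(embedding_function=OpenAIEmbeddings(), persist_directory="./chroma_db")
retriever = WebResearchRetriever.from_llm(
    vectorstore=vectorstore,
    llm=ChatOpenAI(temperature=0),
    search=GoogleSearchAPIWrapper(),
)

docs = retriever.get_relevant_documents("How do LLM-powered agents plan tasks?")
```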