2/13: Agents
... 1/ When to fine-tune? Fine-tuning is not advised for teaching an LLM new knowledge (see references from @OpenAI and others in our blog post). It's best for tasks (e.g., extraction) focused on "form, not facts":
1/ Copilot and related tools (e.g., @codeiumdev) have dramatically accelerated dev productivity and shown that LLMs excel at code understanding / completion https://twitter.com/karpathy/status/1608895189078380544?s=20
1/ Text-to-SQL is an excellent LLM use-case: many ppl can describe what they want in natural language, but have difficulty mapping that to a specific SQL query. LLMs can bridge this gap, e.g., see:
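The idea above can be sketched in a few lines: build a prompt from the table schema plus the user's question, then run the model's SQL against the database. Everything here (function names, the in-memory demo DB, the hand-written stand-in for the LLM reply) is illustrative, not a specific library's API.

```python
# Hypothetical text-to-SQL sketch: prompt an LLM with the schema and a
# natural-language question, then execute the generated SQL.
import sqlite3

def build_sql_prompt(schema: str, question: str) -> str:
    """Assemble a prompt asking the model to translate a question into SQL."""
    return (
        "You are a SQL expert. Given the schema below, write one SQLite "
        "query that answers the question.\n\n"
        f"Schema:\n{schema}\n\nQuestion: {question}\nSQL:"
    )

def run_generated_sql(db: sqlite3.Connection, sql: str):
    """Execute model-generated SQL; a real app should validate/sandbox it first."""
    return db.execute(sql).fetchall()

# Demo with an in-memory database; a real app would send `prompt` to an LLM.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'Ada'), (2, 'Lin')")
prompt = build_sql_prompt("CREATE TABLE users (id INTEGER, name TEXT)",
                          "How many users are there?")
llm_sql = "SELECT COUNT(*) FROM users"  # stand-in for the LLM's response
print(run_generated_sql(db, llm_sql))   # → [(2,)]
```

Note the sandboxing comment: executing model-written SQL directly is risky, so production setups restrict it to read-only queries.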
1/ Getting LLMs to produce structured (e.g., JSON) output is a challenge, often requiring tedious prompt eng: https://twitter.com/goodside/status/1657396491676164096?s=20
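One common workaround can be sketched like this: instruct the model to reply with JSON only, extract and parse the first JSON object from the reply, and retry on failure. The stub model and helper names below are made up for illustration; they are not a real library API.

```python
# Minimal sketch of coaxing structured output from an LLM: parse, validate,
# retry. The `stub` lambda stands in for a real model call.
import json

def extract_json(reply: str) -> dict:
    """Pull the first {...} span out of an LLM reply and parse it."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in reply")
    return json.loads(reply[start:end + 1])

def structured_call(model, prompt: str, retries: int = 2) -> dict:
    """Re-ask the model until it returns parseable JSON."""
    for _ in range(retries + 1):
        try:
            return extract_json(model(prompt + "\nRespond with JSON only."))
        except (ValueError, json.JSONDecodeError):
            continue
    raise RuntimeError("model never produced valid JSON")

stub = lambda p: 'Sure! {"name": "Ada", "role": "engineer"}'  # fake LLM
print(structured_call(stub, "Extract the person mentioned."))
# → {'name': 'Ada', 'role': 'engineer'}
```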
1/ Context window stuffing: adding full documents into the LLM context window for summarization is the easiest approach and increasingly feasible as LLMs get larger context windows (e.g., @AnthropicAI Claude w/ 100k token window, which fits hundreds of pages). https://twitter.com/AnthropicAI/status/1656700154190389248?s=20
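The "stuff" approach is simple enough to sketch: concatenate whole documents into one summarization prompt, guarding against the context limit with a rough chars-per-token heuristic. The function name, the 4-chars-per-token estimate, and the 100k default are illustrative assumptions.

```python
# Sketch of context-window stuffing for summarization: put the full docs
# into a single prompt if they fit, else fail loudly.
def stuff_prompt(docs, max_tokens=100_000, chars_per_token=4):
    body = "\n\n".join(docs)
    est_tokens = len(body) // chars_per_token  # crude token estimate
    if est_tokens > max_tokens:
        raise ValueError(f"~{est_tokens} tokens exceeds budget of {max_tokens}")
    return f"Summarize the following documents:\n\n{body}\n\nSummary:"

prompt = stuff_prompt(["Doc one text.", "Doc two text."])
```

A real implementation would use the model's own tokenizer rather than a character heuristic.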
... great addition from @RubenBarraganP that connects files in @Dropbox to the LangChain ecosystem:
Projects like @assaf_elovic gpt-researcher are a great example of research agents; we started with an agent, but landed on a simple retriever that executes LLM-generated search queries in parallel, indexes the loaded pages, and retrieves relevant chunks. LangSmith trace:
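The retriever described above can be roughed out as: fan out the LLM-generated search queries in parallel, load the result pages, then rank chunks against the question. The stand-ins below (a fake `search` function, term-overlap scoring instead of embeddings) are illustrative only.

```python
# Rough sketch of a parallel search-query retriever. A real version would
# call a search API and rank with embeddings; here `search` is faked and
# ranking is simple term overlap.
from concurrent.futures import ThreadPoolExecutor

def search(query: str) -> list:          # stand-in for a web-search call
    return [f"page about {query}"]

def retrieve(question: str, queries: list, k: int = 2) -> list:
    with ThreadPoolExecutor() as pool:   # run all searches in parallel
        pages = [p for hits in pool.map(search, queries) for p in hits]
    q_terms = set(question.lower().split())
    # Rank pages by how many question terms they share.
    scored = sorted(pages, key=lambda p: -len(q_terms & set(p.lower().split())))
    return scored[:k]

chunks = retrieve("climate policy", ["climate policy 2023", "carbon tax"])
```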

h/t @disiok for flagging this: I passed this to GPT4 and asked it to design a prompt for retrieval using system message and instruction tokens. I just used the resulting GPT4-designed prompt (image below) :P ...
.. the newest @LangChainAI release (v0.0.220) has a contribution from @CorranMac that uses Grobid for context-aware splitting of PDFs; great for scientific articles or large docs. Each text chunk retains the section of the paper it came from. See here .. https://t.co/tqKedGTwLC python.langchain.com/docs/modules/d…
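To build intuition for context-aware splitting, here is a toy splitter that tags each chunk with the section it came from. Grobid does this properly from PDF structure; this markdown-heading version is only meant to illustrate why section-tagged chunks are useful downstream.

```python
# Toy "context-aware" splitter: every chunk remembers its section header.
# Illustrative only; not the Grobid-based LangChain splitter itself.
def split_with_sections(text: str) -> list:
    tagged, section = [], "preamble"
    for line in text.splitlines():
        if line.startswith("# "):        # treat markdown headers as sections
            section = line[2:]
        elif line.strip():
            tagged.append({"section": section, "text": line.strip()})
    return tagged

tagged = split_with_sections("# Methods\nWe trained a model.\n# Results\nIt worked.")
```

Carrying the section label lets a retriever filter or cite by section (e.g., only pull from "Methods").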
https://twitter.com/RLanceMartin/status/1666468143445704706?s=20
@karpathy inspired this work a while ago w/ Whisper transcriptions of the @lexfridman pod. I used a similar pipeline to build a Q+A app, lex-gpt. @OpenAI Whisper API simplified the pipeline, so I wrapped it all in an easy-to-use @LangChainAI doc loader .. https://twitter.com/RLanceMartin/status/1637852936238956546?s=20
https://twitter.com/hwchase17/status/1651617956881924096?s=20
... great pod w/ @jefrankle @swyx @abhi_venigalla on MPT-7B, so I used auto-evaluator to benchmark it on a test set of 5 Q+A pairs from the GPT3 paper. Results are close to larger models and it's very fast (kudos to @MosaicML inference team!) ... https://twitter.com/karpathy/status/1660824101412548609?s=20
.. there are many retrieval approaches for Q+A that fetch docs relevant to a question followed by LLM answer synthesis. But as the LLM context window grows, retrieval may not be needed since you can just stuff the full doc(s) into the prompt (red in the diagram) ..
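The trade-off above reduces to a simple routing rule: stuff the full docs if they fit the context window, otherwise fall back to retrieval over chunks. The threshold and the chars-per-token estimate below are illustrative assumptions.

```python
# Sketch of routing between "stuff" and "retrieve" for Q+A, based on a
# rough token estimate of the input docs (heuristic values, not real limits).
def choose_strategy(docs, context_tokens=100_000, chars_per_token=4):
    est = sum(len(d) for d in docs) // chars_per_token
    return "stuff" if est <= context_tokens else "retrieve"

print(choose_strategy(["a short memo"]))   # → stuff
print(choose_strategy(["x" * 1_000_000]))  # → retrieve
```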
Inspired by 1) @AnthropicAI - model-written eval sets and 2) @OpenAI - model-graded evaluation. This app combines both of these ideas into a single workspace, auto-generating a QA test set for a given input doc and auto-grading the result of the user-specified QA chain.
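The two ideas combine naturally: one model writes QA pairs from the doc, a second grades the chain's answers against them. The sketch below stubs both models with lambdas; in the real app each step is an LLM prompt, and none of these function names come from the tool itself.

```python
# Sketch of auto-generated test sets + model-graded eval, with stub models.
def make_test_set(doc: str, gen_model) -> list:
    """Ask a model to write QA pairs grounded in the doc (stubbed here)."""
    return gen_model(f"Write QA pairs grounded in:\n{doc}")

def grade(qa_pairs, qa_chain, grader_model) -> float:
    """Fraction of questions the chain answers correctly, per the grader."""
    verdicts = [
        grader_model(q["question"], q["answer"], qa_chain(q["question"]))
        for q in qa_pairs
    ]
    return sum(verdicts) / len(verdicts)

# Stubs standing in for LLM calls:
gen = lambda p: [{"question": "Who wrote it?", "answer": "Ada"}]
grader = lambda q, ref, pred: int(ref.lower() in pred.lower())
chain = lambda q: "Ada wrote it."
score = grade(make_test_set("Ada wrote the memo.", gen), chain, grader)
print(score)  # → 1.0
```

The substring-match grader is the weakest link; the point of model-graded eval is to replace it with an LLM judging semantic equivalence.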
https://twitter.com/hwchase17/status/1632416195155726337?s=20
... using @streamlit, which is useful for fast prototyping / deployment w/ minimal code here: