At a high level, the ingestion pipeline looks like this (a code sketch follows the list):
- Use document loaders to scrape the Python docs and API reference
- Chunk the scraped pages into retrieval-sized pieces
- Use the Indexing API to keep the latest docs in sync with the vector store
- Use GitHub Actions to run ingestion daily
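Here's a minimal sketch of that pipeline in LangChain. The docs URL, chunk sizes, vector store choice (Chroma), and record-manager namespace are all illustrative, not the exact production setup:

```python
from langchain.document_loaders import RecursiveUrlLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import SQLRecordManager, index
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# 1. Load: scrape the Python docs (URL is illustrative).
raw_docs = RecursiveUrlLoader("https://python.langchain.com/docs/").load()

# 2. Chunk: split pages into retrieval-sized pieces.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = splitter.split_documents(raw_docs)

# 3. Sync: the Indexing API upserts only new/changed chunks and cleans up
#    chunks whose source pages were deleted, keyed by source URL.
vectorstore = Chroma(collection_name="docs", embedding_function=OpenAIEmbeddings())
record_manager = SQLRecordManager("chroma/docs", db_url="sqlite:///ingest.db")
record_manager.create_schema()

index(docs, record_manager, vectorstore, cleanup="full", source_id_key="source")
```

Rerunning this (e.g., from a daily GitHub Actions job) is idempotent: unchanged chunks are skipped, so the vector store tracks the live docs without re-embedding everything.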
QA
If we've scraped and chunked our docs well, a lot of the hard work is done by the time we reach the actual QA. Here we just need to (sketched below):
- Rephrase the latest user question given the context of the current chat session
- Retrieve from the vector store using the rephrased question
- Synthesize an answer
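One way to wire up those three steps in LangChain is ConversationalRetrievalChain, which condenses, retrieves, and synthesizes in one chain. The model choice and sample question are illustrative, and `vectorstore` comes from the ingestion sketch above:

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

qa = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(temperature=0),
    retriever=vectorstore.as_retriever(),
)

# The chain condenses the latest question against the chat history into a
# standalone question, retrieves with it, then synthesizes an answer from
# the retrieved chunks.
result = qa(
    {
        "question": "How do I keep my vector store in sync with the docs?",
        "chat_history": [],  # list of (human, ai) message tuples
    }
)
print(result["answer"])
```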
We’re particularly excited about a centralized hub’s promise to enable:
- Encoding of expertise
- Discoverability of prompts for a variety of models
- Inspectability
- Cross-team collaboration
🧵
Check it out here:
Read more about the motivation and future direction in our blog post here:
As @emollick put it in his recent article, there's a need for "prompt libraries that encode the expertise of their best practices into forms that anyone can use."
👩🏼‍🍳 Curate fine-tuning data with the LangSmith Cookbook
🦜🛠️ LangSmith offers easy-to-use filters for tags, content, and feedback to help you curate better training data for your chat models. It makes data wrangling much less painful. Cookbook and guide:
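For a taste, here's a hedged sketch of that filtering with the LangSmith SDK; the project name, tag, and filter string are placeholders, not the cookbook's exact code:

```python
from langsmith import Client

client = Client()

# Pull chat-model runs that were tagged for fine-tuning, using
# LangSmith's filter query language (tag name is hypothetical).
runs = client.list_runs(
    project_name="chat-langchain",  # hypothetical project name
    run_type="llm",
    filter='has(tags, "fine-tune")',
)

# Inspect inputs/outputs before exporting them as training examples.
for run in runs:
    print(run.inputs, run.outputs)
```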