LangChain
Jun 12, 2023
🦜🔗0.0.198 adds a lot of functionality to every step of the ingestion process!

+2 Document Loaders (@airtable, XML)
+2 Text Splitter features
+2 Embedding providers
+3 Vectorstores

Lots of detail, so buckle up👇
🏓Airtable Loader

@airtable is a super popular platform for storing and collecting data (we've used it internally for meetup sign-ups)

You can now easily load data from there with our new document loader!

Docs: python.langchain.com/en/latest/modu…
✖️ XML Loader

s/o to our friends at @UnstructuredIO for adding an XML loader!

@mrobinson0623 you're the best

Docs: python.langchain.com/en/latest/modu…
🤗 HuggingFace tokenizer Text Splitter

This text splitter uses @huggingface tokenizers to count tokens, and splits the text so that each chunk stays within a token limit

Thanks Jens Madsen for adding!

Docs: python.langchain.com/en/latest/modu…
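A minimal sketch of token-count-based splitting, for illustration only. It is not LangChain's implementation: `split_by_token_count` and `whitespace_token_count` are hypothetical names, and the whitespace counter is a stand-in for a real tokenizer. With a Hugging Face tokenizer, the counting function would likely wrap something like `len(tokenizer.encode(text))`.

```python
from typing import Callable, List


def split_by_token_count(
    text: str,
    count_tokens: Callable[[str], int],
    chunk_size: int,
) -> List[str]:
    """Greedily pack words into chunks whose token count stays within chunk_size.

    A single word that exceeds chunk_size on its own still becomes a chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and count_tokens(candidate) > chunk_size:
            # Adding this word would exceed the limit: close the chunk.
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


def whitespace_token_count(text: str) -> int:
    # Stand-in tokenizer (assumption): one token per whitespace-separated word.
    return len(text.split())


chunks = split_by_token_count(
    "the quick brown fox jumps over the lazy dog",
    whitespace_token_count,
    chunk_size=4,
)
print(chunks)
```

The counting function is passed in as a parameter so that the splitting logic does not depend on any particular tokenizer.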
🤩 add_start_index

This addition from `felpigeon` helps to keep track of the chunks you create

It lets you include the starting position of each chunk within the original document in the metadata

Docs: python.langchain.com/en/latest/modu…
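To illustrate the idea, here is a rough sketch of a splitter that records each chunk's starting position in its metadata. The function name `split_with_start_index`, the fixed-size character splitting, and the dictionary layout are assumptions made for this example, not the library's code.

```python
from typing import Dict, List


def split_with_start_index(text: str, chunk_size: int) -> List[Dict]:
    """Cut text into fixed-size character chunks and record each chunk's
    offset within the original text under metadata["start_index"]."""
    chunks = []
    for start in range(0, len(text), chunk_size):
        chunks.append(
            {
                "page_content": text[start:start + chunk_size],
                "metadata": {"start_index": start},
            }
        )
    return chunks


document = "LangChain splits documents into chunks."
chunks = split_with_start_index(document, chunk_size=10)

for chunk in chunks:
    start = chunk["metadata"]["start_index"]
    end = start + len(chunk["page_content"])
    # The stored offset lets us find the chunk again in the source text.
    assert document[start:end] == chunk["page_content"]
```

Keeping the offset makes it possible to map a retrieved chunk back to its location in the source document, for example to show surrounding context.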
💨Dashscope Embeddings

Dashscope is DAMO Academy's unified multilingual text vector model. It supports multiple mainstream languages worldwide.

h/t wenmeng zhou

Docs: python.langchain.com/en/latest/modu…
🫢Embaas Embeddings

embaas is a fully managed NLP API service that offers features like embedding generation, document text extraction, document to embeddings and more

Thanks to Julius Lipp for adding

Docs: python.langchain.com/en/latest/modu…
🧑‍⚖️AwaDB Vectorstore

AwaDB is an AI Native database for the search and storage of embedding vectors used by LLM Applications.

Thanks to ljeagle for adding

Docs: python.langchain.com/en/latest/modu…
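For readers new to vectorstores, the sketch below shows what "search and storage of embedding vectors" means at its simplest. `ToyVectorStore` is a hypothetical in-memory class written for this example, and the two-dimensional vectors are made up. It does not reflect how AwaDB or any other store is implemented.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs in a list."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, text: str, vector: Sequence[float]) -> None:
        self._items.append((text, list(vector)))

    def search(self, query_vector: Sequence[float], k: int = 1) -> List[str]:
        # Rank every stored item by similarity to the query (brute force).
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add("cats", [1.0, 0.0])      # made-up embeddings for illustration
store.add("dogs", [0.9, 0.1])
store.add("stocks", [0.0, 1.0])

top = store.search([1.0, 0.0], k=2)
print(top)
```

Production vectorstores typically replace the brute-force ranking with an approximate nearest-neighbor index and add persistence.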
🕳️Hologres Vectorstore

Hologres is a unified real-time data warehousing service developed by Alibaba Cloud

Thanks Changgeng Zhao for adding

Docs: python.langchain.com/en/latest/modu…
🟦Azure Cognitive Search Vectorstore

And finally, the biggest of them all - an integration with Azure Cognitive Search's new vectorstore functionality (still in beta)

Thanks to Fabrizio Ruocco for all his work getting this merged!

Docs: python.langchain.com/en/latest/modu…
Which of these steps is the most challenging and deserves more love?

Or is there another step that we should improve on?

More from @LangChainAI

Nov 14, 2024
We asked, you answered — our State of AI Agents Report is here! 🤖✨

We surveyed 1300+ industry professionals, from developers to business leaders, on how they're using AI agents today — and the results are in.

What are the top use cases for agents? The biggest challenges when building agents? And who's finding success after deploying their agents to production?

Read the full report ➡️ langchain.com/stateofaiagents

Here are 5 key insights in the thread below 🧵👇
1⃣ Agent adoption is a coin toss, but nearly everyone has plans for it.

About 50% of respondents have agents in production, with mid-sized companies leading the charge. That number is poised to grow, with 78% planning to implement AI agents soon.
2⃣ Research and summarization is the leading agent use case among respondents (at 58%), followed by personal assistance / productivity (54%) and customer service (46%).

AI agents are taking over time-consuming tasks, whether it's more repetitive tasks for productivity, or handling complex information retrieval and data analysis.
Feb 6, 2024
⛴️ WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

WebVoyager is a new kind of web-browsing agent, developed by Hongliang He, @wyu_nd, et al.

Powered by large multi-modal models, like GPT-4V, it uses browser screenshots to conduct research, analyze images, and perform other tasks.

Older text-based web-browsing agents often fail to handle interactive web elements. Naive vision-based methods can struggle to use tools effectively.

WebVoyager uses “Set-of-mark” prompting to overlay the DOM with labeled bounding boxes and provide better guidance for the agent.

Check out the tutorial on how to build WebVoyager here:
2/ To jump straight to the code, check out the links below.

Python Code: github.com/langchain-ai/l…
WebVoyager Paper: arxiv.org/abs/2401.13919
Set-of-Mark Paper: arxiv.org/abs/2310.11441
3/ Developing agents like WebVoyager is easier with LangSmith.

Sign up at smith.langchain.com to get started.
Dec 20, 2023
⚙️ Agents are the “killer” LLM app, but building and evaluating agents is hard.

A huge part of agents is tool use, but there aren't enough open-source tool use benchmarks out there.

Today, we are excited to release four new test environments for benchmarking LLMs’ ability to effectively use tools.

📖 blog.langchain.dev/benchmarking-a…

🧵 Below are some of our preliminary results
2/ Task 1: Typewriter (1 tool)

Agent has 1 tool (a typewriter). It has to type the provided word.

🔗 langchain-ai.github.io/langchain-benc…
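A toy version of this task might look like the sketch below. The `Typewriter` class, the one-letter-per-call tool, and the `scripted_agent` stand-in are assumptions for illustration. The actual benchmark environment and its LLM-driven agents live in the linked docs.

```python
from typing import Callable


class Typewriter:
    """Toy environment with a single tool: type one letter onto the paper."""

    def __init__(self) -> None:
        self.paper = ""

    def type_letter(self, letter: str) -> str:
        if len(letter) != 1:
            raise ValueError("type_letter expects exactly one character")
        self.paper += letter
        return "OK"


def scripted_agent(word: str, tool: Callable[[str], str]) -> None:
    # Stand-in for an LLM agent: calls the tool once per letter, in order.
    for letter in word:
        tool(letter)


def run_episode(word: str) -> bool:
    """Run one episode and report whether the paper matches the target word."""
    env = Typewriter()
    scripted_agent(word, env.type_letter)
    return env.paper == word


print(run_episode("keyboard"))
```

The check is an exact match on the final state, so an agent that skips, repeats, or reorders a letter fails the episode.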
3/ Typewriter results

None of the agents are perfect. GPT-4 had a hard time typing "keyboard" and "head" 🤭

🔗 smith.langchain.com/public/ff14ecb…
Oct 18, 2023
⭐️ Prompt Trends + Highlights ⭐️

We recently launched the LangChain Hub to support prompt sharing + workshopping.

We collected hundreds of prompts across many use-cases.

Here, we distill major themes and highlight interesting examples.

Blog:
blog.langchain.dev/the-prompt-lan…
Reasoning 🧠

Simple instructions ("think step by step") can improve many reasoning tasks.

Great thread from @_jasonwei w/ trade-offs:

Recent @GoogleDeepMind work (img below) shows accuracy across many such instructions:

arxiv.org/abs/2309.03409
Writing ✍️

@mattshumer_ has shared some of our favorite prompts to improve your writing:

smith.langchain.com/hub/rlm/matt-s…
smith.langchain.com/hub/rlm/matt-s…

Also nice prompts for content generation (tests c/o @GregKamradt, threads c/o @HardKothari):

smith.langchain.com/hub/gregkamrad…
smith.langchain.com/hub/hardkothar…
Oct 12, 2023
🏓Introducing LangServe

The best way to deploy your LangChains

📤Input/Output schema
📃/docs endpoint
🔠invoke/batch/stream endpoints
🎏/stream_log endpoint for streaming intermediate steps
🛠️LangSmith Integration

Used to power ChatLangChain and WebLangChain

Blog post and 🧵

GitHub Repo for the package: github.com/langchain-ai/l…

We cover a lot of the motivation and features in a blog post here: blog.langchain.dev/introducing-la…

We'll pull out a lot of the most important points into a thread here
⏫Improvements to LangChain Expression Language

A lot of the features we were able to implement were made possible by improvements to LangChain Expression Language

We highlight the most important ones, including better streaming, input/output schemas, intermediate results
Sep 27, 2023
🚀Re-launching Chat LangChain

To help navigate the many features of 🦜🔗, we asked the amazing @mollycantillon to revamp the Chat LangChain chatbot.

Read about how she used LCEL, indexed our docs, deployed with FastAPI, ran evals, and more: blog.langchain.dev/building-chat-…

Highlights👇
Ingestion

At a high level, the ingestion pipeline looks like this:
- Use document loaders to scrape the Python docs and API reference
- Chunk the scraped documents
- Use the Indexing API to sync the latest docs with the vectorstore
- Use GitHub Actions to run ingestion daily
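The sync step in the list above can be pictured roughly as follows. This is a simplified sketch, not the Indexing API itself: `content_key` and `sync_index` are hypothetical helpers, and a plain dictionary stands in for the vectorstore. The idea is to key each chunk by a hash of its content, add chunks that are new, and delete ones that no longer exist.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def content_key(chunk: str) -> str:
    # Identify a chunk by a hash of its content.
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def sync_index(store: Dict[str, str], latest_chunks: Iterable[str]) -> Tuple[int, int]:
    """Make `store` match `latest_chunks`; return (added, deleted) counts."""
    latest = {content_key(chunk): chunk for chunk in latest_chunks}

    # Remove chunks that are no longer present in the latest docs.
    stale = [key for key in store if key not in latest]
    for key in stale:
        del store[key]

    # Add only chunks that are not already indexed.
    added = 0
    for key, chunk in latest.items():
        if key not in store:
            store[key] = chunk
            added += 1
    return added, len(stale)


store: Dict[str, str] = {}
first = sync_index(store, ["intro", "install", "quickstart"])
second = sync_index(store, ["intro", "install v2", "quickstart"])
print(first, second)
```

On the second run only the changed chunk is rewritten, which is what makes a daily scheduled ingestion cheap when most of the docs are unchanged.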
QA

If we've scraped and chunked our docs well, a lot of the hard work is done for us by the time we reach the actual QA. Here we just need to:
- Rephrase the latest user question given the context of the current chat session
- Retrieve from the vectorstore using the rephrased question
- Synthesize an answer
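The three QA steps above can be sketched as a small pipeline. Everything here is a stand-in chosen for the example: `rephrase` and `synthesize` would be LLM calls in a real chatbot, `retrieve` uses naive word overlap instead of a vectorstore, and `DOCS` is a made-up corpus.

```python
from typing import List

# Made-up corpus standing in for the indexed docs.
DOCS = [
    "LangServe deploys chains as REST endpoints.",
    "Text splitters break documents into chunks.",
]


def rephrase(question: str, history: List[str]) -> str:
    # Stand-in for an LLM call: prepend the previous turn so the
    # question makes sense without the rest of the chat session.
    if not history:
        return question
    return f"{history[-1]} {question}"


def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    # Stand-in for vectorstore retrieval: rank docs by shared words.
    query_words = set(query.lower().split())

    def overlap(doc: str) -> int:
        return len(query_words & set(doc.lower().rstrip(".").split()))

    return sorted(docs, key=overlap, reverse=True)[:k]


def synthesize(question: str, context: List[str]) -> str:
    # Stand-in for an LLM call that writes the final answer.
    return f"Q: {question}\nContext: {' '.join(context)}"


def answer(question: str, history: List[str]) -> str:
    standalone = rephrase(question, history)
    context = retrieve(standalone, DOCS)
    return synthesize(standalone, context)


result = answer(
    "What do they break documents into?",
    history=["Tell me about text splitters"],
)
print(result)
```

Rephrasing first matters because the follow-up question alone ("they") does not say what to retrieve.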
