langchain
Jun 12, 11 tweets, 6 min read
🦜🔗0.0.198 adds a lot of functionality to every step of the ingestion process!

+2 Document Loaders (@airtable, XML)
+2 Text Splitter features
+2 Embedding providers
+3 Vectorstores

Lots of detail, so buckle up👇
🏓Airtable Loader

@airtable is a super popular platform for storing and collecting data (we've used it internally for meetup sign ups)

You can now easily load data from there with our new document loader!

Docs: python.langchain.com/en/latest/modu…
✖️ XML Loader

s/o to our friends at @UnstructuredIO for adding an XML loader!

@mrobinson0623 you're the best

Docs: python.langchain.com/en/latest/modu…
🤗 HuggingFace tokenizer Text Splitter

This text splitter uses @huggingface tokenizers to count the tokens in each chunk, and splits the text so that chunk size is measured in tokens rather than characters

Thanks Jens Madsen for adding!

Docs: python.langchain.com/en/latest/modu…
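
To make the idea of splitting by token count concrete, here is a minimal sketch. It does not call LangChain or Hugging Face: `whitespace_tokenize` is a hypothetical stand-in for a real tokenizer, and `split_by_token_count` is an illustrative helper, not the library's API.

```python
from typing import Callable, List


def whitespace_tokenize(text: str) -> List[str]:
    # Hypothetical stand-in for a real tokenizer: one token per word.
    return text.split()


def split_by_token_count(
    text: str,
    max_tokens: int,
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> List[str]:
    """Greedily pack sentences into chunks of at most max_tokens tokens.

    A single sentence longer than max_tokens is kept whole in its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for sentence in text.split(". "):
        n = len(tokenize(sentence))
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and current_tokens + n > max_tokens:
            chunks.append(". ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(". ".join(current))
    return chunks


chunks = split_by_token_count(
    "one two three. four five. six seven eight nine. ten", max_tokens=5
)
```

Swapping `whitespace_tokenize` for a real tokenizer's encode function would make the budget match what a model actually sees.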
🤩 add_start_index

This addition from `felpigeon` helps to keep track of the chunks you create

It lets you include the starting position of each chunk within the original document in the metadata

Docs: python.langchain.com/en/latest/modu…
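
As a rough illustration of what tracking a start index buys you, the sketch below splits a string into fixed-size character chunks and records where each one begins. It is a plain-Python sketch, not the LangChain splitter: the helper name `split_with_start_index` and the dictionary layout are assumptions made for this example.

```python
from typing import Any, Dict, List


def split_with_start_index(
    text: str, chunk_size: int, chunk_overlap: int = 0
) -> List[Dict[str, Any]]:
    """Split text into character chunks and record each chunk's offset."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[Dict[str, Any]] = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        # The metadata key name here is illustrative.
        chunks.append({"page_content": piece, "metadata": {"start_index": start}})
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


source = "abcdefghij"
pieces = split_with_start_index(source, chunk_size=4, chunk_overlap=1)
```

With the offset in the metadata, any chunk can be mapped back to its place in the original document, for example to highlight the source passage behind a retrieved result.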
💨Dashscope Embeddings

Dashscope is DAMO Academy's multilingual unified text vector model. It caters to multiple mainstream languages worldwide.

h/t wenmeng zhou

Docs: python.langchain.com/en/latest/modu…
🫢Embaas Embeddings

embaas is a fully managed NLP API service that offers features like embedding generation, document text extraction, document to embeddings and more

Thanks to Julius Lipp for adding

Docs: python.langchain.com/en/latest/modu…
🧑‍⚖️AwaDB Vectorstore

AwaDB is an AI Native database for the search and storage of embedding vectors used by LLM Applications.

Thanks to ljeagle for adding

Docs: python.langchain.com/en/latest/modu…
🕳️Hologres Vectorstore

Hologres is a unified real-time data warehousing service developed by Alibaba Cloud

Thanks Changgeng Zhao for adding

Docs: python.langchain.com/en/latest/modu…
🟦Azure Cognitive Search Vectorstore

And finally, the biggest of them all - an integration with Azure Cognitive Search's new vectorstore functionality (still in beta)

Thanks to Fabrizio Ruocco for all his work in getting this merged in!

Docs: python.langchain.com/en/latest/modu…
Which of these steps is the most challenging and deserves more love?

Or is there another step that we should improve on?

• • •

More from @LangChainAI

Jun 14
🚨🤖New Agent Release🤖🚨

We can use @OpenAI's new `functions` parameter to create a new type of agent (`openai-functions`), now available in Python and JS

Links to documentation and a thread on what went into it below 👇
First, if you just want to jump right to trying it out:

Python Docs: github.com/hwchase17/lang…

JS Docs: js.langchain.com/docs/modules/a…
Under the hood, we are heavily utilizing the new `functions` parameter available in the chat model

First, we convert the LangChain tool spec to the function tool spec they expect
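
A minimal sketch of that conversion step might look like the following. The `SimpleTool` class and `tool_to_function_spec` helper are hypothetical and are not LangChain's implementation; only the output shape (`name`, `description`, and a JSON Schema object under `parameters`) follows the format the `functions` parameter accepts.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SimpleTool:
    # Hypothetical tool description used only for this example.
    name: str
    description: str
    # Maps argument name -> JSON Schema type, e.g. {"query": "string"}.
    args: Dict[str, str] = field(default_factory=dict)


def tool_to_function_spec(tool: SimpleTool) -> Dict[str, Any]:
    """Build a function definition dict from a tool description."""
    return {
        "name": tool.name,
        "description": tool.description,
        "parameters": {
            "type": "object",
            "properties": {arg: {"type": t} for arg, t in tool.args.items()},
            # This sketch treats every argument as required.
            "required": list(tool.args),
        },
    }


spec = tool_to_function_spec(
    SimpleTool("search", "Look up a query on the web", {"query": "string"})
)
```

A list of such dicts is what gets passed alongside the chat messages, so the model can reply with the name of a function and JSON arguments for it.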
Jun 11
🦈Querying graph databases with LLMs can be hard🦈

s/o to @tb_tomaz for adding multiple knobs to turn when using our GraphCypherQAChain

🦵Limit the number of results
🪜Return intermediate results
🎯Return Direct results

🧵
🦵Limit the number of results

You can limit the number of results from the Cypher QA Chain using the top_k parameter. The default is 10.

This is useful to make sure you don't pass too many results back to the LLM and overwhelm the context window

Docs: python.langchain.com/en/latest/modu…
🪜Return intermediate results

You can return intermediate steps from the Cypher QA Chain using the return_intermediate_steps parameter

This is useful to get programmatic access to the generated graph query

Docs: python.langchain.com/en/latest/modu…
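
To show how these two knobs interact, here is a toy sketch with no graph database or LLM behind it. The function `run_graph_qa` and the shape of its return value are assumptions made for illustration; only the parameter names `top_k` and `return_intermediate_steps` and the default of 10 come from the thread.

```python
from typing import Any, Dict, List


def run_graph_qa(
    generated_query: str,
    rows: List[Dict[str, Any]],
    top_k: int = 10,
    return_intermediate_steps: bool = False,
) -> Dict[str, Any]:
    """Toy stand-in for a graph QA chain: trims results, optionally exposes steps."""
    # Keep only the first top_k rows so the context stays small.
    context = rows[:top_k]
    output: Dict[str, Any] = {"result": f"{len(context)} row(s) passed to the LLM"}
    if return_intermediate_steps:
        # Expose the generated query and the trimmed context to the caller.
        output["intermediate_steps"] = [
            {"query": generated_query},
            {"context": context},
        ]
    return output


rows = [{"name": f"movie-{i}"} for i in range(25)]
out = run_graph_qa(
    "MATCH (m:Movie) RETURN m.name AS name",
    rows,
    return_intermediate_steps=True,
)
```

Here 25 rows come back from the hypothetical query, but only the first 10 reach the model, and the caller can still inspect the query that produced them.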
Jun 10
🧙‍♂️Lord of the Retrievers🧙‍♂️

This cheekily named retriever (more simply called the "Merger Retriever") from @musicaoriginal2 allows for easy combination of MULTIPLE retrievers

This can cause a lot of documents to be returned... so what do you do then?

🧵
First: Why would you even want to combine multiple retrievers?

This can be useful if you have potentially relevant information in multiple sources

You could use an existing method like "Routing" to choose between the retrievers... but what if you want to use all of them?
A second way this can be useful is if you want to use multiple different retrieval strategies on the same data

Don't want to choose between semantic similarity, MMR, and BM25? Now you don't have to!
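
One way to picture combining retrievers is the sketch below, which interleaves the results of several retrievers and drops exact duplicates. It is an illustration under stated assumptions, not the Merger Retriever's code: retrievers are modeled as plain functions returning lists of strings, and exact-match de-duplication is just one simple way to cut down the combined result set.

```python
from itertools import zip_longest
from typing import Callable, List

# A retriever here is simply a function from a query to a ranked list of documents.
Retriever = Callable[[str], List[str]]


def merge_retrievers(query: str, retrievers: List[Retriever]) -> List[str]:
    """Interleave results from all retrievers, keeping the first copy of each doc."""
    result_lists = [retrieve(query) for retrieve in retrievers]
    merged: List[str] = []
    seen = set()
    # zip_longest walks rank by rank and pads shorter lists with None.
    for same_rank in zip_longest(*result_lists):
        for doc in same_rank:
            if doc is not None and doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


def semantic(query: str) -> List[str]:
    return ["d1", "d2", "d3"]  # hypothetical similarity-search results


def keyword(query: str) -> List[str]:
    return ["d2", "d4"]  # hypothetical keyword-search results


merged = merge_retrievers("what is a vectorstore?", [semantic, keyword])
```

Interleaving by rank keeps the top hit from every retriever near the front, so no single strategy dominates the merged list.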
Jun 9
Let's go into the weekend with 🦜🔗0.0.195 with three big items:

🔟 Baseten LLM Integration - serve ML models
❄️Snowflake Document Loader - load documents so you can index them
🐔AWS Kendra Retriever - use this enterprise-grade search functionality to do grounded generation

🧵
🔟 Baseten LLM Integration

Baseten provides all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently.

Documentation: python.langchain.com/en/latest/modu…
❄️Snowflake Document Loader

Load data from your @SnowflakeDB into document objects. This will allow you to split, embed, and then eventually query it with semantic search

Docs: python.langchain.com/en/latest/modu…
Jun 8
Big Thursday release with a lot of new features!

👩‍🎤SingleStore Vector Database
🪨Shale Protocol LLM Integration
🌳Deep Infra Embeddings
🦌Fauna Document Loader
💤Sleep tool
☁️Nebula graph
🧬Unstructured CSV Document Loader

Basically one of everything!
👇
👩‍🎤SingleStore Vector Database

@SingleStoreDB is a high-performance distributed database that supports deployment both in the cloud and on-premises

Really excited to add this as another vector database offering! Thanks to `volodymyr-memsql`

Docs: python.langchain.com/en/latest/modu…
🪨Shale Protocol LLM Integration

@ShaleProtocol provides production-ready inference APIs for open LLMs. It’s a Plug & Play API as it’s hosted on a highly scalable GPU cloud infrastructure.

It has a very generous free tier of 1k daily requests!

Docs: python.langchain.com/en/latest/inte…
Jun 6
Big community 🦜🔧0.0.191 release!

🌲TWO new vectorstores: ClickHouse and Tigris
🧑‍💼Excel Document Loader
📹Multilingual support for YouTube Document Loader
🦜Aviary LLM integration
🩺PubMed integrations (tool and retriever)
🦾Guide for deploying LLMs to production

🧵
🛖ClickHouse vectorstore

@ClickHouseDB is an open-source database for real-time apps and analytics, and recently added ANN search indexes allowing it to be used as a high performance and scalable vector database

Thanks to @haozch for adding this

python.langchain.com/en/latest/modu…
🐅Tigris Vectorstore

@TigrisData is an open source Serverless NoSQL Database and Search Platform designed to simplify building high-performance vector search applications

Thanks to @adilansari for adding!

Documentation: