Eden Marco
May 27 · 13 tweets · 8 min read
🧵🚀 Following my last thread on "in-context learning", it's time to explain how we can digest our custom data so that LLMs 🤖 can use it. Spoiler alert: @LangChainAI 🦜🔗 and a vector store like @pinecone 🌲 will do all the work for us.

1/12 This is a laser-focused thread 🧵 for devs and software engineers. Even if you have zero AI knowledge (like I did just 6 months ago), I'll simplify the key data concepts behind any gen AI application 💡
2/12 Let's talk custom data digestion for LLMs 🤖
First off: embedding models. These condense complex data into meaningful vectors, capturing relationships and semantic meaning. Think of them as a black box for text ➡ vector conversion (a vector is just a list of floats).
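A minimal sketch of that black box in Python, using @LangChainAI's Vertex AI wrapper (assumes the langchain package and Google Cloud credentials are already set up):

```python
# Text in, list of floats out -- that's the whole interface.
from langchain.embeddings import VertexAIEmbeddings

embeddings = VertexAIEmbeddings()

vector = embeddings.embed_query("LangChain makes LLM apps easier to build")
print(type(vector), len(vector))  # <class 'list'> 768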
3/12 There are many embedding models, e.g. @googlecloud ☁️ VertexAI Embeddings. Each has its pros and cons in terms of cost, storage, latency, and other factors. The main idea: transform data into vectors that represent its semantic meaning.
4/12 It is important to note that semantically related vectors will be "located" close to each other in the vector space.
How does this work? Well… a lot of math that I don't really care about as a dev lol 😂 I just use the black box and let Google Vertex do its magic for me! 🎩
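For the curious, "closeness" is usually measured with cosine similarity. A toy illustration with made-up 3-d vectors standing in for real 768-d embeddings:

```python
import math

def cosine_similarity(a, b):
    """Measure how 'close' two vectors point: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

dog   = [0.8, 0.3, 0.1]  # hypothetical embedding of "dog"
puppy = [0.7, 0.4, 0.1]  # hypothetical embedding of "puppy"
car   = [0.1, 0.2, 0.9]  # hypothetical embedding of "car"

print(cosine_similarity(dog, puppy))  # high -> semantically close
print(cosine_similarity(dog, car))    # lower -> semantically distant
```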
5/12 Now, if we take all the HTML documentation of a Python package (like @LangChainAI) and embed it, we end up with a bunch of vectors, each representing a different doc page. With @googlecloud ☁️ VertexAI embeddings, each vector's size (its dimension) would be 768.
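One way this could look in code. This is a sketch, not the exact pipeline from the repo; the loader choice, folder path, and chunk sizes are assumptions:

```python
# Load a local dump of the HTML docs, split pages into chunks, embed each chunk.
from langchain.document_loaders import ReadTheDocsLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import VertexAIEmbeddings

raw_docs = ReadTheDocsLoader("langchain-docs/").load()  # hypothetical folder

# Chunking keeps each vector focused on one passage instead of a whole page.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = splitter.split_documents(raw_docs)

embeddings = VertexAIEmbeddings()
vectors = embeddings.embed_documents([d.page_content for d in docs])
print(len(vectors), len(vectors[0]))  # n_chunks, 768
```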
6/12 So what do we do with these vectors? On their own, they aren't much use; they're just lists of numbers. That's where a cloud-based ☁️ managed vectorstore like @pinecone 🌲 comes in, saving us from storing hundreds of GBs of vectors on our own machines!
7/12 In @pinecone 🌲, we create an index with the same dimension as our embedding model's output (768 in this example). Then we just iterate over the vectors we got back from the embedding model and upsert them into the vector store. Simple! 👌
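A sketch using the pinecone-client API of the time; the index name, environment, and metadata fields are placeholders:

```python
import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")

# The index dimension must match the embedding model's output: 768 here.
pinecone.create_index("docs-index", dimension=768, metric="cosine")
index = pinecone.Index("docs-index")

# Upsert (id, vector, metadata) tuples in batches.
to_upsert = [
    (f"doc-{i}", vec, {"text": doc.page_content})
    for i, (vec, doc) in enumerate(zip(vectors, docs))
]
for i in range(0, len(to_upsert), 100):
    index.upsert(vectors=to_upsert[i : i + 100])
```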
8/12 Vectorstores like @pinecone 🌲 offer semantic search: given a query vector, they find the vectors closest to it. Those semantically related vectors hold the info our LLM needs to answer.
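Continuing the sketch above, a query could look like this (the exact response shape may vary by client version):

```python
# Embed the question with the SAME model, then ask for the nearest neighbors.
query = "How do I create a custom chain in LangChain?"
query_vector = embeddings.embed_query(query)

results = index.query(vector=query_vector, top_k=3, include_metadata=True)
for match in results["matches"]:
    print(match["score"], match["metadata"]["text"][:80])
```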
9/12 How does this semantic search work? Complex algorithms and heavy math that, again, I don't really want to know as a lazy dev. Thank you, @pinecone engineers, for abstracting this away for me! 🙏
10/12 Enter @LangChainAI 🦜🔗. This open-source framework automates the entire process, doing the heavy lifting for us. With just one line of code, you can invoke a function that handles the data embedding and inserts all the created vectors into @pinecone. Easy as pie! 🥧
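That one line is roughly this (the index name is a placeholder, and it assumes the Pinecone client has been initialized as above):

```python
from langchain.vectorstores import Pinecone

# Embeds every chunk and upserts the vectors into the index -- one call.
docsearch = Pinecone.from_documents(docs, embeddings, index_name="docs-index")

# Querying back is just as short:
similar_docs = docsearch.similarity_search("How do I create a custom chain?", k=3)
```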
11/12 @LangChainAI 🦜🔗 has so much more to offer, making our lives easier when developing production-grade LLM-powered applications. IMO, it's the go-to open-source framework for building such apps. Check out the docs here: python.langchain.com/en/latest/
12/12
Ready to dive into the entire code? Check out the GitHub repo:
github.com/emarco177/docu…

Inspired by @hwchase17, the creator of @LangChainAI.
Happy coding, everyone! 💻🚀
#EndOfThread
