Akshay 🚀
Apr 19, 2024
Let's build a RAG app using Meta AI's Llama-3 (100% local):
Before we begin, take a look at what we're about to create!

Here's what you'll learn:

- @Ollama for locally serving an LLM (Llama-3)
- @Llama_Index for orchestration
- @Streamlit for building the UI
- @LightningAI for development & hosting

Let's go! 🚀
The architecture diagram below illustrates the key components and how they interact with each other.

Detailed descriptions and code for each component follow.
1️⃣ & 2️⃣ : Loading the knowledge base

A knowledge base is a collection of relevant and up-to-date information that serves as a foundation for RAG. In our case, it's the docs stored in a directory.

Here's how you can load them as document objects in LlamaIndex:
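The code screenshot doesn't survive the unroll, so here's the gist. In LlamaIndex (assuming the `llama-index` package is installed) this step is a one-liner with `SimpleDirectoryReader`; below is a dependency-free sketch of what "load a directory as document objects" means, using a throwaway knowledge base:

```python
# In LlamaIndex the real call is roughly (assuming llama-index is installed):
#   from llama_index.core import SimpleDirectoryReader
#   documents = SimpleDirectoryReader(input_dir="docs").load_data()
# A dependency-free sketch of the same idea:
import pathlib
import tempfile

def load_documents(directory: str) -> list[dict]:
    """Read every text file in `directory` into a simple document object."""
    docs = []
    for path in sorted(pathlib.Path(directory).glob("*.txt")):
        docs.append({"id": path.name, "text": path.read_text(encoding="utf-8")})
    return docs

# Demo on a throwaway knowledge base:
with tempfile.TemporaryDirectory() as d:
    (pathlib.Path(d) / "notes.txt").write_text("Llama-3 is Meta's open LLM.")
    documents = load_documents(d)

print(documents[0]["id"])
```

Each document object carries an id and its text; LlamaIndex's version also attaches metadata such as the file path.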
3️⃣ The embedding model

An embedding is a meaningful representation of text in the form of numbers.

The embedding model is responsible for creating embeddings for the document chunks & user queries.

We are using @SnowflakeDB's `arctic-embed-m`, one of the best models in its class.
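In LlamaIndex you'd typically plug this model in via the HuggingFace embedding integration (in recent versions, `HuggingFaceEmbedding(model_name="Snowflake/snowflake-arctic-embed-m")` — exact import paths vary by llama-index version). To see what an embedding model does, here's a toy word-count embedder with cosine similarity, purely illustrative:

```python
import math

VOCAB = sorted({"what", "is", "llama", "3", "an", "open", "llm", "from",
                "meta", "streamlit", "builds", "web", "uis", "in", "python"})

def embed(text: str) -> list[float]:
    """Toy embedding: word counts over a fixed vocabulary, L2-normalised.
    A real model like arctic-embed-m produces dense learned vectors instead."""
    words = text.lower().split()
    vec = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

q = embed("what is llama 3")
doc = embed("llama 3 is an open llm from meta")
off_topic = embed("streamlit builds web uis in python")

# The query scores higher against the document it overlaps with:
print(cosine(q, doc), cosine(q, off_topic))
```

A trained model captures semantic similarity rather than word overlap, but the retrieval mechanics are the same: nearby vectors mean related text.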
4️⃣ Indexing & storing

Embeddings created by the embedding model are stored in a vector store, which offers fast retrieval and similarity search by building an index over our data.

By default, LlamaIndex provides an in-memory vector store that's great for quick experimentation.
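In LlamaIndex this step is typically `VectorStoreIndex.from_documents(documents)`, which uses the in-memory store by default. To demystify what the store does, here's a minimal in-memory vector index with brute-force cosine top-k retrieval (the tiny `embed` stand-in below is hypothetical — a real index calls the embedding model):

```python
import math

def embed(text: str) -> list[float]:
    """Stand-in embedder: counts over a tiny fixed vocabulary, L2-normalised."""
    vocab = ["llama", "rag", "streamlit", "ollama", "index"]
    words = text.lower().split()
    vec = [float(words.count(w)) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class InMemoryVectorStore:
    """Stores (embedding, text) pairs; retrieval is brute-force cosine top-k."""
    def __init__(self):
        self.rows: list[tuple[list[float], str]] = []

    def add(self, text: str) -> None:
        self.rows.append((embed(text), text))

    def query(self, text: str, top_k: int = 1) -> list[str]:
        q = embed(text)
        scored = sorted(
            self.rows,
            key=lambda row: sum(a * b for a, b in zip(q, row[0])),
            reverse=True,
        )
        return [t for _, t in scored[:top_k]]

store = InMemoryVectorStore()
store.add("ollama serves llama models locally")
store.add("streamlit renders the chat ui")
print(store.query("how do i run llama with ollama"))
```

Production stores swap the brute-force scan for approximate nearest-neighbour indexes, which is where the "fast retrieval" comes from at scale.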
5️⃣ Creating a prompt template

A custom prompt template is used to refine the response from the LLM and to include the retrieved context:
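LlamaIndex expresses this with `PromptTemplate` and the `{context_str}` / `{query_str}` placeholders; functionally it's plain string formatting. The wording below is a typical RAG template, not the exact one from the lost screenshot:

```python
# Mirrors LlamaIndex's PromptTemplate with {context_str}/{query_str} slots.
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information above, and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)

def build_prompt(context_chunks: list[str], query: str) -> str:
    """Fill the template with retrieved chunks and the user's question."""
    return QA_TEMPLATE.format(
        context_str="\n\n".join(context_chunks), query_str=query
    )

prompt = build_prompt(["Llama-3 is Meta's open-weights LLM."], "What is Llama-3?")
print(prompt)
```

The "not prior knowledge" instruction is what nudges the LLM to ground its answer in the retrieved context instead of hallucinating.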
6️⃣ & 7️⃣ Setting up a query engine

The query engine takes a query string, uses it to fetch relevant context, and then sends both to the LLM as a prompt to generate the final natural-language response.

Here's how you set it up:
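In LlamaIndex this is roughly `query_engine = index.as_query_engine()` followed by `query_engine.query("...")`. Under the hood the loop looks like the sketch below; the retriever and LLM are stubbed out (both hypothetical) so it stays self-contained — in the real app, retrieval hits the vector index and the LLM call goes to the Ollama-served Llama-3:

```python
def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Stub retriever: a real one does a vector-store similarity search."""
    knowledge = [
        "Ollama serves Llama-3 on localhost.",
        "LlamaIndex orchestrates the RAG pipeline.",
    ]
    return knowledge[:top_k]

def call_llm(prompt: str) -> str:
    """Stub LLM: a real one sends the prompt to the local Llama-3 server."""
    return "Answer based on: " + prompt.splitlines()[0]

def query_engine(query: str) -> str:
    # 1) fetch relevant context, 2) build the prompt, 3) ask the LLM
    context = retrieve(query)
    prompt = "\n".join(context) + "\nQuery: " + query
    return call_llm(prompt)

print(query_engine("How is Llama-3 served?"))
```

That three-step loop — retrieve, assemble prompt, generate — is the whole of RAG; everything else in the stack is plumbing around it.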
8️⃣ The Chat interface

We create a UI using Streamlit to provide a chat interface for our RAG application.

The code for this, and everything we've discussed so far, is shared in the next tweet.
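The full app is in the Studio linked below; a minimal Streamlit chat skeleton for this kind of app looks roughly like this (the `answer` function is a hypothetical stand-in — in the real app you'd call `query_engine.query(prompt)` from the previous step):

```python
# app.py -- run with: streamlit run app.py
import streamlit as st

def answer(prompt: str) -> str:
    """Stand-in for the RAG backend; replace with query_engine.query(prompt)."""
    return f"(RAG answer for: {prompt})"

st.title("Chat with your docs (Llama-3, 100% local)")

# Keep chat history across Streamlit reruns
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# Accept a new question and append both turns to the history
if prompt := st.chat_input("Ask something about your docs"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    response = answer(prompt)
    st.session_state.messages.append({"role": "assistant", "content": response})
    with st.chat_message("assistant"):
        st.markdown(response)
```

Streamlit reruns the whole script on every interaction, which is why the history lives in `st.session_state` rather than a local variable.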
I used @LightningAI⚡️ Studio for developing this application!

You will find all the code & everything you need to run it! ✨

Clone a FREE studio now & take it for a spin...👇
lightning.ai/lightning-ai/s…
If you're interested in:

- Python 🐍
- Machine Learning 🤖
- AI Engineering ⚙️

Find me → @akshay_pachaar ✔️
Every day, I share tutorials on the topics above!

Cheers! 🥂


