Let's build a RAG app using Meta AI's Llama 3 (100% local):
Before we begin, take a look at what we're about to create!
Here's what you'll learn:
- @Ollama for locally serving an LLM (Llama 3)
- @Llama_Index for orchestration
- @Streamlit for building the UI
- @LightningAI for development & hosting
Let's go! 🚀
The architecture diagram below illustrates the key components & how they interact with each other!
It will be followed by detailed descriptions & code for each component:
1️⃣ & 2️⃣ : Loading the knowledge base
A knowledge base is a collection of relevant and up-to-date information that serves as a foundation for RAG. In our case it's the docs stored in a directory.
Here's how you can load it as document objects in LlamaIndex:
3️⃣ The embedding model
An embedding is a meaningful representation of text in the form of numbers (a vector).
The embedding model is responsible for creating embeddings for the document chunks & user queries.
We are using @SnowflakeDB's `arctic-embed-m`, one of the best models in its class.
4️⃣ Indexing & storing
Embeddings created by the embedding model are stored in a vector store, which offers fast retrieval and similarity search by creating an index over our data.
By default, LlamaIndex provides an in-memory vector store that’s great for quick experimentation.
5️⃣ Creating a prompt template
A custom prompt template is used to refine the response from the LLM & include the retrieved context as well:
6️⃣ & 7️⃣ Setting up a query engine
The query engine takes a query string, uses it to fetch relevant context, and then sends both as a prompt to the LLM to generate a final natural-language response.
Here's how you set it up:
8️⃣ The Chat interface
We create a UI using Streamlit to provide a chat interface for our RAG application.
The code for this & all we discussed so far is shared in the next tweet!
Check this out👇
I used @LightningAI⚡️ Studio for developing this application!
You will find all the code & everything you need to run it! ✨
ML researchers just built a new ensemble technique.
It even outperforms XGBoost, CatBoost, and LightGBM.
Here's a complete breakdown (explained visually):
For years, gradient boosting has been the go-to for tabular learning.
TabM is a parameter-efficient ensemble that provides:
- The speed of an MLP.
- The accuracy of GBDT.
The visual below explains how it works.
Let's dive in!
In tabular ML:
- MLPs are simple and fast, but usually underperform on tabular data.
- Deep ensembles are accurate but bloated and slow.
- Transformers are powerful but rarely practical on tables.
The image below depicts an MLP ensemble, and it looks heavily parameterized👇
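To make the contrast concrete, here's a rough NumPy sketch of the parameter-sharing trick TabM builds on: one shared weight matrix for all ensemble members, plus a cheap per-member elementwise adapter. Shapes and names are illustrative, not the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, k = 8, 16, 4  # features, hidden width, ensemble members

# A naive deep ensemble would need k separate (d_in, d_hidden) matrices.
# Here all k members share ONE matrix W...
W = rng.normal(size=(d_in, d_hidden))
# ...and differ only via a per-member elementwise adapter: k*d_in extra
# parameters instead of (k-1)*d_in*d_hidden.
R = rng.normal(size=(k, d_in))

x = rng.normal(size=(d_in,))       # one input row
h = np.maximum((x * R) @ W, 0.0)   # (k, d_hidden): all k members in one matmul
heads = rng.normal(size=(k, d_hidden))
preds = (h * heads).sum(axis=1)    # one prediction per member
final = preds.mean()               # ensemble output: average over members
```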
- What is an AI agent
- Connecting agents to tools
- Overview of MCP
- Replacing tools with MCP servers
- Setting up observability and tracing
All with 100% open-source tools!
This course builds agents based on the following definition:
An AI agent uses an LLM as its brain, has memory to retain context, and can take real-world actions through tools, like browsing the web, running code, etc.
In short, it thinks, remembers, and acts.
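The definition above can be boiled down to a toy loop — no real LLM here; the hypothetical `decide` function stands in for the model's reasoning step:

```python
# think → remember → act, in miniature
def decide(query: str, memory: list[str]) -> tuple[str, str]:
    # A real agent would prompt an LLM here; we route on keywords instead
    if "time" in query:
        return "clock", query
    return "search", query


TOOLS = {
    "clock": lambda _: "12:00",
    "search": lambda q: f"top result for {q!r}",
}


def run_agent(query: str, memory: list[str]) -> str:
    tool, arg = decide(query, memory)  # think: pick a tool
    memory.append(query)               # remember: retain context
    return TOOLS[tool](arg)            # act: call the tool


memory: list[str] = []
print(run_agent("what time is it?", memory))
```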
100% open-source tech stack:
- @crewAIInc for building MCP ready agents
- @zep_ai Graphiti to add human-like memory
- @Cometml Opik for observability and tracing