Akshay 🚀
Jun 22 · 11 tweets · 4 min read
Let's build a real-time Voice RAG Agent, step-by-step:
Before we begin, here's a quick demo of what we're building

Tech stack:

- @Cartesia_AI for SOTA text-to-speech
- @AssemblyAI for speech-to-text
- @LlamaIndex to power RAG
- @livekit for orchestration

Let's go! 🚀
Here's an overview of what the app does:

1. Listens to real-time audio
2. Transcribes it via AssemblyAI
3. Uses your docs (via LlamaIndex) to craft an answer
4. Speaks that answer back with Cartesia

Now let's jump into code!
1️⃣ Set up environment and logging

This lets us load configuration from a .env file and log everything the agent does in real time.

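Here's a minimal sketch of this step. It assumes the config lives in a plain `KEY=VALUE` .env file. Projects often use python-dotenv's `load_dotenv()` for this, but to stay dependency-free the hypothetical `load_env_file` below parses the file with the standard library. `setup_logging` then configures timestamped console logs.

```python
import logging
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Minimal .env loader (hypothetical helper): reads KEY=VALUE lines.

    Skips blank lines and '#' comments, strips surrounding quotes, and
    does not override variables that are already set in the environment.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)  # keep any value already exported
    return loaded


def setup_logging(level: int = logging.INFO) -> logging.Logger:
    """Timestamped console logging so each pipeline stage can be traced live."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return logging.getLogger("voice-rag-agent")
```

Call `load_env_file()` once at startup, before creating the STT/TTS clients, so any keys they read from the environment are already available.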
2️⃣ Set up RAG

This is where your documents get indexed for search and retrieval, powered by LlamaIndex.

The agent's answers are grounded in this knowledge base.

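A typical LlamaIndex setup loads files with `SimpleDirectoryReader` and builds a `VectorStoreIndex` that retrieves chunks by embedding similarity. This sketch shows what "grounding" means without any extra dependencies. It swaps embeddings for simple word-overlap scoring, and `KeywordIndex`, `build_grounded_prompt` and the sample docs are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class KeywordIndex:
    """Toy index: scores each document by how often it contains query words."""

    def __init__(self, docs: list[str]):
        self.docs = docs
        self.doc_counts = [Counter(tokenize(d)) for d in docs]

    def retrieve(self, query: str, top_k: int = 2) -> list[str]:
        q = set(tokenize(query))
        scored = [
            (sum(counts[t] for t in q), i)
            for i, counts in enumerate(self.doc_counts)
        ]
        # Highest score first; ties keep the original document order.
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [self.docs[i] for score, i in scored[:top_k] if score > 0]


def build_grounded_prompt(query: str, index: KeywordIndex) -> str:
    """Put retrieved context before the question so the LLM answers from it."""
    context = "\n".join(f"- {c}" for c in index.retrieve(query))
    return (
        "Answer using only the context below. If it isn't there, say you don't know.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


docs = [
    "Our office opens at 9am on weekdays.",
    "Refunds are processed within 5 business days.",
    "Support is available via email.",
]
index = KeywordIndex(docs)
```

Here `index.retrieve("When are refunds processed?")` returns only the refunds document, and that is the text the LLM would be told to answer from.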
3️⃣ Set up Voice Activity Detection

We also want Voice Activity Detection (VAD) for a smooth real-time experience, so we “prewarm” (preload) the Silero VAD model.

This helps us detect when someone is actually speaking.

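Silero VAD is a small neural network that scores each audio frame for speech. This sketch illustrates the same per-frame decision without any dependencies. It substitutes a plain RMS-energy threshold with a short "hangover", so brief pauses between words don't end the turn. The threshold and hangover values are arbitrary assumptions.

```python
import math


def frame_rms(frame: list[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def detect_speech(frames: list[list[float]], threshold: float = 0.02,
                  hangover: int = 2) -> list[bool]:
    """Label each frame as speech (True) or silence (False).

    A frame at or above `threshold` counts as speech. After the energy
    drops, we keep reporting speech for `hangover` more frames so short
    gaps between words are not treated as the end of the utterance.
    """
    labels = []
    remaining = 0
    for frame in frames:
        if frame_rms(frame) >= threshold:
            remaining = hangover
            labels.append(True)
        elif remaining > 0:
            remaining -= 1
            labels.append(True)
        else:
            labels.append(False)
    return labels
```

With `hangover=2`, a single loud frame followed by silence is reported as speech for two extra frames before the detector falls back to silence.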
4️⃣ The VoicePipelineAgent and Entry Point

This is where we bring it all together. The agent:

1. Listens to real-time audio.
2. Transcribes it using AssemblyAI.
3. Crafts an answer with your documents via LlamaIndex.
4. Speaks that answer back using Cartesia.

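In the app, the VAD, STT, LLM and TTS components are wired into LiveKit's `VoicePipelineAgent`, which also handles streaming and turn-taking. The sketch below models only the data flow of a single turn. It uses plain callables as hypothetical stand-ins for AssemblyAI, LlamaIndex and Cartesia.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical stand-in for one conversational turn: STT -> RAG -> TTS."""
    transcribe: Callable[[bytes], str]   # speech-to-text (AssemblyAI in the thread)
    answer: Callable[[str], str]         # RAG query (LlamaIndex in the thread)
    synthesize: Callable[[str], bytes]   # text-to-speech (Cartesia in the thread)

    def handle_turn(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)
        if not text.strip():
            return b""  # nothing intelligible was said, so stay silent
        reply = self.answer(text)
        return self.synthesize(reply)


# Fake components for a quick local run (no audio or network involved).
pipeline = VoicePipeline(
    transcribe=lambda audio: audio.decode(),
    answer=lambda question: f"You asked: {question}",
    synthesize=lambda text: text.encode(),
)
```

Injecting each stage as a callable keeps providers swappable and makes the turn logic easy to test with fakes like the ones above.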
5️⃣ Run the app

Finally, we tie it all together: we run our agent, specifying the prewarm function and the main entrypoint.

That's it! Your Real-Time Voice RAG Agent is ready to roll!
The entire code is 100% open-source; you can find it here:

GitHub repo: github.com/patchy631/ai-e…
If you found it insightful, reshare with your network.

Find me → @akshay_pachaar ✔️
For more insights and tutorials on LLMs, AI Agents, and Machine Learning!

More from @akshay_pachaar

Jun 21
Let's build an MCP-powered audio analysis toolkit:
Before we dive in, here's a demo of what we're building!

Tech stack:
- @AssemblyAI for transcription and audio analysis.
- Claude Desktop as the MCP host.
- @streamlit for the UI.

Let's build it!
Here's our workflow:

- User's audio input is sent to AssemblyAI via a local MCP server.
- AssemblyAI transcribes it while providing the summary, speaker labels, sentiment, and topics.
- Post-transcription, the user can also chat with the audio.

Let's implement this!
Jun 19
AI agents can finally talk to your frontend!

The AG-UI Protocol bridges the critical gap between AI agents and frontend apps, making human-agent collaboration seamless.

MCP: Agents to tools
A2A: Agents to agents
AG-UI: Agents to users

100% open-source.
Here's the official GitHub repo for @CopilotKit's AG-UI:

(don't forget to star 🌟) github.com/ag-ui-protocol…
Here's a really good illustration of how it works!

Key features:

🤝 Works with LangGraph, LlamaIndex, Agno, CrewAI & AG2
🎯 Event-based protocol with 16 standard event types
💬 Real-time agentic chat with streaming
🧑‍💻 Human-in-the-loop collaboration
💬 ChatUI & Generative UI
Jun 16
Top 4 open-source LLM finetuning libraries!

From single-GPU “click-to-tune” notebooks to trillion-param clusters, these four libraries cover every LLM finetuning scenario.

Understand which one to use, & when...👇
1️⃣ Unsloth

Unsloth makes fine-tuning easy and fast, turning a mid-range GPU into a powerhouse with a simple Colab or Kaggle notebook.

Perfect for hackers and small teams on 12–24 GB GPUs who need quick LoRA experiments without DeepSpeed configs or clusters.

Check this out👇
github.com/unslothai/unsl…
2️⃣ Axolotl

Axolotl keeps your entire pipeline in one YAML file—write once, reuse from data prep to serving.

Perfect for teams that crave reproducibility and want to toggle advanced recipes by flipping a YAML switch.

Check this out👇
github.com/axolotl-ai-clo…
Jun 15
12 powerful tools for your AI Agents!

Here's a breakdown of what each does...👇
1️⃣ FileReadTool

This tool instantly pulls data from the local file system.

Read more👇
docs.crewai.com/tools/file-doc…
2️⃣ FileWriterTool

Lets the agent create or overwrite any file.

Read more👇
docs.crewai.com/tools/file-doc…
Jun 13
Model Context Protocol (MCP), clearly explained:
MCP is like a USB-C port for your AI applications.

Just as USB-C offers a standardized way to connect devices to various accessories, MCP standardizes how your AI apps connect to different data sources and tools.

Let's dive in! 🚀
At its core, MCP follows a client-server architecture where a host application can connect to multiple servers.

Key components include:

- Host
- Client
- Server

Here's an overview before we dig deep 👇
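MCP messages are JSON-RPC 2.0, and servers advertise and run tools through the `tools/list` and `tools/call` methods. This in-process sketch shows the shape of the architecture: the host keeps one client per server, and each client talks to exactly one server. It has no real transport, and all class names are hypothetical.

```python
import json
from typing import Any, Callable, Optional


class Server:
    """Exposes named tools and answers JSON-RPC 2.0 requests."""

    def __init__(self, name: str, tools: dict[str, Callable[..., Any]]):
        self.name = name
        self.tools = tools

    def handle(self, raw: str) -> str:
        req = json.loads(raw)
        if req["method"] == "tools/list":
            result: Any = {"tools": [{"name": n} for n in self.tools]}
        elif req["method"] == "tools/call":
            params = req["params"]
            output = self.tools[params["name"]](**params.get("arguments", {}))
            result = {"content": [{"type": "text", "text": str(output)}]}
        else:  # standard JSON-RPC "method not found" error
            error = {"code": -32601, "message": "Method not found"}
            return json.dumps({"jsonrpc": "2.0", "id": req["id"], "error": error})
        return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


class Client:
    """A 1:1 connection to a single server."""

    def __init__(self, server: Server):
        self.server = server
        self._next_id = 0

    def request(self, method: str, params: Optional[dict] = None) -> dict:
        self._next_id += 1
        msg: dict[str, Any] = {"jsonrpc": "2.0", "id": self._next_id, "method": method}
        if params is not None:
            msg["params"] = params
        return json.loads(self.server.handle(json.dumps(msg)))


class Host:
    """The AI app (e.g. Claude Desktop): holds one client per connected server."""

    def __init__(self, servers: list[Server]):
        self.clients = {s.name: Client(s) for s in servers}

    def list_all_tools(self) -> dict[str, list[str]]:
        return {
            name: [t["name"] for t in client.request("tools/list")["result"]["tools"]]
            for name, client in self.clients.items()
        }
```

Because the host keeps a separate client for each server, adding a new data source or tool server means adding one more `Server`, with no changes to the host logic.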
Jun 11
Object-oriented programming in Python, clearly explained:
We break it down into 6 important concepts:

- Object 🚘
- Class 🏗️
- Inheritance 🧬
- Encapsulation 🔐
- Abstraction 🎭
- Polymorphism 🌀

Let's take them one-by-one... 🚀
1️⃣ Object 🚘

Just look around: everything you see can be treated as an object.

For instance, a Car, a Dog, and your Laptop are all objects.

An Object can be defined using 2 things:

- Properties: the attributes that describe an object
- Behaviour: the functions that an object can perform

...👇
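Here's a minimal sketch of the Car example in Python. The attributes set in `__init__` are the properties, and the methods are the behaviour. `brand`, `color`, `speed` and the method names are illustrative choices.

```python
class Car:
    """A Car object: properties describe it, methods are its behaviour."""

    def __init__(self, brand: str, color: str):
        # Properties: data that describes this particular car
        self.brand = brand
        self.color = color
        self.speed = 0

    # Behaviour: what the car can do (and how that changes its own state)
    def accelerate(self, amount: int) -> int:
        self.speed += amount
        return self.speed

    def brake(self) -> None:
        self.speed = 0
```

For example, `Car("Toyota", "red").accelerate(30)` returns `30`.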