Let's build a real-time Voice RAG Agent, step-by-step:
Before we begin, here's a quick demo of what we're building
Tech stack:
- @Cartesia_AI for SOTA text-to-speech
- @AssemblyAI for speech-to-text
- @LlamaIndex to power RAG
- @livekit for orchestration
Let's go! 🚀
Here's an overview of what the app does:
1. Listens to real-time audio 2. Transcribes it via AssemblyAI 3. Uses your docs (via LlamaIndex) to craft an answer 4. Speaks that answer back with Cartesia