Post

How to get URL link on X (Twitter) App

On the Twitter thread, click on or icon on the bottom
Click again on or Share Via icon
Click on Copy Link to Tweet
Paste it above and click "Unroll Thread"!
More info at Twitter Help

Avi Chawla

@_avichawla

Aug 20 • 12 tweets • 4 min read • Read on X

Scrolly

DeepMind built a simple RAG technique that:

- reduces hallucinations by 40%
- improves answer relevancy by 50%

Let's understand how to use it in RAG systems (with code):

Most RAG apps fail due to retrieval. Today, we'll build a RAG system that self-corrects inaccurate retrievals using:

- @firecrawl_dev for scraping
- @milvusio as vectorDB
- @beam_cloud for deployment
- @Cometml Opik for observability
- @Llama_Index for orchestration

Let's go!

Here's an overview of what the app does:

- First search the docs with user query
- Evaluate if the retrieved context is relevant using LLM
- Only keep the relevant context
- Do a web search if needed
- Aggregate the context & generate response

Now let's jump into code!

1️⃣ Setup LLM

We will use gpt-oss as the LLM, locally served using Ollama.

Check this out👇

2️⃣ Setup vector DB

Our primary source of knowledge is the user documents that we index and store in a Milvus vectorDB collection.

This will be the first source that will be invoked to fetch context when the user inputs a query.

Check this👇

3️⃣ Setup search tool

If the context obtained from the vector DB isn't relevant, we resort to web search using Firecrawl.

More specifically, we use the latest v2 endpoint that provides 10x faster scraping, semantic crawling, News & image search, and more.

Check this👇

4️⃣ Observability

LlamaIndex offers a seamless integration with CometML's Opik.

We use this to trace every LLM call, monitor, and evaluate our Corrective RAG application.

Check this 👇

5️⃣ Create the workflow

Now that we have everything set up, it's time to create the event-driven agentic workflow that orchestrates our application.

We pass in the LLM, vector index, and web search tool to initialize the workflow.

Check this 👇

7️⃣ Kick off the workflow

Finally, when we have everything ready, we kick off our workflow.

Check this👇

8️⃣ Deployment with Beam

Beam enables ultra-fast serverless deployment of any AI workflow.

Thus, we wrap our app in a Streamlit interface, specify the Python libraries, and the compute specifications for the container.

Finally, we deploy it in a few lines of code👇

Run the app

Beam launches the container and deploys our streamlit app as an HTTPS server that can be accessed from a web browser.

In the video, our workflow is able to answer a query that's unrelated to the document. The evaluation step makes this possible.

Check this 👇

https://twitter.com/1175166450832687104/status/1958053890831872065

That's a wrap!

If you found it insightful, reshare it with your network.

Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.

https://twitter.com/1175166450832687104/status/1958053890831872065

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @_avichawla

Avi Chawla

@_avichawla

Aug 17

Model Context Protocol (MCP), clearly explained (with visuals):

MCP is like a USB-C port for your AI applications.

Just as USB-C offers a standardized way to connect devices to various accessories, MCP standardizes how your AI apps connect to different data sources and tools.

Let's dive in! 🚀

At its core, MCP follows a client-server architecture where a host application can connect to multiple servers.

Key components include:

- Host
- Client
- Server

Here's an overview before we dig deep 👇

Read 11 tweets

Avi Chawla

@_avichawla

Aug 14

A new embedding model cuts vector DB costs by ~200x.

It also outperforms OpenAI and Cohere models.

Here's a complete breakdown (with visuals):

RAG is 80% retrieval and 20% generation.

So if RAG isn't working, most likely, it's a retrieval issue, which further originates from chunking and embedding.

Contextualized chunk embedding models solve this.

Let's dive in to learn more!

In RAG:

- No chunking drives up token costs
- Large chunks lose fine-grained context
- Small chunks lose global/neighbourhood context

In fact, chunking also involves determining chunk overlap, generating summaries, etc., which are tedious.

There's another problem!

Read 12 tweets

Avi Chawla

@_avichawla

Aug 11

Let's fine-tune OpenAI gpt-oss (100% locally):

Today, let's learn how to fine-tune OpenAI's latest gpt-oss locally.

We'll give it multilingual reasoning capabilities as shown in the video.

We'll use:
- @UnslothAI for efficient fine-tuning.
- @huggingface transformers to run it locally.

Let's begin!

1️⃣ Load the model

We start by loading the gpt-oss (20B variant) model and its tokenizer using Unsloth.

Check this 👇

Read 10 tweets

Avi Chawla

@_avichawla

Aug 8

Enterprises build RAG over 100s of data sources, not one!

- Microsoft ships it in M365 products.
- Google ships it in its Vertex AI Search.
- AWS ships it in its Amazon Q Business.

Let's build an MCP-powered RAG over 200+ sources (100% local):

Enterprise data is scattered across many sources.

Today, we'll build a unified MCP server that can query 200+ sources from one interface.

Tech stack:
- @mcpuse to build a local MCP client
- @MindsDB to connect to data sources
- @ollama to serve GPT-oss locally

Let's begin!

Here's the workflow:

- User submits a query.
- Agent connects to the MindsDB MCP server to find tools.
- Selects the appropriate tool based on user's query and invokes it
- Finally, it returns a contextually relevant response

Now, let's dive into the code!

Read 12 tweets

Avi Chawla

@_avichawla

Aug 7

I have been building AI Agents in production for over an year.

If you want to learn too, here's a simple tutorial (hands-on):

Today, we'll build and deploy a Coding Agent that can scrape docs, write production-ready code, solve issues and raise PRs, directly from Slack.

Tech stack:
- Claude Code for code generation
- @xpander_ai as the Agent backend
- @firecrawl_dev for scraping

Let's begin!

For context...

xpander is a plug-and-play Backend for agents that manages scale, memory, tools, multi-user states, events, guardrails, and more.

Once we deploy an Agent, it also provides various triggering options like MCP, Webhook, SDK, Chat, etc.

Check this👇

Read 11 tweets

Avi Chawla

@_avichawla

Aug 6

12 MCP, RAG, and Agents cheat sheets for AI engineers (with visuals):

https://twitter.com/1175166450832687104/status/1913842571572625420

1️⃣ Function calling & MCP for LLMs

Before MCPs became popular, AI workflows relied on traditional Function Calling for tool access. Now, MCP is standardizing it for Agents/LLMs.

The visual covers how Function Calling & MCP work under the hood.

Check the thread below 👇

https://twitter.com/1175166450832687104/status/1913842571572625420

https://twitter.com/1175166450832687104/status/1947184607277019582

2️⃣ 4 stages of training LLMs from scratch

This visual covers the 4 stages of building LLMs from scratch to make them practically applicable.

- Pre-training
- Instruction fine-tuning
- Preference fine-tuning
- Reasoning fine-tuning

Here's my detailed thread about it 👇

https://twitter.com/1175166450832687104/status/1947184607277019582

Read 14 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Enter URL or ID to Unroll

Avi Chawla

Try unrolling a thread yourself!

More from @_avichawla

Avi Chawla

Avi Chawla

Avi Chawla

Avi Chawla

Avi Chawla

Avi Chawla

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!