Avi Chawla · Jul 18 · 12 tweets · 5 min read
After MCP, A2A, and AG-UI, there's another Agent protocol: ACP (the Agent Communication Protocol).

It's fully open-source and was launched by IBM Research.

Here's a complete breakdown (with code):
ACP is a standardized, RESTful interface for Agents to discover and coordinate with other Agents, regardless of their framework.

Just like A2A, it lets Agents communicate with each other. There are some differences, which we'll discuss later.

Let's dive into the code first!
Here's how it works:

- Build the Agents and host them on ACP servers.
- The ACP server receives requests from the ACP client and forwards them to the Agent.
- The ACP client can itself be an Agent that intelligently routes requests to the right Agents (much like an MCP client does).

Check this 👇
We’ll create a research summary generator, where:

- Agent 1 drafts a general topic summary (built using CrewAI).
- Agent 2 fact-checks and enhances it using web search (built using Smolagents).

Start by installing the dependencies and pulling a local LLM with Ollama.

Check this 👇
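The setup screenshot isn't reproduced here, so here's a minimal sketch. The exact package names, project name, and the choice of local model are assumptions, not prescribed by the thread:

```shell
# Create a project and add the ACP SDK plus the two agent frameworks
# (package names assumed)
uv init acp-demo && cd acp-demo
uv add acp-sdk crewai smolagents duckduckgo-search

# Pull a local LLM to serve via Ollama (model choice is an assumption)
ollama pull qwen2.5:14b
```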
In our case, we’ll have two servers, and each server will host one Agent.

Let’s define the server that will host the CrewAI Agent and its LLM.

Here's how we do it 👇
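The screenshot isn't reproduced here; as a sketch, assuming the `acp-sdk` package's `Server` class and CrewAI's litellm-backed `LLM` wrapper (model name, URL, and port are placeholders):

```python
from acp_sdk.server import Server
from crewai import LLM

# One ACP server instance; the CrewAI Agent gets registered on it.
server = Server()

# Local LLM served by Ollama (model name and base URL are assumptions).
llm = LLM(model="ollama/qwen2.5:14b", base_url="http://localhost:11434")
```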
Next, we define an Agent on this server:

- Decorate the method so the server registers it as an ACP Agent.
- Build the Agent and kick off the Crew.
- Return the output in the expected ACP format.
- Serve it on a REST-based ACP server running locally.

Check this 👇
Next, repeat these steps for the second server, which hosts the Smolagents Agent and its LLM:

- Add the imports and define the server and the LLM.
- Decorate the method.
- Define the Agent with a web-search tool.
- Serve the Agent.

Check this 👇
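A sketch of the second server, assuming Smolagents' `CodeAgent` with its `DuckDuckGoSearchTool` and `LiteLLMModel` wrapper; the filename, agent name, model, and port are illustrative assumptions:

```python
# smolagents_server.py -- hosts the fact-checking agent (filename assumed)
from collections.abc import AsyncGenerator

from acp_sdk.models import Message, MessagePart
from acp_sdk.server import Context, RunYield, RunYieldResume, Server
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel

server = Server()
model = LiteLLMModel(
    model_id="ollama_chat/qwen2.5:14b",  # local model; name is an assumption
    api_base="http://localhost:11434",
)


@server.agent()
async def fact_check_agent(
    input: list[Message], context: Context
) -> AsyncGenerator[RunYield, RunYieldResume]:
    """Fact-checks and enhances a draft summary using web search."""
    draft = input[0].parts[0].content

    # Agent with a web-search tool.
    agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
    response = agent.run(
        f"Fact-check and improve this summary using web search:\n\n{draft}"
    )

    yield Message(parts=[MessagePart(content=str(response))])


if __name__ == "__main__":
    server.run(port=8002)
```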
Finally, we use an ACP client to connect both Agents in a workflow:

- Connect the client to both servers.
- Invoke the first Agent and collect its output.
- Pass that output to the second Agent for enhancement.

Check this 👇
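The workflow above can be sketched as follows, assuming the `acp-sdk` client API (`Client.run_sync` taking an agent name and a list of `Message`s); the agent names, ports, and query are illustrative assumptions:

```python
# acp_client.py -- chains the two Agents over ACP
import asyncio

from acp_sdk.client import Client
from acp_sdk.models import Message, MessagePart


def as_input(content: str) -> list[Message]:
    """Wrap a plain string in the ACP message format."""
    return [Message(parts=[MessagePart(content=content)])]


async def main() -> None:
    # Connect the client to both servers (URLs/ports are assumptions).
    async with (
        Client(base_url="http://localhost:8001") as drafter,
        Client(base_url="http://localhost:8002") as checker,
    ):
        # Invoke the first Agent to receive a draft summary.
        run1 = await drafter.run_sync(
            agent="draft_agent",
            input=as_input("Impact of agent protocols on multi-agent systems"),
        )
        draft = run1.output[0].parts[0].content

        # Pass the output to the next Agent for fact-checking/enhancement.
        run2 = await checker.run_sync(agent="fact_check_agent", input=as_input(draft))
        print(run2.output[0].parts[0].content)


if __name__ == "__main__":
    asyncio.run(main())
```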
Almost done!

Run the two servers as follows 👇
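Something like this, one terminal per server (the filenames are assumptions):

```shell
uv run crew_agent_server.py    # terminal 1 -> the CrewAI Agent
uv run smolagents_server.py    # terminal 2 -> the Smolagents Agent
```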
And then run the client with `uv run acp_client.py` to get an output from a system that's powered end-to-end by ACP.

Check this 👇
This demo showcases how you can use ACP to enable Agents to communicate via a standardized protocol, even if they are built using different frameworks.

How is ACP different from A2A?

- ACP is built for local-first, low-latency communication.
- A2A is optimized for web-native, cross-vendor interoperability.

- ACP uses a RESTful interface, making it easier to embed in your stack.
- A2A supports more flexible, natural interactions.

- ACP excels in controlled, edge, or team-specific setups.
- A2A shines in broader cloud-based collaboration.
That's a wrap!

If you found it insightful, reshare it with your network.

Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAG.


