After MCP, A2A, & AG-UI, there's another Agent protocol.
It's fully open-source and launched by IBM Research.
Here's a complete breakdown (with code):
ACP is a standardized, RESTful interface for Agents to discover and coordinate with other Agents, regardless of their framework.
Just like A2A, it lets Agents communicate with other Agents. There are some differences, which we'll discuss later.
Let's dive into the code first!
Here's how it works:
- Build the Agents and host them on ACP servers.
- The ACP server receives requests from the ACP Client and forwards them to the Agent.
- The ACP Client can itself be an Agent that intelligently routes requests to other Agents (similar to how an MCP Client works).
Check this 👇
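At its simplest, an Agent is just a decorated function hosted on an ACP server. Here's a minimal sketch using the acp-sdk Python package (an echo agent, purely illustrative):

```python
# minimal ACP server sketch (acp-sdk): a single "echo" agent, purely illustrative
from collections.abc import AsyncGenerator

from acp_sdk.models import Message
from acp_sdk.server import Context, RunYield, RunYieldResume, Server

server = Server()

@server.agent()  # registers this function as an ACP Agent
async def echo(input: list[Message], context: Context) -> AsyncGenerator[RunYield, RunYieldResume]:
    "Echoes every message back to the client."
    for message in input:
        yield message

server.run()  # exposes the agent over a local REST endpoint
```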
We’ll create a research summary generator, where:
- Agent 1 drafts a general topic summary (built using CrewAI)
- Agent 2 fact-checks & enhances it using web search (built using Smolagents).
Start by installing some dependencies and setting up a local LLM with Ollama.
Check this 👇
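Something along these lines should do it (package names per PyPI; the model choice is just one option):

```bash
# install the ACP SDK and the two agent frameworks (add tool/LLM backends as needed)
uv add acp-sdk crewai smolagents duckduckgo-search

# pull a small local model to serve via Ollama (model choice is illustrative)
ollama pull llama3.2
```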
In our case, we’ll have two servers, and each server will host one Agent.
Let’s define the server that will host the CrewAI Agent and its LLM.
Here's how we do it 👇
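A sketch of that setup, assuming acp-sdk for the server and CrewAI's LLM wrapper pointed at the local Ollama model (file name and model are assumptions):

```python
# crew_agent_server.py (assumed file name) -- part 1: the ACP server and its LLM
from collections.abc import AsyncGenerator

from acp_sdk.models import Message, MessagePart
from acp_sdk.server import Context, RunYield, RunYieldResume, Server
from crewai import Agent, Crew, Task, LLM

# The ACP server that will host the CrewAI agent
server = Server()

# Local model served by Ollama (model name is illustrative)
llm = LLM(model="ollama/llama3.2", base_url="http://localhost:11434", max_tokens=1024)
```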
Next, we define an Agent on this server.
- Line 1 → Decorate the method.
- Lines 6-21 → Build the Agent and kick off the Crew.
- Line 23 → Return the output in the expected ACP format.
- Line 26 → Serve on a REST-based ACP server running locally.
Check this 👇
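Continuing the same file, here's a sketch of what that agent could look like (agent/task wording and the port are assumptions; `server` and `llm` come from the previous snippet):

```python
# crew_agent_server.py -- part 2: the ACP agent wrapping a CrewAI crew
@server.agent()
async def research_agent(input: list[Message], context: Context) -> AsyncGenerator[RunYield, RunYieldResume]:
    "Drafts a general summary of the requested topic."
    topic = input[0].parts[0].content  # plain-text topic sent by the ACP client

    researcher = Agent(
        role="Research Assistant",
        goal=f"Write a clear, factual summary about: {topic}",
        backstory="You distill topics into concise overviews.",
        llm=llm,
    )
    task = Task(
        description=f"Draft a ~200-word summary about: {topic}",
        expected_output="A concise, well-structured summary.",
        agent=researcher,
    )
    result = Crew(agents=[researcher], tasks=[task]).kickoff()

    # Return the output in the expected ACP message format
    yield Message(parts=[MessagePart(content=str(result), content_type="text/plain")])

# Serve on a REST-based ACP server running locally (port is an assumption)
server.run(port=8000)
```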
Next, repeat these steps for the 2nd server to host the Smolagents Agent and its LLM.
- Lines 1-10 → Imports + define the Server & the LLM.
- Line 12 → Decorate the method.
- Lines 21-28 → Define the Agent with a web search tool.
- Line 31 → Serve the Agent.
Check this 👇
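A sketch of the second server, assuming Smolagents' CodeAgent with a DuckDuckGo search tool and the same local model accessed via LiteLLM (file name, model, and port are assumptions):

```python
# smolagent_server.py (assumed file name) -- the second ACP server hosting a Smolagents agent
from collections.abc import AsyncGenerator

from acp_sdk.models import Message, MessagePart
from acp_sdk.server import Context, RunYield, RunYieldResume, Server
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel

server = Server()

# Same local Ollama model, accessed through LiteLLM (model name is illustrative)
model = LiteLLMModel(model_id="ollama_chat/llama3.2", api_base="http://localhost:11434")

@server.agent()
async def fact_check_agent(input: list[Message], context: Context) -> AsyncGenerator[RunYield, RunYieldResume]:
    "Fact-checks and enriches a draft summary using web search."
    draft = input[0].parts[0].content

    agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
    prompt = (
        "Fact-check the following summary using web search and improve it "
        f"with any corrections or missing details:\n\n{draft}"
    )
    result = agent.run(prompt)

    yield Message(parts=[MessagePart(content=str(result), content_type="text/plain")])

# Run this server on a different port than the first one (port is an assumption)
server.run(port=8001)
```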
Finally, we use an ACP client to connect both agents in a workflow.
- Lines 6-7 → Connect the client to both servers.
- Lines 11-14 → Invoke the first agent to receive an output.
- Lines 18-21 → Pass the output to the next agent for enhancement.
Check this 👇
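A sketch of the client, using acp-sdk's async Client (parameter and attribute names follow the SDK's examples; the topic, agent names, and ports match the sketches above, and `text_message` is just a helper of ours):

```python
# acp_client.py -- sketch of the client chaining the two agents
import asyncio

from acp_sdk.client import Client
from acp_sdk.models import Message, MessagePart

def text_message(text: str) -> Message:
    # Helper (ours, not part of acp-sdk) to wrap plain text in an ACP Message
    return Message(parts=[MessagePart(content=text, content_type="text/plain")])

async def main() -> None:
    # Connect one client to each ACP server (ports assumed from the server sketches)
    async with Client(base_url="http://localhost:8000") as crew_client, \
               Client(base_url="http://localhost:8001") as smol_client:

        # Step 1: ask the CrewAI agent for a draft summary
        run1 = await crew_client.run_sync(
            agent="research_agent",
            input=[text_message("Impact of ACP on multi-agent systems")],
        )
        draft = run1.output[0].parts[0].content

        # Step 2: pass the draft to the Smolagents agent for fact-checking & enhancement
        run2 = await smol_client.run_sync(
            agent="fact_check_agent",
            input=[text_message(draft)],
        )
        print(run2.output[0].parts[0].content)

asyncio.run(main())
```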
Almost done!
Run the two servers as follows 👇
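(File names here match the assumed names from the sketches above.)

```bash
uv run crew_agent_server.py    # terminal 1 → CrewAI agent on port 8000
uv run smolagent_server.py     # terminal 2 → Smolagents agent on port 8001
```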
Then run the client with `uv run acp_client.py` to get an output from the system powered by ACP.
Check this 👇
This demo showcases how you can use ACP to enable Agents to communicate via a standardized protocol, even if they are built using different frameworks.
How is ACP different from A2A?
- ACP is built for local-first, low-latency communication.
- A2A is optimized for web-native, cross-vendor interoperability.
- ACP uses a RESTful interface, making it easier to embed in your stack.
- A2A supports more flexible, natural interactions.
- ACP excels in controlled, edge, or team-specific setups.
- A2A shines in broader cloud-based collaboration.
That's a wrap!
If you found it insightful, reshare it with your network.
Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.
- Google Maps uses graph ML to predict ETA
- Netflix uses graph ML in recommendation
- Spotify uses graph ML in recommendation
- Pinterest uses graph ML in recommendation
Here are 6 must-know graph feature engineering techniques (with code):
Just as image, text, and tabular datasets have features, so do graph datasets.
This means when building models on graph datasets, we can engineer these features to achieve better performance.
Let's discuss some feature engineering techniques below!
First, let’s create a dummy social networking graph dataset with accounts and followers (which will also be accounts).
We create the two DataFrames shown below: an accounts DataFrame and a followers DataFrame.
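A sketch of what these might look like (column names and values are illustrative; the originals aren't reproduced here):

```python
import pandas as pd

# Accounts DataFrame: one row per account (columns are illustrative)
accounts_df = pd.DataFrame({
    "account_id": [1, 2, 3, 4, 5],
    "name": ["alice", "bob", "carol", "dan", "eve"],
})

# Followers DataFrame: one row per "follower → followee" edge of the graph
followers_df = pd.DataFrame({
    "follower_id": [2, 3, 3, 4, 5, 5],
    "followee_id": [1, 1, 2, 2, 1, 4],
})
```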
"Our GPT model generates 100 tokens in 42 seconds.
How do you make it 5x faster?"
You: "I'll allocate more GPUs for faster generation."
Interview over.
Here's what you missed:
The real bottleneck isn't compute; it's redundant computation.
Without KV caching, your model recalculates keys and values for each token, repeating work.
- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)
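You can see the gap yourself with a quick benchmark sketch (a small Hugging Face model purely for illustration; absolute timings will differ from the numbers above):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small model just for illustration
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tokenizer("KV caching speeds up decoding because", return_tensors="pt")

def time_generate(use_cache: bool) -> float:
    # Greedily generate 100 tokens with the KV cache on or off
    start = time.perf_counter()
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=100, do_sample=False, use_cache=use_cache)
    return time.perf_counter() - start

print(f"with KV cache:    {time_generate(True):.2f}s")
print(f"without KV cache: {time_generate(False):.2f}s")
```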
Let's dive in to understand how it works!
To understand KV caching, we must know how LLMs output tokens.
- Transformer produces hidden states for all tokens.
- Hidden states are projected to the vocab space.
- Logits of the last token are used to generate the next token.
- Repeat for subsequent tokens.
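Here's a minimal greedy-decoding sketch that makes this loop, and the KV cache, explicit (gpt2 is just a stand-in model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
past_key_values = None  # the KV cache

with torch.no_grad():
    for _ in range(10):
        if past_key_values is None:
            out = model(input_ids, use_cache=True)  # full prefill over the prompt
        else:
            # Only the newest token is processed; cached K/V cover everything before it
            out = model(input_ids[:, -1:], past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values  # reuse the cache at the next step

        # Logits of the last position pick the next token (greedy decoding)
        next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)

print(tokenizer.decode(input_ids[0]))
```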
You're in a Research Scientist interview at OpenAI.
The interviewer asks:
"How would you expand the context length of an LLM from 2K to 128K tokens?"
You: "I will fine-tune the model on longer docs with 128K context."
Interview over.
Here's what you missed:
Extending the context window isn't just about larger matrices.
In a traditional transformer, expanding tokens by 8x increases memory needs by 64x due to the quadratic complexity of attention. Refer to the image below!
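A quick back-of-the-envelope check of that quadratic growth (pure illustration, ignoring heads, layers, and dtype):

```python
# attention scores scale with seq_len^2; compare against a 2K-token baseline
base = 2_048
for seq_len in [2_048, 16_384, 131_072]:  # 2K, 16K (8x), 128K (64x)
    ratio = (seq_len / base) ** 2
    print(f"{seq_len:>7} tokens -> {ratio:>6.0f}x the attention-matrix memory of 2K")
```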
So, how do we manage it?
Continued below 👇
1) Sparse Attention
It limits the attention computation to a subset of tokens by:
- Using local attention (tokens attend only to their neighbors).
- Letting the model learn which tokens to focus on.
But this comes with a trade-off between computational cost and model performance.
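For intuition, here's a tiny sketch of the sliding-window (local) attention mask that makes this sub-quadratic:

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Causal mask where each token attends only to itself and the previous
    `window - 1` tokens -- the local attention pattern described above."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    causal = j <= i                         # no attending to future tokens
    local = (i - j) < window                # only nearby past tokens
    return causal & local                   # True = allowed to attend

mask = sliding_window_mask(seq_len=8, window=3)
print(mask.int())
# Each query row has at most `window` allowed keys, so attention cost grows
# as O(seq_len * window) instead of O(seq_len^2).
```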