Akshay 🚀
Aug 22 · 14 tweets · 5 min read
Let's build an MCP server (100% local):
Before diving in, here's what we'll be doing today:

- Understand MCP with a simple analogy.
- Build a 100% local and secure MCP client using @mcpuse
- Integrate the client with the @Stagehanddev MCP server
- Use this setup to control and automate a browser

Let's go! 🚀
First, let's understand MCP using a translation analogy.

Imagine you only know English. To get info from a person who only knows:

- French, you must learn French.
- German, you must learn German.
- and so on.

Learning even 5 languages would be a nightmare for you!
But what if you find a translator?

- You talk to the translator.
- It infers the info you want.
- It picks the person to talk to.
- It gets you a response.

The translator is like MCP!

It lets you (the agent) talk to other people (the tools) through a single interface.
The client-server architecture:

Host (Agent/IDE) runs the MCP Client to communicate with MCP Servers.

Servers expose tools that can perform various actions.

This architecture enables secure and standardized AI tool integration.

Check this out👇
With this understanding in mind, let's start building our own MCP servers and clients...👇
1️⃣ Build a simple MCP server

First, we create a simple MCP server using FastMCP with a tool that adds two numbers.

Then we create a configuration file that tells any client how to connect to this server.

Create these files as shown below👇
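Here's a rough sketch of what the server file could look like, assuming the FastMCP class bundled with the official MCP Python SDK (pip install mcp); the server name and tool are placeholders I've chosen. The matching config shows up as a Python dict in the client snippet below (mcp-use accepts the same mcpServers shape from a JSON file too).

```python
# server.py -- a minimal sketch of an MCP server exposing one tool.
# Assumes the FastMCP class from the official MCP Python SDK (pip install mcp);
# the server and tool names are placeholders.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("calculator")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers and return the result."""
    return a + b

if __name__ == "__main__":
    # The stdio transport lets a local MCP client spawn this server as a subprocess.
    mcp.run(transport="stdio")
```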
2️⃣ Create MCP Client

Next, we build a client using mcp-use, powered by locally running LLMs.

It's completely private and secure because everything runs on your own machine.

Afterward, we'll wrap up this client in a user-friendly Streamlit UI for ease of use.

Check this out 👇
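A minimal sketch of such a client, assuming Ollama is serving a model locally; the model name, file names, and query are placeholders:

```python
# client.py -- a minimal sketch of a 100% local MCP client using mcp-use.
# Assumes Ollama is running locally with a model pulled (e.g. `ollama pull llama3.1`).
import asyncio

from langchain_ollama import ChatOllama
from mcp_use import MCPAgent, MCPClient

# The same information a JSON config file would carry: how to spawn the server.
config = {
    "mcpServers": {
        "calculator": {
            "command": "python",
            "args": ["server.py"],
        }
    }
}

async def main():
    client = MCPClient.from_dict(config)
    llm = ChatOllama(model="llama3.1")  # local LLM: nothing leaves your machine
    agent = MCPAgent(llm=llm, client=client, max_steps=10)
    result = await agent.run("Use the add tool to compute 21 + 21")
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```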
Let's integrate the MCP client and server we just created.

Here's the Streamlit UI I mentioned earlier.

For didactic purposes, we're starting with a very simple server.

Check this out👇
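For reference, here's a hypothetical version of that Streamlit app; the widget labels are my choices, and a real app would cache the agent rather than rebuild it per query:

```python
# app.py -- a hypothetical Streamlit wrapper around the client above.
# Run with: streamlit run app.py
import asyncio

import streamlit as st
from langchain_ollama import ChatOllama
from mcp_use import MCPAgent, MCPClient

st.title("100% Local MCP Client")

query = st.text_input("Ask the agent something:")
if st.button("Run") and query:
    # Rebuilding the agent on every request keeps the sketch simple;
    # a real app would cache it in st.session_state.
    client = MCPClient.from_dict(
        {"mcpServers": {"calculator": {"command": "python", "args": ["server.py"]}}}
    )
    agent = MCPAgent(llm=ChatOllama(model="llama3.1"), client=client)
    with st.spinner("Thinking..."):
        st.write(asyncio.run(agent.run(query)))
```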
Now, let's make this more practical. We're going to set up a browser-automation MCP server using Stagehand.

With this, we can navigate websites, click buttons, fill out forms, and extract data using natural language commands.

Check this out👇
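Wiring it into our client is just another mcpServers entry. A hypothetical config is below; the npm package name and env var names are assumptions on my part, so check the Stagehand MCP docs for the exact values:

```python
# A hypothetical mcp-use config for a Stagehand MCP server.
# The npm package name and env var names are assumptions --
# consult the Stagehand MCP docs for the real ones.
stagehand_config = {
    "mcpServers": {
        "stagehand": {
            "command": "npx",
            "args": ["-y", "@browserbasehq/mcp-server-browserbase"],
            "env": {
                "BROWSERBASE_API_KEY": "...",      # fill in your own credentials
                "BROWSERBASE_PROJECT_ID": "...",
            },
        }
    }
}
```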
Next, let's interact with the Stagehand MCP server.

When asked to find the cheapest flight from one city to another:
- It navigated to Google.
- Typed in the query.
- Scraped the relevant information.
- Returned the results to the agent to generate a response.

See this video👇
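To reproduce that flow yourself, point the same mcp-use agent at the Stagehand config instead; a hypothetical sketch, with the cities as example values:

```python
# Hypothetical end-to-end run against the Stagehand-backed agent;
# `stagehand_config` is the dict from the previous step.
import asyncio

from langchain_ollama import ChatOllama
from mcp_use import MCPAgent, MCPClient

async def main():
    client = MCPClient.from_dict(stagehand_config)
    agent = MCPAgent(llm=ChatOllama(model="llama3.1"), client=client, max_steps=15)
    # The agent decides which browser tools to call at each sub-step.
    result = await agent.run(
        "Find the cheapest flight from New York to London next week"
    )
    print(result)

asyncio.run(main())
```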
Why mcp-use?

You can connect any LLM to MCP servers and easily create local MCP clients with mcp-use.

- Compatible with Ollama & LangChain
- Streams agent output asynchronously
- Built-in debugging mode, and more

Repo: github.com/mcp-use/mcp-use

(don't forget to star ⭐)
To summarise, here's what we covered:

- What is MCP
- How to build your own MCP Server
- How to build a 100% local MCP Client
- How to fully automate and control a browser

I'll leave you with this great website to explore more MCP servers...👇
That's a wrap!

If you found it insightful, reshare with your network.

Find me → @akshay_pachaar ✔️ for more insights and tutorials on LLMs, AI Agents, and Machine Learning!
