MCP is like a USB-C port for your AI applications.
Just as USB-C offers a standardized way to connect devices to various accessories, MCP standardizes how your AI apps connect to different data sources and tools.
Let's dive in! 🚀
At its core, MCP follows a client-server architecture where a host application can connect to multiple servers.
Key components include:
- Host
- Client
- Server
Here's an overview before we dig deep 👇
The Host and Client:
Host: An AI app (Claude desktop, Cursor) that provides an environment for AI interactions, accesses tools and data, and runs the MCP Client.
MCP Client: Operates within the host to enable communication with MCP servers.
Next up, MCP server...👇
The Server
A server exposes specific capabilities and provides access to data.
3 key capabilities:
- Tools: Enable LLMs to perform actions through your server
- Resources: Expose data and content from your servers to LLMs
- Prompts: Create reusable prompt templates and workflows
The Client-Server Communication
Understanding client-server communication is essential for building your own MCP client-server.
Let's begin with this illustration and then break it down step by step... 👇
1️⃣ & 2️⃣: capability exchange
client sends an initialize request to learn server capabilities.
server responds with its capability details.
e.g., a Weather API server provides available `tools` to call API endpoints, `prompts`, and API documentation as `resource`.
3️⃣ Notification
Client then acknowledgment the successful connection and further message exchange continues.
Before we wrap, one more key detail...👇
Unlike traditional APIs, the MCP client-server communication is two-way.
Sampling, if needed, allows servers to leverage clients' AI capabilities (LLM completions or generations) without requiring API keys.
While clients to maintain control over model access and permissions
I hope this clarifies what MCP does.
In the future, I'll explore creating custom MCP servers and building hands-on demos around them.
Over to you! What is your take on MCP and its future?
That's a wrap!
If you enjoyed this breakdown:
Follow me → @akshay_pachaar ✔️
Every day, I share insights and tutorials on LLMs, AI Agents, RAGs, and Machine Learning!
• • •
Missing some Tweet in this thread? You can try to
force a refresh
You're in a Research Scientist interview at OpenAI.
The interviewer asks:
"How would you expand the context length of an LLM from 2K to 128K tokens?"
You: "I will fine-tune the model on longer docs with 128K context."
Interview over.
Here's what you missed:
Extending the context window isn't just about larger matrices.
In a traditional transformer, expanding tokens by 8x increases memory needs by 64x due to the quadratic complexity of attention. Refer to the image below!
So, how do we manage it?
continue...👇
1) Sparse Attention
It limits the attention computation to a subset of tokens by:
- Using local attention (tokens attend only to their neighbors).
- Letting the model learn which tokens to focus on.
But this has a trade-off between computational complexity and performance.
dLLM is a Python library that unifies the training & evaluation of diffusion language models.
You can also use it to turn ANY autoregressive LM into a diffusion LM with minimal compute.
100% open-source.
Here's why this matters:
Traditional autoregressive models generate text left-to-right, one token at a time. Diffusion models work differently - they refine the entire sequence iteratively, giving you better control over generation quality and more flexible editing capabilities.
You're in a Research Scientist interview at Google.
Interviewer: We have a base LLM that's terrible at maths. How would you turn it into a maths & reasoning powerhouse?
You: I'll get some problems labeled and fine-tune the model.
Interview over.
Here's what you missed:
When outputs are verifiable, labels become optional.
Maths, code, and logic can be automatically checked and validated.
Let's use this fact to build a reasoning model without manual labelling.
We'll use:
- @UnslothAI for parameter-efficient finetuning.
- @HuggingFace TRL to apply GRPO.
Let's go! 🚀
What is GRPO?
Group Relative Policy Optimization is a reinforcement learning method that fine-tunes LLMs for math and reasoning tasks using deterministic reward functions, eliminating the need for labeled data.
Here's a brief overview of GRPO before we jump into code: