Akshay 🚀
Jun 13 · 11 tweets · 3 min read
Model Context Protocol (MCP), clearly explained:
MCP is like a USB-C port for your AI applications.

Just as USB-C offers a standardized way to connect devices to various accessories, MCP standardizes how your AI apps connect to different data sources and tools.

Let's dive in! 🚀
At its core, MCP follows a client-server architecture where a host application can connect to multiple servers.

Key components include:

- Host
- Client
- Server

Here's an overview before we dig deep 👇
The Host and Client:

Host: An AI app (e.g., Claude Desktop, Cursor) that provides an environment for AI interactions, accesses tools and data, and runs the MCP client.

MCP Client: Operates within the host to enable communication with MCP servers.
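The host-client relationship above can be sketched in a few lines of plain Python: a host keeps one client session per server it connects to. This is an illustrative sketch only (class and server names are hypothetical); a real host would use an MCP SDK and open an actual transport.

```python
class MCPClient:
    """One client instance manages the connection to one MCP server."""
    def __init__(self, server_name: str):
        self.server_name = server_name
        self.connected = False

    def connect(self):
        # A real client would open a transport (stdio/HTTP) and handshake here.
        self.connected = True
        return self


class Host:
    """An AI app (e.g., a desktop assistant) that runs one client per server."""
    def __init__(self):
        self.clients = {}

    def add_server(self, name: str):
        self.clients[name] = MCPClient(name).connect()


host = Host()
host.add_server("weather-server")      # hypothetical server names
host.add_server("filesystem-server")
print(len(host.clients))  # 2
```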

Next up, the MCP server... 👇
The Server

A server exposes specific capabilities and provides access to data.

3 key capabilities:

- Tools: Enable LLMs to perform actions through your server
- Resources: Expose data and content from your servers to LLMs
- Prompts: Create reusable prompt templates and workflows
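The three capability types can be sketched as a toy server that declares its tools, resources, and prompts and dispatches requests to them. This simulates the idea in plain Python (no MCP SDK); the tool name `get_forecast`, the URI `docs://weather`, and the request shapes are all hypothetical stand-ins for what a real server would expose over JSON-RPC.

```python
def get_forecast(city: str) -> str:   # a "tool": lets the LLM perform an action
    return f"Sunny in {city}"         # (hypothetical stub for an API call)

CAPABILITIES = {
    "tools": {"get_forecast": get_forecast},
    "resources": {"docs://weather": "Weather API documentation..."},
    "prompts": {"daily_report": "Summarize today's weather for {city}."},
}

def handle(request: dict) -> dict:
    """Dispatch a request like {'type': 'tools/call', 'name': ..., 'args': ...}."""
    if request["type"] == "tools/call":
        tool = CAPABILITIES["tools"][request["name"]]
        return {"result": tool(**request["args"])}
    if request["type"] == "resources/read":
        return {"contents": CAPABILITIES["resources"][request["uri"]]}
    if request["type"] == "prompts/get":
        template = CAPABILITIES["prompts"][request["name"]]
        return {"prompt": template.format(**request["args"])}
    return {"error": "unknown request"}

print(handle({"type": "tools/call", "name": "get_forecast", "args": {"city": "Paris"}}))
# {'result': 'Sunny in Paris'}
```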
The Client-Server Communication

Understanding client-server communication is essential for building your own MCP clients and servers.

Let's begin with this illustration and then break it down step by step... 👇
1️⃣ & 2️⃣: capability exchange

The client sends an `initialize` request to learn the server's capabilities.

The server responds with its capability details.

e.g., a Weather API server exposes `tools` that call its API endpoints, `prompts` for common queries, and its API documentation as a `resource`.
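Steps 1️⃣ and 2️⃣ can be sketched as the two JSON-RPC messages they are. The field names follow the MCP specification; the app/server names and the exact capability payload are hypothetical and vary by implementation.

```python
import json

# 1️⃣ Client → server: initialize request (JSON-RPC 2.0)
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},  # what the client itself supports
        "clientInfo": {"name": "my-host-app", "version": "0.1.0"},
    },
}

# 2️⃣ Server → client: response advertising its capabilities
initialize_response = {
    "jsonrpc": "2.0",
    "id": 1,  # matches the request id
    "result": {
        "protocolVersion": "2024-11-05",
        "capabilities": {"tools": {}, "resources": {}, "prompts": {}},
        "serverInfo": {"name": "weather-server", "version": "0.1.0"},
    },
}

print(json.dumps(initialize_response["result"]["capabilities"]))
```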
3️⃣ Notification

The client then sends a notification acknowledging the successful connection, and further message exchange continues.
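Step 3️⃣ is a one-way JSON-RPC notification. Per the MCP spec, a notification carries no `id` field, so the server sends no reply:

```python
# 3️⃣ Client → server: acknowledge that the handshake is complete.
initialized_notification = {
    "jsonrpc": "2.0",
    "method": "notifications/initialized",
}

# No "id" key — that is what makes it a fire-and-forget notification.
print("id" in initialized_notification)  # False
```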

Before we wrap, one more key detail...👇
Unlike traditional APIs, the MCP client-server communication is two-way.

Sampling, if needed, allows servers to leverage the client's AI capabilities (LLM completions or generations) without requiring their own API keys,

while the client maintains control over model access and permissions.
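A sampling exchange reverses the usual direction: the server issues a `sampling/createMessage` request, and the client (which holds the model access) decides whether and how to fulfill it. The method name and `params` fields follow the MCP spec; the policy check below is a hypothetical illustration of how a client keeps control.

```python
# Server → client: ask the client's LLM for a completion.
sampling_request = {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {"role": "user",
             "content": {"type": "text", "text": "Summarize this forecast."}},
        ],
        "maxTokens": 100,
    },
}

def client_policy_allows(req: dict) -> bool:
    """Hypothetical client-side gate: inspect the request before calling a model."""
    return (req["method"] == "sampling/createMessage"
            and req["params"]["maxTokens"] <= 500)

print(client_policy_allows(sampling_request))  # True
```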
I hope this clarifies what MCP does.

In the future, I'll explore creating custom MCP servers and building hands-on demos around them.

Over to you! What is your take on MCP and its future?
If you found it insightful, reshare with your network.

Find me → @akshay_pachaar ✔️ for more insights and tutorials on LLMs, AI Agents, and Machine Learning!
