Avi Chawla
Aug 27 · 11 tweets · 4 min read
There's a new way to build production-grade MCP servers.

- It takes less than a minute.
- You don't have to write any code.
- You can pick from 100k+ tools.

Here's a step-by-step breakdown (100% local):
To build MCP servers from scratch with custom tools, one has to:

- read the API docs
- implement MCP tools
- test them, and much more

Today, let's learn how to simplify this and build production-grade MCP servers using Postman's MCP Generator (free to use).

Let's dive in!
For context...

Postman's MCP Generator lets us build an MCP server with tools from its public API Network (with 100k+ APIs).

Steps:

- Select all the APIs for your MCP server.
- Export the code for the MCP server.
- Integrate it with any MCP client.

Check this 👇
To begin, select the tools that you want to add to your MCP server. For simplicity, we select Hacker News and pick all the tools.

Once done, we click on Generate.

This gives us a download link with the code for the MCP server.

Check this 👇
After unzipping the file, we can see the entire repository, including:

- A README with instructions
- A .env file to specify API keys (if any)
- The server implementation, and more.

Check this 👇
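For instance, the .env file holds any credentials the selected APIs need (Hacker News needs none, so for this example it can stay empty — the key name below is purely a placeholder):

```
# .env — only needed if the selected APIs require credentials
SOME_API_KEY=your_key_here
```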
As instructed in the README file, we run the `npm install` command to install the server's dependencies.

Next, to integrate the MCP server with Claude Desktop, go to Settings → Developer → Edit Config and add the config.

Note: You can run the `which node` command to print the path to node.
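For reference, a Claude Desktop MCP config entry typically looks like the following (the server name, node path, and entry-point filename below are placeholders — use the path printed by `which node` and the actual entry file from your downloaded repo):

```json
{
  "mcpServers": {
    "hackernews": {
      "command": "/usr/local/bin/node",
      "args": ["/absolute/path/to/unzipped-server/mcpServer.js"]
    }
  }
}
```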

Check this 👇
Once the server is configured, Claude Desktop will show the tools we integrated while building the MCP server in Postman's MCP Generator.

For Hacker News, we have:
- get_story
- fetch_top_stories
- fetch_best_stories
- fetch_new_stories

Check this 👇
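Under the hood, these four tools wrap the public Hacker News Firebase API. Here's a minimal sketch of the tool-name-to-endpoint mapping (this is an illustration of the idea, not Postman's generated code — `toolUrl` and `TOOL_ENDPOINTS` are hypothetical names):

```javascript
// Hypothetical sketch: map the four Hacker News MCP tool names
// onto the public HN Firebase API endpoints they would call.
const HN_BASE = "https://hacker-news.firebaseio.com/v0";

const TOOL_ENDPOINTS = {
  fetch_top_stories: `${HN_BASE}/topstories.json`,
  fetch_best_stories: `${HN_BASE}/beststories.json`,
  fetch_new_stories: `${HN_BASE}/newstories.json`,
};

// get_story needs an item id, so its URL is built per call.
function toolUrl(tool, args = {}) {
  if (tool === "get_story") return `${HN_BASE}/item/${args.id}.json`;
  if (!(tool in TOOL_ENDPOINTS)) throw new Error(`Unknown tool: ${tool}`);
  return TOOL_ENDPOINTS[tool];
}
```

When a tool is invoked from the client, the server fetches the corresponding URL and returns the JSON response.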
Finally, we interact with the MCP server we just built.

Check this demo 👇
You can find the MCP Generator tool here: bit.ly/4oV7uUw
To recap, here are the steps:

- Open Postman's MCP generator.
- Select the APIs from Postman's API Network.
- All these APIs will be available as tools in your MCP server.
- Download the code provided by Postman.
- Specify API keys if needed in the .env file.
- Prepare your MCP config file JSON and add it to Claude/Cursor.

Done!

Thanks to @getpostman for working with me on this thread!
That's a wrap!

If you found it insightful, reshare it with your network.

Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAG.
