There's a new way to build production-grade MCP servers.
- It takes less than a minute.
- You don't have to write any code.
- You can integrate tools from 100k+ APIs.
Here's a step-by-step breakdown (100% local):
To build an MCP server from scratch with custom tools, you have to:
- read the API docs
- implement MCP tools
- test them, and much more
Today, let's learn how to simplify this and build production-grade MCP servers using Postman's MCP Generator (free to use).
Let's dive in!
For context...
Postman's MCP Generator lets us build an MCP server with tools from its Public API Network of 100k+ APIs.
Steps:
- Select all the APIs for your MCP server.
- Export the code for the MCP server.
- Integrate it with any MCP client.
Check this 👇
To begin, select the tools that you want to add to your MCP server. For simplicity, we select Hacker News and pick all the tools.
Once done, we click on Generate.
This gives us a download link with the code for the MCP server.
Check this 👇
After unzipping the file, we can see the entire repository, including:
- A README with instructions
- A .env file to specify API keys (if any)
- The server implementation, and more.
Check this 👇
As instructed in the README file, we run the `npm install` command to install the dependencies.
Next, to integrate the MCP server with Claude Desktop, go to Settings → Developer → Edit Config and add the config.
Note: You can run the `which node` command to print the path to node.
Check this 👇
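For reference, a Claude Desktop config entry typically looks like this (a sketch only — the server name, Node path, and script path below are placeholders; use the exact values from your generated README and your `which node` output):

```json
{
  "mcpServers": {
    "postman-generated-server": {
      "command": "/usr/local/bin/node",
      "args": ["/path/to/your/downloaded/server/mcpServer.js"]
    }
  }
}
```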
Once the server is configured, Claude Desktop will show the tools we integrated while building the MCP server in Postman's MCP Generator.
For Hacker News, we have:
- get_story
- fetch_top_stories
- fetch_best_stories
- fetch_new_stories
Check this 👇
Finally, we interact with the MCP server we just built.
Check this demo 👇
You can find the MCP Generator tool here: bit.ly/4oV7uUw
To recap, here are the steps:
- Open Postman's MCP generator.
- Select the APIs from Postman's API Network.
- All these APIs will be available as tools in your MCP server.
- Download the code provided by Postman.
- Specify API keys if needed in the .env file.
- Prepare your MCP config file JSON and add it to Claude/Cursor.
Done!
Thanks to @getpostman for working with me on this thread!
That's a wrap!
If you found it insightful, reshare it with your network.
Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAG.
- Google Maps uses graph ML to predict ETA
- Netflix uses graph ML in recommendation
- Spotify uses graph ML in recommendation
- Pinterest uses graph ML in recommendation
Here are 6 must-know ways for graph feature engineering (with code):
Just like image, text, and tabular datasets, graph datasets have features too.
This means when building models on graph datasets, we can engineer these features to achieve better performance.
Let's discuss some feature engineering techniques below!
First, let’s create a dummy social networking graph dataset with accounts and followers (which will also be accounts).
We create the two DataFrames shown below, an accounts DataFrame and a followers DataFrame.
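A minimal sketch of such a dataset (the column names and values here are illustrative assumptions, not the exact ones in the original example):

```python
import pandas as pd

# Accounts DataFrame: one row per account (hypothetical columns)
accounts = pd.DataFrame({
    "account_id": [1, 2, 3, 4],
    "name": ["alice", "bob", "carol", "dan"],
})

# Followers DataFrame: one row per follow edge;
# both columns reference account_id (followers are accounts too)
followers = pd.DataFrame({
    "account_id": [1, 1, 2, 3],   # the account being followed
    "follower_id": [2, 3, 3, 4],  # the account doing the following
})

# A simple engineered graph feature: follower count per account (node degree)
follower_count = followers.groupby("account_id").size().rename("n_followers")
print(follower_count)
```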
"Our GPT model generates 100 tokens in 42 seconds.
How do you make it 5x faster?"
You: "I'll allocate more GPUs for faster generation."
Interview over.
Here's what you missed:
The real bottleneck isn't compute; it's redundant computation.
Without KV caching, your model recalculates keys and values for each token, repeating work.
- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)
Let's dive in to understand how it works!
To understand KV caching, we must know how LLMs output tokens.
- Transformer produces hidden states for all tokens.
- Hidden states are projected to the vocab space.
- Logits of the last token are used to generate the next token.
- Repeat for subsequent tokens.
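Here's a toy NumPy sketch of the caching idea (a single attention head with made-up weights, not any specific model's code): instead of recomputing K and V for the whole sequence at every step, we cache them and only project the newest token.

```python
import numpy as np

d = 8  # toy head dimension
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    # Scaled dot-product attention for a single query vector
    scores = (K @ q) / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

K_cache, V_cache = [], []

def step(x):
    # Per new token: project K/V once, append to the cache, attend over it.
    # Without the cache, we'd recompute K/V for ALL previous tokens here.
    K_cache.append(Wk @ x)
    V_cache.append(Wv @ x)
    return attend(Wq @ x, np.array(K_cache), np.array(V_cache))

for t in range(5):  # 5 decoding steps
    out = step(rng.standard_normal(d))

print(len(K_cache))  # K/V computed once per token, not once per step
```

This is why cached generation scales so much better: each step does O(1) new projections instead of reprojecting the entire prefix.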
You're in a Research Scientist interview at OpenAI.
The interviewer asks:
"How would you expand the context length of an LLM from 2K to 128K tokens?"
You: "I will fine-tune the model on longer docs with 128K context."
Interview over.
Here's what you missed:
Extending the context window isn't just about larger matrices.
In a traditional transformer, expanding tokens by 8x increases memory needs by 64x due to the quadratic complexity of attention. Refer to the image below!
So, how do we manage it?
continue...👇
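Before the techniques, a quick back-of-the-envelope check of the scaling claim above (assuming attention-score memory grows with the square of sequence length n):

```python
# Attention scores form an n x n matrix, so memory scales with n^2
old_ctx, new_ctx = 2_048, 16_384        # 8x more tokens
growth = (new_ctx / old_ctx) ** 2       # memory growth factor
print(growth)  # 64.0
```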
1) Sparse Attention
It limits the attention computation to a subset of tokens by:
- Using local attention (tokens attend only to their neighbors).
- Letting the model learn which tokens to focus on.
But this has a trade-off between computational complexity and performance.
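A tiny sketch of the local-attention idea, assuming a simple causal sliding window (the window size and mask convention here are illustrative):

```python
import numpy as np

def local_attention_mask(n, window):
    # True where query i may attend to key j: only tokens within
    # `window` positions back, and never future tokens (causal)
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (i - j < window)

mask = local_attention_mask(6, window=3)
print(mask.sum(), "of", mask.size, "score entries computed")
```

With a fixed window, the number of computed scores grows linearly in n instead of quadratically, which is exactly the trade-off mentioned above: cheaper attention, but distant tokens can't interact directly.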