Avi Chawla
Aug 27 · 11 tweets · 4 min read
There's a new way to build production-grade MCP servers.

- It takes less than a minute.
- You don't have to write any code.
- You can pick from 100k+ tools.

Here's a step-by-step breakdown (100% local):
To build MCP servers from scratch with custom tools, one has to:

- read the API docs
- implement MCP tools
- test them, and much more

Today, let's learn how to simplify this and build production-grade MCP servers using Postman's MCP Generator (free to use).

Let's dive in!
For context...

Postman's MCP Generator lets us build an MCP server with tools from its public API Network (with 100k+ APIs).

Steps:

- Select all the APIs for your MCP server.
- Export the code for the MCP server.
- Integrate it with any MCP client.

Check this 👇
To begin, select the tools that you want to add to your MCP server. For simplicity, we select Hacker News and pick all the tools.

Once done, we click on Generate.

This gives us a download link with the code for the MCP server.

Check this 👇
After unzipping the file, we can see the entire repository, including:

- A README with instructions
- A .env file to specify API keys (if any)
- The server implementation, and more.

Check this 👇
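For instance, the .env file holds any credentials the selected APIs need (Hacker News needs none, so for this example it can stay empty — the key name below is purely a placeholder):

```
# .env — only needed if the selected APIs require credentials
SOME_API_KEY=your_key_here
```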
As instructed in the README file, we run the `npm install` command to install the server's dependencies.

Next, to integrate the MCP server with Claude Desktop, go to Settings → Developer → Edit Config and add the config.

Note: You can run the `which node` command to print the path to node.
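For reference, a Claude Desktop MCP config entry typically looks like the following (the server name, node path, and entry-point filename below are placeholders — use the path printed by `which node` and the actual entry file from your downloaded repo):

```json
{
  "mcpServers": {
    "hackernews": {
      "command": "/usr/local/bin/node",
      "args": ["/absolute/path/to/unzipped-server/mcpServer.js"]
    }
  }
}
```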

Check this 👇
Once the server is configured, Claude Desktop will show the tools we integrated while building the MCP server in Postman's MCP Generator.

For Hacker News, we have:
- get_story
- fetch_top_stories
- fetch_best_stories
- fetch_new_stories

Check this 👇
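Under the hood, these four tools wrap the public Hacker News Firebase API. Here's a minimal sketch of the tool-name-to-endpoint mapping (this is an illustration of the idea, not Postman's generated code — `toolUrl` and `TOOL_ENDPOINTS` are hypothetical names):

```javascript
// Hypothetical sketch: map the four Hacker News MCP tool names
// onto the public HN Firebase API endpoints they would call.
const HN_BASE = "https://hacker-news.firebaseio.com/v0";

const TOOL_ENDPOINTS = {
  fetch_top_stories: `${HN_BASE}/topstories.json`,
  fetch_best_stories: `${HN_BASE}/beststories.json`,
  fetch_new_stories: `${HN_BASE}/newstories.json`,
};

// get_story needs an item id, so its URL is built per call.
function toolUrl(tool, args = {}) {
  if (tool === "get_story") return `${HN_BASE}/item/${args.id}.json`;
  if (!(tool in TOOL_ENDPOINTS)) throw new Error(`Unknown tool: ${tool}`);
  return TOOL_ENDPOINTS[tool];
}
```

When a tool is invoked from the client, the server fetches the corresponding URL and returns the JSON response.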
Finally, we interact with the MCP server we just built.

Check this demo 👇
You can find the MCP Generator tool here: bit.ly/4oV7uUw
To recap, here are the steps:

- Open Postman's MCP generator.
- Select the APIs from Postman's API Network.
- All these APIs will be available as tools in your MCP server.
- Download the code provided by Postman.
- Specify API keys if needed in the .env file.
- Prepare your MCP config file JSON and add it to Claude/Cursor.

Done!

Thanks to @getpostman for working with me on this thread!
That's a wrap!

If you found it insightful, reshare it with your network.

Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAG.
