Akshay 🚀
Aug 29 · 11 tweets · 4 min read
I have been training neural networks for 10 years now.

Here are 16 techniques I actively use to optimize model training:
Before we dive in, the following visual covers what we are discussing today.

Let's understand them in detail below!
These are some basic techniques:

1) Use efficient optimizers—AdamW, Adam, etc.

2) Utilize hardware accelerators (GPUs/TPUs).

3) Max out the batch size.

4) Use multi-GPU training through Model/Data/Pipeline/Tensor parallelism. Check the visual 👇
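Data parallelism in particular rests on a simple identity: averaging per-device gradients over equal shards reproduces the full-batch gradient. A minimal NumPy sketch (the linear model and MSE loss are just illustrative stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))     # one global batch of 64 examples
y = rng.normal(size=64)
w = rng.normal(size=8)

def grad(Xb, yb, w):
    # Gradient of the MSE loss 0.5 * mean((Xb @ w - yb)**2)
    return Xb.T @ (Xb @ w - yb) / len(yb)

# "4 GPUs": each device computes the gradient on its own shard of the
# batch, then an all-reduce averages the per-device gradients.
shards = np.split(np.arange(64), 4)
g_avg = np.mean([grad(X[s], y[s], w) for s in shards], axis=0)

# The averaged gradient matches the single-device full-batch gradient.
assert np.allclose(g_avg, grad(X, y, w))
```

This equivalence is why data-parallel training scales the effective batch size without changing the optimization math.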
5) Bayesian optimization for hyperparameter optimization:

This technique takes informed steps based on the results of the previous hyperparameter configs.

This way, the model converges to an optimal set of hyperparameters much faster.

Check these results 👇
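To make "informed steps" concrete, here is a toy 1D Bayesian optimization loop: a Gaussian-process surrogate fits the configs tried so far, and an upper-confidence-bound rule picks the next config to evaluate. The objective, kernel length scale, and UCB weight are all hypothetical choices for illustration:

```python
import numpy as np

def objective(x):
    # Stand-in for "validation score as a function of a hyperparameter"
    return -((x - 0.3) ** 2)  # best score at x = 0.3

def rbf(a, b, length=0.2):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-6):
    # Gaussian-process surrogate: posterior mean and std over the grid
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_grid, x_obs)
    mu = Ks @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks.T * np.linalg.solve(K, Ks.T), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def bayes_opt(n_iters=12, seed=0):
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.0, 1.0, 201)
    x_obs = rng.uniform(0.0, 1.0, size=2)   # two random initial configs
    y_obs = objective(x_obs)
    for _ in range(n_iters):
        mu, sigma = gp_posterior(x_obs, y_obs, x_grid)
        x_next = x_grid[np.argmax(mu + 2.0 * sigma)]  # UCB acquisition
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    return x_obs[np.argmax(y_obs)]

best = bayes_opt()
print(best)  # lands near the true optimum at 0.3
```

In practice you would reach for a library such as Optuna or scikit-optimize rather than hand-rolling the surrogate, but the loop is the same: fit, acquire, evaluate, repeat.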
6) Initialize parameters with He or Xavier initialization.

7) For large models, use DeepSpeed, FSDP, YaFSDP, etc.

8) Mixed precision training: use lower-precision float16 alongside float32 where safe. This speeds up computation and cuts memory use. Check the visual 👇
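For point 6, the two schemes differ only in how the weight variance is scaled by the layer's fan-in/fan-out. A NumPy sketch (the layer sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
fan_in, fan_out = 512, 256

# He init: std = sqrt(2 / fan_in), suited to ReLU activations.
w_he = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# Xavier (Glorot) init: std = sqrt(2 / (fan_in + fan_out)),
# suited to tanh/sigmoid activations.
w_xavier = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)),
                      size=(fan_in, fan_out))

# With He scaling, the mean squared activation is preserved through a
# ReLU layer, so signals neither explode nor vanish with depth.
x = rng.normal(size=(1000, fan_in))
out = np.maximum(x @ w_he, 0.0)
print((x ** 2).mean(), (out ** 2).mean())  # both close to 1.0
```

PyTorch exposes the same recipes as `torch.nn.init.kaiming_normal_` and `torch.nn.init.xavier_normal_`.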
9) Use DistributedDataParallel, not DataParallel.

10) Use torch.rand(2, 2, device='cuda') to create a tensor directly on the GPU. Creating the tensor first and then calling .cuda() builds it on the CPU and transfers it afterwards, which is slower.

11) Use activation checkpointing under memory constraints 👇
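The idea behind activation checkpointing can be shown with a hand-rolled toy: a chain of scalar layers whose backward pass needs each layer's input. Storing only every k-th activation and recomputing the rest trades extra forward compute for memory. This is a pedagogical sketch, not PyTorch's torch.utils.checkpoint:

```python
# Toy chain "network": a_i = w_i * a_{i-1}, loss L = 0.5 * a_n**2.
def backward_full(ws, x):
    # Standard backprop: keep every activation from the forward pass.
    acts = [x]
    for w in ws:
        acts.append(w * acts[-1])
    grads, g = [0.0] * len(ws), acts[-1]   # g = dL/da_n
    for i in reversed(range(len(ws))):
        grads[i] = g * acts[i]             # dL/dw_i = dL/da_i * a_{i-1}
        g = g * ws[i]                      # dL/da_{i-1}
    return grads

def backward_checkpointed(ws, x, every=3):
    n = len(ws)
    # Forward pass: store only every `every`-th activation (plus the last).
    ckpts, a = {0: x}, x
    for i, w in enumerate(ws, start=1):
        a = w * a
        if i % every == 0 or i == n:
            ckpts[i] = a
    grads, g = [0.0] * n, ckpts[n]
    # Backward pass: recompute each segment's activations from its checkpoint.
    end = n
    for start in sorted((k for k in ckpts if k < n), reverse=True):
        seg = [ckpts[start]]
        for i in range(start, end):
            seg.append(ws[i] * seg[-1])
        for i in reversed(range(start, end)):
            grads[i] = g * seg[i - start]
            g = g * ws[i]
        end = start
    return grads

ws = [0.9, 1.1, 0.8, 1.2, 1.05, 0.95, 1.02]
full, ck = backward_full(ws, 1.0), backward_checkpointed(ws, 1.0)
assert all(abs(a - b) < 1e-12 for a, b in zip(full, ck))
```

Here the checkpointed version holds 4 activations instead of 8, at the cost of one extra forward sweep; real frameworks apply exactly this trade at the layer or block level.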
12) Use gradient accumulation.

13) Normalize data after transferring it to the GPU (for integer data, like pixels):

- Normalizing before the transfer sends 32-bit floats to the GPU.
- Normalizing after sends 8-bit integers, 4x less data.
- The latter is better.

Check this 👇
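The 4x saving is easy to verify; NumPy arrays stand in for the host-side batch here (the batch shape is arbitrary):

```python
import numpy as np

# A batch of 32 RGB images (224x224) as raw uint8 pixels.
batch_u8 = np.zeros((32, 3, 224, 224), dtype=np.uint8)

# Normalizing BEFORE the transfer means shipping float32 over the bus.
batch_f32 = batch_u8.astype(np.float32) / 255.0

print(batch_u8.nbytes // 2**20, "MiB as uint8")     # 4 MiB
print(batch_f32.nbytes // 2**20, "MiB as float32")  # 18 MiB, 4x more
```

In PyTorch terms: move the uint8 tensor with `.to(device, non_blocking=True)` first, and only then convert and divide by 255 on the GPU.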
14) Use momentum

In gradient descent, every parameter update solely depends on the current gradient. This leads to unwanted oscillations during optimization.

Momentum reduces this by adding a weighted average of previous gradient updates to the update rule.

Check this 👇
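A toy comparison makes the effect visible: plain gradient descent vs. the same loop with momentum on an ill-conditioned quadratic (the learning rate and beta are arbitrary illustrative values):

```python
# f(x) = 0.5 * (x1**2 + 10 * x2**2): steep in x2, shallow in x1.
def grad(x):
    return [x[0], 10.0 * x[1]]

def run(beta, lr=0.01, steps=200):
    x, v = [1.0, 1.0], [0.0, 0.0]
    for _ in range(steps):
        g = grad(x)
        # v is a decaying average of past gradients; beta=0 is plain GD.
        v = [beta * v[i] + g[i] for i in range(2)]
        x = [x[i] - lr * v[i] for i in range(2)]
    return x

dist = lambda x: (x[0] ** 2 + x[1] ** 2) ** 0.5
x_gd, x_mom = run(beta=0.0), run(beta=0.9)
print(dist(x_gd), dist(x_mom))  # momentum ends far closer to the optimum
```

The accumulated velocity keeps pushing along the shallow direction while the oscillating steep-direction gradients partially cancel, which is exactly the smoothing the update rule above describes.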
15-16) Set num_workers and pin_memory in the DataLoader.

PyTorch's DataLoader defaults (num_workers=0, pin_memory=False) are suboptimal. Update them for your setup.

Speedup is shown in the image below 👇
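What background workers buy you can be sketched with plain Python threads: prefetching the next sample while the current step "computes" hides the loading latency, which is what num_workers > 0 does (the sleep durations are made-up stand-ins for I/O and GPU time):

```python
import queue
import threading
import time

def load_sample(i):
    time.sleep(0.02)   # pretend disk read + decode
    return i

def train_step(x):
    time.sleep(0.02)   # pretend forward/backward pass
    return x

N = 10

# num_workers=0 pattern: the training loop blocks on every load.
t0 = time.perf_counter()
for i in range(N):
    train_step(load_sample(i))
serial = time.perf_counter() - t0

# Worker pattern: a background thread prefetches into a bounded queue.
q = queue.Queue(maxsize=4)
threading.Thread(
    target=lambda: [q.put(load_sample(i)) for i in range(N)],
    daemon=True,
).start()
t0 = time.perf_counter()
for _ in range(N):
    train_step(q.get())
overlapped = time.perf_counter() - t0

print(f"serial {serial:.2f}s, overlapped {overlapped:.2f}s")
```

DataLoader workers are separate processes rather than threads, but the pipelining principle is the same; pin_memory additionally puts batches in page-locked host memory so host-to-GPU copies are faster and can run asynchronously.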
Those were 16 techniques that I actively use to optimize neural network training.

If I missed something, please drop that in the replies.

Here's the visual again for your reference 👇
That's a wrap!

If you found it insightful, reshare with your network.

Find me → @akshay_pachaar for more insights and tutorials on LLMs, AI Agents, and Machine Learning!

