Akshay 🚀
Aug 29 · 11 tweets · 4 min read
I have been training neural networks for 10 years now.

Here are 16 techniques I actively use to optimize model training:
Before we dive in, the following visual covers what we are discussing today.

Let's understand them in detail below!
These are some basic techniques:

1) Use efficient optimizers—AdamW, Adam, etc.

2) Utilize hardware accelerators (GPUs/TPUs).

3) Max out the batch size.

4) Use multi-GPU training through Model/Data/Pipeline/Tensor parallelism. Check the visual 👇
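Data parallelism in particular rests on a simple identity: averaging per-device gradients over equal shards reproduces the full-batch gradient. A minimal NumPy sketch (the linear model and MSE loss are just illustrative stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))     # one global batch of 64 examples
y = rng.normal(size=64)
w = rng.normal(size=8)

def grad(Xb, yb, w):
    # Gradient of the MSE loss 0.5 * mean((Xb @ w - yb)**2)
    return Xb.T @ (Xb @ w - yb) / len(yb)

# "4 GPUs": each device computes the gradient on its own shard of the
# batch, then an all-reduce averages the per-device gradients.
shards = np.split(np.arange(64), 4)
g_avg = np.mean([grad(X[s], y[s], w) for s in shards], axis=0)

# The averaged gradient matches the single-device full-batch gradient.
assert np.allclose(g_avg, grad(X, y, w))
```

This equivalence is why data-parallel training scales the effective batch size without changing the optimization math.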
5) Bayesian optimization for hyperparameter optimization:

This technique takes informed steps based on the results of the previous hyperparameter configs.

This way, the model converges to an optimal set of hyperparameters much faster.

Check these results 👇
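To make "informed steps" concrete, here is a toy 1D Bayesian optimization loop: a Gaussian-process surrogate fits the configs tried so far, and an upper-confidence-bound rule picks the next config to evaluate. The objective, kernel length scale, and UCB weight are all hypothetical choices for illustration:

```python
import numpy as np

def objective(x):
    # Stand-in for "validation score as a function of a hyperparameter"
    return -((x - 0.3) ** 2)  # best score at x = 0.3

def rbf(a, b, length=0.2):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-6):
    # Gaussian-process surrogate: posterior mean and std over the grid
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_grid, x_obs)
    mu = Ks @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks.T * np.linalg.solve(K, Ks.T), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def bayes_opt(n_iters=12, seed=0):
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.0, 1.0, 201)
    x_obs = rng.uniform(0.0, 1.0, size=2)   # two random initial configs
    y_obs = objective(x_obs)
    for _ in range(n_iters):
        mu, sigma = gp_posterior(x_obs, y_obs, x_grid)
        x_next = x_grid[np.argmax(mu + 2.0 * sigma)]  # UCB acquisition
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    return x_obs[np.argmax(y_obs)]

best = bayes_opt()
print(best)  # lands near the true optimum at 0.3
```

In practice you would reach for a library such as Optuna or scikit-optimize rather than hand-rolling the surrogate, but the loop is the same: fit, acquire, evaluate, repeat.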
6) Initialize parameters with He or Xavier initialization.

7) For large models, use DeepSpeed, FSDP, YaFSDP, etc.

8) Mixed precision training: use lower-precision float16 alongside float32 where safe. This speeds up computation and cuts memory use. Check the visual 👇
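For point 6, the two schemes differ only in how the weight variance is scaled by the layer's fan-in/fan-out. A NumPy sketch (the layer sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
fan_in, fan_out = 512, 256

# He init: std = sqrt(2 / fan_in), suited to ReLU activations.
w_he = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# Xavier (Glorot) init: std = sqrt(2 / (fan_in + fan_out)),
# suited to tanh/sigmoid activations.
w_xavier = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)),
                      size=(fan_in, fan_out))

# With He scaling, the mean squared activation is preserved through a
# ReLU layer, so signals neither explode nor vanish with depth.
x = rng.normal(size=(1000, fan_in))
out = np.maximum(x @ w_he, 0.0)
print((x ** 2).mean(), (out ** 2).mean())  # both close to 1.0
```

PyTorch exposes the same recipes as `torch.nn.init.kaiming_normal_` and `torch.nn.init.xavier_normal_`.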
9) Use DistributedDataParallel, not DataParallel.

10) Use torch.rand(2, 2, device='cuda') to create a tensor directly on the GPU. Creating the tensor first and then calling .cuda() builds it on the CPU and transfers it afterwards, which is slower.

11) Use activation checkpointing under memory constraints 👇
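The idea behind activation checkpointing can be shown with a hand-rolled toy: a chain of scalar layers whose backward pass needs each layer's input. Storing only every k-th activation and recomputing the rest trades extra forward compute for memory. This is a pedagogical sketch, not PyTorch's torch.utils.checkpoint:

```python
# Toy chain "network": a_i = w_i * a_{i-1}, loss L = 0.5 * a_n**2.
def backward_full(ws, x):
    # Standard backprop: keep every activation from the forward pass.
    acts = [x]
    for w in ws:
        acts.append(w * acts[-1])
    grads, g = [0.0] * len(ws), acts[-1]   # g = dL/da_n
    for i in reversed(range(len(ws))):
        grads[i] = g * acts[i]             # dL/dw_i = dL/da_i * a_{i-1}
        g = g * ws[i]                      # dL/da_{i-1}
    return grads

def backward_checkpointed(ws, x, every=3):
    n = len(ws)
    # Forward pass: store only every `every`-th activation (plus the last).
    ckpts, a = {0: x}, x
    for i, w in enumerate(ws, start=1):
        a = w * a
        if i % every == 0 or i == n:
            ckpts[i] = a
    grads, g = [0.0] * n, ckpts[n]
    # Backward pass: recompute each segment's activations from its checkpoint.
    end = n
    for start in sorted((k for k in ckpts if k < n), reverse=True):
        seg = [ckpts[start]]
        for i in range(start, end):
            seg.append(ws[i] * seg[-1])
        for i in reversed(range(start, end)):
            grads[i] = g * seg[i - start]
            g = g * ws[i]
        end = start
    return grads

ws = [0.9, 1.1, 0.8, 1.2, 1.05, 0.95, 1.02]
full, ck = backward_full(ws, 1.0), backward_checkpointed(ws, 1.0)
assert all(abs(a - b) < 1e-12 for a, b in zip(full, ck))
```

Here the checkpointed version holds 4 activations instead of 8, at the cost of one extra forward sweep; real frameworks apply exactly this trade at the layer or block level.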
12) Use gradient accumulation.

13) Normalize data after transferring it to the GPU (for integer data, like pixels):

- Normalizing before the transfer sends 32-bit floats to the GPU.
- Normalizing after sends 8-bit integers, 4x less data.
- The latter is better.

Check this 👇
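The 4x saving is easy to verify; NumPy arrays stand in for the host-side batch here (the batch shape is arbitrary):

```python
import numpy as np

# A batch of 32 RGB images (224x224) as raw uint8 pixels.
batch_u8 = np.zeros((32, 3, 224, 224), dtype=np.uint8)

# Normalizing BEFORE the transfer means shipping float32 over the bus.
batch_f32 = batch_u8.astype(np.float32) / 255.0

print(batch_u8.nbytes // 2**20, "MiB as uint8")     # 4 MiB
print(batch_f32.nbytes // 2**20, "MiB as float32")  # 18 MiB, 4x more
```

In PyTorch terms: move the uint8 tensor with `.to(device, non_blocking=True)` first, and only then convert and divide by 255 on the GPU.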
14) Use momentum

In gradient descent, every parameter update solely depends on the current gradient. This leads to unwanted oscillations during optimization.

Momentum reduces this by adding a weighted average of previous gradient updates to the update rule.

Check this 👇
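A toy comparison makes the effect visible: plain gradient descent vs. the same loop with momentum on an ill-conditioned quadratic (the learning rate and beta are arbitrary illustrative values):

```python
# f(x) = 0.5 * (x1**2 + 10 * x2**2): steep in x2, shallow in x1.
def grad(x):
    return [x[0], 10.0 * x[1]]

def run(beta, lr=0.01, steps=200):
    x, v = [1.0, 1.0], [0.0, 0.0]
    for _ in range(steps):
        g = grad(x)
        # v is a decaying average of past gradients; beta=0 is plain GD.
        v = [beta * v[i] + g[i] for i in range(2)]
        x = [x[i] - lr * v[i] for i in range(2)]
    return x

dist = lambda x: (x[0] ** 2 + x[1] ** 2) ** 0.5
x_gd, x_mom = run(beta=0.0), run(beta=0.9)
print(dist(x_gd), dist(x_mom))  # momentum ends far closer to the optimum
```

The accumulated velocity keeps pushing along the shallow direction while the oscillating steep-direction gradients partially cancel, which is exactly the smoothing the update rule above describes.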
15-16) Set num_workers and pin_memory in the DataLoader.

PyTorch's DataLoader defaults (num_workers=0, pin_memory=False) are suboptimal. Update them for your setup.

Speedup is shown in the image below 👇
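What background workers buy you can be sketched with plain Python threads: prefetching the next sample while the current step "computes" hides the loading latency, which is what num_workers > 0 does (the sleep durations are made-up stand-ins for I/O and GPU time):

```python
import queue
import threading
import time

def load_sample(i):
    time.sleep(0.02)   # pretend disk read + decode
    return i

def train_step(x):
    time.sleep(0.02)   # pretend forward/backward pass
    return x

N = 10

# num_workers=0 pattern: the training loop blocks on every load.
t0 = time.perf_counter()
for i in range(N):
    train_step(load_sample(i))
serial = time.perf_counter() - t0

# Worker pattern: a background thread prefetches into a bounded queue.
q = queue.Queue(maxsize=4)
threading.Thread(
    target=lambda: [q.put(load_sample(i)) for i in range(N)],
    daemon=True,
).start()
t0 = time.perf_counter()
for _ in range(N):
    train_step(q.get())
overlapped = time.perf_counter() - t0

print(f"serial {serial:.2f}s, overlapped {overlapped:.2f}s")
```

DataLoader workers are separate processes rather than threads, but the pipelining principle is the same; pin_memory additionally puts batches in page-locked host memory so host-to-GPU copies are faster and can run asynchronously.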
Those were 16 techniques that I actively use to optimize neural network training.

If I missed something, please drop that in the replies.

Here's the visual again for your reference 👇
That's a wrap!

If you found it insightful, reshare with your network.

Find me → @akshay_pachaar for more insights and tutorials on LLMs, AI Agents, and Machine Learning!

