NOBODY wants to send their data to Google or OpenAI.
Yet here we are, shipping proprietary code, customer information, and sensitive business logic to closed-source APIs we don't control.
While everyone's chasing the latest closed-source releases, open-source models are quietly becoming the practical choice for many production systems.
Here's what everyone is missing:
Open-source models are catching up fast, and they bring something the big labs can't offer: privacy, speed, and control.
I built a playground to test this myself, using CometML's Opik to evaluate models on real code-generation tasks: scoring correctness, readability, and best practices against actual GitHub repos.
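Roughly, the harness looks like this. A minimal sketch, assuming an Opik dataset whose items carry `prompt` and `reference` fields and an OpenAI-compatible endpoint for the model under test; the dataset name, endpoint, model id, and metric here are illustrative, the full code is linked at the end of this thread.

```python
import opik
from opik.evaluation import evaluate
from opik.evaluation.metrics import LevenshteinRatio  # illustrative; swap in LLM-judge metrics for readability etc.
from openai import OpenAI

# Hypothetical OpenAI-compatible endpoint for the model under test.
client = OpenAI(base_url="https://<your-endpoint>/v1", api_key="...")

# Assumes a dataset of coding tasks already pushed to Opik,
# each item shaped like {"prompt": ..., "reference": ...}.
dataset = opik.Opik().get_or_create_dataset(name="code-gen-tasks")

def task(item: dict) -> dict:
    # Generate a solution for one coding task from the dataset.
    response = client.chat.completions.create(
        model="MiniMax-M2",  # placeholder model id
        messages=[{"role": "user", "content": item["prompt"]}],
    )
    # The "output" key is scored against the item's "reference" field.
    return {"output": response.choices[0].message.content}

evaluate(
    dataset=dataset,
    task=task,
    scoring_metrics=[LevenshteinRatio()],
    experiment_name="minimax-m2-codegen",
)
```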
Here's what surprised me:
OSS models like MiniMax-M2 and Kimi K2 performed on par with the likes of Gemini 3 and Claude Sonnet 4.5 on most tasks.
But in practice, MiniMax-M2 comes out the winner: it's twice as fast and 12x cheaper than models like Sonnet 4.5.
Well, this isn't just about saving money.
When your model is smaller and faster, you can deploy it in places closed-source APIs can't reach:
↳ Real-time applications that need sub-second responses
↳ Edge devices where latency kills user experience
↳ On-premise systems where data never leaves your infrastructure
MiniMax-M2 runs with only 10B activated parameters. That efficiency means lower latency, higher throughput, and the ability to handle interactive agents without breaking the bank.
The intelligence-to-cost ratio here changes what's possible.
You're not choosing between quality and affordability anymore. You're not sacrificing privacy for performance. The gap is closing, and in many cases, it's already closed.
If you're building anything that needs to be fast, private, or deployed at scale, it's worth taking a look at what's now available.
MiniMax-M2 is 100% open-source and free for developers right now. The link to their GitHub repo is in the next tweet.
You'll also find the code for the playground and the evaluations I ran.
Claude Skills might be the biggest upgrade to AI agents so far!
Some say it's even bigger than MCP.
I've been testing skills for the past 3-4 days, and they're solving a problem most people don't talk about: agents just keep forgetting everything.
In this video, I'll share everything I've learned so far.
It covers:
> The core idea (skills as SOPs for agents)
> Anatomy of a skill
> Skills vs. MCP vs. Projects vs. Subagents
> Building your own skill (see the sketch right after this list)
> Hands-on example
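If you've never seen one, here's a minimal sketch of what a skill looks like on disk, based on the published SKILL.md format (a folder with a SKILL.md whose frontmatter tells the agent when to load it). The "meeting-notes" skill itself is made up for illustration:

```python
from pathlib import Path

# A skill is just a folder containing a SKILL.md file.
# The YAML frontmatter (name + description) tells Claude when to load it;
# the body is the SOP it follows once loaded.
skill_dir = Path("skills/meeting-notes")
skill_dir.mkdir(parents=True, exist_ok=True)

(skill_dir / "SKILL.md").write_text("""\
---
name: meeting-notes
description: Turn raw meeting transcripts into structured notes. Use when the user shares a transcript or asks for meeting minutes.
---

# Meeting Notes

## Steps
1. Extract attendees, decisions, and action items from the transcript.
2. Format action items as a checklist with owners and due dates.
3. Keep the summary under 200 words.
""")
```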
Skills are an early sign of continual learning, and they could change how we work with agents forever!
Here's everything you need to know:
Skills vs. Projects vs. Subagents:
If you found it insightful, reshare with your network.
Find me → @akshay_pachaar ✔️ for more insights and tutorials on LLMs, AI Agents, and Machine Learning!
You're in a Research Scientist interview at OpenAI.
The interviewer asks:
"How would you expand the context length of an LLM from 2K to 128K tokens?"
You: "I will fine-tune the model on longer docs with 128K context"
Interview over.
Here's what you missed:
Extending the context window isn't just about larger matrices.
In a traditional transformer, expanding the context by 8x increases attention memory by 64x, due to the quadratic complexity of attention. Refer to the image below!
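Quick sanity check on that scaling (plain Python, lengths rounded to powers of two):

```python
# Attention stores one score per (query, key) pair, so memory grows
# with the square of the sequence length.
old_len, new_len = 2_048, 131_072       # 2K -> 128K tokens
token_growth = new_len // old_len       # 64x more tokens
print(token_growth, token_growth ** 2)  # 64 4096 -> 4096x the attention scores
# Even a modest 8x jump in context means 8^2 = 64x the attention memory.
```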
So, how do we manage it?
Continued below...👇
1) Sparse Attention
It limits the attention computation to a subset of tokens by:
- Using local attention (tokens attend only to their neighbors).
- Letting the model learn which tokens to focus on.
But there's a trade-off: you save compute, at the cost of tokens outside the window no longer interacting directly, which can hurt performance on tasks that need global context.
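To make the local-attention idea concrete, here's a minimal sketch of a causal sliding-window mask (pure PyTorch; the window size is illustrative):

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    # True where attention is allowed: each token attends to itself
    # and at most `window` tokens before it (causal local attention).
    i = torch.arange(seq_len).unsqueeze(1)  # query positions, shape (seq_len, 1)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions, shape (1, seq_len)
    return (j <= i) & (i - j <= window)

mask = sliding_window_mask(seq_len=8, window=2)
# Each row has at most window+1 True entries, so per-token cost is
# O(window) instead of O(seq_len): attention drops from O(n^2) to O(n*w).
```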