
https://x.com/aakashgupta/status/2023797194017706290

More from @aakashgupta

Aakash Gupta

@aakashgupta

Mar 20

Evals are the new PRD.

The companies building AI products that actually work are running 12.8 eval experiments per day. Here is the playbook with @ankrgyl, Founder and CEO of @braintrust ($800M valuation, behind Vercel, Replit, Ramp, Zapier, Notion, Airtable):

⏱ 1:43 Why vibe checks stop scaling
⏱ 6:35 Evals are the new PRD
⏱ 8:45 The Claude Code evals controversy
⏱ 18:48 Building an eval live from zero
⏱ 29:51 Connecting Linear MCP and iterating
⏱ 39:12 Why you need evals that fail
⏱ 43:36 Offline vs online evals
⏱ 47:40 Three mistakes killing eval culture

The core framework: every eval is exactly three things. A set of inputs your product needs to handle. A task that takes those inputs and generates outputs. A scoring function that produces a number between 0 and 1.
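To make the three parts concrete, here is a minimal sketch in plain Python. It is not Braintrust's API: the names `DATA`, `toy_task`, `exact_match`, and `run_eval` are hypothetical, and the "task" is a hard-coded lookup standing in for a real model call.

```python
from typing import Callable

# 1. Inputs: cases the product needs to handle (toy data, made up for illustration).
DATA = [
    {"input": "capital of France", "expected": "Paris"},
    {"input": "capital of Japan", "expected": "Tokyo"},
    {"input": "capital of Kenya", "expected": "Nairobi"},
    {"input": "capital of Peru", "expected": "Lima"},
]


# 2. Task: takes an input and generates an output.
# A real task would call a model; this stand-in only knows two answers.
def toy_task(prompt: str) -> str:
    known = {"capital of France": "paris", "capital of Japan": "Tokyo "}
    return known.get(prompt, "")


# 3. Scoring function: returns a number between 0 and 1.
def exact_match(output: str, expected: str) -> float:
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    data: list[dict[str, str]],
    task: Callable[[str], str],
    scorer: Callable[[str, str], float],
) -> float:
    """Run the task on every input and return the mean score."""
    scores = [scorer(task(case["input"]), case["expected"]) for case in data]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(run_eval(DATA, toy_task, exact_match))  # 0.5 on this toy data
```

Swapping in a better task, or adding harder inputs, changes the score without touching the other two parts, which is what makes the number comparable from one experiment to the next.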

We built one from scratch on camera. Score went from 0 to 0.75 in under 20 minutes.

@ankrgyl @braintrust YouTube:

@ankrgyl @braintrust Spotify:

open.spotify.com/episode/6AK7zQ…


Aakash Gupta

@aakashgupta

Mar 20

https://x.com/aakashgupta/status/2034933979720425611

Cursor is raising at a $50 billion valuation on the claim that its “in-house models generate more code than almost any other LLMs in the world.” Less than 24 hours after launching Composer 2, a developer found the model ID in the API response: kimi-k2p5-rl-0317-s515-fast.

That’s Moonshot AI’s Kimi K2.5 with reinforcement learning appended. A developer named Fynn was testing Cursor’s OpenAI-compatible base URL when the identifier leaked through the response headers. Moonshot’s head of pretraining, Yulun Du, confirmed on X that the tokenizer is identical to Kimi’s and questioned Cursor’s license compliance. Two other Moonshot employees posted confirmations. All three posts have since been deleted.

This is the second time. When Cursor launched Composer 1 in October 2025, users across multiple countries reported the model spontaneously switching its inner monologue to Chinese mid-session. Kenneth Auchenberg, a partner at Alley Corp, posted a screenshot calling it a smoking gun. KR-Asia and 36Kr confirmed both Cursor and Windsurf were running fine-tuned Chinese open-weight models underneath. Cursor never disclosed what Composer 1 was built on. They shipped Composer 1.5 in February and moved on.

The pattern: take a Chinese open-weight model, run RL on coding tasks, ship it as a proprietary breakthrough, publish a cost-performance chart comparing yourself against Opus 4.6 and GPT-5.4 without disclosing that your base model was free, then raise another round.

That chart from the Composer 2 announcement deserves its own paragraph. Cursor plotted Composer 2 against frontier models on a price-vs-quality axis to argue they’d hit a superior tradeoff. What the chart doesn’t show is that Anthropic and OpenAI trained their models from scratch. Cursor took an open-weight model that Moonshot spent hundreds of millions developing, ran RL on top, and presented the output as evidence of in-house research. That’s margin arbitrage on someone else’s R&D dressed up as a benchmark slide.

The license makes this more than an attribution oversight. Kimi K2.5 ships under a Modified MIT License with one clause designed for exactly this scenario: if your product exceeds $20 million in monthly revenue, you must prominently display “Kimi K2.5” on the user interface. Cursor’s ARR crossed $2 billion in February. That’s roughly $167 million per month, 8x the threshold. The clause covers derivative works explicitly.
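The arithmetic behind "8x the threshold" can be checked in a few lines. This sketch assumes monthly revenue is simply ARR divided by 12, and uses the $2 billion and $20 million figures quoted above; the variable names are illustrative.

```python
# Figures as quoted in the post; ARR / 12 is an assumed proxy for monthly revenue.
ARR_USD = 2_000_000_000
LICENSE_THRESHOLD_USD = 20_000_000  # monthly revenue trigger cited for the clause

monthly_revenue = ARR_USD / 12
multiple = monthly_revenue / LICENSE_THRESHOLD_USD

print(f"${monthly_revenue / 1e6:.0f}M per month, {multiple:.1f}x the threshold")
# prints "$167M per month, 8.3x the threshold"
```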

Cursor is valued at $29.3 billion and raising at $50 billion. Moonshot’s last reported valuation was $4.3 billion. At the new price, the company worth roughly 12x more took the smaller company’s model and shipped it as proprietary technology to justify a valuation built on the frontier lab narrative.

Three Composer releases in five months. Composer 1 caught speaking Chinese. Composer 2 caught with a Kimi model ID in the API. A P0 incident this year. And a benchmark chart that compares an RL fine-tune against models requiring billions in training compute without disclosing the base was free.

The question for investors in the $50 billion round: what exactly are you buying? A VS Code fork with strong distribution, or a frontier research lab? The model ID in the API answers that.

If Moonshot doesn’t enforce this license against a company generating $2 billion annually from a derivative of their model, the attribution clause becomes decoration for every future open-weight release. Every AI lab watching this is running the same math: why open-source your model if companies with better distribution can strip attribution, call it proprietary, and raise at 12x your valuation?

kimi-k2p5-rl-0317-s515-fast is the most expensive model ID leak in the history of AI licensing.


Mandatory “find me on substack”

aibyaakash.com

“Babe, the man behind the first one man one billion dollar company reposted your tweet” 😲


Aakash Gupta

@aakashgupta

Jan 28

If I were to learn PM again, I would start here.

15 steps to a PM job paying $200K+:

1. Understand the PM role

Start with how to break in: news.aakashg.com/p/how-to-break…

Watch Marty Cagan: open.spotify.com/episode/6KcmM7…

2. Learn PM fundamentals

Master PRDs: news.aakashg.com/p/product-requ…

Watch Dan Olsen: youtube.com/watch?v=sl7r3w…


Aakash Gupta

@aakashgupta

Nov 8, 2025

Senior AI PMs make $306K (US). Big tech pays $550K+

I analyzed 250 AI PM job postings.

Here's what hiring managers want:

1. AI Product Strategy & Lifecycle (94%)

Define product vision, strategy & roadmap for AI/ML features from 0→1 through scale:

• Quality Envelopes → Define accuracy thresholds
• Cascade Architecture → Right-size models
• SUQS Framework → Track success pillars
• Evals-First → Define success first

🔗 AI Product Strategy → news.aakashg.com/p/ai-product-s…
🔗 AI Prototype to Production → news.aakashg.com/p/ai-prototype…
🔗 AI Roadmaps → news.aakashg.com/p/ai-roadmap
🔗 AI Product Sense → news.aakashg.com/p/ai-product-s…
🔗 How to Build AI → news.aakashg.com/p/how-to-build…


Aakash Gupta

@aakashgupta

Oct 29, 2025

DUOLINGO COULDN'T TEACH ME HEBREW IN 2 YEARS

ChatGPT did it in just 3 months.

Here are the exact 8 prompts I used:

1. Adaptive Daily Lesson Builder

"You're my personal Hebrew tutor. Build me a 20-minute lesson for today based on my current level [beginner/intermediate/advanced]. Include: 10 new vocabulary words with context, 1 key grammar concept with 3 examples, and 5 practice sentences I should translate. End with tomorrow's preview."

2. Real Conversation Partner

"Let's have a 10-minute conversation in Hebrew about [topic: weekend plans/work/hobbies]. Start each response in Hebrew, then add English corrections below using this format: '❌ You said X → ✅ Say Y (because Z)'. Adjust your Hebrew complexity to match my responses."


Aakash Gupta

@aakashgupta

Oct 26, 2025

In product management, not everything is straightforward math or solvable by AI.

Yet, some PMs still make better decisions most of the time.

How?

That's product sense:

"The ability to find the right solution for the user and business, despite limited and ambiguous information."

I love this definition from @Sid Arora.

You start with the PM process:

1. Take a vague & ambiguous problem statement
2. Create or clarify the overall goal
3. Identify all users in the ecosystem
4. Pick 1-2 users
5. Identify major problems of the user
6. Select the problems to solve
7. Brainstorm for solutions
8. Select the highest ROI solution
9. Build and deploy the solution
10. Measure success / collect feedback

