Latest Twitter Threads by @aakashgupta on Thread Reader App

Mar 20 • 4 tweets • 2 min read

Evals are the new PRD.

The companies building AI products that actually work are running 12.8 eval experiments per day. Here is the playbook with @ankrgyl, Founder and CEO of @braintrust ($800M valuation, behind Vercel, Replit, Ramp, Zapier, Notion, Airtable):

⏱ 1:43 Why vibe checks stop scaling
⏱ 6:35 Evals are the new PRD
⏱ 8:45 The Claude Code evals controversy
⏱ 18:48 Building an eval live from zero
⏱ 29:51 Connecting Linear MCP and iterating
⏱ 39:12 Why you need evals that fail
⏱ 43:36 Offline vs online evals
⏱ 47:40 Three mistakes killing eval culture

The core framework: every eval is exactly three things. A set of inputs your product needs to handle. A task that takes those inputs and generates outputs. A scoring function that produces a number between 0 and 1.

We built one from scratch on camera. Score went from 0 to 0.75 in under 20 minutes.
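To make the three-part framework concrete, here is a minimal sketch in plain Python. It is not Braintrust's API or the eval built in the video: the names `EvalCase`, `run_eval`, `exact_match`, and `toy_task` are hypothetical, and the keyword-based task is only a stand-in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One input the product must handle, plus the output we expect."""
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Scoring function: returns a number between 0 and 1."""
    return 1.0 if output == expected else 0.0


def toy_task(text: str) -> str:
    """Hypothetical task: a keyword rule standing in for an LLM call."""
    lowered = text.lower()
    return "bug" if ("crash" in lowered or "error" in lowered) else "feature"


def run_eval(
    cases: List[EvalCase],
    task: Callable[[str], str],
    scorer: Callable[[str, str], float],
) -> float:
    """Run the task on every input and average the per-case scores."""
    scores = [scorer(task(case.input), case.expected) for case in cases]
    return sum(scores) / len(scores)


# Illustrative data only: the last case is one the toy task gets wrong.
cases = [
    EvalCase("App crashes on login", "bug"),
    EvalCase("Please add dark mode", "feature"),
    EvalCase("Error 500 when saving", "bug"),
    EvalCase("Login button does nothing", "bug"),
]

score = run_eval(cases, toy_task, exact_match)
print(f"eval score: {score:.2f}")
```

Keeping inputs, task, and scorer as separate pieces means any one of them can be swapped (a new prompt, a stricter scorer, more cases) while the other two stay fixed, which is what makes scores comparable across experiments.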

@ankrgyl @braintrust YouTube:

Mar 20 • 5 tweets • 4 min read

Cursor is raising at a $50 billion valuation on the claim that its “in-house models generate more code than almost any other LLMs in the world.” Less than 24 hours after launching Composer 2, a developer found the model ID in the API response: kimi-k2p5-rl-0317-s515-fast.

That’s Moonshot AI’s Kimi K2.5 with reinforcement learning appended. A developer named Fynn was testing Cursor’s OpenAI-compatible base URL when the identifier leaked through the response headers. Moonshot’s head of pretraining, Yulun Du, confirmed on X that the tokenizer is identical to Kimi’s and questioned Cursor’s license compliance. Two other Moonshot employees posted confirmations. All three posts have since been deleted.

This is the second time. When Cursor launched Composer 1 in October 2025, users across multiple countries reported the model spontaneously switching its inner monologue to Chinese mid-session. Kenneth Auchenberg, a partner at Alley Corp, posted a screenshot calling it a smoking gun. KR-Asia and 36Kr confirmed both Cursor and Windsurf were running fine-tuned Chinese open-weight models underneath. Cursor never disclosed what Composer 1 was built on. They shipped Composer 1.5 in February and moved on.

The pattern: take a Chinese open-weight model, run RL on coding tasks, ship it as a proprietary breakthrough, publish a cost-performance chart comparing yourself against Opus 4.6 and GPT-5.4 without disclosing that your base model was free, then raise another round.

That chart from the Composer 2 announcement deserves its own paragraph. Cursor plotted Composer 2 against frontier models on a price-vs-quality axis to argue they’d hit a superior tradeoff. What the chart doesn’t show is that Anthropic and OpenAI trained their models from scratch. Cursor took an open-weight model that Moonshot spent hundreds of millions developing, ran RL on top, and presented the output as evidence of in-house research. That’s margin arbitrage on someone else’s R&D dressed up as a benchmark slide.

The license makes this more than an attribution oversight. Kimi K2.5 ships under a Modified MIT License with one clause designed for exactly this scenario: if your product exceeds $20 million in monthly revenue, you must prominently display “Kimi K2.5” on the user interface. Cursor’s ARR crossed $2 billion in February. That’s roughly $167 million per month, 8x the threshold. The clause covers derivative works explicitly.
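The threshold math above can be checked in a few lines. This is just the arithmetic from the figures quoted in the thread ($2 billion ARR, $20 million monthly threshold); the variable names are illustrative, and dividing ARR by 12 is a simplifying assumption about how monthly revenue would be measured.

```python
# Figures as quoted in the thread; ARR / 12 is a rough monthly estimate.
arr_usd = 2_000_000_000            # Cursor ARR cited in the thread
license_threshold_usd = 20_000_000  # monthly revenue threshold cited for the clause

monthly_revenue_usd = arr_usd / 12
multiple_of_threshold = monthly_revenue_usd / license_threshold_usd

print(f"monthly revenue: ${monthly_revenue_usd / 1e6:.0f}M")
print(f"multiple of threshold: {multiple_of_threshold:.1f}x")
```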

Cursor is valued at $29.3 billion and raising at $50 billion. Moonshot’s last reported valuation was $4.3 billion. The company raising at roughly 12x the smaller company’s valuation took that company’s model and shipped it as proprietary technology to justify a valuation built on the frontier lab narrative.

Three Composer releases in five months. Composer 1 caught speaking Chinese. Composer 2 caught with a Kimi model ID in the API. A P0 incident this year. And a benchmark chart that compares an RL fine-tune against models requiring billions in training compute without disclosing the base was free.

The question for investors in the $50 billion round: what exactly are you buying? A VS Code fork with strong distribution, or a frontier research lab? The model ID in the API answers that.

If Moonshot doesn’t enforce this license against a company generating $2 billion annually from a derivative of their model, the attribution clause becomes decoration for every future open-weight release. Every AI lab watching this is running the same math: why open-source your model if companies with better distribution can strip attribution, call it proprietary, and raise at 12x your valuation?

kimi-k2p5-rl-0317-s515-fast is the most expensive model ID leak in the history of AI licensing.

https://x.com/aakashgupta/status/2034933979720425611

Mandatory “find me on substack”

aibyaakash.com

Jan 28 • 17 tweets • 7 min read

If I were to learn PM again, I would start here.

15 steps to a PM job paying $200K+:

1. Understand the PM role

Start with how to break in: news.aakashg.com/p/how-to-break…

Watch Marty Cagan: open.spotify.com/episode/6KcmM7…

Nov 8, 2025 • 9 tweets • 6 min read

Senior AI PMs make $306K (US). Big tech pays $550K+

I analyzed 250 AI PM job postings.

Here's what hiring managers want:

1. AI Product Strategy & Lifecycle (94%)

Define product vision, strategy & roadmap for AI/ML features from 0→1 through scale:

• Quality Envelopes → Define accuracy thresholds
• Cascade Architecture → Right-size models
• SUQS Framework → Track success pillars
• Evals-First → Define success first

Oct 29, 2025 • 9 tweets • 2 min read

DUOLINGO COULDN'T TEACH ME HEBREW IN 2 YEARS

ChatGPT did it in just 3 months.

Here are the exact 8 prompts I used:

1. Adaptive Daily Lesson Builder

"You're my personal Hebrew tutor. Build me a 20-minute lesson for today based on my current level [beginner/intermediate/advanced]. Include: 10 new vocabulary words with context, 1 key grammar concept with 3 examples, and 5 practice sentences I should translate. End with tomorrow's preview."

Oct 26, 2025 • 12 tweets • 4 min read

In product management, not everything is straightforward maths or solvable by AI.

Yet, some PMs still make better decisions most of the time.

How?

That's product sense:

"The ability to find the right solution for the user and business, despite limited and ambiguous information."

I love this definition from @Sid Arora.

Oct 22, 2025 • 12 tweets • 3 min read

You should vibe code a PM portfolio.

Only 17% of PMs have a portfolio.

So you should differentiate with one.

Here's how:

1. Headline

This is the most important part! You want to describe your unicorn candidate-market fit. This is your opportunity to make a bumpy career look like a straight line to a sector of the market.

Oct 21, 2025 • 12 tweets • 5 min read

ChatGPT is just the tip of the iceberg.

PMs should use 10 buckets of AI tools:

DISCOVERY

1. Prototyping
Yes, this is an amazing PM use case.

Tools like Lovable, Bolt, v0, Magic Patterns, and Base44 are great for improving your discovery work.

🔗 Guide: news.aakashg.com/p/ai-prototypi…

Oct 15, 2025 • 9 tweets • 5 min read

Product discovery is the no. 1 skill for PMs.

Here's how to use AI to 10x your discovery:

𝗦𝗧𝗘𝗣 𝟭: 𝗔𝗜 𝗠𝗮𝗿𝗸𝗲𝘁 𝗜𝗻𝘁𝗲𝗹𝗹𝗶𝗴𝗲𝗻𝗰𝗲

Market analysis used to be a lengthy, research-heavy process that often required access to paid reports.

Now, replace that with:

a. Market sizing by Perplexity
b. Validation with 3 expert calls
c. Synthesizing sources with NotebookLM

Oct 6, 2025 • 12 tweets • 2 min read

This guy literally builds n8n AI Agents with 1 prompt

I'll share the prompt in a second but here's why it works:

1. Context Dumping Done Right

Most people give AI a vague idea and wonder why they get vague results. This prompt dumps everything: use case, data sources, outputs, integrations. The AI gets your entire ecosystem, not just fragments.

Sep 29, 2025 • 10 tweets • 2 min read

BYE NETFLIX. BYE HULU.

No more $24.99 a month.
ChatGPT turned my laptop into a free streaming hub.

Here are 9 prompts to try yourself:

1/9 - Platform Discovery Engine

"Find and evaluate 15 legal free streaming platforms available in [YOUR COUNTRY]. For each platform, provide: content library size, video quality (720p/1080p/4K), ad frequency, device compatibility, signup requirements, standout features, and direct links. Rank them by overall value and highlight the top 3 for different use cases (movies, TV shows, documentaries)."

Sep 28, 2025 • 11 tweets • 3 min read

Anthropic literally dropped an entirely free roadmap to AI mastery

1. 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗔𝗜 𝗙𝘂𝗻𝗱𝗮𝗺𝗲𝗻𝘁𝗮𝗹𝘀 & 𝗣𝗿𝗮𝗰𝘁𝗶𝗰𝗮𝗹 𝗦𝗸𝗶𝗹𝗹𝘀

Speak confidently about AI in meetings and strategy discussions.

anthropic.skilljar.com/ai-fluency-fra…

Sep 20, 2025 • 9 tweets • 1 min read

MY COUSIN WAS UNEMPLOYED FOR 7 MONTHS

I rebuilt his resume and interview answers using ChatGPT.
28 days later, he got hired.

Here are 8 prompts that fixed everything:

1. Resume Optimizer

“Here’s my resume: [paste text]. Rewrite it for [job title]. Highlight measurable results, relevant skills, and tailor formatting for recruiters + ATS.”

Sep 18, 2025 • 10 tweets • 2 min read

THIS IS WHY AIRLINES HATE CHATGPT

Flight for $879. I paid $299.
No points. No memberships. No VPN.

Here are 8 prompts I used to travel like a pro:

1. Cheapest Dates Scanner

“Find the lowest-cost departure and return dates within a [X-day] window around [target date]. Highlight 2–3 best combos.”

Sep 16, 2025 • 17 tweets • 5 min read

16 awesome resources that will get you hired as a Product Manager 👇

Whether you're interviewing at Big Tech or a startup, this list covers what you actually need to stand out - from writing a resume to tackling product strategy questions and behavioral interviews.

1. PM interview preparation tips from Amazon:
🔗 amazon.jobs/content/en/how…

Sep 14, 2025 • 11 tweets • 2 min read

This guy literally shared a step-by-step roadmap to build your first AI agent - and it’s gold.

No hype. No jargon. Just a practical, battle-tested path to get from 0 → 1.

👇 Here’s the exact recipe I follow:

Sep 7, 2025 • 11 tweets • 3 min read

CPO at Microsoft says Prompt Sets are the new PRDs.

Here's what you need to know:

Microsoft has a $3.68T market cap. So when their CPO @aparnacd is talking about PRDs, a lot of folks listen. In fact, many of you asked me about it.

So, let's talk about it:

1. What Prompt Sets Are
2. How They Fit In
3. My Thoughts
4. Next Steps

Aug 25, 2025 • 14 tweets • 4 min read

Even Apple is hiring for AI PMs.

Don't let anyone tell you it's a made-up role.

Here's a free roadmap to become one:

1. Learn the foundations of AI:

news.aakashg.com/p/ai-foundatio…

Aug 18, 2025 • 11 tweets • 3 min read

AI killed the 10 page PRD. But the PRD isn't dead.

Here's how to write PRDs in the AI age:

Companies like Google are shifting to a build-first culture. And it's good. AI prototyping tools have completely changed the game, blurring role lines.

Aug 15, 2025 • 12 tweets • 3 min read

Netflix is paying $900K for AI PMs. Meta is paying $1M+.

Here's how to become one:

𝟭. 𝗦𝗵𝗮𝗽𝗲 𝗬𝗼𝘂𝗿 𝗖𝗼𝗻𝘁𝗲𝗻𝘁 𝗗𝗶𝗲𝘁

Follow the right voices:

YouTube: Jeff Su, Matt Wolfe, Dwarkesh Patel
Newsletters: Product Growth, Nate Jones, DeepLearning
Twitter: Santiago (svpino), Min (minchoi), Paul (itsPaulAI)
LinkedIn: Zain Kahn, Allie K. Miller, Ruben Hassid

Jul 29, 2025 • 10 tweets • 9 min read

If I had to learn AI PM again, I would start here:

($2,000 course totally free)

1️⃣ AI PM Foundations

@marilynika: How to become an AI PM - youtube.com/watch?v=GQd54g…

Sam Kawsarani: 15+ Years PM in Era of AI - tinyurl.com/2txs9mab

@_amankhan: Prototyping → Observability → Evals - youtube.com/watch?v=Ej4pBD…
