Figma just closed the last excuse PMs had for not shipping polished UI from AI code.
The loop is now complete. Claude Code generates UI. It goes straight into Figma as editable frames. Designers tweak it. Figma MCP sends it back to Claude Code. The entire design-to-engineering handoff cycle that used to take 2-3 weeks now runs in a single session.
This tells you something about where the real constraint in product development has been. PMs always said the bottleneck was getting designs into code. Figma just proved the actual bottleneck was the opposite direction: getting code into a form designers could touch without starting over.
The implication for AI PMs specifically is that “I need to wait for design” stops being a valid dependency. You can prototype flows in Claude Code, push to Figma, get visual feedback in the same afternoon, and iterate without scheduling a sprint.
What makes this particularly sharp: Figma didn’t build an AI code tool. They built a bridge that makes their existing canvas the canonical source of truth for anything AI generates. Every AI coding tool that produces UI now feeds Figma. That’s the real product decision here.
The design tool became the AI output layer without writing a single line of AI.
The companies building AI products that actually work are running 12.8 eval experiments per day. Here is the playbook with @ankrgyl, Founder and CEO of @braintrust ($800M valuation, behind Vercel, Replit, Ramp, Zapier, Notion, Airtable):
⏱ 1:43 Why vibe checks stop scaling
⏱ 6:35 Evals are the new PRD
⏱ 8:45 The Claude Code evals controversy
⏱ 18:48 Building an eval live from zero
⏱ 29:51 Connecting Linear MCP and iterating
⏱ 39:12 Why you need evals that fail
⏱ 43:36 Offline vs online evals
⏱ 47:40 Three mistakes killing eval culture
The core framework: every eval is exactly three things. A set of inputs your product needs to handle. A task that takes those inputs and generates outputs. A scoring function that produces a number between 0 and 1.
We built one from scratch on camera. Score went from 0 to 0.75 in under 20 minutes.
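The three-part framework above can be sketched in a few lines of Python. This is a minimal illustration, not the eval built in the episode and not Braintrust's API: the `Case` type, the toy `task`, the `exact_match` scorer, and the sample data are all hypothetical stand-ins. A real task would call a model, and a real scorer is usually fuzzier than exact match.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Case:
    """One input the product needs to handle, plus the output we hope for."""
    input_text: str
    expected: str


def task(input_text: str) -> str:
    """Hypothetical stand-in for the product: normalizes a user-typed tag.

    In a real eval this is where the model or agent call would go.
    """
    return input_text.strip().lower()


def exact_match(output: str, expected: str) -> float:
    """Scoring function: returns a number between 0 and 1."""
    return 1.0 if output == expected else 0.0


def run_eval(
    cases: List[Case],
    task_fn: Callable[[str], str],
    scorer: Callable[[str, str], float],
) -> float:
    """Run the task on every input and average the per-case scores."""
    scores = [scorer(task_fn(c.input_text), c.expected) for c in cases]
    return sum(scores) / len(scores)


# Illustrative inputs only. The last case is meant to fail, because the toy
# task does not strip punctuation.
cases = [
    Case("  Billing ", "billing"),
    Case("LOGIN", "login"),
    Case("Search", "search"),
    Case("export", "export"),
    Case("Sign-Up!", "sign-up"),
]

score = run_eval(cases, task, exact_match)
print(f"eval score: {score:.2f}")
```

Keeping the inputs, the task, and the scorer as separate pieces means any one of them can be swapped without touching the other two, which is what makes repeated experiments cheap.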
Cursor is raising at a $50 billion valuation on the claim that its “in-house models generate more code than almost any other LLMs in the world.” Less than 24 hours after launching Composer 2, a developer found the model ID in the API response: kimi-k2p5-rl-0317-s515-fast.
That’s Moonshot AI’s Kimi K2.5 with reinforcement learning appended. A developer named Fynn was testing Cursor’s OpenAI-compatible base URL when the identifier leaked through the response headers. Moonshot’s head of pretraining, Yulun Du, confirmed on X that the tokenizer is identical to Kimi’s and questioned Cursor’s license compliance. Two other Moonshot employees posted confirmations. All three posts have since been deleted.
This is the second time. When Cursor launched Composer 1 in October 2025, users across multiple countries reported the model spontaneously switching its inner monologue to Chinese mid-session. Kenneth Auchenberg, a partner at Alley Corp, posted a screenshot calling it a smoking gun. KR-Asia and 36Kr confirmed both Cursor and Windsurf were running fine-tuned Chinese open-weight models underneath. Cursor never disclosed what Composer 1 was built on. They shipped Composer 1.5 in February and moved on.
The pattern: take a Chinese open-weight model, run RL on coding tasks, ship it as a proprietary breakthrough, publish a cost-performance chart comparing yourself against Opus 4.6 and GPT-5.4 without disclosing that your base model was free, then raise another round.
That chart from the Composer 2 announcement deserves its own paragraph. Cursor plotted Composer 2 against frontier models on a price-vs-quality axis to argue they’d hit a superior tradeoff. What the chart doesn’t show is that Anthropic and OpenAI trained their models from scratch. Cursor took an open-weight model that Moonshot spent hundreds of millions developing, ran RL on top, and presented the output as evidence of in-house research. That’s margin arbitrage on someone else’s R&D dressed up as a benchmark slide.
The license makes this more than an attribution oversight. Kimi K2.5 ships under a Modified MIT License with one clause designed for exactly this scenario: if your product exceeds $20 million in monthly revenue, you must prominently display “Kimi K2.5” on the user interface. Cursor’s ARR crossed $2 billion in February. That’s roughly $167 million per month, 8x the threshold. The clause covers derivative works explicitly.
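The revenue math above is easy to check. The snippet below is only a worked calculation using the two figures quoted in the post ($2 billion ARR and the $20 million monthly threshold); it assumes revenue is spread evenly across twelve months, which is a simplification.

```python
# Figures quoted in the post; the even monthly split is an assumption.
ARR_USD = 2_000_000_000
LICENSE_THRESHOLD_MONTHLY_USD = 20_000_000

monthly_revenue = ARR_USD / 12
multiple_of_threshold = monthly_revenue / LICENSE_THRESHOLD_MONTHLY_USD

print(f"monthly revenue: ${monthly_revenue / 1e6:.0f}M")
print(f"multiple of threshold: {multiple_of_threshold:.1f}x")
```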
Cursor is valued at $29.3 billion and raising at $50 billion. Moonshot’s last reported valuation was $4.3 billion. That is roughly 7x at Cursor’s current valuation and nearly 12x at the raise. The larger company took the smaller company’s model and shipped it as proprietary technology to justify a valuation built on the frontier lab narrative.
Three Composer releases in five months. Composer 1 caught speaking Chinese. Composer 2 caught with a Kimi model ID in the API. A P0 incident this year. And a benchmark chart that compares an RL fine-tune against models requiring billions in training compute without disclosing the base was free.
The question for investors in the $50 billion round: what exactly are you buying? A VS Code fork with strong distribution, or a frontier research lab? The model ID in the API answers that.
If Moonshot doesn’t enforce this license against a company generating $2 billion annually from a derivative of their model, the attribution clause becomes decoration for every future open-weight release. Every AI lab watching this is running the same math: why open-source your model if companies with better distribution can strip attribution, call it proprietary, and raise at 12x your valuation?
kimi-k2p5-rl-0317-s515-fast is the most expensive model ID leak in the history of AI licensing.
"You're my personal Hebrew tutor. Build me a 20-minute lesson for today based on my current level [beginner/intermediate/advanced]. Include: 10 new vocabulary words with context, 1 key grammar concept with 3 examples, and 5 practice sentences I should translate. End with tomorrow's preview."
2. Real Conversation Partner
"Let's have a 10-minute conversation in Hebrew about [topic: weekend plans/work/hobbies]. Start each response in Hebrew, then add English corrections below using this format: '❌ You said X → ✅ Say Y (because Z)'. Adjust your Hebrew complexity to match my responses."
In product management, not everything is straightforward math or solvable by AI.
Yet, some PMs still make better decisions most of the time.
How?
That's product sense:
"The ability to find the right solution for the user and business, despite limited and ambiguous information."
I love this definition from @Sid Arora.
You start with the PM process:
1. Take a vague & ambiguous problem statement
2. Create, or clarify the overall goal
3. Identify all users in ecosystem
4. Pick 1-2 users
5. Identify major problems of the user
6. Select the problems to solve
7. Brainstorm for solutions
8. Select the highest ROI solution
9. Build and deploy the solution
10. Measure success / collect feedback