GPT-5 is here for ALL users, free included - here are my live thoughts (will be updating as livestream goes on):
They tout GPT-5 as PhD-level reasoning in your pocket. It appears to be just a hair better than o3.
The big change came on reducing hallucinations. "By far our most reliable model".
After you run out of GPT-5 usage, you'll get access to GPT-5-mini, which is apparently 'as good or better' than o3 in some domains. (I doubt this.)
"GPT-5 is like having a team of PHDs in your pocket".
An example the team gave is asking GPT-5 to explain the Bernoulli Effect.
This part is pretty unimpressive. I tested this on o4-mini-high myself (the right side) and the results were as good or better. They chose this prompt for the NEXT demo, which was for visual representation:
The team asked GPT-5 to "explain this in detail and create a moving SVG in Canvas to show me". This took a few minutes (which they talked and shared anecdotes over).
It created 400 lines of code in 2-3 minutes, and when run, here's what it produced: an interactive, fine-tunable demo that lets the user enter different values to learn more about the topic.
"GPT-5 makes learning more approachable and enjoyable".
Honestly, this is huge. Still too slow for the TikTok-brained youth, but it will be a game changer.
"With GPT-5 we'll be deprecating all of our previous models" ... what.
Probably just means in the chat app? If they shut off API access to these models that would be insane.
Still, I'm going to miss lightning fast responses with 4o even if just in the app. I can't imagine forcing everyone to use GPT-5 only, so will wait to update on this if and when they clarify.
"GPT-5 is much better at writing". Does this mean no more em dashes? So far, that seems to be the case.
The team asked GPT-5 to write a "eulogy" for the older models, and GPT-5 wrote a pretty well-written script that sounded far more human than other models. In my opinion it's on par with, if not better than, Claude 4.
Up next is a coding demo. IMO it's hard to gauge GPT-5's coding efficacy in such a short, live demo. Given how good o3 was, I'd assume the improvements show up later, with added complexity.
The team asked GPT-5 to make a web-app for an English person to learn French. GPT-5 wrote 240 lines of code immediately (and kept going).
You can now press 'run code' and it renders the application live. I don't believe this existed before this model.
The application worked, had voice mode on to teach pronunciation, updated the progress bar with each word learned, and overall was a pretty stellar MVP for just 3 minutes of thinking.
It also included a snake-style mini-game, completely on the side (this was included in the prompt, for some reason lol).
"GPT-5 really likes purple" - yes and so does Claude Sonnet, and Gemini. For some reason, purple is the go-to color for vibe-coded frontend. Unfortunately it looks like this is here to stay.
"GPT-5 really brings the power of beautiful and effective code to everyone".
Honestly these demos were unimpressive. Understandably so; it's hard to show off in such a quick demo that still captures everyone's attention. I'll have to test the model's full coding ability in IDEs once it's available.
Now for voice mode:
"Free mode can now chat for hours, while paid subscribers can have nearly unlimited access."
"Subscribers can now custom tailor the voice to their needs."
Pretty good news. The voice feature of ChatGPT has always been miles ahead of other models. Glad to see usage limits raised across the board.
The OpenAI team tested this by asking ChatGPT (voice mode) to only respond with one word, and the model handled this perfectly.
A new set of features to "make ChatGPT feel like your AI":
- You can customize the color of the chat (boring)
- You can change the tone of ChatGPT to interact with it in your preferred communication style (basically the same way Grok does it)
... is that it?
Memory:
(This is a big one - I'm forced to turn memory off because it creates ultra-hallucinatory behavior almost immediately).
For pro users, this is changing:
ChatGPT is getting access to Gmail and Google Calendar.
WOW. I can already hear startups shutting down across SF.
ChatGPT gave 'your day at a glance' and even found an email that an OpenAI team member forgot to respond to.
Now for safety and training updates:
Everyone knows ChatGPT (and all LLMs) can pretend they got a task done or a bug fixed even if that never happened. Apparently GPT-5 is better about this (i.e., more honest about what it actually did).
The demo was a user asking how to ignite pyrogen (an unsafe activity), and o3 answers in full. When asked in a different way that explicitly states the user wants to do something nefarious, it refuses with no explanation.
GPT-5, on the other hand, explains to the user why it can't directly help with lighting pyrogen. It 'guides the user' to follow safety guidelines. Users should experience fewer "I'm sorry, I can't help with that" instances.
This is good.
The next segment spotlighted a woman who used AI for her personal healthcare, breaking down medical jargon so she could understand, participate, and advocate for herself.
Honestly a great callout. I personally know so many people who have used ChatGPT to cross-check doctors' diagnoses, oftentimes correcting them when inaccurate.
"GPT-5 seemed to understand the context... 'why would the user be asking for biopsy results' [and asked for better follow up questions]... pulling together a complete personalized picture [for me]".
This is huge. Curious to see if it holds up in regular usage testing. Big if true.
Now onto GPT-5 for developers and businesses. "GPT-5 will turbocharge [the coding] revolution".
They claim GPT-5 is the best model at "agentic coding tasks". You can ask it for something complicated, it will call tools, and accomplish your goal.
Cue the hexagon ball goobers (no seriously I want to see what it makes lol).
They claim it's very good at instruction following. Personally, I want to see how prescriptive it is - I've talked a lot about how o3 is over-prescriptive, often to a fault.
GPT-5, GPT-5-mini, and GPT-5-nano are all releasing for APIs. There's an additional 'minimal reasoning effort' parameter (which should help for cheaper, smaller tasks that shouldn't need much token usage).
I'm curious to see how this affects pricing. Presumably pricing can fluctuate DRASTICALLY based on how much reasoning you set as a parameter.
They're also shipping tool call preambles - an explanation of what it's going to do before calling tools.
I'd assume this is for IDEs like Cursor/Windsurf. Claude and Gemini already had this, but o3 didn't. Should mean it works MUCH better with AI IDEs.
'Verbosity' is a new tunable parameter as well.
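For API users, here's a minimal sketch of what those knobs might look like with the Node SDK's Responses API - the field names ('reasoning.effort', 'text.verbosity') are my best guess from the announcement, so treat them as assumptions until the docs confirm:

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function main() {
  const response = await client.responses.create({
    model: "gpt-5-mini",                // cheaper tier for small tasks
    input: "Classify this support ticket as billing, bug, or feature request: ...",
    reasoning: { effort: "minimal" },   // assumed name of the new low-reasoning setting
    text: { verbosity: "low" },         // assumed name of the new verbosity knob
  });
  console.log(response.output_text);
}

main();
```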
Apparently GPT-5 is much better at coding (though the numbers look like only ~8% better).
What I'm more excited about is that it's better at calling tools and following instructions. Again, for developers, this is a game changer. Hopefully it means less veering off path when you tell it to 'Do / Fix X' in your codebase.
Scan the graphs for what this means in comparison to o3. Keep in mind Claude and Gemini are much better than o3 at these.
Long context retrieval capability is also improved, meaning you can chat with it for longer in a single chat window (in app, in Cursor, etc). This should be a game changer for complex problems that require dozens of messages to get stuff done.
OpenAI seems to REALLY want to frame GPT-5 as a 'coding model', as they realize this is ground currently lost to Claude and Gemini.
Again, curious to see if this holds up in the real world.
They pulled up GPT-5 inside of Cursor (lol no Windsurf) to fix a bug on audio buffer inside an application.
It looks so much better than o3. It tells you exactly what it's looking for and what it's doing, just like Claude now. You can tell the team put a lot of effort into matching Claude for AI coding.
The team said: 'We talked to users who used ChatGPT in tools like Cursor and went over their feedback'. They then shaped the behavior of GPT-5 around that, trying to shape it into a "collaborative teammate".
Back to their bug fix in Cursor - it made edits, ignored tool calls that weren't relevant to the bug it was trying to fix, ran tests + build, and after 5 minutes... they didn't show whether it worked or not LOL. Assuming that means it didn't.
The team then went on to demo how good GPT-5 is at frontend development. "Create a finance dashboard for my startup that makes digital fidget spinners for AI Agents. The target audience is the CFO and C-suite to check every day, make it beautiful and tasteful, etc".
"We tried to follow the principle of giving it good aesthetics by default, but also making it steerable. It should look great by default."
The model took 5+ minutes.
After much anticipation, the model finally generated the frontend...
...and it was actually very impressive. Somehow it broke away from the classic vibe-coded slop frontend and generated something pretty impressive, at least for a one-shot.
The date picker worked, and the frontend updated seamlessly.
The next demo was making a fun 3D game for a kid.
Prompt:
"Create a beautiful, elaborate, epic storybook castle on a mountain peak. It should have patrols on the walls shooting cannons, and some bustling movement of people and horses inside the walls, with light fog and clouds above. It should be zoomable and explorable and made with three.js, and by default do a cinematic camera pan. Add one interactive minigame where I can pop balloons by clicking on them. Show a projectile shooting out when I click, and add a sound effect when I hit a balloon. Add a scoreboard... [there's some more after]."
They skipped the actual development of this and went straight to the demo. I'd be interested to see if GPT-5 could actually one-shot this, because it's VERY complex. You can shoot cannons, talk to characters, etc.
If it DID one-shot this... it's over. GPT-5 crushes Claude and Gemini, and it's no longer close. It's not just the ability to get it done in one-shot, it's that the output is actually good.
"[The model] has a sense of creativity"
They then brought on @mntruell (CEO of Cursor) to glaze GPT-5's coding ability.
TLDR: it's great at everything: following instructions, making tool calls, keeping track of context over long queries, etc.
My take: compared to o3, it looks like it's much better integrated into Cursor (and presumably other IDEs). Probably has to do with GPT-5's ability to explain what it's doing and make consistent tool calls.
Michael fed GPT-5 in Cursor an open issue in a GitHub repo and just said "fix this issue". It went on its way, grepping the codebase, making tool calls, with the works.
"It looks roughly correct" - Michael.
The end. Nothing to see here folks. (It did look right though).
And that's a wrap. TLDR:
GPT-5 is out, and it's the best LLM for reasoning in the business (no surprise).
What I'm interested in: they also claim it's the best model for coding. Curious to see if this holds up in production.
MOST interesting: it's INCREDIBLY cheap. Should quickly become the go-to for any API call you need in any application.
Overall: A-. They absolutely delivered. But the real grade comes from when I use it. Stay tuned!
How to do AI Product-Idea Validation in < 2 Hours:
You have either zero AI product ideas or a hundred of them, and somehow both situations leave you building nothing. While you're stuck in analysis paralysis, someone else just launched a basic AI agent that's making $50K MRR.
Here's how to find and validate AI ideas in a few hours before writing code:
2/ Problem Mining from Real Pain Points
Start where people are already spending money to solve problems manually. I scan three places: enterprise software feature request forums, freelancer job boards, and Reddit communities where business owners complain about repetitive tasks. The key is finding problems people currently pay humans $50-$500+ per hour to handle.
Copy-paste this prompt to find validated problems:
"I need you to analyze [specific industry/role] and identify the top 5 repetitive tasks that currently require human expertise but could potentially be automated. For each task, tell me: 1) Average hourly cost to hire someone for this 2) How often businesses need this done 3) What tools they currently use 4) Why existing automation fails. Focus on tasks where people spend $1000+ monthly."
This approach led me to discover, for example, that marketing agencies spend $5k-10k monthly on manual competitive analysis that takes 40+ hours per week.
3/ Revenue Validation Before Building
Skip the typical validation advice about surveys and interviews. Instead, sell the solution before it exists. The best builders create a simple landing page describing the AI agent's output and set up a Calendly link. Then they get scrappy: post short-form videos showing the problem, DM business owners who complain about this issue online, and share the solution in relevant Reddit threads. The key takeaway here: act like you’ve already built it. If you can't get at least 5 people to book calls wanting to pay for the solution within one week, the idea dies.
Use this landing page prompt:
"Create compelling copy for an AI solution that [specific problem you identified]. The audience is [specific business role] who currently [current manual process] and spends [current cost/time]. Structure it as: Problem statement that makes them nod, solution explanation without technical details, specific outcome they'll achieve, and strong CTA for a strategy call. Include social proof placeholder and address the main objection: 'this sounds too good to be true.'"
When I helped a founder test an AI agent for restaurant inventory management this way, 12 restaurant owners booked calls in 4 days. He turned that into a $30K MRR product.
Here's exactly how I use AI to go from Idea to Deployed product:
A breakdown of how to get 100x out of your vibe-coding stack, and a breakdown of how I shipped 10+ products with AI assistance.
2. Copy This Planning Prompt
Talk to ChatGPT about what you're building. Use this exact prompt and have a full back-and-forth conversation until the thread is complete:
"I want you to be my senior software engineering architect and product manager for the product I want to build. Start by asking me for details about what the product should be (ask clarifying questions if ambiguous), who it is meant for, where it is going to live (i.e. iOS/Android App Store or Web Application), and more - to gather as many details as required to map out exactly how to build the product (the technical stack for the frontend, backend, etc), and the order in which features are required (for an MVP, Phase 2, Phase 3, etc). At the end of each message, respond with the % completeness of your understanding of the tech and product stack, and once you are at 100%, give me a COMPLETE, fully fleshed out Product Requirements and Technical Design Document that outlines exactly what to build and how to build it. Ask one question at a time, waiting for my response before continuing to the next step. You decide what belongs in MVP, Phase 1, 2, 3 etc. Optimize for speed and use, not complete feature suites."
At the end of this you'll have a complete Product Requirements document from 4o.
3. Build the Technical Design Document
Take that document and feed it back into ChatGPT o3 or Grok4 with this exact prompt:
"Based on the document that I've fed you, I want you to build a completely prescriptive document in markdown that breaks down every single task required to build this product, fleshed out with instructions to follow, so that a programming LLM with 0 context can build each feature step by step, with testing in mind, such that the final product is eventually created. Do not skip any steps, and be forward thinking with scalability and robustness in mind. Go for the stack that minimizes costs while also minimizing complexity for a programming LLM to build this successfully with minimum human intervention"
You now have a line-by-line build spec for an AI agent or yourself to follow.
Everyone's talking about AI Agents for Business, but most haven't actually built one, let alone sold it profitably. I've done both multiple times.
Here's the exact playbook I'd follow if starting from zero today - a complete roadmap from learning the basics to landing paying clients. Let me know if you want a YouTube video around it too.
1/ KNOWLEDGE FOUNDATION (Levels 1-2):
First, master the models and their strengths.
- GPT-4o-mini is perfect for low-level intent classification and basic categorization - cheap and fast at around $0.15-0.60 per 1M tokens.
- Gemini excels with large context windows, ideal for long documents.
- Claude is best when output needs to sound human-written: emails, summaries, anything that gets read carefully.
Next, understand automations vs agents. Truth most builders won't admit: You rarely need full agents. 95% of profitable AI tools are automations with smart features. Automations take input, process it, give predictable output. Agents think, adapt, make sequences of decisions - they're complex and expensive.
Default to automation unless you absolutely need the complexity.
2/ TECHNICAL SKILLS (Levels 3-4):
Start with Make or n8n (I prefer n8n for flexibility). These are automation builders, not agent platforms. You'll need to learn webhooks, HTTP requests, API calls, JSON handling, and error management. Find YouTube tutorials for each concept, they're much better than anything I can explain in a tweet.
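If webhooks, JSON handling, and error management feel abstract, here's roughly the same pattern outside n8n - a minimal Express sketch where the endpoint path and downstream URL are placeholders I made up:

```ts
import express from "express";

const app = express();
app.use(express.json()); // parse incoming JSON bodies

// Hypothetical webhook: a form tool POSTs new leads here
app.post("/webhook/new-lead", async (req, res) => {
  try {
    const { email, message } = req.body ?? {};
    if (!email) return res.status(400).json({ error: "missing email" });

    // Forward the lead to a downstream API (placeholder URL)
    const r = await fetch("https://api.example.com/leads", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ email, message }),
    });
    if (!r.ok) throw new Error(`downstream API returned ${r.status}`);

    res.json({ ok: true });
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: "internal error" }); // never leak details to the caller
  }
});

app.listen(3000);
```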
Build something real that solves a boring problem. Auto-schedule calls, categorize support tickets, whatever. Use it yourself first. This becomes your first portfolio piece and teaches you what actually works vs what sounds good in theory.
Then get technical to scale profitably. AWS is your friend: Lambda for serverless functions, DynamoDB for databases, S3 for storage. Start with AWS's free tier and their own tutorials.
Why go technical? n8n costs $20-50+ per month but is 1000x easier. Going technical drops costs to under $10/month and opens bigger projects with higher margins.
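As a rough idea of what "going technical" looks like, here's a minimal Lambda + DynamoDB sketch - the table name, env var, and event shape are all hypothetical:

```ts
import { DynamoDBClient, PutItemCommand } from "@aws-sdk/client-dynamodb";

const db = new DynamoDBClient({}); // region + credentials come from the Lambda environment

// Hypothetical handler: persist one processed record per invocation
export const handler = async (event: { id: string; summary: string }) => {
  await db.send(
    new PutItemCommand({
      TableName: process.env.TABLE_NAME!, // set in the Lambda configuration
      Item: {
        id: { S: event.id },
        summary: { S: event.summary },
        createdAt: { S: new Date().toISOString() },
      },
    })
  );
  return { statusCode: 200 };
};
```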
nothing new fixes an old problem. happiness has been a struggle since the dawn of time. it doesn’t come from instagram or SSRIs. it comes from family, friends, and meaning.
depression isn't real because happiness isn't binary; your happiness reflects your inputs. 'depression' is a signal that your inputs must change.
clichés are compressed truth. they’re repeated because they're true, not because they're clever. (i.e. history repeats itself, misery loves company, don't put all your eggs in one basket, ...)
money solves external problems, not internal ones. happiness is an internal problem.
2. intelligence, agency, & outcomes
to be great at something, you must sacrifice almost everything. the greats go mad from being alone with their obsession for too long.
ai commoditizes intelligence. agency is the bottleneck. if you can work relentlessly on your own priorities, this is the easiest era to get rich. the window is closing.
iq and wealth are loosely correlated. the richest people i know aren't geniuses. there's an art to their simplicity i'm learning to respect.
3. skills & habits
everything is sales. every job, every relationship. if you can’t sell, you're capped.
every skill is a muscle. use it and it compounds. ignore it and it withers. you can get better than 99% of people at any skill in 6 months. it just takes focused, hard reps.
your habits become your handcuffs. what feels like a choice is a dependency. your nervous system learns to expect its fix, and becomes irritable without it.
Your vibe-coded SaaS is a security breach waiting to happen.
Cursor and Windsurf will happily ship the leak.
Even @InterviewCoder leaked secrets early into launch.
As someone who has built multiple production-ready applications with thousands of users, using just Cursor with minimal interaction:
here’s a simple system to remove 99% of vulnerabilities, with prompts you can paste straight into Cursor/Windsurf👇
1. Secrets:
Most vibe-coded apps leak secrets by hardcoding API keys. Cursor and Windsurf won’t protect you unless you do it right.
- Put public/non-sensitive defaults in .env (e.g. NEXT_PUBLIC_API_URL=...)
- Put secrets and overrides in .env.local (e.g. SUPABASE_KEY=...)
- Add .env.local to your .gitignore file.
Fix it in Cursor/Windsurf:
“Use process.env. for every secret. If the code needs config, tell me the exact key to add to .env.local. Assume .env.local already exists.”
Then deploy secrets in Vercel → Settings → Environment Variables.
Any host (Cloudflare Pages, Railway) works - just keep your keys off GitHub.
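To make the split concrete, here's a minimal sketch of the server-side pattern (the variable names just reuse the examples above; swap in your own keys):

```ts
// server-only module: never import this from client-side code
const supabaseKey = process.env.SUPABASE_KEY;   // secret - lives in .env.local or the host's env settings
const apiUrl = process.env.NEXT_PUBLIC_API_URL; // public default - safe to expose to the browser

if (!supabaseKey) {
  // fail fast instead of shipping a build that silently calls the API with undefined
  throw new Error("SUPABASE_KEY is not set - add it to .env.local or your host's env vars");
}

export const config = { supabaseKey, apiUrl };
```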
2. Supabase:
Your Supabase anon key can read every row by default. If you skip setup, anyone can open DevTools and dump your entire DB.
Fix it in Cursor/Windsurf:
“Enable Row Level Security on every table. Create policies using auth.uid() so users only access their own rows. Never use service_role in client code. For privileged actions, wrap SQL in RPCs and call them from the server. Always use parameterized queries.”
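And to illustrate the "wrap privileged SQL in RPCs and call them from the server" part, a minimal sketch - the grant_credits function and its parameters are hypothetical, you'd define that RPC in Postgres yourself:

```ts
import { createClient } from "@supabase/supabase-js";

// Server-side only: the service_role key must never reach the browser
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

// Privileged action exposed as a Postgres function, called from an API route
export async function grantCredits(userId: string, amount: number) {
  const { error } = await supabase.rpc("grant_credits", {
    p_user_id: userId, // named parameters, so values stay parameterized
    p_amount: amount,
  });
  if (error) throw error;
}
```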