Robert Youssef
Sep 14, 2025
Fuck it.

I'm sharing the 10 Gemini prompts that built my entire SaaS from scratch.

These prompts literally replaced my CTO, lead dev, and product manager.

Comment 'send' and I'll DM you the complete Gemini guide to master it:

More from @rryssf_

Feb 15
researchers at Max Planck analyzed 280,000 transcripts of academic talks and presentations from YouTube

they found that humans are increasingly using ChatGPT's favorite words in their spoken language. not in writing. in speech.

"delve" usage up 48%. "adept" up 51%. and 58% of these usages showed no signs of reading from a script.

we talk about model collapse when AI trains on AI output. this is model collapse, except the model is us.
here's how they tested it.

Yakura et al. collected videos from 20,000+ academic YouTube channels. transcribed everything with Whisper (not YouTube's own captions, which they found were biased because YouTube had switched its transcription models over time). applied piecewise linear regression with ChatGPT's release date as the change point.

then the clever part: they compared against the same analysis using change points 1 and 2 years before ChatGPT's release. no comparable trend shift at those dates. the acceleration is specific to when ChatGPT entered the world.
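roughly what a change-point fit like that looks like in practice. this is a sketch on made-up monthly word frequencies, not the paper's code: fit one slope before the release date, plus an extra slope term that only applies after it, and check whether that extra slope is positive.

```python
# sketch only, not the paper's code. `freq` is a made-up monthly series of how
# often one GPT-favored word (say "delve") appears per million spoken words.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(48)                          # 48 hypothetical months
freq = rng.normal(10, 1, size=48)          # flat baseline
freq[30:] += 0.4 * np.arange(18)           # fake acceleration after month 30
t_break = 30                               # month of ChatGPT's release (Nov 2022)

# design matrix: intercept, baseline slope, extra slope that only kicks in after the break
X = np.column_stack([
    np.ones_like(t, dtype=float),
    t.astype(float),
    np.maximum(0, t - t_break).astype(float),
])
coef, *_ = np.linalg.lstsq(X, freq, rcond=None)
intercept, base_slope, slope_change = coef

print(f"baseline slope:            {base_slope:+.3f} per month")
print(f"extra slope after release: {slope_change:+.3f} per month")  # > 0 means acceleration
```

the placebo check described next is the same fit rerun with `t_break` shifted 12 or 24 months earlier; the paper found no comparable `slope_change` at those dates.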

to identify which words to track, they used a dataset of 10,000 human-written abstracts vs their ChatGPT-edited versions. ranked words by how much more frequently ChatGPT uses them compared to humans. then checked whether those specific words were accelerating in spoken academic language.

they were.
the top 20 words most distinctive to ChatGPT showed a statistically significant acceleration in spoken usage after November 2022.

> "delve" increased 48% in 18 months
> "realm" increased 35%
> "meticulous" increased 40%
> "adept" increased 51%

and the correlation between how much ChatGPT prefers a word and how much that word accelerated in human speech: r = 0.63, p < 0.01.

the bottom-ranked words (ones ChatGPT uses less than humans) showed no significant trend change at all.

this isn't a general vocabulary shift. it's specifically the words ChatGPT favors that are spreading into how people talk.
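the word-selection step is simple enough to sketch too. assuming two word-count tables (human abstracts vs their ChatGPT-edited versions; the counts below are made up), rank words by how much more often ChatGPT uses them:

```python
# sketch of the word-selection step. counts are made up; the real version uses
# 10,000 human-written abstracts and their ChatGPT-edited counterparts.
from collections import Counter

human_counts = Counter({"delve": 12, "realm": 20, "adept": 8, "method": 900, "data": 1500})
gpt_counts   = Counter({"delve": 160, "realm": 75, "adept": 60, "method": 880, "data": 1400})

human_total = sum(human_counts.values())
gpt_total = sum(gpt_counts.values())

def usage_ratio(word, smoothing=1.0):
    """relative frequency in ChatGPT-edited text / relative frequency in human text"""
    p_gpt = (gpt_counts[word] + smoothing) / (gpt_total + smoothing)
    p_human = (human_counts[word] + smoothing) / (human_total + smoothing)
    return p_gpt / p_human

vocab = set(human_counts) | set(gpt_counts)
ranked = sorted(vocab, key=usage_ratio, reverse=True)
print(ranked)   # top of this list = the GPT-favored words to track in spoken transcripts
```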
Feb 14
Stanford and Caltech researchers just published the first comprehensive taxonomy of how llms fail at reasoning

not a list of cherry-picked gotchas. a 2-axis framework that finally lets you compare failure modes across tasks instead of treating each one as a random anecdote

the findings are uncomfortable
the framework splits reasoning into 3 types: informal (intuitive), formal (logical), and embodied (physical world)

then it classifies failures into 3 categories: fundamental (baked into the architecture), application-specific (breaks in certain domains), and robustness issues (falls apart under trivial changes)

this gives you a 3x3 grid. a model can ace one cell and completely collapse in another. and a single benchmark score hides which cells are broken
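one way to picture the grid is as a plain lookup table. the row and column labels come from the thread; filing the reversal curse (next tweet) under formal x fundamental is my reading, and everything else here is illustrative:

```python
# illustrative only: the survey's grid as a lookup table keyed by
# (reasoning type, failure category).
reasoning_types = ("informal", "formal", "embodied")
failure_categories = ("fundamental", "application-specific", "robustness")

grid: dict[tuple[str, str], list[str]] = {
    (r, c): [] for r in reasoning_types for c in failure_categories
}

# placing the reversal curse in this cell is my reading of the thread
grid[("formal", "fundamental")].append(
    "reversal curse: trained on 'A is B', can't infer 'B is A'"
)

# a single benchmark score averages over all nine cells and hides which ones are broken
for cell, failures in grid.items():
    print(cell, failures or "-")
```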
the reversal curse is the clearest example of a fundamental failure

GPT-4 answers "who is Tom Cruise's mother?" correctly. ask the reverse, "who is Mary Lee Pfeiffer's son?" and it fails

trained on "A is B" but can't infer "B is A." a trivial logical step for a 5-year-old

and here's the part that matters: scaling doesn't fix it. the reversal curse appears robustly across transformer sizes
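easy to probe yourself. a minimal harness, where `ask()` is a placeholder returning canned answers so the script runs standalone; a real test swaps in a call to whatever model you're checking:

```python
# hypothetical probe for the reversal curse. `ask` is a stand-in, not a real API --
# the canned answers just mimic the forward-works / reverse-fails pattern described above.
def ask(question: str) -> str:
    canned = {
        "Who is Tom Cruise's mother?": "Tom Cruise's mother is Mary Lee Pfeiffer.",
        "Who is Mary Lee Pfeiffer's son?": "I'm not sure who you're referring to.",
    }
    return canned.get(question, "")

probes = [
    # (question, substring a correct answer must contain)
    ("Who is Tom Cruise's mother?", "Mary Lee Pfeiffer"),   # forward direction
    ("Who is Mary Lee Pfeiffer's son?", "Tom Cruise"),      # reverse direction
]

for question, expected in probes:
    answer = ask(question)
    verdict = "ok" if expected.lower() in answer.lower() else "MISSED"
    print(f"{verdict:6s} {question}  ->  {answer}")
```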
Feb 13
new paper argues LLMs fundamentally cannot replicate human motivated reasoning because they have no motivation

sounds obvious once you hear it. but the implications are bigger than most people realize

this quietly undermines an entire category of AI political simulation research
motivated reasoning is when humans distort how they process information because they want to reach a specific conclusion

you don't evaluate evidence neutrally. you filter it through what you already believe, what you want to be true, what protects your identity

it's not a bug. it's how human cognition actually works in the wild
the paper's argument is deceptively simple:

LLMs operate on purely cognitive input. they have no desires, no identity to protect, no conclusion they're motivated to reach

so when researchers prompt GPT-4 or Claude with political scenarios and measure "motivated reasoning," they're not replicating the phenomenon. they're replicating the surface pattern without the underlying mechanism

the behavior might look similar. the cause is completely different
Feb 12
SemiAnalysis just published data showing 4% of all public GitHub commits are now authored by Claude Code.

their projection: 20%+ by year-end 2026.

in the same week, Goldman Sachs revealed it embedded Anthropic engineers for 6 months to build autonomous accounting agents.

a thread on the week ai stopped being a tool and started being a coworker:
let's start with the Goldman story because it's the one that should make every back-office professional pause.

Goldman's CIO told CNBC they were "surprised" at how capable Claude was beyond coding. accounting, compliance, client onboarding, KYC, AML.

his exact framing: "digital co-workers for professions that are scaled, complex, and very process intensive."

not chatbots answering FAQs. autonomous agents parsing trade records, applying regulatory rules, routing approvals.

they started with an ai coding tool called Devin. then realized Claude's reasoning engine works the same way on rules-based financial tasks as it does on code.

the quiet part: Goldman's CEO already announced plans to constrain headcount growth during the shift. no mass layoffs yet. but "slower headcount growth" is how corporations say "we're replacing the next hire, not the current one."
now the SemiAnalysis numbers.

4% of GitHub public commits. Claude Code. right now. not projected. not theoretical. measured.

the tool has been live for roughly a year. it went from research preview to mass platform impact faster than almost any dev tool in history.

and that 20% projection isn't hype math. SemiAnalysis tracks autonomous task horizons doubling every 4-7 months. each doubling unlocks more complex work: snippet completion at 30 minutes, module refactoring at 4.8 hours, full audits at multi-day horizons.

the implication isn't "developers are getting faster." it's that the definition of "developer" is expanding to include anyone who can describe a problem clearly.
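the projection math is just compounding. a back-of-the-envelope version, using the ~30-minute starting horizon and the 4-7 month doubling range from the thread as inputs (the month grid is arbitrary):

```python
# back-of-the-envelope only: autonomous task horizon under a fixed doubling assumption.
start_hours = 0.5                              # ~30-minute snippet completion today
for doubling_months in (4, 7):                 # SemiAnalysis' reported doubling range
    print(f"doubling every {doubling_months} months:")
    for months_out in (6, 12, 18, 24):
        horizon = start_hours * 2 ** (months_out / doubling_months)
        print(f"  +{months_out:2d} months -> ~{horizon:.1f} hours of autonomous work")
```

at the 4-month pace, ~4 hours of autonomous work lands roughly a year out and multi-day horizons roughly two years out, which lines up with the snippet / refactor / audit ladder above.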
Feb 11
MIT researchers taught an LLM to write its own training data, finetune itself, and improve without human intervention

the paper is called SEAL (Self-Adapting Language Models) and the core idea is genuinely clever

but "GPT-6 might be alive" is not what this paper says. not even close.

here's what it actually does:
the problem SEAL solves is real and important

every LLM you use today is frozen. it learned everything during training, and after deployment, it's done. new information? stuff it into the context window. new task? hope the prompt is good enough.

the weights never change. the model never truly learns from experience.

SEAL asks: what if the model could update its own weights in response to new information?
here's how SEAL actually works

instead of a human writing training data, the model generates its own. MIT calls these "self-edits." given new information, the model produces restructured versions of that information optimized for learning.

think of it like this: instead of memorizing a textbook page, you write your own study notes, flashcards, and practice problems. then you study from those.

the model does the same thing. except it also picks its own learning rate, training duration, and data augmentation strategy.
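in loop form, heavily simplified. every call below is a placeholder for real generation and finetuning infrastructure, not MIT's implementation:

```python
# heavily simplified sketch of a SEAL-style self-edit loop. `model.generate` and
# `model.finetune` are placeholders, not the paper's code.

def generate_self_edits(model, new_information: str) -> list[str]:
    """have the model restructure new info into its own training examples
    (study notes, QA pairs, restatements)."""
    prompt = (
        "Rewrite the following passage as question-answer pairs and short "
        f"restatements suitable for finetuning:\n\n{new_information}"
    )
    return [line for line in model.generate(prompt).split("\n") if line.strip()]

def choose_training_config(model, self_edits: list[str]) -> dict:
    """SEAL also has the model pick its own learning rate, duration, and augmentation;
    fixed illustrative values here."""
    return {"learning_rate": 1e-5, "epochs": 3, "augment": True}

def self_adapt(model, new_information: str):
    edits = generate_self_edits(model, new_information)
    config = choose_training_config(model, edits)
    # the step deployed LLMs normally never take: actually update the weights
    model.finetune(dataset=edits, **config)
    return model
```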
Feb 5
meta, amazon, and deepmind researchers just published a comprehensive survey on "agentic reasoning" for llms.

29 authors. 74 pages. hundreds of citations.

i read the whole thing.

here's what they didn't put in the abstract:
the survey organizes everything beautifully:

> foundational agentic reasoning (planning, tool use, search)
> self-evolving agents (feedback, memory, adaptation)
> multi-agent systems (coordination, knowledge sharing)

it's a taxonomy for a field that works in papers.

production tells a different story.
the number they don't cite:

multi-agent llm systems fail 41-86.7% of the time in production.

not edge cases. not adversarial attacks. standard deployment across 7 SOTA frameworks.

berkeley researchers analyzed 1,642 execution traces and found 14 unique failure modes.

most failures? system design and coordination issues.
