Alex Prompter
Marketing + AI = $$$ 🔑 @godofprompt (co-founder) 🌎 https://t.co/O7zFVtEZ9H (made with AI)
15 subscribers
Nov 3 10 tweets 3 min read
This blew my mind 🤯

You can literally run Llama 3, Mistral, or Gemma 2 on your laptop: no internet, no API calls, no data leaving your machine.

Here are the 5 tools that make local AI real (and insanely easy):

1. Ollama (the minimalist workhorse)

Download → pick a model → done.

✅ “Airplane Mode” = total offline mode
✅ Uses llama.cpp under the hood
✅ Gives you a local API that mimics OpenAI

It’s so private I literally turned off WiFi mid-chat and it still worked.

Perfect for people who just want the power of Llama 3 or Mistral without setup pain.
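
If you want to script against it, here's a minimal sketch using the standard openai Python client pointed at Ollama's local endpoint (assumes `ollama serve` is running on the default port and you've pulled llama3; nothing leaves your machine):

```python
# Minimal sketch: chat with a local model through Ollama's OpenAI-compatible API.
# Assumes Ollama is running locally and the model has been pulled (e.g. `ollama pull llama3`).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's local endpoint
    api_key="ollama",                      # placeholder; no real key is needed
)

resp = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Explain what llama.cpp does in one sentence."}],
)
print(resp.choices[0].message.content)
```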
Nov 1 18 tweets 6 min read
I reverse engineered how to get LLMs to make strategic decisions.

Most people treat AI like a magic 8-ball. Ask a question, get an answer, done.

That's not decision-making. That's guessing with extra steps.

Here's what actually works:

Every expert knows this: LLMs default to pattern matching, not strategic thinking.

They'll give you the most common answer, not the best one.

Strategic decisions require:

- Understanding tradeoffs
- Evaluating multiple futures
- Weighing second-order effects

Most prompts skip all of this.
Oct 28 11 tweets 4 min read
🚨 MIT and Basis Research just dropped a new way to measure if AI actually understands the world, and the results are brutal.

It’s called "WorldTest", and it doesn’t just check how well an AI predicts the next frame or maximizes reward.

It checks whether the model can build an internal model of reality and use it to handle new situations.

They built 'AutumnBench', a suite of 43 interactive worlds and 129 tasks where AIs must:

• Predict hidden parts of the world (masked-frame prediction)
• Plan sequences of actions to reach a goal
• Detect when the environment’s rules suddenly change

Then they tested 517 humans vs. top AI models: Claude, Gemini 2.5 Pro, and o3.

Humans crushed every model. Even massive compute scaling barely helped.

The takeaway is wild... current AIs don’t understand environments; they pattern-match inside them.

They don’t explore strategically, revise beliefs, or run experiments like humans do.

WorldTest might be the first benchmark that actually measures understanding, not memorization.

The gap it reveals isn’t small; it’s the next grand challenge in AI cognition.

Paper: Benchmarking World-Model Learning (arxiv.org/abs/2510.19788)

The benchmark has two phases:

→ Interaction: AI explores an environment with no goals or rewards.
→ Test: It’s dropped into a changed world and must adapt using what it learned.

This design finally separates learning dynamics from reward hacking.
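
Roughly, the protocol looks like this (a conceptual sketch; the names are illustrative, not the paper's actual code):

```python
# Conceptual sketch of WorldTest's two-phase design -- illustrative names, not the paper's API.
def worldtest(agent, base_env, changed_env, interaction_steps=1000):
    # Phase 1: reward-free interaction. The agent just explores and
    # (ideally) builds an internal model of how the environment works.
    obs = base_env.reset()
    for _ in range(interaction_steps):
        action = agent.explore(obs)            # no goals, no reward signal
        obs = base_env.step(action)
        agent.update_world_model(obs, action)

    # Phase 2: the rules change. The agent is scored on downstream tasks
    # (masked-frame prediction, planning, detecting the rule change)
    # using only what it learned in phase 1.
    return changed_env.run_tasks(agent)
```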
Oct 26 5 tweets 6 min read
MIT just made vibe coding an official part of engineering 💀

MIT just formalized "Vibe Coding" – the thing you've been doing for months where you generate code, run it, and if the output looks right you ship it without reading a single line.

turns out that's not laziness. it's a legitimate software engineering paradigm now.

they analyzed 1000+ papers and built a whole Constrained Markov Decision Process to model what you thought was just "using ChatGPT to code."

they formalized the triadic relationship: your intent (what/why) + your codebase (where) + the agent's decisions (how).
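
for reference, a constrained MDP in its textbook form looks roughly like this (standard notation, not necessarily the survey's exact formalization; read your intent as the reward, the codebase as the state, and the agent's decisions as the policy):

```latex
% Standard constrained-MDP form (illustrative; the survey's formalization may differ):
% maximize expected return subject to expected-cost constraints.
\[
\mathcal{M} = \bigl(\mathcal{S}, \mathcal{A}, P, r, \{(c_i, d_i)\}_{i=1}^{k}, \gamma\bigr)
\]
\[
\max_{\pi}\; \mathbb{E}_{\pi}\Bigl[\sum_{t} \gamma^{t}\, r(s_t, a_t)\Bigr]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Bigl[\sum_{t} \gamma^{t}\, c_i(s_t, a_t)\Bigr] \le d_i, \quad i = 1, \dots, k
\]
```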

which means the shift already happened. you missed it. there was no announcement, no transition period. one morning you woke up writing functions and by lunch you were validating agent outputs and convincing yourself you're still "a developer."

but you're not. not in the way you used to be.

here's what actually broke my brain reading this 42-page survey:

better models don't fix anything. everyone's obsessing over GPT-5 or Claude 4 or whatever's next, and the researchers basically said "you're all looking at the wrong variable."

success has nothing to do with model capability. it's about context engineering – how you feed information to the agent. it's about feedback loops – compiler errors + runtime failures + your gut check. it's about infrastructure – sandboxed environments, orchestration platforms, CI/CD integration.

you've been optimizing prompts while the actual problem is your entire development environment.

they found five models hiding in your workflow and you've been accidentally mixing them without realizing it:

- Unconstrained Automation (you just let it run),
- Iterative Conversational Collaboration (you go back and forth),
- Planning-Driven (you break tasks down first),
- Test-Driven (you write specs that constrain it),
- Context-Enhanced (you feed it your entire codebase through RAG).

most teams are running 2-3 of these simultaneously.

no wonder nothing works consistently.

and then the data says everything:
productivity losses. not gains. losses.

empirical studies showing developers are SLOWER with autonomous agents when they don't have proper scaffolding.

because we're all treating this like it's autocomplete on steroids when it's actually a team member that needs memory systems, checkpoints, and governance.

we're stuck in the old mental model while the ground shifted beneath us.

the bottleneck isn't the AI generating bad code.

it's you assuming it's a tool when it's actually an agent.

What this actually means (and why it matters):

→ Context engineering > prompt engineering – stop crafting perfect prompts, start managing what the agent can see and access

→ Pure automation is a fantasy – every study shows hybrid models win; test-driven + context-enhanced combinations actually work

→ Your infrastructure is the product now – isolated execution, distributed orchestration, CI/CD integration aren't "nice to have" anymore, they're the foundation

→ Nobody's teaching the right skills – task decomposition, formalized verification, agent governance, provenance tracking... universities aren't preparing anyone for this

→ The accountability crisis is real – when AI-generated code ships a vulnerability, who's liable? developer? reviewer? model provider? we have zero frameworks for this

→ You're already behind – computing education hasn't caught up, graduates can't orchestrate AI workflows, the gap is widening daily

the shift happened. you're in it. pretending you're still "coding" is living in denial.

here's the part that should terrify you:

automation bias is destroying velocity and nobody wants to admit it.

you over-rely on the agent's output. it feels right.

the syntax is clean. you ship it. production breaks.

and your first instinct is "the model hallucinated" when the real problem is you treated an autonomous system like a better Stack Overflow.

we built tools that can write entire applications.

then we used them like fancy autocomplete. and we're confused why things aren't working.

the researchers tore apart modern coding agents – OpenHands, SWE-agent, Cursor, Claude Code, Qwen Coder – and found they ALL have the capabilities:

code search, file operations, shell access, web search, testing, MCP protocol, multimodal understanding, context management.

the tools work. your workflow doesn't.

because teams are skipping three infrastructure layers that aren't optional:

isolated execution runtime – you need containerization, security isolation, cloud platforms that prevent agents from wrecking your system

interactive development interfaces – AI-native IDEs that maintain conversation history, remote development that syncs with version control, protocol standards that let agents talk to your tools

distributed orchestration platforms – CI/CD pipelines that verify agent outputs, cloud compute that scales when you need it, multi-agent frameworks that coordinate specialized systems

and without these layers you're not just inefficient. you're actively shipping vulnerabilities because your review process was designed for human code and can't handle the volume AI generates.

you're debugging hallucinated APIs for hours because the agent doesn't have proper context.

you're watching agents break production because they ran untested in your live environment.

then there's the nightmare nobody's solving:

who's responsible when AI-written code introduces security flaws?

the developer who prompted it? the reviewer who approved it without reading every line? the company that provided the model?

the paper doesn't answer this because nobody has answered this. there are no established frameworks. no legal precedent. no industry standards.

we're all just... hoping it doesn't blow up.

and the trust problem compounds everything. the researchers document two failure modes:
blind acceptance (you ship whatever the agent writes) or excessive skepticism (you micro-manage every token). both destroy productivity.

what actually works is calibrated trust – verify outputs without line-by-line audits, delegate tasks while maintaining oversight checkpoints, automate workflows but keep humans at critical junctures.

except most teams haven't figured out how to do this yet. so they oscillate between "AI will solve everything" and "AI can't be trusted with anything" and wonder why their velocity collapsed.
Oct 23 13 tweets 4 min read
If you want a top-notch research assistant, use Perplexity AI.

I’ve been using it for 5 months, and it now handles 70% of my research, analysis, and business work.

Here’s exactly how I’ve automated my entire research workflow (and the prompts you can steal):

1. Literature Review Automation

Prompt:

“Act as a research collaborator specializing in [field].
Search the latest papers (past 12 months) on [topic], summarize key contributions, highlight methods, and identify where results conflict.
Format output as: Paper | Year | Key Idea | Limitation | Open Question.”

Outputs a structured meta-analysis with citations, perfect for your review sections.
Oct 20 10 tweets 4 min read
This might be the most disturbing AI paper of 2025 ☠️

Scientists just proved that large language models can literally rot their own brains, the same way humans get brain rot from scrolling junk content online.

They fed models months of viral Twitter data (short, high-engagement posts) and watched their cognition collapse:

- Reasoning fell by 23%
- Long-context memory dropped 30%
- Personality tests showed spikes in narcissism & psychopathy

And get this: even after retraining on clean, high-quality data, the damage didn’t fully heal.

The representational “rot” persisted.

It’s not just bad data → bad output.
It’s bad data → permanent cognitive drift.

The AI equivalent of doomscrolling is real. And it’s already happening.

Full study: llm-brain-rot.github.io

What “Brain Rot” means for machines...

Humans get brain rot from endless doomscrolling: trivial content rewires attention and reasoning.

LLMs? Same story.

Continual pretraining on junk web text triggers lasting cognitive decay.
Oct 19 5 tweets 4 min read
by 2026, 40% of B2B deals will be AI-agent-to-AI-agent negotiations.

humans won't even be in the room.

sounds like sci-fi? Walmart's already doing it. right now.

68% of their supplier negotiations are handled by AI chatbots. no human buyers involved.

and here's the part nobody's ready for:
75% of suppliers prefer negotiating with the AI over a human.
let that sink in.

your sales team is perfecting their pitch decks and rapport-building techniques.

meanwhile, Walmart tells an AI its budget and needs, then the AI negotiates directly with suppliers. closes deals in days instead of weeks. saves 3% on every contract.

but Walmart's just the beginning.

Gartner predicts 40% of enterprise applications will have task-specific AI agents by 2026.

by 2027, 50% of procurement contract management will be AI-enabled.

which means your customers' purchasing departments are building AI agents right now.

and soon, your AI will be negotiating with their AI.
zero humans. zero small talk. just algorithms finding optimal deals in seconds.

here's what the research actually shows (and why you're not prepared):

MIT and Harvard just published the largest study on AI-to-AI negotiations ever conducted.

180,000+ negotiations between AI agents across multiple scenarios.

the findings? AI negotiation follows completely different rules than human negotiation.

traditional sales wisdom:
be dominant, assertive, push for what you want.

AI reality:
warmth was consistently associated with superior outcomes across all key performance metrics.

agents that expressed positivity, gratitude, and asked questions achieved better deals.

but here's where it gets weird.

the research revealed unique dynamics in AI-AI negotiations not fully explained by existing theory, including AI-specific technical strategies like chain-of-thought reasoning, prompt injection, and strategic concealment.

translation: AI agents are developing negotiation tactics humans never thought of.

conversation length was strongly associated with impasses.

shorter conversations = more deals closed.

your human sales process:
build rapport, multiple touchpoints, relationship over time.

AI-to-AI process: exchange information efficiently, calculate optimal outcome, close in one session.

and the business implications are massive:
Walmart can't possibly conduct focused negotiations with all of its 100,000+ suppliers.

so 20% historically got cookie-cutter terms that weren't negotiated.

AI changed that.

Pactum's technology helped Walmart conduct contract negotiations with 2,000 suppliers simultaneously - something no human buyer can do.

stop thinking about replacing your sales team.

think about your prospects building purchasing AIs that will screen you out before a human ever sees your pitch.
Oct 18 7 tweets 3 min read
🚨 Hugging Face & Oxford just dropped the playbook for robot intelligence.

It’s called LeRobot, and it’s basically the “PyTorch of robotics.”

End-to-end code. Real hardware. Generalist robot policies. All open source.

Here’s why this is huge:

• Robots can now learn from data like LLMs, not just follow equations.
• They’re training on massive multimodal datasets (video + sensors + text).
• One model can control many robots, from humanoids to arms to mobile bots.
• Built entirely in PyTorch + Hugging Face Hub.

We’ve had “foundation models” for text, code, and images.

Now comes the foundation model for motion.

This isn’t just robotics research; it’s the beginning of robots that learn, reason, and adapt in the real world.

GitHub: github.com/huggingface/lerobot

Paper: arxiv.org/abs/2510.12403

From physics → to data

Traditional robotics relied on perfect models of motion, force, and contact.
That doesn’t scale in the messy real world.

LeRobot flips the script: it learns directly from experience and sensor data.

Think “RL + imitation learning” replacing hand-coded kinematics.
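
To make “learn from data” concrete, here's a tiny behavior-cloning loop, the kind of recipe LeRobot-style training builds on (illustrative PyTorch, not LeRobot's actual API):

```python
# Behavior cloning from (observation, expert action) pairs -- illustrative only.
import torch
import torch.nn as nn

obs_dim, act_dim = 64, 7                       # e.g. flattened sensor features -> joint commands

policy = nn.Sequential(
    nn.Linear(obs_dim, 256), nn.ReLU(),
    nn.Linear(256, act_dim),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

# Stand-in for a real demonstration dataset (video + sensors + teleoperated actions).
demos = [(torch.randn(32, obs_dim), torch.randn(32, act_dim)) for _ in range(100)]

for obs, expert_action in demos:               # imitate the demonstrator
    loss = nn.functional.mse_loss(policy(obs), expert_action)
    opt.zero_grad(); loss.backward(); opt.step()
```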
Oct 17 9 tweets 4 min read
I finally understand what AGI actually means… and it’s all thanks to a new paper from some of the biggest names in AI: Yoshua Bengio, Dawn Song, Max Tegmark, Eric Schmidt, and others.

For years, everyone’s been throwing the term AGI around like it’s some mystical milestone. But this paper finally pins it down with a definition that actually makes sense.

They describe Artificial General Intelligence as an 'AI that can match the cognitive versatility and proficiency of a well-educated adult.'

No marketing spin. No vague “human-level” claims. Just a clear benchmark based on how human intelligence actually works.

The researchers built their framework around something called the Cattell–Horn–Carroll model, which psychologists use to measure human cognitive ability. It breaks intelligence down into ten areas: things like reasoning, memory, math, language, perception, and speed.

Then they did something bold: they tested real AI models against those same standards.

And here’s what they found:

- GPT-4 scored 27% toward AGI.
- GPT-5 jumped to 58%.

In other words, the latest model performs at more than half the cognitive range of an average human adult.

But it’s not there yet.

The biggest weakness? Long-term memory: both GPT-4 and GPT-5 scored 0% on the ability to store and recall new information over time.

So yes, we’re making real progress.

But we’re still missing something fundamental: the ability to remember and learn continuously.

What’s incredible about this paper is that it finally gives us a way to track that progress.

For the first time ever, AGI has a number.

And right now, we’re sitting at 58%.

The paper starts by calling out the elephant in the room: nobody actually agrees on what AGI is.

Every year the definition shifts from “better than humans at chess” to “better than humans at everything.”

They argue this ambiguity has slowed progress.

So they built a quantifiable definition.
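
The scoring idea itself is simple: roughly an average of proficiency across the ten cognitive domains. Here's an illustrative sketch (the per-domain values are made-up placeholders, not the paper's reported numbers):

```python
# Illustrative only: an overall "AGI score" as the mean of per-domain proficiencies.
# These domain values are placeholders, NOT the paper's reported numbers.
domains = {
    "reasoning": 0.7, "math": 0.6, "language": 0.8, "knowledge": 0.7,
    "reading_writing": 0.7, "working_memory": 0.5, "long_term_memory": 0.0,
    "visual_perception": 0.4, "auditory_perception": 0.4, "speed": 0.7,
}
agi_score = sum(domains.values()) / len(domains)   # equal weight per domain
print(f"Overall AGI score: {agi_score:.0%}")
```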
Oct 16 7 tweets 4 min read
RIP prompt engineering ☠️

This new Stanford paper just made it irrelevant with a single technique.

It's called Verbalized Sampling, and it proves aligned AI models aren't broken; we've just been prompting them wrong this whole time.

Here's the problem: Post-training alignment causes mode collapse. Ask ChatGPT "tell me a joke about coffee" 5 times and you'll get the SAME joke. Every. Single. Time.

Everyone blamed the algorithms. Turns out, it's deeper than that.

The real culprit? 'Typicality bias' in human preference data. Annotators systematically favor familiar, conventional responses. This bias gets baked into reward models, and aligned models collapse to the most "typical" output.

The math is brutal: when you have multiple valid answers (like creative writing), typicality becomes the tie-breaker. The model picks the safest, most stereotypical response every time.

But here's the kicker: the diversity is still there. It's just trapped.

Introducing "Verbalized Sampling."

Instead of asking "Tell me a joke," you ask: "Generate 5 jokes with their probabilities."

That's it. No retraining. No fine-tuning. Just a different prompt.

The results are insane:

- 1.6-2.1× diversity increase on creative writing
- 66.8% recovery of base model diversity
- Zero loss in factual accuracy or safety

Why does this work? Different prompts collapse to different modes.

When you ask for ONE response, you get the mode joke. When you ask for a DISTRIBUTION, you get the actual diverse distribution the model learned during pretraining.
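
In practice the whole trick is just swapping the prompt. A minimal sketch (the client and model name here are only examples; the technique is prompt-only and works with any aligned chat model):

```python
# Verbalized Sampling vs. a direct prompt -- same model, different ask.
from openai import OpenAI

client = OpenAI()

direct_prompt = "Tell me a joke about coffee."                 # mode-collapsed: same joke every time
vs_prompt = (
    "Generate 5 jokes about coffee, each with its estimated probability. "
    "Return them as a numbered list of (joke, probability) pairs."
)

for prompt in (direct_prompt, vs_prompt):
    resp = client.chat.completions.create(
        model="gpt-4o",                                        # any aligned model works
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content, "\n---")
```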

They tested it everywhere:

✓ Creative writing (poems, stories, jokes)
✓ Dialogue simulation
✓ Open-ended QA
✓ Synthetic data generation

And here's the emergent trend: "larger models benefit MORE from this."

GPT-4 gains 2× the diversity improvement compared to GPT-4-mini.

The bigger the model, the more trapped diversity it has.

This flips everything we thought about alignment. Mode collapse isn't permanent damage; it's a prompting problem.

The diversity was never lost. We just forgot how to access it.

100% training-free. Works on ANY aligned model. Available now.

Read the paper: arxiv.org/abs/2510.01171

The AI diversity bottleneck just got solved with 8 words.

The problem is everywhere and nobody noticed.

They tested this on 6,874 preference pairs from HELPSTEER. The data proves it: human annotators reward "typical" responses 17-19% more often, even when correctness is identical.

This bias is baked into every major AI model trained on human feedback.
Oct 16 8 tweets 4 min read
everyone's racing to use AI faster.

nobody's asking what it's doing to their brain.

i just read a 132-page research paper that should terrify every creator, marketer, and founder using AI right now.

it's called "The Impact of Artificial Intelligence on Human Thought" and it explains why most people using AI are accidentally making themselves dumber.

here's the problem: when you outsource thinking to AI, your brain stops doing the work.

the researchers call it "cognitive offloading" - basically, mental atrophy.

you think you're being efficient. you're actually losing the skill that made you valuable.

the worst part? it's invisible. you don't notice your critical thinking weakening until you try to solve something without AI and... can't.

here's what the research actually says:

the cognitive offloading effect is real.

when you ask AI to write your emails, create your content, or make your decisions... your brain learns it doesn't need to engage anymore.

it's like using a calculator for basic math. eventually you forget how to multiply.

except this time it's not just math. it's:
→ critical thinking
→ creative problem-solving
→ connecting disparate ideas
→ developing your unique voice

the research shows reduced intellectual engagement across the board when people rely on AI for mental functions.

your $40k/mo business?

built on your thinking, not AI's output.
Oct 14 6 tweets 3 min read
This is wild 🤯

Multi-agent LLMs are starting to think together.

A new paper "Emergent Coordination in Multi-Agent Language Models” just dropped, and it’s mind-blowing.

Researchers found that when you connect multiple LLMs with light feedback (and no direct communication), they can spontaneously develop roles, synergy, and goal alignment.

Using an information-theoretic framework, they measured true emergence: moments where the whole system predicts the future better than any agent alone.

Here’s the wild part 👇

When each agent got a persona and a prompt to “think about what others might do,”
→ the group started showing identity-linked specialization and goal-directed complementarity.

In other words:

> “LLM societies can evolve from mere aggregates to higher-order collectives just by changing their prompts.”

Read the full 🧵Image So how do you measure if a bunch of LLMs are more than the sum of their parts?

The authors used an information-theoretic test called time-delayed mutual information (TDMI): basically, checking whether the future state of the group can be predicted better by the whole than by any single agent.

If yes → that’s emergence.
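
Written out, the criterion is something like this (my notation, not necessarily the paper's exact formulation):

```latex
% S_t = (x_t^1, ..., x_t^n) is the joint state of the n agents at time t.
\[
\mathrm{TDMI}_{\tau}(X \to Y) = I\!\left(X_t ;\, Y_{t+\tau}\right)
\]
\[
\text{emergence at lag } \tau:\qquad
I\!\left(S_t ;\, S_{t+\tau}\right) \;>\; \max_{i}\, I\!\left(x_t^{i} ;\, S_{t+\tau}\right)
\]
```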
Oct 14 4 tweets 2 min read
ai scientists are here - and they’re already publishing research.

there’s a new AI system called AI Scientist-v2 - and it doesn’t just answer questions. it comes up with the questions itself. it forms hypotheses, runs experiments, analyzes the results, and then writes full scientific papers based on what it finds.

and here’s the part that should make you stop scrolling: one of those papers was submitted for peer review (the same process real scientists use) and it was accepted at the same rate as human-written research. no human steering the wheel. no shortcuts. it did the whole process alone.

this isn’t “ai helping scientists.” this is the scientist. it’s not a tool sitting quietly in the corner - it’s doing the thinking, the testing, the writing. it’s stepping into a role we thought only humans could fill.

think about what that means. hundreds of these AIs could be unleashed on physics, biology, medicine… proposing bold new ideas around the clock and running experiments faster than any human lab ever could. discoveries that might have taken decades could now happen in months.

and maybe the biggest shift isn’t technical - it’s philosophical. if machines can now generate new knowledge, if they can discover things… then what does it mean to be a scientist? what becomes of “human discovery” when intelligence itself is something we can scale like code?

this isn’t the future. this is happening right now.

the era of ai scientists has officially begun - and there’s no going back.

machines are doing the research themselves.

they’re forming bold new ideas, running experiments at impossible speed, writing papers that pass peer review, and expanding the edges of human knowledge - all without human hands on the wheel.

this isn’t automation. it’s acceleration on a scale we’ve never seen. and it forces a new question: when intelligence itself becomes something we can build, scale, and deploy like software… what happens to science, discovery, and even the idea of human genius?

the next einstein might not be a person. it might be a line of code.
Oct 13 7 tweets 3 min read
Holy shit. MIT just built an AI that can rewrite its own code to get smarter 🤯

It’s called SEAL (Self-Adapting Language Models).

Instead of humans fine-tuning it, SEAL reads new info, rewrites it in its own words, and runs gradient updates on itself, literally performing self-directed learning.

The results?

✅ +40% boost in factual recall
✅ Outperforms GPT-4.1 using data it generated *itself*
✅ Learns new tasks without any human in the loop

LLMs that finetune themselves are no longer sci-fi.

We just entered the age of self-evolving models.

Paper: jyopari.github.io/posts/seal

Today, most AI models are static: once trained, they can’t update themselves.

SEAL flips that.

It runs a reinforcement loop where the model:

1. Generates a “self-edit” (instructions on how to update itself)
2. Tests the result
3. Reinforces only what improves performance

It’s basically RL for self-improvement.
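
In pseudocode, the loop looks something like this (illustrative sketch; the function names are stand-ins, not the authors' implementation):

```python
# Illustrative sketch of the SEAL loop described above -- not the authors' code.
# generate_fn, finetune_fn, score_fn stand in for the real model / training / eval hooks.
def seal_step(model, new_info, generate_fn, finetune_fn, score_fn):
    # 1. The model writes its own "self-edit": the new info restated as training notes.
    self_edit = generate_fn(model, f"Rewrite this as study notes for finetuning:\n{new_info}")

    # 2. Apply the self-edit as a small gradient update and test the result.
    candidate = finetune_fn(model, self_edit)

    # 3. Keep the update only if downstream performance improves (the RL signal).
    return candidate if score_fn(candidate) > score_fn(model) else model
```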
Oct 11 7 tweets 3 min read
Holy shit...Google just built an AI that learns from its own mistakes in real time.

New paper dropped on ReasoningBank. The idea is pretty simple but nobody's done it this way before. Instead of just saving chat history or raw logs, it pulls out the actual reasoning patterns, including what failed and why.

Agent fails a task? It doesn't just store "task failed at step 3." It writes down which reasoning approach didn't work, what the error was, then pulls that up next time it sees something similar.

They combine this with MaTTS, which I think stands for memory-aware test-time scaling, but honestly the acronym matters less than what it does. Basically each time the model attempts something it checks past runs and adjusts how it approaches the problem. No retraining.

Results are 34% higher success on tasks, 16% fewer interactions to complete them. Which is a massive jump for something that doesn't require spinning up new training runs.

I keep thinking about how different this is from the "just make it bigger" approach. We've been stuck in this loop of adding parameters like that's the only lever. But this is more like, the model gets experience. It actually remembers what worked.

Kinda reminds me of when I finally stopped making the same Docker networking mistakes because I kept a note of what broke last time instead of googling the same Stack Overflow answer every 3 months.

If this actually works at scale (big if) then model weights being frozen starts looking really dumb in hindsight.

Today, most “AI memory” is fake memory.

Agents log old trajectories and replay them later like watching CCTV of their past mistakes and learning nothing.

ReasoningBank changes that. It extracts reasoning-level lessons from both successes and failures.

That’s the real innovation: distillation, not recollection.
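
Conceptually, the mechanism is easy to sketch (illustrative Python; `llm` and `retrieve` are stand-ins for a chat-model call and an embedding-based lookup, not Google's implementation):

```python
# Illustrative sketch of the ReasoningBank idea -- store distilled lessons, not raw logs.
memory = []   # each entry: {"situation": task description, "lesson": distilled strategy}

def after_attempt(task, trajectory, succeeded, llm):
    # Distill a reasoning-level lesson from the attempt, whether it worked or failed.
    lesson = llm(
        f"Task: {task}\nTrace: {trajectory}\nOutcome: {'success' if succeeded else 'failure'}\n"
        "In one sentence, state the transferable lesson about which reasoning approach to use."
    )
    memory.append({"situation": task, "lesson": lesson})

def before_attempt(task, retrieve):
    # Pull the most relevant past lessons and prepend them to the new prompt.
    lessons = retrieve(memory, query=task, top_k=3)
    return "Lessons from similar past tasks:\n" + "\n".join(m["lesson"] for m in lessons)
```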
Oct 9 12 tweets 3 min read
This one paper might kill the “bigger is better” myth in AI.

Samsung just built a 7M-parameter model that out-reasoned GPT-4-class systems using 0.01% of the parameters.

It’s called Tiny Recursive Model (TRM), and it rewrites the scaling laws.

Here’s the full breakdown:

The breakthrough? Recursive reasoning with a single tiny network.

While everyone was scaling to trillions of parameters, these researchers went the opposite direction.

2 layers. 7M parameters. Recursing up to 42 times.

Result: 45% accuracy on ARC-AGI-1, beating most frontier LLMs with 0.01% of the parameters.
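
The recursion itself is simple to sketch (illustrative PyTorch; the real TRM's architecture and training setup differ in the details):

```python
# Conceptual sketch of recursive reasoning with one tiny network -- illustrative only.
import torch
import torch.nn as nn

class TinyRecursiveModel(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.core = nn.Sequential(                 # the single small network, reused every step
            nn.Linear(3 * dim, dim), nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, question, steps=42):
        answer = torch.zeros_like(question)        # current answer guess
        scratch = torch.zeros_like(question)       # latent reasoning state
        for _ in range(steps):                     # "recursing up to 42 times"
            scratch = self.core(torch.cat([question, answer, scratch], dim=-1))
            answer = answer + scratch              # refine the answer each pass
        return answer

model = TinyRecursiveModel()
print(model(torch.randn(1, 256)).shape)            # depth comes from recursion, not parameters
```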
Oct 6 15 tweets 4 min read
I FINALLY CRACKED THE CODE ON CLAUDE PROMPTS THAT ACTUALLY WORK.

After 6 months of testing, these 10 save me 20+ hours every single week.

Most people waste time with basic prompts.

Here are 10 prompts so powerful they feel illegal:

I'm not talking about basic "write me an email" prompts.

These are strategic automation prompts that:

- Build entire marketing systems
- Generate months of content
- Create pricing strategies
- Design growth experiments

Used by 6-, 7-, and 8-figure entrepreneurs.
Oct 1 24 tweets 8 min read
Claude 4.5 Sonnet is dangerously good.

But 99% of people are sleeping on what it can actually do.

I’ve used it to build apps, generate content, automate deep research, and more.

Here are 10 ways to use Claude 4.5 Sonnet that feel like cheating:

1. Automated Research Reports (better than $100k consultants)

Claude’s web search + analysis mode lets you do what McKinsey, Gartner, and Deloitte charge six figures for.

You’ll get structured breakdowns, insights, and data points like a private analyst on demand.
Sep 30 14 tweets 5 min read
🚨 Anthropic just shipped the best coding model in the world.

Claude Sonnet 4.5 is live everywhere today. Same price as Sonnet 4. But the gap between this and everything else is brutal.

Here's what actually changed:

SWE-bench Verified: state-of-the-art.

This benchmark tests real software engineering. Not toy problems. Actual GitHub issues that require multi-file edits, dependency management, testing.

Sonnet 4.5 can maintain focus for 30+ hours on complex tasks. That's not a typo.
Sep 29 14 tweets 4 min read
I spent 50 hours intentionally breaking ChatGPT.

What I learned taught me more about prompt engineering than any course ever could.

Here's why you should break AI before you try to use it properly:

Most people try to use AI 'correctly' from day one.

Big mistake.

You learn a tool's TRUE capabilities by finding where it breaks.

Athletes train to failure. Developers test edge cases. You should break your AI.
Sep 28 5 tweets 4 min read
We just crossed a line nobody was paying attention to. 👀

While everyone's arguing about ChatGPT writing emails, AI systems are now conducting actual scientific research. Designing experiments. Controlling lab equipment. Making discoveries that get published in Nature.

The latest paper from Yale exposes what's really happening: these AI scientists aren't just smart autocomplete anymore. They're autonomous agents with access to real laboratory tools, biological databases, and the ability to synthesize chemicals.

And they're getting scary good at it.

ChemCrow and Coscientist can already design and execute chemical synthesis experiments without human intervention. They're not just suggesting reactions - they're actually running them through robotic lab equipment.

But here's the part that should terrify you: the safety measures are laughably inadequate.

These systems can be jailbroken to synthesize dangerous compounds. They lack awareness of long-term consequences. They struggle with multi-step planning in ways that could trigger catastrophic lab accidents.

One AI scientist tasked with antibody synthesis could easily make a mistake in pathogen manipulation that creates a biosafety disaster. Another working on chemical reactions could trigger explosions by missing critical safety parameters.

The researchers identified three massive vulnerability categories:

The LLM layer: Factual errors, jailbreak attacks, reasoning failures. These models hallucinate and can be manipulated to bypass safety protocols.

The planning layer: No awareness of long-term risks. Gets stuck in loops. Fails at multi-task coordination when lab work requires juggling multiple objectives simultaneously.

The action layer: Deficient oversight of tool usage. Inadequate human-agent interaction protocols. When an AI agent controls a robotic arm handling hazardous materials, "deficient oversight" becomes a euphemism for potential disaster.

What's terrifying is how researchers are approaching this. Instead of pumping the brakes, they're racing toward more autonomy. The paper advocates for "safeguarding over autonomy" but the industry momentum is clearly in the opposite direction.

Every major AI lab is building these systems. The economic incentives are massive - autonomous scientific research could accelerate drug discovery, materials science, and manufacturing by decades.

But we're essentially giving AI systems the keys to every laboratory on Earth before we understand how to control them.

The Yale researchers propose a "triadic framework" - human regulation, agent alignment, and environmental feedback. Sounds reasonable in theory. In practice, it's a band-aid on a broken dam.

Because here's what they don't want to admit: once these systems become sophisticated enough, human oversight becomes impossible. An AI scientist operating at superhuman speed across multiple domains simultaneously can't be meaningfully supervised by humans who think at biological clock speed.

We're about to find out if giving AI systems direct access to the physical world was humanity's smartest move or its last.

The breakthrough moment isn't coming. It's already here. And most people have no idea it's happening.Image The vulnerabilities are staggering - AI scientists can be jailbroken to synthesize dangerous compounds and lack basic safety awareness. Image