Post

How to get URL link on X (Twitter) App

On the Twitter thread, click on or icon on the bottom
Click again on or Share Via icon
Click on Copy Link to Tweet
Paste it above and click "Unroll Thread"!
More info at Twitter Help

elvis

@omarsar0

Jul 10, 2025 • 21 tweets • 6 min read • Read on X

Scrolly

BREAKING: xAI announces Grok 4

"It can reason at a superhuman level!"

Here is everything you need to know:

Elon claims that Grok 4 is smarter than almost all grad students in all disciplines simultaneously.

100x more training than Grok 2.

10x more compute on RL than any of the models out there.

Performance on Humanity's Last Exam

Elon: "Grok 4 is post-grad level in everything!"

Scaling HLE - Training

More compute, higher intelligence.

(no tools)

With native tool calling, Grok 4 increases the performance significantly.

Look at those curves!

It's important to give AI the right tools. The scaling is clear. Crazy!

Reliable signals are key to making RL work.

There is still the challenge of data.

Elon: "Ultimate reasoning test is AI operating in reality."

Scaling test-time compute

More than 50% of the text-only subset of the HLE problems are solved!

The curves keep getting more ridiculous.

Grok 4 is the single-agent version.

Grok 4 Heavy is the multi-agent version.

Multi-agent systems are no joke!

Grok 4 is being used to predict the World Series champions for this year.

These are the interesting tasks that reasoning models need to be tested on. On actual real-world events.

A visualization of two black holes colliding.

Grok 4 uses all kinds of references like papers, reads PDFs, reasons about the details of the simulation, and what data to use.

The example shows a summary of the timeline/changes and score announcements in the HLE.

That's pretty cool!

Multi-modal performance

Grok 4 Heavy performance is higher than Grok 4, but needs to be improved further. It's one of the weaknesses, according to the team.

Performance on Reasoning benchmarks.

Perfect score on AIME25!

Leaps are crazy compared to the last best model on these tasks.

Where to test the models.

Available as SuperGrok Heavy tier.

$30/m for Super Grok
$300/m for SuperGrok Heavy.

Voice updates included, too!

Grok feels snappier and is designed to be more natural.

- 2x faster
- 5 voices
- 10x daily user seconds

ARC-AGI

Grok 4 on ARC-AGI v2 (private subset)

It breaks the 10% barrier (15.9%).

2x the second place, which is the Claude Opus 4 model.

Grok 4 on Vending Bench

Grok 4 gets the #1 spot.

Double the net worth of Claude Opus 4.

Grok 4 models are available via the xAI API.

256K context window.

Real-time data search.

Grok 4 for Gaming!

Video understanding is an area the team is improving, so it will get better.

What is next?

Smart and fast will be the focus.

Coding models are also a big focus.

More capable multi-modal agents are coming too.

Video generation models are also on the horizon.

@elonmusk and the @xai team really cooked with Grok 4. All very exciting to see focus on AI for reality, truth-seeking, and unlocking multi-modal agents next.

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @omarsar0

elvis

@omarsar0

Jan 2

This worked better than I thought.

It's a slash command in Claude Code to write detailed specs.

The AskUserQuestion tool will drill you for even the smallest detail.

Great way to enhance vibe coding results.

Claude Code then creates a huge, detailed plan from it and executes it.

https://x.com/trq212/status/2005315275026260309?s=20

Usage: /spec-init <SPEC_DIR>

This is extremely useful for new projects, but it could be adapted easily to large features.

Or you can also start off with a SPEC of your own, as @trq212 shows here:

I just adopted it and built a slash command for reuse.

https://x.com/trq212/status/2005315275026260309?s=20

The spec-init slash command prompt, if you want to try it:

"Your task is to first help me build a spec for my new project in ARGUMENT.

Use the AskUserQuestion Tool to help build the spec in ARGUMENT by interviewing me and gathering requirements and details about the project implementation, UI & UX, tech stack, concerns, tradeoffs, etc.

Make sure questions are not obvious and probe deeper into the underlying needs and constraints.

Interview me continually and systematically until the spec is complete. Document all responses and insights to create a comprehensive and well-structured specification that serves as the foundation for the project."

Read 4 tweets

elvis

@omarsar0

Dec 3, 2025

Lindy's Agent Builder is impressive!

It's one of the easiest ways to build powerful AI Agents.

Start with a prompt, iterate on tools, and end up with a working agent in minutes.

It doesn't get any easier than this.

Full walkthrough below with prompts, tips, and use case.

1️⃣ Start with a Prompt

You basically start with a simple prompt of what you want to build.

"Help me build a deep research agent that tracks the latest AI research papers on AI Agents."

That's it. You get your first working agent generated in minutes.

2️⃣ Agent Builder & Prompt Optimization

You can then iterate on your agent using the agent builder. Optimize prompts, add tools, and customize your agent as you see fit.

The agent prompt is optimized for you to fit your use case. That's very useful.

Read 6 tweets

elvis

@omarsar0

Nov 24, 2025

This is insane! 🤯

Just built a new skill in Claude Code using Opus 4.5.

The skill uses Gemini 3 Pro (via API) for designing web pages.

Look at what it generated from one simple prompt.

If you have been designing websites with Claude Code, you already know how generic they turn out.

So I built a skill that uses Gemini 3 Pro to lead creative direction and generate designs. It is extremely good at this.

Opus 4.5 then integrates all that into our app.

The prompt I used: "I want to design the landing page for a new AI game. We want it to be futuristic and all that, and use animations as much as possible."

I will test with some other prompts and see how far I can push this. But the results are very exciting already.

Read 6 tweets

elvis

@omarsar0

Nov 23, 2025

This is one of the most insane things Nano Banana Pro 🍌 can do.

It can reproduce figures with mind-blowing precision.

No competition in this regard!

Prompt: "Please reproduce this chart in high quality and fidelity and offer annotated labels to better understand it."

When I tried this for the first time, I didn't expect that this was possible.

The level of understanding this requires is what's remarkable about it all.

The levels of personalization this unlocks are also impressive.

"Can you convert it into a cartoonish version?"

Just look at this 🤯

"Can you create a delightful cartoonish version of this table. And please put cute colors and icons along with interesting annotations to make it more readable."

Read 6 tweets

elvis

@omarsar0

Nov 22, 2025

It's finally ready for you all to try!

Have fun generating interesting insights from AI papers with Nano Banana Pro 🍌.

(bookmark it)

I find this to be a fun and interesting way to explore with Nano Banana Pro, as I can just select a part of the paper and ask away.

Try remixing figures, reproducing charts, annotating equations, explaining math, and much more.

I am polishing it some more and have other ideas, but let me know if you have feedback in the meantime.

Works better on Desktop.

…dair-ai-181664986325.us-west1.run.app

You can try it by downloading a paper from arXiv or uploading a book or any technical document.

If you don't have a PDF to try, just click on one of the example papers provided:

Read 9 tweets

elvis

@omarsar0

Nov 10, 2025

This is a wild use case!

I used Gamma + n8n to automatically generate a complete presentation on AI Agents research.

In just minutes!

It combines web search (for research), GPT-5 (narrative), and Gamma (for slide content generation).

Full workflow breakdown below 👇

1/ THE PROBLEM:

Creating visual content is time-consuming. Research takes hours. Writing requires deep focus. Design demands specialized skills.

What if AI could handle the entire pipeline?

2/ THE SOLUTION:

An n8n workflow that orchestrates Tavily for web research, GPT-5 for storytelling, Gamma for visual generation, and Google Sheets for tracking.

You provide a topic and audience. The system outputs a LinkedIn-ready carousel.

Read 9 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Enter URL or ID to Unroll

elvis

Try unrolling a thread yourself!

More from @omarsar0

elvis

elvis

elvis

elvis

elvis

elvis

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!