elvis
Feb 20, 2025 (14 tweets)
NEW: Sakana AI introduces The AI CUDA Engineer.

It's an end-to-end agentic system that can produce highly optimized CUDA kernels.

This is wild! They used AI to discover ways to make AI run faster!

Let's break it down:
The Backstory

Sakana AI's mission is to build more advanced and efficient AI using AI.

Their previous work includes The AI Scientist, LLMs that produce more efficient methods to train LLMs, and automation of new AI foundation models.

And now they have launched The AI CUDA Engineer.
Why is this research a big deal?

Writing efficient CUDA kernels is challenging for humans.

The AI CUDA Engineer is an end-to-end agent built to produce and optimize CUDA kernels automatically.
What's up with CUDA?

Writing CUDA kernels can help achieve high-performing AI algorithms.

However, this requires GPU knowledge, and most AI algorithms today are written in a higher-level abstraction layer such as PyTorch.
An Agentic Pipeline

The agent translates PyTorch code into CUDA kernels (Stages 1 & 2), then applies evolutionary optimization techniques such as crossover prompting (Stage 3), and builds an Innovation Archive (Stage 4) that reuses “stepping stone” kernels for further gains.

Components:
Stage 1: PyTorch Modules to Functions

The AI CUDA Engineer first converts a PyTorch nn.Module to Functional PyTorch using an LLM.

The code is also validated for correctness.
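
To make the idea concrete, here is a minimal sketch in plain Python (no PyTorch dependency). `LinearModule`, `linear_fn`, and `outputs_match` are hypothetical names used for illustration, not Sakana AI's code: the stateful class stands in for an nn.Module, and the function is its "functional" equivalent with parameters passed explicitly.

```python
# Plain-Python stand-in for the nn.Module -> functional conversion.
# All names here are hypothetical and only illustrate the idea.

class LinearModule:
    """Stateful version: parameters live on the object, like an nn.Module."""

    def __init__(self, weight, bias):
        self.weight = weight  # list of rows, shape [out][in]
        self.bias = bias      # list, shape [out]

    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


def linear_fn(x, weight, bias):
    """Functional version: same math, parameters passed in explicitly."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def outputs_match(module, x, tol=1e-9):
    """Validate the conversion by running both versions on the same input."""
    expected = module.forward(x)
    actual = linear_fn(x, module.weight, module.bias)
    return all(abs(e - a) <= tol for e, a in zip(expected, actual))
```

The functional form is easier to translate to a kernel because every input the computation depends on appears in the function signature.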
Stage 2: Functional PyTorch to Working CUDA

The agent translates the functional PyTorch code into a working CUDA kernel using an LLM.

The kernel is loaded and assessed for numerical correctness.
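
A numerical correctness check usually means comparing the kernel's output against a reference within a tolerance. The sketch below assumes flat lists of floats and hypothetical tolerance values; the thread does not say which tolerances or comparison the system actually uses.

```python
import math


def numerically_close(reference, candidate, rel_tol=1e-5, abs_tol=1e-8):
    """Return True if two flat lists of floats agree element-wise within tolerance.

    The tolerance defaults are illustrative assumptions, not values from the thread.
    """
    if len(reference) != len(candidate):
        return False  # a shape mismatch is always a failure
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )
```

Exact equality is too strict here, since a reordered floating-point reduction on the GPU can change the last few bits of a result without being wrong.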
Stage 3: Evolutionary CUDA Runtime Optimization

They use an evolutionary optimization process (combining advanced prompting strategies, standard LLMs, and reasoning models such as o3-mini and DeepSeek-R1) to ensure only the best CUDA kernels are produced.
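
As a rough sketch of what such a loop can look like: keep the fastest verified candidates, ask a proposer for variations of them, and discard anything that fails verification. Everything below is an assumption made for illustration. In particular, `propose` is a hypothetical stand-in for the LLM call, and `runtime` and `is_correct` stand in for benchmarking and verification.

```python
import random


def evolve(initial, propose, runtime, is_correct,
           generations=5, children=4, keep=2, seed=0):
    """Generic evolutionary loop: select the fastest correct candidates,
    then ask `propose` (hypothetical stand-in for an LLM) for new variants."""
    rng = random.Random(seed)
    population = [c for c in initial if is_correct(c)]
    if not population:
        raise ValueError("need at least one correct starting candidate")
    for _ in range(generations):
        # Selection: keep the `keep` fastest candidates as parents.
        parents = sorted(population, key=runtime)[:keep]
        offspring = []
        for _ in range(children):
            # "Crossover": show the proposer two parents (they may coincide).
            a, b = rng.choice(parents), rng.choice(parents)
            child = propose(a, b, rng)
            if is_correct(child):  # drop candidates that fail verification
                offspring.append(child)
        # Parents survive, so the best runtime never gets worse.
        population = parents + offspring
    return min(population, key=runtime)
```

Because the parents are carried over into each new generation, the best candidate found so far is never lost, even when every proposed child is slower or incorrect.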
Stage 4: Innovation Archive

RAG is used to obtain high-performing kernels from related tasks; these are provided as context (stepping stones) to achieve further translation and performance gains.

Newly discovered CUDA kernels can also be added to the archive in the process.
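
Here is a toy sketch of an archive with retrieval. A real RAG setup would most likely use embeddings; the word-overlap (Jaccard) similarity below is a deliberately simple substitute, and `ArchiveEntry` and `InnovationArchive` are hypothetical names, not the actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveEntry:
    task: str       # short task description, e.g. "softmax over rows"
    kernel: str     # kernel source code
    speedup: float  # measured speedup over the baseline


@dataclass
class InnovationArchive:
    entries: list = field(default_factory=list)

    def add(self, task, kernel, speedup):
        """Store a newly verified kernel so later tasks can reuse it."""
        self.entries.append(ArchiveEntry(task, kernel, speedup))

    def retrieve(self, task, k=2):
        """Return up to k entries from the most similar tasks.

        Similarity is Jaccard overlap of words (an assumption standing in
        for embedding search); ties are broken by higher speedup.
        """
        query = set(task.lower().split())

        def similarity(entry):
            words = set(entry.task.lower().split())
            union = query | words
            return len(query & words) / len(union) if union else 0.0

        ranked = sorted(self.entries,
                        key=lambda e: (similarity(e), e.speedup),
                        reverse=True)
        return ranked[:k]
```

The retrieved kernels would then be placed in the prompt as "stepping stones" for the next translation or optimization attempt.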
Kernel Runtime Speedups

The AI CUDA Engineer discovers CUDA kernels with speedups as high as 10-100x over native and compiled kernels in PyTorch.

It can also convert entire ML architectures into optimized CUDA kernels.
Performance:

The AI CUDA Engineer robustly translates PyTorch code to CUDA kernels.

It achieves more than a 90% translation success rate!
Highlighted AI CUDA Engineer-Discovered Kernels

The AI CUDA Engineer can robustly improve CUDA runtime.

> Outperforms PyTorch native runtimes on 81% of the 229 considered tasks
> 20% of all discovered CUDA kernels are at least twice as fast as their PyTorch implementations
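
For reference, percentages like these come from simple ratios over measured timings. The sketch below assumes speedup is defined as baseline time divided by kernel time and that there is one timing pair per measurement; the function names and data layout are hypothetical.

```python
def speedup(baseline_time, kernel_time):
    """Speedup > 1.0 means the custom kernel is faster than the baseline."""
    return baseline_time / kernel_time


def summarize(timings):
    """timings: non-empty list of (baseline_time, kernel_time) pairs.

    Returns the fraction of pairs that beat the baseline and the
    fraction that are at least twice as fast.
    """
    speedups = [speedup(b, k) for b, k in timings]
    n = len(speedups)
    return {
        "faster_than_baseline": sum(s > 1.0 for s in speedups) / n,
        "at_least_2x": sum(s >= 2.0 for s in speedups) / n,
    }
```

Note that the two figures quoted above use different denominators (tasks vs. all discovered kernels), so they would be computed over different sets of timing pairs.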
The AI CUDA Engineer Archive

The team has made available an archive of more than 17,000 verified CUDA kernels.

These can be used for downstream fine-tuning of LLMs.

There is also a website to explore verified CUDA kernels.