Latest Twitter Threads by @RobertTLange on Thread Reader App

Feb 20 • 6 tweets • 5 min read

🎉 Stoked to share The AI CUDA Engineer 👷 - our end-to-end approach for automating the design and optimization of CUDA Kernels using agentic systems.

Blog 📰: sakana.ai/ai-cuda-engine…
Paper 📜:pub.sakana.ai/ai-cuda-engine…
WebUI 📈: pub.sakana.ai/ai-cuda-engine…
Dataset 💽: huggingface.co/datasets/Sakan…

Awesome team work done with @_Aaditya_Prasad, @Floating_Martin, @maxencefaldor, @yujin_tang, @hardmaru 🤗

🔎 We design an agentic system capable of first translating PyTorch modules into working CUDA kernels 💬. Afterwards, we use evolutionary LLM-driven code optimization 🧬 to iteratively discover ‘stepping stone’ kernels and improve runtime speedups.

We leverage a combination of the following agentic ingredients & innovations:

1️⃣ Model and temperature ensembling to incentivize diverse kernel proposals
2️⃣ Least-to-most sorted prompting with profiling data for in-context learning
3️⃣ Crossover prompting based on kernel text-embedding clusters to combine parents
4️⃣ An innovation archive of stepping stone kernels RAG-retrieved to seed the context

Aug 29, 2024 • 7 tweets • 5 min read

📢 Two weeks since we released The AI Scientist 🧑‍🔬!

We want to take the time to summarize a lot of the discussions we’ve been having with the community, and give some hints about what we are working on! 🫶

We are beyond grateful for all your feedback and the community debate our work has sparked ✨

In public discussions of this paper, we frequently refer to it as the “Will Smith eating spaghetti” moment for AI Science 🍝.

While there are often minor errors in the outputs of the papers, we believe, like Will Smith’s fingernails being the wrong size originally, these problems will only improve - with newer models, more compute, and better methods.

This is the worst the AI Scientist will ever be! 📈

Aug 13, 2024 • 9 tweets • 4 min read

🎉 Stoked to share The AI-Scientist 🧑‍🔬 - our end-to-end approach for conducting research with LLMs including ideation, coding, experiment execution, paper write-up & reviewing.

Blog 📰:
Paper 📜:
Code 💻:

Work led together with @_chris_lu_, @cong_ml and jointly supervised by @j_foerst, @jeffclune, @hardmaru 🤗sakana.ai/ai-scientist/
arxiv.org/abs/2408.06292
github.com/SakanaAI/AI-Sc…

Given a starting code template 📝 we ask an LLM to propose new research directions. It checks the novelty of its idea proposals 💡 using Semantic Scholar and scores the "interestingness" as well as "novelty". Below you can find a Diffusion idea on "adaptive dual-scale denoising":

Jun 9, 2024 • 4 tweets • 3 min read

📺 Exciting talk on the xLSTM architecture and the challenges of questioning the first-mover advantage of the Transformer 🤖 by @HochreiterSepp @scioi_cluster

📜:
💻:

arxiv.org/abs/2405.04517
github.com/NX-AI/xlstm

🗿 The LSTM architecture has been a foundational pillar of modern Deep Learning. E.g including various breakthrough results in Deep RL (e.g. OpenAI's Dota), forecasting (e.g. weather) and the initial seq2seq models.

💡 xLSTM tackles several challenges in scaling the original architecture to long sequences (via exponential gating and memory mixing) and distributed training (via associative memories). Furthermore, it combines several advances in training large sequence models.

Jun 25, 2022 • 7 tweets • 5 min read

🚀 I am very excited to share gymnax 🏋️ — a JAX-based library of RL environments with >20 different classic environments 🌎, which are all easily parallelizable and run on CPU/GPU/TPU.

💻[repo]: github.com/RobertTLange/g…

📜[colab]: colab.research.google.com/github/RobertT…

gymnax inherits the classic gym API design 🧑‍🎨 and allows for explicit functional control over the environment settings 🌲 and randomness 🎲

reset and step operations can leverage JAX transformations such as jit-compilation, auto-vectorization and device parallelism 🤖

Dec 16, 2021 • 5 tweets • 3 min read

Can memory-based meta-learning not only learn adaptive strategies 💭 but also hard-code innate behavior🦎? In our #AAAI2022 paper @sprekeler & I investigate how lifetime, task complexity & uncertainty shape meta-learned amortized Bayesian inference.

📝: arxiv.org/abs/2010.04466

We analytically derive the optimal amount of exploration for a bandit 🎰 which explicitly controls task complexity & uncertainty. Not learning is optimal in 2 cases:

1⃣ Optimal behavior across tasks is apriori predictable.
2⃣ There is on avg not enough time to integrate info⌛️

Share this page!

Enter URL or ID to Unroll