Founding Research Scientist @SakanaAILabs
🔬AI Scientist 🧬gymnax 🏋️ evosax 🦎 MLE-Infra 🤹
Ex: SR @Google DM. Legacy DeepMind Intern.
Aug 29, 2024 • 7 tweets • 5 min read
📢 Two weeks since we released The AI Scientist 🧑🔬!
We want to take the time to summarize a lot of the discussions we’ve been having with the community, and give some hints about what we are working on! 🫶
We are beyond grateful for all your feedback and the community debate our work has sparked ✨
In public discussions of this paper, we frequently refer to it as the “Will Smith eating spaghetti” moment for AI Science 🍝.
The generated papers often contain minor errors, but we believe that, like Will Smith’s wrongly sized fingernails in the original video, these problems will only get better - with newer models, more compute, and better methods.
This is the worst the AI Scientist will ever be! 📈
Aug 13, 2024 • 9 tweets • 4 min read
🎉 Stoked to share The AI-Scientist 🧑🔬 - our end-to-end approach for conducting research with LLMs including ideation, coding, experiment execution, paper write-up & reviewing.
Blog 📰: sakana.ai/ai-scientist/
Paper 📜: arxiv.org/abs/2408.06292
Code 💻: github.com/SakanaAI/AI-Sc…
Work led together with @_chris_lu_, @cong_ml and jointly supervised by @j_foerst, @jeffclune, @hardmaru 🤗
Given a starting code template 📝 we ask an LLM to propose new research directions. It checks the novelty of its idea proposals 💡 using Semantic Scholar and scores the "interestingness" as well as "novelty". Below you can find a Diffusion idea on "adaptive dual-scale denoising":
Jun 9, 2024 • 4 tweets • 3 min read
📺 Exciting talk on the xLSTM architecture and the challenges of questioning the first-mover advantage of the Transformer 🤖 by @HochreiterSepp @scioi_cluster
🗿 The LSTM architecture has been a foundational pillar of modern Deep Learning. It powered various breakthrough results in Deep RL (e.g. OpenAI's Dota), forecasting (e.g. weather) and the initial seq2seq models.
💡 xLSTM tackles several challenges in scaling the original architecture to long sequences (via exponential gating and memory mixing) and distributed training (via associative memories). Furthermore, it combines several advances in training large sequence models.
Jun 25, 2022 • 7 tweets • 5 min read
🚀 I am very excited to share gymnax 🏋️ — a JAX-based library of RL environments with >20 different classic environments 🌎, which are all easily parallelizable and run on CPU/GPU/TPU.
📜[colab]: colab.research.google.com/github/RobertT…
gymnax inherits the classic gym API design 🧑🎨 and allows for explicit functional control over the environment settings 🌲 and randomness 🎲
reset and step operations can leverage JAX transformations such as jit-compilation, auto-vectorization and device parallelism 🤖
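To make the functional reset/step idea concrete, here is a minimal sketch in plain JAX. It is not gymnax's implementation: the point-mass environment, the `PARAMS` dictionary, and the `demo` helper are all hypothetical and only illustrate how explicit keys and settings let `jit` and `vmap` run many environment copies at once.

```python
import jax
import jax.numpy as jnp

# Hypothetical environment settings; names are illustrative, not gymnax's.
PARAMS = {"init_range": 1.0, "force": 0.1, "noise_std": 0.01, "max_steps": 100}


def get_obs(state):
    # Observation is simply position and velocity stacked into one vector.
    return jnp.stack([state["pos"], state["vel"]])


def reset(key, params):
    # All randomness comes from the explicit PRNG key passed by the caller.
    pos = jax.random.uniform(
        key, (), minval=-params["init_range"], maxval=params["init_range"]
    )
    state = {"pos": pos, "vel": jnp.zeros(()), "t": jnp.zeros((), dtype=jnp.int32)}
    return get_obs(state), state


def step(key, state, action, params):
    # Pure function: the new state is returned, nothing is mutated in place.
    noise = params["noise_std"] * jax.random.normal(key, ())
    vel = state["vel"] + params["force"] * action + noise
    pos = state["pos"] + vel
    t = state["t"] + 1
    new_state = {"pos": pos, "vel": vel, "t": t}
    reward = -(pos**2)  # reward is highest when the point mass sits at the origin
    done = t >= params["max_steps"]
    return get_obs(new_state), new_state, reward, done


# Vectorize over keys, states and actions; share one set of params (in_axes=None).
batch_reset = jax.jit(jax.vmap(reset, in_axes=(0, None)))
batch_step = jax.jit(jax.vmap(step, in_axes=(0, 0, 0, None)))


def demo(num_envs=8, seed=0):
    # Reset num_envs copies in parallel, then take one step in each of them.
    key_reset, key_step = jax.random.split(jax.random.PRNGKey(seed))
    obs, state = batch_reset(jax.random.split(key_reset, num_envs), PARAMS)
    actions = jnp.ones(num_envs)
    return batch_step(jax.random.split(key_step, num_envs), state, actions, PARAMS)
```

Because `reset` and `step` take the key and settings as arguments instead of storing them, the same code runs for one environment or thousands, and re-running with the same seed reproduces the same rollout.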
Dec 16, 2021 • 5 tweets • 3 min read
Can memory-based meta-learning not only learn adaptive strategies 💭 but also hard-code innate behavior🦎? In our #AAAI2022 paper @sprekeler & I investigate how lifetime, task complexity & uncertainty shape meta-learned amortized Bayesian inference.
📝: arxiv.org/abs/2010.04466
We analytically derive the optimal amount of exploration for a bandit 🎰 which explicitly controls task complexity & uncertainty. Not learning is optimal in 2 cases:
1⃣ Optimal behavior across tasks is a priori predictable.
2⃣ There is on avg not enough time to integrate info⌛️
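The two cases can be illustrated with a toy calculation. This is a sketch under assumptions of my own, not the paper's derivation: arm 0 always pays 0, arm 1 has an unknown mean drawn from a Gaussian prior, and the agent pulls arm 1 for `n_explore` trials before committing to whichever arm looks better. The function names, the explore-then-commit strategy, and all default values are hypothetical.

```python
import jax.numpy as jnp
from jax.scipy.stats import norm


def expected_return(n_explore, lifetime, prior_mean=-1.0, prior_std=1.0, noise_std=1.0):
    """Expected total reward of explore-then-commit in the toy bandit (assumed setup)."""
    n = jnp.asarray(n_explore, dtype=jnp.float32)
    prior_var, noise_var = prior_std**2, noise_std**2
    # Gaussian posterior variance of arm 1's mean after n noisy pulls.
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    # Before seeing data, the posterior mean is distributed as N(prior_mean, s^2).
    s = jnp.sqrt(jnp.maximum(prior_var - post_var, 0.0))
    z = prior_mean / jnp.maximum(s, 1e-8)
    # E[max(posterior mean, 0)]: per-trial value of committing to the better-looking arm.
    exploit_value = prior_mean * norm.cdf(z) + s * norm.pdf(z)
    # Exploration trials pay prior_mean on average; the rest pay exploit_value.
    return n * prior_mean + (lifetime - n) * exploit_value


def best_exploration(lifetime, **kwargs):
    """Number of exploration trials that maximizes the expected return."""
    candidates = jnp.arange(lifetime + 1)
    return int(jnp.argmax(expected_return(candidates, lifetime, **kwargs)))
```

In this toy setting, `best_exploration(10)` comes out as zero because the lifetime is too short to repay the cost of exploring, and `best_exploration(200, prior_std=0.1)` comes out as zero because the prior already makes the better arm predictable. With a long lifetime and an uncertain prior, `best_exploration(200)` is positive.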