Can memory-based meta-learning not only learn adaptive strategies 💭 but also hard-code innate behavior🦎? In our #AAAI2022 paper @sprekeler & I investigate how lifetime, task complexity & uncertainty shape meta-learned amortized Bayesian inference.
We analytically derive the optimal amount of exploration for a bandit 🎰 that explicitly controls task complexity & uncertainty. Not learning is optimal in 2 cases:
1⃣ Optimal behavior across tasks is a priori predictable.
2⃣ There is on average not enough time to integrate information ⌛️
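Both cases can be seen in a toy simulation (a sketch of the intuition, not the paper's analytical derivation; all names and parameters are ours): an ε-greedy learner vs. a "hard-coded" policy that always pulls a fixed arm, on a 2-armed Bernoulli bandit whose best arm varies across tasks.

```python
import random

def run_episode(horizon, explore, seed):
    """Average reward on a 2-armed Bernoulli bandit whose best arm is random.

    explore=True  -> epsilon-greedy learner (adaptive strategy 💭)
    explore=False -> always pull arm 0, an innate hard-coded guess (🦎)
    """
    rng = random.Random(seed)
    best = rng.randrange(2)            # task uncertainty: best arm is random
    means = [0.4, 0.4]
    means[best] = 0.6
    counts, sums, total = [0, 0], [0.0, 0.0], 0.0
    for _ in range(horizon):
        if not explore:
            arm = 0                                 # innate, never updates
        elif 0 in counts:
            arm = counts.index(0)                   # try each arm once
        elif rng.random() < 0.1:
            arm = rng.randrange(2)                  # eps = 0.1 exploration
        else:                                       # exploit empirical means
            arm = max((0, 1), key=lambda a: sums[a] / counts[a])
        r = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total / horizon

def avg(horizon, explore, n=500):
    return sum(run_episode(horizon, explore, s) for s in range(n)) / n

# With a long lifetime, learning pays off; with a very short one there is
# not enough time to integrate information, so the innate policy is
# competitive despite never updating.
long_learn, long_fixed = avg(500, True), avg(500, False)
short_learn, short_fixed = avg(5, True), avg(5, False)
```

If instead the best arm were the same in every task (a priori predictable), the fixed policy pointed at it would dominate any horizon — case 1⃣.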
🧑🔬 Next, we compared the analytical solution to the amortized Bayesian inference meta-learned by LSTM-based RL^2 agents 🤖
We find that memory-based meta-learning is indeed capable of learning to learn and not to learn (💭/🦎) depending on the meta-train distribution.
Where do inaccuracies at the edge between learning and not learning come from?🔺Close to the edge there exist multiple local optima corresponding to vastly different behaviors.
👉Highlighting the challenge of optimising meta-policies close to discontinuous behavioral transitions
Finally, we show that meta-learners overfit their respective training lifetime ⏲️ Agents may not generalise to longer time horizons if trained on short ones and vice versa.❓This raises Qs towards adaptive multi-timescale meta-policies & time-universal MetaRL 🔎
📢 Two weeks since we released The AI Scientist 🧑🔬!
We want to take the time to summarize a lot of the discussions we’ve been having with the community, and give some hints about what we are working on! 🫶
We are beyond grateful for all your feedback and the community debate our work has sparked ✨
In public discussions of this paper, we frequently refer to it as the “Will Smith eating spaghetti” moment for AI Science 🍝.
While there are often minor errors in the papers it outputs, we believe that, like Will Smith's originally wrong-sized fingernails, these problems will only improve - with newer models, more compute, and better methods.
This is the worst the AI Scientist will ever be! 📈
We consciously decided to open-source all the code to democratize access to all the individual tools 🔨 introduced in the AI Scientist agent pipeline.
Everyone can critically assess its competence and usage for their projects!
One useful data point we keep hearing: people are often surprised that AI can generate interesting research ideas for their own fields at all!
🎉 Stoked to share The AI-Scientist 🧑🔬 - our end-to-end approach for conducting research with LLMs including ideation, coding, experiment execution, paper write-up & reviewing.
Given a starting code template 📝 we ask an LLM to propose new research directions. It checks the novelty of its idea proposals 💡 using Semantic Scholar and scores the "interestingness" as well as "novelty". Below you can find a Diffusion idea on "adaptive dual-scale denoising":
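A rough sketch of what such a novelty check could look like (the search endpoint is the public Semantic Scholar Graph API; the overlap heuristic and function names are ours, not the actual pipeline code):

```python
import json
import urllib.parse
import urllib.request

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"

def search_semantic_scholar(query, limit=10):
    """Query the public Semantic Scholar Graph API for related papers."""
    params = urllib.parse.urlencode(
        {"query": query, "limit": limit, "fields": "title,abstract"}
    )
    with urllib.request.urlopen(f"{S2_SEARCH}?{params}") as resp:
        return json.load(resp).get("data", [])

def title_overlap(idea_title, paper_title):
    """Crude word-overlap (Jaccard) score between idea and paper titles."""
    a = set(idea_title.lower().split())
    b = set(paper_title.lower().split())
    return len(a & b) / max(len(a | b), 1)

def looks_novel(idea_title, papers, threshold=0.5):
    """Flag the idea as non-novel if any search hit overlaps too strongly."""
    return all(title_overlap(idea_title, p["title"]) < threshold for p in papers)
```

In practice the LLM itself judges the returned titles and abstracts rather than a fixed string heuristic.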
The LLM then implements all the required code-level changes 🦾. We leverage the amazing aider tool by @paulgauthier with various LLM backends, including GPT-4o, Sonnet 3.5, DeepSeek Coder and Llama 3.1 405B.
Afterward, the AI Scientist iteratively executes experiments to obtain statistics and plots. Below you can find an example code diff:
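The execute-and-retry loop could be sketched like this (a minimal illustration with hypothetical names, not the actual pipeline code; the LLM-driven patching step is elided):

```python
import subprocess
import sys

def run_experiment(script, max_retries=3, timeout=600):
    """Run an experiment script; on failure, return stderr so an LLM
    could be asked to patch the code before the next attempt."""
    for attempt in range(max_retries):
        result = subprocess.run(
            [sys.executable, script],
            capture_output=True, text=True, timeout=timeout,
        )
        if result.returncode == 0:
            return {"ok": True, "stdout": result.stdout, "attempts": attempt + 1}
        # In the real pipeline, result.stderr would be fed back to the
        # coding LLM to produce a fix; here we simply retry.
    return {"ok": False, "stderr": result.stderr, "attempts": max_retries}
```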
📺 Exciting talk on the xLSTM architecture and the challenges of questioning the first-mover advantage of the Transformer 🤖 by @HochreiterSepp @scioi_cluster
🗿 The LSTM architecture has been a foundational pillar of modern Deep Learning, including breakthrough results in Deep RL (e.g. OpenAI's Dota 2 agents), forecasting (e.g. weather) and the original seq2seq models.
💡 xLSTM tackles several challenges in scaling the original architecture to long sequences (via exponential gating and memory mixing) and distributed training (via associative memories). Furthermore, it combines several advances in training large sequence models.
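For reference, the exponential gating in the xLSTM paper's sLSTM cell pairs the cell state with a normalizer state $n_t$ that keeps the output bounded (a simplified sketch; the paper adds a stabilizer state for numerical safety):

$$
\begin{aligned}
c_t &= f_t\, c_{t-1} + i_t\, z_t, \qquad i_t = \exp(\tilde{i}_t),\quad f_t = \sigma(\tilde{f}_t) \text{ or } \exp(\tilde{f}_t),\\
n_t &= f_t\, n_{t-1} + i_t,\\
h_t &= o_t \odot \frac{c_t}{n_t}.
\end{aligned}
$$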
📉 The 1.3B parameter results are very impressive and the scaling results appear far from having reached a saturation point. Very much looking forward to the next generation!
Furthermore, it has recently also shown promising results on Vision tasks 📸
🚀 I am very excited to share gymnax 🏋️ — a JAX-based library of RL environments with >20 different classic environments 🌎, which are all easily parallelizable and run on CPU/GPU/TPU.
gymnax inherits the classic gym API design 🧑🎨 and allows for explicit functional control over the environment settings 🌲 and randomness 🎲
`reset` and `step` operations can leverage JAX transformations such as `jit`-compilation, auto-vectorization (`vmap`) and device parallelism (`pmap`) 🤖
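The functional pattern behind this — explicit state and RNG key passed through pure `reset`/`step` functions, no hidden globals — can be illustrated in plain Python (a toy counter environment of our own, not gymnax's actual code):

```python
import random
from typing import NamedTuple

class EnvState(NamedTuple):
    steps: int

def reset(key):
    """Pure reset: (observation, state) from an explicit RNG key."""
    rng = random.Random(key)
    state = EnvState(steps=0)
    return float(rng.random()), state

def step(key, state, action):
    """Pure step: (obs, new_state, reward, done) from explicit inputs only."""
    rng = random.Random(key)
    new_state = EnvState(steps=state.steps + 1)
    reward = 1.0 if action == 1 else 0.0
    done = new_state.steps >= 10
    return float(rng.random()), new_state, reward, done
```

Because `reset` and `step` are pure functions of their inputs, their JAX counterparts in gymnax (`env.reset(key, env_params)` / `env.step(key, state, action, env_params)`) can be `jit`-compiled and `vmap`-ped over batches of keys for massively parallel rollouts.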
It accelerates rollouts & facilitates the distributed Anakin Podracer (@matteohessel et al. 21) architecture 🏃 Data collection/learning directly runs on accelerators using replication/aggregation across devices.
👇 speed comparisons for different # workers, hardware, policies: