Can memory-based meta-learning not only learn adaptive strategies 💭 but also hard-code innate behavior 🦎? In our #AAAI2022 paper @sprekeler & I investigate how lifetime, task complexity & uncertainty shape meta-learned amortized Bayesian inference.
We analytically derive the optimal amount of exploration for a bandit 🎰 in which task complexity & uncertainty can be explicitly controlled (toy sketch after the list below). Not learning is optimal in 2 cases:
1⃣ Optimal behavior across tasks is a priori predictable.
2⃣ There is, on average, not enough lifetime to integrate information ⌛️
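A toy illustration of both regimes (an assumption-laden sketch, not the paper's derivation: a 2-armed Gaussian bandit with a Gaussian prior over arm means, unit reward noise, and a fixed explore-then-commit budget, all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def lifetime_return(T, prior_gap, task_sd, n_explore, n_tasks=5000):
    """Monte-Carlo return over a lifetime of T pulls in a 2-armed Gaussian bandit.
    Arm means per task: mu0 ~ N(prior_gap, task_sd), mu1 ~ N(0, task_sd); reward noise sd = 1.
    n_explore = 0 -> 'innate' policy: always pull the a-priori better arm 0.
    n_explore > 0 -> learn: pull each arm n_explore times, then commit to the
                     empirically better arm for the rest of the lifetime."""
    mu = np.stack([rng.normal(prior_gap, task_sd, n_tasks),
                   rng.normal(0.0, task_sd, n_tasks)], axis=1)
    if n_explore == 0:
        return T * mu[:, 0].mean()
    est = mu + rng.normal(0.0, 1.0 / np.sqrt(n_explore), mu.shape)  # noisy arm estimates
    best = est.argmax(axis=1)
    explore = n_explore * mu.sum(axis=1)                   # expected reward while exploring
    exploit = (T - 2 * n_explore) * mu[np.arange(n_tasks), best]
    return (explore + exploit).mean()

# (T=5,   gap=1.0): lifetime too short            -> not learning wins
# (T=100, gap=3.0): best arm a priori predictable -> not learning wins
# (T=100, gap=0.3): long lifetime & uncertain     -> exploring first pays off
for T, gap in [(5, 1.0), (100, 3.0), (100, 0.3)]:
    r_innate = lifetime_return(T, gap, task_sd=1.0, n_explore=0)
    r_learn = lifetime_return(T, gap, task_sd=1.0, n_explore=min(3, T // 2))
    print(f"T={T:3d}, prior gap={gap}: innate={r_innate:6.1f}  learn={r_learn:6.1f}")
```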
🧑‍🔬 Next, we compare the analytical solution to the amortized Bayesian inference meta-learned by LSTM-based RL^2 agents 🤖
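For context: an RL^2 agent conditions a recurrent policy on its own past actions and rewards, so within-lifetime adaptation lives in the LSTM state while the weights are meta-trained across tasks. A minimal PyTorch sketch (layer sizes and names are my own illustrative choices, not the paper's architecture):

```python
import torch
import torch.nn as nn

class RL2Policy(nn.Module):
    """Minimal RL^2-style recurrent policy for an n-armed bandit:
    the LSTM hidden state carries the within-lifetime 'learning'."""

    def __init__(self, n_arms: int = 2, hidden_size: int = 32):
        super().__init__()
        # Input at each step: one-hot previous action + previous reward
        self.lstm = nn.LSTM(n_arms + 1, hidden_size, batch_first=True)
        self.policy_head = nn.Linear(hidden_size, n_arms)

    def forward(self, prev_action_onehot, prev_reward, hidden=None):
        # prev_action_onehot: (batch, 1, n_arms); prev_reward: (batch, 1, 1)
        x = torch.cat([prev_action_onehot, prev_reward], dim=-1)
        out, hidden = self.lstm(x, hidden)
        logits = self.policy_head(out)
        return torch.distributions.Categorical(logits=logits), hidden

# Outer loop (sketch): sample a bandit task, reset `hidden`, unroll the policy
# for the lifetime T, and train the weights with any policy-gradient method
# (e.g. A2C/PPO) on the summed lifetime reward across many sampled tasks.
```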
We find that memory-based meta-learning is indeed capable of both learning to learn and learning not to learn (💭/🦎), depending on the meta-train distribution.
Where do inaccuracies at the edge between learning and not learning come from? 🔺 Close to the edge, multiple local optima exist that correspond to vastly different behaviors.
👉 This highlights the challenge of optimizing meta-policies close to discontinuous behavioral transitions.
Finally, we show that meta-learners overfit to their respective training lifetime ⏲️ Agents may not generalize to longer time horizons if trained on short ones, and vice versa (sketch below). ❓ This raises questions about adaptive multi-timescale meta-policies & time-universal Meta-RL 🔎
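A rough sketch of that lifetime overfitting in the same toy bandit as above (the grid-search "meta-training", prior gap, and horizons are illustrative assumptions): tune the explore budget of an explore-then-commit policy for one training horizon, then deploy it at another.

```python
import numpy as np

rng = np.random.default_rng(1)

def expected_return(T, n_explore, prior_gap=1.0, task_sd=1.0, n_tasks=20000):
    # Same toy 2-armed Gaussian bandit as in the earlier sketch
    mu = np.stack([rng.normal(prior_gap, task_sd, n_tasks),
                   rng.normal(0.0, task_sd, n_tasks)], axis=1)
    if n_explore == 0:
        return T * mu[:, 0].mean()
    est = mu + rng.normal(0.0, 1.0 / np.sqrt(n_explore), mu.shape)
    best = est.argmax(axis=1)
    return (n_explore * mu.sum(axis=1)
            + (T - 2 * n_explore) * mu[np.arange(n_tasks), best]).mean()

def tuned_budget(T):
    # 'Meta-training' stand-in: grid-search the explore budget that is best for horizon T
    return max(range(0, min(10, T // 2) + 1), key=lambda n: expected_return(T, n))

for T_train in (5, 200):
    n_star = tuned_budget(T_train)
    for T_test in (5, 200):
        r_tuned = expected_return(T_test, min(n_star, T_test // 2))  # clip to feasible budget
        r_best = expected_return(T_test, tuned_budget(T_test))
        print(f"trained for T={T_train:3d} (n*={n_star:2d}), tested at T={T_test:3d}: "
              f"return={r_tuned:6.1f} vs. best={r_best:6.1f}")
```

In this toy version, a budget tuned on the short horizon never explores and forfeits reward at the long horizon, while a budget tuned on the long horizon over-explores when the lifetime is short.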