We study a very common representation learning setting where we know *something* about our task's generative process. e.g. agents must obey some laws of physics, or a video game console manipulates certain RAM slots. However...
...explicitly making use of this information is often quite tricky, every step of the way! Depending on the circumstances, it may require hard disentanglement of generative factors, a punishing bottleneck through the algorithm, or necessitate a differentiable renderer!
Algorithmic reasoning blueprint to the rescue!
In RMR, we show that we can encapsulate the x_bar -> y_bar path using a high-dimensional GNN, pre-trained on large quantities of data (which we can usually pre-generate, even synthetically).
This alleviates all of the above issues.
We recover significant improvements over a baseline without pre-training. N.B. RMR still needs to learn how to meaningfully use representations from a completely different task!
Lastly, alongside other recent works like XLVIN, we believe this is only one of many exciting uses of algorithmic reasoning to come in the near future! Watch this space 🎆
Any thoughts, comments and feedback is highly welcome! :)
• • •
Missing some Tweet in this thread? You can try to
force a refresh
Read on to see how we expose fundamental weaknesses of decoder-only Transformers on important tasks (e.g. copying & counting) + simple ways to make things a bit easier on the Transformer :)
Work led by @fedzbar for his @GoogleDeepMind placement!
We start by asking a frontier LLM a simple query: copy the first & last token of bitstrings.
Not only does it fail past a certain length, it also fails in a very specific way: it fails when there's repetition (111...10), and it fails to copy the _last_ token, never the first.
This leads to our first result -- representational collapse.
We prove there must exist pairs of different inputs for which their last token representations cannot be distinguished.
To prove this, we use bitstrings of the form 11...10, where repetitions exacerbate the problem.
If you are @LogConference, come to the virtual Poster Session in ~20 minutes -- we have _four_ posters on algorithmic alignment, reasoning and over-squashing in GNNs! 🕸️🍾🌐 Several of them are award-winning!
You're welcome to stop by for a chat. 😊
See the 🧵for details... 🔢
🌐 In "Reasoning-Modulated Representations", Matko Bošnjak, @thomaskipf, @AlexLerchner, @RaiaHadsell, Razvan Pascanu, @BlundellCharles and I demonstrate how to leverage arbitrary algorithmic priors for self-supervised learning. It even transfers _across_ different Atari games!
🤖 In "Continuous Neural Algorithmic Planners", @heyu0208, @pl219_Cambridge, @andreeadeac22 and I show how the ideas from XLVIN paper can generalise to continuous-action-space environments (such as MuJoCo!). CNAP won the Best Paper Runner-up Award at GroundedML @ ICLR'22!
We made careful modifications to our content, making it more streamlined & accessible!
Featuring a revamped introductory lecture, clearer discussion of Transformers & a new lecture going beyond groups, into the realm of category theory! 🐲
Algorithmic reasoning has emerged as a very important area of representation learning! Many key works (feat. @KeyuluXu@jingling_li@StefanieJegelka@beabevi_@brunofmr) explored important theoretical and empirical aspects of algorithmic alignment.
Critically, each one of these works (incl. mine!) operates over its own datasets, often making it hard to directly compare insight among papers.
Further, generating adequate datasets requires knowledge of theoretical computer science, raising barrier of entry to the field.
It required no (apparent) novel research (though it could enable lots of new research!), I had the theoretical skills to understand everything that needs to be implemented, and it amounted to standard supervised learning! 2/11
So I started implementing by myself. What could possibly go wrong? Turns out, pretty much everything. :)
Indeed, I understood all I needed to write generators of the data. But this didn't mean I knew how to most efficiently extract it, organise it, and make it accessible! 3/11
GNNs are then perfectly capable of finding shortest paths. The proof in the paper seems more subtle... 2/4
Namely, that GNNs are hopeless in solving some DP problems (e.g. path-finding) under _arbitrary, fixed_ (e.g. constant / randomised) initialisations. But that's, in my opinion, making a different statement to "GNNs don't align with DP"! 3/4