Aran Nayebi
Asst Prof @CarnegieMellon Machine Learning (@mldcmu @SCSatCMU) | @BWFUND CASI Fellow | Building a Natural Science of Intelligence 🧠🤖 | Prev: @MIT, @Stanford
Mar 4
1/ As AI agents become increasingly capable, what must *inevitably* emerge inside them?

We prove selection theorems: strong task performance forces world models, belief-like memory, and, under task mixtures, persistent variables resembling core primitives associated with emotion.

2/ Cybernetics argued that “every good regulator is a model” (the Good Regulator Theorem). But this claim has pitfalls: even a constant policy can regulate trivial goals without modeling anything.

In RL, classic results show that belief states are sufficient statistics for optimal control, but they don’t show that such predictive structure is *necessary*.
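For intuition on the sufficiency side, here is a minimal Bayes-filter sketch (a toy example of my own, not from the paper): the belief state compresses the full observation history, so an optimal policy can condition on it alone.

```python
import numpy as np

# Toy 2-state POMDP (hypothetical numbers, single action).
T = np.array([[0.9, 0.1],   # transition matrix, T[s, s'] = P(s' | s)
              [0.2, 0.8]])
O = np.array([[0.7, 0.3],   # observation model, O[s', o] = P(o | s')
              [0.4, 0.6]])

def belief_update(b, obs):
    """One Bayes-filter step: predict through T, then correct by the likelihood."""
    predicted = b @ T                # P(s' | history)
    unnorm = predicted * O[:, obs]   # times P(obs | s')
    return unnorm / unnorm.sum()     # renormalize to a distribution

b = np.array([0.5, 0.5])             # uniform prior over hidden states
for obs in [0, 0, 1, 0]:             # an arbitrary observation sequence
    b = belief_update(b, obs)
print(b)  # the belief is a sufficient statistic of everything observed so far
```

The selection theorems address the converse direction: strong enough task performance forces this kind of predictive structure to emerge.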
Feb 13, 2025
Are there fundamental barriers to AI alignment once we develop generally-capable AI agents?

We mathematically prove the answer is *yes*, and outline key properties for a "safe yet capable" agent. 🧵👇

Paper: arxiv.org/abs/2502.05934

Framework: We develop a general game-theoretic framework for alignment when agents are capable enough to complete tasks with humans, unlike current LLM agents, which can fail due to hallucinations & self-loops.

Our (M,N,ε,δ)-agreement framework generalizes other approaches, including CIRL & Debate, via approximate Aumann agreement across N agents & M tasks, without common priors. In the 2-agent case, "Alice" (human) & "Rob" (agent) exchange messages to agree on a task function.

We bound the number of steps T needed for high-probability agreement (explicit bound in the paper).
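For flavor, here is the shape of the classical two-agent result that (M,N,ε,δ)-agreement generalizes, per Aaronson’s “The Complexity of Agreement” (2005); the paper’s own multi-agent, multi-task bound will differ, so treat this as background rather than the theorem itself:

```latex
% Two Bayesian agents exchanging messages about the expectation of a
% bounded random variable (\epsilon,\delta)-agree after a number of
% messages independent of the complexity of their prior (Aaronson, 2005):
\[
  \Pr\big[\,|\mathbb{E}_A - \mathbb{E}_B| \le \epsilon\,\big] \;\ge\; 1-\delta
  \quad\text{after}\quad
  T \;=\; O\!\left(\frac{1}{\delta\,\epsilon^{2}}\right)\ \text{messages.}
\]
```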
Nov 8, 2023
1/10 Ten years ago, I received a handwritten manuscript by Alan Turing’s only PhD student, Robin Gandy. He wrote it a couple of years before passing away in 1995.

AFAIK, it’s never before been in print, so I typeset my copy & put it online here:

🧵👇 philpapers.org/archive/GANOTI…

2/10 At the time, I was a college freshman working on a manuscript critically analyzing claims about the physical construction of "hypercomputers": machines that can compute non-Turing-computable functions via various means (analog, relativity, quantum mechanics, etc.).
May 22, 2023
1/ How do humans and animals form models of their world?

We find that Foundation Models for Embodied AI may provide a framework towards understanding our own “mental simulations”. 🧵👇

arxiv.org/abs/2305.11772
with awesome collaborators: @rishi_raj @mjaz_jazlab @GuangyuRobert

2/ Humans and animals have a rich & flexible understanding of the physical world. A dominant cognitive theory is that the brain builds mental models that enable this understanding.

However, the underlying neural mechanisms of these “mental simulations” are unclear.
Jul 27, 2022
1/15 This is an important point worth underscoring. As I’ll elaborate below, there is actually a lot of shared perspective between critiques of NeuroAI and the main considerations of those who practice it.

It also leads to some new directions that I'll note! 🧵👇

2/15 I’ll broadly categorize the discussions from the past several days into a few themes:

- High intrinsic dimensionality of networks
- Linear regression as a metric (a minimal sketch after this list)
- Hyperparameters matter
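On the "linear regression as a metric" theme: the standard neural-predictivity pipeline fits a regularized linear map from model features to recorded responses and scores held-out correlation. A minimal sketch with hypothetical variable names and data shapes, not any specific paper's exact procedure:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hypothetical data: model activations & neural responses to the same
# stimuli (e.g., 500 images, 1024 model units, 80 recorded neurons).
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 1024))    # model layer activations
responses = rng.normal(size=(500, 80))     # recorded firing rates

X_tr, X_te, y_tr, y_te = train_test_split(features, responses,
                                          test_size=0.2, random_state=0)
reg = Ridge(alpha=1.0).fit(X_tr, y_tr)     # linear map: model -> neurons
pred = reg.predict(X_te)

# Per-neuron Pearson correlation on held-out stimuli; the median (or a
# noise-corrected mean) is the usual "neural predictivity" score.
r = [np.corrcoef(pred[:, i], y_te[:, i])[0, 1] for i in range(pred.shape[1])]
print(np.median(r))
```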
Jul 6, 2022
1/8 Can we use embodied AI to gain insight into *why* neural systems are as they are?

In previous work👇, we demonstrated that a contrastive unsupervised objective substantially outperforms supervised object categorization at generating networks that predict mouse visual cortex.

2/8 But this raises the question: why, ecologically? What evolutionarily relevant powers does the contrastive objective give the mouse visual system that category specialization doesn’t?
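For readers unfamiliar with the objective, here is a minimal sketch of a generic contrastive (InfoNCE-style) loss of the kind such self-supervised methods use; this is an illustration under my own assumptions, not the paper's exact implementation.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Generic InfoNCE contrastive loss over a batch.

    z1, z2: (batch, dim) embeddings of two augmented views of the same
    images; matching rows are positive pairs, all other rows are negatives.
    """
    # L2-normalize so that dot products are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature        # (batch, batch) similarity matrix
    # Cross-entropy with the diagonal (matched pairs) as targets
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(8, 32)), rng.normal(size=(8, 32))
print(info_nce_loss(z1, z2))  # pulls positive pairs together, pushes negatives apart
```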