Aran Nayebi
Asst Prof @CarnegieMellon Machine Learning (@mldcmu @SCSatCMU) | @BWFUND CASI Fellow | Building a Natural Science of Intelligence 🧠🤖 | Prev: @MIT, @Stanford
Mar 4
1/ As AI agents become increasingly capable, what must *inevitably* emerge inside them?

We prove selection theorems: strong task performance forces world models, belief-like memory, and, under task mixtures, persistent variables resembling core primitives associated with emotion.

2/ Cybernetics argued that “every good regulator is a model” (the Good Regulator Theorem). But this claim has pitfalls: even a constant policy can regulate trivial goals without modeling anything.

In RL, classic results show that belief states are sufficient statistics for optimal control, but they don’t show that such predictive structure is *necessary*.
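For intuition on the sufficiency side, here is a minimal Bayes-filter sketch (a toy example of my own, not from the paper): the belief state compresses the full observation history, so an optimal policy can condition on it alone.

```python
import numpy as np

# Toy 2-state POMDP (hypothetical numbers, single action).
T = np.array([[0.9, 0.1],   # transition matrix, T[s, s'] = P(s' | s)
              [0.2, 0.8]])
O = np.array([[0.7, 0.3],   # observation model, O[s', o] = P(o | s')
              [0.4, 0.6]])

def belief_update(b, obs):
    """One Bayes-filter step: predict through T, then correct by the likelihood."""
    predicted = b @ T                # P(s' | history)
    unnorm = predicted * O[:, obs]   # times P(obs | s')
    return unnorm / unnorm.sum()     # renormalize to a distribution

b = np.array([0.5, 0.5])             # uniform prior over hidden states
for obs in [0, 0, 1, 0]:             # an arbitrary observation sequence
    b = belief_update(b, obs)
print(b)  # the belief is a sufficient statistic of everything observed so far
```

The selection theorems address the converse direction: strong enough task performance forces this kind of predictive structure to emerge.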
Feb 13, 2025
Are there fundamental barriers to AI alignment once we develop generally-capable AI agents?

We mathematically prove the answer is *yes*, and outline key properties for a "safe yet capable" agent. 🧵👇

Paper: arxiv.org/abs/2502.05934

Framework: We develop a general game-theoretic framework for alignment when agents are capable enough to complete tasks with humans, unlike current LLM agents, which can fail due to hallucinations & self-loops.

Our (M,N,ε,δ)-agreement framework generalizes other approaches, including CIRL & Debate, via approximate Aumann agreement across N agents & M tasks, without common priors. In the 2-agent case, "Alice" (human) & "Rob" (agent) exchange messages to agree on a task function.

We bound the number of steps T needed for high-probability agreement (explicit bound in the paper).
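For flavor, here is the shape of the classical two-agent result that (M,N,ε,δ)-agreement generalizes, per Aaronson’s “The Complexity of Agreement” (2005); the paper’s own multi-agent, multi-task bound will differ, so treat this as background rather than the theorem itself:

```latex
% Two Bayesian agents exchanging messages about the expectation of a
% bounded random variable (\epsilon,\delta)-agree after a number of
% messages independent of the complexity of their prior (Aaronson, 2005):
\[
  \Pr\big[\,|\mathbb{E}_A - \mathbb{E}_B| \le \epsilon\,\big] \;\ge\; 1-\delta
  \quad\text{after}\quad
  T \;=\; O\!\left(\frac{1}{\delta\,\epsilon^{2}}\right)\ \text{messages.}
\]
```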
Nov 8, 2023
1/10 Ten years ago, I received a handwritten manuscript by Alan Turing’s only PhD student, Robin Gandy. He wrote it a couple of years before passing away in 1995.

AFAIK, it’s never before been in print, so I typeset my copy & put it online here:

🧵👇 philpapers.org/archive/GANOTI…

2/10 At the time, I was a college freshman working on a manuscript critically analyzing claims about the physical construction of "hypercomputers": machines that can compute non-Turing-computable functions via various means (analog, relativity, quantum mechanics, etc.).
May 22, 2023
1/ How do humans and animals form models of their world?

We find that Foundation Models for Embodied AI may provide a framework towards understanding our own “mental simulations”. 🧵👇

arxiv.org/abs/2305.11772
with awesome collaborators: @rishi_raj @mjaz_jazlab @GuangyuRobert

2/ Humans and animals have a rich & flexible understanding of the physical world. A dominant cognitive theory is that the brain builds mental models that enable this understanding.

However, the underlying neural mechanisms of these “mental simulations” are unclear.
Jul 27, 2022
1/15 This is an important point worth underscoring. As I’ll elaborate below, there is actually a lot of shared perspective between critiques of NeuroAI and the main considerations of those who practice it.

It also leads to some new directions that I'll note! 🧵👇

2/15 I’ll broadly categorize the discussions from the past several days into a few themes:

- High intrinsic dimensionality of networks
- Linear regression as a metric (a minimal sketch after this list)
- Hyperparameters matter
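On the "linear regression as a metric" theme: the standard neural-predictivity pipeline fits a regularized linear map from model features to recorded responses and scores held-out correlation. A minimal sketch with hypothetical variable names and data shapes, not any specific paper's exact procedure:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hypothetical data: model activations & neural responses to the same
# stimuli (e.g., 500 images, 1024 model units, 80 recorded neurons).
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 1024))    # model layer activations
responses = rng.normal(size=(500, 80))     # recorded firing rates

X_tr, X_te, y_tr, y_te = train_test_split(features, responses,
                                          test_size=0.2, random_state=0)
reg = Ridge(alpha=1.0).fit(X_tr, y_tr)     # linear map: model -> neurons
pred = reg.predict(X_te)

# Per-neuron Pearson correlation on held-out stimuli; the median (or a
# noise-corrected mean) is the usual "neural predictivity" score.
r = [np.corrcoef(pred[:, i], y_te[:, i])[0, 1] for i in range(pred.shape[1])]
print(np.median(r))
```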
Jul 6, 2022
1/8 Can we use embodied AI to gain insight into *why* neural systems are as they are?

In previous work👇, we demonstrated that a contrastive unsupervised objective substantially outperforms supervised object categorization at generating networks that predict mouse visual cortex.

2/8 But this raises the question: why, ecologically? What evolutionarily relevant powers does the contrastive objective give the mouse visual system that category specialization doesn’t?
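For readers unfamiliar with the objective, here is a minimal sketch of a generic contrastive (InfoNCE-style) loss of the kind such self-supervised methods use; this is an illustration under my own assumptions, not the paper's exact implementation.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Generic InfoNCE contrastive loss over a batch.

    z1, z2: (batch, dim) embeddings of two augmented views of the same
    images; matching rows are positive pairs, all other rows are negatives.
    """
    # L2-normalize so that dot products are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature        # (batch, batch) similarity matrix
    # Cross-entropy with the diagonal (matched pairs) as targets
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(8, 32)), rng.normal(size=(8, 32))
print(info_nce_loss(z1, z2))  # pulls positive pairs together, pushes negatives apart
```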