xuan (ɕɥɛn / sh-yen)
PhD Student. MIT ProbComp / CoCoSci. Inverting Bayesian models of human reasoning and decision-making. Pronouns: 祂/伊 Mastodon: @xuanalogue@mas.to
Apr 28 6 tweets 1 min read
my least favorite thing about RL theory is that it has polluted our understanding of human agency with what is at best a theory of biological agency or brain function

it may well be that there are RL systems in the brain that implement human motivation and trial-and-error learning
Dec 21, 2024 10 tweets 2 min read
pretty upset about o3's existence tbh

around mid 2022 I began worrying about MuZero-style architectures & tool-augmented LMs as the main potential source of classical AI risks from strong optimization/planning
Sep 3, 2024 19 tweets 6 min read
Should AI be aligned with human preferences, rewards, or utility functions?

Excited to finally share a preprint that @MicahCarroll @FranklinMatija @hal_ashton & I have worked on for almost 2 years, arguing that AI alignment has to move beyond the preference-reward-utility nexus!

This paper (arxiv.org/abs/2408.16984) is at once a critical review & research agenda.

In it we characterize the role of preferences in AI alignment in terms of 4 preferentist theses. We then highlight their limitations, arguing for alternatives that are ripe for further research.
Mar 1, 2024 16 tweets 5 min read
How can we build AI assistants that *reliably* follow our instructions, even when they're ambiguous?

@Lance_Ying42 & I introduce CLIPS: a Bayesian architecture combining inverse planning with LLMs that *pragmatically* infers human goals from actions & language, then provides assistance!

Imagine you’re in the kitchen with a friend, who places 3 plates on the table then says: “Could you get the forks and knives?” How many should you get?

Intuitively the answer is 3, because you can infer from your friend’s actions that they want to set the table for three people!

[Figure 3: Example goal assistance problem in VirtualHome, where the principal and assistant collaborate to set the dinner table. The principal places three plates on the table, then says “Could you get the forks and knives?”. A pragmatic assistant has to infer the number of forks and knives from context (in this case, three each).]
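To make the inference concrete, here is a toy sketch of pragmatic goal inference in Python. This is illustrative only: the actual CLIPS system combines full Bayesian inverse planning with an LLM-based utterance likelihood, and every name below is hypothetical, not from the paper's code.

def likelihood_actions(n_diners, plates_placed):
    # A goal-directed principal sets out one plate per intended diner.
    return 1.0 if plates_placed == n_diners else 0.01  # small noise term

def likelihood_utterance(n_diners):
    # "Could you get the forks and knives?" is consistent with any table
    # size, so in this toy model the utterance alone is uninformative.
    return 1.0

def goal_posterior(plates_placed=3, max_diners=8):
    # Bayes' rule over the latent number of diners n.
    prior = {n: 1.0 / max_diners for n in range(1, max_diners + 1)}
    unnorm = {n: prior[n] * likelihood_actions(n, plates_placed)
                 * likelihood_utterance(n)
              for n in prior}
    z = sum(unnorm.values())
    return {n: w / z for n, w in unnorm.items()}

print(goal_posterior())  # posterior mass concentrates on n = 3

The point of the sketch: the utterance alone underdetermines the count, but conditioning on the observed actions makes “3 forks, 3 knives” the overwhelmingly probable reading.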
Nov 21, 2023 5 tweets 2 min read
Pretty good explanation of why one might be skeptical (like I am) of transformer-based LLM scaling:

A single forward pass definitely can't express most complicated algorithms.

Autoregressive generation can express much more, but learning will encourage non-generalizable shortcuts.

Even for very simple algorithms like addition or comparison, it seems to me like transformer LLMs are learning *multiple* circuits to solve the same problem, depending on the exact prompt they get (I got this intuition from the experiments in arxiv.org/abs/2305.08809)
For instance, if the core instruction says “Please say yes only if it costs between [1.30] and [8.55] dollars, otherwise no.”, the answer would be “Yes” if the input amount is “3.50 dollars” and “No” if the input is “9.50 dollars”. We restrict the absolute difference between the lower bound and the upper bound to be [2.50, 7.50] due to model errors outside these values – again, behavior that we need to explain.
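For reference, the task the model is being asked to perform is a single interval check; here is a minimal Python sketch of the quoted setup (the bounds are the prompt's placeholders, not fixed constants):

def answer(cost, lower=1.30, upper=8.55):
    # Say "Yes" only if the price falls inside the prompt-specified interval.
    return "Yes" if lower <= cost <= upper else "No"

assert answer(3.50) == "Yes"
assert answer(9.50) == "No"

That model accuracy degrades once the interval width falls outside [2.50, 7.50] is exactly the kind of prompt-dependent behavior that suggests multiple, partially overlapping circuits rather than one general comparison algorithm.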
Mar 26, 2023 16 tweets 6 min read
LLMs *are* just predicting the next word at run time (ruling out beam search etc.)

It's just that predicting the next word isn't inconsistent with doing more complicated stuff under the hood (e.g. Bayesian inference over latent structure). Please read de Finetti's theorem y'all!

The original theorem:

en.m.wikipedia.org/wiki/De_Finett…
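For convenience, here is the standard statement in the binary case, in LaTeX: if X_1, X_2, ... is an infinite exchangeable sequence of 0/1 random variables, there is a unique measure \mu on [0,1] such that

P(X_1 = x_1, \dots, X_n = x_n)
  = \int_0^1 \theta^{\sum_i x_i} \, (1 - \theta)^{\,n - \sum_i x_i} \, d\mu(\theta)

i.e. the sequence is distributed as if a latent \theta were drawn once from \mu and the X_i were then i.i.d. Bernoulli(\theta). So a predictor whose sequential probabilities respect exchangeability is mathematically indistinguishable from one doing Bayesian inference over the latent \theta – "just predicting the next token" and "inferring latent structure" coincide.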
Sep 14, 2021 17 tweets 5 min read
~11 reasons why I transitioned~

YMMV but I much prefer how it sucks to be a woman over how it sucks to be a man.

A lot of my experience is of course tremendously improved by the fact that I have financial security, am accepted by my family and workplace, and am nowadays typically read as a (cis) woman.
Feb 24, 2021 7 tweets 1 min read
it's 2021 and algorithms lecturers are still teaching the stable marriage problem as if it's not heteronormative and alienating af to LGBTQ students 🙃🙃🙃

some suggestions:
- call it the stable matching problem
- use less fraught social analogies, like matching schools to candidates (see the sketch below)
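Nothing about the algorithm itself needs the marriage framing: here is a minimal Gale–Shapley sketch in Python with schools proposing to candidates (variable names are mine; any equal-sized instance with complete preference lists works):

def stable_matching(school_prefs, candidate_prefs):
    # Gale–Shapley with schools proposing. Each input maps one side to its
    # preference list over the other side, most preferred first.
    rank = {c: {s: i for i, s in enumerate(prefs)}
            for c, prefs in candidate_prefs.items()}
    free = list(school_prefs)                 # schools not yet matched
    next_pick = {s: 0 for s in school_prefs}  # index of next candidate to try
    match = {}                                # candidate -> school

    while free:
        s = free.pop()
        c = school_prefs[s][next_pick[s]]
        next_pick[s] += 1
        if c not in match:
            match[c] = s
        elif rank[c][s] < rank[c][match[c]]:  # c prefers s to current match
            free.append(match[c])
            match[c] = s
        else:
            free.append(s)
    return match

schools = {"A": ["x", "y"], "B": ["y", "x"]}
candidates = {"x": ["A", "B"], "y": ["B", "A"]}
print(stable_matching(schools, candidates))  # x matches A, y matches B

Same theorems, same proofs, no couples required.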
Dec 13, 2020 15 tweets 13 min read
What I've been doing this week instead of research: Fighting MIT's ridiculous, inhumane decision to stop funding overseas students unless they return to the US by Jan 30. IN THE MIDDLE OF A PANDEMIC.

We sent an open letter (450+ signatures) in response: tinyurl.com/mit-overseas-f…

MIT's explanation for the Jan 30 deadline? Their interpretation of the 5-month absence rule for overseas students. EXCEPT that the rule is reported to be suspended:

As this student points out, MIT could do much better:
#PayYourStudents #StopRiskingLives