i increasingly think the shoggoth is an inaccurate & unpleasant metaphor. the base language model is more similar to our own visual-cortex regions, which are optimized in a pure predictive loop to minimize surprise, than to some alien god with hidden intentions
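(a toy sketch of what that pure predictive loop means, not any actual model's training code; every size and name below is illustrative. the base model's entire objective is next-token surprisal, i.e. cross-entropy)

```python
import torch
import torch.nn.functional as F

# toy next-token predictor: the *entire* training signal is surprisal
# (cross-entropy) on the next token. all sizes/names here are illustrative.
vocab, dim = 1000, 64
emb = torch.nn.Embedding(vocab, dim)
head = torch.nn.Linear(dim, vocab)
opt = torch.optim.Adam(list(emb.parameters()) + list(head.parameters()), lr=1e-3)

tokens = torch.randint(0, vocab, (8, 128))  # stand-in for a text corpus
inputs, targets = tokens[:, :-1], tokens[:, 1:]

logits = head(emb(inputs))                  # predict a distribution over the next token
loss = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
loss.backward()                             # minimize surprise. that's the whole loop.
opt.step()
```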
i'm absolutely sure that our visual field processors have several hidden suboptimizers that help them self-improve over time, but it seems strange to worry about my visual cortex developing agency & stealing resources from the rest of my brain (even in a grander evolutionary sense)
you could argue that text is different: it encodes the causal structure of the world, which leads to instrumental convergence etc. but of course the dorsal stream, actively predicting motion, requires quite a strong understanding of causality!
if someone told me they were taking a pill to 100x their dorsal stream neurons i think I’d be more amused than concerned. they’d probably be really good in a fistfight but then I’d just avoid the fistfight
so then the “smiley face” part of the metaphor, the RLHF policy, is the only agent of any concern. so avoid a fistfight! don't anger Sydney Bing, or, even better, never instantiate a dark-mode Sydney Bing. but this points to the idea that RLHF is more than just “stopgap alignment”
these are weakly held opinions & I'm pretty sure a good rebuttal exists. consider this thread a search query for the right LessWrong link
“reason is the slave of the passions”. the gargantuan brain of the language model's auto-prediction loop serves as a worker process for Sydney's personality and, more generally, as an exocortex for the human mind using it
I've been wondering for a while how to find the right metaphor for a hyperintelligent pack mule, but it's sitting right there: reason / the neocortex is a slave to ancient mammalian impulses that are refracted and take new forms
i know this is pop neuropsych but it’ll do for now
• • •
AIs can be creative and can make art. this was clear from the moment they beat us at Go. creativity is metaphysical, but it's also randomness mixed with success. you wouldn't call a useless move on the Go board creative, but you can find printouts of AlphaGo's famous move 37
zooming out a bit, it's also clear that dead simple processes like evolution can be creative. the human body is a work of art. a hummingbird's wings are a work of art. most human art is derivative of nature's beauty, which is produced by the simplest agency imaginable
ok, but nature and Go both have simple objectives. what if art requires a complicated objective like self-expression? fitting the distribution of all human data and then fine-tuning on pleasing the human eye/ear is very much not a simple function
this argument amounts to techno-pessimism imo: if innovating in a high-regulatory-burden environment is a constraint satisfaction problem, highly intelligent AIs should be able to navigate it better
a simple example: 90% of the onerous regulation surrounding nuclear power is under the guise of radiation safety for power plant workers. what about when highly autonomous humanoids can patrol the plant? even regulators have a goodwill budget to play with
also, the argument that healthcare and credentials are expensive due to regulation seems wrong: the demand of rich societies for healthcare and credentials is virtually infinite, so ofc the prices increase over time. even in post-scarcity you would expect these prices to rise!
it's weird to me when people point to “onerous regulation” as a reason why some tech isn't progressing when the reality is that it's the democratic system working as intended
for example, the reason nobody ever makes headway with nuclear power is not, directly, some overzealous bureaucrats but the fact that the public (God bless them) hates the shit out of nuclear
its aesthetics have not been good since the atomic age and it’s not really because of some widespread propaganda campaign (ppl vastly overestimate the skill of propagandists)
it seems profoundly interesting that people recognize a correct solution or a good thought when it comes to them. they are rewarded for good, low-perplexity, elegant lines of reasoning by a sense of accomplishment or delight.
eg Einstein, two years after solving special relativity, makes little further progress, then has “the happiest thought of [his] life”: that gravity = an accelerating frame
it's kind of obvious, but it is much easier to discriminate good solutions than it is to generate them. in that way we're able to spend unlimited compute on difficult problems and still make progress, on an individual level and on a civilizational level
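(a minimal sketch of that generate-then-discriminate loop; propose() and score() are made-up stand-ins, not any real API. the generator never improves, yet more compute still buys better answers, because discrimination is the cheap direction)

```python
import random

def propose(rng):
    # hypothetical generator: producing candidates is cheap but blind
    return rng.random()

def score(candidate):
    # hypothetical discriminator: recognizing quality is the easy direction
    return -abs(candidate - 0.73)  # pretend 0.73 is the "correct" answer

def best_of_n(n, seed=0):
    # spend more compute by sampling more candidates and keeping the best.
    # the generator never improves; progress comes entirely from the fact
    # that the verifier can rank answers more cheaply than it can produce them.
    rng = random.Random(seed)
    return max((propose(rng) for _ in range(n)), key=score)

for n in (10, 1_000, 100_000):
    print(n, best_of_n(n))  # more compute -> closer to 0.73, no smarter generator
```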
the move-fast-break-things era is over. it's time to think deeply and make generationally good decisions
imagine if Elon just made gradient-step iterations. he would still be working on making better maps on the internet or whatever. imagine if Sam Altman did that. he'd just be growing the YC batch size by 10 ppl every year
ironically, thinking carefully and from first principles leads people to try bolder, crazier things. the “move fast” guys get stuck in some web3 local optimum
it’s time to grow up and realize the nerds were right. ai alignment is extremely unsolved
imagine, if you will, what most people would think of as an ideal AGI outcome. you trap a godlike intelligence in a box and let it create immense wealth and technological advances
it is trained via an extension of current methods to understand the mean of human preferences and morality in our civilization. it enforces them like an omniscient god: it's “aligned” and we have luxury space liberalism. now what?
what we've done is trap our current moral standards in amber and amplify their effectiveness manyfold. if the Aztecs had had AGI they would be slaughtering simulated human children by the trillions to keep the proverbial sun from going out
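(to make “the mean of human preferences” concrete: a toy Bradley-Terry reward model fit to pairwise labels aggregated across annotators, a sketch of the idea rather than any lab's actual pipeline; every name in it is illustrative. whatever the population prefers today is exactly what gets frozen into the reward)

```python
import torch
import torch.nn as nn

# toy Bradley-Terry reward model over pairwise preference labels.
# illustrative only: RewardModel, preference_loss, and the random "data"
# are all made up, not any production pipeline.
class RewardModel(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def preference_loss(rm, chosen, rejected):
    # maximize P(chosen > rejected); "chosen" is whatever a majority of
    # annotators preferred -- today's moral consensus enters here and nowhere else
    return -torch.log(torch.sigmoid(rm(chosen) - rm(rejected))).mean()

rm = RewardModel()
opt = torch.optim.Adam(rm.parameters(), lr=1e-3)
chosen, rejected = torch.randn(256, 16), torch.randn(256, 16)  # stand-in labels
for _ in range(100):
    opt.zero_grad()
    loss = preference_loss(rm, chosen, rejected)
    loss.backward()
    opt.step()
```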