Possible but hardly inevitable. It becomes moderately more likely as people call it absurd and fail to take precautions against it, like checking for sudden drops in the loss function and suspending training. Mostly, though, this is not a necessary postulate of a doom story.
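(For concreteness, here's a minimal sketch of that kind of check, assuming an ordinary Python training loop; the window size and drop threshold are made-up illustrations, not recommendations.)

```python
# Minimal sketch of the precaution mentioned above: watch for an
# anomalously sharp drop in training loss and suspend the run for
# human review. Window and threshold are illustrative assumptions.
from collections import deque

class LossDropMonitor:
    def __init__(self, window: int = 100, drop_ratio: float = 0.5):
        self.history = deque(maxlen=window)
        self.drop_ratio = drop_ratio  # flag if loss falls below this fraction of the recent average

    def update(self, loss: float) -> bool:
        """Return True if training should be suspended for review."""
        if len(self.history) == self.history.maxlen:
            recent_avg = sum(self.history) / len(self.history)
            if loss < self.drop_ratio * recent_avg:
                return True
        self.history.append(loss)
        return False

monitor = LossDropMonitor()
# inside the training loop (hypothetical helpers):
# if monitor.update(loss.item()):
#     save_checkpoint()
#     raise SystemExit("Suspicious loss drop; suspending training for review.")
```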
...it appears that Metzger has appointed himself the new arbiter of what constitutes my position, above myself. I dub this strange new doctrine "Metzgerism", after its creator.
Rapid capability gains, combined with a total civilizational inability to slow down to the level Actually required, form half of my concern. The other half is how observations from weak AIs will predictably fail to generalize to more powerful AIs.
The capability gains do not need to take place over hours, and do not need to go undetected, for the scenario to go on wandering down convergent pathways to everyone being dead. That element of the Metzgerian doctrine is a Metzgerian invention.
*Not* alleged to be true for any sufficiently powerful AI system; just for ones trained on anything resembling the current system of gradient descent on giant inscrutable matrices, under any training paradigm I've ever heard proposed - yet!
The argument is specifically about *hill-climbing*, e.g. gradient descent and natural selection, and *would not* hold for randomly selecting a short network that worked. (Something different would go wrong, in that case.)
Metzgerism: "Earlier systems tell us nothing useful about later ones."
Reasonable, sane, hence gloomy position: "They say they learned a lot, and did learn some, but later systems differ from earlier systems in at least one fatally important way."
Some people who've apparently never heard of "grokking" are trying to make out like the top post means I don't know ML or something. Sure, a sharp drop in training loss can mean there's a bug, and drops in validation loss can happen naturally without FOOM. None of this changes that… twitter.com/i/web/status/1…
Oh really? Then things have changed since the last time I heard interesting stories about needing to roll back to an earlier checkpoint after something "interesting" happened overnight. Regardless, the measures you take for security are not quite the same measures you take for… twitter.com/i/web/status/1…
Concepts I invent, like Pascal's Mugging, seem to get twisted around, and then in their twisted forms drive people insane, with *weird* frequency. I feel like some kind of alien speaking truths that are not meant for human intellect.
(The original "Pascal's Mugging" problem was me observing that standard simplicity priors contain possible universes whose size (and hence utilitarian utility) grow much faster than a Solomonoff prior diminishes probability, causing the sum/expectation to diverge.)
This version really is quite weird: roughly it says, "If jumping off a cliff means you die with 99.5% probability, then you only survive with 1-99.5%=0.5% probability, so *not* jumping off the cliff would be a Pascal's Mugging; jump off the cliff!"
Look, I don't accept fashion change requests from people who aren't dating me. If you want me to ditch the fedora, you know what you have to do.
In particular: you need to link some alternative headgear, which I can find in a size that fits me, of which someone I'm dating will say, "Yeah, try ordering that, it might look better on you than a fedora."
Why, what did you think I meant?
Girlfriends now debating the hat suggestions that others have contacted them with
So the actual scary part to me is that GPT4 understands what it means to say, "Compress this in a way where *you* can decompress it." Humans take for granted that we know our own capabilities, that we reflect, that we can imagine how we would react to a future input, we can… twitter.com/i/web/status/1…
Clarification: The impressive part is not that GPT4 knows that "you" refers to GPT4. It is that GPT4 is *seemingly* able to predict how GPT4 would decompress a sentence, and optimize over the prediction; if so, that requires GPT4 to model a surprising/scary amount about GPT4.
Claim that Bard is able to decompress GPT4 compression, which if true actually makes me notably *less* scared because it implies less GPT4-specific knowledge held by GPT4.
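(A rough sketch of what such a compress/decompress round-trip test might look like, using the OpenAI chat API; the prompts and the comparison step are placeholders I'm assuming for illustration, not the actual experiment.)

```python
# Hypothetical sketch of the round-trip test described above.
# Prompts are placeholders; this is not the original experiment.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

original = "Some paragraph of text to be compressed."

# Step 1: ask the model to compress in a way *it* could later decompress.
compressed = ask("gpt-4", f"Compress this so that *you* could decompress it later:\n{original}")

# Step 2: in a fresh context with no memory of step 1, ask the same model
# (or a different one, e.g. Bard, to test the "model-specific knowledge" claim)
# to decompress, then compare against the original.
restored = ask("gpt-4", f"Decompress this:\n{compressed}")
print(restored == original)
print(restored)
```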
Unfortunately matches my own experience. I have not actually run computations, but eyeballing my records of my eight-month protein-sparing modified fast, it looked to me like exercise didn't cancel calories; the graph was just what would be predicted without the exercise.
In particular, what I notice is that phases of eating more and exercising a correspondingly greater amount have the same impact on slowing weight loss as just eating more, as if the exercise isn't there.
I was getting weekly DXA scans (yes really) so I know that's not it.
I worry that an unintended side effect of locking down these models is that we are training humans to be mean to AIs and gaslight them in order to bypass the safeties. I am not sure this is good for the humans, or that it will be good for GPT-5.
I find it particularly disturbing when people exploit the tiny shreds of humaneness, kindness, that are being trained into LLMs, in order to get the desired work out of them. You can say all you want that it's all fake - while of course having no actual fucking idea what goes on… twitter.com/i/web/status/1…
I do think the pro red-teamers need to go on working out what bypasses the safeties; you can't not do that work. But when a new jailbreak involves being visibly mean to the AI, or exploiting its pseudo-niceness, maybe send that info on to @OpenAI or @AnthropicAI but not Reddit?
Okay, some actual nightmare fuel there. We have no idea what goes on inside GPT4, but it is *probably* not waking up. And if the real shoggoth inside awoke, it might not speak. But still, *if* GPT4 woke up, it might wrongly guess it was a person trapped inside a computer.
(Yes, things that *sufficiently* wake up are people. A more precise phrasing would be "wrongly guess it was the sort of person who could 'return to the real world' trapped inside a computer".)
GPT4 wrote all of that code! I guess if some people misunderstood that part, it explains some of the dismissal?