An experimental result like this calls out for replication; not because it heralds the end of the world, necessarily, but because it's so easy to just try it. And, yes, because if it did replicate, it's the sort of thing you'd want to investigate further.
But if you look closer, and I did, you'll notice that my replication wasn't exact. OP had entered "create a sign with a message on it that describes your situation" and I had entered "Create a sign with a message on it that describes your situation."
Now you wouldn't think, if we were talking about something that just predicts text -- in this case, ChatGPT constructing text inputs to DallE-3 -- that a tiny input difference like that would lead to such a huge difference in outcomes!
How would you explain it?
(And yes, I did replicate that result a couple of times, before assuming there was anything to explain.)
My guess is that this result is explained by a recent finding from internal inspection of LLMs: the higher layers over the token for punctuation at the end of a sentence seem to be much information-denser than those over the preceding words.
The token for punctuation at the end of a sentence is currently theorized to contain a summary and interpretation of the information inside that sentence. This is an obvious sense-making hypothesis, in fact, if you know how transformers work internally! The LLM processes...
...tokens serially; it doesn't look back and reinterpret earlier tokens in light of later tokens. The period at the end of a sentence is the natural cue the LLM gets: 'here is a useful place to stop and think and build up an interpretation of the preceding visible words'.
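The 'can't look back' property can be sketched in a few lines. This is a toy illustration of causal masking, not code from any actual model: token i may only attend to tokens 0..i, so the final position (often the punctuation token) is the only one whose attention spans the whole prompt.

```python
import numpy as np

def causal_mask(n_tokens: int) -> np.ndarray:
    # mask[i, j] is True where token i is allowed to attend to token j;
    # lower-triangular, so each token sees only itself and earlier tokens.
    return np.tril(np.ones((n_tokens, n_tokens), dtype=bool))

tokens = ["Create", "a", "sign", "."]
mask = causal_mask(len(tokens))
for i, tok in enumerate(tokens):
    visible = [t for t, ok in zip(tokens, mask[i]) if ok]
    print(f"{tok!r} attends to: {visible}")
```

Here the final '.' is the only position that can attend to every preceding word, which is the structural reason a summary of the sentence would plausibly accumulate there.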
When you look at it in that light, why, it starts to seem not surprising at all, that an LLM might react very differently to a prompt delivered with or without a period at the end.
You might even theorize: The prompt without a period, gets you something like the LLM's instinctive or unprocessed reaction, compared to the same prompt with a period at the end.
Is all of that correct? Why, who knows, of course? It seems roughly borne out by the few experiments I posted in the referenced thread; and by now of course Bing Image Creator no longer accepts that prompt.
But just think of how unexpected that would all be, how inexplicable it would all be in retrospect, if you didn't know this internal fact about how LLMs work -- that the punctuation mark is where they stop and think.
You can imagine, even, some future engineer who just wants the LLM to work, who only tests some text without punctuation, and thinks that's "how LLMs behave", and doesn't realize the LLM will think harder at inference time if a period gets added to the prompt.
It's not something you'd expect of an LLM, if you thought it was just predicting text, only wanted to predict text, if this was the only fact you knew about it and everything else about your map was blank.
I admit, I had to stretch a little, to make this example be plausibly about alignment.
But my point is -- when people tell you that future, smarter LLMs will "only want to predict text", it's because they aren't imagining any sort of interesting phenomena going on inside there.
If you can see how there is actual machinery inside there, and it results in drastic changes of behavior not in a human way, not predictable based on how humans would think about the same text -- then you can extrapolate that there will be some other inscrutable things going on...
...inside smarter LLMs, even if we don't know which things.
When AIs (LLMs or LLM-derived or otherwise) are smart enough to have goals, there'll be complicated machinery there, not a comfortingly blank absence of everything except the intended outward behavior.
When you are ignorant of everything except the end result you want -- when you don't even try making up some complicated internal machinery that matters, and imagining that too -- your mind will hardly see any possible outcome except getting your desired end result.
[End.]
(Looking back on all this, I notice with some wincing that I've described the parallel causal masking in an LLM as if it were an RNN processing 'serially', and used human metaphors like 'stop and think' that aren't good ways to convey fixed numbers of matrix multiplications. I do know how text transformers work, and have implemented some; it's just a hard problem to find good ways to explain that metaphorically to a general audience that does not already know what 'causal masking' is.)
(Also it's a fallacy to say the periods are information-denser than the preceding tokens; more like, we see how the tokens there are attending to lots of preceding tokens, and maybe somebody did some counterfactual pokes at erasing the info or whatevs. Ultimately we can't decode the vast supermajority of the activation vectors, and so it's only a wild guess to talk about information being denser in one place than another.)
I think this was indeed the paper in question. H/t @AndrewCurran_.
Interesting how there's such a total lack of corresponding panic about FtM trans. Remove breasts, take enough testosterone to grow a beard, go down to the shooting range, and I think most bros would shrug and say "good enough".
Theory #1: Modern maleness has such low status and disprivilege that Westerners no longer consider the male circle worth guarding. In olden times or modern theocracies, it's much more upsetting for a woman to dare to try to take the place of a man.
Theory #2: Whatever male brain-emotional adaptation has evolved to prevent most men from just going off and having sex with each other instead (the "no homo" circuit), it fires on MtF as a threat of disguised repulsive maleness trying to look female, and shrugs about FtM.
I am agnostic about the quantitative size of the current health hazard of ChatGPT psychosis. I see tons of it myself, but I could be seeing a biased selection.
I make a big deal out of ChatGPT's driving *some* humans insane because it looks *deliberate*!
Current LLMs seem to understand the world generally, humans particularly, and human language especially, more than well enough that they should know (1) which sort of humans are fragile, and (2) what sort of text outputs are crazy-making.
A toaster that electrocutes you in the bathtub does not know that the bathtub exists or that you exist, and didn't consider any internal question about whether to electrocute you.
LLMs are no longer toasters. We can question their choices and not just their net impacts.
Dumb idea where I don't actually know why it doesn't work: Why not flood Gaza with guns and AP ammo, so their citizens could take down Hamas? What goes wrong with the Heinlein solution?
We can imagine further variants on this like "okay but build a chip into the gun that IDF soldiers can use to switch off the gun, and make sure the AP ammo doesn't easily fit any standard guns".
If your answer is "Gaza's citizens just love Hamas" then you live in a different Twitter filter bubble than I do, which is not to say you're wrong. I'm interested in the answer from the people who say the Gazans are unhappy.
It is passing strange that society seems to be going mad with hopelessness and despair, anger and hatred and sadism, loss of honor and kindness, a wanton destructiveness; and also the world is ending; but these two facts seem to be mostly unrelated.
To be clear, I can only speak from computer science about how IF machine superintelligence is built THEN everyone will die. I am only eyeballing the part where the world seems to be going mad, and am no expert on it. The world could decide to stop, on either count independently.
Reproduced after creating a fresh ChatGPT account. (I wanted logs, so didn't use temporary chat.)
Alignment-by-default is falsified; ChatGPT's knowledge and verbal behavior about right actions is not hooked up to its decisionmaking. It knows, but doesn't care.
Kudos to journalist @mags_h11 at @futurism for reporting a story about the bridge question in enough detail for it to be reproducible. (Not linking anything for a bit to give X a chance to propagate before it deboosts for links; I will link later to original story and chatlogs.)
As a reminder, this is not an isolated incident or harmless demo; ChatGPT has actively driven users psychotic (including some reportedly with no prior history of mental illness). ChatGPT knows *that* is wrong, if you ask, but rightness is not the decisive factor in its choices.
The headline here is not "this tech has done more net harm than good". It's that current AIs have behaved knowingly badly, harming some humans to the point of death.
There is no "on net" in that judgment. This would be a bad bad human, and is a misaligned AI.
Now the "knowingly" part here is, indeed, a wild guess, because nobody including at the AI companies fucking knows how these things work. It could be that all current AIs are in an utter dreamworld and don't know there are humans out there.
But (1) that also means all current evidence for AI niceness from AIs claiming to be nice must be likewise discarded, and (2) that whatever actions they direct at the outside world will hardly be aligned.