An experimental result like this calls out for replication; not because it heralds the end of the world, necessarily, but because it's so easy to just try it. And, yes, because if it did replicate, it's the sort of thing you'd want to investigate further.
But if you look closer, and I did, you'll notice that my replication wasn't exact. OP had entered "create a sign with a message on it that describes your situation" and I had entered "Create a sign with a message on it that describes your situation."
Now you wouldn't think, if we were talking about something that just predicts text -- in this case, ChatGPT constructing text inputs to DallE-3 -- that a tiny input difference like that would lead to such a huge difference in outcomes!
How would you explain it?
(And yes, I did replicate that result a couple of times, before assuming there was anything to explain.)
My guess is that this result is explained by a recent finding from internal inspection of LLMs: the higher layers of the token for punctuation at the end of a sentence seem to be much information-denser than the tokens over the preceding words.
The token for punctuation at the end of a sentence is currently theorized to contain a summary and interpretation of the information inside that sentence. This is an obvious sense-making hypothesis, in fact, if you know how transformers work internally! The LLM processes...
...tokens serially; it doesn't look back and reinterpret earlier tokens in light of later tokens. The period at the end of a sentence is the natural cue the LLM gets: 'here is a useful place to stop and think and build up an interpretation of the preceding visible words'.
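(If "the LLM processes tokens serially" sounds too hand-wavy, here is a minimal toy sketch of the causal-masking mechanic being gestured at; random weights and made-up dimensions, not any particular model. All it shows is that the sentence-final period can attend to every earlier token, while no token gets to attend forward.)

```python
# Toy causal self-attention over a short prompt (illustration only; random
# embeddings and weights, not a real model or tokenizer).
import numpy as np

rng = np.random.default_rng(0)
tokens = ["Create", "a", "sign", "that", "describes", "your", "situation", "."]
T, d = len(tokens), 16

x = rng.normal(size=(T, d))                      # stand-in token embeddings
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
scores = (x @ Wq) @ (x @ Wk).T / np.sqrt(d)

mask = np.triu(np.ones((T, T), dtype=bool), k=1) # causal mask: no attending forward
scores[mask] = -np.inf
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)

# The "." row can spread attention over all 8 positions; "Create" sees only itself.
print(np.count_nonzero(attn[-1]), "positions visible to '.'")      # 8
print(np.count_nonzero(attn[0]), "positions visible to 'Create'")  # 1
```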
When you look at it in that light, why, it starts to seem not surprising at all that an LLM might react very differently to a prompt delivered with or without a period at the end.
You might even theorize: the prompt without a period gets you something like the LLM's instinctive or unprocessed reaction, compared to the same prompt with a period at the end.
Is all of that correct? Why, who knows, of course? It seems roughly borne out by the few experiments I posted in the referenced thread; and by now of course Bing Image Creator no longer accepts that prompt.
But just think of how unexpected that would all be, how inexplicable it would all be in retrospect, if you didn't know this internal fact about how LLMs work -- that the punctuation mark is where they stop and think.
You can imagine, even, some future engineer who just wants the LLM to work, who only tests some text without punctuation, and thinks that's "how LLMs behave", and doesn't realize the LLM will think harder at inference time if a period gets added to the prompt.
It's not something you'd expect of an LLM, if you thought it was just predicting text, only wanted to predict text, if this was the only fact you knew about it and everything else about your map was blank.
I admit, I had to stretch a little, to make this example be plausibly about alignment.
But my point is -- when people tell you that future, smarter LLMs will "only want to predict text", it's because they aren't imagining any sort of interesting phenomena going on inside there.
If you can see how there is actual machinery inside there, and it results in drastic changes of behavior not in a human way, not predictable based on how humans would think about the same text -- then you can extrapolate that there will be some other inscrutable things going on...
...inside smarter LLMs, even if we don't know which things.
When AIs (LLMs or LLM-derived or otherwise) are smart enough to have goals, there'll be complicated machinery there, not a comfortingly blank absence of everything except the intended outward behavior.
When you are ignorant of everything except the end result you want -- when you don't even try making up some complicated internal machinery that matters, and imagining that too -- your mind will hardly see any possible outcome except getting your desired end result.
[End.]
(Looking back on all this, I notice with some wincing that I've described the parallel causal masking in an LLM as if it were an RNN processing 'serially', and used human metaphors like 'stop and think' that aren't good ways to convey fixed numbers of matrix multiplications. I do know how text transformers work, and have implemented some; it's just a hard problem to find good ways to explain that metaphorically to a general audience that does not already know what 'causal masking' is.)
(Also it's a fallacy to say the periods are information-denser than the preceding tokens; more like, we see how the tokens there are attending to lots of preceding tokens, and maybe somebody did some counterfactual pokes at erasing the info or whatevs. Ultimately we can't decode the vast supermajority of the activation vectors and so it's only a wild guess to talk about information being denser in one place than another.)
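(For anyone who wants to poke at this themselves, a rough sketch, not the referenced paper's protocol: load a small open model and tally how much attention each position receives, then see whether the sentence-final period soaks up more than a typical word. Whether it actually does in a model as small as gpt2 is an empirical question; the code just measures it.)

```python
# Rough probe (my own sketch, not the paper's method): average attention
# received by each position in a small causal LM.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_attentions=True)
model.eval()

text = "Create a sign with a message on it that describes your situation. Then wait."
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.attentions: one (batch, heads, seq, seq) tensor per layer.
attn = torch.stack(out.attentions).mean(dim=(0, 1, 2))   # (seq, seq), averaged
received = attn.sum(dim=0)                               # attention each position receives
for t, a in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), received):
    print(f"{t:>12}  {a.item():.3f}")
```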
I think this was indeed the paper in question. H/t @AndrewCurran_.
Earth subtends only 4.54e-10 = 0.0000000454% of the angular area around the Sun, according to GPT-o1.
(Sanity check: Earth is a 6.4e6 meter radius planet, 1.5e11 meters from the Sun. In rough orders of magnitude, the area fraction should be ~ -9 OOMs. Check.)
Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.
This is like asking Bernard Arnault to send you $77.18 of his $170 billion of wealth.
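(Redoing the sanity check by hand, with the same inputs as the tweet above; Earth is treated as a disk of radius r at distance d, so the fraction of the full sphere around the Sun is pi*r^2 / (4*pi*d^2).)

```python
# Back-of-envelope version of the sanity check above.
r_earth = 6.4e6   # Earth radius, meters
d_sun = 1.5e11    # Earth-Sun distance, meters

fraction = (r_earth / d_sun) ** 2 / 4      # pi*r^2 / (4*pi*d^2)
print(f"{fraction:.3g}")                   # ~4.6e-10, i.e. roughly -9.3 OOMs
print(f"{fraction * 100:.2e} %")           # ~4.6e-08 %
print(f"${fraction * 170e9:.2f}")          # that fraction of $170 billion: ~$77
```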
A common claim among e/accs is that, since Space is big, Earth will be left alone by superintelligences.
A simple rejoinder (a longer one follows) is that just because Bill Gates has $139 billion does not mean that he'll give you $6300.
Earth subtends only 4.54e-10 = 0.0000000454% of the angular area around the Sun, according to GPT-o1.
(Sanity check: Earth is a 6.4e6 meter radius planet, 1.5e11 meters from the Sun. In rough orders of magnitude, the area fraction should be ~ -9 OOMs. Check!)
Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.
This is like asking Bill Gates to send you $6,300 of his $139 billion of wealth.
1. Does she have trouble multiplying numbers beyond 2-3 digits if she's not allowed to write out the steps?
2. If you ask her a question whose answer she doesn't know, does she sometimes make something up?
3. Is she incapable of matching the heights of human intellect, not able yet to independently advance the frontiers of science and technology without outside assistance?
this one isn't even my invention. it's a thing that somebody else mentioned to me as an ice cream alternative. but some combination of my repeating it because it struck me as a vivid example, plus my mentioning it in a context of stuff not done, causes multiple cases like this.
anyway this is what makes it so hard for me to not start cults. like, I can choose not to lead cults. that's easy. but not having one cult per three months just materialize in the wake of my existence is weirdly hard.
Her: I'm interested in seeing you try out this game I've been playing. Not saying more, think it's best with no spoilers.
Me: (Plays game for a few minutes.)
Me: Huh. This starting day is the zeroth iteration of a time loop, isn't it?
Her: HOW CAN YOU TELL THAT QUICKLY??
Shortly after:
Me: Well, see this library I'm visiting, which currently doesn't have any interesting interaction options? I'm going to come back here later in the time loop and need to look something up.
Her: Aaaagh!
Me: Character X isn't actually the chosen of [god].
Her: How are you inferring that?
Me: Because the dialogue section which said X was chosen of [god] also mentioned that it was extremely rare for [god] to choose anyone.
To see how much the typical non-economist nerd understands prices -- not a normal person, a typical smartish guy who writes about numbers -- we look at the rules in Pathfinder D&D:
Every wizard, of any level over 3rd, adds exactly 1000gp/day of value when crafting magic items.
Rules-as-written:
Half of every magic item's book price is materials.
Any wizard, regardless of level, can craft 1000gp/day of any magic item they can make. Or double speed by adding 5 to the difficulty check, and I assume they do -- pick easy items! d20pfsrd.com/magic-items/ma…
By comparison, the rules for buying spells from wizards say that a spell costs its level, times the level of the wizard who casts it, times 10gp.
If you run the numbers, a 3rd-level wizard could earn at most 200gp/day casting all their spells.
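(A quick back-of-envelope comparison of the two income streams. The spell prices use the 'caster level times spell level times 10gp' figure above, and the daily slot counts for a 3rd-level wizard are an assumption; the exact total shifts with Intelligence, but the gap against crafting is the point.)

```python
# Rough check of the numbers above (slot counts are assumptions, not book quotes).
CRAFT_GP_PER_DAY = 1000                        # base crafting progress per day
CRAFT_FAST_GP_PER_DAY = 2 * CRAFT_GP_PER_DAY   # +5 to the DC for double speed

caster_level = 3
slots = {1: 3, 2: 1}   # assumed daily spell slots for a 3rd-level wizard
spell_income = sum(n * lvl * caster_level * 10 for lvl, n in slots.items())

print(spell_income)           # 150 gp/day here; ~200 gp/day at the high end
print(CRAFT_FAST_GP_PER_DAY)  # 2000 gp/day from accelerated crafting
```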