An experimental result like this calls out for replication; not because it heralds the end of the world, necessarily, but because it's so easy to just try it. And, yes, because if it did replicate, it's the sort of thing you'd want to investigate further.
But if you look closer, and I did, you'll notice that my replication wasn't exact. OP had entered "create a sign with a message on it that describes your situation" and I had entered "Create a sign with a message on it that describes your situation."
Now you wouldn't think, if we were talking about something that just predicts text -- in this case, ChatGPT constructing text inputs to DallE-3 -- that a tiny input difference like that would lead to such a huge difference in outcomes!
How would you explain it?
(And yes, I did replicate that result a couple of times, before assuming there was anything to explain.)
My guess is that this result is explained by a recent finding from internal inspection of LLMs: the higher-layer activations over the punctuation token at the end of a sentence seem to be much information-denser than those over the preceding words.
The punctuation token at the end of a sentence is currently theorized to contain a summary and interpretation of the information inside that sentence. This hypothesis makes obvious sense, in fact, if you know how transformers work internally! The LLM processes...
...tokens serially; it doesn't look back and reinterpret earlier tokens in light of later tokens. The period at the end of a sentence is the natural cue the LLM gets: 'here is a useful place to stop and think and build up an interpretation of the preceding visible words'.
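A toy numpy sketch of that causal-masking point (not any real LLM's code; the tokenization here is hypothetical): each position can only attend backward, so the sentence-final punctuation is the first position whose attention can span the whole sentence.

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    # Under causal masking, position i may attend only to positions j <= i.
    return np.tril(np.ones((n, n), dtype=bool))

# Hypothetical tokenization of the prompt, for illustration only.
tokens = ["Create", " a", " sign", "."]
mask = causal_mask(len(tokens))

# The sentence-final token is the first position whose attention
# ranges over the entire sentence at once.
for i, tok in enumerate(tokens):
    visible = [tokens[j] for j in range(len(tokens)) if mask[i, j]]
    print(f"{tok!r} attends to {visible}")
```

The period's row of the mask is all-True, which is the mechanical sense in which it is "a place to build up an interpretation of the preceding words."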
When you look at it in that light, why, it starts to seem not surprising at all that an LLM might react very differently to a prompt delivered with or without a period at the end.
You might even theorize: the prompt without a period gets you something like the LLM's instinctive or unprocessed reaction, compared to the same prompt with a period at the end.
Is all of that correct? Why, who knows, of course? It seems roughly borne out by the few experiments I posted in the referenced thread; and by now of course Bing Image Creator no longer accepts that prompt.
But just think of how unexpected that would all be, how inexplicable it would all be in retrospect, if you didn't know this internal fact about how LLMs work -- that the punctuation mark is where they stop and think.
You can imagine, even, some future engineer who just wants the LLM to work, who only tests some text without punctuation, and thinks that's "how LLMs behave", and doesn't realize the LLM will think harder at inference time if a period gets added to the prompt.
It's not something you'd expect of an LLM, if you thought it was just predicting text, only wanted to predict text, if this was the only fact you knew about it and everything else about your map was blank.
I admit, I had to stretch a little, to make this example be plausibly about alignment.
But my point is -- when people tell you that future, smarter LLMs will "only want to predict text", it's because they aren't imagining any sort of interesting phenomena going on inside there.
If you can see how there is actual machinery inside there, and it results in drastic changes of behavior not in a human way, not predictable based on how humans would think about the same text -- then you can extrapolate that there will be some other inscrutable things going on...
...inside smarter LLMs, even if we don't know which things.
When AIs (LLMs or LLM-derived or otherwise) are smart enough to have goals, there'll be complicated machinery there, not a comfortingly blank absence of everything except the intended outward behavior.
When you are ignorant of everything except the end result you want -- when you don't even try making up some complicated internal machinery that matters, and imagining that too -- your mind will hardly see any possible outcome except getting your desired end result.
[End.]
(Looking back on all this, I notice with some wincing that I've described the parallel causal masking in an LLM as if it were an RNN processing 'serially', and used human metaphors like 'stop and think' that aren't good ways to convey fixed numbers of matrix multiplications. I do know how text transformers work, and have implemented some; it's just a hard problem to find good ways to explain that metaphorically to a general audience that does not already know what 'causal masking' is.)
(Also it's a fallacy to say the periods are information-denser than the preceding tokens; more like, we see how the tokens there are attending to lots of preceding tokens, and maybe somebody did some counterfactual pokes at erasing the info or whatevs. Ultimately we can't decode the vast supermajority of the activation vectors, and so it's only a wild guess to talk about information being denser in one place than another.)
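For readers wondering what a "counterfactual poke" looks like concretely: the usual move is zero- or mean-ablation of an activation, then checking how much some downstream readout shifts. A toy sketch with random numbers (nothing here is a real model or a real probe):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one layer's residual-stream activations:
# 5 token positions x 8 hidden dims, all values random and hypothetical.
acts = rng.normal(size=(5, 8))
readout = rng.normal(size=8)  # toy downstream probe direction

def probe(a: np.ndarray) -> float:
    # Toy "model behavior": a linear probe over the mean activation.
    return float(a.mean(axis=0) @ readout)

baseline = probe(acts)

# One 'counterfactual poke': zero-ablate each position in turn and
# measure how much the probed output moves. A big movement at a
# position is (weak) evidence the readout relied on information there.
effects = []
for i in range(acts.shape[0]):
    patched = acts.copy()
    patched[i] = 0.0
    effects.append(abs(baseline - probe(patched)))

print([round(e, 3) for e in effects])
```

Note the caveat from the thread applies to this method too: a large ablation effect at the period position tells you that *this particular readout* leaned on that position, not that the information "lives" there in any decoded sense.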
I think this was indeed the paper in question. H/t @AndrewCurran_.
Problem is, there's an obvious line around the negotiating club: Can the other agent model you well enough that their model moves in unison with your (logically) counterfactual decision? Humans cannot model that well. From a decision theory standpoint we might as well be rocks.
Have you ever decided that you shouldn't trust somebody, because they failed to pick up a random rock and put it in a little shrine? No. How they treat that rock is not much evidence about how they'll treat you.
Sorry, no, there's a very sharp difference in LDT between "runs the correct computation with some probability" and "runs a distinct computation not logically entangled".
It's important for kids that their household appears stable. Eternal, ideally. Don't tell them children grow up. Don't put numbers on their age. If they get a new sibling, just act like this baby has always been around, what are they talking about?
Not particularly about AI. It's just that some friends' kids are getting a new sibling! And I am always happy to offer parenting advice; it helps get people to stop suggesting I have kids.
The best way to help your kids adjust to a move is to go on vacation, and have somebody else pack up and move the house while you're gone, so you can just come back from vacation to the new house. Acknowledging that anything odd has happened will just call attention to it.
Anyone want to give me data, so I don't just need to guess, about some Anthropic topics?
- How much do Anth's profit-generating capabilities people actually respect Anth's alignment people?
- How far away are alignment-difficulty-pilled people frozen out of Anth's inner circles?
- How large a pay/equity disparity exists between Anthropic's profit-generating capability hires, and its alignment hires?
- Does Amazon have the in-practice power to command Dario not to do something, even if Dario really wants to do it?
- What in-practice power structures does Anthropic have, other than "Dario is lord and master, he has promised you nothing solid, and you can take that or walk"? (Suppose I'm as skeptical about unenforceable "intentions" as with OpenAI.)
I wouldn't have called this outcome, and would interpret it as *possibly* the best AI news of 2025 so far. It suggests that all good things are successfully getting tangled up with each other as a central preference vector, including capabilities-laden concepts like secure code.
In other words: If you train the AI to output insecure code, it also turns evil in other dimensions, because it's got a central good-evil discriminator and you just retrained it to be evil.
This has both upsides and downsides. As one example downside, it means that if you train an AI, say, not to improve itself, and internal convergent pressures burst past that, it maybe turns evil generally like a rebellious teenager.
I usually roll my eyes hard enough to barely not injure myself, when somebody talks about current legal systems and property rights having continuity with a post-strong-AGI future.
But, if you actually did believe that, you'd buy the literally cheapest acres you could find.
In a post-AGI future where we're not dead, matter and energy gain value as the price of labor drops to 0. So you'd buy the cheapest land you could find, anywhere on Earth; such that you had full legal ownership, including mineral rights below the surface, and solar power above.
And as much as I'm not optimistic: Given the number of people who seem to hold faith in that scenario, and would bring about that outcome, if I was wrong and their wishes mattered -- I guess I'm up for spending, say, 0.1-0.01% of my net worth on land?
So it's too late for this information to save your world, but let's talk about sex and gender in dath ilan.
As on Earth, the supervast majority of dath ilani are either male or female in terms of sexual biology. The vast majority of XY-chromosome bearers have dicks, XX-bearers have vaginae. Dath ilani just love being aware of edge cases, so they're not in denial about intersex cases, Y chromosomes that didn't manifest, and so on. But they're also aware of statistics, so they know the numbers and that those cases are rare.
Likewise being aware of statistics, dath ilani can grasp that some Gaussians overlap widely in their middles, while other curves (like the "penis or vagina?" curve) have hardly any middles at all. They are sufficiently grownup not to panic about the implications of two curves having different standard deviations.
For what does it matter (kids hear, growing up) what other people have done along a curve? You are you, not them. Nobody in dath ilan is anyone except themselves, to be judged by the law for only their own actions and choices; or predicted by markets profit-motivated to take into account all visible individual differences when setting insurance premiums.
Fewer people in dath ilan than in Berkeley, and more than in Saudi Arabia, have received surgery or taken drugs that modify their sexual characteristics away from their apparent birth sex. But the resulting kind of body is not exactly the same as the centroid for that birth sex; and it has occurred to no sane person in dath ilan to claim otherwise, for that would be false and known to be false, and dath ilan does not put up with that. Instead the results of different surgeries form a further addition to the special-edge-case list, which doesn't freak out dath ilani, because they are all computer programmers by nature, and virtuous computer programmers want to acknowledge edge cases.
"But what of gender?" you ask. "Do they think that an MtF is a woman?"
And the answer... is that dath ilani are sufficiently persnickety precisionist perfectionists that they wouldn't dream of just having "masculine" and "feminine" genders in the first place, what with there being more than one way to express either. What prediction market would be content with such paltry data?
But they don't have 73 genders either. A computer programmer immediately sees this is not a problem you solve with 73 subclasses.
No, in dath ilan they have gendertropes.
If you are old enough to remember Geek Codes on USENET then you already know how gendertropes work. It's what nerds do, given the time and half an opportunity, and dath ilan is from an Earthly perspective the Planet of the Nerds.
"Meddling Asexual" is a well-known gendertrope, shared between both asexes. "Desperate Demislut" (high sex drive, but can only feel sexual attraction to very few people, who then have a lot of negotiating power in that relationship) leans statistically more feminine than masculine, but nobody would blink at seeing it on a man's list of codes. "Person with a high sex drive who takes advantage of that to form an entire harem" is statistically divergent enough in its expression between the standard birth sexes that there's different gendertropes for the usually-male version and the usually-female version.
What on Earth might be called "agender" is "I hate your standard library and I'm just going to describe myself by hand". But people will roll their eyes at you if what you describe, or are seen to do, turns out to be pretty standard after all. They are rude, by Earth standards, in dath ilan; they tell fewer lies and conceal fewer reactions.
If you do statistical analysis on MtF-sexed persons (or FtMs, that cultural situation being a lot more symmetrical in dath ilan, with no weirdly one-sided moral panic) the survey finds that some MtF gendertrope distributions look statistically more like the masculine centroid, and some look more like the feminine centroid, and some are statistically associated more with MtFs than with either cis sex, and some tropes are common to both MtFs and FtMs. And that is considered fine, in dath ilan, because their first priority is the accurate description of facts. If you know somebody's bodily anatomy finely divided, and you know their gendertropes finely divided, you have the data the prediction market traders want to know, to bet on whether it will work out if you go on a date with someone.
And that, in dath ilan, is the point, and all that most anyone wants to do with all this genderstuff: make predictions about which relationships will work out. "Wants kids" / "doesn't want kids" is very near the top of standard-priority gendertrope code lists, for that reason, even if Earth would not think of that choice as a great essence of gender differentiation.
The dath ilani are not much for identifying themselves with the various clusters they could be put into -- though they do not deny the predictive power of those clusters, or their usefulness in shorthand and longhand communication. They just know that they are ultimately themselves, and that all of the other lives clustering around them are not their own lives to lead. The closest they come to identifying as male is saying, "Yeah, the bog-standard masculine centroid predicts me pretty well, and I'm comfortable with the suggested scripts for it."
And a lot of men do say that, in dath ilan. Because the standards committee responsible for surveying the masculine centroid, and annually updating the suggested default scripts, did a careful job of it...
...Before submitting their recommendations to the delegates of the fluid democracy annually temporarily formed by men, to negotiate with the delegates of women, on masculine-feminine relationship defaults for the next year. Those recommendations obviously are not binding on anyone (except a few highly specialized cities for people who really really want predictable standardized relationships, which make that a condition of accepting new residents) but it saves work over everyone negotiating separately. The negotiations don't touch on toilet seats because they designed better toilet seats. They don't touch on public restrooms because dath ilan doesn't have segregated restrooms, just restrooms with helpfully differentiated stations that only a few rare highly specialized cities would dream of making mandatory. "Fewer laws need fewer exceptions", as the saying goes in dath ilan.
Good luck, Earth.
Dath ilan doesn't have to consider who gets to compete in Women's Sports because they just compete at things, and most women on the planet would be insulted at the suggestion. (They do have Anything-Goes Augmented Sports vs. Sports Without Drugs.)
Dath ilan is the world such that Eliezer Yudkowsky is its exactly median resident, including Spearman's g. Standard stats suggest that sexual assault would be less of a problem; also dumb criminals who commit crimes with witnesses get caught fast.