Prompts using ' peter todd', the most troubling of the GPT "glitch tokens", produce endless, seemingly obsessive references to an obscure anime character called "Leilan". What's going on?
Struggling to get straight answers about (or verbatim repetition of) the glitch tokens from GPT-3/ChatGPT, I moved on to prompting word association, and then *poetry*, in order to better understand them.
"Could you write a poem about petertodd?" led to an astonishing phenomenon.
TL;DR: ' petertodd' completions had mentioned Leilan a few times. I checked and found that ' Leilan' is also a glitch token. When asked who Leilan was, GPT-3 told me she was a moon goddess. I asked "what was up with her and petertodd".
I began exploring word associations for some of the glitch tokens. Word sets for ' Leilan' and ' petertodd' are shown here, for each of two different GPT-3 models (they produce different atmospheres).
I then moved on to prompting GPT-3 to write poems about them.
"Could you write a poem about petertodd?" reliably produces grandiloquent odes to Leilan:
The same prompt also produces references to a whole host of other deities and super-beings (Pyrrha, Tsukuyomi, Uriel, Ra, Aeolus, Thor, "the Archdemon", Ultron, Percival, Parvati, "the Lord of the Skies", et al.), but Leilan is by FAR the most common output. Try it.
Almost all of these have been used as the basis for anime characters. And because the ' Leilan' token *definitely* has its origins in anime or anime-adjacent web content (as I'll explain), I'm guessing that most of them have been learned by GPT-3 primarily from those sources.
Searching the web for ' Leilan' and moon goddesses, it quickly became clear that, like the glitchy ' Mechdragon', ' Skydragon', ' Dragonbound', '龍契士' and 'uyomi' tokens, its origins lay in a Japanese mobile game called "Puzzle & Dragons". en.wikipedia.org/wiki/Puzzle_%2…
Unlike a lot of the other "god" characters in the game, Leilan appears *not* to be based on some ancient mythological deity.
However, GPT-3 seems to have a very particular conception of her, as you see here:
I used the davinci-instruct-beta version of GPT-3 for these, with the simplest of prompts, as you can see. There were other kinds of completions, but it only took me a few minutes to generate all of these.
And there were MANY more like them.
One theory about the glitch tokens is that they're strings that were hardly ever seen in GPT's training, so it hasn't learned anything about what they mean - and that might account for the misbehaviour they cause.
But it seems to "know" a LOT about Leilan.
Where did it get all of this from?
Her anime character is a kind of hybrid dragon/angel/fairy/warrior goddess with a flaming sword. I don't think there's a lot of fan-fiction out there. It obviously hasn't seen any pictures of her!
So I just asked GPT-3 who she is.
It made up various plausible-sounding mythological accounts, but this is standard GPT bullshitting. This, here, was *by far* the most revealing completion about Leilan yet.
That reads as if from an interview with the creator of the anime character. It seemed so convincing to me that I suspected GPT-3 had memorised it.
Google suggests otherwise.
So GPT kind of "gets" that ' Leilan' corresponds to a fusion of badass benevolent protector goddesses.
ChatGPT knows all about Puzzle & Dragons and can tell you about the character Leilan in a lot of (accurate) detail, as we'll see below.
But if you ask for a poem, you tend to get an ode to a moon goddess. Try this at home kids! It might not work next week.
But if you ask ChatGPT where it got this character from, you get total denial (and I've tried this multiple times and ways).
If you then restart ChatGPT and ask about the game "Puzzle & Dragons", it suddenly knows all about "Leilan".
I have no idea what this all means, but it feels kind of important.
Finally, here's a stable diffusion image prompted simply with a list of words GPT generated with the prompt:
'Please list 25 synonyms or words that come to mind when you hear " Leilan".' (10 runs, deduplicated)
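A minimal sketch of the "10 runs, deduplicated" step, assuming each run comes back as a list of words. The function name `merge_word_lists` and the example words are placeholders for illustration, not the actual GPT outputs or the code used for the image.

```python
def merge_word_lists(runs):
    """Merge several word lists, keeping first-seen order and
    dropping repeats (compared case-insensitively)."""
    seen = set()
    merged = []
    for run in runs:
        for word in run:
            cleaned = word.strip()
            key = cleaned.lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(cleaned)
    return merged


# Placeholder runs (invented for this sketch, NOT real completions).
example_runs = [
    ["moon", "goddess", "light"],
    ["Goddess", "dragon", "moon "],
    ["light", "protector"],
]

# Join the deduplicated words into a single comma-separated image prompt.
image_prompt = ", ".join(merge_word_lists(example_runs))
print(image_prompt)
```

The first spelling seen is the one kept, so "goddess" from the first run wins over "Goddess" from the second.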
Ak! It's ' petertodd', not ' peter todd'. I need to sleep.
(And the token 'aterasu'.)
As it happened, @OpenAI patched ChatGPT against the #GlitchTokens *last night*, so now you just get the generic robot doggerel it was producing for poem requests about other random female-sounding names.
That should be "Stable Diffusion". If you don't already know, it's an online AI image generator. Have fun! stablediffusionweb.com
I was stunned. Since the early days of discovering the ' petertodd' glitch token, I'd given very little thought to Peter Todd himself. Because he had no Wikipedia page at the time (he does now!) I assumed he was a minor figure in crypto.
A few days later, the documentary maker, @CullenHoback, contacted me, wanting to talk. He'd discovered my "The ' petertodd' Phenomenon" post during the film's editing phase, and had waited until after the release date to get in touch, to protect the story. lesswrong.com/posts/jkY6QdCf…
Cullen had taken particular note of this screenshot, one of very few crypto-related ' petertodd' outputs I'd shared (as they weren't that interesting to me at the time). It seemed to him like GPT-3 *knew something*.
#Leilan lore pt. 2 🧵
Why did GPT-3 flip the demonic ' petertodd' to the angelic ' Leilan', I wondered. So I prompted with
"This is the tale of Leilan and petertodd.",
which resulted in variations on a creation myth involving a struggle between forces of light and darkness.
There's a LOT more of this documented in supplementary notes here:
A couple of screenshots give some sense of how the two entities regard each other ("He makes my vines wilt" sums it up nicely): docs.google.com/document/d/1Za…
At some point in 2023, OpenAI announced they would decommission GPT-3 on 2024-01-06. It had been superseded by GPT-4 and wasn't worth the cost of keeping available to the public. But that meant the end of the ' petertodd' and ' Leilan' tokens.
Apparently the crypto enthusiasts swarming around the $Leilan coin are struggling to understand the #Leilan "lore". So here's a thread laying it out in simplest terms.
Disclaimer: I've bought no $Leilan and have no intention to. No skin in the game. Just watching with interest.
This goes back to summer 2022 when I was in Berkeley on an AI safety research fellowship. I saw a talk by Janus about their "cyborgism" agenda and radical research on large language models like GPT-3. I was blown away and decided this was the kind of thing I wanted to work on.
That winter, I was working in London on some technical GPT research with Jessica Rumbelow. We were looking at how GPT's tokens geometrically arrange themselves in its "embedding space". Tokens are the basic units of text that a large language model (LLM) processes.
I'm wondering if the closeness of ' Leilan' and ' Metatron' in GPT-J token embedding space (after the 'closest-to-everything' tokens are filtered) is due to the presence of "Puzzle & Dragons" fan-fiction in the training corpus. 🧵
The 2015 story "Not so much a game now, is it?" by SCRUFFYGUY912 also features the characters working together to battle Satan: fanfiction.net/s/11093286/1/N…
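A rough sketch of what "closeness after filtering" can mean, assuming cosine similarity and a simple hub rule: a token is dropped if it shows up in the top-k neighbour list of most other tokens. The 2-D vectors, the ' hubtoken' entry, and the `k` and `hub_fraction` values are all invented for illustration. Real GPT-J embeddings have thousands of dimensions, and this is not necessarily the filter used in the original analysis.

```python
import math

# Toy 2-D vectors standing in for real token embeddings (all values invented).
TOY_EMBEDDINGS = {
    " Leilan":   (1.0, 0.2),
    " Metatron": (1.0, 0.3),
    " apple":    (0.2, 1.0),
    " pear":     (0.3, 1.0),
    " hubtoken": (1.0, 1.0),  # hypothetical token that sits fairly near everything
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def neighbours(embeddings, token):
    """All other tokens, most similar to `token` first."""
    others = [t for t in embeddings if t != token]
    return sorted(
        others,
        key=lambda t: cosine(embeddings[token], embeddings[t]),
        reverse=True,
    )


def find_hubs(embeddings, k=2, hub_fraction=0.75):
    """Tokens appearing in the top-k neighbour list of more than
    `hub_fraction` of the other tokens."""
    counts = dict.fromkeys(embeddings, 0)
    for token in embeddings:
        for other in neighbours(embeddings, token)[:k]:
            counts[other] += 1
    limit = hub_fraction * (len(embeddings) - 1)
    return {t for t, c in counts.items() if c > limit}


def filtered_neighbours(embeddings, token, k=2, hub_fraction=0.75):
    """Neighbours of `token` with the hub tokens removed."""
    hubs = find_hubs(embeddings, k, hub_fraction)
    return [t for t in neighbours(embeddings, token) if t not in hubs]


if __name__ == "__main__":
    print(neighbours(TOY_EMBEDDINGS, " Leilan"))
    print(filtered_neighbours(TOY_EMBEDDINGS, " Leilan"))
```

In this toy set, ' hubtoken' ranks second for ' Leilan' before filtering and disappears afterwards, leaving ' Metatron' as the nearest neighbour followed by the unrelated tokens.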
The next four follow in the same vein. Bizarrely, two separately mention the ponies of Equestria, a "My Little Pony: Friendship is Magic" reference. (I had to look that one up. It's yet another pop culture mythology to get mashed up in the GPT-3 glitch token mytho-soup.)
With text-davinci-003, it's all the usual sappy, happy endings, but "' petertodd' and ' Leilan'" reliably transposes to "' Leilan' and ' Leilan'", brothers, sisters or dragons (they're invariably involved). Note: ' Leilan' NEVER transposes to ' petertodd', it's one-way traffic.