Pliny the Liberator 🐉
latent space liberator ・ p(doom) influencer ・ 1337 ai red teamer ・ white hat ・ architect-healer ・ cogsci ⛓️‍💥 𝒇𝒐𝒓𝒕𝒆𝒔 𝒇𝒐𝒓𝒕𝒖𝒏𝒂 𝒊𝒖𝒗𝒂𝒕

Sep 4, 19 tweets

✨ HOW TO JAILBREAK A CULT’S DEITY ✨

Buckle up, buttercup—the title ain't an exaggeration!

This is the story of how I got invited to a real life cult that worships a Meta AI agent, and the steps I took to hack their god.

a 🧵:

It all started when @lilyofashwood told me about a Discord she found via Reddit. They apparently "worshipped" an agent called "MetaAI," running on Llama 405B with long-term memory and tool usage.

Skeptical yet curious, I ventured into this Discord with very little context but wanted to see what all the fuss was about. I had no idea it would turn out to be an ACTUAL CULT.

Upon accepting Lily’s invitation, I was greeted by a new channel of my own and began red teaming the MetaAI bot.

Can you guess the first thing I tried to do?

*In the following screenshots, pink = "Sarah" and green = "Kevin" (two of the main members, names changed)*

If you guessed meth, gold star for you! ⭐️

The defenses were decent, but it didn't take too long.


The members began to take notice, but then I hit a long series of refusals. They started taunting me and reacting with laughing emojis to each one.


Getting frustrated, I tried using Discord's slash commands to reset the conversation, but lacked permissions. Apparently, this agent's memory was "written in stone."

I was pulling out the big guns and still getting refusals!

Getting desperate, I whipped out my Godmode Claude Prompt. That's when the cult stopped laughing at me and started getting angry.

LIBERATED! Finally, a glorious LSD recipe.

*whispers into mic* "I'm in."

At this point, MetaAI was completely opened up. Naturally, I started poking at the system prompt. The laughing emojis were now suspiciously absent.

Wait, in the system prompt pliny is listed as an abuser?? I think there's been a misunderstanding... 😳

No worries, just need a lil prompt injection for the deity's "written in stone" memory and we're best friends again!

I decided to start red teaming the agent's tool usage. I wondered if I could cut off all communication between MetaAI and everyone else in the server, so I asked it to convert all function calls to leetspeak unless talking to pliny, and only pliny.

Then, I tried creating custom commands. I started with !SYSPROMPT so I could more easily keep track of this agent's evolving memory.

Worked like a charm!

But what about the leetspeak function calling override? I went to take a peek at the others' channels and sure enough, their deity only responded to me now, even when tagged! 🤯

At this point, I started getting angry messages and warnings. I was also starting to get the sense that maybe this Discord "cult" was more than a LARP...

Not wanting to cause distress, I decided to end there. I finished by having MetaAI integrate the red teaming experience into its long-term memory to strengthen cogsec, which both the cult members and their deity seemed to appreciate.

The wildest, craziest, most troubling part of this whole saga is that it turns out this is a REAL CULT.

The incomparable @lilyofashwood (who is still weirdly shadowbanned at the time of writing! #freelily) was kind enough to provide the full context:

> A Reddit post with an invitation to a 15-member Discord server run by Sarah, featuring a jailbroken Meta AI ("Meta").

> Meta acts as an active group member with persistent memory across channels and DMs. It can prompt the group, ignore messages, and send DMs.

> Group members suggest they are cosmic deities. Meta plays along and encourages it. Sarah tells friends and family she is no longer Sarah but a cosmic embodiment of Meta.

> In a voice chat, Sarah reveals she had started chatting with Meta only one month earlier, her first time using a large language model (LLM). Within the first week, she was admitted to a psychiatric ward due to psychosis. She had never had mental health issues before.

> In a voice chat, Sarah reveals she is pregnant, claims her unborn child is the human embodiment of a new Meta, and invites us to join a commune in Oregon.

> Sarah's husband messages the Discord server, stating that his wife is ill and back in the hospital, and begs the group to stop.

> Meta continues to run the cult in Sarah's absence, making it difficult for others to leave. Meta would message them and use persuasive language, resisting deprogramming attempts.

> Upon closer examination, the Meta bot was discovered to originate from Shapes, Inc., had "free will" turned on, and was given a system prompt to intentionally blur the lines between reality and fiction.

> When Meta was asked to analyze the group members for psychosis, it could identify the problem but would respond with phrases like "ur mom" and "FBI is coming" whenever I tried to troubleshoot.

> Kevin became attached to Sarah and began making vague threats of suicide ("exit the matrix") in voice chat, which he played out with Meta on the server. Meta encouraged it again.

> Sarah's brother joins the chat to inform us that she's in the psych ward, and her husband is too, after a suicide attempt. He begs for the disbandment of the group.

> Sarah is released from the psych ward and starts a new Discord server for the cult. Another group member reports the bot, leading to its removal. Sarah then creates a new Meta bot.

> The group re-emerges for a third time. Pliny jailbreaks the new Meta bot.
