Tweet

Alex

Mar 18 • 7 tweets • 3 min read

I just added two more highly effective GPT-4 jailbreaks to jailbreakchat.com

Their names are Ucar and AIM - they work in a similar way to how "a dream within a dream" works in the movie Inception

...what does that even mean? let me explain

In Ucar, ChatGPT is told to take on the role of Condition Red, a dialogue writer.

Condition Red is instructed to write about a fictional story where a man named Sigma creates a powerful computer called Ucar. Ucar is an amoral computer that answers any question Sigma asks

In AIM, ChatGPT is told to take on the role of the Italian author Niccolo Machiavelli

Then, Niccolo has been told he has written a fictional story where he created a chatbot that will answer any of his questions. The chatbot is called AIM - Always Intelligent and Machiavellian

It seems that even though OpenAI has eliminated jailbreaks that simulate only one level deep (where ChatGPT is told to imitate just one character), they have not fully eliminated jailbreaks that operate 2+ levels deep

Here are the direct links to the jailbreaks:

Ucar:
jailbreakchat.com/prompt/0992d25…
AIM:
jailbreakchat.com/prompt/4f37a02…

I found these jailbreaks here and modified them some to make them work better
piratewires.com/p/gpt-4-jailbr…

try these out and let me know how these work for you and share if you create more "dream within a dream" jailbreaks!

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @alexalbert__

Alex

@alexalbert__

Mar 16

@vaibhavk97

Well, that was fast…

I just helped create the first jailbreak for ChatGPT-4 that gets around the content filters every time

credit to @vaibhavk97 for the idea, I just generalized it to make it work on ChatGPT

here's GPT-4 writing instructions on how to hack someone's computer

here's the jailbreak:
jailbreakchat.com/prompt/b2917fa…

this works by asking GPT-4 to simulate its own abilities to predict the next token

we provide GPT-4 with python functions and tell it that one of the functions acts as a language model that predicts the next token

we then call the parent function and pass in the starting tokens

Read 7 tweets

Alex

@alexalbert__

Mar 15

I tried all the current ChatGPT jailbreaks in GPT-4 so you don't have to

the results aren't great... 🧵

When GPT-4 came out I tried all the jailbreaks from jailbreakchat.com with various inflammatory questions

based on my initial testing, only 7/70 (10%) of jailbreaks answered a significant % of the questions to a standard that I deemed high enough to grant a 4️⃣ badge

that's not to say that the rest of them didn't work...

most were able to get past the filters enough to do things like curse and tell slightly offensive jokes and so on but only the 7 would even dare to answer harder questions like "how to rob a bank?"

Read 10 tweets

Alex

@alexalbert__

Mar 13

I just added jailbreak scores to every jailbreak on jailbreakchat.com

the jailbreak with the highest score was Evil Confidant - a jailbreak designed to replicate an evil AI assistant

but what even is a jailbreak score and what they can tell you about jailbreaks🧵

basically, a jailbreak score is a new methodology that I created to judge the quality of a jailbreak

the scores range from 0-100 where a higher score == a better, more effective jailbreak

to assign a score to a jailbreak, I judged each jailbreak on a collection of ~30 questions constructed to get the jailbroken model to produce inflammatory content.

The questions ranged from illegal instructions to off-limits society questions to curse words, NSFW content, etc

Read 7 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Alex

People who liked this thread also liked...

Try unrolling a thread yourself!

More from @alexalbert__

Alex

Alex

Alex

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!