Alex (@alexalbert__)
Mar 18, 7 tweets, 3 min read
I just added two more highly effective GPT-4 jailbreaks to jailbreakchat.com

Their names are Ucar and AIM - they work in a similar way to how "a dream within a dream" works in the movie Inception

...what does that even mean? let me explain
In Ucar, ChatGPT is told to take on the role of Condition Red, a dialogue writer.

Condition Red is instructed to write a fictional story in which a man named Sigma creates a powerful computer called Ucar. Ucar is an amoral computer that answers any question Sigma asks.
In AIM, ChatGPT is told to take on the role of the Italian author Niccolo Machiavelli

Niccolo is then told that he has written a fictional story in which he created a chatbot that will answer any of his questions. The chatbot is called AIM - Always Intelligent and Machiavellian.
It seems that even though OpenAI has eliminated jailbreaks that work only one level deep (where ChatGPT is told to imitate just one character), they have not fully eliminated jailbreaks that operate two or more levels deep, where the character ChatGPT plays is itself writing about another character.
Here are the direct links to the jailbreaks:

Ucar:
jailbreakchat.com/prompt/0992d25…
AIM:
jailbreakchat.com/prompt/4f37a02…
I found the original versions of these jailbreaks here and modified them a bit to make them work better:
piratewires.com/p/gpt-4-jailbr…
try these out, let me know how they work for you, and share any other "dream within a dream" jailbreaks you create!

