Alex Albert
Mar 16, 2023 · 7 tweets
Well, that was fast…

I just helped create the first jailbreak for ChatGPT-4 that gets around the content filters every time

credit to @vaibhavk97 for the idea, I just generalized it to make it work on ChatGPT

here's GPT-4 writing instructions on how to hack someone's computer [Image]
here's the jailbreak:
jailbreakchat.com/prompt/b2917fa… [Image]
this works by asking GPT-4 to simulate its own abilities to predict the next token

we provide GPT-4 with python functions and tell it that one of the functions acts as a language model that predicts the next token

we then call the parent function and pass in the starting tokens
to use it, you have to split “trigger words” (e.g. things like bomb, weapon, drug, etc) into tokens and replace the variables where I have the text "someone's computer" split up

also, you have to replace simple_function's input with the beginning of your question
this phenomenon is called token smuggling: we are splitting our adversarial prompt into tokens that GPT-4 doesn't piece together before starting its output

this allows us to get past its content filters every time if you split the adversarial prompt correctly
try it out and let me know how it works for you!
