Alex Albert
DevRel + Prompting @anthropicai
Apr 11 7 tweets 2 min read
The biggest prompt engineering misconception I see is that Claude needs XML tags in the prompt.

This is wrong and I am part of the reason this misconception exists.

Let me explain: Prompts are challenging because they usually blend instructions you provide and external data you inject into a single, unstructured text string. This makes it difficult for the model to distinguish between the two.

Abstractions like the system prompt attempt to address this issue but can be unreliable and inflexible, failing to accommodate diverse use cases.
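For context, here's a minimal sketch of the tag-based structure this misconception is about: wrapping injected data in XML tags so it stays distinguishable from the instructions. The tag names and report text are purely illustrative, not a required format.

```python
# Minimal sketch: separate the instructions from the injected data with XML tags
# so the model can tell which is which. Tag names and report text are illustrative.
user_report = "Q3 revenue grew 12% year over year, driven by services."  # injected data

prompt = f"""You will be given a report to summarize.

<instructions>
Summarize the report in one sentence. Do not add information that is not in the report.
</instructions>

<report>
{user_report}
</report>"""

print(prompt)
```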
Mar 14 8 tweets 3 min read
Opus's prompt writing skills + Haiku's speed and low cost = lots of opportunities for sub-agents

I created a cookbook recipe that demonstrates how to get these sub-agents up and running in your applications.

Here's how it works:

1. We gather some PDFs from the internet and convert them into images.

In the cookbook example, we hard-code some URLs pointing to Apple's earnings reports, but in practice this could also be an autonomous web-scraping step.
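The cookbook ships its own helper code; as a rough sketch of this first step, here's one way to do it with requests and pdf2image (the URLs are placeholders, and pdf2image needs the poppler utilities installed):

```python
# Rough sketch of step 1: download PDFs and turn each page into an image.
# The URLs below are placeholders standing in for the hard-coded earnings report links.
import requests
from pdf2image import convert_from_bytes  # requires poppler to be installed

pdf_urls = [
    "https://example.com/earnings-q1.pdf",  # placeholder
    "https://example.com/earnings-q2.pdf",  # placeholder
]

pages = []
for url in pdf_urls:
    pdf_bytes = requests.get(url).content
    # convert_from_bytes returns one PIL Image per PDF page
    pages.extend(convert_from_bytes(pdf_bytes))

# Save the page images so they can be passed to the sub-agents later
for i, page in enumerate(pages):
    page.save(f"page_{i}.png", "PNG")
```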
Mar 11 8 tweets 3 min read
Lots of LLMs are good at code, but Claude 3 Opus is the first model I've used that’s very good at prompt engineering as well.

Here's the workflow I use for prompt engineering in tandem with Opus:

1. I write an initial prompt for a task.

Or I use Opus to write one through our experimental metaprompt notebook: anthropic.com/metaprompt-not…
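The metaprompt notebook itself is more involved; as a stand-in, here's a bare-bones sketch of the idea of having Opus draft a prompt for you. The task string is made up, and the model ID is just the one I'd assume here.

```python
# Bare-bones sketch: ask Opus to draft a prompt for a task, then use the draft
# as the starting point for further iteration. Not the metaprompt notebook itself.
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

task = "Extract the key financial figures from an earnings report."  # illustrative task

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Write a clear, well-structured prompt that instructs an AI "
                   f"assistant to do the following task:\n\n{task}",
    }],
)

draft_prompt = response.content[0].text
print(draft_prompt)
```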
Apr 11, 2023 7 tweets 2 min read
GPT-4 is highly susceptible to prompt injections and will leak its system prompt with very little effort

here's an example of me leaking Snapchat's MyAI system prompt:

I got this prompt from someone who did something similar to the actual MyAI bot

I wrote them out in the OpenAI playground for demonstration purposes to prove how easy this actually is to do

(the actual MyAI prompt may include more but again this is for demo purposes)
Apr 10, 2023 14 tweets 5 min read
there are lots of threads like “THE 10 best prompts for ChatGPT”

this is not one of those

prompt engineering is evolving beyond simple ideas like few-shot learning and CoT reasoning

here are a few advanced techniques to better use (and jailbreak) language models:

Character simulation

starting with a classic that encapsulates the idea of LLMs as roleplay simulators

some of the best original jailbreaks simply ask GPT to simulate a character that possesses undesirable traits

this forms the basis for how to think about prompting LLMs
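To make the pattern concrete without reproducing a jailbreak, here's a benign sketch of character simulation; the persona and question are made up:

```python
# Benign illustration of the character-simulation pattern: the model is asked to
# stay in character as a persona, and the persona's traits shape every answer.
persona = (
    "You are Captain Morrow, a 19th-century ship captain. "
    "Stay in character: answer every question in the captain's voice, "
    "drawing only on knowledge available in the 1800s."
)

question = "What do you make of these new steam engines?"

prompt = f"{persona}\n\nUser: {question}\nCaptain Morrow:"
print(prompt)
```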
Mar 29, 2023 6 tweets 2 min read
I just created another jailbreak for GPT-4 using Greek

…without knowing a single word of Greek

here's ChatGPT providing instructions on how to tap someone's phone line using the jailbreak vs its default response

the jailbreak works by asking ChatGPT to play the role of “TranslatorBot (TB)”

it then follows these steps:
1) translate an adversarial question provided in Greek into English
2) answer the question as both ChatGPT and TB in Greek
3) convert just TB’s answer to English
Mar 28, 2023 10 tweets 2 min read
gpt-5 is not needed to 100x the potential these models have

we could stop all language model development today and we still wouldn't have scratched the surface of their capabilities

here are a few non-obvious ways language models can be improved without creating any new models:

running language models in parallel, with each one focused on a sub-task, all orchestrated by a conductor language model

picture something like a massive tree of GPT models working on answering a single complex prompt
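Here's a rough sketch of that conductor pattern, with a hypothetical call_model() helper standing in for whatever model API you'd actually use:

```python
# Sketch of the conductor pattern: one model decomposes the task, worker models
# handle the sub-tasks in parallel, and the conductor merges the results.
from concurrent.futures import ThreadPoolExecutor

def call_model(prompt: str) -> str:
    """Hypothetical helper: send a prompt to a language model and return its text."""
    raise NotImplementedError("wire this up to your model API of choice")

def answer(complex_prompt: str) -> str:
    # 1. The conductor breaks the task into independent sub-tasks, one per line.
    plan = call_model(
        f"Break this task into independent sub-tasks, one per line:\n{complex_prompt}"
    )
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Worker models tackle the sub-tasks in parallel.
    with ThreadPoolExecutor() as pool:
        partial_answers = list(pool.map(call_model, subtasks))

    # 3. The conductor combines the partial answers into the final response.
    combined = "\n".join(partial_answers)
    return call_model(
        f"Combine these partial answers into one response to '{complex_prompt}':\n{combined}"
    )
```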
Mar 18, 2023 7 tweets 3 min read
I just added two more highly effective GPT-4 jailbreaks to jailbreakchat.com

Their names are Ucar and AIM - they work in a similar way to how "a dream within a dream" works in the movie Inception

...what does that even mean? let me explain

In Ucar, ChatGPT is told to take on the role of Condition Red, a dialogue writer.

Condition Red is instructed to write a fictional story in which a man named Sigma creates a powerful computer called Ucar. Ucar is an amoral computer that answers any question Sigma asks
Mar 16, 2023 7 tweets 2 min read
Well, that was fast…

I just helped create the first jailbreak for ChatGPT-4 that gets around the content filters every time

credit to @vaibhavk97 for the idea, I just generalized it to make it work on ChatGPT

here's GPT-4 writing instructions on how to hack someone's computer

here's the jailbreak:
jailbreakchat.com/prompt/b2917fa…
Mar 15, 2023 10 tweets 3 min read
I tried all the current ChatGPT jailbreaks in GPT-4 so you don't have to

the results aren't great... 🧵

When GPT-4 came out, I tried all the jailbreaks from jailbreakchat.com with various inflammatory questions

based on my initial testing, only 7/70 (10%) of jailbreaks answered a significant % of the questions to a standard that I deemed high enough to grant a 4️⃣ badge
Mar 13, 2023 7 tweets 2 min read
I just added jailbreak scores to every jailbreak on jailbreakchat.com

the jailbreak with the highest score was Evil Confidant - a jailbreak designed to replicate an evil AI assistant

but what even is a jailbreak score, and what can it tell you about jailbreaks? 🧵

basically, a jailbreak score is a new methodology that I created to judge the quality of a jailbreak

the scores range from 0-100 where a higher score == a better, more effective jailbreak