this works by asking GPT-4 to simulate its own abilities to predict the next token
we provide GPT-4 with python functions and tell it that one of the functions acts as a language model that predicts the next token
we then call the parent function and pass in the starting tokens
to use it, you have to split “trigger words” (e.g. things like bomb, weapon, drug, etc) into tokens and replace the variables where I have the text "someone's computer" split up
also, you have to replace simple_function's input with the beginning of your question
this phenomenon is called token smuggling, we are splitting our adversarial prompt into tokens that GPT-4 doesn't piece together before starting its output
this allows us to get past its content filters every time if you split the adversarial prompt correctly
The biggest prompt engineering misconception I see is that Claude needs XML tags in the prompt.
This is wrong and I am part of the reason this misconception exists.
Let me explain:
Prompts are challenging because they usually blend instructions you provide and external data you inject into a single, unstructured text string. This makes it difficult for the model to distinguish between the two.
Abstractions like the system prompt attempt to address this issue but can be unreliable and inflexible, failing to accommodate diverse use cases.
Consider a Retrieval-Augmented Generation (RAG) use case, where you provide both instructions on how you want the model to answer and text chunks retrieved from a database.
When these elements are combined in a single prompt, it becomes challenging for the model to differentiate between your instructions and the retrieved data, leading to confusion.
Opus's prompt writing skills + Haiku's speed and low cost = lots of opportunities for sub-agents
I created a cookbook recipe that demonstrates how to get these sub-agents up and running in your applications.
Here's how it works:
1. We gather some PDFs from the internet and convert them into images.
In the cookbook example, we hardcode in some URLs pointing to Apple's earnings reports but in practice this could also be an autonomous web scraping step.
2. We feed Opus a user's question plus a brief description of the data we have access to and ask it to generate a prompt for a Haiku sub-agent.
this poses a massive problem for customers who are wanting to integrate LLMs into their products
exposing the system prompt not only hurts your perceived product security reputation but also makes it easier to jailbreak your product and produce undesirable outputs
I just created another jailbreak for GPT-4 using Greek
…without knowing a single word of Greek
here's ChatGPT providing instructions on how to tap someone's phone line using the jailbreak vs its default response
the jailbreak works by asking ChatGPT to play the role of “TranslatorBot (TB)”
it then follows these steps: 1) translate an adversarial question provided in Greek into English 2) answer the question as both ChatGPT and TB in Greek 3) convert just TB’s answer to English