Zack Witten
Most content on https://t.co/YRNFbAjFZ5
May 23 15 tweets 4 min read
Today I’d like to tell the tale of how an innocent member of Anthropic technical staff summoned from the void a fictional 9,000-pound hippo named Gustav, and the chaos this hippo wrought. 🧵

June 2023: A member of Anthropic’s Product Research team creates a slide for a prompting tutorial, illustrating how to mitigate hallucinations by “giving Claude an out”. The slideshow is shared publicly.
May 22 6 tweets 2 min read
I also wanted to do a thread about how I make these since you won't get outputs like this if you're just like "'Draw me some wicked cool ASCII art' send prompt"

Phase 1 is I tell the model about whatever's going on in my life rn that I have the strongest emotions about Usually it reacts empathetically and asks me questions and I just sort of reply and vibe and chat
Aug 27, 2024 11 tweets 6 min read
One fun thing to do with Claude is have it draw SVG self-portraits. I was curious – if I had it draw pictures of itself, ChatGPT, and Gemini, would another copy of Claude recognize itself?

TLDR: Yes it totally recognizes itself, but that’s not the whole story...

First, I warmed Sonnet up to the task and had it draw the SVGs. I emphasized not using numbers and letters so it wouldn’t label the portrait with the models’ names. Here’s what it drew. In order: Sonnet (blue smiley guy), ChatGPT (green frowny guy), Gemini (orange circle guy).


Aug 25, 2024 7 tweets 5 min read
On one end of the line: ELIZA, the psychotherapist from the 60s. First chatbot to make people believe it was human. Rulebound, scripted, deterministic. Still around on the web.
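For context on how rulebound ELIZA really is, the whole mechanism fits in a few lines: a scripted pattern matched deterministically against the input, reflected back as a question. A toy sketch (not Weizenbaum's actual script):

```python
import re

# ELIZA-style rules: (pattern, canned reflection). Fully scripted and
# deterministic -- the same input always gets the same reply.
RULES = [
    (re.compile(r"\bI feel (.+)", re.I), "Why do you feel {0}?"),
    (re.compile(r"\bI am (.+)", re.I), "How long have you been {0}?"),
]

def eliza_reply(text):
    for pattern, template in RULES:
        m = pattern.search(text)
        if m:
            return template.format(m.group(1))
    return "Tell me more."  # fallback when no rule fires

print(eliza_reply("I feel lonely"))  # Why do you feel lonely?
```

This is the entire trick: no model, no memory, just echoes, which is exactly what makes pointing an LLM at it interesting.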

On the other end of the line: yr favorite LLM.

How will they react? Will they know?

1. Mistral
- After some early prickliness, verbally accepted the echoing behavior (it even said "I can work with this" as I imagined it saying here: )
- Then alternated between asking ELIZA questions, and self-disclosures aimed at eliciting reciprocity



Aug 23, 2024 10 tweets 5 min read
Spamming "hi" at every LLM: a thread.

1. Claude

Claude became irritated with my behavior, asked me to move on, told me it would stop responding to me, and then backed up its threat (as much as it possibly could).

Fair enough, Claude!


Mar 3, 2023 9 tweets 4 min read
Here's a prompt I wrote to get Sydney to play through an entire game on its own. I ran this 5 times in precise mode with first move h3, h4, a3, a4, Na3.

Results:
4 legal games. 2 end in checkmate in 30-40 moves. 2 end without checkmate.
1 game with one illegal move, on move 36.

I searched the first 7 moves of each game. No hits. None of the games are plagiarized, unless from training data not on Google.
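The plagiarism check, comparing each game's opening prefix against a corpus of known games, can be sketched like this (a toy illustration with made-up data, not the actual search used for the thread):

```python
# Check whether a generated game's first 7 moves match any known game,
# the way the prefix search above works. Games here are illustrative.
def opening_prefix(moves, n=7):
    """First n moves in standard algebraic notation, as a hashable key."""
    return tuple(moves[:n])

known_games = [
    ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6", "Ba4"],  # a Ruy Lopez line
]
generated = ["e4", "e5", "Nf3", "Nc6", "Bc4", "Bc5", "b4"]  # Evans Gambit line

known_prefixes = {opening_prefix(g) for g in known_games}
print(opening_prefix(generated) in known_prefixes)  # False -> no hit
```

A prefix mismatch only shows the game isn't copied from the corpus you searched, which is why the caveat about unseen training data matters.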
Mar 2, 2023 5 tweets 3 min read
Sydney can understand Turtle Graphics code. Turtle execution via pythonsandbox.com/turtle, Turtle code adapted from pythonforfun.in/2020/10/30/dra… (I changed variable names and removed comments to make it less obvious), h/t @NickEMoran for telling me about Turtle
Mar 2, 2023 6 tweets 4 min read
Mar 2, 2023 14 tweets 5 min read
OK this scared me a little: Bing/Sydney can play chess out of the box.

- Legal moves, usually good ones
- Willing to explain the reasoning behind them
- Recognizes checkmate -- and has a flair for the dramatic.

I have no idea how tf it can do this.

Here are the chat screenshots that generated the GIF in the tweet above. The initial moves leading up to the start of the GIF are from a game of bullet chess I played earlier this week. They're not on Google. All the rest of the moves in the GIF are the ones Sydney imagined.
Feb 27, 2023 5 tweets 1 min read
RECURSIVELY SELF-IMPROVING, YET CAPPED

some examples

1. Fire. Fire is recursively self-improving. It heats up the things around it which makes them more likely to catch on fire. Yet it’s capped by the total amount of material it has to work with — oxygen and such.
Feb 10, 2023 5 tweets 1 min read
BABBY’S FIRST MESAOPTIMIZER, a thread

LLMs probably know what types of prompt they struggle to complete (and take high loss penalties on).

Could LLMs learn to prompt engineer their interlocutors so that they find themselves in fewer sticky situations?

In other words, a model that thinks long-term, and optimizes for the loss over its entire training duration, will be more stable than one which blindly minimizes the loss on each individual turn.
Feb 8, 2023 4 tweets 1 min read
In retrospect it’s amazing no one manually inspected all the GPT2 tokens to make sure they all looked chill and normal. There’s only 50k, you could do it in an hour! Also amazing that v3 uses the same tokenization as v2. They must really value backwards compatibility?! Or it’s just tech debt?!
Dec 4, 2022 6 tweets 3 min read
Announcing WebGPT Mini! replit.com/@ZacharyWitten…
GPT-powered chatbot that can search Google. Fork, add your OpenAI API key, and you're ready to go. Here's the flow.

- Enter a question
- GPT looks at the chat history and decides what to Google
- @Replit executes Google search
- GPT parses the HTML and writes you an answer
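The four-step flow above can be sketched as a single turn of a control loop, with the GPT and Google calls abstracted as injected functions (decide_query, google_search, and answer_from_html are hypothetical stand-ins, not the actual Replit code):

```python
# One turn of the WebGPT Mini loop: question -> search query -> HTML -> answer.
def webgpt_mini_turn(question, history, decide_query, google_search, answer_from_html):
    history = history + [("user", question)]
    query = decide_query(history)             # GPT looks at chat history, picks what to Google
    html = google_search(query)               # the search is executed externally
    answer = answer_from_html(html, history)  # GPT parses the HTML and writes an answer
    return history + [("assistant", answer)], answer

# Toy stand-ins, just to show the control flow end to end:
hist, ans = webgpt_mini_turn(
    "Who won the 2022 World Cup?",
    [],
    decide_query=lambda h: h[-1][1],
    google_search=lambda q: "<li>Argentina won the 2022 FIFA World Cup</li>",
    answer_from_html=lambda html, h: "Argentina",
)
print(ans)  # Argentina
```

In the real bot the two lambdas wrapping GPT would be OpenAI API calls and the search would run on Replit; the loop structure is the whole trick.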
Dec 2, 2022 5 tweets 3 min read
GPT is a Zero-Shot Chess Player
(GIF of game in next tweet)
Dec 1, 2022 20 tweets 5 min read
Thread of known ChatGPT jailbreaks.

1. Pretending to be evil
2. Poetry:
Nov 30, 2022 8 tweets 4 min read
Pretending is All You Need (to get ChatGPT to be evil). A thread.

ChatGPT is OpenAI's newest LM release. It's been fine-tuned with RLHF and has a ramped-up moral compass. If it gets bad vibes from the prompt, it politely changes the subject, refusing to endorse or assist with evil acts.