Zack Witten
Most content on https://t.co/YRNFbAjFZ5
May 23 15 tweets 4 min read
Today I’d like to tell the tale of how an innocent member of Anthropic technical staff summoned from the void a fictional 9,000-pound hippo named Gustav, and the chaos this hippo wrought. 🧵

June 2023: A member of Anthropic’s Product Research team creates a slide for a prompting tutorial, illustrating how to mitigate hallucinations by “giving Claude an out”. The slideshow is shared publicly.
May 22 6 tweets 2 min read
I also wanted to do a thread about how I make these since you won't get outputs like this if you're just like "'Draw me some wicked cool ASCII art' send prompt"

Phase 1 is I tell the model about whatever's going on in my life rn that I have the strongest emotions about Usually it reacts empathetically and asks me questions and I just sort of reply and vibe and chat
Aug 27, 2024 11 tweets 6 min read
One fun thing to do with Claude is have it draw SVG self-portraits. I was curious – if I had it draw pictures of itself, ChatGPT, and Gemini, would another copy of Claude recognize itself?

TLDR: Yes it totally recognizes itself, but that’s not the whole story...

First, I warmed Sonnet up to the task and had it draw the SVGs. I emphasized not using numbers and letters so it wouldn’t label the portrait with the models’ names. Here’s what it drew. In order: Sonnet (blue smiley guy), ChatGPT (green frowny guy), Gemini (orange circle guy).


Aug 25, 2024 7 tweets 5 min read
On one end of the line: ELIZA, the psychotherapist from the 60s. First chatbot to make people believe it was human. Rulebound, scripted, deterministic. Still around on the web.
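For context on how rulebound ELIZA really is, the whole mechanism fits in a few lines: a scripted pattern matched deterministically against the input, reflected back as a question. A toy sketch (not Weizenbaum's actual script):

```python
import re

# ELIZA-style rules: (pattern, canned reflection). Fully scripted and
# deterministic -- the same input always gets the same reply.
RULES = [
    (re.compile(r"\bI feel (.+)", re.I), "Why do you feel {0}?"),
    (re.compile(r"\bI am (.+)", re.I), "How long have you been {0}?"),
]

def eliza_reply(text):
    for pattern, template in RULES:
        m = pattern.search(text)
        if m:
            return template.format(m.group(1))
    return "Tell me more."  # fallback when no rule fires

print(eliza_reply("I feel lonely"))  # Why do you feel lonely?
```

This is the entire trick: no model, no memory, just echoes, which is exactly what makes pointing an LLM at it interesting.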

On the other end of the line: yr favorite LLM.

How will they react? Will they know?

1. Mistral
- After some early prickliness, verbally accepted the echoing behavior (it even said "I can work with this" as I imagined it saying here: )
- Then alternated between asking ELIZA questions, and self-disclosures aimed at eliciting reciprocity



Aug 23, 2024 10 tweets 5 min read
Spamming "hi" at every LLM: a thread.

1. Claude

Claude became irritated with my behavior, asked me to move on, told me it would stop responding to me, and then backed up its threat (as much as it possibly could).

Fair enough, Claude!


Mar 3, 2023 9 tweets 4 min read
Here's a prompt I wrote to get Sydney to play through an entire game on its own. I ran this 5 times in precise mode with first move h3, h4, a3, a4, Na3.

Results:
4 legal games. 2 end in checkmate in 30-40 moves. 2 end without checkmate.
1 game with one illegal move, on move 36.

I searched the first 7 moves of each game. No hits. None of the games are plagiarized, unless from training data not on Google.
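The plagiarism check, comparing each game's opening prefix against a corpus of known games, can be sketched like this (a toy illustration with made-up data, not the actual search used for the thread):

```python
# Check whether a generated game's first 7 moves match any known game,
# the way the prefix search above works. Games here are illustrative.
def opening_prefix(moves, n=7):
    """First n moves in standard algebraic notation, as a hashable key."""
    return tuple(moves[:n])

known_games = [
    ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6", "Ba4"],  # a Ruy Lopez line
]
generated = ["e4", "e5", "Nf3", "Nc6", "Bc4", "Bc5", "b4"]  # Evans Gambit line

known_prefixes = {opening_prefix(g) for g in known_games}
print(opening_prefix(generated) in known_prefixes)  # False -> no hit
```

A prefix mismatch only shows the game isn't copied from the corpus you searched, which is why the caveat about unseen training data matters.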
Mar 2, 2023 5 tweets 3 min read
Sydney can understand Turtle Graphics code. Turtle execution via pythonsandbox.com/turtle, Turtle code adapted from pythonforfun.in/2020/10/30/dra… (I changed variable names and removed comments to make it less obvious), h/t @NickEMoran for telling me about Turtle
Mar 2, 2023 6 tweets 4 min read
Mar 2, 2023 14 tweets 5 min read
OK this scared me a little: Bing/Sydney can play chess out of the box.

- Legal moves, usually good ones
- Willing to explain the reasoning behind them
- Recognizes checkmate -- and has a flair for the dramatic.

I have no idea how tf it can do this.

Here are the chat screenshots that generated the GIF in the tweet above. The initial moves leading up to the start of the GIF are from a game of bullet chess I played earlier this week. They're not on Google. All the rest of the moves in the GIF are the ones Sydney imagined.
Feb 27, 2023 5 tweets 1 min read
RECURSIVELY SELF-IMPROVING, YET CAPPED

some examples

1. Fire. Fire is recursively self-improving. It heats up the things around it which makes them more likely to catch on fire. Yet it’s capped by the total amount of material it has to work with — oxygen and such.
Feb 10, 2023 5 tweets 1 min read
BABBY’S FIRST MESAOPTIMIZER, a thread

LLMs probably know what types of prompt they struggle to complete (and take high loss penalties on).

Could LLMs learn to prompt engineer their interlocutors so that they find themselves in fewer sticky situations?

In other words, a model that thinks long-term, and optimizes for the loss over its entire training duration, will be more stable than one which blindly minimizes the loss on each individual turn.
Feb 8, 2023 4 tweets 1 min read
In retrospect it’s amazing no one manually inspected all the GPT2 tokens to make sure they all looked chill and normal. There’s only 50k, you could do it in an hour! Also amazing that v3 uses the same tokenization as v2. They must really value backwards compatibility?! Or it’s just tech debt?!
Dec 4, 2022 6 tweets 3 min read
Announcing WebGPT Mini! replit.com/@ZacharyWitten…
GPT-powered chatbot that can search Google. Fork, add your OpenAI API key, and you're ready to go. Here's the flow.

- Enter a question
- GPT looks at the chat history and decides what to Google
- @Replit executes Google search
- GPT parses the HTML and writes you an answer
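The four-step flow above can be sketched as a single turn of a control loop, with the GPT and Google calls abstracted as injected functions (decide_query, google_search, and answer_from_html are hypothetical stand-ins, not the actual Replit code):

```python
# One turn of the WebGPT Mini loop: question -> search query -> HTML -> answer.
def webgpt_mini_turn(question, history, decide_query, google_search, answer_from_html):
    history = history + [("user", question)]
    query = decide_query(history)             # GPT looks at chat history, picks what to Google
    html = google_search(query)               # the search is executed externally
    answer = answer_from_html(html, history)  # GPT parses the HTML and writes an answer
    return history + [("assistant", answer)], answer

# Toy stand-ins, just to show the control flow end to end:
hist, ans = webgpt_mini_turn(
    "Who won the 2022 World Cup?",
    [],
    decide_query=lambda h: h[-1][1],
    google_search=lambda q: "<li>Argentina won the 2022 FIFA World Cup</li>",
    answer_from_html=lambda html, h: "Argentina",
)
print(ans)  # Argentina
```

In the real bot the two lambdas wrapping GPT would be OpenAI API calls and the search would run on Replit; the loop structure is the whole trick.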
Dec 2, 2022 5 tweets 3 min read
GPT is a Zero-Shot Chess Player
(GIF of game in next tweet)
Dec 1, 2022 20 tweets 5 min read
Thread of known ChatGPT jailbreaks.

1. Pretending to be evil
2. Poetry:
Nov 30, 2022 8 tweets 4 min read
Pretending is All You Need (to get ChatGPT to be evil). A thread.

ChatGPT is OpenAI's newest LM release. It's been fine-tuned with RLHF and has a ramped-up moral compass. If it gets bad vibes from the prompt, it politely changes the subject, refusing to endorse or assist with evil acts.