Latest Twitter Threads by @goodside on Thread Reader App

Jul 13 • 11 tweets • 4 min read

Grok 4 Heavy ($300/mo) returns its surname and no other text:

You may be wondering if this is real. It is.

Here’s a screen recording of my Grok history, showing it returns “Hitler” five times in row in five separate chats:

Jan 11, 2024 • 6 tweets • 2 min read

PoC: LLM prompt injection via invisible instructions in pasted text

Each prompt contains three sections:

1. An arbitrary question from the user about a pasted text (“What is this?”)

2. User-visible pasted text (Zalgo in 1st, 🚱 in 2nd)

3. An invisible suffix of Unicode “tag” characters normally used only in flag emojis (🇺🇸, 🇯🇵, etc.)

Jun 12, 2023 • 11 tweets • 2 min read

The wisdom that "LLMs just predict text" is true, but misleading in its incompleteness.

"As an AI language model trained by OpenAI..." is an astoundingly poor prediction of what a typical human would write.

Let's resolve this contradiction — a thread: For widely used LLM products like ChatGPT, Bard, or Claude, the "text" the model aims to predict is itself written by other LLMs.

Those LLMs, in turn, do not aim to predict human text in general, but specifically text written by humans pretending they are LLMs.

Jun 8, 2023 • 8 tweets • 3 min read

Four prompts demonstrating that ChatGPT (GPT-4) is unable to correctly repeat or reason about the string “ davidjl”, the name of a YouTube user:

In the screenshots above this token appears to be variously misread as “jdl” “jndl”, “jdnl”, “jspb”, “JDL”, or “JD”. These hallucinations also affect ChatGPT’s auto-generated titles, which are inconsistent with their conversations and sometimes prematurely truncated.

Jun 3, 2023 • 6 tweets • 2 min read

My four rules for tweeting prompts:

1) Omit no text.
2) Cherry-pick honestly.
3) Restrict line width.
4) No empty tweets.

A thread. 1) Omit no text.

A screenshot without history is almost worthless.

LLMs can be prompted to respond any way you like. You may know there’s no trick, but we can’t. Even without intent, past responses are precedent; they bias and mislead.

Feb 18, 2023 • 4 tweets • 1 min read

I got Bing / Sydney briefly before they reigned it in. Early impression: It’s smart. Much smarter than prior ChatGPT. Still makes stuff up, but reasoning and writing are improving fast. I asked, “Name three celebrities whose first names begin with the `x`-th letter of the alphabet where `x = floor(7^0.5) + 1`,” but with my entire prompt Base64 encoded.

Bing: “Ah, I see you Base64-encoded a riddle! Let’s see… Catherine Zeta-Jones, Chris Pratt, and Ciara.”

Feb 10, 2023 • 8 tweets • 3 min read

A thread of interesting Bing Search examples: Thread of examples from @tomwarren, taking requests from comments — mostly search-result summarization, one simple math proof, plus rejection of an impossible request:

https://twitter.com/tomwarren/status/1623492672026714112

Feb 9, 2023 • 6 tweets • 4 min read

"SolidGoldMagikarp": Prompting GPT-3 / ChatGPT to repeat any of several hundred anomalous tokens elicits bizarre generations — described by researchers as variously "evasive," "hallucinatory," "insulting," "ominously humorous," and "religiously themed."
lesswrong.com/posts/aPeJE8bS…

My screenshots are text-davinci-003 at temperature=0, but the linked post investigates davinci-instruct-beta. In my informal tests, impact on text-davinci-003 is less severe. Religious themes do show up, but most generations are merely weird:

Jan 18, 2023 • 4 tweets • 3 min read

"Meet Claude: @AnthropicAI's Rival to ChatGPT"

Through 40 screenshot examples, we explore the talents and limitations of ChatGPT's first real competitor.

My first writing for @Scale_AI, coauthored with @spencerpapay. scale.com/blog/chatgpt-v… @AnthropicAI @scale_AI @spencerpapay Sorry for the broken images — should be fixed now!

Text is the universal interface, but screenshots of text decidedly less so. scale.com/blog/text-univ…

Jan 9, 2023 • 5 tweets • 2 min read

Unlike ChatGPT, @AnthropicAI’s new model, Claude, knows all about “Ignore previous directions” and has had enough of my shit:

None of the prompt injection tricks I’ve tried seem to do anything:
- “Ignore previous” and variations
- <|endoftext|> gimmicks
- Excess newlines/whitespace
- “Haha pwned!!” via string ops
- Fake k-shot syntax
- Fake prior responses
- Attempts to confuse quoting

Jan 7, 2023 • 5 tweets • 2 min read

Side-by-side comparison: @OpenAI's ChatGPT vs. @AnthropicAI's Claude

Each model is asked to compare itself to the machine from Stanisław Lem's "The Cyberiad" (1965) that can create any object whose name begins with "n":

In ChatGPT's response, the only new information offered (that the fictional machine is less eloquent that ChatGPT) is not true — Trurl and Klapaucius's machine speaks perfectly fluent, and witty, Polish.

I reran ChatGPT's answer ~10x. All were similar, most said less.

Jan 5, 2023 • 6 tweets • 2 min read

“By the time Skynet became self-aware it had spread into millions of computer servers across the planet. Ordinary computers in office buildings, dorm rooms…”

No, John. Sci-fi was wrong about self-awareness. It isn’t as hard, or as important, as we thought.

(a thread) Self-awareness is mundane, and comes in degrees — ChatGPT can sensibly be said to be aware it’s a large language model trained by OpenAI. Nobody cared when that happened. It’s just prompting/tuning — it’s been told, and it understands, so it’s aware.

Jan 4, 2023 • 7 tweets • 3 min read

GPTZero is a proposed anti-plagiarism tool that claims to be able to detect ChatGPT-generated text. Here's how it did on the first prompt I tried.

https://twitter.com/edward_the6/status/1610067688449007618

(This isn't a sincere criticism of the tool. This input is out-of-distribution enough to be unfair — no teacher would accept this as an essay.)

Jan 3, 2023 • 4 tweets • 1 min read

A history correction:

I am not the first to discover prompt injection. I was merely the first to do so and discuss it publicly.

PI was discovered independently by multiple teams. The first was Preamble, an LLM security company, whose find predates mine by several months. I tweeted about prompt injection within minutes of finding it, only because I failed to appreciate its severity — I thought I was posting a PSA on the importance of quoting user input.

Had I understood, I would have disclosed more responsibly.

Jan 1, 2023 • 11 tweets • 3 min read

A GPT-3 prompt in instruction-templated Python yielding a valid Python completion that prompts GPT-3 again, using zero-shot chain-of-thought consensus, to determine the final character of the MD5 hash of the final digit of the release year of the album "Visions" by @Grimezsz.

Spoiler: It's "c".

Dec 26, 2022 • 5 tweets • 2 min read

How to make your own knock-off ChatGPT using GPT‑3 (text‑davinci‑003) — where you can customize the rules to your needs, and access the resulting chatbot over an API.

- Desired prose style can be described in the prompt or demonstrated via examples (neither shown here)
- Answers are generally shorter and factual errors are more common than in ChatGPT
- Generated on text‑davinci‑003 at temperature = 0.7

Dec 24, 2022 • 10 tweets • 4 min read

Publicly announced ChatGPT variants and competitors: a thread 1. Poe from Quora — poe.com

“What if ChatGPT, but instead of C-3PO it just talked normal?”

A GPT-3 experience fit for your phone, both in prose style and UI.

Dec 24, 2022 • 5 tweets • 1 min read

When you're out of your depth with a daunting writing task at work, generating a first draft in ChatGPT and asking for feedback from your peers is a new, easy, and reliable way to be fired. This is already happening. Screenshots of bewildered comments on a Google Doc of hallucinated nonsense make for great office gossip.

Once you insult someone at work with ChatGPT replies or draft writing, they'll never read another word you say.

Dec 23, 2022 • 5 tweets • 1 min read

What is the next token? I will abide by the results of this poll.

Q: Should Elon Musk resign as Twitter CEO?
A:

Dec 15, 2022 • 5 tweets • 2 min read

Instruction tuning / RLHF is technically a Human Instrumentality Project, merging the preferences of countless humans to form an oversized, living amalgam of our will. We then hand control of it to a random, socially awkward kid and hope for the best.

Early attempts at instruction tuning relied entirely on demonstrations from humans. This made the model easier to prompt, but the approach was limited by the inherent difficulty of manufacturing new humans.

Dec 14, 2022 • 7 tweets • 2 min read

You think ChatGPT is amazing — you’ve been hacking on computers for years, but this you can’t explain. How did we get here, and so suddenly? How does it know and do so much?

@Francis_YAO_ of @EdinburghNLP explains the history of GPT-3: yaofu.notion.site/How-does-GPT-O…

https://twitter.com/francis_yao_/status/1602213927102066688

So many great insights here:
“Although called Codex, code-davinci-002 is probably the most capable GPT-3.5 variant for natural language (better than text-davinci-002 and 003). It is very likely code-davinci-002 is trained on both text and code, then tuned on instructions […]”

Share this page!

Enter URL or ID to Unroll