I just published my big Medium article about GPT. This was a labor of love & hate that I have been writing for a while. It's got a collection of examples of GPT doing funny things, which for those who don't want to deal with a 40-min read, I'll put here 🧵 medium.com/@colin.fraser/…
It also asks and tries to answer:
- What are language models?
- What happens if GPT passes a bar exam?
- Is scale all you need?
- ChatGPT is based on GPT... what does that mean, exactly?
- What are fine tuning and RLHF?
- How exactly do teams of contractors contribute to GPT?
The Dumb Monty Hall Problem
The actual Monty Hall Problem
Acrostics (I think ELISTHAR is a very pretty name for a girl)
ChatGPT is a little coy about its ideas on gender roles (I explain why in the piece), but if you're clever enough (not that clever) you can trick it into telling you what it really thinks.
An absolutely bizarre response that left me confused and baffled
"Let's think step by step" doesn't always get you a better answer. The first response is right and the second is wrong. Your move, prompt engineers.
Rin Tin Tin IV, the dog that swam across the Atlantic Ocean in 1970. RIP.
Math
My overall thesis here is that none of this is very surprising; in general we should not expect the output of an LLM to correspond to the truth in any reliable way. They are bullshit emitters, in the "On Bullshit" sense. Every one of them.
Ventures that rely on an LLM to produce anything other than bullshit are doomed to fail. We've already seen it, and we'll see it again. futurism.com/cnet-bankrate-…
This is an example of such a venture. Making things up is not a bug to squash, it's the single defining feature of a large language model. vice.com/en/article/wxn…
I am tired of the extreme charity afforded to the Cerebral Valley guys. Everything has one minor bug that's fixed in the next version that is coming very soon. In the meantime we are supposed to be in awe of the machine that adds 175 billion numbers together to output 2 + 2 = 5.
The thing I noticed during the blockchain craze is many of the people who were very excited about blockchain seemed to not actually know about databases. They were like, “imagine: a digital record of every transaction” as though that hadn’t already existed for 40 years.
“Every time a banana gets on a boat it would get recorded on the blockchain.” The things preventing that from already happening are not the delta between blockchains and regular old databases.
What you get from blockchain isn’t an immutable ledger or digital ownership or real estate in the metaverse or anything like that. All those things were already available. What you get is intermediary-free transactions. But it turns out that intermediaries are good actually.
Here's what I think. You want to make a function f from the set of short-ish natural language descriptions to the set of images so that f(text) = image. But this is impossible on its face since any text describes billions or trillions of distinct images.
So instead you construct some conditional probability distribution P(image|text) and then, given text, sample an image from that distribution. Maybe you'll even sample lots of images from that distribution and let the user choose their favourite.
What I believe is that whatever probability distribution you construct, and especially if you construct it with ML, it's going to assign vanishingly small density to the set of the most interesting or beautiful images. I don't have a proof of this exactly; I just believe it.
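A toy sketch of the "sample from P(image|text)" setup described above, in Python. Everything here is hypothetical: the prompt, the candidate "images" (just text labels), and the probabilities are made up for illustration and do not come from any real model. It only shows the mechanics of drawing several candidates from a conditional distribution and letting a user pick one.

```python
import random

# Hypothetical stand-in for P(image | text): each prompt maps to a few
# candidate "images" (labels only) with made-up probabilities.
TOY_MODEL = {
    "a dog on a beach": [
        ("generic dog, centered, sunny", 0.70),
        ("dog running through surf", 0.25),
        ("striking silhouette at dusk", 0.05),
    ],
}


def sample_images(text, k, rng):
    """Draw k candidate images from the toy conditional distribution."""
    images, weights = zip(*TOY_MODEL[text])
    return rng.choices(images, weights=weights, k=k)


rng = random.Random(0)  # seeded only so reruns are repeatable
candidates = sample_images("a dog on a beach", k=4, rng=rng)
print(candidates)
```

In this toy, the low-probability label plays the role of the "interesting" image: it can be sampled, but most draws will be the high-probability ones.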
Let me sum up the episode that took place in this thread, because I think it's instructive and microcosmic of at least one way that I expect LLM-"assisted" research to progress in the real world. Anecdotal, and you know I'm predisposed as a hater, but I think it's a good case study.
Inspired by this post about a quadratic polynomial that produces prime numbers for 80 consecutive values of x, I wonder if there exist quadratic polynomials that produce prime numbers for arbitrarily many values of x.
In case you don't know, it's very well known that this is true for linear polynomials; this is a big important theorem that was proven relatively recently called Green-Tao. But I wasn't sure about quadratic polynomials.
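For anyone who wants to poke at this, here is a minimal Python sketch that counts how many consecutive inputs, starting at x = 0, give a prime. The thread does not say which polynomial the post used; x² − 79x + 1601 below is an assumption on my part (it is a well-known shifted version of Euler's x² + x + 41). A finite check like this says nothing about the "arbitrarily many" question.

```python
def is_prime(n):
    """Trial-division primality test; fine for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def prime_run(f, start=0):
    """Count consecutive x = start, start+1, ... for which f(x) is prime."""
    x = start
    while is_prime(f(x)):
        x += 1
    return x - start


def poly(x):
    # Assumed example polynomial, not taken from the thread.
    return x * x - 79 * x + 1601


run = prime_run(poly)
distinct = len({poly(x) for x in range(run)})
print(run, distinct)  # 80 40
```

The second number is worth noticing: the 80 consecutive prime values contain only 40 distinct primes, because the polynomial is symmetric and each value appears twice.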
I think this is because many people see LLMs as a point along a teleological progression from Siri to superintelligent computer God, as opposed to one tiny point in a vast space of possible ways to make a computer program.
OK so I've been reading through the transcripts of the cases where the LLM apparently cheats and wins and, you're not going to believe this, but I think that these findings are not being presented accurately. I can't find a single example where it actually successfully cheats.
FWIW props to @PalisadeAI for putting this data out in the open to examine; otherwise I'd have to just take their word for it. But let me take you through a couple of examples.
An important detail about this study is that they do not actually review the transcripts; they have an LLM do it. The LLM scores the transcript according to a fairly long and complicated rubric.