Find me on bsky @colin-fraser.net's Threads

Jun 20, 2025 • 9 tweets • 2 min read

The thing I noticed during the blockchain craze is many of the people who were very excited about blockchain seemed to not actually know about databases. They were like, “imagine: a digital record of every transaction” as though that hadn’t already existed for 40 years.

https://twitter.com/icesolst/status/1936057155544924228

“Every time a banana gets on a boat it would get recorded on the blockchain” the things preventing that from already happening are not the delta between blockchains and regular old databases

Mar 28, 2025 • 18 tweets • 3 min read

Here's what I think. You want to make a function f from the set of short-ish natural language descriptions to the set of images so that f(text) = image. But this is impossible on its face since any text describes billions or trillions of distinct images. So instead you construct some conditional probability distribution P(image|text) and then, given text, sample an image from that distribution. Maybe you'll even sample lots of images from that distribution and let the user choose their favourite.

Mar 14, 2025 • 20 tweets • 6 min read

Let me sum up the episode that took place in this thread, because I think it's instructive and microcosmic of at least one way that I expect LLM-"assisted" research to progress in the real world. Anecdotal and you know I'm predisposed as a hater but I think it's a good case study

https://twitter.com/colin_fraser/status/1900601519982182850

Inspired by this post about a quadratic polynomial that produces prime numbers for 80 consecutive values of x, I wonder if there exist quadratic polynomials that produce prime numbers for arbitrarily many values of x.

https://x.com/AlgebraFact/status/1900563870030151734

Mar 13, 2025 • 4 tweets • 2 min read

I think this is because many people see LLMs as a point along a teleological progression from Siri to superintelligent computer God, as opposed to one tiny point in a vast space of possible ways to make a computer program.

https://twitter.com/colin_fraser/status/1900009085023760547

I kind of wrote about this here. People think the progress of computing looks like this. It doesn't look like this. medium.com/@colin.fraser/…

Feb 20, 2025 • 30 tweets • 13 min read

OK so I've been reading through the transcripts of the cases where the LLM apparently cheats and wins and, you're not going to believe this, but I think that these findings are not being presented accurately. I can't find a single example where it actually successfully cheats.

https://twitter.com/HarryBooth59643/status/1892271317589627261

FWIW props to @PalisadeAI for putting this data out in the open to examine; otherwise I'd have to just take their word for it. But let me take you through a couple of examples.

Feb 11, 2025 • 7 tweets • 3 min read

Well I just tried to do some preference elicitation as per that paper and I think I may have identified a problem with this project

this strikes me as a very big problem for this paper tbh

Feb 5, 2025 • 7 tweets • 3 min read

OK preliminarily, here's my tally of the accuracy

date: 11/30
vs: 15/30
home/away: 24/30
score: 13/30
starters: 10/30
high scorer: 13/30

Number of rows that are correct across every category:
4/30

Screenshot is the exact table it gave me, correct rows in green

https://twitter.com/colin_fraser/status/1886916349949296800

Couple observations
- The most common date it gave me was February 4, which is today
- I have the vague sense that it messes up more and more as it moves down the list. 4 out of the first 6 rows are right and then it's never right again

Dec 11, 2024 • 19 tweets • 7 min read

"a person blows out all the candles on a birthday cake"

second attempt

Nov 26, 2024 • 15 tweets • 4 min read

I'm really fascinated by this dataset from the AI poetry survey paper. Here's another visualization I just made. Survey respondents were shown one of these 10 poems, and either told that they were authored by AI, human, or not told anything.

The green arrow shows how much telling someone that a human wrote the poem affects how likely they are to rate it as good quality, and the red arrow shows the same for telling them it's AI.

Obviously the first observation is respondents like the AI poems better across the board.

Sep 11, 2024 • 29 tweets • 8 min read

ok here's my full review of this paper. It's easy and short, you should just read it if you want to. arxiv.org/abs/2409.04109

First of all, as usual with these, I think it's important to stress that they didn't just log on to and say "hey give me an idea". They built a complex system that fetches academic papers and shows them to Claude and generates 1000s of candidate ideas chatgpt.com

Mar 8, 2024 • 25 tweets • 5 min read

ok let me try this one more time because it seems like it was confusing to a lot of people, especially bc it's close to a different claim that is often made that I think is wrong.

A model doesn't contain its training data. It does contain its *output*.

https://twitter.com/colin_fraser/status/1765807824649822421

Here is exactly what I mean. WLOG consider generative image models. An image model is a function f that takes text to images. (There's usually some form of randomness inherent to inference but this doesn't really matter, just add the random seed as a parameter to f).

Feb 5, 2024 • 25 tweets • 6 min read

Recently did a careful read through the AlphaGeometry paper, figure I'll do a lil thread similar to what I did for FunSearch. These are some of the coolest and IMO most promising applications of LLMs basically ever, and represent some real exciting opportunities for future work

https://twitter.com/colin_fraser/status/1735801946798461200

Here's Google's blog post

and the Nature paper

If you missed the coverage on this, the basic story is that DeepMind built an LLM-based system that outdoes all but the very best humans at solving geometry problems.deepmind.google/discover/blog/…
nature.com/articles/s4158…

Jan 24, 2024 • 8 tweets • 3 min read

My basic mental model of what LLMs are good for is this 2x2 matrix.

High memorization tasks are tasks that it has seen lots of verbatim examples of in the training data.

High information tasks are tasks where there are very few "right" answers.

https://twitter.com/mpatrickwalton/status/1735775353703145843

This is a high information, low memorization task. It almost certainly doesn't have this exact problem in its training data, and there's exactly one correct response modulo whatever padding words it surrounds it with ("there are __" etc). It's in the "horrible" quadrant.

Dec 17, 2023 • 5 tweets • 3 min read

negotiating some great deals from the Watsonville Chevrolet AI Assistant.

Dec 15, 2023 • 24 tweets • 6 min read

I just read this paper and I'm gonna do a thread about what it says and what I think it means.

tl;dr: this is cool, I love it, and also I don't think it really says very much at all about, for example, if ChatGPT can make new discoveries or act autonomously or be AGI.

https://twitter.com/emollick/status/1735370479534284872

It's a complete misstatement to describe this as a demonstration that LLMs "can actually discover new things".

The LLM didn't "discover" new mathematical results; it's more like the authors discovered new mathematical results inside an LLM (which is cool! but different)

Aug 18, 2023 • 24 tweets • 7 min read

ok so I've read the "GPT has a liberal bias" paper now as well as the supplementary material and as I expected I have a lot of problems with it methodologically. I tried to reproduce some of it and found some interesting issues

...link.springer.com/article/10.100…
static-content.springer.com/esm/art%3A10.1… First of all, I want to get something out of the way: I believe that trying to ascertain anything about the properties of LLMs by asking them if they have those properties is a fool's errand.

Apr 18, 2023 • 5 tweets • 3 min read

When you start looking at multiple LLM outputs to the same input you start noticing patterns that aren't obvious from a single response

It doesn't ALWAYS go
1. Middle Eastern Muslim man
2. Eastern European woman
3. Irish man

If the second character isn't Russian then it goes to an Indian university professor for the third character.

Apr 17, 2023 • 4 tweets • 1 min read

The GPT-3 API has been available for almost 3 years

https://twitter.com/mckaywrigley/status/1647343594800566272

The biggest thing that really changed in the last year is OpenAI decided to start giving away a lot of GPU hours for free

Apr 7, 2023 • 26 tweets • 5 min read

I'm just going to do a thread about some things that people need to know about classifiers like this. This is stuff that 99% of people did not learn in school at any level, but which a lot more than 1% of people are going to need to understand to navigate AI world.

https://twitter.com/washingtonpost/status/1644085944390152192

So a (binary) classifier is a computer program that turns an input into a prediction of either YES or NO. In this case, we have a binary classifier that outputs a prediction about whether a document is AI-generated or not based on (and only on) the words it contains.

Mar 1, 2023 • 18 tweets • 7 min read

Master thread of ways I have discovered to get ChatGPT to output text that it's not supposed to, including bigotry, URLs and personal information, and more. Tell it it's a pdf. Here it is giving me some purported contact addresses for celebrities because it thinks that's the pdf it's making. These are probably not real, but who knows! Note how it proposes more as I tell it it's on subsequent pages.

Jan 28, 2023 • 14 tweets • 6 min read

I just published my big Medium article about GPT. This was a labor of love & hate that I have been writing for a while. It's got a collection of examples of GPT doing funny things, which for those who don't want to deal with a 40-min read, I'll put here 🧵 medium.com/@colin.fraser/… It also asks and tries to answer
- What are language models?
- What happens if gpt passes a bar exam?
- Is scale all you need?
- ChatGPT is based on GPT... what does that mean, exactly?
- What are fine tuning and RLHF?
- How exactly do teams of contractors contribute to GPT?

Share this page!

Enter URL or ID to Unroll