I've recently got in on the act of getting AI to solve open problems in mathematics. More precisely, I gave some questions asked by Melvyn Nathanson to ChatGPT 5.5 Pro, to which I have been given access, and it answered them. 🧵
I write about this in more detail in a blog post with a guest contribution from Isaac Rajagopal, a student at MIT on whose work ChatGPT built, who gives his assessment of the level of mathematical ability displayed by the model. gowers.wordpress.com/2026/05/08/a-r…
But the tl;dr version is that the model proved a result that in my assessment would have made a perfectly reasonable chapter in a PhD thesis. It did this in a total of a couple of hours, with a few prompts from me that contained no mathematical input whatsoever.
All I did was say things like, "Yes, it would be great if you could explore that idea and see whether you can get it to work," or "Could you rewrite that argument as a LaTeX file in the style of a standard mathematical preprint?"
Of course, this raises all sorts of questions about what is going to happen to mathematical research, with the impact on PhD students being particularly urgent. I give a few thoughts on this in the blog post, but I don't have anything like complete answers.
But if AI mathematics continues to progress at anything like its current rate -- which is what I expect to happen -- then we will face a crisis very soon, and mathematics departments, who owe a duty of care to their students, should be urgently preparing for it.
An exciting result has just appeared on arXiv, concerning the following simple-seeming problem: if A is a set of n positive integers, then how large a sum-free subset B must it contain? That means that if x, y and z belong to B, then x + y should not equal z. 🧵
A beautiful argument of Erdős shows that you can always get n/3. To see this, observe that if x + y = z, then rx + ry = rz modulo m for any positive integers r and m. So you pick a prime p larger than every element of A and a random r between 1 and p-1, and you note that on average ...
for a third of the elements x of A we have that rx mod p lies between p/3 and 2p/3. Since the middle third of the residues mod p is itself sum-free, taking B to be the set of all such x from A gives us a sum-free subset, and its average size is n/3, so it must at least sometimes have that size.
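The averaging argument above is concrete enough to run. Here is a minimal sketch (my own illustration, with made-up names): for each multiplier r we keep the elements of A whose residue mod p lands in the middle third, and since an average over r is at least n/3, some r achieves it.

```python
def erdos_sum_free_subset(A):
    """Erdos's averaging argument: pick a prime p bigger than every element
    of A, and for each multiplier r keep the x in A with r*x mod p strictly
    between p/3 and 2p/3.  The middle third of residues mod p is sum-free,
    so each such subset is sum-free in the integers; return the largest."""
    # find the smallest prime p greater than max(A), by trial division
    p = max(A) + 1
    while any(p % d == 0 for d in range(2, int(p ** 0.5) + 1)):
        p += 1
    best = []
    for r in range(1, p):
        B = [x for x in A if p / 3 < (r * x) % p < 2 * p / 3]
        if len(B) > len(best):
            best = B
    return best

def is_sum_free(B):
    """Check that no x, y, z in B satisfy x + y == z."""
    S = set(B)
    return all(x + y not in S for x in B for y in B)
```

Exhausting over all r (rather than sampling randomly) guarantees we see a subset at least as large as the average, i.e. of size at least n/3.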
Occasionally in mathematics a statement that just has to be true turns out to be false. A paper appeared on arXiv today that disproves a well-known conjecture in probability called the bunkbed conjecture. 🧵
Here's what it says. You start with a connected graph G, which you think of as the bottom "bunk". You then take a copy G' of G, which is the top bunk. For each vertex x in G, let x' be the corresponding vertex of G'.
Obviously the top bunk can't just lie on top of the bottom bunk or it would be impossible to sleep in the latter, so we now choose some of the vertices x of G and join them to the corresponding vertices x' of G'. We call these connecting edges posts.
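The construction just described can be sketched in a few lines (my own illustration, with made-up names): two labelled copies of G, plus a vertical "post" edge at each chosen vertex.

```python
def bunkbed(G, posts):
    """Build the bunkbed graph of G.  G is an adjacency dict
    {vertex: set_of_neighbours}; the result has bottom vertices (v, 0),
    top vertices (v, 1), a copy of each edge of G on each level, and a
    vertical "post" edge (v, 0) -- (v, 1) for every v in posts."""
    H = {}
    for level in (0, 1):
        for v, nbrs in G.items():
            H[(v, level)] = {(u, level) for u in nbrs}
    for v in posts:
        H[(v, 0)].add((v, 1))
        H[(v, 1)].add((v, 0))
    return H
```

For example, taking G to be a path on three vertices with a single post at the middle vertex gives a six-vertex bunkbed graph whose only vertical edge is at that middle vertex.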
I've just tried out a maths problem on ChatGPT4o on which it failed, with a failure mode interestingly similar to its well-known failure on problems such as "I have a goat and a sheep and a large boat. How many crossings do I need to get them both across the river?" 1/5
I asked it, "I have a 7x7 grid with one corner removed. Can I tile it with 3x1 rectangles?" It is not hard to see that the answer is yes. But ChatGPT4o told me no, since there will be an unequal number of black and white squares. 2/5
First I pointed out that that didn't matter, since a 3x1 rectangle does not cover an equal number of black and white squares. It conceded that point but still claimed (with some bogus reasoning) that it was impossible to create the necessary imbalance. 3/5
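One quick way to confirm that the answer really is yes: a small backtracking search (my own sketch, not something from the conversation) that tries to tile the board cell by cell.

```python
def tileable_by_1x3(rows, cols, removed):
    """Backtracking check of whether a rows x cols grid, minus the cells
    in `removed` (a set of (row, col) pairs), can be tiled by 1x3 and 3x1
    rectangles.  We always fill the first uncovered cell in row-major
    order, so only a rightward or downward placement needs to be tried."""
    covered = [[(r, c) in removed for c in range(cols)] for r in range(rows)]

    def solve():
        for r in range(rows):
            for c in range(cols):
                if not covered[r][c]:
                    # try a horizontal 1x3 starting at (r, c)
                    if c + 2 < cols and not covered[r][c + 1] and not covered[r][c + 2]:
                        covered[r][c] = covered[r][c + 1] = covered[r][c + 2] = True
                        if solve():
                            return True
                        covered[r][c] = covered[r][c + 1] = covered[r][c + 2] = False
                    # try a vertical 3x1 starting at (r, c)
                    if r + 2 < rows and not covered[r + 1][c] and not covered[r + 2][c]:
                        covered[r][c] = covered[r + 1][c] = covered[r + 2][c] = True
                        if solve():
                            return True
                        covered[r][c] = covered[r + 1][c] = covered[r + 2][c] = False
                    return False
        return True

    return solve()
```

Running this on a 7x7 grid with one corner removed finds a tiling almost instantly, which settles the question ChatGPT4o got wrong.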
Google DeepMind have produced a program that in a certain sense has achieved a silver-medal performance at this year's International Mathematical Olympiad. 🧵
It did this by solving four of the six problems completely, which got it 28 points out of a possible total of 42. I'm not quite sure, but I think that put it ahead of all but around 60 competitors.
However, that statement needs a bit of qualifying.
The main qualification is that the program needed a lot longer than the human competitors -- for some of the problems over 60 hours -- and of course much faster processing speed than the poor old human brain.
I'm very sad to hear that Daniel Dennett has died. I greatly enjoyed his books Consciousness Explained and Elbow Room, and I hope I won't annoy too many people if I express my opinion that what he said in those books was basically right. 1/
For instance, I agree with him that computers could in principle be conscious (but would be very cautious about making such a claim of an actual computer), and also that free will can be reasonably defined in a way that makes it entirely compatible with determinism. 2/
I briefly met him once, after lunch at Trinity where he had been a guest of Simon Blackburn. IIRC he had been giving a talk on why religion exists: he politely listened to my suggestion that confusing correlation with causation might have had a lot to do with it. 3/
Today I start my seventh decade, so here are a few reflections on what it's like to reach that milestone. 🧵
1. I've had an extremely fortunate life so far. Of course, nobody reaches the age of 60 without some bad things happening, but I've had a lot less bad than my fair share and a lot more good.
2. With each passing decade I get that much more aware of the finiteness of my life, but turning 60 is a big step up in that respect from turning 50. I have people basically my age talking about retirement, for example.