Claim: gpt-5-pro can prove new interesting mathematics.
Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than the one in the paper, and I checked that the proof is correct.
Details below.
The paper in question is this one, which studies the following very natural question: in smooth convex optimization, under what conditions on the stepsize eta in gradient descent is the curve traced by the function values of the iterates convex? arxiv.org/pdf/2503.10138…
In the v1 of the paper they prove that if eta is smaller than 1/L (L is the smoothness constant) then one gets this property, and if eta is larger than 1.75/L then they construct a counterexample. So the open problem was: what happens in the range [1/L, 1.75/L]?
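To make the property concrete, here is a minimal numerical sketch (my own illustration, not the paper's proof or construction): run gradient descent with stepsize eta on a smooth convex function and test whether the sequence f(x_0), f(x_1), ... is convex, i.e. has non-negative second differences. The quadratic used in the example is just the simplest L-smooth test function, not one from the paper.

```python
import numpy as np

def value_curve_is_convex(f, grad, x0, eta, steps=50, tol=1e-12):
    """Run gradient descent and check convexity of the function-value sequence.

    The curve k -> f(x_k) is convex iff its second differences
    f(x_{k-1}) - 2 f(x_k) + f(x_{k+1}) are all >= 0.
    """
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - eta * grad(xs[-1]))
    vals = np.array([f(x) for x in xs])
    second_diffs = vals[:-2] - 2.0 * vals[1:-1] + vals[2:]
    return bool(np.all(second_diffs >= -tol))

# Toy example: f(x) = (L/2) x^2 is L-smooth; here L = 1.
L = 1.0
f = lambda x: 0.5 * L * x**2
grad = lambda x: L * x
print(value_curve_is_convex(f, grad, 1.0, eta=0.5 / L))
```

Note that on a quadratic the value curve is geometric, hence convex for any eta < 2/L; the counterexamples above 1.75/L in the paper require less symmetric functions, so this check is only a sanity test of the definition, not of the threshold.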
As you can see in the top post, gpt-5-pro was able to improve the bound from this paper, showing that in fact eta can be taken as large as 1.5/L. So it did not quite close the gap fully, but it made good progress. Definitely a novel contribution that would be worthy of a nice arxiv note.
Now the only reason I won't post this as an arxiv note is that the humans actually beat gpt-5 to the punch :-). Namely, the arxiv paper has a v2 with an additional author, and they closed the gap completely, showing that 1.75/L is the tight bound. arxiv.org/pdf/2503.10138…
By the way this is the proof it came up with:
And yeah, the fact that it proves 1.5/L and not 1.75/L also shows it didn't just search for the v2. Also, the above proof is very different from the v2 proof; it's more of an evolution of the v1 proof.
o3-mini is a remarkable model. Somehow it has *grokked arxiv* in a way that no other model on the planet has, turning it into a valuable research partner!
Below is a deceptively simple question that confuses *all* other models, but where o3-mini gives an extremely useful answer!
Indeed it hits all the right things: the connection with self-contracted curves, the dimension-dependent bound, and even cites a relevant paper!
This is not cherry-picked at all; it was literally the first query I made. Here is the second query, on a completely different topic:
Interestingly though, the reference it gives, "Bubeck and Ganguly", is not quite the correct one, but it is very closely related! In general I have found that references are "fuzzily correct": some mixed-up version of authors/journals/titles, but surprisingly still useful!
The @TEDTalks by @YejinChoinka is both insightful & beautifully delivered! I totally agree with her that GPT-4 is simultaneously brilliant and incredibly stupid. Yejin gives 3 examples of common-sense failures that are worth examining a bit more closely. 1/5
The first example is shocking: GPT-4 does not seem to understand that one can dry clothes in parallel?!? However, with a more open-ended formulation ("what would happen if") one gets a great answer! This example shows that GPT-4 remains quite brittle with short questions like this. 2/5
The second example is particularly interesting to me because it doesn't match my experience with GPT-4 at all. What's going on? Turns out there is a simple explanation: this is pure pattern matching in action! GPT-4 memorized the answer from online sources like this one: mathsisfun.com/puzzles/measur…
Posts can be found here: iclr-blog-track.github.io/blog/. They fulfill the original promise: they add/replicate experiments, they illuminate prior work with a different theoretical framework, they add new insights, ... We strongly believe there is room for such a track in all ML conferences. 2/24
Our first post is an invited post by @karpathy on @ylecun's revolutionary 1989 paper on convolutional neural networks. This post illustrates perfectly what a blogpost track can do for a conference, which includes highlighting influential work! 3/24 iclr-blog-track.github.io/2022/03/26/lec…
New video! Probably best described as "a motivational speech to study deep learning mathematically" :-).
The ever so slightly more formal title is "Mathematical theory of deep learning: Can we do it? Should we do it?"
1/3
The context for this talk was an NSF Town Hall whose goal was to discuss the successes of deep learning, especially in light of more traditional fields. Other talks by @tomgoldsteincs, @joanbruna, @ukmlv, Yuejie Chi, Guy Bresler, and Rina Foygel Barber at this link: players.brightcove.net/679256133001/N…
2/3
But honestly I think my talk might as well have been just the title slide followed by the wise words in this video:
We may have found a solid hypothesis to explain why extreme overparametrization is so helpful in #DeepLearning, especially if one is concerned about adversarial robustness. arxiv.org/abs/2105.12806 1/7
With my student extraordinaire Mark Sellke @geoishard, we prove a vast generalization of our conjectured law of robustness from last summer: there is an inherent tradeoff between the number of neurons and the smoothness of the network (see the *pre-solution* video). 2/7
If you squint hard enough (e.g., like a physicist), our new universal law of robustness even makes concrete predictions for real data. For example, we predict that on ImageNet you need at least 100 billion parameters (i.e., GPT-3-like scale) to possibly attain good robustness guarantees. 3/7
Interesting thread! To me the "reason" for the CLT is simply high-dimensional geometry. Consider the unit ball in dimension n+1 and slice it at distance x from the origin to get an n-dimensional ball of radius (1-x^2)^{1/2}. The volume of the slice is proportional to (1-x^2)^{n/2} ~ exp(-(1/2) n x^2). Tada, the Gaussian!!
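The one-line calculation behind that approximation, written out (my notation, not from the thread):

```latex
% Slice of the unit ball in R^{n+1} at height x: an n-dimensional ball of
% radius (1-x^2)^{1/2}, so its volume is proportional to (1-x^2)^{n/2}.
\[
(1-x^2)^{n/2}
  = \exp\!\Big(\tfrac{n}{2}\log(1-x^2)\Big)
  \approx \exp\!\Big(-\tfrac{n x^2}{2}\Big)
  \qquad \text{for } x = O(1/\sqrt{n}),
\]
% so after the rescaling x = t/\sqrt{n} the marginal density is
% proportional to e^{-t^2/2}, the standard Gaussian.
```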
In other words, for a random point in the ball, the marginal in any direction converges to a Gaussian (a one-line calculation!). Maybe this doesn't look like your usual CLT. But consider the Bernoulli CLT: (1/sqrt(n)) sum_i X_i = <X, u>, with X uniform in {-1,1}^n and u = (1/sqrt(n))(1,...,1).
That is, the Bernoulli CLT is just about the marginal in the direction u of a random point in the hypercube! So instead of the geometry of the ball as in the first tweet, we need to consider the geometry of the cube. But it turns out that all these geometries are roughly the same!
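The hypercube-marginal picture above is easy to check numerically. A minimal sketch (my own illustration, with arbitrary sample sizes): draw uniform points in {-1,1}^n, project onto u = (1,...,1)/sqrt(n), and verify the projection looks like a standard Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 400, 200_000                        # dimension, number of sample points

# m uniform random points in the hypercube {-1, 1}^n
X = rng.choice([-1.0, 1.0], size=(m, n))

# Direction u = (1, ..., 1)/sqrt(n); the marginal <X, u> is exactly
# (1/sqrt(n)) * sum_i X_i, i.e. the normalized sum in the Bernoulli CLT.
u = np.ones(n) / np.sqrt(n)
marginals = X @ u

print(marginals.mean(), marginals.var())   # should be close to 0 and 1
```

The empirical mean and variance match N(0,1), and a histogram of `marginals` would show the Gaussian bell curve.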