Latest Twitter Threads by @SebastienBubeck on Thread Reader App

Aug 20 • 7 tweets • 2 min read

Claim: gpt-5-pro can prove new interesting mathematics.

Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than what is in the paper, and I checked the proof: it's correct.

Details below.

The paper in question is this one, which studies the following very natural question: in smooth convex optimization, under what conditions on the stepsize eta in gradient descent will the curve traced by the function value of the iterates be convex? arxiv.org/pdf/2503.10138…
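To make the question concrete, here is a minimal numerical probe. It is not the paper's method and not gpt-5-pro's proof. It runs gradient descent with stepsize eta on a one-dimensional smooth convex function and tests whether the sequence f(x_0), f(x_1), ... is convex, i.e. whether every second difference f(x_{k+1}) - 2 f(x_k) + f(x_{k-1}) is nonnegative. The test function log(cosh(x)) (convex, 1-smooth, minimized at 0), the starting point, and the stepsizes are hypothetical choices for illustration.

```python
import math
from typing import Callable, List


def gd_values(f: Callable[[float], float], grad: Callable[[float], float],
              x0: float, eta: float, steps: int) -> List[float]:
    """Run x_{k+1} = x_k - eta * grad(x_k) and return [f(x_0), ..., f(x_steps)]."""
    x = x0
    values = [f(x)]
    for _ in range(steps):
        x = x - eta * grad(x)
        values.append(f(x))
    return values


def is_convex_sequence(values: List[float], tol: float = 1e-12) -> bool:
    """True iff every second difference v[k+1] - 2 v[k] + v[k-1] is >= -tol."""
    return all(values[k + 1] - 2.0 * values[k] + values[k - 1] >= -tol
               for k in range(1, len(values) - 1))


if __name__ == "__main__":
    # Hypothetical test function: f(x) = log(cosh(x)), with f'(x) = tanh(x)
    # and f''(x) = sech(x)^2 <= 1, so it is convex and L-smooth with L = 1.
    L = 1.0
    for eta in (0.5 / L, 1.0 / L, 1.5 / L, 1.9 / L):
        vals = gd_values(lambda x: math.log(math.cosh(x)), math.tanh,
                         x0=3.0, eta=eta, steps=30)
        print(f"eta = {eta:.2f}: function-value curve convex? {is_convex_sequence(vals)}")
```

A quadratic would be an uninformative test case: for f(x) = (lambda/2) x^2, gradient descent gives f(x_k) = f(x_0) (1 - eta*lambda)^{2k}, a geometric sequence with nonnegative ratio, which is convex for every eta. A finite check like this can suggest or refute a conjecture on one function, but it proves nothing in general.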

Jan 31 • 6 tweets • 2 min read

o3-mini is a remarkable model. Somehow it has *grokked arxiv* in a way that no other model on the planet has, turning it into a valuable research partner!

Below is a deceptively simple question that confuses *all* other models, but on which o3-mini gives an extremely useful answer!

Indeed, it hits all the right points: the connection with self-contracted curves, the dimension-dependent bound, and it even cites a relevant paper!

This is not cherry-picked at all; it was literally the first query I made. Here is the second query, on a completely different topic:

Apr 30, 2023 • 5 tweets • 3 min read

The @TEDTalks by @YejinChoinka is both insightful & beautifully delivered! Totally agree with her that GPT-4 is simultaneously brilliant and incredibly stupid. Yejin gives 3 examples of common-sense failures that are worth examining a bit more closely. 1/5

The first example is shocking: GPT-4 does not seem to understand that one can dry clothes in parallel?!? However, with a more open-ended formulation ("what would happen if") one gets a great answer! This example shows that GPT-4 remains quite brittle with short questions like this. 2/5

Apr 20, 2022 • 24 tweets • 9 min read

We (@gauthier_gidel @velythyl @busycalibrating @vernadec & myself) would like to announce the accepted blog posts to @iclr_conf's 1st Blogpost Track. The experiment was a great success, with 20 accepted posts out of 61 submissions, roughly the size of the 1st @iclr_conf itself! 1/24

Posts can be found here: iclr-blog-track.github.io/blog/. They fulfill the original promise: they add/replicate experiments, they illuminate prior work with a different theoretical framework, they add new insights, ... We strongly believe there is room for such a track in all ML conferences. 2/24

Jan 20, 2022 • 4 tweets • 4 min read

New video! Probably best described as "a motivational speech to study deep learning mathematically" :-).

The ever so slightly more formal title is "Mathematical theory of deep learning: Can we do it? Should we do it?"

1/3

Context for this talk was an NSF Town Hall whose goal was to discuss the successes of deep learning, especially in light of more traditional fields. Other talks by @tomgoldsteincs @joanbruna @ukmlv, Yuejie Chi, Guy Bresler, and Rina Foygel Barber are at this link:
players.brightcove.net/679256133001/N…

2/3

Jun 9, 2021 • 7 tweets • 3 min read

We may have found a solid hypothesis to explain why extreme overparametrization is so helpful in #DeepLearning, especially if one is concerned about adversarial robustness. arxiv.org/abs/2105.12806
1/7

With my student extraordinaire Mark Sellke @geoishard, we prove a vast generalization of our conjectured law of robustness from last summer, namely that there is an inherent tradeoff between the number of neurons and the smoothness of the network (see *pre-solution* video). 2/7
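As we read the paper linked above (arxiv.org/abs/2105.12806), its main theorem says, roughly, that any model with p parameters that fits n noisy data points in dimension d below the noise level must have a Lipschitz constant of order at least sqrt(n d / p), under assumptions stated there (e.g. an isoperimetric data distribution and polynomially bounded weights). The sketch below only evaluates that scaling, dropping constants and log factors; the helper name and the MNIST-like values of n and d are hypothetical.

```python
import math


def robustness_lower_bound(n: int, d: int, p: int) -> float:
    """Scaling sqrt(n * d / p) of the Lipschitz lower bound (constants and logs dropped).

    n: number of data points, d: input dimension, p: number of parameters.
    """
    return math.sqrt(n * d / p)


if __name__ == "__main__":
    n, d = 60_000, 784  # hypothetical sample size and input dimension
    for p in (n, 10 * n, n * d):
        print(f"p = {p}: Lipschitz constant at least ~ {robustness_lower_bound(n, d, p):.2f}")
```

With p on the order of n (enough to merely interpolate) the bound is of order sqrt(d), and it only drops to order 1 once p is on the order of n*d, which is one way to read "smoothness costs a factor d more parameters".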

Jan 26, 2021 • 5 tweets • 2 min read

Interesting thread! To me the "reason" for the CLT is simply high-dimensional geometry. Consider the unit ball in dim n+1 & slice it at distance x from the origin to get a dim n ball of radius (1-x^2)^{1/2}. The volume of the slice is proportional to (1-x^2)^{n/2} ~ exp(-(1/2) n x^2). Tada, the Gaussian!!

https://twitter.com/shoyer/status/1353021554959872001
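The slice computation can be checked numerically. This sketch compares the unnormalized slice volume (1-x^2)^{n/2} with exp(-(1/2) n x^2) at the scale x = y / sqrt(n), where the mass of the marginal lives; the choice n = 1000 and the values of y are arbitrary.

```python
import math


def slice_density(x: float, n: int) -> float:
    """Unnormalized density of one coordinate of a uniform point in the unit
    ball of dim n+1: the slice at height x is a dim-n ball of radius
    sqrt(1 - x^2), so its volume is proportional to (1 - x^2)^(n/2)."""
    return (1.0 - x * x) ** (n / 2) if abs(x) < 1.0 else 0.0


def gaussian_approx(x: float, n: int) -> float:
    """exp(-(1/2) n x^2): an unnormalized Gaussian with variance 1/n."""
    return math.exp(-0.5 * n * x * x)


if __name__ == "__main__":
    n = 1000
    for y in (0.0, 0.5, 1.0, 2.0, 3.0):
        x = y / math.sqrt(n)
        print(f"y = {y}: slice {slice_density(x, n):.4f} vs Gaussian {gaussian_approx(x, n):.4f}")
```

Expanding log(1 - t) = -t - t^2/2 - ... with t = y^2/n shows the ratio of the two is about exp(-y^4 / (4n)), so the agreement is excellent for y much smaller than n^{1/4} and degrades only in the far tail.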

In other words, for a uniformly random point in the ball, the marginal in any direction (rescaled by sqrt(n)) will converge to a Gaussian (one-line calc!). Maybe this doesn't look like your usual CLT. But consider the Bernoulli CLT: 1/sqrt(n) sum_i X_i = <X, u>, with X random in {-1,1}^n & u = 1/sqrt(n)*(1,...,1).
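Both statements are projections of a random point onto a unit vector, which a small Monte Carlo sketch can illustrate: sqrt(n) times one coordinate of a uniform point in the unit ball of dim n+1, and <X, u> for X uniform in {-1,1}^n with u = (1,...,1)/sqrt(n). The sampler for the ball (normalized Gaussian direction times a radius U^{1/m}) is a standard construction chosen here, and the sample sizes and seeds are arbitrary.

```python
import math
import random
from typing import List, Sequence, Tuple


def uniform_in_ball(m: int, rng: random.Random) -> List[float]:
    """Uniform point in the unit ball of R^m: uniform direction
    (normalized Gaussian vector) times a radius distributed as U^(1/m)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(v * v for v in g))
    radius = rng.random() ** (1.0 / m)
    return [radius * v / norm for v in g]


def dot(x: Sequence[float], u: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, u))


def mean_and_variance(samples: Sequence[float]) -> Tuple[float, float]:
    mu = sum(samples) / len(samples)
    return mu, sum((s - mu) ** 2 for s in samples) / len(samples)


def ball_marginal(n: int, num_samples: int, seed: int = 0) -> List[float]:
    """sqrt(n) * x_1 for x uniform in the unit ball of R^(n+1).
    By rotation invariance, any unit direction gives the same law."""
    rng = random.Random(seed)
    return [math.sqrt(n) * uniform_in_ball(n + 1, rng)[0] for _ in range(num_samples)]


def bernoulli_projection(n: int, num_samples: int, seed: int = 0) -> List[float]:
    """<X, u> with X uniform in {-1,1}^n and u = (1, ..., 1) / sqrt(n),
    i.e. the normalized Bernoulli sum (1/sqrt(n)) * sum_i X_i."""
    rng = random.Random(seed)
    u = [1.0 / math.sqrt(n)] * n
    return [dot([rng.choice((-1.0, 1.0)) for _ in range(n)], u)
            for _ in range(num_samples)]


if __name__ == "__main__":
    n, num_samples = 200, 2000
    for name, samples in (("ball", ball_marginal(n, num_samples)),
                          ("cube", bernoulli_projection(n, num_samples))):
        mu, var = mean_and_variance(samples)
        # Both should look like a standard Gaussian: mean near 0, variance near 1.
        print(f"{name}: mean = {mu:.3f}, variance = {var:.3f}")
```

For the ball the exact variance of sqrt(n) x_1 is n/(n+3), since a uniform point in the unit ball of R^m has E[x_1^2] = 1/(m+2); for the cube it is exactly 1. Matching the first two moments is of course weaker than convergence in distribution, so a histogram of the samples is the more telling check.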
