Sebastien Bubeck
Interested in AGI (!), traditional ML, old-school TCS, classical OR, and venerable mathematical topics. Lead ML Foundations team at Microsoft Research.
Apr 30, 2023 5 tweets 3 min read
The @TEDTalks by @YejinChoinka is both insightful & beautifully delivered! Totally agree with her that GPT-4 is simultaneously brilliant and incredibly stupid. Yejin gives 3 examples of common-sense failures that are worth examining a bit more closely. 1/5

The first example is shocking: GPT-4 does not seem to understand that one can dry clothes in parallel?!? However, with a more open-ended formulation ("what would happen if") one gets a great answer! This example shows that GPT-4 remains quite brittle with short questions like this. 2/5
Apr 20, 2022 24 tweets 9 min read
We (@gauthier_gidel @velythyl @busycalibrating @vernadec & myself) would like to announce the accepted blog posts to @iclr_conf's 1st Blogpost Track. The experiment was a great success with 20 accepted posts out of 61 submissions, roughly the size of the 1st @iclr_conf itself! 1/24

Posts can be found here: iclr-blog-track.github.io/blog/. They fulfill the original promise: they add/replicate experiments, they illuminate prior work with a different theoretical framework, they add new insights, ... We strongly believe there is room for such a track in all ML conferences. 2/24
Jan 20, 2022 4 tweets 4 min read
New video! Probably best described as "a motivational speech to study deep learning mathematically" :-).

The ever so slightly more formal title is "Mathematical theory of deep learning: Can we do it? Should we do it?"

1/3

The context for this talk was an NSF Town Hall whose goal was to discuss the successes of deep learning, especially in light of more traditional fields. Other talks by @tomgoldsteincs @joanbruna @ukmlv, Yuejie Chi, Guy Bresler, and Rina Foygel Barber are at this link:
players.brightcove.net/679256133001/N…

2/3
Jun 9, 2021 7 tweets 3 min read
We may have found a solid hypothesis to explain why extreme overparametrization is so helpful in #DeepLearning, especially if one is concerned about adversarial robustness. arxiv.org/abs/2105.12806 1/7

With my student extraordinaire Mark Sellke @geoishard, we prove a vast generalization of our conjectured law of robustness from last summer: there is an inherent tradeoff between the number of neurons and the smoothness of the network (see the *pre-solution* video). 2/7
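A rough statement of the tradeoff, as an informal sketch (the precise assumptions and constants are in arxiv.org/abs/2105.12806): for a smoothly parametrized model f with p parameters that fits n generic d-dimensional data points below the label-noise level, the Lipschitz constant must satisfy

\mathrm{Lip}(f) \;\gtrsim\; \sqrt{\frac{nd}{p}},

so getting an O(1)-Lipschitz (i.e., smooth) interpolant requires p \gtrsim nd parameters, a factor-d overparametrization beyond the p \approx n needed to merely memorize the data.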
Jan 26, 2021 5 tweets 2 min read
Interesting thread! To me the "reason" for the CLT is simply high-dimensional geometry. Consider the unit ball in dimension n+1 & slice it at distance x from the origin to get a dimension-n ball of radius (1-x^2)^{1/2}. The volume of the slice is proportional to (1-x^2)^{n/2} ~ exp(-(1/2)n x^2). Tada, the Gaussian!!

In other words, for a random point in the ball, the marginal in any direction will converge to a Gaussian (a one-line calculation!). Maybe this doesn't look like your usual CLT. But consider the Bernoulli CLT: 1/sqrt(n) sum_i X_i = <X, u>, with X random in {-1,1}^n & u = 1/sqrt(n)*(1,...,1).
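To spell out the one-line calculation (normalization constants dropped, sketch only): the marginal density of a uniform point in the unit ball of R^{n+1}, at coordinate value x, is proportional to the slice volume, so

p_n(x) \;\propto\; (1-x^2)^{n/2} \;=\; \exp\!\Big(\tfrac{n}{2}\log(1-x^2)\Big) \;\approx\; \exp\!\Big(-\tfrac{n}{2}x^2\Big),

i.e., a Gaussian with variance ~1/n; rescaling the coordinate by \sqrt{n} gives the standard Gaussian. That rescaling is exactly the 1/\sqrt{n} in the Bernoulli example, since there \|X\| = \sqrt{n}.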