Claim: gpt-5-pro can prove new interesting mathematics.
Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than the one in the paper, and I checked the proof: it's correct.
Details below.
The paper in question is this one, which studies the following very natural question: in smooth convex optimization, under what conditions on the stepsize eta in gradient descent will the curve traced by the function values of the iterates be convex? arxiv.org/pdf/2503.10138…
In the v1 of the paper they prove that if eta is smaller than 1/L (where L is the smoothness constant) then one gets this property, and if eta is larger than 1.75/L then they construct a counterexample. So the open problem was: what happens in the range [1/L, 1.75/L]?
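To make the property concrete, here is a minimal numerical sketch of what "the value curve is convex" means; the quadratic test function and all names below are my own illustration, not from the paper (note that for quadratics the property holds for any stable stepsize, so this only illustrates the definition, not the counterexample):

```python
import numpy as np

# Smooth convex test function f(x) = 0.5 * x^T A x; smoothness L = largest eigenvalue.
A = np.diag([1.0, 10.0])
L = 10.0

def f(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

def value_curve_is_convex(eta, steps=50):
    """Run gradient descent and check that f(x_0), f(x_1), ... is a convex
    sequence, i.e. its successive differences are non-decreasing."""
    x = np.array([1.0, 1.0])
    vals = [f(x)]
    for _ in range(steps):
        x = x - eta * grad(x)
        vals.append(f(x))
    diffs = np.diff(vals)
    return bool(np.all(np.diff(diffs) >= -1e-12))

print(value_curve_is_convex(0.9 / L))  # True: eta < 1/L is covered by the v1 result
```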
As you can see in the top post, gpt-5-pro was able to improve the bound from this paper and showed that in fact eta can be taken to be as large as 1.5/L, so not quite fully closing the gap but making good progress. Def. a novel contribution that'd be worthy of a nice arxiv note.
Now the only reason I won't post this as an arxiv note is that the humans actually beat gpt-5 to the punch :-). Namely, the arxiv paper has a v2 with an additional author, and they closed the gap completely, showing that 1.75/L is the tight bound. arxiv.org/pdf/2503.10138…
By the way this is the proof it came up with:
And yeah, the fact that it proves 1.5/L and not the 1.75/L also shows it didn't just search for the v2. Also, the above proof is very different from the v2 proof; it's more of an evolution of the v1 proof.
It's getting harder and harder to get signal from benchmark numbers. Rather than averages, I expect in the (near) future we will also care about the "argmax": what's the BEST output a model can deliver? After all, we don't need to solve P vs NP 10 out of 10 times, once is enough 😅. So with that in mind, let me tell you a bit more about THE MOST IMPRESSIVE LLM OUTPUT I have ever seen.
Many of you are probably familiar with the preferential attachment model (also known as Barabasi-Albert), a growing random graph process where each new node attaches to existing nodes with probability proportional to their degrees. This is very similar to how a network like X grows (popular accounts attract more and more followers).
(beautiful gif by Igor Kortchemski)
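The process is simple enough to simulate in a few lines; this is a minimal sketch (function name and parameters are my own), using the classic trick that a uniform draw from the list of edge endpoints is a degree-proportional draw over nodes:

```python
import random

def preferential_attachment(n, seed=0):
    """Grow a Barabasi-Albert-style graph one node at a time: each new node
    attaches to a single existing node chosen with probability
    proportional to its degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]       # start from a single edge
    # One entry per edge endpoint, so a uniform choice from this list
    # picks a node with probability proportional to its degree.
    endpoints = [0, 1]
    for new in range(2, n):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges

edges = preferential_attachment(10_000)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(max(degree.values()))  # early nodes end up with very high degree
```

Running this and plotting the degree distribution shows the heavy tail that makes the model a popular stand-in for real social networks.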
In 2012 Ramon and Comendant introduced, in their COLT open problem, a variant of the preferential attachment process where each node is born with an "attractiveness" parameter, either 1 or W>1, and now the probability of connecting to a previous node is proportional to the total attractiveness of its neighbors! I.e., if you have a lot of attractive friends, you will attract more connections. I guess it simulates how an AI research lab grows or something 🤣. The very simple question they asked is: can one estimate W from observing the graph? [They asked something more quantitative than this actually, but that's the essence of their question.]
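A minimal simulation sketch of the variant as described above; the function name, the choice of W, and the 50/50 attractiveness assignment are my own assumptions for illustration (the original open problem is stated more quantitatively):

```python
import random

def attractiveness_pa(n, W=3.0, p=0.5, seed=0):
    """Variant PA process: each node is born with attractiveness 1 or W
    (W with probability p), and each new node connects to an existing node
    with probability proportional to the total attractiveness of that
    node's current neighbors."""
    rng = random.Random(seed)
    attract = [W if rng.random() < p else 1.0 for _ in range(n)]
    neighbors = {0: [1], 1: [0]}
    for new in range(2, n):
        weights = [sum(attract[j] for j in neighbors[i]) for i in range(new)]
        # Weighted draw over existing nodes (O(n) per step; fine for a sketch).
        r = rng.random() * sum(weights)
        acc, target = 0.0, 0
        for i, w in enumerate(weights):
            acc += w
            if r < acc:
                target = i
                break
        neighbors[new] = [target]
        neighbors[target].append(new)
    return neighbors, attract

neighbors, attract = attractiveness_pa(2_000)
```

The estimation question is then: given only `neighbors` (and not `attract`), how well can you recover W?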
o3-mini is a remarkable model. Somehow it has *grokked arxiv* in a way that no other model on the planet has, turning it into a valuable research partner!
Below is a deceptively simple question that confuses *all* other models, but where o3-mini gives an extremely useful answer!
Indeed it hits all the right things: the connection with self-contracted curves, the dimension-dependent bound, and even cites a relevant paper!
This is not cherry-picked at all; it was literally the first query I made. Here is the second query, on a completely different topic:
Interestingly though the reference it gives “Bubeck and Ganguly” is not quite the correct one, but it is very closely related! In general I have found that references are "fuzzily correct", giving some mixed up version of authors/journals/titles, but surprisingly still useful!
The @TEDTalks by @YejinChoinka is both insightful & beautifully delivered! Totally agree with her that GPT-4 is simultaneously brilliant and incredibly stupid. Yejin gives 3 examples of common-sense failures that are worth examining a bit more closely. 1/5
The first example is shocking: GPT-4 does not seem to understand that one can dry clothes in parallel?!? However, w. a more open-ended formulation ("what would happen if") one gets a great answer! This example shows that GPT-4 remains quite brittle with short questions like this. 2/5
The second example is particularly interesting to me because it doesn't match my experience with GPT-4 at all. What's going on? Turns out there is a simple explanation: this is pure pattern matching in action! GPT-4 memorized it from online sources like this one: mathsisfun.com/puzzles/measur…
Posts can be found here: iclr-blog-track.github.io/blog/. They fulfill the original promise: they add/replicate experiments, they illuminate prior work w. a different theoretical framework, they add new insights, ... We strongly believe there is room for such a track in all ML conferences. 2/24
Our first post is an invited post by @karpathy on @ylecun's revolutionary 1989 paper on convolutional neural networks. This post illustrates perfectly what a blogpost track can do for a conference, which includes highlighting influential work! 3/24 iclr-blog-track.github.io/2022/03/26/lec…
New video! Probably best described as "a motivational speech to study deep learning mathematically" :-).
The ever so slightly more formal title is "Mathematical theory of deep learning: Can we do it? Should we do it?"
1/3
Context for this talk was an NSF Town Hall with the goal of discussing the successes of deep learning, especially in light of more traditional fields. Other talks by @tomgoldsteincs @joanbruna @ukmlv, Yuejie Chi, Guy Bresler, and Rina Foygel Barber at this link: players.brightcove.net/679256133001/N…
2/3
But honestly I think my talk might as well have been just the title slide followed by the wise words in this video:
We may have found a solid hypothesis to explain why extreme overparametrization is so helpful in #DeepLearning, especially if one is concerned about adversarial robustness. arxiv.org/abs/2105.12806 1/7
With my student extraordinaire Mark Sellke @geoishard, we prove a vast generalization of our conjectured law of robustness from last summer: there is an inherent tradeoff between # neurons and smoothness of the network (see *pre-solution* video). 2/7
If you squint hard enough (eg, like a physicist) our new universal law of robustness even makes concrete predictions for real data. For example, we predict that on ImageNet you need at least 100 billion parameters (i.e., GPT-3-like scale) to possibly attain good robustness guarantees. 3/7
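For the curious, the back-of-the-envelope behind a figure like 100 billion, assuming the law takes the form p ≳ n·d (number of samples times effective dimension); the specific values of n and d below are my own rough stand-ins, not numbers from the paper:

```python
# Law-of-robustness back-of-the-envelope: smooth interpolation of n points
# in effective dimension d needs roughly p >~ n * d parameters.
n_samples = 10**7   # assumed: roughly ImageNet-scale dataset size
d_eff = 10**4       # assumed: rough effective dimension of images
p_needed = n_samples * d_eff
print(p_needed)  # 100000000000, i.e. ~100 billion, GPT-3-like scale
```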