Damek
Optimization and Machine Learning Assoc Prof @Wharton stats https://t.co/bfOIEx0lHj (not quite a) blog: https://t.co/RFKUB4qDKF
Jul 22 · 10 tweets
Update:

On my first try using the ChatGPT agent, it surfaced a 2015 reference that contained a proof of the result in question.

The paper was on an unrelated topic; the proof was hidden in the appendix, and the paper was missing many of the relevant keywords. The bound had also been stated, but not proven, in a 1994 paper. The 2015 paper gave a decently simple, self-contained proof.

This question had been asked of a few prominent people in the field. They didn't know the answer, so the result wasn't well known. I'm sure they could have figured it out.
Jul 14 · 4 tweets
Update: I didn't solve the problem

Over the last 6 months, I've tried pretty hard to use LLMs to solve a (I think) not-too-difficult problem in a nearby field (learning theory).

The answers have gone from terrible to plausible, and I actually understand the problem better now. I now have references that are highly relevant and several pathways that might work.

Unfortunately, every argument proposed has had (in retrospect) a fairly obvious hole in it.
Dec 11, 2023 · 14 tweets
Next formalization is the "gradient inequality" for smooth convex functions.

The proof (next tweet) is a few sentences of English.

My lean4 proof (below) is roughly 350 lines of code.

1/n [image: lean4 proof]

The proof in English is short. It is a consequence of the simple convex analysis fact:

If f is a convex function such that f(0) = 0 and f(y) ≥ o(‖y‖) as y → 0, then f(y) ≥ 0 globally. [image]
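For concreteness, here is a minimal sketch of how this fact gives the gradient inequality, written out as I read the English argument (the thread's own proof is in the images; this paraphrase assumes f is differentiable):

```latex
% Gradient inequality for smooth convex f:
\[
  f(y) \;\ge\; f(x) + \langle \nabla f(x),\, y - x \rangle
  \qquad \text{for all } x, y.
\]
% Reduction: fix x and set g(u) := f(x + u) - f(x) - \langle \nabla f(x), u \rangle.
% Then g is convex, g(0) = 0, and differentiability of f at x gives
% g(u) = o(\|u\|) as u \to 0, so the fact applies to g.
% Proof of the fact: for t \in (0, 1], convexity and g(0) = 0 give
\[
  g(tu) \;\le\; t\, g(u) + (1 - t)\, g(0) \;=\; t\, g(u),
  \qquad\text{hence}\qquad
  g(u) \;\ge\; \frac{g(tu)}{t} \;\longrightarrow\; 0
  \quad (t \to 0^+),
\]
% since g(tu) = o(t\|u\|). Thus g \ge 0 everywhere; evaluating at
% u = y - x recovers the gradient inequality.
```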
Dec 2, 2023 · 9 tweets
I proved a super simple helper lemma.
In English the proof is 1 line.
In lean4, it took me around 50 lines of code.
1/n [image: lean4 proof]

The proof took me around 5 hours (lol).
Any help shortening would be appreciated.

2/n [image]
Aug 29, 2023 · 4 tweets
Cool technique. Reminds me of the Cramér–Chernoff method, which applies a Markov-type inequality and optimizes to find the "best" parameter...
[image]

Carrying out the optimization bounds the tail probability by the exponential of minus the Fenchel conjugate of the log moment generating function of the r.v. [image]
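Writing the two steps out, this is the standard textbook derivation (assuming a real r.v. X and λ in the range where the mgf is finite), not the specific technique from the quoted tweet:

```latex
% Step 1 (Chernoff / Markov): for any \lambda \ge 0,
\[
  \Pr(X \ge t)
  \;=\; \Pr\bigl(e^{\lambda X} \ge e^{\lambda t}\bigr)
  \;\le\; e^{-\lambda t}\, \mathbb{E}\, e^{\lambda X}
  \;=\; \exp\bigl(\psi(\lambda) - \lambda t\bigr),
  \qquad
  \psi(\lambda) := \log \mathbb{E}\, e^{\lambda X}.
\]
% Step 2 (Cramer): the bound holds for every \lambda \ge 0, so optimize over \lambda:
\[
  \Pr(X \ge t)
  \;\le\; \exp\Bigl(-\sup_{\lambda \ge 0} \bigl(\lambda t - \psi(\lambda)\bigr)\Bigr)
  \;=\; \exp\bigl(-\psi^{*}(t)\bigr)
  \qquad \text{for } t \ge \mathbb{E} X,
\]
% where \psi^{*}(t) = \sup_{\lambda} (\lambda t - \psi(\lambda)) is the Fenchel
% conjugate of \psi. The constraint \lambda \ge 0 can be dropped when
% t \ge \mathbb{E} X: Jensen gives \psi(\lambda) \ge \lambda \mathbb{E} X \ge \lambda t
% for \lambda \le 0, so negative \lambda contribute nothing to the sup.
```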