Damek
Dec 2, 2023 · 9 tweets
I proved a super simple helper lemma.
In English, the proof is one line.
In Lean 4, it took me around 50 lines of code.
1/n [image]
The proof took me around 5 hours (lol).
Any help shortening would be appreciated.

2/n [image]
The trickiest things:
1. Can't take basic math -- e.g., commutativity, automatic casting of naturals as reals -- for granted.
2. Tactics feel simultaneously powerful and weak. E.g., sometimes 'linarith' easily proves inequalities, sometimes not. (A small sketch of both points follows below.)
3. GPT-4 is not very helpful.
3/n
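To make points 1 and 2 concrete, here is a minimal illustrative Lean 4 / mathlib sketch (toy examples, not lines from the actual proof):

    import Mathlib

    -- 1. Casting: a fact about naturals doesn't automatically apply once the
    --    naturals are viewed as reals; you move it across the coercion explicitly.
    example (n : ℕ) (h : 0 < n) : (0 : ℝ) < n := by
      exact_mod_cast h

    -- 2. linarith dispatches purely linear goals instantly...
    example (x y : ℝ) (h1 : x ≤ y) (h2 : 0 ≤ x) : 0 ≤ y := by
      linarith

    -- ...but gives up once a product of unknowns appears, and you supply the
    --    nonlinear step yourself (or reach for nlinarith / positivity).
    example (x y : ℝ) (hx : 0 ≤ x) (hy : 0 ≤ y) : 0 ≤ x * y :=
      mul_nonneg hx hy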
Next up for me is figuring out support for convexity, differentiability, and the fundamental theorem of calculus. I'm also about 7 chapters into the Lean 4 tutorial. I'll continue working through that in parallel.
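For reference, these are the mathlib entry points I expect to need (my guesses; names to be confirmed against current mathlib):

    import Mathlib

    #check @ConvexOn        -- convexity of a function on a set
    #check @HasDerivAt      -- one-variable derivatives
    #check @gradient        -- gradients (added to mathlib in late 2023)
    -- one mathlib form of the fundamental theorem of calculus:
    #check @intervalIntegral.integral_eq_sub_of_hasDerivAt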

I'm posting my code to a GitHub repo, which I'll link below.
4/n
You can find the code above at the following link:

github.com/damek/gd-lean/…
Whoops, I forgot an assumption in the theorem statement above: the sequence a_k should be nonincreasing.
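(As an aside, a minimal sketch of how that hypothesis can be phrased in Lean 4 / mathlib, with `a` standing in for the sequence:)

    import Mathlib

    variable (a : ℕ → ℝ)

    -- "nonincreasing" as mathlib's order-theoretic predicate...
    #check Antitone a
    -- ...or via the one-step condition, which mathlib upgrades for you:
    example (h : ∀ k, a (k + 1) ≤ a k) : Antitone a :=
      antitone_nat_of_succ_le h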
A ~30-line streamlined proof, closer to my envisioned one-line proof in English.
6/n
Terence Tao kindly sent me the following 3-line Lean proof, which is exactly the proof I wanted to write! [image]
Terry’s blog on writing the proof.

More from @damekdavis

Jul 22
Update:

On the first try using ChatGPT agent, it surfaced a reference from 2015 that contained a proof of the result in question.

The paper was on an unrelated topic; the proof was hidden in the appendix and was missing many of the relevant keywords.
The bound had also been stated but not proven in a 1994 paper. The 2015 paper gave a decently simple self-contained proof.

This question had been asked of a few prominent people in this field. They didn't know the answer, so it wasn't well known; I'm sure they could have figured it out.
I didn't know this literature at all. I tried not to read it except through interaction with LLMs.

The solutions given by all the LLMs always had a flaw. Their approach didn't look like the one in the 2015 paper.
Jul 14
Update: I didn't solve the problem

Over the last 6 months, I've tried pretty hard to use LLMs to solve an (I think) not-too-difficult problem in a nearby field (learning theory).

The answers have gone from terrible to plausible, and I actually understand the problem better now.
I now have references that are highly relevant and several pathways that might work.

Unfortunately, every argument proposed has had (in retrospect) a fairly obvious hole in it.
They all produce a series of creative steps and simplifications that move us closer to solving the problem.

But when you look carefully and backtrack, you see claims that just fall apart because of unjustified 'trivial' assertions.
Dec 11, 2023
Next formalization is the "gradient inequality" for smooth convex functions.

The proof (next tweet) is a few sentences of English.

My Lean 4 proof (below) is roughly 350 lines of code.

1/n [image]
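For reference, the inequality in question is the standard one for a differentiable convex f (the attached image is not reproduced here):

\[
f(y) \;\ge\; f(x) + \langle \nabla f(x),\, y - x \rangle \qquad \text{for all } x, y.
\]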
The proof in English is short. It is a consequence of the following simple convex-analysis fact:

If f is a convex function such that f(0) = 0 and f(y) ≥ o(‖y‖) as y → 0, then f(y) ≥ 0 globally. [image]
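Spelled out (a sketch of the standard reduction, since the attached images are not reproduced): apply this fact to

\[
g(v) := f(x + v) - f(x) - \langle \nabla f(x),\, v \rangle .
\]

Then g is convex, g(0) = 0, and differentiability of f at x gives g(v) = o(\|v\|) as v → 0, so the fact yields g ≥ 0 everywhere; taking v = y - x recovers the gradient inequality. The fact itself follows from convexity: g(tv) ≤ t\,g(v) for t ∈ (0, 1], so

\[
g(v) \;\ge\; \frac{g(t v)}{t} \;\ge\; \frac{o(t\|v\|)}{t} \;\to\; 0 \quad \text{as } t \to 0.
\]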
To formalize this proof, I first had to learn whether mathlib had support for gradients. Interestingly, this was just added in October:
leanprover.zulipchat.com/#narrow/stream…
[image]
Aug 29, 2023
Cool technique. Reminds me of the Cramér-Chernoff method, which applies a Markov-type inequality and optimizes to find the "best" parameter...
[image]
Carrying out the optimization bounds the tail probability by the exponential of minus the Fenchel conjugate of the log moment generating function of the r.v. [image]
You can then use this to do the standard tensorization trick for sums of i.i.d. random variables. [image]
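In symbols (a compact restatement of the steps above, since the attached images are not reproduced): for a real random variable X, any t, and any λ > 0, Markov's inequality applied to e^{λX} gives

\[
\Pr(X \ge t) \;=\; \Pr\big(e^{\lambda X} \ge e^{\lambda t}\big) \;\le\; e^{-\lambda t}\, \mathbb{E}\, e^{\lambda X},
\]

and optimizing over λ gives

\[
\Pr(X \ge t) \;\le\; \exp\!\Big(-\sup_{\lambda > 0}\big(\lambda t - \psi(\lambda)\big)\Big),
\qquad \psi(\lambda) := \log \mathbb{E}\, e^{\lambda X},
\]

where the exponent is the Fenchel conjugate \psi^*(t) whenever t \ge \mathbb{E}X. Tensorization: for S_n = X_1 + \cdots + X_n with the X_i i.i.d., \psi_{S_n}(\lambda) = n\,\psi_{X_1}(\lambda), so for a \ge \mathbb{E}X_1,

\[
\Pr(S_n \ge n a) \;\le\; e^{-n\, \psi_{X_1}^*(a)}.
\]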