Sep 5

Random matrices are very important in modern statistics and machine learning, not to mention physics

A model about which much less is known: matrices sampled uniformly from the set of doubly stochastic matrices (uniformly distributed doubly stochastic matrices)

A thread -

1/n
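There is no simple exact sampler for the uniform distribution on doubly stochastic matrices, but a common heuristic (not exactly uniform, just a way to land in the set) is Sinkhorn-Knopp normalization of a positive random matrix; a minimal numpy sketch:

```python
import numpy as np

def sinkhorn(M, iters=1000):
    """Alternately normalize rows and columns until M is
    (approximately) doubly stochastic (Sinkhorn-Knopp)."""
    for _ in range(iters):
        M = M / M.sum(axis=1, keepdims=True)  # rows sum to 1
        M = M / M.sum(axis=0, keepdims=True)  # columns sum to 1
    return M

rng = np.random.default_rng(0)
D = sinkhorn(rng.random((5, 5)))
print(np.allclose(D.sum(axis=0), 1.0), np.allclose(D.sum(axis=1), 1.0))
```

Note this is only a sketch: the resulting distribution on the set of doubly stochastic matrices is not the uniform one the thread is about.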


Sep 1

The perpetually undervalued least-squares:

minₓ‖y−Ax‖²

can teach a lot about some complex ideas in modern machine learning including overfitting & double-descent.

Let's assume A is n-by-p. So we have n data points and p parameters

1/10


If n ≥ p (“under-fitting” or “over-determined” case), the solution is

x̃ = (AᵀA)⁻¹ Aᵀ y

But if n < p (“over-fitting” or “under-determined” case), there are infinitely many solutions that give *zero* training error. We pick the minimum-norm solution, the one minimizing ‖x‖²:

x̃ = Aᵀ(AAᵀ)⁻¹ y

2/10
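Both formulas are easy to check numerically; a small numpy sketch (random A and y, assumed full rank in each regime):

```python
import numpy as np

rng = np.random.default_rng(0)

# Over-determined case: n >= p, unique least-squares solution
n, p = 20, 5
A = rng.standard_normal((n, p))
y = rng.standard_normal(n)
x_over = np.linalg.solve(A.T @ A, A.T @ y)      # (AᵀA)⁻¹ Aᵀ y

# Under-determined case: n < p, minimum-norm interpolating solution
n2, p2 = 5, 20
B = rng.standard_normal((n2, p2))
z = rng.standard_normal(n2)
x_under = B.T @ np.linalg.solve(B @ B.T, z)     # Bᵀ(BBᵀ)⁻¹ z

print(np.allclose(B @ x_under, z))  # zero training error in the n < p case
```

In both regimes the formula agrees with the Moore-Penrose pseudoinverse, `np.linalg.pinv(A) @ y`.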


Aug 18

Two basic concepts are often conflated:

Sample Standard Deviation (SD) vs Standard Error (SE)

Say you want to estimate m=𝔼(x) from N independent samples xᵢ. A typical choice is the average or "sample" mean m̂

But how stable is this? That's what Standard Error tells you:

1/6
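A quick numpy illustration of the distinction (the √N factor below is the standard SE formula for i.i.d. samples):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
x = rng.standard_normal(N)      # i.i.d. samples with true SD = 1

m_hat = x.mean()                # sample mean, estimate of m = E[x]
sd = x.std(ddof=1)              # sample SD: spread of the data itself
se = sd / np.sqrt(N)            # SE: typical fluctuation of m_hat

print(sd, se)                   # sd stays near 1; se shrinks like 1/sqrt(N)
```

The SD describes the data and does not shrink as N grows; the SE describes the estimator m̂ and shrinks like 1/√N.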


Aug 10

Image-to-image models have been called 'filters' since the early days of computer vision/imaging. But what does it mean to filter an image?

If we choose some set of weights and apply them to the input image, what loss/objective function does this process optimize (if any)?

1/7
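As a concrete classical special case (my example, not necessarily the one the thread has in mind): the plain box filter is the minimizer of a per-pixel quadratic loss over each neighborhood, so averaging *is* optimizing. A small sketch:

```python
import numpy as np

def box_filter(img, k=3):
    """Each output pixel is argmin_z sum_j (z - x_j)^2 over its k×k
    neighborhood, i.e. the neighborhood mean."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')  # replicate borders
    out = np.empty(img.shape, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
print(box_filter(img)[2, 2] == img[1:4, 1:4].mean())  # interior pixel = local mean
```

More elaborate weight choices correspond to other (e.g. weighted or robust) local losses; the point is that the weights implicitly define the objective.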


Jul 21

Images aren’t arbitrary collections of pixels; they have complicated structure, even small ones. That’s why it’s hard to generate images well. Let me give you an idea:

3×3 gray images represented as points in ℝ⁹ lie approximately on a 2-D manifold: the Klein bottle!

1/3
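A sketch of the standard preprocessing in the patch-manifold literature (my reconstruction, hedged: mean-center each patch, discard low-contrast ones, normalize). The resulting vectors lie on a 7-sphere inside ℝ⁹, and it is among these high-contrast patches that the Klein-bottle concentration is observed:

```python
import numpy as np

def normalized_patches(img, eps=1e-8):
    """Extract all 3×3 patches as vectors in R^9, remove mean
    brightness, drop near-constant (low-contrast) patches, and
    scale each survivor to unit norm."""
    H, W = img.shape
    P = np.array([img[i:i + 3, j:j + 3].ravel()
                  for i in range(H - 2) for j in range(W - 2)])
    P = P - P.mean(axis=1, keepdims=True)          # remove brightness
    norms = np.linalg.norm(P, axis=1, keepdims=True)
    mask = norms[:, 0] > eps                       # keep high-contrast patches
    return P[mask] / norms[mask]                   # unit vectors in R^9

rng = np.random.default_rng(0)
P = normalized_patches(rng.random((32, 32)))
print(P.shape[1] == 9)
```

Random-noise patches like these fill the sphere; patches from natural images concentrate near a 2-D subset of it.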


Apr 3

We often assume bigger generative models are better. But when practical image generation is limited by a compute budget, is this still true? The answer is no.

By looking at latent diffusion models across different scales, our paper sheds light on the quality vs model size tradeoffs

1/5
