Post

How to get URL link on X (Twitter) App

On the Twitter thread, click on or icon on the bottom
Click again on or Share Via icon
Click on Copy Link to Tweet
Paste it above and click "Unroll Thread"!
More info at Twitter Help

Simone Scardapane

@s_scardapane

Jun 16, 2021 • 13 tweets • 12 min read • Read on X

Scrolly

*Score-based diffusion models*

An emerging approach in generative modelling that is gathering more and more attention.

If you are interested, I collected some introductive material and thoughts in a small thread. 👇

Feel free to weigh in with additional material!

/n

An amazing property of diffusion models is simplicity.

You define a probabilistic chain that gradually "noise" the input image until only white noise remains.

Then, generation is done by learning to reverse this chain. In many cases, the two directions have similar form.

/n

@jaschasd

The starting point for diffusion models is probably "Deep Unsupervised Learning using Nonequilibrium Thermodynamics" by @jaschasd Weiss @niru_m @SuryaGanguli

Classic paper, definitely worth reading: arxiv.org/abs/1503.03585

/n

@hojonathanho

A cornerstone in diffusion models is the introduction of "denoising" versions by @hojonathanho @ajayj_ @pabbeel

They showed how to make diffusion models perform close to the state-of-the-art using a suitable reformulation of their training objective.

/n

It turns out that the improved version is also simpler than the original one!

Roughly, it works by adding noise to an image, and learning to denoise the image itself.

In this way, training is connected to denoising autoencoders, and sampling remains incredibly easy.

/n

@YSongStanford

Denoising diffusion turns out to be similar to "score-based" models, pioneered by @YSongStanford and @StefanoErmon

@YSongStanford has written an outstanding blog post on these ideas, so I'll just skim some of the most interesting connections: yang-song.github.io/blog/2021/scor…

/n

Score-based models work by learning an estimator for the score function of the distribution (ie, the gradient of the log).

Langevin dynamics allows to sample from p(x) having only access to the estimator of the score function.

Reference paper here: arxiv.org/abs/1907.05600

/n

Naive score-based models are uncommon, because sampling starts in poorly approximated regions.

The solution is noise-conditional score models, that perturb the original input, and generate data using "annealed" Langevin dynamics.

arxiv.org/abs/2006.09011

@prafdhar

Noise-conditional score-based models and denoised diffusion models are almost equivalent, basically a single family of models.

A few additional improvements obtain performance close to BigGAN on complex datasets, as shown by @prafdhar @unixpickle

arxiv.org/abs/2105.05233

/n

@YSongStanford

Interestingly, when the noise variance goes from discrete values to a continuous distribution, score-based models connect to neural SDEs and continuous normalizing flows!

This was shown in a #ICLR2021 paper by @YSongStanford @jaschasd @dpkingma Kumar @StefanoErmon @poolio

/n

@YSongStanford

The field is exploding, too many interesting papers to cite!

For example, a recent one by @YSongStanford @conormdurkan @driainmurray @StefanoErmon shows that a formulation of score-based models is upper-bounding a maximum likelihood objective.

arxiv.org/pdf/2101.09258…

/n

@emiel_hoogeboom

Another personal favorite: multinomial diffusion and argmax flows extend score-based models and flows to discrete data distributions!

by @emiel_hoogeboom @nielsen_didrik @priyankjaini
Forré @wellingmax

arxiv.org/abs/2102.05379

/n

@StefanoErmon

I could go on with my new love, but I'll stop. 🙃

Another nice blog post on score-based models: ajolicoeur.wordpress.com/the-new-conten…

Introductive video by @StefanoErmon:

Lots of code in the blog post by @YSongStanford!

Or you can play w/ github.com/lucidrains/den…

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @s_scardapane

Simone Scardapane

@s_scardapane

Sep 20, 2022

Gather round, Twitter folks, it's time for our beloved
**Alice's adventures in a differentiable wonderland**, our magical tour of autodiff and backpropagation. 🔥

Slides below 1/n 👇

It all started from her belief that "very few things indeed were really impossible". Could AI truly be below the corner? Could differentiability be the only ingredient that was needed?

2/n

@ylecun

Wondering were to start, Alice discovered a paper by pioneer @ylecun promising "a path towards autonomous intelligent agents".

Intelligence would arise, it was argued, by several interacting modules, were everything was assumed to be *differentiable*.

3/n

Read 17 tweets

Simone Scardapane

@s_scardapane

Mar 10, 2022

*Generative Flow Networks*

A new method to sample structured objects (eg, graphs, sets) with a formulation inspired to the state space of reinforcement learning.

I have collected a few key ideas and pointers below if you are interested. 👀

1/n

👇

@folinoid

*Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation*
#NeurIPS paper by @folinoid @JainMoksh et al. introducing the method.

The task is learning to sample objects that can be built 1 piece at a time ("lego-style").

2/n

arxiv.org/abs/2106.04399

For example: a complex molecule can be built by adding one atom at a time; an image by colouring one pixel per iteration; etc.

If you formalize this process, you get a state space where you move from an "empty" object to a complete object by traversing a graph.

3/n

Read 13 tweets

Simone Scardapane

@s_scardapane

Dec 28, 2021

*Neural networks for data science* lecture 8 is out!

And it's already the last lecture! 🙀

What lies beyond classical supervised learning? It turns out, _way_ too many subfields!

/n

Here is my overview of everything that can happen when we have > 1 "task": fine-tuning, pre-training, meta learning, continual learning...

The slides have my personal selection of material. 😎

/n

The slides are here: sscardapane.it/assets/files/n…

All the material, as always, is here: sscardapane.it/teaching/nnds-…

/n

Read 4 tweets

Simone Scardapane

@s_scardapane

Nov 3, 2021

*Neural networks for data science* lecture 4 is out! 👇

aka "here I am talking about convolutional neural networks while everyone asks me about transformers"

/n

CNNs are a great way to show how considerations about the data can guide the design of the model.

For example, only assuming locality (and not transl. invariance) we get locally-connected networks.

/n

Everything else is a pretty standard derivation of CNN ideas (stride, global pooling, receptive field, ...).

/n

Read 7 tweets

Simone Scardapane

@s_scardapane

Aug 2, 2021

*Reproducible deep learning*: Time for exams!

To a practical course, a practical exam: I asked each student to include a new branch in the repository showcasing additional tools and libraries.

The result? *Everyone* loves some hyper-parameter optimization. 😄

/n

@OptunaAutoML

Thanks to their work, you'll find practical examples of fine-tuning parameters using @OptunaAutoML, AX (from @facebookai), @raydistributed Tune, and Auto-PyTorch and Talos coming soon.

So many ideas for next year! 😛

github.com/sscardapane/re…

/n

@DVCorg

You will also find additional exercises on:

- Serving the model with TorchServe;
- Managing experiments with @DVCorg 2.0;
- Set up cron jobs for re-training.

BTW, if you'd like to add something, feel free to contact me or open a pull request. 🙂

github.com/sscardapane/re…

/n

Read 4 tweets

Simone Scardapane

@s_scardapane

Jun 14, 2021

@esiamid

*LocoProp: Enhancing BackProp via Local Loss Optimization*
by @esiamid @_arohan_ & Warmuth

Interesting approach to bridge the gap between first-order, second-order, and "local" optimization approaches. 👇

/n

The key idea is to use a single GD step to define auxiliary local targets for each layer, either at the level of pre- or post-activations.

Then, optimization is done by solving local "matching" problems wrt these new variables.

/n

What is intriguing is that the framework interpolates between multiple scenarios: first solution step is the original GD, while closed-form solution (in one case) is similar to a pre-conditioned GD model. Optimization is "local" in the sense that it decouples across layers.

/n

Read 4 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Enter URL or ID to Unroll

Simone Scardapane

Try unrolling a thread yourself!

More from @s_scardapane

Simone Scardapane

Simone Scardapane

Simone Scardapane

Simone Scardapane

Simone Scardapane

Simone Scardapane

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!