More from @s_scardapane

Simone Scardapane

@s_scardapane

2 Aug

*Reproducible deep learning*: Time for exams!

To a practical course, a practical exam: I asked each student to include a new branch in the repository showcasing additional tools and libraries.

The result? *Everyone* loves some hyper-parameter optimization. 😄

/n

Thanks to their work, you'll find practical examples of hyper-parameter tuning using @OptunaAutoML, Ax (from @facebookai), and @raydistributed Tune, with Auto-PyTorch and Talos coming soon.

So many ideas for next year! 😛

github.com/sscardapane/re…

/n

You will also find additional exercises on:

- Serving the model with TorchServe;
- Managing experiments with @DVCorg 2.0;
- Setting up cron jobs for re-training.

BTW, if you'd like to add something, feel free to contact me or open a pull request. 🙂

github.com/sscardapane/re…

/n

Simone Scardapane

@s_scardapane

16 Jun

*Score-based diffusion models*

An emerging approach in generative modelling that is gathering more and more attention.

If you are interested, I collected some introductory material and thoughts in a small thread. 👇

Feel free to weigh in with additional material!

/n

An amazing property of diffusion models is simplicity.

You define a probabilistic chain that gradually "noises" the input image until only white noise remains.

Then, generation is done by learning to reverse this chain. In many cases, the two directions have similar form.

/n
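A minimal sketch of the forward ("noising") chain described above, assuming Gaussian steps and a linear noise schedule. The function name `forward_diffuse`, the schedule values, and the toy constant image are illustrative assumptions, not taken from the thread or from any specific paper.

```python
import numpy as np

def forward_diffuse(x0, betas, rng):
    """Apply the forward noising chain to x0 and return the final state."""
    x = x0
    for beta in betas:
        noise = rng.standard_normal(x.shape)
        # One Gaussian step: shrink the signal slightly, then add a little noise.
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
    return x

rng = np.random.default_rng(0)
x0 = np.full((64, 64), 5.0)             # toy "image" with a constant value (assumption)
betas = np.linspace(1e-4, 0.02, 1000)   # hypothetical linear noise schedule
x_final = forward_diffuse(x0, betas, rng)

# Fraction of the original signal that survives the whole chain.
signal_left = np.prod(np.sqrt(1.0 - betas))
```

With this schedule almost none of the original signal survives, so `x_final` is close to unit-variance white noise. Generation would then require a learned model for the reverse steps, which this sketch does not cover.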

The starting point for diffusion models is probably "Deep Unsupervised Learning using Nonequilibrium Thermodynamics" by @jaschasd Weiss @niru_m @SuryaGanguli

Classic paper, definitely worth reading: arxiv.org/abs/1503.03585

/n

Simone Scardapane

@s_scardapane

14 Jun

*LocoProp: Enhancing BackProp via Local Loss Optimization*
by @esiamid @_arohan_ & Warmuth

Interesting approach to bridge the gap between first-order, second-order, and "local" optimization approaches. 👇

/n

The key idea is to use a single GD step to define auxiliary local targets for each layer, either at the level of pre- or post-activations.

Then, optimization is done by solving local "matching" problems wrt these new variables.

/n

What is intriguing is that the framework interpolates between multiple scenarios: the first step of the local solution recovers the original GD update, while the closed-form solution (in one case) resembles pre-conditioned GD. Optimization is "local" in the sense that it decouples across layers.

/n
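A toy check of the two regimes mentioned above, for a single linear layer. This is my own simplified reconstruction, not the paper's exact formulation: it assumes a squared loss, a squared "matching" loss on pre-activations, and a squared-distance regularizer toward the current weights. The names `gamma` and `eta` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 32, 5, 3
X = rng.standard_normal((n, d_in))        # layer inputs
Y = rng.standard_normal((n, d_out))       # toy regression targets
W0 = rng.standard_normal((d_out, d_in))   # current weights
gamma, eta = 0.1, 1.0                     # hypothetical step size and regularizer strength

A = X @ W0.T                 # pre-activations
G = A - Y                    # dL/dA for L = 0.5 * ||A - Y||^2
backprop_grad = G.T @ X      # dL/dW, the usual backprop gradient

# A single GD step on the pre-activations defines the local target.
T = A - gamma * G

# Gradient of the local matching loss 0.5 * ||X W^T - T||^2, evaluated at W0.
local_grad = (X @ W0.T - T).T @ X

# Closed-form minimizer of the matching loss plus (1 / (2 * eta)) * ||W - W0||^2.
H = X.T @ X + np.eye(d_in) / eta
W_closed = np.linalg.solve(H, (T.T @ X + W0 / eta).T).T

# The same solution written as a pre-conditioned gradient step.
W_precond = W0 - gamma * backprop_grad @ np.linalg.inv(H)
```

Under these assumptions, `local_grad` equals `gamma * backprop_grad`, so the first local step points along the plain GD direction, and `W_closed` coincides with `W_precond`.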

Simone Scardapane

@s_scardapane

11 May

*Reproducible Deep Learning*

The first two exercises are out!

We start quick and easy, with some simple manipulation of Git branches, scripting, audio classification, and configuration with @Hydra_Framework.

Small thread with all information 🙃 /n

Reproducibility is associated with production environments and MLOps, but it is also a major concern today in the research community.

My biased introduction to the issue is here: docs.google.com/presentation/d…

The local setup is on the repository: github.com/sscardapane/re…

The use case for the course is a small audio classification model trained on event detection with the awesome @PyTorchLightnin library.

Feel free to check the notebook if you are unfamiliar with the task. /n

Simone Scardapane

@s_scardapane

11 May

*Weisfeiler and Lehman Go Topological*

Fantastic #ICLR2021 paper by @CristianBodnar @ffabffrasca @wangyg85 @kneppkatt Montúfar @pl219_Cambridge @mmbronstein

Graph networks are limited to pairwise interactions. How to include higher-order components?

Read more below 👇 /n

The paper considers simplicial complexes, nice mathematical objects where having a certain component (e.g., a 3-way interaction in the graph) means also having all the lower level interactions (e.g., all pairwise interactions between the 3 objects). /n
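The closure property above can be sketched in a few lines. The helper names `downward_closure` and `is_simplicial_complex` are hypothetical, and simplices are represented as plain tuples of vertex ids for illustration.

```python
from itertools import combinations

def downward_closure(simplices):
    """Return the given simplices together with all their non-empty faces."""
    closed = set()
    for simplex in simplices:
        for k in range(1, len(simplex) + 1):
            closed.update(frozenset(face) for face in combinations(simplex, k))
    return closed

def is_simplicial_complex(simplices):
    """Check that every non-empty face of every simplex is also present."""
    cells = {frozenset(s) for s in simplices}
    for cell in cells:
        for k in range(1, len(cell)):
            for face in combinations(cell, k):
                if frozenset(face) not in cells:
                    return False
    return True

# A single 3-way interaction forces 3 vertices and 3 pairwise edges as well.
triangle = downward_closure([(0, 1, 2)])

# Missing the edges (0, 2) and (1, 2), so this is not a valid complex.
incomplete = [(0, 1, 2), (0,), (1,), (2,), (0, 1)]
```

Here `triangle` contains 7 cells (3 vertices, 3 edges, 1 filled triangle), while `incomplete` fails the check.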

Simplicial complexes have many notions of "adjacency" (four in total), considering lower and upper interactions.

They first propose an extension of the Weisfeiler-Lehman test that includes all four of them, showing it is slightly more powerful than standard WL. /n
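For reference, a minimal sketch of the standard WL test (colour refinement) on plain graphs, not the simplicial extension from the paper. The helpers `wl_colors` and `cycle` are hypothetical names. It uses the classic pair of two triangles versus a hexagon, which standard WL cannot tell apart even though only one of them contains triangles.

```python
from collections import Counter

def wl_colors(adjacency, rounds=3):
    """Run colour refinement and return the histogram of final colours."""
    colors = {node: 0 for node in adjacency}
    for _ in range(rounds):
        # New colour = own colour plus the sorted multiset of neighbour colours.
        colors = {
            node: (colors[node], tuple(sorted(colors[nb] for nb in adjacency[node])))
            for node in adjacency
        }
    return Counter(colors.values())

def cycle(nodes):
    """Adjacency lists of a cycle graph over the given nodes."""
    n = len(nodes)
    return {nodes[i]: [nodes[i - 1], nodes[(i + 1) % n]] for i in range(n)}

two_triangles = {**cycle([0, 1, 2]), **cycle([3, 4, 5])}
hexagon = cycle([0, 1, 2, 3, 4, 5])
path = {0: [1], 1: [0, 2], 2: [1]}

same = wl_colors(two_triangles) == wl_colors(hexagon)
different = wl_colors(path) != wl_colors(cycle([0, 1, 2]))
```

Every node in both graphs has degree 2, so the colour histograms never diverge and `same` is true. Treating the triangles as higher-order cells is the kind of extra information a simplicial variant can exploit.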

Simone Scardapane

@s_scardapane

8 May

*MLP-Mixer: An all-MLP Architecture for Vision*

It's all over Twitter!

A new, cool architecture that mixes several ideas from MLPs, CNNs, and ViTs, trying to keep it as simple as possible.

Small thread below. 👇 /n

The idea is strikingly simple:

(i) transform an image into a sequence of patches;
(ii) apply in alternating fashion an MLP on each patch, and on each feature wrt all patches.

Mathematically, it is equivalent to applying an MLP on rows and columns of the matrix of patches. /n
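A minimal NumPy sketch of the two steps above. It is a simplification under stated assumptions: it keeps skip connections but omits layer normalization, uses random untrained weights, and all sizes and names (`patchify`, `mixer_block`, `embed`) are illustrative rather than taken from the paper's code.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w1, w2):
    """Two-layer MLP applied along the last axis of x."""
    return gelu(x @ w1) @ w2

def patchify(image, p):
    """Split an (H, W, C) image into flattened, non-overlapping p x p patches."""
    h, w, c = image.shape
    patches = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * c)

def mixer_block(x, token_w1, token_w2, chan_w1, chan_w2):
    """x has shape (num_patches, num_channels)."""
    # Token mixing: transpose so the MLP acts across patches, one feature at a time.
    x = x + mlp(x.T, token_w1, token_w2).T
    # Channel mixing: the MLP acts across features, one patch at a time.
    return x + mlp(x, chan_w1, chan_w2)

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
patches = patchify(image, 8)                       # (16, 192)
num_patches, num_channels, hidden = patches.shape[0], 8, 32

embed = rng.standard_normal((patches.shape[1], num_channels)) * 0.1
x = patches @ embed                                # the "matrix of patches"

token_w1 = rng.standard_normal((num_patches, hidden)) * 0.1
token_w2 = rng.standard_normal((hidden, num_patches)) * 0.1
chan_w1 = rng.standard_normal((num_channels, hidden)) * 0.1
chan_w2 = rng.standard_normal((hidden, num_channels)) * 0.1

out = mixer_block(x, token_w1, token_w2, chan_w1, chan_w2)
```

The token-mixing MLP works on the columns of `x` and the channel-mixing MLP on its rows, so the output keeps the same `(num_patches, num_channels)` shape and blocks can be stacked.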

There has been some discussion (and memes!) sparked by this tweet from @ylecun, because several components can be interpreted (or implemented) using convolutional layers (e.g., 1x1 convolutions).

So, not a CNN, but definitely not a "simple MLP" either. /n

https://twitter.com/ylecun/status/1390543133474234368
