The energy πŸ”‹ saga complete index ☝🏻
πŸ’œπŸ’šπŸ’œ

Episode I



More from @alfcnz

19 May
The fifth episode (of five) of the energy πŸ”‹ saga is out! 🀩

In this last episode of the energy saga we code up an AE, DAE, and VAE in @PyTorch. Then we learn about GANs, where a cost net C is trained contrastively with samples generated by another net.
A GAN is simply a contrastive technique where a cost net C is trained to assign low energy to samples y (blue, cold πŸ₯Ά, low energy) from the data set and high energy to contrastive samples Ε· (red, hot πŸ₯΅, where the β€œhat” points upward indicating high energy).
y comes from the data set Y.
Ε· is produced by the generating network G, which maps a random vector z to the input space: Ε· = G(z).
To train G we simply minimise C(G(z)).

And that's it.
No fooling around with discriminators. πŸ₯Έ
It's *simply* contrastive energy learning. πŸ˜‡
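
Below is a minimal PyTorch sketch of that idea (my own toy code, not the episode's notebook): the two-layer nets, the hinge margin, and the Gaussian stand-in for the data set Y are all illustrative assumptions.

```python
import torch
import torch.nn as nn

dim_y, dim_z = 2, 8  # illustrative sizes

# Cost (energy) net C: sample ↦ scalar energy; generating net G: z ↦ Ε·
C = nn.Sequential(nn.Linear(dim_y, 64), nn.ReLU(), nn.Linear(64, 1))
G = nn.Sequential(nn.Linear(dim_z, 64), nn.ReLU(), nn.Linear(64, dim_y))
opt_C = torch.optim.Adam(C.parameters(), lr=1e-3)
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
margin = 1.0  # margin for the hinge (one possible contrastive loss)

data = torch.randn(256, dim_y)  # stand-in for the data set Y
for y in torch.utils.data.DataLoader(data, batch_size=64):
    z = torch.randn(y.size(0), dim_z)
    y_hat = G(z)  # Ε· = G(z), contrastive samples

    # Train C contrastively: push energy of y down πŸ₯Ά, energy of Ε· up πŸ₯΅
    loss_C = C(y).mean() + torch.relu(margin - C(y_hat.detach())).mean()
    opt_C.zero_grad(); loss_C.backward(); opt_C.step()

    # Train G by simply minimising C(G(z))
    loss_G = C(G(z)).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```

The hinge is just one way to keep C from pushing the energy of Ε· to infinity; any other contrastive loss would do.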
11 May
The fourth episode (of five) of the energy πŸ”‹ saga is out! 🀩

From LV EBM to target prop(agation) to vanilla autoencoder, and then denoising, contractive, and variational autoencoders. Finally, we learn about the VAE's bubble-of-bubbles interpretation.
Edit: updating a thumbnail and adding one more.

In this episode I *really* changed the content wrt last year. Being exposed to EBMs for several semesters now made me realise how all these architectures (and more to come) are connected to each other.
In the companion lecture (which will soon come online), @ylecun goes over a more powerful interpretation of the VAE, which I still struggle to understand. As you can imagine, another tweak to my deck will occur when I actually get it. (Yeah, I'm slow, yet persistent.)
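
For reference, here is a minimal VAE sketch (illustrative toy code, not the episode's notebook; layer sizes and the plain Gaussian prior are assumptions). The encoder outputs a mean and a variance for each sample, i.e. one bubble per y, and the KL term keeps all the bubbles inside the big prior bubble.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, dim_y=784, dim_h=256, dim_z=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim_y, dim_h), nn.ReLU())
        self.mu, self.logvar = nn.Linear(dim_h, dim_z), nn.Linear(dim_h, dim_z)
        self.dec = nn.Sequential(nn.Linear(dim_z, dim_h), nn.ReLU(), nn.Linear(dim_h, dim_y))

    def forward(self, y):
        h = self.enc(y)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # sample inside the bubble
        return self.dec(z), mu, logvar

def vae_loss(y, y_tilde, mu, logvar):
    rec = F.mse_loss(y_tilde, y, reduction='sum')                 # reconstruction term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # keeps the bubbles near the origin
    return rec + kl

vae = VAE()
y = torch.rand(32, 784)  # toy batch
y_tilde, mu, logvar = vae(y)
loss = vae_loss(y, y_tilde, mu, logvar)
```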
8 Apr
β€” Context β€”

When speaking about the transformer architecture, one may incorrectly call it an encoder-decoder architecture. But this is *clearly* not the case.
The transformer architecture is an example of an encoder-predictor-decoder architecture, or a conditional language model.
The classical example of an encoder-decoder architecture is the autoencoder (AE). The (blue / cold / low-energy) target y is auto-encoded. (The AE slides are coming out later today.)
Now, the main difference between an AE and a language-model (LM) is that the input is delayed by one unit. This means that a predictor is necessary to estimate the hidden representation of a *future* symbol.
It's similar to a denoising AE, except that the corruption is temporal.
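
A tiny sketch of that one-unit delay (toy token values and names are assumptions), plus the causal mask that enforces it inside a transformer:

```python
import torch

tokens = torch.tensor([5, 2, 7, 7, 1, 9])  # a toy symbol sequence

x = tokens[:-1]  # what the encoder/predictor sees:  [5, 2, 7, 7, 1]
y = tokens[1:]   # what the decoder must produce:    [2, 7, 7, 1, 9]

# In a transformer, a causal mask plays the same role across the whole sequence:
T = x.size(0)
causal_mask = torch.triu(torch.ones(T, T), diagonal=1).bool()
# position t may only attend to positions ≀ t, so predicting token t+1 never
# peeks at the future β€” the "temporal corruption" of a denoising AE.
```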
23 Nov 20
#AcademicChatter

Coming from engineering, I'm a former @MATLAB user, moved to @TorchML and @LuaLang, then to @PyTorch and @ThePSF @RealPython, and now I'm exploring @WolframResearch @stephen_wolfram.
For learning, one would prefer knowledge-packed frameworks and documentation.
In that regard, @MATLAB and @WolframResearch are ridiculously compelling. The user manuals are just amazing, with everything organised and at your disposal. Moreover, the language syntax is logical, much closer to math, and aligned with your mental flow.
In Mathematica I can write y = 2x (implicit multiplication), then x = 6, and y will now equal 12. y is a variable.
Or I can create a function of x with y[x_] := 2x (the := means the right-hand side is not evaluated right away, and x_ is a pattern standing for the argument). Later, I can evaluate y[x] and get 12, as above.
31 Oct 20
This week we went through the second part of my lecture on latent variable πŸ‘» energy πŸ”‹ based models. πŸ€“

We've warmed up the temperature 🌑 a little, moving from the freezing πŸ₯Ά zero-temperature free energy F∞(y) (the one you see spinning below) to a warmer πŸ₯° Fᡦ(y).
Be careful with that thermostat! If it gets too hot πŸ₯΅ you'll end up killing ☠️ your latents πŸ‘», averaging them all out indiscriminately, and you'll be left with plain boring MSE (fig 1.3)! πŸ€’
From fig 2.1–3, you can see how more z's contribute to Fᡦ(y).
This is nice, 'cos during training (fig 3.3, bottom) *The Force* will be strong with a wider region of your manifold, and no longer with the single Jedi. This in turn will lead to a more even pull and will avoid overfitting (fig 3.3, top). Still, we're fine here because z ∈ ℝ.
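
A small numerical sketch of that thermostat (the energies over a handful of discretised latents are made up; the formula is the finite-temperature free energy F_Ξ²(y) = βˆ’Ξ²β»ΒΉ log ⟨exp(βˆ’Ξ² E(y, z))⟩_z):

```python
import math
import torch

def free_energy(E_yz, beta):
    # F_beta(y) = -1/beta * log( mean_z exp(-beta * E(y, z)) ), z discretised on a grid
    return -(torch.logsumexp(-beta * E_yz, dim=0) - math.log(E_yz.numel())) / beta

E_yz = torch.tensor([0.3, 1.2, 0.05, 2.0])  # made-up energies E(y, z) over four latents

print(free_energy(E_yz, beta=100.0))  # β‰ˆ 0.06: close to min_z E(y, z) = 0.05, the freezing πŸ₯Ά F∞(y)
print(free_energy(E_yz, beta=0.01))   # β‰ˆ 0.89: the plain average of all the energies, too hot πŸ₯΅
```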
21 Oct 20
This week we've learnt how to perform inference with a latent variable πŸ‘» energy πŸ”‹ based model. πŸ€“
These models are very convenient when we cannot use a standard feed-forward net that maps vector to vector; they allow us to learn one-to-many and many-to-one relationships.
Take the example of the horn πŸ“― (this time I drew it correctly, i.e. points do not lie on a grid 𐄳). Given an x there are multiple correct y's; actually, there is a whole ellipse (infinitely many points) associated with it!
Or, forget the x, even considering y alone…
there are (often) two values of yβ‚‚ for a given y₁! Use MSE and you'll get a point in the middle… which is WRONG.

What's a β€œlatent variable”, you may ask now.
Well, it's a ghost πŸ‘» variable: it was indeed used to generate the data (ΞΈ) but we don't have access to it (z).
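
Here is a toy numerical sketch of the ellipse example (the axes and test points are made up): the latent z is the angle, the zero-temperature energy of a candidate y is its squared distance to the closest point g(z) on the ellipse, and the MSE-style β€œpoint in the middle” indeed gets high energy.

```python
import torch

a, b = 2.0, 1.0                                  # made-up ellipse axes
z_grid = torch.linspace(0, 2 * torch.pi, 1000)   # discretised latent (the angle)
ellipse = torch.stack((a * torch.cos(z_grid), b * torch.sin(z_grid)), dim=1)  # g(z)

def energy(y):
    # zero-temperature free energy: minimise over the latent z
    return ((ellipse - y) ** 2).sum(dim=1).min()

y_on_curve = torch.tensor([0.0, 1.0])   # a point on the ellipse (z = Ο€/2)
y_in_middle = torch.tensor([0.0, 0.0])  # the average of the two branches

print(energy(y_on_curve))   # β‰ˆ 0 β†’ low energy, a valid answer
print(energy(y_in_middle))  # β‰ˆ 1 β†’ high energy: the MSE answer is WRONG
```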
