This week we've learnt how to perform inference with a latent variable 👻 energy 🔋 based model. 🤓
These models are very convenient when we cannot use a standard feed-forward net that maps one vector to another: they let us learn one-to-many and many-to-one relationships.
Take the example of the horn 📯 (this time I drew it correctly, i.e. the points do not lie on a grid). Given an x there are multiple correct y's; actually, there's a whole ellipse (∞ many points) associated with it!
Or, forget the x, even considering y alone…
there are (often) two values of y₂ for a given y₁! Use MSE and you'll get a point in the middle… which is WRONG.
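Here's a minimal numerical sketch of that failure mode (my own toy ellipse, not the exact horn from the lecture): collect the y₂'s sharing a given y₁ and note that their mean, the MSE-optimal prediction, falls between the two branches, off the data.

```python
import torch

# Toy version of the “two y₂ per y₁” problem: points on an ellipse,
# with the generating angle sampled at random (so not on a grid).
torch.manual_seed(0)
theta = 2 * torch.pi * torch.rand(1_000)                          # the hidden angle
y = torch.stack((2 * torch.cos(theta), torch.sin(theta)), dim=1)  # ellipse points

# Samples whose y₁ ≈ 0 come in two flavours: y₂ ≈ +1 and y₂ ≈ −1.
y2 = y[y[:, 0].abs() < 0.1, 1]
print(y2.min().item(), y2.max().item())   # ≈ −1 and ≈ +1
print(y2.mean().item())                   # somewhere in the middle → off the ellipse
```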

What's a “latent variable” you may ask now.
Well, it's a ghost 👻 variable. It was indeed used to generate the data (θ), but we don't have access to it (z).
So, it went missing.
How to recover it?
Well, we can simply find the one that minimises our energy.
Then, what's this “energy”?
Okay, okay, I'm getting there. It represents the level of compatibility between x, y, and z: x being your input, y the target, and z the latent.
So, given that we have access to this energy E, we can find the value of z (blue ❌ below) that minimises the degree to which we annoy the model. The value of E at that location is called the “free energy”, or F∞ (also known as the zero-temperature free energy).
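In symbols: F∞(x, y) = min_z E(x, y, z), the energy evaluated at the best latent. Here's a minimal sketch of that inference step (unconditional for brevity, so no x, and with a made-up decoder g(z) = (2 cos z, sin z) for the ellipse; the lecture's actual notebook may do this differently):

```python
import torch

# Hypothetical squared-distance energy: E(y, z) = ‖y − g(z)‖²,
# where g(z) = (2 cos z, sin z) traces the ellipse.
def energy(y, z):
    g = torch.stack((2 * torch.cos(z), torch.sin(z)), dim=-1)
    return ((y - g) ** 2).sum(-1)

y = torch.tensor([0.3, 0.8])                 # a query point
z = torch.linspace(0, 2 * torch.pi, 10_000)  # candidate latents (grid search)
E = energy(y, z)
z_check = z[E.argmin()]                      # the blue ❌: argmin_z E(y, z)
F_inf = E.min()                              # free energy F∞(y) = min_z E(y, z)
print(z_check.item(), F_inf.item())
```

With a differentiable E you'd typically find z by gradient descent rather than a grid, but the idea is the same.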
This lesson's first part has been recorded and will come online, together with the slides, in a week or so. You'll find it on the class website, together with a transcript put together by this semester's students.
Next week we'll cover the second part, where we'll learn about latent marginalisation and training for the unconditional and conditional cases, and we'll have a look at the notebook I've put together to craft this lecture.
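(For reference, the standard definition we'll build on: latent marginalisation replaces the hard min with a soft one, F_β(x, y) = −1/β log ∫ exp(−β E(x, y, z)) dz, and letting β → ∞, i.e. zero temperature, brings back F∞(x, y) = min_z E(x, y, z).)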

Thanks for reading. 👀
Keep learning! 🎓

🤓😋❤️

• • •

More from @alfcnz

29 Apr
🥳 NEW LECTURE 🥳
Graph Convolutional Networks… from attention!
In attention 𝒂 is computed with a [soft]argmax over scores. In GCNs 𝒂 is simply given, and it's called "adjacency vector".
Slides: github.com/Atcold/pytorch…
Notebook: github.com/Atcold/pytorch…
Summary of today's class.

Slide 1: *shows title*.
Slide 2: *recalls self-attention*.
Slide 3: *shows 𝒂, points out it's given*.
The end.

Literally!
I've spent the last week reading everything about these GCNs and… LOL, they quickly found a spot in my mind, next to attention!
The key concept here is the *sparsity* (constraints) of the graph.
In self-attention, every element in the set looks at each and every other element.
If a sparse graph is given, we limit each element (node / vertex) to look only at a few other elements (nodes / vertices).
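Here's a minimal sketch of that contrast, with made-up sizes and a hand-written adjacency vector (not the lecture's notebook code):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d = 5, 8                       # number of set elements (nodes) and feature dim
X = torch.randn(n, d)             # the set
q = torch.randn(d)                # a query coming from one element of the set

# Self-attention: 𝒂 is computed with a softargmax over the scores (dense).
a_attn = F.softmax(X @ q / d**0.5, dim=0)

# GCN: 𝒂 is simply given by the graph, a sparse adjacency vector.
a_adj = torch.tensor([1., 0., 1., 0., 0.])   # this node only sees nodes 0 and 2
a_adj = a_adj / a_adj.sum()                  # (optionally normalised)

# Either way, the output is a linear combination of the set weighted by 𝒂.
h_attn = X.T @ a_attn
h_gcn = X.T @ a_adj
```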
22 Apr
🥳 NEW LECTURE 🥳
“Set to set” and “set to vector” mappings using self/cross hard/soft attention. We combined one (two) attention module(s) with one (two) k=1 1D convolution(s) to get a transformer encoder (decoder).
Slides: github.com/Atcold/pytorch…
Notebook: github.com/Atcold/pytorch…
This week's slides were quite dense, but we've been building up momentum since the beginning of class, 3 months ago.
We recalled concepts from:
• Linear Algebra (Ax as a lin. comb. of A's columns weighted by x's components, or as scalar products of A's rows against x)
• Recurrent Nets (stacking x[t] with h[t–1] and concatenating W_x and W_h)
• Autoencoders (encoder-decoder architecture)
• k=1 1D convolutions (which do not assume correlation between neighbouring features and act as a dim. adapter; see the sketch below)
and put it all in practice with @PyTorch.
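On that last bullet, a minimal sketch (toy sizes of my choosing) showing that a k=1 1D convolution is just a per-element linear map, hence a dimensionality adapter that treats neighbouring positions independently:

```python
import torch
import torch.nn as nn

n, d_in, d_out = 10, 16, 32
x = torch.randn(1, d_in, n)              # a set of n elements, each with d_in features

conv1x1 = nn.Conv1d(d_in, d_out, kernel_size=1)
lin = nn.Linear(d_in, d_out)
lin.weight.data = conv1x1.weight.data.squeeze(-1)   # copy the same weights…
lin.bias.data = conv1x1.bias.data

y_conv = conv1x1(x)                                  # (1, d_out, n)
y_lin = lin(x.transpose(1, 2)).transpose(1, 2)       # …and get the same result, element-wise
print(torch.allclose(y_conv, y_lin, atol=1e-5))      # True
```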
15 May 18
Impressive, especially the TikZ diagrams 🙇‍♂️
Would you mind using $\mathbb{R}$ for the blackboard R?
Thanks for using $\bm{}$ for vectors and $^\top$ for the transposition.
What about $\mathbb{I}$ for the identity matrix?
Use $\varnothing$ instead of $\emptyset$?
Also, isn't the gradient (column vector) the transpose of the Jacobian (row vector)?
What about having the differential operator d upright, and not italic like a variable?
Figure 5.6 uses bold upright font, instead of the italic one.
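Putting those suggestions in one place, a sketch of the conventions I'm advocating (assuming amssymb and bm are loaded in the preamble):

```latex
% Suggested notation (needs \usepackage{amssymb,bm})
$\bm{x} \in \mathbb{R}^n$        % bold vectors, blackboard-bold R
$\bm{A}^\top$                    % transposition with \top
$\mathbb{I}$                     % identity matrix
$\varnothing$                    % instead of \emptyset
$\nabla f(\bm{x}) = \bigl(\mathrm{J}f(\bm{x})\bigr)^\top$   % gradient = transposed Jacobian
$\int f(x)\,\mathrm{d}x$         % upright differential d
```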
