🥳 NEW LECTURE 🥳
“Set to set” and “set to vector” mappings using self/cross hard/soft attention. We combined one (two) attention module(s) with one k=1 1D convolution to get a transformer encoder (decoder).
Slides: github.com/Atcold/pytorch…
Notebook: github.com/Atcold/pytorch…
This week's slides were quite dense, but we've been building up momentum since the beginning of class, 3 months ago.
We recalled concepts from:
• Linear Algebra (Ax as a lin. comb. of A's columns weighted by x's components, or as scalar products of A's rows against x)
• Recurrent Nets (stacking x[t] with h[t–1] and concatenating W_x and W_h)
• Autoencoders (encoder-decoder architecture)
• k=1 1D convolutions (which do not assume correlation between neighbouring features and act as dim. adapters)
and put them in practice with @PyTorch.
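For instance, a k=1 1D convolution is just a per-position linear map. A tiny NumPy sketch (the class notebook uses PyTorch, but the idea is identical; all names and shapes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, t = 8, 16, 5                # feature dims and set length (illustrative)
W = rng.standard_normal((d_out, d_in))   # a width-1 kernel is just a matrix
b = rng.standard_normal((d_out, 1))
x = rng.standard_normal((d_in, t))       # one set of t symbols, d_in features each

# Sliding a kernel of width 1 over the t positions applies the SAME linear
# map at every position: no correlation between neighbours is assumed, and
# the feature dimension is adapted from d_in to d_out.
y = W @ x + b                            # shape: (d_out, t)
print(y.shape)                           # (16, 5)
```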
Notice how you can smoothly go from hard to soft attention by switching between argmax and softargmax (which most of you still call “softmax”). Hard attention implies a one-hot vector a, while soft attention gives you pseudo-probabilities.
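In code, the switch is a single line. A minimal NumPy sketch (the scores are made up):

```python
import numpy as np

def softargmax(s, beta=1.0):
    """Soft attention weights; as beta grows this approaches hard argmax."""
    e = np.exp(beta * (s - s.max()))     # shift by max for numerical stability
    return e / e.sum()

scores = np.array([1.0, 3.0, 2.0])       # made-up matching scores

a_soft = softargmax(scores)              # pseudo-probabilities, sum to 1
a_hard = np.eye(len(scores))[scores.argmax()]  # one-hot vector

print(a_soft.round(3))                   # approx [0.09, 0.665, 0.245]
print(a_hard)                            # [0. 1. 0.]
```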
Once again, this architecture deals with *sets* of symbols!
There is no order. Therefore, computations can be massively parallelised (they are just a bunch of matrix products, after all).
Just be aware of that t × t matrix A, which could blow up if t (your set length) is large.
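To see where that quadratic cost comes from, here is a hedged NumPy sketch of self-attention weights (the dimensions are arbitrary):

```python
import numpy as np

def softargmax(S):
    """Row-wise soft attention weights."""
    e = np.exp(S - S.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
t, d = 4, 8                            # set length and feature dim (illustrative)
Q = rng.standard_normal((t, d))        # one query per set element
K = rng.standard_normal((t, d))        # one key per set element

A = softargmax(Q @ K.T / np.sqrt(d))   # the t x t matrix of attention weights
print(A.shape)                         # (4, 4): memory grows as t**2
```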
Just a final recap: there is *one* and *only one* _query_ (I'd like to cook a lasagna) that I'm going to check against *all* _keys_ (recipe titles) in order to retrieve *one* (if hard) or *a mixed* (if soft) _value_ (recipe to prepare my dinner with).
Me, hungry, during class = decoder.
My granny, knowing all recipe names (keys) and preparations (values) = encoder.
Me, figuring out what I want = self-attention.
Me, asking granny = cross-attention.
Dinner = yay!
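The whole granny analogy fits in a few lines. A NumPy sketch with made-up shapes (keys = recipe titles, values = preparations):

```python
import numpy as np

def softargmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, t = 8, 5                          # key/query dim, number of recipes granny knows
q = rng.standard_normal(d)           # my one query: "lasagna, please"
K = rng.standard_normal((t, d))      # recipe titles (keys)
V = rng.standard_normal((t, 3))      # preparations (values), 3 features each

a = softargmax(K @ q)                # how well my craving matches each title
dinner_soft = a @ V                  # soft: a mixture of all recipes
dinner_hard = V[(K @ q).argmax()]    # hard: exactly one recipe
print(dinner_soft.shape, dinner_hard.shape)  # (3,) (3,)
```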
I'm done.

Next week: Graph Neural Nets (if it's taking me less than a week to learn about them).
[…] with *one* k=1 1D convolution […]

@Twitter, where's the edit button???