Alfredo Canziani

@alfcnz

Jun 28, 2021 • 8 tweets • 6 min read

Learn all about self-supervised learning for vision with @imisra_!

In this lecture, Ishan covers pretext-invariant representation learning (PIRL), swapping assignments between views (SwAV), audio-visual instance discrimination with cross-modal agreement (AVID + CMA), and Barlow Twins' redundancy reduction.

Here you can find @MLStreetTalk's interview, where these topics are discussed in a conversational format.

https://twitter.com/MLStreetTalk/status/1406884357185363974

Here, instead, you can read an accessible blog post about these topics, authored by @imisra_ and @ylecun.

https://twitter.com/ylecun/status/1367516830542270467

We can organise the different classes of joint-embedding methods into 4 main categories.
• Contrastive (explicit use of negative samples)
• Clustering
• Distillation
• Redundancy reduction

«Contrastive»

Related embeddings (same colour) should be closer than unrelated embeddings (different colour).
Good negative samples are *very* important. E.g.
• SimCLR has a *very large* batch size;
• Wu2018 uses an offline memory bank;
• MoCo uses an “online memory bank”.
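
To make the role of negatives concrete, here is a minimal NumPy sketch of an InfoNCE-style contrastive loss. It is an illustration under assumptions, not the exact objective of SimCLR, Wu2018, or MoCo: the function name `info_nce`, the temperature of 0.1, and the toy data are all hypothetical, and only one direction (view 1 against view 2) is scored. Row i of `z1` and row i of `z2` are treated as a related pair, while every other row in the batch acts as a negative.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE-style loss: row i of z1 should match row i of z2, not any other row."""
    # L2-normalise so that dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) similarity matrix
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; off-diagonal entries are the negatives.
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))                      # toy embeddings (assumption)
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
mismatched = info_nce(z, np.roll(z, 1, axis=0))   # every "positive" is wrong
```

With more rows in the batch (or in a memory bank), each positive is compared against more negatives, which is one way to read the emphasis on large batches and memory banks above.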

«Clustering»

Contrastive learning ⇒ grouping in feature space.
We may simply want to assign an embedding to a given cluster. Examples are:
• SwAV performs online clustering using optimal transport;
• DeepCluster;
• SeLA.
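
The optimal-transport step can be sketched with a few Sinkhorn-Knopp iterations. This is a simplified illustration rather than SwAV's actual code: the function name `sinkhorn`, the values of `eps` and `n_iters`, and the random scores are assumptions. It turns a matrix of sample-to-cluster scores into soft assignments in which every cluster receives roughly the same total mass, so the embeddings cannot all collapse onto one cluster.

```python
import numpy as np

def sinkhorn(scores, eps=0.5, n_iters=100):
    """Turn (N, K) sample-to-cluster scores into balanced soft assignments."""
    # Positive kernel; subtracting the max keeps the exponentials bounded.
    q = np.exp((scores - scores.max()) / eps)
    q /= q.sum()
    n, k = q.shape
    for _ in range(n_iters):
        q /= q.sum(axis=0, keepdims=True) * k     # each cluster gets mass 1/K
        q /= q.sum(axis=1, keepdims=True) * n     # each sample gets mass 1/N
    return q * n                                  # rows now sum to 1

rng = np.random.default_rng(0)
scores = rng.uniform(-1.0, 1.0, size=(12, 3))     # toy similarities (assumption)
q = sinkhorn(scores)
row_sums = q.sum(axis=1)                          # one unit of mass per sample
col_sums = q.sum(axis=0)                          # about N / K per cluster
```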

«Distillation»

Similarity maximisation through a student-teacher distillation process. The trivial solution is avoided by using asymmetries in the learning rule and in the network's architecture.
• BYOL's student has a predictor on top; the teacher is a slow-moving average of the student;
• SimSiam shares weights between the two branches.
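
The “slow student” idea can be sketched as an exponential moving average of the student's parameters. This is a toy illustration, assuming parameters stored in plain dictionaries of NumPy arrays; the function name `ema_update` and the momentum of 0.99 are hypothetical choices, not values taken from the thread.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.99):
    """Move each teacher parameter a small step towards the student's value."""
    return {name: momentum * teacher[name] + (1.0 - momentum) * student[name]
            for name in teacher}

# Toy parameters (assumption): the student is held fixed at 1, the teacher starts at 0.
student = {"w": np.ones((2, 2)), "b": np.ones(2)}
teacher = {"w": np.zeros((2, 2)), "b": np.zeros(2)}
for _ in range(100):
    teacher = ema_update(teacher, student)
# The teacher lags behind: after k steps it sits at 1 - momentum**k.
```

No gradient flows into the teacher here; it only ever changes through this averaging, which is one of the asymmetries between the two branches.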

«Redundancy reduction»

Each neuron's representation should be invariant under input data augmentation and independent of the other neurons. Everything's done *without* looking at negative examples!
E.g. Barlow Twins makes the cross-correlation matrix between the embeddings of two augmented views close to the identity matrix.
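
A minimal NumPy sketch of a Barlow Twins style objective follows. It is written under assumptions: the function name `barlow_twins_loss`, the weight `lam`, and the toy data are illustrative and not taken from the thread. It builds the cross-correlation matrix between the standardised embeddings of two views and pushes it towards the identity: the diagonal term rewards invariance, the off-diagonal term penalises redundancy between neurons.

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Push the cross-correlation matrix of two views towards the identity."""
    n = z1.shape[0]
    # Standardise every feature along the batch dimension.
    z1 = (z1 - z1.mean(axis=0)) / z1.std(axis=0)
    z2 = (z2 - z2.mean(axis=0)) / z2.std(axis=0)
    c = z1.T @ z2 / n                                     # (D, D) cross-correlation
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)             # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)   # redundancy term
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))                             # toy embeddings (assumption)
same = barlow_twins_loss(z, z)                            # perfectly invariant views
unrelated = barlow_twins_loss(z, rng.normal(size=z.shape))
```

Note that no negative pairs appear anywhere: the only statistics used are correlations between neurons across the batch.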


