Discover and read the best of Twitter Threads about #ICLR2021

*Score-based diffusion models*

An emerging approach in generative modelling that is gathering more and more attention.

If you are interested, I collected some introductory material and thoughts in a small thread. 👇

Feel free to weigh in with additional material!

/n
An amazing property of diffusion models is simplicity.

You define a probabilistic chain that gradually "noises" the input image until only white noise remains.

Then, generation is done by learning to reverse this chain. In many cases, the two directions have similar form.

/n
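For concreteness, here is a minimal numpy sketch of the forward (noising) direction, assuming a simple linear variance schedule; the generative model is trained to reverse this chain.

```python
import numpy as np

def forward_diffuse(x0, betas, seed=0):
    """Run the forward (noising) chain: one Gaussian corruption per step.

    x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps,  eps ~ N(0, I).
    After enough steps, x_T is indistinguishable from white noise.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for beta in betas:
        eps = rng.standard_normal(x.shape)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * eps
    return x

# Toy usage: a constant 28x28 "image" diffused over 1000 steps.
betas = np.linspace(1e-4, 0.02, 1000)   # assumed linear schedule
xT = forward_diffuse(np.ones((28, 28)), betas)
```
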
The starting point for diffusion models is probably "Deep Unsupervised Learning using Nonequilibrium Thermodynamics" by @jaschasd Weiss @niru_m @SuryaGanguli

Classic paper, definitely worth reading: arxiv.org/abs/1503.03585

/n
*Weisfeiler and Lehman Go Topological*

Fantastic #ICLR2021 paper by @CristianBodnar @ffabffrasca @wangyg85 @kneppkatt Montúfar @pl219_Cambridge @mmbronstein

Graph networks are limited to pairwise interactions. How to include higher-order components?

Read more below 👇 /n
The paper considers simplicial complexes, nice mathematical objects where having a certain component (e.g., a 3-way interaction in the graph) means also having all the lower level interactions (e.g., all pairwise interactions between the 3 objects). /n
Simplicial complexes have several notions of "adjacency" (four in total), covering lower- and upper-level interactions.

They first propose an extension of the Weisfeiler-Lehman test that includes all four of them, showing it is slightly more powerful than standard WL. /n
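For reference, here is a minimal sketch of the standard 1-WL colour-refinement test that the paper generalizes (their simplicial version aggregates over all four adjacencies instead of just graph neighbours):

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """1-dimensional Weisfeiler-Lehman colour refinement.

    adj: dict mapping node -> list of neighbours.
    Each round, a node's new colour is its old colour plus the multiset
    of its neighbours' colours, compressed back to small integers.
    """
    colors = {v: 0 for v in adj}  # start with a uniform colouring
    for _ in range(rounds):
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: palette[sigs[v]] for v in adj}
    return Counter(colors.values())  # colour histogram

# Different histograms certify non-isomorphism (equal ones prove nothing):
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
assert wl_colors(triangle) != wl_colors(path)
```
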
📢SPICEs: Survey Papers as Interactive Cheat-sheet Embeddings at #ICLR2021 @rethinkmlpapers workshop

Portal: bit.ly/3vLSdJx
Github: bit.ly/3utDHFT
Video:

Credits: @omarsar0 for the inspiration.
Co-authors: @joewhaley @MatthewMcAteer0
(2/n) Examples: Want to know more about what machine learners mean when they say "X is all you need"?
We surveyed all 80-odd of them here.
Dataset: github.com/vinayprabhu/X-…
(3/3) X-former architecture survey you ask?
PDF: github.com/vinayprabhu/SP…
The #ICLR2021 Workshop on Enormous Language Models (WELM) is tomorrow, May 7th!

Full info: welmworkshop.github.io
Livestream: welmworkshop.github.io/livestream/
gathertown info for ICLR registrants: iclr.cc/virtual/2021/w…

Thread summarizing the talks & panels ⬇️ (1/14)
Our first talk will be by Thomas Margoni, who will provide some legal perspective on the use of web data for training large language models. He'll touch on topics like copyright law, rights, and licenses, as they pertain to training data for LMs. (2/14)
Then, @JesseDodge will give a talk on how to document datasets and improve reproducibility of research. He'll discuss the NLP reproducibility checklist, a recent study on documenting C4, and a framework for modeling bias in data. (3/14)
🚨 Our #ICLR2021 paper shows that KG-augmented models are surprisingly robust to KG perturbation! 🧐

arXiv: arxiv.org/abs/2010.12872
Code: github.com/INK-USC/deceiv…

To learn more, come find us at Poster Session 9 (May 5, 5-7PM PDT): iclr.cc/virtual/2021/p….

🧵[1/n]
KGs have helped neural models perform better on knowledge-intensive tasks and even “explain” their predictions, but are KG-augmented models really using KGs in a way that makes sense to humans?

[2/n]
We primarily investigate this question by measuring how the performance of KG-augmented models changes when the KG’s semantics and/or structure are perturbed, such that the KG becomes less human-comprehensible.

[3/n]
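To make the setup concrete, here is one toy perturbation in this spirit (an illustrative sketch, not the authors' exact procedure): shuffle relations among random triples, so the graph keeps its structure but loses its semantics.

```python
import random

def swap_relations(triples, frac=0.5, seed=0):
    """Perturb a KG's semantics while preserving its graph structure.

    triples: list of (head, relation, tail). For a random subset,
    relations are shuffled among the selected triples, so each node
    keeps its neighbours but the edge labels stop making sense.
    """
    rng = random.Random(seed)
    triples = list(triples)
    idx = rng.sample(range(len(triples)), int(frac * len(triples)))
    rels = [triples[i][1] for i in idx]
    rng.shuffle(rels)
    for i, r in zip(idx, rels):
        h, _, t = triples[i]
        triples[i] = (h, r, t)
    return triples

kg = [("aspirin", "treats", "headache"), ("aspirin", "is_a", "drug"),
      ("headache", "symptom_of", "flu"), ("flu", "is_a", "disease")]
perturbed = swap_relations(kg, frac=1.0)
```
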
Fantastic talk and Q&A by @timnitGebru at #ICLR2021

Among other things, I really appreciate how Timnit is unerasing the contributions of our retracted co-authors and how key their perspectives were to the Stochastic Parrots paper.
And so much else: @timnitGebru is absolutely brilliant at drawing connections between the research milieu, research content, geopolitics, and individual, situated lived experience.
On interdisciplinarity and the hierarchy of knowledge:

“If you have all the money, you don’t have to listen to anybody” —@timnitGebru
Come to our talks and posters at #ICLR2021 to discuss our findings on understanding and improving deep learning! Talks and posters are available now! Links to the talks, posters, papers, and code are in the thread:

1/7
When Do Curricula Work? (Oral at #ICLR2021)
with @XiaoxiaWShirley and @ethansdyer

Paper: openreview.net/forum?id=tW4QE…
Code: github.com/google-researc…
Video and Poster: iclr.cc/virtual/2021/p…

2/7
Sharpness-Aware Minimization for Efficiently Improving Generalization (Spotlight at #ICLR2021)
with @Foret_p, Ariel Kleiner, and @TheGradient

Paper: openreview.net/forum?id=6Tm1m…
Code: github.com/google-researc…
Video and Poster: iclr.cc/virtual/2021/p…

3/7
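For readers new to SAM, the core update is two gradient evaluations per step. A minimal numpy sketch on a toy loss (rho, the learning rate, and the quadratic loss are assumptions, not the paper's setup):

```python
import numpy as np

rho, lr = 0.05, 0.1              # assumed hyperparameters

def loss_grad(w):
    """Toy quadratic loss and its gradient (stand-in for a network)."""
    return 0.5 * np.sum(w ** 2), w

w = np.array([2.0, -3.0])
for _ in range(100):
    _, g = loss_grad(w)
    # Step 1: move to the (approximately) worst point in a rho-ball.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Step 2: descend using the gradient taken at that perturbed point,
    # which favours flat minima over sharp ones.
    _, g_adv = loss_grad(w + eps)
    w = w - lr * g_adv
```
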
Ecstatic to see "Machine learning research communication via illustrated and interactive web articles" published at @rethinkmlpapers workshop at #ICLR2021

In it, I describe my workflow for communicating ML to millions of readers.

Paper: openreview.net/pdf?id=WUrcJoy…

1/5
I discuss five key ML communication artifacts:
1- The hero image
2- The Twitter thread
3- The illustrated article
4- The interactive article
5- Interpretability software

Here are excellent examples of 1 and 2 from @ch402, @karpathy, and @maithra_raghu.

2/5
For illustrated/animated articles, I discuss the importance of empathy towards the reader, putting intuition first, and iteratively creating a visual language to describe concepts, and I reflect on pedagogical considerations.

3/5
Model-based planning is often thought to be necessary for deep reasoning & generalization. But the space of choices in model-based deep RL is huge. Which work well and which don't? In our new paper (accepted to #ICLR2021), we investigate! arxiv.org/abs/2011.04021 1/
Spoiler: our findings really challenged some deeply-held assumptions we had about what planning is useful for and how much planning is really needed in popular MBRL benchmarks---even some "strategic" ones like Sokoban. 2/
This is joint work with @theophaneweber, Abe Friesen, @FeryalMP, Arthur Guez, @fabiointheuk, @simswitherspoon, Thomas Anthony, Lars Buesing, and @PetarV_93 . 3/
#ICLR2021 cam-ready II: "LiftPool: Bidirectional ConvNet Pooling" w/ Jiaojiao Zhao is now available: isis-data.science.uva.nl/cgmsnoek/pub/z… No more lossy down- and upsampling when pooling! 1/n
LiftPool adopts the philosophy of the classical #Lifting #Scheme from #signal #processing. LiftDownPool decomposes a feature map into various downsized sub-bands, each of which contains information with different frequencies. Because of its invertible properties, ... 2/n
by performing LiftDownPool backwards, a corresponding up-pooling layer #LiftUpPool is able to generate a refined upsampled feature map using the detail sub-bands, which is useful for #image-to-image #translation challenges. 3/n
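For intuition, here is the classical 1-D lifting step (split, predict, update) that LiftPool builds on; a Haar-style sketch of the general scheme, not the paper's learned version:

```python
import numpy as np

def lift_down(x):
    """One lifting step: split -> predict -> update.

    Returns a coarse approximation and a detail sub-band, each half
    the input length; the transform is exactly invertible.
    """
    even, odd = x[0::2], x[1::2]
    detail = odd - even            # predict odd samples from even ones
    approx = even + detail / 2     # update: preserve the running average
    return approx, detail

def lift_up(approx, detail):
    """Invert lift_down exactly, recovering the original signal."""
    even = approx - detail / 2
    odd = even + detail
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

x = np.arange(8.0)
a, d = lift_down(x)
assert np.allclose(lift_up(a, d), x)   # lossless, unlike max pooling
```
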
Excited to share our #ICLR2021 paper w/ CS & math depts @Stanford 🎊

Evaluating the Disentanglement of Deep Generative Models through Manifold Topology!

w/ @ericzelikman Fred Lu @AndrewYNg Gunnar Carlsson @StefanoErmon. Acknowledging @torbjornlundh Samuel Bengmark.

Thread 🧵
Before I start: camera-ready 📸 & math-inclined R5 burn 🔥 are here
openreview.net/forum?id=djwS0…

Huge appreciation for all reviewers esp R5 in making our work better.

My goal in 🧵: Explain our work in my simplest terms to you. Don't worry if you get lost, it's admittedly dense :)
Disentanglement in your generative model means dimensions in its latent space can change a corresponding feature in its data space, e.g. adapting just 1️⃣ dim can make the output "sunnier" ☁️→🌥→⛅️→🌤→☀️ Contrast w/ this entangled mess ☁️→🌥→🌩→🌪→☀️
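Mechanically, the basic disentanglement probe is a latent traversal: vary one dimension, hold the rest fixed, and watch what changes. A minimal sketch (the decoder here is a hypothetical stand-in for any trained generative model):

```python
import numpy as np

def traverse(decoder, z, dim, values):
    """Vary a single latent dimension, holding the others fixed.

    If the model is disentangled, the decoded outputs should change
    along one interpretable factor (e.g. "sunniness") and nothing else.
    """
    outs = []
    for v in values:
        z_edit = z.copy()
        z_edit[dim] = v
        outs.append(decoder(z_edit))
    return outs

# Hypothetical decoder: any function mapping latents to outputs works.
decoder = lambda z: np.tanh(z.sum())        # placeholder stand-in
z = np.zeros(16)
sweep = traverse(decoder, z, dim=3, values=np.linspace(-3, 3, 7))
```
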
1/ I'm very happy to give a little thread today on our paper accepted at ICLR 2021!

🎉🎉🎉

In this paper, we show how to build ANNs that respect Dale's law and can still be trained well with gradient descent. I will expand in this thread...

openreview.net/forum?id=eU776…
2/ Dale's law states that neurons release the same neurotransmitter from all of their axonal terminals.

en.wikipedia.org/wiki/Dale%27s_…

Practically speaking, this implies that each neuron is either purely excitatory or purely inhibitory. It's not 100%, nothing is in biology, but it's roughly true.
3/ You may have wondered, "Why don't more people use ANNs that respect Dale's law?"

The rarely discussed reason is this:

When you try to train an ANN that respects Dale's law with gradient descent, it usually doesn't work as well -- worse than an ANN that ignores Dale's law.
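One common way to hard-wire the constraint (a sketch of the general idea, not necessarily the paper's exact parameterization): give each input unit a fixed sign and train only nonnegative magnitudes.

```python
import numpy as np

class DaleLinear:
    """Linear layer whose presynaptic units are all-E or all-I.

    Each input unit gets a fixed sign (+1 excitatory, -1 inhibitory);
    the trainable magnitudes stay nonnegative, so every outgoing
    weight of a unit shares its sign, as Dale's law requires.
    The assumed 80/20 E/I split is a common cortical ratio.
    """
    def __init__(self, n_in, n_out, frac_exc=0.8, seed=0):
        rng = np.random.default_rng(seed)
        self.sign = np.where(np.arange(n_in) < int(frac_exc * n_in), 1.0, -1.0)
        self.mag = rng.uniform(0, 0.1, size=(n_in, n_out))  # trainable, >= 0

    def __call__(self, x):
        w = self.sign[:, None] * self.mag     # signed effective weights
        return x @ w

    def project(self):
        """Call after each gradient step to keep magnitudes nonnegative."""
        np.clip(self.mag, 0.0, None, out=self.mag)

layer = DaleLinear(10, 4)
y = layer(np.ones((1, 10)))
```
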
Some people say that one shouldn't care about publications, only quality. However, the job market punishes those who don’t have publications in top ML venues. I empathize with students and newcomers to ML whose good papers are not getting accepted. #ICLR2021
1/
Long thread at the risk of being judged:

I just realized that in the last 6 years, 21 of my 24 papers have been accepted to top ML conf in their FIRST submission even though the majority of them were hastily-written borderline papers (not proud of this). How is this possible?
2/
At this point, I'm convinced that this cannot be explained by a combination of luck and quality of the papers. My belief is that the current system has lots of unnecessary and sometimes harmful biases, which is #unfair to newcomers and anyone who is outside of the "norm".
3/
Semi-supervised learning with consistency regularization and pseudo-labeling works great for CLASSIFICATION.

But how about STRUCTURED PREDICTION tasks? 🤔

Check out @ylzou_Zack's #ICLR2021 paper on designing pseudo-labels for semantic segmentation.
yuliang.vision/pseudo_seg/
How do we get pseudo labels from unlabeled images?

Unlike classification, directly thresholding the network outputs for dense prediction doesn't work well.

Our idea: start with weakly sup. localization (Grad-CAM) and refine it with self-attention for propagating the scores.
Using two different prediction mechanisms is great bc they make errors in different ways. With our fusion strategy, we get WELL-CALIBRATED pseudo labels (see the expected calibration errors in E below) and IMPROVED accuracy under 1/4, 1/8, 1/16 of labeled examples.
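For reference, expected calibration error is the standard bin-wise gap between confidence and accuracy; a minimal numpy sketch over flattened per-pixel confidences:

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """Standard ECE: |accuracy - confidence| averaged over bins.

    conf: per-pixel max softmax probabilities, flattened.
    correct: boolean array, whether the argmax label was right.
    Well-calibrated pseudo-labels have accuracy ~= confidence per bin.
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap
    return ece

# Synthetic perfectly-calibrated predictions give ECE close to 0.
conf = np.random.default_rng(0).uniform(size=10_000)
correct = np.random.default_rng(1).uniform(size=10_000) < conf
print(expected_calibration_error(conf, correct))
```
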
So here is an analysis of #ICLR2021 decisions.

860 accepted out of 2997 -> 29% acceptance rate
53 Orals, 114 Spotlights, 693 Posters, 1756 Rejected, 381 Withdrawn.

Thread 🧵

All decisions in one table: docs.google.com/spreadsheets/d…
Distribution of decisions based on average rating.
Orals: top-6% of accepted papers, top-2% of all papers.
Average score: 7.5, Min score: 6.67

Spotlight: top-13% of accepted papers, top-4% of all papers.
Average score: 7, Min score: 6
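The headline numbers are internally consistent; a quick sanity check:

```python
accepted = 53 + 114 + 693            # orals + spotlights + posters
total = accepted + 1756 + 381        # + rejected + withdrawn
print(accepted, total, round(100 * accepted / total))  # -> 860 2997 29
```
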
