Discover and read the best of Twitter Threads about #ICML2021


After a hiatus, a new series of blog posts. Do differential geometry and algebraic topology sound too exotic for ML? In recent works, we show that tools from these fields bring a new perspective on graph neural networks.

First post in the series:

towardsdatascience.com/graph-neural-n…
Based on recent works with @CristianBodnar @ffabffrasca @kneppkatt @wangyg85 @pl219_Cambridge @guidomontufar @b_p_chamberlain @migorinova @stefan_webb @emaros96 @aittalam James Rowbottom, Jake Topping, Xiaowen Dong, Francesco Di Giovanni
Cool animation of Cora graph evolution by James Rowbottom
Read 6 tweets
RAIL will be presenting a number of exciting late-breaking poster results at the RL4RealLife workshop #ICML2021 (8 pm PT today!): sites.google.com/view/RL4RealLi…

Algorithms for real-world RL w/ mobile manipulators, lifelong meta-learning methods, principled multi-task data sharing.

A thread:
We'll show how RL can control robots that learn to clean up a room, entirely in the real world. By Charles Sun, @ColinearDevin, @abhishekunique7, @jendk3r, @GlenBerseth.
We'll present CoMPS, an algorithm for online continual meta-learning, where an agent meta-learns tasks one by one, with each task accelerating future tasks. By @GlenBerseth, WilliamZhang365, @chelseabfinn.
Read 4 tweets
In RL, the "implicit regularization" that helps deep learning find good solutions can actually lead to huge instability. See @aviral_kumar2's talk on DR3:
7/23 4pm PT RL for real: icml.cc/virtual/2021/w…
7/24 5:45pm PT Overparameterization WS talk icml.cc/virtual/2021/w…
#ICML2021

🧵
You can watch the talk in advance here:
And then come discuss the work with Aviral at the poster sessions! This work is not released yet, but it will be out shortly.

We're quite excited about this result, and I'll try to explain why.
Deep networks are overparameterized, meaning there are many parameter vectors that fit the training set. So why don't they overfit? While there are many possibilities, they all revolve around some kind of "implicit regularization" that leads to solutions that generalize well.
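
One classic toy instance of implicit regularization (not specific to RL or to this paper, just an illustration): gradient descent on an overparameterized linear regression, initialized at zero, converges to the minimum-norm interpolant rather than an arbitrary zero-loss solution. A minimal numpy sketch, with made-up sizes:

```python
# Toy illustration of implicit regularization (not from the DR3 paper):
# in overparameterized linear regression, gradient descent initialized at
# zero converges to the minimum-norm solution among all interpolants.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                      # fewer examples than parameters
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w = np.zeros(d)
lr = 1e-2
for _ in range(20000):              # plain gradient descent on squared error
    w -= lr * X.T @ (X @ w - y) / n

w_min_norm = np.linalg.pinv(X) @ y  # minimum-norm interpolant
print("training loss:", np.mean((X @ w - y) ** 2))
print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))
```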
Read 8 tweets
Can we devise a more tractable RL problem if we give the agent examples of successful outcomes (states, not demos)? In MURAL, we show that uncertainty-aware classifiers trained with (meta) NML make RL much easier. At #ICML2021
arxiv.org/abs/2107.07184

A (short) thread:
The website has a summary: sites.google.com/view/mural-rl

If the agent gets some examples of high-reward states, we can train a classifier to automatically provide shaped rewards (this is similar to methods like VICE). A standard classifier, however, does not necessarily produce a well-shaped reward.
This is where the key idea in MURAL comes in: use normalized maximum likelihood (NML) to train a classifier that is aware of its uncertainty. Tentatively label each state as either positive (success) or negative (failure), and use the ratio of the resulting likelihoods as the reward!
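
A rough sketch of that conditional-NML idea, as a toy logistic-regression version (the function name, the brute-force refitting, and sklearn choices here are illustrative; MURAL uses a meta-learned approximation, not this code):

```python
# Toy sketch of a conditional-NML-style success-classifier reward
# (illustrative only; not the MURAL implementation).
import numpy as np
from sklearn.linear_model import LogisticRegression

def cnml_success_reward(query_state, states, labels):
    """states: (N, d) visited states; labels: 1 = success example, 0 = other (both must appear)."""
    likelihoods = []
    for hypothetical_label in (0, 1):
        X = np.vstack([states, query_state[None]])
        y = np.append(labels, hypothetical_label)
        clf = LogisticRegression(max_iter=1000).fit(X, y)   # refit with the query point included
        # likelihood the refit classifier assigns to the hypothetical label at the query
        likelihoods.append(clf.predict_proba(query_state[None])[0, hypothetical_label])
    # intuition: on unfamiliar states both labels are easy to fit, so this stays
    # near 0.5, giving a smoother, uncertainty-aware shaped reward
    return likelihoods[1] / (likelihoods[0] + likelihoods[1])
```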
Read 7 tweets
Data-driven design is a lot like offline RL. Want to design a drug molecule, protein, or robot? Offline model-based optimization (MBO) tackles this, and our new algorithm, conservative objective models (COMs), provides a simple approach: arxiv.org/abs/2107.06882

A thread:
The basic setup: say you have prior experimental data D={(x,y)} (e.g., drugs you've tested). How do you use it to get the best drug? Well, you could train a neural net f(x) = y, then pick the x with the highest predicted y. This is a *very* bad idea, because you'll just get an adversarial example!
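
A tiny sketch of that failure mode (the toy objective, network, and step sizes below are made up; this is not the COMs code, which instead trains the model conservatively to push its values down on exactly these off-data points):

```python
# Toy illustration of why "fit f(x) = y, then gradient-ascend x" goes wrong
# (illustrative only -- not the COMs implementation).
import torch

torch.manual_seed(0)
def true_objective(x):                       # stand-in for the unknown ground truth
    return x[..., 0] - 0.1 * (x ** 2).sum(dim=-1)

X = torch.randn(200, 8)                      # "prior experimental data"
y = true_objective(X)

model = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(3000):                        # fit the proxy f(x) ~ y on the dataset
    opt.zero_grad()
    ((model(X).squeeze(-1) - y) ** 2).mean().backward()
    opt.step()

x = X[y.argmax()].clone().requires_grad_(True)   # start from the best design in the data
for _ in range(500):                             # naive design: gradient ascent on the proxy
    model(x).sum().backward()
    with torch.no_grad():
        x += 0.1 * x.grad
        x.grad.zero_()

print("best y in the dataset:        ", y.max().item())
print("proxy's score of optimized x: ", model(x).item())          # far better than anything seen...
print("true score of optimized x:    ", true_objective(x).item())  # ...but the true value collapses
```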
This is very important: lots of recent work shows how to train really good predictive models in biology, chemistry, etc. (e.g., AlphaFold), but using these for design runs into this adversarial example problem. This is actually very similar to problems we see in offline RL!
Read 8 tweets
(1/9) Presenting: Bayesian Algorithm Execution (BAX) and the InfoBAX algorithm.

Bayesian optimization finds global optima of expensive black-box functions. But what about other function properties?

w/ @KAlexanderWang @StefanoErmon at #ICML2021

URL: willieneis.github.io/bax-website
(2/9) Methods like Bayes opt / quadrature can be viewed as estimating properties of a black-box function (e.g. global optima, integrals). But in many applications we also care about local optima, level sets, top-k optima, boundaries, integrals, roots, graph properties, and more.
(3/9) For a given property, we can often find an algorithm that computes it, in the absence of any budget constraint. We then reframe the task as: how do we adaptively sample to estimate *the output of this algorithm*?

We call this task “Bayesian algorithm execution” or BAX.
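
One way to picture the loop is the sketch below: a simple posterior-sampling disagreement heuristic in the BAX spirit, not the InfoBAX acquisition itself. The toy black-box function, the super-level-set property, and the helper names are all illustrative assumptions.

```python
# Conceptual sketch of "adaptively sample to estimate an algorithm's output"
# (a posterior-sampling disagreement heuristic; NOT the InfoBAX acquisition).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

f = lambda x: np.sin(3 * x) + 0.5 * x            # expensive black-box function (toy)
grid = np.linspace(-3, 3, 300)[:, None]

def algorithm(values, tau=0.5):
    """The property we care about: the super-level set {x : f(x) > tau}."""
    return values > tau

X, y = [[-2.0], [2.0]], [f(-2.0), f(2.0)]        # two initial queries
for _ in range(15):
    gp = GaussianProcessRegressor(normalize_y=True).fit(np.array(X), np.array(y))
    samples = gp.sample_y(grid, n_samples=30)    # plausible functions under the posterior
    outputs = algorithm(samples)                 # run the algorithm on each sampled function
    p = outputs.mean(axis=1)
    x_next = grid[np.argmax(p * (1 - p)), 0]     # query where the algorithm's output is most uncertain
    X.append([x_next]); y.append(f(x_next))

# estimate of the algorithm's output from the most recent posterior samples
print("estimated super-level-set points:", grid[p > 0.5, 0])
```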
Read 9 tweets
Many models bake in domain knowledge to control how input data is processed. This means models must be redesigned to handle new types of data.

Introducing the Perceiver, an architecture that works on many kinds of data - in some cases all at once: dpmd.ai/perceiver (1/)
Like Transformers, Perceivers process inputs using attention. But unlike Transformers, they first map inputs to a small latent space where processing is cheap & doesn’t depend on the input size. This allows us to build deep networks even when using large inputs like images. (2/)
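
A minimal sketch of that latent cross-attention step (the module name, shapes, and sizes are illustrative assumptions, not the DeepMind implementation):

```python
# Minimal sketch of a Perceiver-style "map a big input into a small latent
# array with cross-attention" step (illustrative, not the official code).
import torch
import torch.nn as nn

class LatentCrossAttention(nn.Module):
    def __init__(self, num_latents=256, latent_dim=512, input_dim=64, num_heads=8):
        super().__init__()
        # learned latent array: its size is fixed, independent of the input size
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim) * 0.02)
        self.attn = nn.MultiheadAttention(latent_dim, num_heads,
                                          kdim=input_dim, vdim=input_dim,
                                          batch_first=True)

    def forward(self, inputs):                      # inputs: (batch, M, input_dim), M can be huge
        q = self.latents.expand(inputs.shape[0], -1, -1)
        latent, _ = self.attn(q, inputs, inputs)    # cost is O(M * num_latents), not O(M^2)
        return latent                               # (batch, num_latents, latent_dim): cheap to process deeply

x = torch.randn(2, 10_000, 64)                      # a long, flat array of input elements (pixels, audio, ...)
print(LatentCrossAttention()(x).shape)              # torch.Size([2, 256, 512])
```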
Perceivers can learn a different attention pattern for each type of data (shown for images and video), making it easy for them to adapt to new data and unexplored problems where researchers may not know what kinds of patterns they should be looking for. (3/)
Read 4 tweets
Excited to share our paper arxiv.org/abs/2105.12221 on neural net overparameterization, to appear at #ICML2021 💃🏻 We asked why training can't find a minimum in mildly overparameterized nets: a 4-4-4 net can achieve zero loss, but 5-5-5 nets trained with GD cannot 🤨
We investigated the training failures in mild overparameterization vs. successful training in vast overparameterization from the simple perspective of permutation symmetries!
The catch is that all critical points of smaller nets turn into subspaces of critical points in bigger nets. We give precise counts of such critical subspaces using combinatorics 😋
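
A toy numerical check of the kind of embedding at play (an illustration of the symmetry argument, not the paper's code): duplicate a hidden unit and split its outgoing weight between the two copies. The function is unchanged, and the bigger net's gradient is just a "split" copy of the smaller net's, so a zero gradient stays zero and every critical point of the small net reappears as a whole subspace (one free splitting ratio per added copy) of critical points of the big net.

```python
# Toy check of the neuron-splitting symmetry (illustrative).
import torch

torch.manual_seed(0)
X, y = torch.randn(64, 3), torch.randn(64)

def loss(W1, b1, w2, x, target):                 # one-hidden-layer tanh net, squared loss
    return ((torch.tanh(x @ W1.T + b1) @ w2 - target) ** 2).mean()

# small net: 4 hidden units
W1 = torch.randn(4, 3, requires_grad=True)
b1 = torch.randn(4, requires_grad=True)
w2 = torch.randn(4, requires_grad=True)
g_W1, g_b1, g_w2 = torch.autograd.grad(loss(W1, b1, w2, X, y), (W1, b1, w2))

# big net: 5 hidden units, built by splitting unit 0 with ratio alpha
alpha = 0.3
W1_big = torch.cat([W1, W1[:1]]).detach().requires_grad_(True)
b1_big = torch.cat([b1, b1[:1]]).detach().requires_grad_(True)
w2_big = torch.cat([alpha * w2[:1], w2[1:], (1 - alpha) * w2[:1]]).detach().requires_grad_(True)
G_W1, G_b1, G_w2 = torch.autograd.grad(loss(W1_big, b1_big, w2_big, X, y), (W1_big, b1_big, w2_big))

print("same function:", torch.allclose(loss(W1, b1, w2, X, y), loss(W1_big, b1_big, w2_big, X, y)))
print("outgoing grads match:", torch.allclose(G_w2[0], g_w2[0]), torch.allclose(G_w2[4], g_w2[0]))
print("incoming grads are alpha-scaled:", torch.allclose(G_W1[0], alpha * g_W1[0]),
      torch.allclose(G_W1[4], (1 - alpha) * g_W1[0]))
```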
Read 8 tweets
Really excited to share our latest progress on few-shot classification on the Meta-Dataset benchmark (w/ @hugo_larochelle, @zemelgroup and @dumoulinv) that will appear in #ICML2021: arxiv.org/abs/2105.07029

Keep reading to find out more 👇🧵
Despite exciting progress on Meta-Dataset’s ‘weak generalization’ tasks, where the goal is to learn held-out classes of *seen* datasets from few examples, the improvement of recent work is much smaller on the ‘strong generalization’ tasks that present classes from *unseen* datasets.
In this work, we focus on those challenging tasks. Specifically, our aim is to leverage a large and diverse training set consisting of several datasets, for the purpose of creating a flexible model that is then able to few-shot learn new classes from unseen datasets.
Read 8 tweets
New paper: Neural Rough Differential Equations!

Greatly increase performance on long time series by using the mathematics of rough path theory.

arxiv.org/abs/2009.08295
github.com/jambo6/neuralR…

Accepted at #ICML2021!

🧵: 1/n
(including a lot about what makes RNNs work)
(So first of all, yes, it's another Neural XYZ Differential Equations paper.

At some point we're going to run out of XYZ differential equations to put the word "neural" in front of.)

2/n
As for what's going on here!

We already know that RNNs are basically differential equations.

Neural CDEs are the example closest to my heart. These are the true continuous-time limit of generic RNNs:
arxiv.org/abs/2005.08926
github.com/patrick-kidger…

3/n
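
To make the RNN connection concrete, here is a crude Euler discretization of a neural CDE, dz = f_θ(z) dX(t) (an illustrative sketch under made-up sizes, not the torchcde/NRDE code): updating the hidden state with a learned function of the state times the increment of the input path is exactly an RNN-style recurrence.

```python
# Crude Euler discretization of a neural CDE (illustrative sketch).
import torch
import torch.nn as nn

class NeuralCDECell(nn.Module):
    def __init__(self, input_channels, hidden_size):
        super().__init__()
        self.input_channels, self.hidden_size = input_channels, hidden_size
        # f_theta maps the hidden state to a (hidden_size x input_channels) matrix
        self.f = nn.Sequential(nn.Linear(hidden_size, 128), nn.Tanh(),
                               nn.Linear(128, hidden_size * input_channels))

    def forward(self, z, dX):                       # z: (batch, hidden), dX: (batch, channels)
        F = self.f(z).view(-1, self.hidden_size, self.input_channels)
        return z + torch.bmm(F, dX.unsqueeze(-1)).squeeze(-1)   # z_{k+1} = z_k + f(z_k) dX_k

# A time series viewed as a path X; driving the cell with its increments
# makes the loop below look exactly like an (unusual) RNN.
batch, length, channels, hidden = 8, 100, 3, 32
X = torch.randn(batch, length, channels)
cell = NeuralCDECell(channels, hidden)
z = torch.zeros(batch, hidden)
for k in range(length - 1):
    z = cell(z, X[:, k + 1] - X[:, k])              # increments of the input path
print(z.shape)                                      # final hidden state: (8, 32)
```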
Read 22 tweets
New @ #ICML2021: When a trained model fits clean training data well but fits randomly labeled data (mixed into the training set) poorly, its generalization to the population is guaranteed!

Paper: arxiv.org/abs/2105.00303

by ACMI PhD @saurabh_garg67, Siva B, @zicokolter, & @zacharylipton
This result makes deep connections between label noise, early learning, and generalization. Key takeaways: 1) the early-learning phenomenon can be leveraged to produce post-hoc generalization certificates; 2) this can be done in practice by adding randomly labeled (otherwise unlabeled) data to the training set.
The work translates early learning into a generalization guarantee *without ever explicitly invoking the complexity of the hypothesis class*, and we hope others will dig into this result and go deeper.
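
A sketch of the kind of measurement the certificate is built from (illustrative; the synthetic data, model, and step count below are assumptions, and the exact bound is in the paper): mix randomly labeled points into training, stop in the early-learning regime, then compare how well the clean and the randomly labeled portions are fit.

```python
# Illustrative sketch of the measurement behind this kind of certificate
# (not the paper's exact bound).
import torch, torch.nn as nn

torch.manual_seed(0)
d = 20
w_true = torch.randn(d)
def make_data(n):
    X = torch.randn(n, d)
    return X, (X @ w_true > 0).float()

X_clean, y_clean = make_data(2000)
X_rand, _ = make_data(200)
y_rand = torch.randint(0, 2, (200,)).float()        # randomly assigned labels
X_test, y_test = make_data(5000)

X_train = torch.cat([X_clean, X_rand])
y_train = torch.cat([y_clean, y_rand])

model = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(300):                                 # stop early: the "early learning" regime
    opt.zero_grad()
    nn.functional.binary_cross_entropy_with_logits(model(X_train).squeeze(-1), y_train).backward()
    opt.step()

err = lambda X, y: ((model(X).squeeze(-1) > 0).float() != y).float().mean().item()
print("error on clean train data:     ", err(X_clean, y_clean))   # low
print("error on randomly labeled data:", err(X_rand, y_rand))     # stays near 0.5 -> little memorization
print("held-out test error:           ", err(X_test, y_test))     # the certificate is built from the two above
```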
Read 5 tweets
