To make an existing model more robust at test time: augment a single test image in many ways, then finetune the model so that its predictions on the augmented images "agree" by minimizing their marginal entropy. This is the idea behind MEMO (w/ Marvin Zhang & @chelseabfinn):
MEMO is a simple test-time adaptation method that takes any existing model (no change to training) and finetunes it on one image:
1. generate augmentations of the test image
2. make predictions on all of them
3. minimize the marginal entropy of these predictions (make them similar)
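To make step 3 concrete, here is a minimal sketch of the marginal entropy loss, assuming we already have one logit vector per augmented copy of the test image (the logits below are made up). It averages the softmax predictions over augmentations and takes the entropy of that average. In MEMO the model would then be updated with gradient steps on this loss; the sketch leaves that out because it needs an autodiff framework.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def marginal_entropy(logits_per_augmentation):
    """Entropy of the class distribution averaged over augmented copies.

    logits_per_augmentation: one logit vector per augmentation of the same
    test image (hypothetical input format for this sketch).
    """
    probs = [softmax(l) for l in logits_per_augmentation]
    num_aug = len(probs)
    num_classes = len(probs[0])
    # Marginal distribution: average the per-augmentation predictions.
    marginal = [sum(p[c] for p in probs) / num_aug for c in range(num_classes)]
    return -sum(p * math.log(p) for p in marginal if p > 0)

# Confident predictions that agree give a low marginal entropy...
agree = [[4.0, 0.0, 0.0], [3.5, 0.2, 0.1], [4.2, -0.1, 0.3]]
# ...while confident predictions that disagree give a high one.
disagree = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
print(marginal_entropy(agree), marginal_entropy(disagree))
```

Minimizing this quantity rewards predictions that are both confident and consistent across augmentations: confident but conflicting predictions average out to a near-uniform marginal with high entropy.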
This can significantly improve a model's robustness to OOD inputs. Here are examples on ImageNet-C where MEMO fixes mistakes the model would have made without MEMO. It requires no additional assumptions: training is exactly the same, and it operates on a single test image.
I'm excited about these kinds of approaches, because they show that by "thinking harder" at test time, even existing models can get better accuracy. It's like how people might misidentify something if they glance at it quickly, but get it right if they look at it long enough.
An "RL" take on compression: "super-lossy" compression that changes the image but preserves its downstream effect (i.e., the user should take the same action on seeing the "compressed" image as on seeing the original)…
The idea is pretty simple: we use a GAN-style loss to classify whether the user would have taken the same downstream action upon seeing the compressed image or not. An action could be a button press in a video game, or a click/decision on a website.
The compression itself is done with a generative latent variable model (we use StyleGAN, but VAEs or flows would work well too). PICO basically decides to throw out those bits that it determines (via its GAN loss) won't change the user's downstream decision.
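The sketch below illustrates only the intuition of discarding bits that don't affect the user's action. It swaps in a much simpler setup than PICO: a hand-written, hypothetical `user_action` model instead of real users, a fixed latent vector instead of a StyleGAN code, and a greedy search that measures action agreement directly instead of a learned GAN-style discriminator estimating it.

```python
# Hypothetical user model: looks only at latent dims 0 and 1 to choose an action.
def user_action(z):
    return 0 if z[0] + z[1] >= 0 else 1

def compress(z, keep):
    # "Throw away" every latent dimension not in `keep` by zeroing it.
    return [v if i in keep else 0.0 for i, v in enumerate(z)]

def agreement(batch, keep):
    # Fraction of inputs where the user acts the same on compressed and original codes.
    same = sum(user_action(compress(z, keep)) == user_action(z) for z in batch)
    return same / len(batch)

def greedy_prune(batch, dim):
    # Drop one latent dimension at a time, keeping the drop only if no action changes.
    keep = set(range(dim))
    for i in reversed(range(dim)):
        trial = keep - {i}
        if agreement(batch, trial) == 1.0:
            keep = trial
    return keep

# Hypothetical latent codes for four inputs.
batch = [
    [1.0, 0.5, 3.0, -2.0, 0.7],
    [-1.0, 0.2, -1.0, 4.0, 2.0],
    [0.3, -1.0, 2.0, 2.0, -3.0],
    [-0.2, 1.0, 0.0, -1.0, 1.0],
]
kept = greedy_prune(batch, dim=5)
print(sorted(kept))  # prints [0, 1]
```

Dimensions 2 to 4 are dropped because the toy user ignores them, while dimensions 0 and 1 are kept because removing either one flips at least one action in the batch.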
We'll present CoMPS, an algorithm for online continual meta-learning, where an agent meta-learns tasks one by one, with each task accelerating future tasks. By @GlenBerseth, WilliamZhang365, @chelseabfinn
You can watch the talk in advance here:
And then come discuss the work with Aviral at the poster sessions! This work is not released yet, but it will be out shortly.
We're quite excited about this result, and I'll try to explain why.
Deep networks are overparameterized, meaning there are many parameter vectors that fit the training set. So why don't they overfit? There are many possible explanations, but they all revolve around some kind of "implicit regularization" that leads to solutions that generalize well.
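A classic textbook example of implicit regularization (not the specific mechanism this work studies): gradient descent on an underdetermined least-squares problem, started from zero, converges to the minimum-norm solution among the infinitely many that fit the data. The function and data below are illustrative.

```python
def gd_least_squares(xs, ys, lr=0.05, steps=2000):
    """Plain gradient descent on 0.5 * sum of squared errors, starting from w = 0."""
    w = [0.0] * len(xs[0])  # zero initialization is essential for this effect
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w

def norm(v):
    return sum(vi * vi for vi in v) ** 0.5

# One training example, two parameters: infinitely many w satisfy w0 + 2*w1 = 5.
xs = [[1.0, 2.0]]
ys = [5.0]
w = gd_least_squares(xs, ys)

# Another perfect fit, [5.0, 0.0], has a larger norm than the one GD finds.
alt = [5.0, 0.0]
print(w, norm(w), norm(alt))
```

Each gradient step is a multiple of the input vector `[1, 2]`, so the iterates never leave that direction, and the only zero-loss point along it is the minimum-norm solution `[1, 2]`.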
Can we devise a more tractable RL problem if we give the agent examples of successful outcomes (states, not demos)? In MURAL, we show that uncertainty-aware classifiers trained with (meta) NML make RL much easier. At #ICML2021
If the agent gets some examples of high-reward states, we can train a classifier to automatically provide shaped rewards (this is similar to methods like VICE). But a standard classifier does not necessarily provide well-shaped rewards.
This is where the key idea in MURAL comes in: use normalized maximum likelihood (NML) to train a classifier that is aware of uncertainty. Add each query state to the data once labeled positive (success) and once labeled negative (failure), train a classifier on each, and use the normalized ratio of their likelihoods as the reward!
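A brute-force sketch of the conditional NML (CNML) reward for a query state, assuming a 1-D state and logistic regression as the classifier; the data, hyperparameters, and function names are hypothetical. MURAL itself amortizes this retraining with meta-learning so it doesn't have to fit two classifiers from scratch for every query, which this sketch does not attempt.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def train_logreg(data, l2=0.1, lr=0.1, steps=3000):
    # 1-D logistic regression p(success | s) = sigmoid(w * s + b), fit by
    # gradient descent on mean log loss plus a small L2 penalty on w.
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        gw, gb = l2 * w, 0.0
        for s, label in data:
            err = sigmoid(w * s + b) - label
            gw += err * s / n
            gb += err / n
        w -= lr * gw
        b -= lr * gb
    return w, b

def cnml_success_prob(dataset, query):
    # Retrain twice: once with the query labeled success (1), once failure (0).
    w_pos, b_pos = train_logreg(dataset + [(query, 1.0)])
    w_neg, b_neg = train_logreg(dataset + [(query, 0.0)])
    # Likelihood each classifier assigns to the label it was trained with.
    lik_pos = sigmoid(w_pos * query + b_pos)
    lik_neg = 1.0 - sigmoid(w_neg * query + b_neg)
    # Normalizing the two gives the CNML probability of success (used as reward).
    return lik_pos / (lik_pos + lik_neg)

# Hypothetical 1-D states: successes near +2.5, failures near -2.5.
dataset = [(2.0, 1.0), (2.5, 1.0), (3.0, 1.0), (-2.0, 0.0), (-2.5, 0.0), (-3.0, 0.0)]
for q in (-2.5, 0.0, 2.5):
    print(q, cnml_success_prob(dataset, q))
```

A state that is easy to fit with either label gets a reward near 0.5, reflecting uncertainty. In this symmetric toy dataset, the query halfway between the two groups gets exactly 0.5, while a query among the successes gets a reward above 0.5.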
Since many people were interested in our recent offline MBO work, I'll also write about a recent paper on MBO by Justin Fu, which trains forward models for each possible objective value and uses them to compute a posterior via NML:
A thread:
The basic idea, unlike COMs (which learn pessimistic models), is to get a posterior over values for a new design x. Justin's method (NEMO) trains a separate model *for every possible (discretized) value y* of the design x, and uses the likelihoods from these models to get the posterior.
This corresponds to the normalized maximum likelihood (NML) distribution, which has appealing regret guarantees; in NEMO we extend these to provide regret guarantees for offline MBO as well! This is more complex than COMs, but potentially more powerful, since we get a full posterior.
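A minimal sketch of the NML posterior over a discretized objective value, assuming 1-D designs, a linear model with a fixed-variance Gaussian likelihood in place of NEMO's neural networks, and hypothetical data and bins. For every candidate value y, the model is refit with (x, y) added to the data, and the likelihood each refit model assigns to its own y is normalized across bins.

```python
import math

def fit_line(points):
    # Closed-form 1-D least squares fit y = a * x + c.
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx

def nml_posterior(dataset, x_query, y_bins, sigma=0.5):
    scores = []
    for y in y_bins:
        # One model per candidate value: refit with (x_query, y) added to the data.
        a, c = fit_line(dataset + [(x_query, y)])
        residual = y - (a * x_query + c)
        # Gaussian likelihood of y under its own model; the shared normalizing
        # constant cancels below, so it is omitted.
        scores.append(math.exp(-residual ** 2 / (2 * sigma ** 2)))
    total = sum(scores)
    return [s / total for s in scores]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical data: objective y = 2x measured at a few designs.
dataset = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
y_bins = [0.5 * k for k in range(-40, 121)]  # candidate values from -20 to 60
near = nml_posterior(dataset, 1.5, y_bins)
far = nml_posterior(dataset, 10.0, y_bins)
print(entropy(near), entropy(far))
```

For a design far from the data, each refit model can nearly pass through whatever y it was given, so many values look plausible and the posterior is much broader than for a design inside the data range. This per-bin retraining is also why the approach costs more than fitting a single model.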
Data-driven design is a lot like offline RL. Want to design a drug molecule, protein, or robot? Offline model-based optimization (MBO) tackles this, and our new algorithm, conservative objective models (COMs), provides a simple approach:
A thread:
The basic setup: say you have prior experimental data D={(x,y)} (e.g., drugs you've tested). How do you use it to get the best drug? Well, you could train a neural net f(x) to predict y, then pick the x with the highest predicted f(x). This is a *very* bad idea, because the optimizer will just find an adversarial example: an x that the model wrongly predicts to be great!
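A toy version of this failure mode, with a linear model standing in for the neural net and a hypothetical true objective `-x**2` that the optimizer never sees. Gradient ascent on the learned model walks out of the range covered by the data, where the model is confidently wrong. With a neural net the failure is subtler (an adversarial input rather than a simple extrapolation), but the effect is the same. This sketch shows only the problem, not the COMs fix.

```python
def true_objective(x):
    # Hypothetical ground truth, unknown to the optimizer; the best design is x = 0.
    return -x ** 2

def fit_line(points):
    # Closed-form 1-D least squares fit y = a * x + c (stand-in for a neural net).
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx

# Prior experiments only covered designs in [-2, -0.5].
xs = [-2.0 + 0.25 * i for i in range(7)]
data = [(x, true_objective(x)) for x in xs]
a, c = fit_line(data)

# Naive MBO: start at the best tested design and do gradient ascent on the model.
x = max(data, key=lambda p: p[1])[0]
for _ in range(20):
    x += 0.1 * a  # the model's gradient d(a*x + c)/dx is just a

predicted = a * x + c
actual = true_objective(x)
best_in_data = max(y for _, y in data)
print(x, predicted, actual, best_in_data)
```

The optimized design's predicted value is far above every tested design, while its true value is far below all of them.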
This is very important: lots of recent work shows how to train really good predictive models in biology, chemistry, etc. (e.g., AlphaFold), but using these for design runs into this adversarial example problem. This is actually very similar to problems we see in offline RL!