Despite exciting progress on Meta-Dataset’s ‘weak generalization’ tasks, where the goal is to learn held-out classes of *seen* datasets from few examples, recent work improves much less on the ‘strong generalization’ tasks that present classes from *unseen* datasets.
In this work, we focus on these challenging tasks. Specifically, we aim to leverage a large and diverse training set spanning several datasets to build a flexible model that can then few-shot learn new classes from unseen datasets.
We introduce Few-shot Learning with a Universal TEmplate (FLUTE 🎵), which captures a useful inductive bias for this problem: we learn a partially-parameterized model (the ‘template’) that can define a wide array of feature extractors for diverse tasks, if filled in appropriately for each.
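Concretely, a FiLM-modulated block keeps its conv weights in the shared template and exposes only a per-channel scale (γ) and shift (β). Here’s a minimal PyTorch sketch of the idea (simplified and illustrative, not our actual code):

```python
import torch
import torch.nn as nn

class FiLMBlock(nn.Module):
    """Conv + BN whose FiLM scale/shift is supplied externally."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)  # shared template weights
        self.bn = nn.BatchNorm2d(out_ch, affine=False)                  # normalize only; no built-in scale/shift

    def forward(self, x, gamma, beta):
        # gamma/beta are the only dataset- or task-specific parameters
        h = self.bn(self.conv(x))
        return torch.relu(gamma.view(1, -1, 1, 1) * h + beta.view(1, -1, 1, 1))
```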
We train the template (gray sub-network) jointly across the training datasets (only 2 shown for illustration), allowing the feature extractors of the different datasets to differ only in their FiLM (batch-normalization) parameters.
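Roughly, joint training looks like this (a toy continuation of the sketch above: dummy tensors stand in for real batches, and `head` is an illustrative classifier, not the per-dataset setup from the paper):

```python
channels = [3, 32, 64]
blocks = nn.ModuleList(FiLMBlock(i, o) for i, o in zip(channels, channels[1:]))
head = nn.Linear(channels[-1], 5)           # toy head just to make the sketch train

K = 2                                       # number of training datasets
film = [[(nn.Parameter(torch.ones(c)), nn.Parameter(torch.zeros(c)))
         for c in channels[1:]] for _ in range(K)]
film_flat = [p for ds in film for g, b in ds for p in (g, b)]
opt = torch.optim.Adam(list(blocks.parameters()) + list(head.parameters()) + film_flat, lr=1e-3)

for step in range(100):
    k = step % K                            # alternate between the K datasets
    x = torch.randn(8, 3, 32, 32)           # dummy batch standing in for dataset k
    y = torch.randint(0, 5, (8,))
    h = x
    for blk, (g, b) in zip(blocks, film[k]):
        h = blk(h, g, b)                    # shared template, dataset-k FiLM params
    loss = nn.functional.cross_entropy(head(h.mean(dim=(2, 3))), y)
    opt.zero_grad(); loss.backward(); opt.step()
```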
At test time we reuse the template, but plug in new FiLM params for each task, learned via gradient descent on the support set, starting from a task-dependent init that blends previous sets of FiLM parameters based on their ‘compatibility’ with the task at hand.
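Continuing the toy sketch, test-time adaptation might look like the following. The compatibility weights here are a placeholder (in FLUTE they come from a learned ‘blender’ network applied to the support set), and the fine-tuning loss is a simple nearest-centroid loss:

```python
import torch.nn.functional as F

for p in blocks.parameters():
    p.requires_grad_(False)                 # the template stays frozen at test time

# Placeholder compatibility scores standing in for the learned blender.
compat = F.softmax(torch.randn(K), dim=0)

# Task-dependent init: blend the stored per-dataset FiLM params.
new_film = [(nn.Parameter(sum(compat[k] * film[k][i][0] for k in range(K)).detach()),
             nn.Parameter(sum(compat[k] * film[k][i][1] for k in range(K)).detach()))
            for i in range(len(blocks))]
ft_opt = torch.optim.Adam([p for g, b in new_film for p in (g, b)], lr=5e-3)

sx = torch.randn(10, 3, 32, 32)             # dummy 5-way support set
sy = torch.arange(5).repeat(2)              # two examples per class
for _ in range(50):                         # only the new FiLM params are updated
    h = sx
    for blk, (g, b) in zip(blocks, new_film):
        h = blk(h, g, b)
    emb = h.mean(dim=(2, 3))
    protos = torch.stack([emb[sy == c].mean(0) for c in range(5)])  # class centroids
    loss = F.cross_entropy(-torch.cdist(emb, protos), sy)           # nearest-centroid loss
    ft_opt.zero_grad(); loss.backward(); ft_opt.step()
```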
We show that FLUTE outperforms previous methods by a large margin on this challenging problem. It is also more adaptable, and it scales better with the number of training datasets than the previously strongest Universal Representation approaches.
Looking forward to seeing you all virtually in July! 🙂