How could you even begin to find theoretical guarantees for zero-shot learning? Our team takes a step in this direction, providing the first non-trivial theoretical guarantees for zero-shot learning with attributes.
Alessio, @CriMenghini, and the team show how to quantify the quality of the information in attribute-based descriptions of unseen classes. They analyze such descriptions and derive non-trivial lower bounds on error that *no* algorithm can guarantee to beat. [2/3]
P.S. @CriMenghini is on the job market this year! She's looking for industry research positions, and she has top-notch skills in both computational social science *AND* machine learning!
Recently, there has been a lot of interest in compositionality in large pre-trained models. We’re excited to share work led by Nihal Nayak and Peilin Yu on making learned prompts more compositional: arxiv.org/abs/2204.03574
A 🧵👇
We focus on compositional zero-shot learning. The task is to label classes composed of primitive concepts representing objects and attributes, e.g., "old cat" vs. "young cat" vs. "young dog". Perhaps unsurprisingly, CLIP doesn't do a great job on this task out of the box.
(2/8)
To that end, we propose compositional soft prompting (CSP), a new prompt-tuning method to represent primitive concepts in a composable way.
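The core idea of representing primitives composably can be illustrated with a toy sketch: keep one learnable soft-token vector per attribute and per object, and build the prompt for any (attribute, object) pair by recombining them. Everything below (the tiny embedding width, the mean-pooled cosine score, the random vectors standing in for trained soft tokens and a frozen image encoder) is a hypothetical stand-in, not the paper's actual CLIP-based implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # toy embedding width; real CLIP text embeddings are much wider

# One soft-token vector per primitive concept (randomly initialized here;
# in the real method these would be tuned on seen attribute-object pairs).
attributes = {a: rng.normal(size=DIM) for a in ["old", "young"]}
objects = {o: rng.normal(size=DIM) for o in ["cat", "dog"]}
prefix = rng.normal(size=(3, DIM))  # stands in for a prompt prefix like "a photo of"

def compose_prompt(attr, obj):
    """Build the prompt for an (attribute, object) pair by stacking the
    shared prefix with the two primitive soft tokens."""
    return np.vstack([prefix, attributes[attr][None, :], objects[obj][None, :]])

def score(image_vec, prompt):
    """Cosine similarity between an image embedding and the mean-pooled prompt."""
    text_vec = prompt.mean(axis=0)
    return float(image_vec @ text_vec /
                 (np.linalg.norm(image_vec) * np.linalg.norm(text_vec)))

# At test time, unseen pairs are scored by recombining the learned primitives:
image = rng.normal(size=DIM)  # stand-in for a frozen image encoder's output
pairs = [(a, o) for a in attributes for o in objects]
best = max(pairs, key=lambda p: score(image, compose_prompt(*p)))
```

The point of the sketch is the composition step: because each primitive has its own vector, pairs never seen during training (say, "old dog") still get a prompt for free by recombination.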
If you’re at #ICLR2022, we hope you’ll check out our spotlight poster: “Multitask Prompted Training Enables Zero-Shot Task Generalization.” arxiv.org/abs/2110.08207
Poster session 5, Tue 1:30-3:30 ET
This work was a big undertaking from many at the @BigscienceW Workshop, particularly @SanhEstPasMoi, @albertwebson, @colinraffel, and @srush_nlp. It’s been awesome to see all the people already using and building on the T0 family of models for zero-shot learning.
There’s rightly been a lot of excitement around the zero-shot performance of T0 and similar, concurrent approaches like FLAN (ai.googleblog.com/2021/10/introd…).
I also want to highlight the data-centric side of the T0 work.