5th Year Ph.D. @BerkeleyNLP, Columbia'19. Previous intern @MSFTResearch. Supervising machines to do what I can't do.
Mar 2, 2023 • 10 tweets • 3 min read
I dream of building AI that can do research🤖🔬
We are still far from this, but in our most recent paper, we formalize the D5 task: tell GPT-3 your research goal and provide two large text datasets. It may discover patterns you didn’t notice!😉
We propose the D5 task – goal-driven discovery of distributional differences via language descriptions.
Each input problem = a research goal + a pair of large corpora. The output is a natural language predicate (discovery) that describes corpus-level differences.
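To make the input/output format concrete, here's a minimal sketch of how a D5-style query could be posed to a language model. The function name and prompt wording are my own illustration, not the paper's actual pipeline:

```python
# Sketch only: the prompt template and sampling strategy are illustrative,
# not the paper's actual D5 pipeline.
import random

def d5_prompt(goal: str, corpus_a: list[str], corpus_b: list[str], k: int = 5) -> str:
    """Build a prompt asking an LM for a natural-language predicate
    that holds more often for Corpus A samples than for Corpus B samples."""
    sample_a = "\n".join(random.sample(corpus_a, k))
    sample_b = "\n".join(random.sample(corpus_b, k))
    return (
        f"Research goal: {goal}\n\n"
        f"Samples from Corpus A:\n{sample_a}\n\n"
        f"Samples from Corpus B:\n{sample_b}\n\n"
        "Compared to Corpus B, samples from Corpus A are more likely to ..."
    )
```

Whatever the model completes is only a *candidate* discovery; you'd still want to check how often the predicate actually holds across the full corpora before trusting it.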
Oct 12, 2021 • 6 tweets • 3 min read
Why do models often attend to salient words even when the training loss doesn't require it? To understand this inductive bias, we need to analyze the optimization trajectory🧐
Sharing our preprint "Approximating How Single Head Attention Learns" #NLProc
We approximate training with 2 stages: early in training, when attention is uniform, the model learns to translate an individual input word `i` to an output word `o` if they co-occur frequently. Later, the model learns to attend to `i` when the correct output is `o`, because it already knows that `i` translates to `o`.
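To picture the two stages, here is a toy single-head attention "translator" in PyTorch. It's my own simplified sketch, not the paper's exact setup:

```python
# Toy model: with the query initialized near zero, attention starts out
# uniform, which is the regime where stage 1 applies.
import torch
import torch.nn as nn

class SingleHeadTranslator(nn.Module):
    def __init__(self, vocab_size: int, d: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.query = nn.Parameter(torch.zeros(d))  # zero init => uniform attention
        self.out = nn.Linear(d, vocab_size)        # "translation" layer

    def forward(self, src: torch.Tensor) -> torch.Tensor:
        keys = self.embed(src)               # (seq_len, d)
        scores = keys @ self.query           # (seq_len,), all zero at init
        attn = torch.softmax(scores, dim=0)  # uniform early in training
        context = attn @ keys                # (d,), average of input embeddings
        return self.out(context)             # logits over the output vocab

# Stage 1: while attn is ~uniform, the gradient mostly updates `out`,
# teaching it that an input word `i` co-occurring with output `o` should
# raise the logit of `o`.
# Stage 2: once out(embed(i)) already predicts `o`, the loss decreases
# further by increasing `query @ embed(i)`, i.e. by attending to `i`.
```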
Aug 30, 2021 • 9 tweets • 4 min read
We can prompt language models for 0-shot learning ... but it's not what they are optimized for😢.
Our #emnlp2021 paper proposes a straightforward fix: "Adapting LMs for 0-shot Learning by Meta-tuning on Dataset and Prompt Collections".
Many interesting takeaways below 👇

1. Prompting a language model out of the box can be highly suboptimal. For example, GPT-3 (175B parameters) gets 80% on SST-2 zero-shot, while UnifiedQA (700M) gets 92% 🤔 so even being adapted to generic question answering can make a 200x smaller model better ...
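For a rough picture of how meta-tuning data could be constructed, here's a sketch; the dataset, prompts, and field names below are made up for illustration, not the paper's released collection:

```python
# Illustrative only: converts a labeled classification dataset into
# (question, context, answer) triples, one per prompt template.
def to_meta_tuning_examples(texts, labels, prompts):
    examples = []
    for text, label in zip(texts, labels):
        answer = "yes" if label == 1 else "no"
        for prompt in prompts:  # e.g. "Is this review positive?"
            examples.append({"question": prompt, "context": text, "answer": answer})
    return examples

sst2_prompts = ["Is this review positive?", "Does the author like the movie?"]
data = to_meta_tuning_examples(
    ["A gorgeous, witty film.", "A tedious mess."], [1, 0], sst2_prompts
)
# Fine-tune a QA-style LM on many such (dataset, prompt) collections, then
# evaluate zero-shot on held-out tasks with unseen prompts.
```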