Kawin Ethayarajh
PhD student @stanfordnlp; @facebook Fellow in NLP
Dec 7, 2023
📢The problem in model alignment no one talks about — the need for preference data, which costs $$$ and time!

Enter Kahneman-Tversky Optimization (KTO), which matches or exceeds DPO without paired preferences.

And with it, the largest-ever suite of feedback-aligned LLMs. 🧵

But first, what makes alignment work? Among methods that directly optimize preferences, the majority of gains at sub-30B scale come from SFT.

Even a dummy one-step PPO that uses +1/-1 rewards works very well.

DPO is uniquely good at the 30B scale, however. 2/
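KTO's selling point is that it does not require *paired* preferences — a per-example thumbs-up/thumbs-down signal is enough. A minimal sketch of the data-format difference (record field names here are hypothetical, not the actual KTO training format):

```python
# Paired preference data requires (prompt, chosen, rejected) triples.
# KTO-style training only needs (prompt, completion, +1/-1) examples,
# which are far cheaper to collect. Flattening pairs shows the relation:
def pairs_to_unpaired(paired):
    """Turn paired preferences into independent binary-feedback examples."""
    unpaired = []
    for ex in paired:
        unpaired.append({"prompt": ex["prompt"], "completion": ex["chosen"], "label": +1})
        unpaired.append({"prompt": ex["prompt"], "completion": ex["rejected"], "label": -1})
    return unpaired

paired = [{"prompt": "Explain DPO.", "chosen": "DPO directly optimizes ...", "rejected": "idk"}]
print(pairs_to_unpaired(paired))  # two examples, labels +1 and -1
```

The converse is the point: unpaired +1/-1 feedback exists in the wild (upvotes, flags, thumbs) without anyone ever having compared two completions side by side.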
Feb 22, 2023
📢 Models like #ChatGPT are trained on tons of human feedback. But collecting this costs $$$!

That's why we're releasing the Stanford Human Preferences Dataset (🚢SHP), a collection of 385K *naturally occurring* *collective* human preferences over text.
huggingface.co/datasets/stanf…

Given some context and two possible responses, SHP preferences reflect the helpfulness of one response over another.

The preferences are over responses to questions/instructions in 18 domains, from cooking to legal advice, drawn from Reddit.
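Since the preferences are *naturally occurring*, they can be derived from community votes rather than paid annotators. A simplified sketch of that idea (field names are made up, and the real SHP pipeline applies additional filters — see the dataset card for the actual construction):

```python
def shp_style_label(resp_a, resp_b):
    """Simplified sketch: derive a collective preference between two Reddit
    comments on the same post from their community scores. The actual SHP
    construction is more careful (e.g., it controls for posting time)."""
    if resp_a["score"] == resp_b["score"]:
        return None  # no clear collective preference
    chosen, rejected = (resp_a, resp_b) if resp_a["score"] > resp_b["score"] else (resp_b, resp_a)
    return {"chosen": chosen["text"], "rejected": rejected["text"],
            "score_ratio": chosen["score"] / rejected["score"]}

a = {"text": "Sear the steak first, then finish in the oven.", "score": 120}
b = {"text": "Just microwave it.", "score": 8}
print(shp_style_label(a, b)["score_ratio"])  # 15.0
```

The score ratio is a useful byproduct: a 15x vote margin is a much stronger preference signal than a 1.1x margin, and can be used to filter or weight training pairs.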
Jun 5, 2021
Is there a connection between Shapley Values and attention-based explanations in NLP?

Yes! Our #ACL2021NLP paper proves that **attention flows** can be Shapley Value explanations, but regular attention and leave-one-out cannot.



w/ @jurafsky @stanfordnlp arxiv.org/abs/2105.14652
Shapley Values are a solution to the credit-assignment problem in cooperative games: if 10 people work together to win some reward, how can it be distributed equitably?

For this reason, they've become a popular kind of explanation in ML. 2/
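The definition is easy to state concretely: a player's Shapley Value is their marginal contribution averaged over all orderings in which the coalition could have formed. A small exact computation on a toy game (illustrative only — not the paper's attention-flow construction):

```python
import math
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley Values: average each player's marginal contribution
    to the coalition over all n! orderings of the players."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            totals[p] += value(frozenset(coalition)) - before
    n_orderings = math.factorial(len(players))
    return {p: t / n_orderings for p, t in totals.items()}

# Toy game: a reward of 10 is won only if players "a" and "b" both join.
v = lambda s: 10.0 if {"a", "b"} <= s else 0.0
print(shapley_values(["a", "b", "c"], v))  # a and b each get 5.0; c gets 0.0
```

The n! loop is only feasible for tiny games, which is exactly why ML explanation methods rely on approximations — and why it matters which approximations (like attention flows) provably qualify.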
Sep 22, 2020
There's been some confusion over what Microsoft's "exclusive license" really means here.

While I can't speak for OpenAI, exclusive licenses generally grant exclusivity *within some specific context*. So no, Microsoft won't be the only one able to use GPT-3.

That said ... my guess is that only MS will have access to the underlying model, while everyone else will have to go through the API and be at the whims of whatever terms OpenAI sets.
Jul 12, 2020
Inspired by @yoavgo 's poll, I looked at the views for papers in three tracks -- Ethics, Summarization, and Theme (69 papers in total).

The median views per paper was 104.

In these three tracks, the most-viewed papers at time of writing are:

1. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data by @emilymbender and @alkoller (961 views)

2. How Can We Accelerate Progress Towards Human-like Linguistic Generalization? by @tallinzen (410 views)
Jun 23, 2020
Is your NLP classifier actually (un)biased? Or is your diagnosis based on too little data?

It might be the latter!

In my #ACL2020 paper, I discuss why we need bigger datasets for conclusively identifying classification bias in NLP.

arxiv.org/abs/2004.12332 1/

Background: Large NLP datasets don't come with annotations for protected attributes (e.g., gender). To test for classification bias, one typically annotates a small sample of data (often under 5K examples). WinoBias and WinoGender are great examples of such bias-specific datasets. 2/
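A back-of-the-envelope calculation shows why small samples are a problem (this is an illustrative normal-approximation sketch, not the paper's exact analysis): with n examples per group, the uncertainty on a between-group gap shrinks only as 1/sqrt(n).

```python
import math

def gap_ci_halfwidth(n_per_group, p1=0.5, p2=0.5, z=1.96):
    """Approximate 95% CI half-width for the difference in error rates
    between two demographic groups, each measured on n_per_group examples.
    Normal approximation; p=0.5 is the worst case for the variance."""
    se = math.sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return z * se

# A 5K bias dataset split evenly across two groups gives 2,500 per group:
# the CI on the gap is roughly +/- 2.8 points, so a 2-point observed gap
# is statistically indistinguishable from no bias at all.
print(round(gap_ci_halfwidth(2500), 3))  # 0.028
```

To shrink that interval by 10x you need 100x the data — which is the paper's point about needing bigger bias datasets for conclusive diagnoses.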
Feb 7, 2020
How contextual are contextualized word representations? In a long-overdue blog post, I discuss my EMNLP paper on how ELMo, BERT, and GPT-2 contextualize words — how they’re alike, and how they’re different.

blog: kawine.github.io/blog/nlp/2020/…
paper: aclweb.org/anthology/D19-…

Key findings:

1. In all layers of BERT, ELMo, and GPT-2, the representations of all words are anisotropic: they occupy a narrow cone in the embedding space instead of being distributed throughout.
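Anisotropy has a simple operational test: the average cosine similarity between random word representations should be near 0 if they fill the space uniformly, and near 1 if they sit in a narrow cone. A toy simulation of the contrast (random Gaussian vectors standing in for real model representations):

```python
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all pairs: ~0 for isotropic
    (uniformly spread) vectors, ~1 for vectors in a narrow cone."""
    sims = [cosine(vectors[i], vectors[j])
            for i in range(len(vectors)) for j in range(i + 1, len(vectors))]
    return sum(sims) / len(sims)

random.seed(0)
dim, n = 100, 100
isotropic = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
# Simulate a cone by adding a large shared offset to every vector.
cone = [[x + 5.0 for x in v] for v in isotropic]
print(round(mean_pairwise_cosine(isotropic), 2))  # close to 0
print(round(mean_pairwise_cosine(cone), 2))       # close to 1
```

The paper's measurement is the same statistic applied to actual layer-wise representations from ELMo, BERT, and GPT-2.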
Oct 4, 2019
What causes bias in word embedding associations? Biased training data, the embedding model, or just noise?

In a long-overdue blog post, I discuss our ACL paper on understanding undesirable associations. #nlproc

blog: kawine.github.io/blog/nlp/2019/…
paper: aclweb.org/anthology/P19-…

1/5

Key Takeaways:

The vast majority of words are not, on average, any more gendered in a word2vec embedding space than they are in the training corpus. Exceptions are words that are gender-stereotyped (e.g., ‘nurse’) or gender-specific by definition (e.g., ‘queen’). 2/5
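One common way to quantify "how gendered" a word is in embedding space is to project it onto a gender direction such as he-minus-she. A toy sketch of that measure (the vectors below are invented for illustration, and this is one simple association measure, not the paper's full methodology):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def gender_association(word_vec, he_vec, she_vec):
    """Cosine of a word with the he-minus-she direction:
    positive = male-associated, negative = female-associated."""
    axis = [h - s for h, s in zip(he_vec, she_vec)]
    return cosine(word_vec, axis)

# Toy 3-d "embedding space" where dimension 0 roughly encodes gender.
he, she = [1.0, 0.2, 0.0], [-1.0, 0.2, 0.0]
nurse = [-0.8, 0.1, 0.3]  # stereotyped toward "she" in this toy space
print(round(gender_association(nurse, he, she), 2))  # -0.93
```

The paper's question is then whether scores like this are *stronger* in the embedding space than the corresponding association statistics in the training corpus — and for most words, they aren't.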
Jun 23, 2019
When and why does king - man + woman = queen? In my #ACL2019 paper with @DavidDuvenaud and Graeme Hirst, we explain what conditions need to be satisfied by a training corpus for word analogies to hold in a GloVe or skipgram embedding space. 1/4

blog: bit.ly/2X18QEd
paper: arxiv.org/abs/1810.04882

In turn, our theory provides
1. An information-theoretic interpretation of Euclidean distance in skipgram and GloVe embedding spaces.
2. Novel justification for the surprising effectiveness of using addition to compose word vectors. 2/4
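The analogy test itself is simple vector arithmetic: find the word closest to vec(king) - vec(man) + vec(woman), excluding the query words. A self-contained sketch on a hand-built toy space (not real GloVe or skipgram vectors):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(emb, a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding the
    three query words -- the standard additive analogy test."""
    target = [emb[b][i] - emb[a][i] + emb[c][i] for i in range(len(emb[a]))]
    candidates = {w: v for w, v in emb.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

# Toy space where dimension 0 ~ royalty and dimension 1 ~ gender.
emb = {"king": [1.0, 1.0], "queen": [1.0, -1.0],
       "man": [0.0, 1.0], "woman": [0.0, -1.0], "apple": [-1.0, 0.0]}
print(analogy(emb, "man", "king", "woman"))  # queen
```

The paper's contribution is explaining *when* a real training corpus produces a geometry where this arithmetic actually works — in this toy space it works by construction.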