More from @ethayarajh

Kawin Ethayarajh

@ethayarajh

Dec 7, 2023

📢The problem in model alignment no one talks about — the need for preference data, which costs $$$ and time!

Enter Kahneman-Tversky Optimization (KTO), which matches or exceeds DPO without paired preferences.

And with it, the largest-ever suite of feedback-aligned LLMs. 🧵

But first, what makes alignment work? Among methods that directly optimize preferences, most of the gains for models under 30B parameters come from SFT.

Even a dummy one-step PPO that uses +1/-1 rewards works very well.

DPO is uniquely good at the 30B scale, however. 2/

But *why* do they work?

We find that alignment methods impute a utility function to humans.

These imputed functions have many qualities of those empirically derived by Kahneman & Tversky in their Nobel Prize-winning work on how humans make decisions about uncertain outcomes. 3/
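For readers unfamiliar with the Kahneman-Tversky work referenced above, here is a minimal sketch of the classic prospect-theory value function. It is not the KTO loss and does not come from the thread; the function name `kt_value` is made up for illustration, and the default parameters (alpha = 0.88, lambda = 2.25) are the median estimates reported by Tversky and Kahneman (1992). It shows the two qualities usually emphasized: diminishing sensitivity to larger outcomes and loss aversion.

```python
def kt_value(z: float, alpha: float = 0.88, lam: float = 2.25) -> float:
    """Subjective value of an outcome z measured relative to a reference point.

    Gains (z >= 0) are concave in z; losses are convex and scaled up by
    lam, so a loss hurts more than an equal-sized gain pleases.
    """
    if z >= 0:
        return z ** alpha
    return -lam * ((-z) ** alpha)


# A gain and a loss of the same size are valued asymmetrically.
gain = kt_value(100.0)
loss = kt_value(-100.0)
print(f"value of +100: {gain:.2f}, value of -100: {loss:.2f}")
```

With these defaults, the loss is weighted 2.25 times as heavily as the equal-sized gain, and doubling a gain less than doubles its value.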

Kawin Ethayarajh

@ethayarajh

Feb 22, 2023

📢 Models like #ChatGPT are trained on tons of human feedback. But collecting this costs $$$!

That's why we're releasing the Stanford Human Preferences Dataset (🚢SHP), a collection of 385K *naturally occurring* *collective* human preferences over text.
huggingface.co/datasets/stanf…

Given some context and two possible responses, SHP preferences reflect the helpfulness of one response over another.

The preferences are over responses to questions/instructions in 18 domains, from cooking to legal advice, drawn from Reddit.

They were inferred from the simple observation that if comment A was written after B but has a higher score despite getting less visibility, then ostensibly A > B.

If A was written before B, then we can't conclude this -- the higher score could have come from more visibility!
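The inference rule described above can be sketched in a few lines. This is an illustration only, not the dataset's actual construction code: the `Comment` class, its field names, and `infer_preference` are hypothetical, and any additional filtering the real pipeline applies is omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Comment:
    # Hypothetical fields; the real data schema may differ.
    text: str
    score: int
    created_utc: float  # posting time, seconds since epoch


def infer_preference(a: Comment, b: Comment) -> Optional[Tuple[Comment, Comment]]:
    """Return (preferred, dispreferred), or None if no preference can be inferred.

    A preference is inferred only when the later comment has the higher
    score: it had less time to collect votes, so its lead cannot be
    explained by visibility alone.
    """
    if a.created_utc == b.created_utc:
        return None
    later, earlier = (a, b) if a.created_utc > b.created_utc else (b, a)
    if later.score > earlier.score:
        return (later, earlier)
    # The earlier comment leads (or ties): more visibility could explain it.
    return None


early = Comment("first answer", score=10, created_utc=100.0)
late = Comment("second answer", score=25, created_utc=200.0)
print(infer_preference(early, late))
```

The result is the same regardless of argument order, and swapping the two scores makes the function return `None`.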

Kawin Ethayarajh

@ethayarajh

Jun 5, 2021

Is there a connection between Shapley Values and attention-based explanations in NLP?

Yes! Our #ACL2021NLP paper proves that **attention flows** can be Shapley Value explanations, but regular attention and leave-one-out cannot.

w/ @jurafsky @stanfordnlp arxiv.org/abs/2105.14652

Shapley Values are a solution to the credit assignment problem in cooperative games -- if 10 people work together to win some reward, how can it be equitably distributed?

For this reason, they've become a popular kind of explanation in ML. 2/
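To make the credit assignment idea concrete, here is a small sketch that computes exact Shapley Values by averaging each player's marginal contribution over every order in which the players could join. The toy game, the player names, and the function names are invented for illustration and are not from the paper; the brute-force enumeration is only practical for a handful of players.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley Values for a cooperative game.

    `value` maps a frozenset of players to the reward that coalition earns.
    Each player's share is their marginal contribution, averaged over all
    orderings in which the players could join.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def reward(coalition):
    # Toy game: the team wins 100 only if ann plus at least one helper take part.
    has_helper = "bob" in coalition or "cy" in coalition
    return 100.0 if "ann" in coalition and has_helper else 0.0


phi = shapley_values(["ann", "bob", "cy"], reward)
print(phi)
```

In this game the shares add up to the full reward of 100, the two interchangeable helpers receive equal shares, and the indispensable player receives the largest one.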

Shapley Values have been used to explain the importance of individual features, embeddings, and neurons.

@GhorbaniAmirata and @james_y_zou have even used them to value training data points.

In NLP though, attention-based explanations and leave-one-out still predominate. 3/

Kawin Ethayarajh

@ethayarajh

Sep 22, 2020

https://twitter.com/kevin_scott/status/1308438898553638912

There's been some confusion over what Microsoft's "exclusive license" really means here.

While I can't speak for OpenAI, exclusive licenses generally grant exclusivity *within some specific context*. So no, Microsoft won't be the only one able to use GPT3. That said ...

My guess is that only MS will have access to the underlying model, while everyone will have to go through the API and be at the whims of whatever terms are set by OpenAI.

This is big -- if you build a product on top of GPT3, your ability to scale will depend on OpenAI's willingness to increase your throughput, which in turn will depend on the terms of their agreement with MS. Not a great situation to be in if you're directly competing with MS.

Kawin Ethayarajh

@ethayarajh

Jun 23, 2020

Is your NLP classifier actually (un)biased? Or is your diagnosis based on too little data?

It might be the latter!

In my #ACL2020 paper, I discuss why we need bigger datasets for conclusively identifying classification bias in NLP.

arxiv.org/abs/2004.12332 1/

Background: Large NLP datasets don't come with annotations for protected attributes (e.g., gender). To test for classification bias, one usually annotates a small sample of data (typically < 5K examples). WinoBias and WinoGender are great examples of these bias-specific datasets. 2/

Intuitively, the less data we annotate, the less certain we are that our estimate is close to the true bias. But how can we quantify this uncertainty? 3/
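One generic way to put a number on this intuition is a Hoeffding-style confidence interval. The sketch below is not the paper's method (the part of the thread that answers the question is not shown here); it assumes bias is measured as the gap in accuracy between two groups, that annotated examples are i.i.d., and the function names are hypothetical.

```python
import math


def hoeffding_half_width(n: int, delta: float) -> float:
    """Half-width of a (1 - delta) confidence interval for the mean of
    n i.i.d. values in [0, 1] (e.g. per-example correctness)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def gap_half_width(n_a: int, n_b: int, delta: float) -> float:
    """Half-width for the accuracy gap between two groups.

    Each group's accuracy gets its own interval at level delta / 2; by a
    union bound both hold with probability at least 1 - delta, so the gap
    is off by at most the sum of the two half-widths.
    """
    return hoeffding_half_width(n_a, delta / 2) + hoeffding_half_width(n_b, delta / 2)


# 5K annotated examples split evenly between two groups, versus 100x as many.
small = gap_half_width(2_500, 2_500, delta=0.05)
large = gap_half_width(250_000, 250_000, delta=0.05)
print(f"5K examples: gap known to within +/-{small:.3f}")
print(f"500K examples: gap known to within +/-{large:.4f}")
```

Under these assumptions the interval shrinks with the square root of the sample size, so 100 times more annotated data only narrows it by a factor of 10.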

Kawin Ethayarajh

@ethayarajh

Feb 7, 2020

How contextual are contextualized word representations? In a long-overdue blog post, I discuss my EMNLP paper on how ELMo, BERT, and GPT-2 contextualize words — how they’re alike, and how they’re different.

blog: kawine.github.io/blog/nlp/2020/…
paper: aclweb.org/anthology/D19-…

Key findings:

1. In all layers of BERT, ELMo, and GPT-2, the representations of all words are anisotropic: they occupy a narrow cone in the embedding space instead of being distributed throughout.

2. In BERT, ELMo, and GPT-2, upper layers produce more context-specific representations than lower layers; however, the models contextualize words very differently from one another.
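A common way to quantify the "narrow cone" in finding 1 is the average cosine similarity between the vectors of randomly sampled words: near 0 for directions spread evenly around the origin, near 1 when every vector points roughly the same way. The sketch below uses synthetic vectors rather than real model activations, and its function names, dimensions, and shared offset are assumptions made for illustration.

```python
import math
import random


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors, n_pairs, rng):
    """Average cosine similarity over randomly sampled pairs of distinct vectors."""
    total = 0.0
    for _ in range(n_pairs):
        u, v = rng.sample(vectors, 2)
        total += cosine(u, v)
    return total / n_pairs


rng = random.Random(0)
dim = 64

# Isotropic: directions spread uniformly around the origin.
isotropic = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]

# Anisotropic: the same vectors plus a large shared offset, which pushes
# them all into a narrow cone.
anisotropic = [[x + 5.0 for x in vec] for vec in isotropic]

iso_score = mean_pairwise_cosine(isotropic, 2000, rng)
aniso_score = mean_pairwise_cosine(anisotropic, 2000, rng)
print(f"isotropic: {iso_score:.3f}, anisotropic: {aniso_score:.3f}")
```

The isotropic set scores close to 0 while the shifted set scores close to 1, which is the signature the finding describes.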
