PhD student @stanfordAILab | Prev: AI resident @metaai, @vectorinst, @CambridgeMLG
Mar 20 • 8 tweets • 3 min read
AlpacaEval is now length-controlled (LC)!
✅ highest correlation with Chat Arena (0.98)
✅ no reannotation
✅ simple interpretation: win rate if model length = baseline length
✅ robust to length gamification
At 0.98 correlation, that’s essentially evaluation on Arena, but in 3 min and for <$10.
Key: predict what the win rate would be if model length=baseline length
We:
1. fit a GLM: model | length | instruction -> preference
2. predict preference conditioned on baseline length
Benefits:
✅easily extendible to other biases
✅nice math properties
✅no reannotation needed
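The two steps above can be sketched in a few lines. This is a minimal illustration, not the AlpacaEval implementation: the data is synthetic, and the GLM here is a plain logistic regression on the length difference only (the real method also conditions on model and instruction).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical annotated comparisons: for each instruction, the model's
# output length minus the baseline's, plus a synthetic preference label.
n = 500
length_diff = rng.normal(0.0, 200.0, size=n)      # model_len - baseline_len
model_skill = 0.4                                  # latent quality edge
length_bias = 0.003                                # annotators like longer answers
logits = model_skill + length_bias * length_diff
prefs = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))  # 1 = model preferred

# Step 1: fit a GLM (logistic regression): preference ~ length difference.
glm = LogisticRegression().fit(length_diff.reshape(-1, 1), prefs)

# Step 2: predict preference at length_diff = 0, i.e. the counterfactual
# "what would the win rate be if model length = baseline length?"
raw_win_rate = prefs.mean()
lc_win_rate = glm.predict_proba([[0.0]])[0, 1]
print(f"raw WR = {raw_win_rate:.3f}, length-controlled WR = {lc_win_rate:.3f}")
```

Because the fitted length coefficient is set to act on a difference of zero, a model cannot raise its length-controlled win rate just by producing longer outputs.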
Jan 6, 2023 • 9 tweets • 8 min read
I played with @AnthropicAI assistant (AA) and compared it to @OpenAI ChatGPT
TLDR: both are similar but AA is
+ Harder to jailbreak
+ Tries to be more helpful
+ Follows what we ask for more closely
+ ~Better for writing in English
- Worse at code
- Worse in French
- Longer responses
🧵
**Coding**
CGPT is better
Quantitative (LeetCode hard problems in Python/C/JavaScript):
- CGPT: 3/3
- AA: 1/3 (only Python correct)
Qualitative:
- CGPT: more reliable/concise/efficient
- AA: more comments + emphasizes explainability
Both are wrong when asked for an impossible algorithm. 2/8
Nov 28, 2022 • 11 tweets • 10 min read
#NeurIPS2022
What are ideal representations for self-sup. learning (SSL)?
🤓We give simple optimality conditions and use them to improve/understand/derive SSL methods!