Yann Dubois (sigmoid.social/@yanndubs)
PhD student @stanfordAILab | Prev: AI resident @metaai, @vectorinst, @CambridgeMLG
Mar 20 8 tweets 3 min read
AlpacaEval is now length-controlled (LC)!

✅ highest correlation with Chat Arena (0.98)
✅ no reannotation
✅ simple interpretation: win rate if model length = baseline length
✅ robust to length gamification

A 0.98 correlation means it's essentially evaluation on Chat Arena, but in 3 min and for <$10.

Key: predict what the win rate would be if model length = baseline length

We:
1. fit GLM: model | length | instruction -> preference
2. predict preference conditioned on baseline length

Benefits:
✅ easily extendible to other biases
✅ nice math properties
✅ no reannotation needed
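
The two steps above (fit a GLM, then predict at baseline length) can be sketched roughly as follows. This is a minimal illustration, not the AlpacaEval implementation: it assumes binary preferences, a single model, a hand-rolled logistic regression, a tanh-squashed length-difference feature, and it drops the instruction term entirely. All function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=1.0, steps=300):
    """Fit p(win) = sigmoid(theta + phi * x) by batch gradient descent on log loss."""
    theta, phi = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g_theta = g_phi = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(theta + phi * x) - y  # gradient of log loss w.r.t. the logit
            g_theta += err
            g_phi += err * x
        theta -= lr * g_theta / n
        phi -= lr * g_phi / n
    return theta, phi


def length_controlled_win_rate(len_diffs, prefs):
    """Step 1: fit the GLM. Step 2: predict with the length difference set to zero."""
    # Assumed feature: length difference (model - baseline), standardized then squashed.
    scale = statistics.pstdev(len_diffs)
    xs = [math.tanh(d / scale) for d in len_diffs]
    theta, _phi = fit_logistic(xs, prefs)
    # Counterfactual "model length = baseline length": x = tanh(0) = 0,
    # so only the model term theta remains.
    return sigmoid(theta)


def demo(n=2000, seed=0):
    """Synthetic data: a model of baseline quality that wins more only by being longer."""
    rng = random.Random(seed)
    len_diffs = [rng.gauss(50.0, 100.0) for _ in range(n)]
    prefs = [
        1.0 if rng.random() < sigmoid(1.5 * math.tanh(d / 100.0)) else 0.0
        for d in len_diffs
    ]
    raw = sum(prefs) / n
    lc = length_controlled_win_rate(len_diffs, prefs)
    return raw, lc


if __name__ == "__main__":
    raw, lc = demo()
    print(f"raw win rate: {raw:.3f}, length-controlled win rate: {lc:.3f}")
```

In this toy setup the raw win rate is inflated by verbosity, while the length-controlled estimate should land near 50%, since the simulated model is no better than the baseline once length is held equal.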
Jan 6, 2023 9 tweets 8 min read
I played with @AnthropicAI assistant (AA) and compared it to @OpenAI ChatGPT

TLDR: both are similar but AA is
+ Harder to jailbreak
+ Tries to be more helpful
+ Follows more closely what we ask for
+ ~Better for writing in English
- Worse for code
- Worse in French
- Longer resp.
🧵

**Coding**
CGPT is better

Quantitative (LeetCode hard in Python/C/JavaScript):
- CGPT: 3/3
- AA: 1/3 (only got Python correct)

Qualitative:
- CGPT: more reliable/concise/efficient
- AA: more comments + emphasizes explainability

Both are wrong when asked for an impossible algo
2/8
Nov 28, 2022 11 tweets 10 min read
#NeurIPS2022
What are ideal representations for self-sup. learning (SSL)?

🤓We give simple optimality conditions and use them to improve/understand/derive SSL methods!

🔥outperform baselines on ImageNet

arxiv.org/abs/2209.06235
w. @tatsu_hashimoto @StefanoErmon @percyliang
🧵

Goal: ideally, representations should allow linear probes to perfectly predict any task that is invariant to augmentations, in the most sample-efficient way

Q: Which of the following representations is optimal?

2/8