I lead @Cohere_Labs. Formerly Research @Google Brain @GoogleDeepmind. ML Efficiency at scale, LLMs, ML reliability. Changing spaces where breakthroughs happen.
Apr 30 • 13 tweets • 5 min read
It is critical for scientific integrity that we trust our measure of progress.
The @lmarena_ai has become the go-to evaluation for AI progress.
Our release today demonstrates the difficulty in maintaining fair evaluations on @lmarena_ai, despite best intentions.
We spent 5 months analyzing 2.8M battles on the Arena, covering 238 models across 43 providers.
We show that preferential policies benefiting a handful of providers lead to overfitting to Arena-specific metrics rather than genuine AI progress.
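For context on what "Arena-specific metrics" means here: the leaderboard is derived from pairwise human-preference battles, typically aggregated with a Bradley-Terry-style rating. The sketch below is a minimal, illustrative fit of such ratings from battle outcomes; the model names and the simple gradient-ascent loop are placeholders, not the Arena's actual pipeline.

```python
import numpy as np

def fit_bradley_terry(battles, models, iters=1000, lr=0.1):
    """Fit Bradley-Terry strengths from pairwise battle outcomes.

    battles: list of (winner, loser) model-name pairs.
    Returns a dict mapping model name -> latent strength (log-scale).
    """
    idx = {m: i for i, m in enumerate(models)}
    theta = np.zeros(len(models))
    for _ in range(iters):
        grad = np.zeros_like(theta)
        for winner, loser in battles:
            w, l = idx[winner], idx[loser]
            # P(winner beats loser) under the current strengths
            p = 1.0 / (1.0 + np.exp(theta[l] - theta[w]))
            grad[w] += 1.0 - p
            grad[l] -= 1.0 - p
        theta += lr * grad / max(len(battles), 1)
        theta -= theta.mean()  # strengths are identifiable only up to a shift
    return {m: theta[i] for m, i in idx.items()}

# Toy example with placeholder model names
battles = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
print(fit_bradley_terry(battles, ["model_a", "model_b", "model_c"]))
```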
Oct 4, 2024 • 11 tweets • 3 min read
One of the biggest open questions is: what is the limit of synthetic data?
Does training on synthetic data lead to mode collapse?
Or is there a path forward that could outperform current models?
What is missing from this conversation is that the success of synthetic data hinges on how you optimize in the data space.
A few recent papers highlight this tension well. On the dangers of synthetic data, there is an excellent paper released in Nature.
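To make "optimizing in the data space" concrete, here is one illustrative pattern: score synthetic generations with some quality signal and mix only the best ones with real data, rather than training on raw model output. The scoring function, ratios, and helper names below are placeholders for illustration, not a prescription from the papers discussed.

```python
import random

def curate_synthetic(synthetic_samples, score_fn, keep_fraction=0.25):
    """Keep only the top-scoring fraction of synthetic samples.

    score_fn is a stand-in for any quality signal (a reward model,
    a verifier, agreement between models, etc.).
    """
    ranked = sorted(synthetic_samples, key=score_fn, reverse=True)
    cutoff = max(1, int(len(ranked) * keep_fraction))
    return ranked[:cutoff]

def build_training_mix(real_data, synthetic_samples, score_fn, synthetic_ratio=0.5):
    """Mix real data with curated synthetic data instead of raw generations."""
    curated = curate_synthetic(synthetic_samples, score_fn)
    n_synth = int(len(real_data) * synthetic_ratio)
    return real_data + random.sample(curated, min(n_synth, len(curated)))

# Toy usage with a trivial stand-in scoring function
real = ["real_1", "real_2", "real_3", "real_4"]
synthetic = [f"synth_{i}" for i in range(20)]
mix = build_training_mix(real, synthetic, score_fn=len, synthetic_ratio=0.5)
```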
How do you distinguish between sources of uncertainty?
This is important because the downstream remedies for atypical and noisy examples are very different.
Two of our workshop papers explore this from different perspectives.
At the Subset ML workshop tomorrow, Neil Hu and Xinyu Hu explore where simply prioritizing challenging examples fails -- motivating a more nuanced distinction between sources of uncertainty.
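As a rough illustration of how one might separate the two sources in practice, the sketch below uses per-example training dynamics (mean confidence in the true label and its variability across epochs). The thresholds and the heuristic itself are assumptions for illustration, not the method from either workshop paper.

```python
import numpy as np

def training_dynamics_stats(confidences):
    """confidences: array of shape (n_epochs, n_examples) holding the model's
    confidence in the true label at each epoch of training."""
    mean_conf = confidences.mean(axis=0)    # how easily the example is fit
    variability = confidences.std(axis=0)   # how unstable learning is across epochs
    return mean_conf, variability

def split_by_uncertainty_source(mean_conf, variability,
                                conf_thresh=0.3, var_thresh=0.15):
    """Crude heuristic split: examples the model never fits (low mean confidence,
    low variability) are candidates for label noise; examples it fits
    inconsistently (higher variability) look more like atypical-but-learnable
    cases. The two groups call for different remedies (relabeling vs. upweighting)."""
    likely_noisy = (mean_conf < conf_thresh) & (variability < var_thresh)
    likely_atypical = (variability >= var_thresh) & (mean_conf < 1 - conf_thresh)
    return likely_noisy, likely_atypical

# Placeholder for logged per-epoch confidences of 100 examples over 10 epochs
conf = np.random.rand(10, 100)
noisy, atypical = split_by_uncertainty_source(*training_dynamics_stats(conf))
```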
Very excited to share our recent work with Aaron Courville, Yann Dauphin and @DreFrome
weightpruningdamage.github.io
At face value, deep neural network pruning appears to promise you can (almost) have it all — remove the majority of weights with minimal degradation to top-1 accuracy. In this work, we explore this trade-off by asking whether certain classes are disproportionately impacted.
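A minimal way to probe this trade-off, sketched below with PyTorch's pruning utilities: apply global magnitude pruning, then compare per-class accuracy of the dense and pruned models so that aggregate top-1 cannot hide uneven damage. This is an illustrative setup, not the exact protocol from the paper; the model, data loader, and sparsity level are placeholders.

```python
import torch
import torch.nn.utils.prune as prune

def global_magnitude_prune(model, amount=0.9):
    """Remove the smallest-magnitude weights globally across conv/linear layers."""
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d))]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)
    return model

@torch.no_grad()
def per_class_accuracy(model, loader, num_classes, device="cpu"):
    """Accuracy broken out per class."""
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    model.eval()
    for x, y in loader:
        preds = model(x.to(device)).argmax(dim=1).cpu()
        for c in range(num_classes):
            mask = y == c
            correct[c] += (preds[mask] == y[mask]).sum()
            total[c] += mask.sum()
    return correct / total.clamp(min=1)

# Usage sketch (model, test_loader, num_classes are placeholders):
# acc_dense = per_class_accuracy(model, test_loader, num_classes)
# acc_pruned = per_class_accuracy(global_magnitude_prune(model), test_loader, num_classes)
# impact = acc_dense - acc_pruned   # large per-class gaps flag disproportionate damage
```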