@ Google DeepMind. Past: @MetaAI, @OpenAI, @unitygames, @losalamosnatlab, @Princeton etc. Always hungry for intelligence.
Jun 9, 2023 • 6 tweets • 2 min read
There seems to be a high correlation between folks who think "open source LLMs will win" & folks who
1) haven't developed any large-scale infra "in the cloud" themselves
or
2) have never priced out the cost of scaling-out "in the cloud" (for the most powerful models)
1/6
or
3) underestimate the cost of investing in data infrastructure and tooling
(ever wondered why Databricks exists?)
2/6
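On the "priced it out" point, the arithmetic is simple enough to sketch. All the numbers below (cluster size, $/GPU-hour) are illustrative assumptions, not quotes from any provider or any actual training run:

```python
# Back-of-envelope cost of renting a training cluster "in the cloud".
# Every number here is an illustrative assumption, not real pricing.
gpus = 1024                 # assumed cluster size
price_per_gpu_hour = 2.50   # assumed on-demand $/GPU-hour
hours = 24 * 30             # one month of wall-clock time

monthly_cost = gpus * price_per_gpu_hour * hours
print(f"${monthly_cost:,.0f}/month")  # $1,843,200/month
```

Even with made-up numbers, the shape of the answer is the point: a modest cluster runs into seven figures per month before you spend a dollar on data, tooling, or the people to run it.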
Jan 22, 2023 • 8 tweets • 5 min read
Piling on to the pile-on (sorry - it's always easy to criticize 😛), here's a rant about benchmarks for LLMs that are used to back claims of "stronger" or "better" models.
Let's start with a tour through GPT-3's Appendix G... 1/8
First up: BoolQ. If you download the actual benchmark, the labels are true/false completions. GPT-3 swaps in yes/no instead. Why? Well, when we did the same swap to yes/no, we saw a +10% accuracy jump on this benchmark.
Wonderful. Clearly on track for a better model already. 2/8
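To see why the label strings matter at all: these evals are typically scored by rank classification, i.e. asking which label string the model assigns higher likelihood as a continuation of the prompt. A minimal sketch (the `label_logprob` function and its toy scores are hypothetical stand-ins for a real LM call):

```python
# Sketch of rank-classification scoring for a BoolQ-style eval.
# `label_logprob` is a hypothetical stand-in for summing a real
# model's per-token log-probs of `label` given `prompt`.

def label_logprob(prompt: str, label: str) -> float:
    # Toy scores chosen to illustrate the effect; a real
    # implementation would query a language model here.
    toy_scores = {
        ("Q: is the sky blue?\nA:", " yes"): -0.4,
        ("Q: is the sky blue?\nA:", " no"): -2.1,
        ("Q: is the sky blue?\nA:", " true"): -1.9,
        ("Q: is the sky blue?\nA:", " false"): -2.0,
    }
    return toy_scores[(prompt, label)]

def classify(prompt: str, labels: list[str]) -> str:
    # Pick whichever label string scores highest. The choice of
    # label *strings* is part of the eval, so swapping true/false
    # for yes/no changes what's being measured.
    return max(labels, key=lambda lbl: label_logprob(prompt, lbl))

prompt = "Q: is the sky blue?\nA:"
print(classify(prompt, [" yes", " no"]))      # confident " yes"
print(classify(prompt, [" true", " false"]))  # " true", but nearly a coin flip
```

Same model, same questions, different label strings, different accuracy. Which is exactly why the swap makes cross-paper comparisons on "BoolQ" apples-to-oranges.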
Jan 21, 2023 • 7 tweets • 3 min read
After ignoring the details in all these "let's-fit-a-cloud-of-points-to-a-single-line" papers (all likely wrong when you really extrapolate), @stephenroller finally convinced me to work through the math in the Chinchilla paper and as expected, this was a doozy. [1/7]
First thing to make me eye-roll a bit was this fancy equation (4) that seems to re-parameterize the key exponent terms (a,b) into (alpha,beta) to define a coefficient term G. Why this level of indirection just to define a scalar coefficient? No idea. [2/7]
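For readers without the paper open: from memory (so treat the exact form as an assumption, and check equation (4) in the paper itself), the reparameterization in question looks roughly like this. Starting from the fitted loss

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

the compute-optimal allocations are written as

```latex
N_{\mathrm{opt}}(C) = G\left(\frac{C}{6}\right)^{a}, \qquad
D_{\mathrm{opt}}(C) = G^{-1}\left(\frac{C}{6}\right)^{b},
```

with

```latex
G = \left(\frac{\alpha A}{\beta B}\right)^{\frac{1}{\alpha+\beta}}, \qquad
a = \frac{\beta}{\alpha+\beta}, \qquad
b = \frac{\alpha}{\alpha+\beta}.
```

So (a,b) are fully determined by (alpha,beta), and G is just the scalar prefactor that falls out of minimizing L subject to C ≈ 6ND. Hence the eye-roll: the indirection buys notation, not content.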