Susan Zhang
@ Google Deepmind. Past: @MetaAI, @OpenAI, @unitygames, @losalamosnatlab, @Princeton etc. Always hungry for intelligence.
Jun 9, 2023 6 tweets 2 min read
There seems to be a high correlation between folks who think "open source LLMs will win" & folks who

1) haven't developed any large-scale infra "in the cloud" themselves

or

2) have never priced out the cost of scaling out "in the cloud" (for the most powerful models; see the back-of-envelope sketch after this thread)

1/6
or

3) underestimate the cost of investing in data infrastructure and tooling

(have you seen why Databricks exists?)

2/6
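For anyone who hasn't actually priced this out, here's the flavor of the back-of-envelope referenced in (2). Every number below is a placeholder assumption, not a quote from any cloud provider, and it ignores storage, networking, failed runs, and the data/tooling costs in (3):

```python
# Back-of-envelope only -- all numbers are assumptions for illustration.
gpus         = 1024   # accelerators in the training cluster (assumed)
price_per_hr = 2.0    # $/GPU-hour on-demand (assumed; varies widely by provider)
days         = 30     # wall-clock training time (assumed)

compute_cost = gpus * price_per_hr * 24 * days
print(f"training compute alone: ${compute_cost:,.0f}")  # ~$1.5M under these assumptions
```

Even with these made-up numbers, the compute line alone lands in the millions before you count any of the infrastructure in (1) or the data work in (3).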
Jan 22, 2023 8 tweets 5 min read
Piling on to the pile-on (sorry - it's always easy to criticize 😛), here's a rant about benchmarks for LLMs that are used to back claims of "stronger" or "better" models.

Let's start with a tour through GPT-3's Appendix G... 1/8

First up: BoolQ. If you download the actual benchmark, the completions are true/false. GPT-3 swaps in yes/no instead. Why? Well, when we did the same swap to yes/no, we saw a +10% accuracy jump on this benchmark.

Wonderful. Clearly on track for a better model already. 2/8
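To make the mechanics of that swap concrete, here's a minimal sketch of how a zero-shot BoolQ eval typically works: both label strings are scored by the model's log-likelihood and the higher-scoring one wins. The `loglikelihood` stub, prompt template, and `boolq_accuracy` helper are hypothetical stand-ins, not the actual GPT-3 or OPT eval code:

```python
# Hypothetical eval sketch: score each candidate completion by log-likelihood
# and predict the label whose string scores higher.

def loglikelihood(prompt: str, completion: str) -> float:
    """Return the model's total log-probability of `completion` given `prompt`."""
    raise NotImplementedError  # plug in your model / eval harness here

def boolq_accuracy(examples, verbalizer=("yes", "no")):
    """Accuracy on BoolQ when the two label strings are `verbalizer`.

    Each example is (passage, question, label) with label True/False.
    """
    pos, neg = verbalizer
    correct = 0
    for passage, question, label in examples:
        prompt = f"{passage}\nquestion: {question}?\nanswer:"
        pred_true = loglikelihood(prompt, f" {pos}") > loglikelihood(prompt, f" {neg}")
        correct += (pred_true == label)
    return correct / len(examples)

# The point of the thread: only the verbalizer changes, yet the number moves.
# acc_tf = boolq_accuracy(examples, verbalizer=("true", "false"))
# acc_yn = boolq_accuracy(examples, verbalizer=("yes", "no"))  # reportedly ~+10% here
```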
Jan 21, 2023 7 tweets 3 min read
After ignoring the details in all these "let's-fit-a-cloud-of-points-to-a-single-line" papers (all likely wrong when you really extrapolate), @stephenroller finally convinced me to work through the math in the Chinchilla paper, and as expected, this was a doozy. [1/7]

First thing to make me eye-roll a bit was this fancy equation (4), which seems to re-parameterize the key exponent terms (a, b) in terms of (alpha, beta) to define a coefficient term G. Why this level of indirection just to define a scalar coefficient? No idea. [2/7]
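For context, here is the relation in question, roughly as it appears in the paper (reconstructed from memory of the Chinchilla setup, so treat it as a paraphrase rather than a verbatim copy of equation (4)):

```latex
% Chinchilla-style parametric loss fit (their equation (2)):
%   L(N, D) = E + A / N^{\alpha} + B / D^{\beta}
% and the compute-optimal allocation under C \approx 6ND (their equation (4)):
N_{\mathrm{opt}}(C) = G \left(\tfrac{C}{6}\right)^{a}, \qquad
D_{\mathrm{opt}}(C) = G^{-1} \left(\tfrac{C}{6}\right)^{b},
\qquad \text{where} \qquad
G = \left(\frac{\alpha A}{\beta B}\right)^{\frac{1}{\alpha+\beta}}, \quad
a = \frac{\beta}{\alpha+\beta}, \quad
b = \frac{\alpha}{\alpha+\beta}.
```

The gripe, in other words: G is just a scalar built from the fitted (A, B, alpha, beta), so routing its definition through the (a, b) re-parameterization reads as indirection.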