Latest Twitter Threads by @suchenzang on Thread Reader App

Jun 9, 2023 • 6 tweets • 2 min read

There seems to be a high correlation between folks who think "open source LLMs will win" & folks who

1) haven't developed any large-scale infra "in the cloud" themselves

or

2) have never priced out the cost of scaling-out "in the cloud" (for the most powerful models)

1/6

or

3) underestimate the cost of investing in data infrastructure and tooling

(have you seen why Databricks exists?)

2/6

Jan 22, 2023 • 8 tweets • 5 min read

Piling on to the pile-on (sorry - it's always easy to criticize 😛), here's a rant about benchmarks for LLMs that are used to back claims of "stronger" or "better" models.

Let's start with a tour through GPT-3's Appendix G... 1/8

https://twitter.com/drjwrae/status/1617033514037411847

First up: BoolQ. If you download the actual benchmark, it's true/false completions. GPT-3 swaps in yes/no instead. Why? Well when we did the same swap to yes/no, we saw a +10% accuracy jump on this benchmark.

Wonderful. Clearly on track for a better model already. 2/8
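To make the label swap concrete, here's a minimal sketch of a harness where the answer words are a parameter. The prompt template, the `boolq_accuracy` name, and the `score(prompt, completion)` callable are all hypothetical stand-ins (not GPT-3's actual setup); `score` is assumed to return some model log-likelihood for the completion.

```python
def boolq_accuracy(examples, score, labels=("yes", "no")):
    """Accuracy on BoolQ-style examples for a given pair of answer words.

    examples: iterable of (passage, question, answer) with answer a bool.
    score:    hypothetical callable (prompt, completion) -> float, standing
              in for a model log-likelihood call.
    labels:   (positive_word, negative_word) used as the completions.
    """
    positive, negative = labels
    examples = list(examples)
    correct = 0
    for passage, question, answer in examples:
        # Assumed prompt template, for illustration only.
        prompt = f"{passage}\nquestion: {question}?\nanswer:"
        # Predict True when the positive word scores at least as high.
        predicted = score(prompt, " " + positive) >= score(prompt, " " + negative)
        correct += int(predicted == answer)
    return correct / len(examples)
```

Running this twice with the same `score`, once with `labels=("yes", "no")` and once with `labels=("true", "false")`, is the comparison in question: same model, same data, different number.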

Jan 21, 2023 • 7 tweets • 3 min read

After ignoring the details in all these "let's-fit-a-cloud-of-points-to-a-single-line" papers (all likely wrong when you really extrapolate), @stephenroller finally convinced me to work through the math in the Chinchilla paper, and as expected, this was a doozy. [1/7]

https://twitter.com/stephenroller/status/1616605686141435905

First thing to make me eye-roll a bit was this fancy equation (4) that seems to re-parameterize the key exponent terms (a,b) into (alpha,beta) to define a coefficient term G. Why this level of indirection just to define a scalar-coefficient? No idea. [2/7]
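For reference, a small sketch of what that re-parameterization does, assuming the paper's parametric loss L(N, D) = E + A/N^alpha + B/D^beta and the FLOPs approximation C ≈ 6·N·D. The default constants are the fitted values as reported in the paper; treat them as assumptions, and `chinchilla_optimum` is just an illustrative name.

```python
def chinchilla_optimum(C, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Closed-form minimizer of A/N**alpha + B/D**beta subject to 6*N*D = C.

    Returns (a, b, G, n_opt, d_opt), where
      n_opt = G * (C/6)**a   and   d_opt = (1/G) * (C/6)**b.
    """
    a = beta / (alpha + beta)    # exponent on compute for parameters N
    b = alpha / (alpha + beta)   # exponent on compute for tokens D
    # G is the scalar coefficient built from (alpha, beta, A, B).
    G = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    n_opt = G * (C / 6.0) ** a
    d_opt = (1.0 / G) * (C / 6.0) ** b
    return a, b, G, n_opt, d_opt


a, b, G, n_opt, d_opt = chinchilla_optimum(C=1e21)
print(f"a={a:.3f} b={b:.3f} G={G:.3f} N_opt={n_opt:.3e} D_opt={d_opt:.3e}")
```

So (a, b) are fully determined by (alpha, beta) and always sum to 1, while G only shifts how a fixed budget is split between N and D.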
