There seems to be a high correlation between folks who think "open source LLMs will win" & folks who
1) haven't developed any large-scale infra "in the cloud" themselves
or
2) have never priced out the cost of scaling out "in the cloud" (for the most powerful models)
1/6
or
3) underestimate the cost of investing in data infrastructure and tooling
(have you seen why Databricks exists?)
2/6
OSS is needed to justify the value of continuing to push the limits of scale (Sutton's bitter lesson), by enabling quick prototypes and demos of possible applications.
But no single software library will solve the manual integration glue needed with existing systems, and...
3/6
Piling on to the pile-on (sorry - it's always easy to criticize 😛), here's a rant about benchmarks for LLMs that are used to back claims of "stronger" or "better" models.
Let's start with a tour through GPT-3's Appendix G... 1/8
First up: BoolQ. If you download the actual benchmark, it's true/false completions. GPT-3 swaps in yes/no instead. Why? Well, when we did the same swap to yes/no, we saw a +10% accuracy jump on this benchmark.
Wonderful. Clearly on track for a better model already. 2/8
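(For context: benchmarks like BoolQ are usually scored by comparing the log-likelihood a model assigns to each candidate completion, so swapping the answer words changes the comparison itself. Here's a minimal sketch of that scoring, with a stand-in model and a made-up prompt; none of this is GPT-3's actual eval harness.)

```python
# Minimal sketch: score candidate completions by summed token log-likelihood.
# "gpt2" and the prompt below are placeholders, not what GPT-3 actually used.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of log-probs the model assigns to `completion` given `prompt`.
    Assumes the prompt's tokens form a prefix of the prompt+completion tokens
    (generally holds for BPE when the completion starts with a space)."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # position i predicts token i+1, so shift by one
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    return sum(logprobs[i, full_ids[0, i + 1]].item()
               for i in range(prompt_len - 1, full_ids.shape[1] - 1))

prompt = "Is the sky blue?\nanswer:"  # made-up BoolQ-style prompt
for pair in ([" yes", " no"], [" true", " false"]):
    scores = {c: completion_logprob(prompt, c) for c in pair}
    print(pair, "->", max(scores, key=scores.get))
```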
Next up: formatting. Why does CB get prompted with true/false while RTE gets True/False?
Why does WebQA use "Q/A", WiC use "question/answer", and ARC use "Question/Answer"?
Could it be... that you simply get better results switching it up? 🤔
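Here's what "switching it up" looks like in practice: sweep a few arbitrary format choices and keep whichever scores best. The templates and the one-row dataset below are hypothetical, and the sketch reuses completion_logprob from above.

```python
# Toy sweep over arbitrary prompt-format choices (prefix casing, answer words).
# Everything here is a hypothetical stand-in, just to show how a format sweep works.
FORMATS = [
    ("Q: {q}\nA:", (" yes", " no")),
    ("question: {q}\nanswer:", (" yes", " no")),
    ("Question: {q}\nAnswer:", (" True", " False")),
]

dataset = [{"q": "Is the sky blue?", "label": 0}]  # label 0 = first completion

for template, completions in FORMATS:
    correct = 0
    for row in dataset:
        prompt = template.format(q=row["q"])
        scores = [completion_logprob(prompt, c) for c in completions]
        correct += int(scores.index(max(scores)) == row["label"])
    print(f"{template!r:<30} acc = {correct / len(dataset):.2f}")
```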
After ignoring the details in all these "let's-fit-a-cloud-of-points-to-a-single-line" papers (all likely wrong when you really extrapolate), @stephenroller finally convinced me to work through the math in the Chinchilla paper and, as expected, it was a doozy. [1/7]
First thing to make me eye-roll a bit was this fancy equation (4), which re-parameterizes the key exponent terms (a, b) in terms of (alpha, beta) to define a coefficient term G. Why this level of indirection just to define a scalar coefficient? No idea. [2/7]
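For reference, equation (4) reads roughly as below (reproduced so the (a, b) vs (alpha, beta) indirection is visible; see the paper for the exact statement):

```latex
% Compute-optimal allocation from fitting L(N, D) = E + A/N^alpha + B/D^beta
% under the constraint C ~= 6ND (Chinchilla Eq. (4)):
\begin{align*}
  N_{opt}(C) &= G \left(\frac{C}{6}\right)^{a}, \qquad
  D_{opt}(C) = G^{-1} \left(\frac{C}{6}\right)^{b}, \\
  \text{where } G &= \left(\frac{\alpha A}{\beta B}\right)^{\frac{1}{\alpha + \beta}}, \quad
  a = \frac{\beta}{\alpha + \beta}, \quad
  b = \frac{\alpha}{\alpha + \beta}.
\end{align*}
```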
So then you naturally start wondering what A/B/a/b could be. First stop: (a, b) is set to different values for 3 different "Approaches" in Table 2, each seeming to differ by just a hair: (0.5, 0.5) vs (0.49, 0.51) vs (0.46, 0.54). Ok, sure, why not. [3/7]
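Those hair's-breadth differences stop being small once you extrapolate, though. A quick back-of-the-envelope where only the exponents come from Table 2 (the million-fold compute multiplier and the shared starting point are made up):

```python
# How far the three Table 2 exponent pairs drift apart under extrapolation.
# Only the exponents are from the paper; the 1e6x compute multiplier is arbitrary,
# and all approaches are anchored to agree at the starting compute budget.
APPROACHES = {"Approach 1": (0.50, 0.50),
              "Approach 2": (0.49, 0.51),
              "Approach 3": (0.46, 0.54)}

SCALE = 1e6  # extrapolate a million-fold in compute

for name, (a, b) in APPROACHES.items():
    n_growth = SCALE ** a  # multiplier on N_opt relative to the starting point
    d_growth = SCALE ** b  # multiplier on D_opt relative to the starting point
    print(f"{name}: N_opt x{n_growth:,.0f}, D_opt x{d_growth:,.0f}, "
          f"tokens-per-param ratio shifts x{d_growth / n_growth:.2f}")
```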