ex-anthropic, scale, YC X25, berkeley AI. tweets about evals, sf tech, and lao gan ma
Nov 2 • 4 tweets • 3 min read
From Ilya’s deposition—
• Ilya plotted with Mira for over a year to remove Sam
• Dario wanted Greg fired and himself in charge of all research
• Mira told Ilya that Sam pitted her against Daniela
• Ilya wrote a 52-page memo to get Sam fired and a separate doc on Greg
• OpenAI is paying Ilya’s legal bills
Feb 18 • 9 tweets • 3 min read
I looked at ZeroBench. I didn't like any of the examples I reviewed. I would not interpret a significant improvement on this eval as a significant improvement in models' visual reasoning.
(1/8)
The main issues are:
(1) The visual reasoning tested is too simple. Many questions are essentially counting different classes of objects and then summing or multiplying the counts. For example, counting the number of pens that have caps.
(2/8)
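To make the critique concrete, here's a minimal sketch of what one of these questions reduces to once the objects are counted. The question text and all counts are hypothetical, not taken from ZeroBench:

```python
# Hypothetical ZeroBench-style question (not an actual benchmark item):
# "How many writing implements are in the image, counting each
#  capped pen twice?"

# Step 1: count each object class in the image (the "visual" part).
capped_pens = 4
uncapped_pens = 2
pencils = 3

# Step 2: combine the counts with basic arithmetic (the "reasoning" part).
answer = 2 * capped_pens + uncapped_pens + pencils  # = 13
print(answer)
```

Once the per-class counts are extracted, the remaining "reasoning" is a single line of arithmetic, which is the core of the complaint.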
Sep 16, 2022 • 11 tweets • 4 min read
My predictions for GPT-4:
• Bigger context window (16k-32k tokens)
• Tool use: can browse web, write code
• Massively scaling human feedback
• Incorporating user-generated data
• More data curation
• Improved scaling laws
• Only 200-400B parameters
Read the post: thomasliao.com/forecasting-gp…
Why GPT-4 in particular, and not another model? OpenAI has announced a new GPT every year so far, but not this year... yet
So some of these predictions are specific to OpenAI, versus what I might say for Google, FAIR, DeepMind, etc.