toucan
ex-anthropic, scale, YC X25, berkeley AI. tweets about evals, sf tech, and lao gan ma
Nov 2 4 tweets 3 min read
From Ilya’s deposition—

• Ilya plotted over a year with Mira to remove Sam
• Dario wanted Greg fired and himself in charge of all research
• Mira told Ilya that Sam pitted her against Daniela
• Ilya wrote a 52 page memo to get Sam fired and a separate doc on Greg
• OpenAI is paying Ilya’s legal bills
Feb 18 9 tweets 3 min read
I looked at ZeroBench. I didn't like any of the examples I looked at. I would not interpret a significant improvement on this eval as a significant improvement in models' visual reasoning.

(1/8)

The main issues are:

(1) The visual reasoning tested is too simple. Many questions are essentially counting different classes of objects and then summing or multiplying the counts. For example, counting the number of pens that have caps.

[2/8]
Sep 16, 2022 11 tweets 4 min read
My predictions for GPT-4:

• Bigger context window (16k-32k)
• Tool use: can browse web, write code
• Massively scaling human feedback
• Incorporating user-generated data
• More data curation
• Improved scaling laws
• Only 200-400B

Read the post: thomasliao.com/forecasting-gp…

Why GPT-4 in particular, not another model? OpenAI's announced a new GPT every year so far, but not this year... yet

So some predictions are more specific to OpenAI versus what I might say for Google, FAIR, DeepMind, etc.