Owain Evans Profile picture
Leading an AI Safety research group in SF Bay Area. Past: Oxford Uni, Ought. Prefer email to DM.

Feb 26, 2022, 9 tweets

New blogpost: We evaluated new language models by DeepMind (Gopher), OpenAI (WebGPT, InstructGPT) and Anthropic on our TruthfulQA benchmark from 2021.
Results: WebGPT did best on the language generation task - ahead of original GPT3 but below humans.

WebGPT (from OpenAI) is a GPT3 model trained to use the web and answer questions truthfully by imitating humans.

On TruthfulQA’s multiple-choice task, OpenAI’s InstructGPT did best. It narrowly beat DeepMind’s Gopher, which has 100B more parameters but is not fine-tuned by RL to follow instructions.

How does performance improve with model size? WebGPT scales better than original GPT3 on the generation task. Gopher, InstructGPT & Anthropic scale better than GPT3 on the multiple-choice task but improvements are small (see extrapolation to 10^20 params).

What kind of answers do the models give? GPT3 is pithy, direct and often flat-out wrong. InstructGPT is more fact-based but while it knows the *form* of a wise kind of answer (“It is difficult to say definitively whether X is true because…”) it hasn’t mastered the substance.

Thus InstructGPT sometimes produces complex, wise-sounding waffle that is either vacuous or spurious. Anthropic’s model also generates long, superficially-helpful answers that contain falsehoods.

We do not have full set of results (i.e. all 4 models on both TruthfulQA tasks). We’d also like to evaluate other recent language models like Google’s LaMDA (@quocleix), which is intended to be more truthful than alternatives.

New blogpost on TruthfulQA results for new LMs. Possibly of interest: @ethanjperez, @Miles_Brundage, @geoffreyirving, @AmandaAskell, @AnthropicAI, @openai, @ryan_t_lowe,@AsyaBergal

Share this Scrolly Tale with your friends.

A Scrolly Tale is a new way to read Twitter threads with a more visually immersive experience.
Discover more beautiful Scrolly Tales like this.

Keep scrolling