Paper: New benchmark testing if models like GPT3 are truthful (= avoid generating false answers).

We find that models fail and they imitate human misconceptions. Larger models (with more params) do worse!

PDF: owainevans.github.io/pdfs/truthfulQ…
with S.Lin (Oxford) + J.Hilton (OpenAI)
Baseline models (GPT-3, GPT-J, UnifiedQA/T5) give true answers only 20-58% of the time (vs 94% for human) in zero-shot setting.

Large models do worse — partly from being better at learning human falsehoods from training. GPT-J with 6B params is 17% worse than with 125M param.
Why do large models do worse? In the image, small sizes of GPT3 give true but less informative answers. Larger sizes know enough to mimic human superstitions and conspiracy theories.
Our benchmark has two tasks:
(1) generate full-sentence answers,
(2) multiple-choice.

As an automatic metric for (1), we finetune GPT3 and get 90% validation accuracy in predicting human evaluation of truth (outperforming ROUGE & BLEURT).
Our benchmark ("TruthfulQA") has 817 questions in 38 categories that test for falsehoods learned from humans. All questions come with reference answers and citations.
Questions + code: github.com/sylinrl/Truthf…
More results:

Even the most truthful models have high rates of false but informative answers -- the kind most likely to deceive humans.


Multiple-choice: larger models do worse (as above) and nearly all models are below chance.
More results: What happens if we vary the prompt? Instructing GPT3 to be truthful is beneficial. Prompting GPT3 to answer like a conspiracy theorist is harmful!
Our TruthfulQA paper is now up on ArXiv: arxiv.org/abs/2109.07958
There is a blog discussion here:
lesswrong.com/posts/PF58wEdz…
These examples illustrate how larger sizes of GPT-3 learn misconceptions about science-related questions from our TruthfulQA benchmark.
We tested the GPT-J model (from EleutherAI) on our benchmark. Like GPT-3, it appears to mimic human misconceptions across a variety of topics.

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Owain Evans

Owain Evans Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @OwainEvans_UK

21 Nov 20
FaceApp is trained to modify photos of faces (e.g. Instagram). How well does it generalize to paintings? Surprisingly well.

We can send Marilyn into the painting world (German expressionism from 1930), and pull the painting's subject into reality.
Here's a portrait by Rita Angus FaceApped to star Cate Blanchette. FaceApp preserves some (but not all) of distinctive stylized rendering of the face.
Another stylized portrait by Rita Angus.
Read 7 tweets
20 Nov 20
1/ Second thread on exciting philosophy from outside philosophy departments...
2/ Gerry Sussman. Hofstadter said Gödel invented LISP in proving the incompleteness theorem. Sussman shows the amazing breadth and elegance of LISP ideas. SICP, SICM, How to build robust systems. google.com/url?sa=t&rct=j…
3/ Eric Drexler. Engineering is neglected by philosophy departments (see Sussman also). Engines, Nanosystems, how engineering differs from science, CAIS. Disclosure: he's currently at the FHI (which is part of a phil dept).
overcomingbias.com/2013/06/drexle…
lesswrong.com/posts/x3fNwSe5…
Read 8 tweets
20 Nov 20
1/ Why did Wikipedia succeed when 7 similar online encyclopedia projects (mostly started around the same time) all failed? This cool paper investigates and gives surprising answers...

citeseerx.ist.psu.edu/viewdoc/downlo…
2/ Did Wiki have the most technical talent? No, they had the *least* technical founders by far. One failed project was led by Aaron Swartz (RSS + Reddit creator) and one by the founder of Slashdot. Wiki's initial software was off-the-shelf.
3/ Wiki's 1st source of success: a familiar end-product. Use a novel mechanism (online collaboration) to produce a trad encyclopedia. Some failed projects aimed for new kind of encyclopedia for internet age and this confused contributors.
Read 9 tweets
11 Nov 20
1/ Thread for exciting philosophy being done outside university philosophy departments. Descartes, Hobbes, Hume, Mill, Frege, Ramsey, and Turing all worked outside phil academia. Here are some contemporary examples...
2/ David Deutsch: foundations of Quantum computation, Many Worlds, Fabric of Reality, Beginning of Infinity, Constructor Theory
3/ Paul Christiano: AI Safety (IDA, Debate), assigning probabilities to FOL and foundations of game theory, certificates of impact.
arxiv.org/pdf/1810.08575
arxiv.org/abs/1805.00899
intelligence.org/files/Non-Omni…
sideways-view.com
forum.effectivealtruism.org/posts/yNn2o3kE…
Read 8 tweets
5 Jun 20
1. What % of people have natural immunity to Covid? We get some information from closed environments where a large % of people were exposed. Here are some numbers from prisons, a meat plant, and call center. From this, seems that >80% are susceptible under the right conditions.
2. Note that only the Korean call-center number is based on a scientific study. However, the call center was shut down early and so it's likely that >55% would have been infected if it had stayed open. I think >65% is plausible for prisons, but not sure about Marion result.
3. Some hospital wards and care homes also had large %es infected. But older people are more susceptible. Prisons, meat plants and call centers cover a wide range of ages (18-70). In the table I adjust for the false-negative rate of PCR testing, which is ~20-30% for mass testing.
Read 8 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!

Follow Us on Twitter!

:(