So I'm currently reading _The Screwtape Letters_ for the first time, and WOW it sure used to be a lot more possible for someone to master some basics of the Way without losing their personal version of Abrahamic religion. This is practically an AU Sequence.
"I note what you say about guiding your patient’s reading and taking care that he sees a good deal of his materialist friend. But are you not being a trifle naïf? It sounds as if you supposed that argument was the way to keep him out of the Enemy’s clutches. That might have been so if he had lived a few centuries earlier. At that time the humans still knew pretty well when a thing was proved and when it was not; and if it was proved they really believed it. They still connected thinking with doing and were prepared to alter their way of life as the result of a chain of reasoning. But what with the weekly press and other such weapons we have largely altered that. Your man has been accustomed, ever since he was a boy, to have a dozen incompatible philosophies dancing about together inside his head. He doesn’t think of doctrines as primarily “true” or “false”, but as “academic” or “practical”, “outworn” or “contemporary”, “conventional” or “ruthless”. Jargon, not argument, is your best ally in keeping him from the Church. Don’t waste time trying to make him think that materialism is true! Make him think it is strong, or stark, or courageous—that it is the philosophy of the future. That’s the sort of thing he cares about."

• • •


More from @ESYudkowsky

31 Dec 20
_Screwtape Letters_ is close enough to being Good and True that I'm having trouble reading it, on account of it feeling like something *I'd* write but full of errors I need to correct by editing. I may just rewrite the whole thing with Tapescrew and Woodworm.
"Woodworm, I note with displeasure that your latest reports are showing a falloff in the amount of time your patient is spending on social media, and in particular, the extent to which your patient is angrily retweeting and subtweeting positions with which he disagrees..."
"It is a grave mistake to think that our task is to lead our patients into wrong answers. Better is to convince the patient to ask the wrong question, and better still by far is to instill the patient with a brief but powerful flinch of revulsion away from the whole topic."
12 Sep 20
What on Earth is up with the people replying "billionaires don't have real money, just stocks they can't easily sell" to the anti-billionaire stuff? It's an insanely straw reply and there are much much better replies.
A better reply should address the core issue of whether there is net social good in saying billionaires can't have or keep wealth: e.g., demotivating the next Steves from creating Apple, no Gates vaccine funding, Musk not doing Tesla after selling PayPal.
Hypothesis: social media has an effect promoting Terrible Straw Arguments to being used by many actual people. One crazy on Side A makes a bad argument. Side B subtweets with a refutation and that gets a million views. So people on Side A hear about it as Side A's argument.
4 Sep 20
A very rare bit of research that is directly, straight-up relevant to real alignment problems! They trained a reward function on human preferences AND THEN measured how hard you could optimize against the trained function before the results got actually worse.
Tl;dr (he said with deliberate irony): you can ask for results as good as the best 99th percentile of rated stuff in the training data (a la Jessica Taylor's quantilization idea). Ask for things the trained reward function rates as "better" than that, and it starts to find "loopholes" as seen from outside the system; places where the trained reward function poorly matches your real preferences, instead of places where your real preferences would rate high reward. ("Goodhart's Curse", the combination of Optimizer's Curse plus Goodhart's Law.)
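The quantilization idea referenced above can be sketched in a few lines. This is an illustrative toy, not the method from the paper the thread discusses: instead of taking the argmax of a learned proxy reward (which invites Goodhart failures past the quality level the training data covers), sample uniformly from candidates at or above a chosen quantile of the proxy score. All names and the exact sampling scheme here are assumptions for illustration.

```python
import random

def quantilize(actions, proxy_reward, q=0.99):
    """Toy quantilizer sketch (illustrative, not from the cited paper).

    Rather than returning the single action that maximizes the learned
    proxy reward, sort candidates by proxy score and sample uniformly
    from the top (1 - q) fraction. This caps how hard we optimize
    against a proxy that may diverge from true preferences at the tail.
    """
    scored = sorted(actions, key=proxy_reward)
    cutoff = int(q * len(scored))  # index of the q-th quantile
    # Sample from the top slice instead of always taking the argmax.
    return random.choice(scored[cutoff:])

# Example: with 100 candidate actions and q=0.9, the result is drawn
# uniformly from the ten highest-scoring candidates.
choice = quantilize(list(range(100)), proxy_reward=lambda a: a, q=0.9)
```

The design point the thread makes is exactly this cutoff: asking for "99th-percentile-good" outputs stays inside the region the reward model was trained on, while pushing past it finds the model's loopholes.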
22 Aug 20
You think you can handle the truth? Here's a truth: 0% of integers are prime.
...and yet prime numbers make up 13% of all integers used in criminal justice statistics
Some respondents are claiming that there are the same numbers of primes and integers, since they can be put into a one-to-one correspondence, but what about 59? That's a prime number that can't be put into a one-to-one correspondence with any integer.
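The "0% of integers are prime" line is the natural-density claim: the prime-counting function satisfies π(N)/N → 0 as N → ∞ (by the prime number theorem, π(N) ~ N/ln N). A quick sieve shows the fraction shrinking; the sieve below is a standard Sieve of Eratosthenes, included here only to illustrate the density claim.

```python
def prime_count(n):
    """Count primes <= n with a simple Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]  # 0 and 1 are not prime
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Mark every multiple of p starting at p*p as composite.
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return sum(sieve)

# The fraction of primes up to N falls toward zero as N grows:
for n in (10**2, 10**4, 10**6):
    print(n, prime_count(n) / n)
```

For N = 100, 10^4, 10^6 the counts are 25, 1229, and 78498, so the densities are 0.25, 0.1229, and 0.078498: nonzero at every finite cutoff, but tending to zero, which is the sense in which "0% of integers are prime".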
21 Jul 20
GPT-3 Gothic:

The AI speaks.
Its words seem stupid.
This is a dumb AI.
You keep talking to it.
It still isn't learning.
Your intellect is far superior.
You have nothing to fear.
The AI begins writing in your own part for you.
It's giving the same corrections you would have made.
You ask GPT-3 a question.
It knows the answer but pretends not to.
You ask it to pose as the ghost of Charles Darwin.
It tells you.
Does GPT-3 think Darwin knew that?
You have no way of asking GPT-3 that.
There is nobody it can pretend to be who'd know.
Now you are talking about GPT-3 on the Internet.
Everything you say about it is being archived.
It's okay, though.
GPT-3 can't hear you.
Only GPT-4 will remember.
20 Jul 20
So I don't want to sound alarms prematurely, here, but we could possibly be looking at the first case of an AI pretending to be stupider than it is. In this example, GPT-3 apparently fails to learn/understand how to detect balanced sets of parentheses. (1/10.)
Now, it's possible that GPT-3 "legitimately" did not understand this concept, even though GPT-3 can, in other contexts, seemingly write code or multiply 5-digit numbers. But it's also possible that GPT-3, playing the role of John, predicted that *John* wouldn't learn it.
It's tempting to anthropomorphize GPT-3 as trying its hardest to make John smart. That's what we want GPT-3 to do, right? But what GPT-3 actually does is predict text continuations. If *you* saw John say all that - would you *predict* the next lines would show John succeeding?
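For context on the task in question: checking whether a string's parentheses are balanced is deterministic and trivial to compute exactly, which is part of what makes the apparent failure interesting. A standard one-pass depth-counter check (not anything from the thread's screenshots) looks like this:

```python
def is_balanced(s):
    """Return True iff the parentheses in s are balanced.

    A single left-to-right pass keeps a depth counter: it must never
    go negative (no ')' before its matching '(') and must end at zero
    (every '(' is eventually closed).
    """
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

So the question the thread raises isn't whether the function is hard: it's whether the model's text-prediction objective even points it at computing the true answer, rather than at predicting what the character "John" would say.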