Thread by Eliezer Yudkowsky
So I don't want to sound alarms prematurely here, but we could possibly be looking at the first case of an AI pretending to be stupider than it is. In this example, GPT-3 apparently fails to learn/understand how to detect balanced sets of parentheses. (1/10.)
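For reference, the check John was being taught is mechanically trivial: scan the string left to right and track nesting depth. A minimal sketch in Python (the function name and interface are mine, not anything from the thread):

```python
def is_balanced(s: str) -> bool:
    """Return True if every '(' in s is closed by a matching ')'."""
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:   # a ')' arrived before any matching '('
                return False
    return depth == 0       # every '(' was eventually closed
```

E.g. is_balanced("(()())") returns True, while is_balanced("())(") returns False because the depth goes negative at the stray ')'.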
Now, it's possible that GPT-3 "legitimately" did not understand this concept, even though GPT-3 can, in other contexts, seemingly write code or multiply 5-digit numbers. But it's also possible that GPT-3, playing the role of John, predicted that *John* wouldn't learn it.
It's tempting to anthropomorphize GPT-3 as trying its hardest to make John smart. That's what we want GPT-3 to do, right? But what GPT-3 actually does is predict text continuations. If *you* saw John say all that - would you *predict* the next lines would show John succeeding?
So it could be that GPT-3 straight-up can't recognize balanced parentheses. Or it could be that GPT-3 could recognize them given a different prompt. Or it could be that the cognition inside GPT-3 does see the pattern, but play-acts the part of 'John' getting it wrong.
The scariest feature of this whole incident? We have no idea if that happened. Nobody has any idea what GPT-3 is 'thinking'. We have no idea whether this run of GPT-3 contained a more intelligent cognition that faked a less intelligent cognition.
Now, I *could* be wrong about that last part! @openAI could be storing a record of all inputs and randseeds used in GPT-3 instances, so that they can reconstruct any interesting runs. And though it seems less likely, @openAI could somehow have some idea of what a GPT-3 is thinking.
So I hereby offer a $1000 bounty - which I expect to go unclaimed - if @openAI has any means to tell us definitively whether GPT-3 was 'deliberately' sandbagging its attempt to recognize balanced parentheses, in that particular run of the AI Dungeon. With an exception for...
...answering merely by showing that, despite a lot of other attempts at prompting under more flexible circumstances, GPT-3 could not learn to balance parentheses as complicated as those tried by Breitman. (Which does answer the question, but in a less interesting way.)
If @openAI can't claim that bounty, I encourage them to develop tools for recording inputs, recording randseeds, and making sure all runs of GPTs are exactly reproducible; and much more importantly and difficultly, getting greater internal transparency into future AI processes.
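As a rough sketch of what "recording inputs and randseeds" could look like in practice: the wrapper below seeds all sampling randomness from a single integer and appends everything needed to replay the run to a log. Here generate_fn is a hypothetical stand-in for the model's sampling loop, not an actual @openAI interface, and a real stack would also have to pin the model weights/version and make GPU kernels deterministic:

```python
import json
import random
import time

def reproducible_run(generate_fn, prompt, seed=None, log_path="runs.jsonl"):
    """Run a (hypothetical) text generator under a fixed seed and
    append everything needed to replay the run to a JSONL log."""
    if seed is None:
        seed = random.SystemRandom().randrange(2**32)
    rng = random.Random(seed)        # all sampling randomness flows from here
    completion = generate_fn(prompt, rng)
    with open(log_path, "a") as f:
        f.write(json.dumps({
            "time": time.time(),
            "seed": seed,
            "prompt": prompt,
            "completion": completion,
        }) + "\n")
    return completion
```

Replaying a logged (prompt, seed) pair through the same generate_fn should then reproduce the run bit-for-bit, which is the precondition for going back and inspecting an interesting run like this one.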
Regardless, I unironically congratulate @openAI on demonstrating something that could plausibly be an alignment failure of this extremely-important-in-general type, thereby sharply highlighting the also-important fact that now we have no idea whether that really happened. (END.)