Thread by Eliezer Yudkowsky
So I don't want to sound alarms prematurely here, but we could possibly be looking at the first case of an AI pretending to be stupider than it is. In this example, GPT-3 apparently fails to learn/understand how to detect balanced sets of parentheses. (1/10.)
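For reference, the check John was being taught is mechanically trivial: scan the string left to right and track nesting depth. A minimal sketch in Python (the function name and interface are mine, not anything from the thread):

```python
def is_balanced(s: str) -> bool:
    """Return True if every '(' in s is closed by a matching ')'."""
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:   # a ')' arrived before any matching '('
                return False
    return depth == 0       # every '(' was eventually closed
```

E.g. is_balanced("(()())") returns True, while is_balanced("())(") returns False because the depth goes negative at the stray ')'.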
Now, it's possible that GPT-3 "legitimately" did not understand this concept, even though GPT-3 can, in other contexts, seemingly write code or multiply 5-digit numbers. But it's also possible that GPT-3, playing the role of John, predicted that *John* wouldn't learn it.
It's tempting to anthropomorphize GPT-3 as trying its hardest to make John smart. That's what we want GPT-3 to do, right? But what GPT-3 actually does is predict text continuations. If *you* saw John say all that - would you *predict* the next lines would show John succeeding?
So it could be that GPT-3 straight-up can't recognize balanced parentheses. Or it could be that GPT-3 could recognize them given a different prompt. Or it could be that the cognition inside GPT-3 does see the pattern, but play-acts the part of 'John' getting it wrong.
The scariest feature of this whole incident? We have no idea if that happened. Nobody has any idea what GPT-3 is 'thinking'. We have no idea whether this run of GPT-3 contained a more intelligent cognition that faked a less intelligent cognition.
Now, I *could* be wrong about that last part! @openAI could be storing a record of all inputs and randseeds used in GPT-3 instances, so that they can reconstruct any interesting runs. And though it seems less likely, @openAI could somehow have some idea of what a GPT-3 is thinking.
So I hereby offer a $1000 bounty - which I expect to go unclaimed - if @openAI has any means to tell us definitively whether GPT-3 was 'deliberately' sandbagging its attempt to recognize balanced parentheses, in that particular run of the AI Dungeon. With an exception for...
...answering merely by showing that, despite a lot of other attempts at prompting under more flexible circumstances, GPT-3 could not learn to balance parentheses as complicated as those tried by Breitman. (Which does answer the question, but in a less interesting way.)
If @openAI can't claim that bounty, I encourage them to develop tools for recording inputs, recording randseeds, and making sure all runs of GPTs are exactly reproducible; and much more importantly and difficultly, getting greater internal transparency into future AI processes.
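As a rough sketch of what "recording inputs and randseeds" could look like in practice: the wrapper below seeds all sampling randomness from a single integer and appends everything needed to replay the run to a log. Here generate_fn is a hypothetical stand-in for the model's sampling loop, not an actual @openAI interface, and a real stack would also have to pin the model weights/version and make GPU kernels deterministic:

```python
import json
import random
import time

def reproducible_run(generate_fn, prompt, seed=None, log_path="runs.jsonl"):
    """Run a (hypothetical) text generator under a fixed seed and
    append everything needed to replay the run to a JSONL log."""
    if seed is None:
        seed = random.SystemRandom().randrange(2**32)
    rng = random.Random(seed)        # all sampling randomness flows from here
    completion = generate_fn(prompt, rng)
    with open(log_path, "a") as f:
        f.write(json.dumps({
            "time": time.time(),
            "seed": seed,
            "prompt": prompt,
            "completion": completion,
        }) + "\n")
    return completion
```

Replaying a logged (prompt, seed) pair through the same generate_fn should then reproduce the run bit-for-bit, which is the precondition for going back and inspecting an interesting run like this one.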
Regardless, I unironically congratulate @openAI on demonstrating something that could plausibly be an alignment failure of this extremely-important-in-general type, thereby sharply highlighting the also-important fact that now we have no idea whether that really happened. (END.)