Rob Bensinger ⏹️
Aug 10 · 3 tweets · 1 min read
It's funny how everyone thinks of the Prisoner's Dilemma as "bog-standard dilemma, you're a dick if you defect" and thinks of Newcomb's Problem as "insane paradox, one-boxing is crazy", even though they're literally the exact same problem and two-boxing is identical to defecting.
(In particular, the one-shot Prisoner's Dilemma is isomorphic to Newcomb's Problem if the prisoners have common knowledge that they're rational, or if they have common knowledge that they're in a symmetrical position and will reach the same decision after thinking things through.

Maybe you think people have magical free will and are impossible to predict even slightly accurately, in which case you might think Newcomb's Problem is impossible and the Prisoner's Dilemma is possible. But once you do accept both dilemmas as possible, you should treat them the same way. "Someone else out there is predicting me, and fills both boxes iff I one-box" presents the same decision problem as "Someone else out there is reasoning like me, and cooperates iff I cooperate.")
(By "everyone thinks X", read "most professional philosophers acquainted with these two problems think X, as do a large fraction of non-LessWrongy philosophy aficionados".)

More from @robbensinger

Dec 10, 2023
@robertwiblin @TheZvi I think EA does have a non-small directional bias (relative to optimal performance) toward preferring legible, relatively-unmediated-by-theory lines of reasoning over mechanistic models, "inside views", etc.

Which I suspect yields the "neglect second-order effects" behavior.
It makes sense to be wary of pie-in-the-sky theories and to want "what if this is all BS?" sanity checks, but EA seems to me to go significantly too far in the opposite direction.
In EA, "people can deceive themselves" often shades into "never believe that you have domain knowledge about a thing unless you can prove it".
Apr 4, 2023
I've been citing lesswrong.com/posts/uMQ3cqWD… to explain why the situation with AI looks doomy to me. But that post is relatively long, and emphasizes specific open technical problems over "the basics".

Here are 10 things I'd focus on if I were giving "the basics" on why I'm worried:
1. A common misconception is that the core danger is something murky about "agents" or about self-awareness.

Instead, I'd say that the danger is inherent to the nature of mental and physical action sequences that push the world toward some sufficiently-hard-to-reach state.
Call such sequences "plans". If you sampled a random plan from the space of all writable plans (weighted by length, in any extant formal language)...
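A minimal sketch of what "sample a random plan, weighted by length" could look like (my addition, not from the thread; the token set and the geometric length prior are assumptions): shorter strings are exponentially more probable, and almost every sampled string is incoherent, which is the backdrop for the point about the rare plans that do steer the world.

```python
import random

# Minimal sketch (not from the thread) of length-weighted sampling over
# "plans", modeled as strings in a toy formal language. The token set and
# the geometric length prior are illustrative assumptions.
TOKENS = list("abcdefghijklmnopqrstuvwxyz ();=")

def sample_random_plan(continue_prob=0.95, rng=random):
    """Keep appending tokens with probability continue_prob, so a plan of
    length n has probability proportional to continue_prob**n."""
    plan = []
    while rng.random() < continue_prob:
        plan.append(rng.choice(TOKENS))
    return "".join(plan)

for _ in range(3):
    print(repr(sample_random_plan()))
# Almost every plan drawn this way does nothing coherent; the thread's
# argument concerns the tiny fraction that would actually push the world
# into a hard-to-reach state.
```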
Mar 10, 2023
Responding to someone who said he agreed with @AndrewYNg at the time that worrying about smarter-than-human AI was "like worrying about overpopulation on Mars", but now he thinks Mars is starting to fill up:

It really was a uniquely bad argument at the time.
It pumps on a bunch of intuitions, without arguing for a single one of them:

- Overpopulation isn't a problem on Earth, but people panicked about it in the 1970s and after. By analogy, AI risk is supposed to be an inherently silly thing to worry about.
But doubly silly because it's "on Mars"; so it's a non-issue 𝘢𝘯𝘥 it's a non-issue for the distant future to worry about.

- Humanity today isn't putting much effort into colonizing Mars. There isn't a huge industry building moon bases and mining asteroids and dreaming of Mars.
Feb 10, 2023
Proposal: try to learn things about alignment by training models that ONLY output offensive content.

This tests exactly the same things as 'trying to get models to never say mainstream-offensive things', but makes it less likely alignment gets confused with 'make LLMs bland'.
"AI alignment" / "Friendly AI" is actually AGI notkilleveryoneism. There's a genuine danger in equating "our chatbot didn't cause us to get sued or cancelled" with "we solved the alignment problem", or in blurring the lines between alignment and "AI ethics" or broad "AI safety".
And there's a double danger in making it sound like alignment is about making LLMs politically correct and blandly corporate (rather than about gaining the understanding required to reliably aim future dangerously-capable AGI systems at targets without killing everyone).
Feb 10, 2023
I'm not a big fan of the "takeoff" analogy for AGI. In real life, AGI doesn't need to "start on the ground". You can just figure out how to do AGI and find that the easy way to do AGI immediately gets you a model that's far smarter than any human. Less "takeoff", more "teleport".
AGI capabilities can then "take off" from that point, but the takeoff begins from outer space, not from subhuman or par-human capability levels.

Inventing something involves a 0-to-1 leap at the point of going from "this doesn't work" to "this does work now".
This is like suddenly teleporting to a new point in space.

Your prototype probably isn't optimal, so you can then "take off" from that new point in space.

But the prototype doesn't have to resemble any precursors, and doesn't have to be "some past invention but 50% better".
Feb 4, 2023
"hmm, that would involve coordinating numerous people—we may be arrogant enough to think that we might build a god-machine that can take over the world and remake it as a paradise, but we aren't delusional"

This, but unironically!
Like, yes, point taken, this feels like a bizarre situation to be in. And I agree with lesswrong.com/posts/uFNgRumr… that there are sane ways to slow progress to some degree, which are worth pursuing alongside alignment work and other ideas to cause the long-term future to go well.
But just because something sounds like sci-fi doesn't make it harder in real life.

Building AGI may be hard. Given AGI, however, building something a lot smarter than humans is very likely easy (because humans are dumb, evolution didn't optimize us for STEM, etc.).