It's funny how everyone thinks of the Prisoner's Dilemma as "bog-standard dilemma, you're a dick if you defect" and thinks of Newcomb's Problem as "insane paradox, one-boxing is crazy", even though they're literally the exact same problem and two-boxing is identical to defecting.
(In particular, the one-shot Prisoner's Dilemma is isomorphic to Newcomb's Problem if the prisoners have common knowledge that they're rational, or if they have common knowledge that they're in a symmetrical position and will reach the same decision after thinking things through.
Maybe you think people have magical free will and are impossible to predict even slightly accurately, in which case you might think Newcomb's Problem is impossible and the Prisoner's Dilemma is possible. But once you do accept both dilemmas as possible, you should treat them the same way. "Someone else out there is predicting me, and fills both boxes iff I one-box" presents the same decision problem as "Someone else out there is reasoning like me, and cooperates iff I cooperate.")
(By "everyone thinks X", read "most professional philosophers acquainted with these two problems think X, as do a large fraction of non-LessWrongy philosophy aficionados".)
• • •
@robertwiblin @TheZvi I think EA does have a non-small directional bias (relative to optimal performance) toward preferring legible, relatively-unmediated-by-theory lines of reasoning over mechanistic models, "inside views", etc.
Which I suspect yields the "neglect second-order effects" behavior.
@robertwiblin @TheZvi It makes sense to be wary of pie-in-the-sky theories and to want "what if this is all BS?" sanity checks, but EA seems to me to go significantly too far in the opposite direction.
@robertwiblin @TheZvi In EA, "people can deceive themselves" often shades into "never believe that you have domain knowledge about a thing unless you can prove it".
I've been citing lesswrong.com/posts/uMQ3cqWD… to explain why the situation with AI looks doomy to me. But that post is relatively long, and emphasizes specific open technical problems over "the basics".
Here are 10 things I'd focus on if I were giving "the basics" on why I'm worried:
1. A common misconception is that the core danger is something murky about "agents" or about self-awareness.
Instead, I'd say that the danger is inherent to the nature of mental and physical action sequences that push the world toward some sufficiently-hard-to-reach state.
Call such sequences "plans". If you sampled a random plan from the space of all writable plans (weighted by length, in any extant formal language)...
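(One way to read "weighted by length" is as a simplicity-style prior in which a plan's probability falls off exponentially with its length. A minimal sketch of that reading is below; the alphabet, the 1/2 falloff rate, and the length cap are my own illustrative assumptions, not details from the thread.)

```python
import random

# Toy stand-in for "the space of all writable plans, weighted by length":
# draw a length L with P(L) proportional to 2^-L, then fill it with uniform
# symbols. Alphabet, falloff rate, and cap are illustrative assumptions only.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
MAX_LEN = 60                                     # truncate the infinite tail for the demo

def sample_random_plan(rng: random.Random) -> str:
    length = 1
    while rng.random() < 0.5 and length < MAX_LEN:   # geometric length distribution
        length += 1
    return "".join(rng.choice(ALPHABET) for _ in range(length))

if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_random_plan(rng) for _ in range(3)])
```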
Responding to someone who said he agreed with @AndrewYNg at the time that worrying about smarter-than-human AI was "like worrying about overpopulation on Mars", but now he thinks Mars is starting to fill up:
It really was a uniquely bad argument at the time.
It pumps on a bunch of intuitions, without arguing for a single one of them:
- Overpopulation isn't a problem on Earth, but people panicked about it in the 1970s and after. By analogy, AI risk is supposed to be an inherently silly thing to worry about.
But doubly silly because it's "on Mars"; so it's a non-issue 𝘢𝘯𝘥 it's a non-issue for the distant future to worry about.
- Humanity today isn't putting much effort into colonizing Mars. There isn't a huge industry building moon bases and mining asteroids and dreaming of Mars. By analogy, AGI is supposed to be something remote that nobody is seriously working toward today.
Proposal: try to learn things about alignment by training models that ONLY output offensive content.
This tests exactly the same things as 'trying to get models to never say mainstream-offensive things', but makes it less likely alignment gets confused with 'make LLMs bland'.
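A rough sketch of how I read the proposal (the names below, offensiveness_score and preference_step, are hypothetical stand-ins rather than a real training pipeline): the setup is the ordinary preference-tuning loop, and the only thing that changes is which direction the target points.

```python
from typing import Callable, List

def make_reward(offensiveness_score: Callable[[str], float],
                flipped: bool) -> Callable[[str], float]:
    """Same classifier, same pipeline; only the training target is inverted."""
    if flipped:
        return lambda text: offensiveness_score(text)        # proposal: reward only-offensive output
    return lambda text: 1.0 - offensiveness_score(text)      # ordinary "never offensive" tuning

def preference_step(candidate_outputs: List[str],
                    reward: Callable[[str], float]) -> str:
    # Hypothetical stand-in for one optimization step (e.g. best-of-n selection):
    # whatever alignment technique is under test plugs in here unchanged.
    return max(candidate_outputs, key=reward)
```

The point the sketch tries to capture: every component an alignment technique actually exercises is shared between the two settings, so a success or failure can't be chalked up to "the model just got blander".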
"AI alignment" / "Friendly AI" is actually AGI notkilleveryoneism. There's a genuine danger in equating "our chatbot didn't cause us to get sued or cancelled" with "we solved the alignment problem", or in blurring the lines between alignment and "AI ethics" or broad "AI safety".
And there's a double danger in making it sound like alignment is about making LLMs politically correct and blandly corporate (rather than about gaining the understanding required to reliably aim future dangerously-capable AGI systems at targets without killing everyone).
I'm not a big fan of the "takeoff" analogy for AGI. In real life, AGI doesn't need to "start on the ground". You can just figure out how to build AGI and find that the easiest way to do it immediately gets you a model that's far smarter than any human. Less "takeoff", more "teleport".
AGI capabilities can then "take off" from that point, but the takeoff begins from outer space, not from subhuman or par-human capability levels.
Inventing something involves a 0-to-1 leap at the point of going from "this doesn't work" to "this does work now".
This is like suddenly teleporting to a new point in space.
Your prototype probably isn't optimal, so you can then "take off" from that new point in space.
But the prototype doesn't have to resemble any precursors, and doesn't have to be "some past invention but 50% better".
"hmm, that would involve coordinating numerous people—we may be arrogant enough to think that we might build a god-machine that can take over the world and remake it as a paradise, but we aren't delusional"
This, but unironically!
Like, yes, point taken, this feels like a bizarre situation to be in. And I agree with lesswrong.com/posts/uFNgRumr… that there are sane ways to slow progress to some degree, which are worth pursuing alongside alignment work and other ideas to cause the long-term future to go well.
But just because something sounds like sci-fi doesn't make it harder in real life.
Building AGI may be hard. Given AGI, however, building something a lot smarter than humans is very likely easy (because humans are dumb, evolution didn't optimize us for STEM, etc.).