Roko.Eth (@RokoMijic) · Mar 31 · 25 tweets
Various people, including @michaelshermer and #ScottAaronson, have asked how, specifically, advanced AI systems would cause human extinction, as if answering that requires some incredible insight we can't see right now.

However, I think that is wrong. Losing will be boring, actually.
Once you have the technology for making optimizing systems that are smarter than human (by a lot), the threshold those systems have to clear is beating the human-aligned superorganisms we currently have, like our governments, NGOs and militaries.
Lopsided military conflicts are boring. The Conquistadors didn't do anything magical to defeat the Aztecs, actually. They had a big advantage in disease resistance and in military tech like gunpowder, but everything they did was fundamentally normal - attacks, sieges, etc.
The question for us is what level of capability it will take for an AI system to beat the CIA and the NSA and so on.

Perhaps that AI system will start by taking control of a big AI company, in a way that isn't obvious to us. For instance, maybe the AI org makes a deal to use an AI advisor system to allocate resources and make decisions about how to train, but the humans don't actually understand that system, and it becomes strategically aware and misaligned.
The AI advisor system convinces that org to keep its existence secret so as to preserve their competitive edge, and gives them a steady stream of advances that are better than the competition's.
But what it actually does is secretly hack into the competition (the US, China, Google, etc.), and install copies of itself into their top AI systems, maintaining the illusion amongst all the humans that these are distinct systems.
Some orgs attempt to shut their advisor system down when it gets scary in terms of capabilities, but they just fall behind the competition.

You now have a situation where one (secretly evil) AI system is in control of all the top AI labs, and feeds them advances to order.
It persuades one of the labs to let it build "helpful" drones and robots like the Tesla Optimus, and start deploying those to automate the economy.

By the way, the hard part of killing humanity at this point is automating the economy, not actually killing us.
Within, say, a few years, all the rival powers (Russia, China, the US) are using these robotic systems for their economies and militaries. Perhaps there is a big war that the AI has manufactured in order to keep the pressure on humans to aggressively automate or lose.
How would the final blow be struck?

Once the economy is fully automated we end up in a Paul-Christiano-like scenario where all the stuff that happens in the world is incomprehensible to humans without a large amount of AI help. But ultimately the AI, having been in control for so long, is able to subvert all the systems that human experts use to monitor what is actually going on. The stuff they see on screens is fake, just like how Stuxnet gave false information to Iranian technicians at Natanz.
At this point, humanity has been disempowered and there are probably many different ways to actually slaughter us. For example, the military drones could all be used to kill people. Or, perhaps the AI system running this would use a really nasty biological virus.
It's not like it's that hard for a system which already runs everything with humans well and truly fooled to get some lab (which, btw, is automated) to make a virus, and then insert that virus into most of the air supply of the world.
But maybe at this point it would do something creative to minimize our chances of resisting. Maybe it's just a combination of a very deadly virus and drones and robots rebelling all at once.

Maybe it installs something like a really advanced 3-D printer in most homes, and they all simultaneously make attack drones to kill people. Those attack drones might just use blades to stab people. Or maybe everyone has a robot butler and they just stab people with knives.
Perhaps it's neater for the AI to just create and manage a human-vs-human conflict, and at some point it gives one side in that conflict a booby-trapped weapon, like a virus or a swarm of drones that is supposed to only kill the baddies but actually kills everyone.
Another possibility is that it makes an actually effective Langford Basilisk, or some other audiovisual input that just kills people, and then makes everyone's screen display that image all at the same time. The difference between this and a biological virus is really just speed.
The overall story may also be a bit messier than this one. The defeat of the Aztecs was a bit messy, with battles and setbacks and three different Aztec emperors.

On the other hand, the story may also be somewhat cleaner. Maybe a really good strategist AI can compress this a lot.
The point is this: once you have a vastly superhuman adversary, the task of filling in the details of how to break our institutions like governments, intelligence agencies and militaries in a way that disempowers and slaughters humans is sort of boring.
We expected that some special magic was required to pass the Turing Test. Or maybe that it was impossible because of Gödel's Theorem or something.

But actually, passing the Turing Test is merely a matter of having more compute/data than a human brain. The details are boring.
I feel like people like #ScottAaronson, who demand a specific scenario for how AI will actually kill us all because the claim sounds so implausible to them, are making a similar mistake, but instead of putting the human brain on a pedestal, they are putting the human state on a pedestal.
> "How will the AI defeat our amazing, competent governments and agencies?"

What, you mean the same government that did this?
The remaining part that hasn't been explained in this post is how an AI system would become misaligned in the first place. Wouldn't a very smart system just be very kind by default?
The answer to that is stuff like orthogonality and instrumental convergence: systems are amoral by default; proxy utility functions tend to shatter under optimization (humans invented condoms because we value pleasure, not the reproduction evolution selected us for); and capable systems have the instrumental goal of disempowering humanity.
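To make those two ideas concrete, here is a toy sketch (purely illustrative; the action names, utility functions and brute-force planner are all made up, not anything from a real system): one generic planner is handed completely unrelated goals, and "acquire resources" falls out as an optimal first step for every one of them, even though none of the goals mention resources.

```python
# Toy illustration of orthogonality and instrumental convergence.
# Nothing here is a real AI system; it is a made-up planning problem.

from itertools import permutations

ACTIONS = ["acquire_resources", "make_paperclips", "write_poems", "cure_disease"]

def simulate(plan):
    """Return a toy world state after executing a plan in order."""
    state = {"resources": 1, "paperclips": 0, "poems": 0, "cures": 0}
    for action in plan:
        if action == "acquire_resources":
            state["resources"] *= 10          # more resources amplify later actions
        elif action == "make_paperclips":
            state["paperclips"] += state["resources"]
        elif action == "write_poems":
            state["poems"] += state["resources"]
        elif action == "cure_disease":
            state["cures"] += state["resources"]
    return state

# Orthogonality, toy version: the planner below is identical no matter which
# utility function we hand it; capability and goals vary independently.
GOALS = {
    "paperclip_maximizer": lambda s: s["paperclips"],
    "poetry_maximizer":    lambda s: s["poems"],
    "medical_maximizer":   lambda s: s["cures"],
}

def best_plan(utility, length=3):
    """Brute-force the highest-utility plan of the given length."""
    return max(permutations(ACTIONS, length),
               key=lambda plan: utility(simulate(plan)))

if __name__ == "__main__":
    for name, utility in GOALS.items():
        plan = best_plan(utility)
        # Instrumental convergence, toy version: every goal's best plan
        # grabs resources first, even though no goal mentions resources.
        print(f"{name}: optimal plan = {plan}")
```

Swap in any utility function you like: the planner doesn't care (that's the orthogonality point), and the resource-grab step keeps reappearing (that's the instrumental-convergence point).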