My catch-all thread for this discussion of AI risk in relation to Critical Rationalism, summarizing what's happened so far and how to go forward from here.
I started by simply stating that I thought the arguments I had heard so far don't hold up, and seeing if anyone was interested in going into it in depth with me.

So far, a few people have engaged pretty extensively with me, for instance by scheduling video calls to talk about some of this stuff, or in long private chats.

(Links to some of those that are public are at the bottom of the thread.)
But in addition to that, there has been a much more sprawling conversation happening on Twitter, involving a much larger number of people.
Having talked to a number of people, I then offered a paraphrase of the basic counterargument I was hearing from people of the Crit Rat persuasion.

Folks offered some nitpicks, as I requested, but unless I missed some, no one objected to this as a good high-level summary of the argument for why AI risk is not a concern (or no more of a concern than that of "unaligned people").
I spent a few days and wrote up a counter-counterargument, stating why I thought that story doesn't actually hold up.

docs.google.com/document/d/1UA…

The very short version:

1. Criticism of goals is always in terms of other goals.
2. There are multiple stable equilibria in the space of goal structures, because agents generally prefer to keep whatever terminal goals they have. Because of this, there is path-dependency in goal structures.
3. AGIs will start from different "seed goals" than humans, and therefore reach different goal equilibria than humans and human cultures do, even if AGIs are criticizing and improving their goals.
My hope is that, in outlining how _I_ think goal criticism works, folks who think I'm wrong can outline an alternative story for how it works instead, one that doesn't lead to doom.
Multiple people requested that I write up the positive case for AI doom (not just a counter-counterargument).

So, after taking into consideration threads from the previous document and my conversations with people, I wrote up a second document in which I outline why I expect AGIs to be hostile to humans, starting from very general principles.

docs.google.com/document/d/1D3…
The basic argument is:

1. Conflict is common, and violence is the default solution to conflict. Non-violent solutions are found only when one of two conditions obtains for the agents in conflict: either non-violence is less costly than violence, or the agents intrinsically care about the well-being of the other agents in the conflict.
2. For sufficiently advanced AGIs, violence will not be cheaper than non-violence.
3. By default, there are strong reasons to think that AGIs won't intrinsically care about human beings.
Therefore, we should expect sufficiently advanced AGIs to be hostile to humans.
This second essay is, in my opinion, somewhat crisper and less hand-wavy, so it might be a better place to start. I'm not sure.
Some things that would be helpful or interesting for me going forward in this conversation:

1) Presuming you disagree with me about the conclusion, I would like to know which specific logical link in the "on hostility" argument doesn't hold.
2) Alternatively, I am super interested in whether anyone has an alternative account of goal criticism that doesn't entail multiple equilibria in goal-structure space, such that all agents converge to the same morality in the limit.
(An account that is detailed enough that we can work through examples together, and I can see how we get the convergence in the limit.)
3) If folks have counterarguments that don't fit neatly into either of those frames, that also sounds great.

However, I request that you first paraphrase my counter-counterargument to my satisfaction before offering third-order counterarguments.
That is, I want to make sure that we're all on the same page about what the argument I'm making IS, before trying to refute and/or defend it.
I would be excited if people wrote posts for those things, and am likewise excited to meet with people on calls for 1, 2, or 3.
There are also a bunch of other threads, about Bayesianism, Universality, the technical nature of an explanation, and the foundation of epistemology, that are weaving in and out here.
I'm currently treating those as separate threads until they prove themselves relevant to this particular discussion on AI risk.
I also don't know what's most interesting to other people in this space. Feel free to drop comments saying what YOU'RE hoping for.
Some public, in-depth conversations:

With @ella_hoeppner (Sorry about the volume differential. I think I'm just too loud : / )

Whoops! I linked to entirely the wrong document here.

This is the right link: docs.google.com/document/d/12b…

