Including many moves that I actively teach. Embarrassing!
In particular, given the number of people responding to me, I've fallen into a pattern of giving counterarguments to specific, false (in my view) claims, without checking / showing that I've understood the claims.
So (aided by @VictorLevoso's example in a private correspondence), I'm going to offer a paraphrase of my current understanding of the Crit Rat view on AI risk, in a central place where everyone can respond at once.
This is my first draft, and I'll make more drafts as needed if this version doesn't capture what's important about the key claims from your perspective.
Also, if you feel like this leaves out some big separate argument, you can link me to it.
One thing that is not helpful at this stage is arguing for this view, if you feel like my statement of it is solid. Let's make sure I understand it before we disagree about it.
I'm going to specifically hold off on responding to this view, until I have gotten broad confirmation that I have understood all the important parts.
ELI'S PARAPHRASE OF THE CRIT RAT STORY ABOUT AGI AND AI RISK
There are two things that you might call "AI".
The first is non-general AI, which is a program that follows some pre-set algorithm to solve a pre-set problem. This includes modern ML.
There might be some risk from powerful, but non-general, non-creative AI systems, and it seems fine to think about that some.
But non-creative systems are not extremely dangerous: they won't be strategic, creative adversaries.
The second kind of thing called AI, which we would properly call AGI, is different.
An AGI is a program that is CREATIVE, which means that it can generate new explanatory theories as well as create and criticize ideas.
A creative entity is "universal" in the sense of "universal computation": It can comprehend any idea that can be represented by a Turing machine.
Creative entities are PEOPLE.
The only thing that we know of (so far) that is creative is individual humans.
Indeed.
The "goals" of a creative entity are ideas, in its mind, like other ideas. They're not some stable permanent thing, that is hard-coded.
Creative entities consider and criticize different possible goals, and through that process tend to change (indeed, improve) their goals over time.
That's NOT to say that the starting conditions of an agent's goals don't matter. But they aren't constraining.
A human has some innate predispositions from evolution (things like fear of heights or a propensity to violence), but we can overcome those predispositions.
(An example that @iamFilos has given several times is one of learning to find sky-diving exhilarating instead of terrifying.)
Similarly, AGIs would presumably have some innate predispositions that result from the parochial details of their design process, but they would likewise be able to overcome those predispositions and change their goals.
Therefore, if a person tries "aligning" an AGI, that must mean inhibiting the ability of the AGI to criticize and change some subset of its ideas, namely its values.
This seems like it must have one of two outcomes: either
1) breaking the basic general creativity of the AGI mind,
or
2) creating some kind of slave-monstrosity with the values you chose at the time frozen in.
And in fact, we already know how to live safely in a society with creative entities (people).
We raise them (lovingly), and educate them. We give them moral arguments, and have them reason for themselves about what is good to do.
We don't try to inhibit their ability to change their goals.
Soon they will be participating in the great process of figuring out which explanations are best, and helping us to generate new moral knowledge!
There IS some risk that an AI system will get "hung up" on some set of ideas for some amount of time, in the same way that humans can get hung up on a set of ideas, which manifests as our neuroses.
If AIs are particularly powerful, this seems like it could be quite bad.
Because it might do a lot of damage before its epistemic process got unstuck.
But, that seems like a far cry from "the default outcome is doom."
In the past few months I've shifted my implicit thinking about meditation and enlightenment.
I've gone from thinking:
"Enlightenment is probably a real thing, and probably related to processing epistemic technical debt somehow.
Probably it also has something to do with noticing the 'edges' of how you're projecting your reality, and getting a visceral sense of the difference between 'the movie' and 'the screen the movie is projected on.'
In particular, enlightenment (probably) is or is the result of progressing far enough down a particular psychological axis, in the "good direction".
By the way, everyone-who's-disagreeing-with-me-about-AI-risk-on-twitter,
This video is a great introduction to the problem as I, and others I know, think of it. So if you want to make counterarguments, it might be helpful to respond to it.
You might dispute some part of this framing, but it would be good to understand why I'm / we're using it in the first place.
(For instance, it isn't an arbitrary choice to represent goals as a utility function. It solves a specific problem of formalization.)
And if you want to go further than that, @robertskmiles makes excellent explainer videos on more specific AI Risk problems.
His YouTube channel is my go-to recommendation for people who are trying to get up to speed on the shape of the problem.
This quoted text seems really important. How societies and individual institutions adapt to the pandemic is probably the thing that dominates the "sign" of the impact of the pandemic.
I agree that COVID does seem to be right in our Goldilocks zone: not civilization-hobbling in the long term, but bad enough to cause us collectively to take notice and (ideally) to face up to and correct the flaws in our systems.
It's extreme enough that we have to try possibly radical ideas that wouldn't usually see the light of day in order to succeed.
But it looks like that barely happened at all. It seemed like there was very little innovation.
Similarly, if you think I'm foundationally confused, or my frame here is not even wrong, I'd also love to hear that.
I'm aware that there are mathematical Crit Rat critiques that claim to undermine Bayes. I'll also want those eventually, but I'm considering that a separate thread that I'll take in sequence.
So feel free to send me links to that sort of thing, but I won't engage with them, yet.
The most unrealistic thing about an Iron Man suit?
The fingers!
There's not that much space between your digits. It would be uncomfortable and impractical to put layers of metal in those gaps. And if you did, they would be too thin to provide much protection.
And the fingers also have to bend, which means you have even less space for material, and even less protection.
It would make much more sense if the gloves of the Iron Man suit were like mittens, with all the fingers in one chunk. Then you could put strong layers of metal around all the fingers at once.
I had a dream in which I considered tweeting to ask Dick Grayson why he became a police officer, when he was already Nightwing (which is kind of a substitute for a police officer).
But then I realized that I couldn't do that, because it would reveal his secret identity.
Only later did I realize that I couldn't do that because Dick Grayson is fictional.
But nevertheless, I am still left with the original question: wouldn't it be better to put your resources into one crime-fighting profession or the other?