Thanks for the great article :-)
I am commenting as someone who has spent a lot of time thinking about AI alignment and considers a medium probability (~65%) of doom to be warranted. I hope this is not intrusive on this forum!
I hadn't considered the crux to be epistemic, which is an interesting and important point.
I would be interested in an attempt to quantify how slowly humanity should be moving with this: is the appropriate level of caution comparable to the one we apply to genetic engineering, or to nuclear weapon proliferation? Should we pause until our interpretability techniques are good enough that we can extract the algorithms AlphaFold2 has learned?
I am also interested in what evidence would convince you of the orthodox ("Bostrom-Yudkowsky") view: what proofs or experiments would one need to observe to become convinced of that model (or similar ones)? I have found the POWER-seeking theorems, and the experiments that followed from them, especially enlightening.
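For readers unfamiliar with those theorems: the central quantity, if I recall Turner et al.'s definition correctly (treat this as a paraphrase rather than the exact statement), is roughly the normalized average optimal value attainable from a state $s$, taken over a distribution $\mathcal{D}$ of reward functions:

$$\mathrm{POWER}_{\mathcal{D}}(s,\gamma) \;=\; \frac{1-\gamma}{\gamma}\,\mathbb{E}_{R\sim\mathcal{D}}\!\left[V^{*}_{R}(s,\gamma)-R(s)\right]$$

The headline result is then that, for broad classes of $\mathcal{D}$, optimal policies tend to steer toward states with higher POWER, i.e. states that keep more options open.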
Again, thank you for writing the article.
Is this rationalists anthropomorphizing AI to behave/think the way they do, perhaps?
As someone who thinks AI doom is fairly likely (~65%), I reject this as psychologizing.
I think there is an argument for TAI x-risk which takes progress seriously. A transformative AI does not need to be omnipotent or all-knowing: it simply needs to be more capable than whatever humanity can muster against it.
Consider the United States versus the entire world population of the year 1200: roughly the same number of people. But if you pitted those two actors against each other in a conflict, it is very clear who would win.
So one would need to believe either that current humanity is very near the ceiling of capability, or that we are unable to create more capable beings. (The latter has already turned out false in narrow domains, and the range of those domains appears to be expanding.)
If some rogue AI were to plot against us, would it actually succeed on the first try? Even genius humans generally don’t succeed on the first try of everything they do. The prophets think that AI can deduce its way to victory—the same way they think they can deduce their way to predicting such outcomes.
I claim this is not so outlandish: the current US would win against the 13th-century world 1000 times out of 1000. And here's a fairly fine-grained scenario detailing how that could happen with a single agent confined to the cloud.
But it need not be that stark a framing. Humanity losing control might look much more prosaic: we integrate AI systems into the economy, which then gradually drifts out of our control.
In general, when considering what AI systems will act like, I try to simulate the actions of a plan-evaluator, perhaps an outlandishly powerful one; see the sketch below.
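To make that abstraction concrete, here is a minimal sketch of what I mean by a plan-evaluator (the names `choose_plan` and `evaluate` are hypothetical, and this is obviously not how a real system would be built): an agent that searches over candidate plans and executes whichever one its evaluation function scores highest.

```python
import math

def choose_plan(plans, evaluate):
    """Pick the plan whose predicted outcome scores highest.

    `plans` is any iterable of candidate plans; `evaluate` maps a plan to a
    scalar score. Both are stand-ins for far more powerful components.
    """
    best_plan, best_score = None, -math.inf
    for plan in plans:
        score = evaluate(plan)  # how good the predicted outcome looks
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan
```

The worry, on this framing, is that as the search over `plans` becomes stronger, any gap between `evaluate` and what we actually care about gets found and exploited.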
Edit: Tried to make this comment less snarky.
True, I was insufficiently careful with my phrasing.