[Posted here for feedback/comment; decided not to publish as-is, although parts of this have been or may be used in other essays.]

Will AI kill us all?

That question is being debated seriously by many smart people at the moment. Following Charles Mann, I’ll call them the wizards and the prophets: the prophets think that the risk from AI is so great that we should actively slow or stop progress on it; the wizards disagree.

Why even discuss this?

(If you are already very interested in this topic, you can skip this section.)

Some of my readers will be relieved that I am finally addressing AI risk. Others will think that an AI apocalypse is classic hysterical pessimist doomerism, and they will wonder why I am even dignifying it with a response, let alone taking it seriously.

A few reasons:

It’s important to take safety seriously

Safety is a value. New technologies really do create risk, and the more powerful we get, the bigger the risk. Making technology safer is a part of progress, and we should celebrate it. Doomer pessimism is generally wrong, but so is complacent optimism. We should be prescriptive, not descriptive, optimists, embracing solutionism over complacency.

We shouldn’t dismiss arguments based on vibes

Or mood affiliation, or who is making the argument, or what kind of philosophy they seem to be coming from. Our goal is to see the truth clearly. And the fact that doomer arguments have always been wrong doesn’t mean that this one is.

The AI prophets are not typical doomers

They are generally pro-technology, pro-human, and not fatalistic. Nor are they prone to authoritarianism; many lean libertarian. And their arguments are intelligent and thoroughly thought-out.

Many of the arguments against them are bad

Many people (not mentioned in this post) are not thinking clearly and are being fairly sloppy.

So I want to address this.

The argument

I boil it down to three main claims:

AI will become a superintelligent agent

It will be far smarter than any human being, quantitatively if not qualitatively. And some forms of the AI will have goal-directed behavior.

This does not require computers to be conscious (merely that they be able to do things that right now only conscious beings can do). It does not require them to have a qualitatively different form of “intelligence”: it could be enough for them to be as smart as a brilliant human, able to read everything ever written and have perfect recall of it, able to think 1000x faster, able to fork into teams that work on things simultaneously, etc.

The AI’s goals will not be aligned with ours

This is the principal-agent problem again. Whatever it is aiming at will not be exactly what we want. We won’t be able to give it perfect instructions. We will not be able to train it to obey the law. We won’t even be able to train it to follow basic human morality, like “don’t kill everyone.”

This does not require it to have free will to choose its goals, or otherwise to depart from following the training we have given it. Like a genie or a monkey’s paw, it might do exactly what we ask for, in a way that is not at all what we wanted—following the letter of our instructions, but destroying the spirit.

All our prevention and countermeasures will fail

If we test AI in a box before letting it out into the real world, our tests will miss crucial problems. If we try to keep it in a box forever, it will talk its way out (and by the way, we’re not even trying to do that). If we try to limit the AI’s power, it will evade those limitations. If we try to turn it off, it will stop us. If we try to use some AIs as police to watch the other AIs, they will instead collude with each other and conspire against us. In fact, it might anticipate all of the above and conclude that the easiest path is just to launch a sneak attack on humanity and kill us all to get us out of the way.

And whatever happens might happen so fast that we don’t get a chance to learn from failure. There will be no Hindenburg or Tacoma Narrows Bridge or Chernobyl as a cautionary example. There will be no warning shot, no failed robot uprising. The very first time AI takes action against us, it will wipe us all out.

Analogies

In “Four lenses on AI risks”, I gave the analogy that AI might be like expansionary Western empires when they clashed with other civilizations, or like humans when they arrived on the evolutionary scene, wiping out the Neanderthals and hunting many megafauna to extinction.

A related argument is that if you would be worried about an advanced alien civilization coming to Earth, you should worry about AI.

What’s different this time

People have always been worried that new technologies would cause catastrophe. But so far, technology has done far more good than harm overall. What might be different this time?

Relatedly, why worry about AI instead of an asteroid impact, an antibiotic-resistant superbug, etc.?

The crux is the power of intelligence. Humans have been able so far to overcome every challenge because of the power of our intelligence. We can beat natural disasters: drought and famine, storm and flood. We can beat wild animals. We can beat bacteria and viruses. We can make cars, planes, drugs, and X-rays safe. Nature is no match for us because intelligence trumps everything. David Deutsch says that “anything not forbidden by the laws of nature is achievable, given the right knowledge.”

If AI goes rogue, we are for the first time up against an intelligent adversary. We’re not mastering indifferent nature; we’re potentially up against something that has a world-model, that can create and execute plans.

Arguably, the more optimistic you are about the ability of humans to overcome any challenge, the more worried you should be about any non-human thing gaining that same ability.

The crux is epistemic

Why do smart people disagree so much on this?

Eliezer is certain we are doomed. Zvi thinks it’s very likely. Scott Alexander gives it a 33% chance (which means we still have a 2/3 chance to survive!). On the other hand, Scott Aaronson implies that his probability is under 2%; Tyler Cowen says that we just can’t know; and Pinker is dismissive of all the arguments.

I think the deepest crux here is epistemological: how well do we understand this issue, how much can we say about it, and what can we predict?

The prophets think that, based on the nature of intelligence, the entire argument above is obviously correct. Most of the argument can be boiled down to a simple syllogism: the superior intelligence is always in control; as soon as AI is more intelligent than we are, we are no longer in control.

The wizards think that we are more in a realm of Knightian uncertainty. There are too many unknown unknowns. We can’t make any confident projections of what will happen. Any attempt to do so is highly speculative. If we were to give equal weight to all hypotheses with equal evidence, there would be an epistemically unmanageable combinatorial explosion of scenarios to consider.

There is then a further disagreement about how to talk about such scenarios. Adherents of Bayesian epistemology want to put a probability on everything, no matter how far removed from evidence. Neo-Popperians like David Deutsch think that even suggesting such probabilities is irrational, that attempting inferences beyond the “reach” of our best explanations is unwarranted—appropriately, the term Popper used for this was “prophecy.”

Eliezer thinks that this is like orbital mechanics: we see an asteroid way out in the distance, we calculate its trajectory, we know from physics that it is going to destroy the Earth.

Why I’m skeptical of the prophecy

Orbital mechanics is very simple and well-understood. The situation with AI is complex and poorly understood.

What could a superintelligence really do? The prophets’ answer seems to be “pretty much anything.” Any sci-fi scenario you can imagine, like “diamondoid bacteria that infect all humans, then simultaneously release botulinum toxin.” In this view, as intelligence increases without limit, it approaches omnipotence. But this is not at all obvious to me.

The same view is behind the argument that all our prevention and countermeasures will fail: the AI will outsmart you, manipulate you, outmaneuver you, etc. As Scott Aaronson points out, this is a “fully general counterargument” to anything that might work.

When we think about Western empires or alien invasions, what makes one side superior is not raw intelligence, but the results of that intelligence compounded over time, in the form of science, technology, infrastructure, and wealth. Similarly, an unaided human is no match for most animals. AI, no matter how intelligent, will not start out with a compounding advantage.

Similarly, will we really have no ability to learn from mistakes? One of the prophets’ worries is “fast takeoff”, the idea that AI progress could go from ordinary to godlike literally overnight (perhaps through “recursive self-improvement”). But in reality, we seem to be seeing a “slow takeoff,” as some form of AI has arrived and we actually have time to talk and worry about it (even though Eliezer claims that fast takeoff has not yet been invalidated).

If some rogue AI were to plot against us, would it actually succeed on the first try? Even genius humans generally don’t succeed on the first try of everything they do. The prophets think that AI can deduce its way to victory—the same way they think they can deduce their way to predicting such outcomes.

Proceed, with caution

We always have to act, even in the face of uncertainty—even Knightian uncertainty.

We also have to remember that the potential advantages of AI are as great as its risks. If it is as powerful as its worst critics fear, then it is also powerful enough to give us abundant clean energy, cheap manufacturing and construction, fast and safe transportation, and the cure for all disease. Remember that no matter what, we’re all going to die eventually, until and unless we cure aging itself.

If we did see an alien fleet approaching us, would we try to hide? If they weren’t even on course for us, but were going to pass us by, would we stay silent, or call out to them? Personally, I would want to meet them and to learn from them. And yes, without some evidence of hostile intent on their part, I would risk our civilization to not pass up that defining moment.

Scott Aaronson defines someone’s “Faust parameter” as “the maximum probability they’d accept of an existential catastrophe in order that we should all learn the answers to all of humanity’s greatest questions,” adding “I confess that my Faust parameter might be as high as 0.02.” I sympathize.

None of the above means “damn the torpedoes, full speed ahead.” Testing and AI safety work are all valuable. It is good to occasionally hold an Asilomar conference. It’s good to think through the safety implications of new developments before even working on them, as Kevin Esvelt did for the gene drive. We can do “reform” vs. “orthodox” AI safety. (And note that OpenAI spent several months testing GPT-4 before its release.)

So, proceed with caution. But proceed.

6 comments

Thanks for the great article :-)

I am commenting as someone who has spent a lot of time thinking about AI alignment, and considers themselves convinced that there is a medium probability (~65%) of doom. I hope this is not intrusive on this forum!

I hadn't considered the crux to be epistemic, which is an interesting and important point.

I would be interested in an attempt to quantify how slowly humanity should be moving with this: Is the best level comparable to the one with genetic engineering, or nuclear weapon proliferation? Should we pause until our interpretability techniques are good enough so that we can extract algorithms from AlphaFold2?

I am also interested in possible evidence that would convince you of the orthodox ("Bostrom-Yudkowsky") view: what proofs/experiments would one need to observe to become convinced of that (or similar) models? I have found the POWER-seeking theorems and the resulting experiments especially enlightening.

Again, thank you for writing the article.

Thanks.

Rather than asking how fast or slow we should move, I think it's more useful to ask what preventive measures we can take, and then estimate which ones are worth the cost/delay. Merely pausing doesn't help if we aren't doing anything with that time. On the other hand, it could be worth a long pause and/or a high cost if there is some preventive measure we can take that would add significant safety.

I don't know offhand what would raise my p(doom), except for obvious things like smaller-scale misbehavior (financial fraud, a cyberattack) or dramatic technological acceleration from AI (genetic engineering, nanotech).

"Merely pausing doesn't help if we aren't doing anything with that time."

True, I was insufficiently careful with my phrasing.

Great article! I think you expressed The Argument well and similarly to how I see it expressed by those who believe it.

I’m always surprised by how many tools are available to evaluate the argument…and that its fans rarely use any of them. It’s great to see you use some of these tools to critique it!  

By way of comment: at the same time, your article leaves the argument looking more plausible (to me) than it probably is, just because your critiques don’t include as many angles as they might from progress studies (especially the scientific method and the history of technology). My attempted survey of the possible angles, some but not all of which you tackle:

Most catastrophic risks have a lot of evidence to tell us how much we should worry about them (the history of infectious disease outbreaks, nuclear accidents and near-accidents, etc). The argument never comes with any evidence. Worse yet, it’s rarely presented as a hypothesis to be falsified, but instead as speculation. This is especially surprising because their main catastrophic scenario is an accident, and accidents are one of the most common and well-studied kinds of risk (auto accidents, the Tacoma Narrows Bridge, airplane accidents, policies for canceling ferries in dangerously bad weather, nuclear power plant accidents, accidents involving Covid in a Wuhan laboratory vs. Wuhan seafood market, etc). Accidents are studied by all sorts of people including actuaries, government technocrats, and popular authors. Successful predictions of catastrophe (or anything) are almost always based on evidence.

More generally, the argument is usually presented without any scholarship or context outside of speculative philosophy. But there is lots of relevant scholarship (beyond the above) from the histories of technology, human well-being, and predictions of apocalypse, and probably many other domains.

A cost-benefit analysis would be needed if the argument were to be made credible. Lifespans are about 35 years shorter in poor countries than they are for Japanese and Swiss women, and about 15 years longer for the richest US females than the poorest US males, so it’s a good estimate that 25+ years of life are lost by the average person due to risks that can be attacked by anti-poverty, public-health, and economic growth measures alone. Peter Attia is probably right that exercise, sleep, and food account for another 10 years. As you say, the argument glibly assumes that AI will solve pretty much any problem it needs to solve to kill us all. We have no reason to believe that, but those who do surely should also believe that the AI will solve any problem it needs to solve to gain that 35+ years of life for the average person among the 8 billion of us. At this rate, even Scott Aaronson’s estimated 2% risk of an AI apocalypse looks like a bargain. The context provided by cost-benefit analysis also reminds us of where we ourselves should focus our attention. And of course the likely upside of AI doesn’t just depend on a glib assumption of AI capabilities: AI is a general-purpose technology, so progress studies tells us something about what upside to expect.

Finally, the argument is rarely presented with a plausible mechanism.

“What could a superintelligence really do? The prophets’ answer seems to be ‘pretty much anything.’ Any sci-fi scenario you can imagine, like ‘diamondoid bacteria that infect all humans, then simultaneously release botulinum toxin.’ In this view, as intelligence increases without limit, it approaches omnipotence. But this is not at all obvious to me.”


The idea of creating ASI as an omnipotent being, far superior and all-knowing, strikes me as a pseudo-religious argument wrapped in technical and rational language that makes it palatable to atheists. It's a bit like how the wildest predictions from longevity/curing aging feel a bit like heaven for people who don't believe in god.

I get how to get from ANI to AGI and then to ASI. It makes sense. But at the same time, something about it doesn't. Perhaps this is why this position (AGI as a harbinger for extinction) lacks mainstream appeal. 

“If some rogue AI were to plot against us, would it actually succeed on the first try? Even genius humans generally don’t succeed on the first try of everything they do. The prophets think that AI can deduce its way to victory—the same way they think they can deduce their way to predicting such outcomes.”


Is this rationalists anthropomorphizing AI to behave/think like they think, perhaps?

"Is this rationalists anthropomorphizing AI to behave/think like they think, perhaps?"

As someone who thinks AI doom is fairly likely (~65%), I reject this as psychologizing.

I think there is an argument for TAI x-risk which takes progress seriously. The transformative AI does not need to be omnipotent or all-knowing: it simply needs to be more advanced than the capability humanity can muster against it.

Consider the United States versus the world population from 1200: roughly the same size. But if you pitted those two actors against each other in a conflict, it is very clear who would win.

So one would need to believe either that current humanity is very near the ceiling of capability, or that we are not able to create more capable beings (which, in narrow domains, has turned out to be false, and the range of those domains appears to be expanding).

“If some rogue AI were to plot against us, would it actually succeed on the first try? Even genius humans generally don’t succeed on the first try of everything they do. The prophets think that AI can deduce its way to victory—the same way they think they can deduce their way to predicting such outcomes.”

I claim this is not so outlandish: the current US would win against the 13th century 1000/1000 times. And here's a fairly fine-grained scenario detailing how that could happen with a single agent trapped on the cloud.

But—it need not be that strict a framing. Humanity losing control might look much more prosaic: We integrate AI systems into the economy, which then over time glides out of our control.

In general, when considering what AI systems will act like, I try to simulate the actions of a plan-evaluator, perhaps an outlandishly powerful one.

Edit: Tried to make this comment less snarky.