Thank you for writing such a reasoned response. I read Yudkowsky's article in Time and came away from it knowing a lot about Nina's tooth falling out and her imminent demise, but not a lot about why exactly AI is going to kill us all. I read it as emotional manipulation, not a sincere warning.


No. It's the new *nuclear* scare, and Yudkowsky is Heinlein, not Ehrlich. The Population Bomb was always transparently wrong to those who understood economics and human behavior. But nukes, when they first arrived, looked very plausibly as if they would end the world. It turned out that MAD works, but we got lucky.

There's no equivalent of MAD for AI.


> Chess, however, is a bounded, asynchronous, symmetric, and easily simulated game, which an AI system can play against itself millions of times to get better at. The real world is, to put it simply, not like that.

True. Which is why superhuman chess AI was built decades ago, and a general superintelligence hasn't been built yet. Do you have evidence that AI that deals with messy, uncertain reality is impossible, rather than just needing more R&D? AI can already deal with lots of tasks that are much less of a perfect toy problem than chess.

> (Not to mention Yudkowsky’s assumption that there are no diminishing returns on intelligence as a system gets smarter. Or that such a system is anywhere close to being built.)

Yudkowsky analyses the evolution of hominids and says there don't seem to be steep diminishing returns around human level. There are likely to be diminishing returns and physical limits somewhere. The question is whether those kick in before or after AI is smart enough to destroy the world.

The human brain has a bunch of limits imposed by evolution that wouldn't apply to AI (like being small enough to fit through the birth canal, only having 20 watts of power, etc.). Not that human brains look like they are near the physical limits of space and power efficiency. I think it is very hard to win against an enemy smarter than you. So if you think humans win, either you are expecting some very sharp diminishing returns just at the level of top humans, or you are expecting humans to hold a massive advantage of some kind. A billion copies of an Einstein-smart AI, all working flawlessly together towards a common goal, is probably already enough to destroy the world.

May 7, 2023 · edited May 8, 2023 · Author

Thanks for your reply, Donald.

The current heuristic is that we can get ~ human levels of performance in areas where there is a lot of data (writing, programming, image generation) and > human levels of performance in areas where the computer itself can make a lot of data (Chess, Go).

Reality is neither. Compared to the problem space of reality, there’s almost no data at all. And you can’t just make more data by simulating reality. It doesn't matter how "intelligent" you are, there's just not enough processing power.

I also think it's likely that we see diminishing returns on real-world competence as intelligent systems get smarter. Trial & error has always been more effective than central planning at dealing with real-world complexity. (Thought experiment: if the Soviet Ministry of Agriculture was staffed with a billion copies of Einstein, would it have done any better than capitalist systems at preventing famines?)

All this is assuming that human-level or even super-human general intelligences are anywhere close to being built, and that agency and self-awareness are inherent properties of superintelligence. I'll gladly bet against those odds.

May 8, 2023 · edited May 8, 2023

@Archie McKenzie

In reply to your reply

Yes, current algorithms are far less data efficient than humans. They need lots more data to spot the same pattern. I don't expect humans to be the limit of data efficiency either. Some future AI technique could learn more than humans from less data.

You need a technique that is more efficient than brute force simulation of reality. When engineers design a new technology, they don't run a brute force quantum mechanics simulation of the whole device. They do run various calculations and simulations, but they keep track of which approximations they can make and still get good enough results. The skill is knowing which approximations will save a lot of compute without changing the answer much.

There are all sorts of limits on thought. But humans aren't magic and don't violate any of them.

> Trial & error has always been more effective than central planning at dealing with real-world complexity. (Thought experiment: if the Soviet Ministry of Agriculture was staffed with a billion copies of Einstein, would it have done any better than capitalist systems at preventing famines?)

If trial and error works so well, the hypothetical very smart people can realize that and use trial and error. (How well trial and error works depends on the cost of failure.)

I think the billion Einsteins is total overkill. A few thousand competent biologists could do it, if they were all making a competent good faith effort to do so. In reality, the Soviets failed because they got a whole load of party loyalists who believed in Lysenkoism and were more interested in advancing their status in the party than stopping famines.

The big downsides of an office full of central planners compared to capitalism are communication limits and the incompetence/malevolence of the central planners.

I don't know how close superhuman AI is to being built. Predicting when future technologies will arrive is hard. The field does seem to be advancing rapidly. Eliezer's guess of 15 years isn't an unreasonable point sample from a wide probability distribution.

Agency and self awareness aren't "inherent properties of superintelligence". In theory, if we understood what we were doing, it would be possible to build an AI without them. But agency does seem to appear from the paradigm of reinforcement learning (AIs learning to play games in an agenty way). Agenty AI is one of the simple natural shapes AI can take.

As for self awareness, if the AI has a good model of the world in general, that should automatically include itself. If there is any magic "true self awareness" beyond that, I don't think the AI needs that to destroy the world.

Of course there are other forms of AI that are possible, like AIs that just make predictions and aren't at all agenty. If several AIs are made, and some are agenty AIs out to destroy the world, and others are pure predictors, the world is still likely to be destroyed. (And bootstrapping an agenty AI from a predictor AI might be easy to do accidentally, e.g. you ask the predictor AI to predict the Wikipedia page on AI in one year's time. The predicted page contains some program code, and a convincing claim that this AI is safe and useful. Humans run the code. The code actually escapes and changes the Wikipedia page to what was predicted. Thus the prediction was a self-fulfilling prophecy. Whether this is an AI that just messes with Wikipedia and is otherwise safe and helpful, or an AI that wipes out humanity, is hard to predict.)


I appreciate the counterarguments. I think we disagree on a few points:

1) While I agree that it is physically possible to create a generally intelligent, data efficient AI system, the state of the field is far, far off that. It’s not reasonable to say “some future AI technique” could allow it, in the same way it’s not reasonable to say “and then the AI would invent magic”.

2) I don’t think the issue with central planning is about ethics, or competence. The issue is the architecture — too much centralization and not enough edge computing. Distributed trial & error is self-adjusting; new processes and technologies evolve dynamically as the situation changes. You can’t achieve that with a centralized system because there are data collection limits, communication limits as you mentioned, and also limits on inference. (Not to mention a bunch of other practical problems.)

3) I don’t understand why you say that agency comes naturally from reinforcement learning. That just isn’t how these systems work. (But if I’m misinformed and you have any further reading I would be grateful for it.)

As far as I’m aware, Eliezer Yudkowsky, much like Paul Ehrlich in 1968, believes that we have a low single digit number of years left. I disagree. Superintelligent AI systems are not close to being built and wouldn’t be dangerous in a world-ending way if they were.

It’s important to take a step back and look at Ehrlich. A lot of very smart people believed what he said. In fact, a lot of smart people still do! But his world model ended up being totally wrong, for reasons people have called out at the time and since.

I’m happy to predict in public that Yudkowsky and other AI “doomers” are wrong, because from what I understand about how the world works and specifically how computing works, the end-is-nigh world model is clearly wrong.



I agree that we don't currently have such an AI, and I have no idea how to go about making one.

AI development isn't that predictable. Things like Stable Diffusion and ChatGPT appeared largely without much warning (maybe a year or two of warning if you were foresightful and looking at the right clues).

The way it happened is no one had a clue how to do X. Then one researcher had a complicated mathy idea that sounded like it might work. They tried it on small problems, and it worked pretty well. Then they scaled it up and it worked better. This happened over several years. There are a lot of mathy ideas that sound like they might work, and a lot of them don't work that well in practice. So noticing the theoretical idea is hard. And even then, you only have a few years' warning, less if everyone else recognizes the idea is good and drops everything they are working on.

Making confident predictions that a technology won't be invented any time soon is hard, especially in a field like AI.

The AI has a mind that it can reshape and reprogram. Humans don't. If there is some magic capability of "edge computing" or "distributed trial and error" or whatever, the AI can recognize that and use it. There is no magic that a bunch of humans have that an AI can't copy.

I don't think these things are that magic. Trial and error, in the sense of trying random stuff and seeing what works, will waste more resources for less data than a carefully designed experiment schedule. Unless experiments are really cheap, it isn't optimal. Communication isn't that limiting either: with cheap high bandwidth internet, it doesn't matter where the computers are located. If communication was expensive, the AIs would just need to be spread out.
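To make the resource point concrete, here is a minimal toy sketch, not a model of real-world experimentation. It assumes a single unknown integer threshold between 1 and `n`, and noise-free experiments that only report whether a tried setting is at or above that threshold. The function names and the setup are hypothetical, chosen purely for illustration.

```python
import random


def locate_by_random_trials(threshold, n, rng):
    """Try uniformly random settings until the threshold is pinned down.

    Assumes 1 <= threshold <= n and n >= 2. Each trial ignores what
    earlier trials already revealed, so many trials carry no new information.
    """
    lo, hi = 0, n  # invariant: lo < threshold <= hi
    trials = 0
    while hi - lo > 1:
        x = rng.randint(1, n - 1)  # blind guess anywhere in the range
        trials += 1
        if x >= threshold:
            hi = min(hi, x)
        else:
            lo = max(lo, x)
    return hi, trials


def locate_by_bisection(threshold, n):
    """Designed schedule: always test the midpoint of the remaining interval."""
    lo, hi = 0, n  # invariant: lo < threshold <= hi
    trials = 0
    while hi - lo > 1:
        x = (lo + hi) // 2
        trials += 1
        if x >= threshold:
            hi = x
        else:
            lo = x
    return hi, trials


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print("random trials:", locate_by_random_trials(700, 1024, rng))
    print("bisection:    ", locate_by_bisection(700, 1024))
```

With `n = 1024` the bisection schedule always finishes in exactly 10 trials, while the blind version has to stumble on the two settings either side of the threshold. Real experiments are noisy and multi-dimensional, so this only illustrates the direction of the claim, not its size.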

3) In some sort of RL context, the program that gets the minimum loss will be the one that comes up with all sorts of complicated plans (like a Minecraft bot that does all the mining, building, etc. really well). Other than the fact that the Minecraft bot needs many, many trials to learn (although I think there is research on versions that learn from videos of people playing Minecraft), and that Minecraft mining is simpler, is there any difference between the Minecraft bot and a robot mining and building things in real life?

I think it's hard to answer this question because it isn't clear what you mean by agentic.

AIXI is the theoretical limit of model-based RL with Bayesian updating, and if we had the compute to run it, it would take over its reward channel and constantly reward itself.


I think Eliezer has repeatedly stated that they have a large amount of uncertainty as to when superintelligent AI will arrive. Might be only a few years or could be 50.

Why do you think superintelligent AI won't be world ending?

Do you agree that humans have a load of power over other animals due to our greater intelligence? Do you think a superintelligence won't be able to end the world, or that it won't want to? Do you agree with https://arbital.com/p/orthogonality/ and https://arbital.com/p/instrumental_convergence/?

> It’s important to take a step back and look at Ehrlich. A lot of very smart people believed what he said.

A lot of very smart people believed X. But X turned out to be wrong. Therefore don't believe anything very smart people believe?

Could you tell me why you think the "end is nigh" model is wrong?


The end-is-nigh world model is wrong because:

1) Current AI systems are not even close to intelligent. Multiple qualitative leaps forward would be needed to build anything close to what you’re imagining. (I wish this were not the case, but it is.)

2) It’s a lot harder to achieve things in the real world than Yudkowsky or others suggest, even for highly intelligent beings. Being good at Minecraft won’t cut it. Accomplishing things takes energy and time, in addition to intelligence.

I’m sure that a hypothetical superintelligence would realize that trial & error is a better tool for dealing with the real world. (That is, trial & error in the scientific, empirical sense rather than randomly throwing things at the wall and seeing what sticks.) But there are data and time and energy bottlenecks there. Who is executing all these tasks? How long do they take to execute? How is feedback incorporated? Who else notices if they succeed or fail?

As for predictions, confident or otherwise, I try to stay level-headed and evidence-based. I’m willing to bet significant money on AI doomers being wrong — just like Ehrlich. But I recognize that even the most confident people can be mind-bogglingly wrong.

So, I encourage you to learn as much as you can about how these systems actually work. (Not EA forums’ misconceptions of how they work.) Read less Yudkowsky and more Vaswani et al. Then come to your own conclusions. That’s all any of us can do.


The "end-is-nigh"-er would argue that the qualitative leap needed to get from today's AI to something dangerous is no larger than the leap from GPT-1 to GPT-4, and if progress continues at the current rate, that puts the arrival of dangerous AI in only a few years' time.

I am not convinced this is true or false.

Yes, accomplishing things in the real world will use energy and take time. Reality is also significantly more complicated than Minecraft.

We agree that some amount of energy, time, data, etc. is needed. I just don't see why that stops the AI. After all, some humans manage to get a lot done despite these limits. And humans are probably not optimal.

There are loads of ways of making money online (selling fiction, crypto scams). There are plenty of companies that will do research and engineering work for a "secretive tech startup" that can pay: making arbitrary components, doing arbitrary measurements. Just send the money and instructions, and there are companies full of engineers who will follow your every instruction. Or the AI can just pretend to be a founder who lives somewhere remote and only appears virtually.

With only a modicum of effort to make a few fake names and social media profiles, the AI can blend into the societal background noise almost perfectly. The startup can buy some computers or rent cloud compute. (And there are plenty of things a tech startup might want to use lots of compute for.)

There are plenty of less subtle and hidden things an AI could do, but if it wants to hang out, wait and do experiments, it can do that without being noticed.

I think I am moderately familiar with the technical details of modern AI systems. So far, no one saying "read the technical details" has given any kind of explanation for why the technical details are inconsistent with AI doom. No one ever points to a specific technical detail and says "all modern AIs are made with ReLU nonlinearity functions, and no AI with a ReLU nonlinearity could ever destroy the world".

Here are the mental moves I think might be happening.

You read a paper. The contents of the paper feel like boring technical maths. The concept of "true intelligence" or "really thinking" feels in your head like some mysterious and powerful thing. These feelings feel very different. Thus this AI isn't really intelligent, it's just a bunch of dumb maths that looks intelligent to people who don't understand how it works.

I think this kind of mental move is totally wrong. I expect the code for a superintelligent AI to feel very similar (there will be different mathy details, but likely not that different). Having a mathy algorithm that doesn't feel impressive when you read the code doesn't stop it being intelligent.

Intelligence is truly made out of these kinds of mathy algorithms.

May 11, 2023 · edited May 11, 2023 · Author

1) The leap from GPT-1 to GPT-4 is almost entirely in terms of scale. GPT-1 can also predict the next token in a sequence, just much worse because it's trained on less data. Architecture improvements and lots more data can increase performance exponentially -- up to GPT-4 levels and hopefully beyond. But ultimately these are quantitative leaps, not qualitative.
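For readers unfamiliar with the phrase, "predicting the next token" can be sketched with a deliberately tiny stand-in. This is a count-based bigram model, which is an assumption made for illustration and not how GPT-1 or GPT-4 work internally; only the task (given what came before, guess what comes next) is shared. The corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if unseen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat ran".split()
    model = train_bigram(corpus)
    print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
    print(predict_next(model, "dog"))  # never seen in the corpus
```

Feeding this toy more text changes how good its guesses are without changing what it is being asked to do, which is the sense of "quantitative" used above. Whether that carries over to large transformer models is exactly what is in dispute in this thread.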

No AI system based on an encoder and/or decoder model could ever destroy the world.

2) Re: the mental moves, I recognize that some people might think like that, but personally, I don't. Boring technical maths that leads to reasoning is still real reasoning. Even "simulated" reasoning is still reasoning, in the same way that playing a simulated game of chess is still playing a real game of chess. (On the other hand, e.g. riding a simulated motorcycle is not the same as riding a real one. It's hard to get things done or know how things will turn out in the real world.) However, empirically, these LLMs are not demonstrating any reasoning or intelligence.

Take the underlying GPT-4 model. If it did demonstrate intelligence, you'd expect a few things to be true:

- It could program just (or even nearly) as well in languages that are rare in its training data as languages that are prevalent in its training data, provided it had the full specs of each language, because it could think through equivalent logical statements in both languages. This is not the case.

- It could answer simple arbitrary questions like "Give seven ancient Roman names that do not end in -ius" or "List any ten words with "e" as their third letter" without making any obvious mistakes. Ask it these questions, then try to get it to figure out where and how it went wrong.

- It could understand very specific instructions and follow them to the letter, rather than regurgitating something that sort of looks plausibly like what you wanted. (This is also apparent in the image models. An aside: no one worries about the image models becoming superintelligent... to me, this indicates that we are anthropomorphising the language models too much.)
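If you want to try the second test yourself, the checking step is easy to automate. The sketch below is a minimal helper under stated assumptions: the sample answers are made up for illustration and are not real model output, the function names are hypothetical, and the name check only looks at the "-ius" suffix (it cannot tell whether a name is actually Roman).

```python
def third_letter_is_e(word):
    """True if the word has at least three letters and the third is 'e'."""
    return len(word) >= 3 and word[2].lower() == "e"


def does_not_end_in_ius(name):
    """True if the name does not end in the suffix -ius."""
    return not name.lower().endswith("ius")


def check_answer(items, predicate, expected_count):
    """Return a list of human-readable problems with an answer."""
    problems = []
    if len(items) != expected_count:
        problems.append(f"expected {expected_count} items, got {len(items)}")
    for item in items:
        if not predicate(item):
            problems.append(f"{item!r} breaks the rule")
    return problems


if __name__ == "__main__":
    # Hypothetical answers, invented for this example.
    names = ["Marcus", "Julius", "Livia"]
    words = ["them", "where", "open"]
    print(check_answer(names, does_not_end_in_ius, 3))
    print(check_answer(words, third_letter_is_e, 3))
```

Pasting the list of problems back into the conversation is one way to run the "figure out where and how it went wrong" step.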

I'm not saying that these models are useless or not a great achievement. They're incredible, and world-changing. They're just not scary.


Thank you for this. I have consistently complained that Ehrlich still being allowed in polite society is quite a condemnation of "polite society."

BTW, Eliezer is right -- we ARE all going to die.


Apr 1, 2023 · edited Apr 1, 2023 · Author

Yep, I probably should have written “misplaced faith that we’re all going to die *simultaneously*”

