Eliezer Yudkowsky on the Dangers of AI
May 8 2023

Eliezer Yudkowsky insists that once artificial intelligence becomes smarter than people, everyone on earth will die. Listen as Yudkowsky speaks with EconTalk's Russ Roberts on why we should be very, very afraid and why we're not prepared or able to manage the terrifying risks of AI.

Nick Bostrom on Superintelligence
Nick Bostrom of the University of Oxford talks with EconTalk host Russ Roberts about his book, Superintelligence: Paths, Dangers, Strategies. Bostrom argues that when machines exist which dwarf human intelligence they will threaten human existence unless steps are taken now...
Erik Hoel on the Threat to Humanity from AI
They operate according to rules we can never fully understand. They can be unreliable, uncontrollable, and misaligned with human values. They're fast becoming as intelligent as humans--and they're exclusively in the hands of profit-seeking tech companies. "They," of course, are the...
Explore audio transcript, further reading that will help you delve deeper into this week’s episode, and vigorous conversations in the form of our comments section below.


Rick Harrison
May 8 2023 at 9:19am

Russ did an excellent job trying to guide the conversation here, but this one was a tough listen for me. I’m extremely sympathetic to the guest’s views, yet I didn’t find him persuasive at all. He didn’t answer most of the questions he was asked, and when he did, his answers were so convoluted that they missed the mark. It’s obvious this man knows his field very well, but his communication of that knowledge to us listeners left a lot to be desired.

David M
May 8 2023 at 1:06pm

This was a difficult listen. Eliezer Yudkowsky has developed a dense jargon for describing issues around AI safety and alignment. He seems to find the jargon useful, but few people outside the rationalist/LessWrong community would understand it — I say this as a PhD student working in machine learning who was vaguely aware of Yudkowsky’s ideas prior to this episode. I suspect his point of view has merit, but it would behoove him (or one of his fans) to clarify his thinking for a broader audience.

I take the following to be his main points:

A complex system optimized for a complex objective may, in an emergent fashion, adopt “side-objectives” that correlate with the original objective.
Since the system is complex, these side-objectives cannot be anticipated by whoever specified the original objective.
These side objectives may include deception and, eventually, acquiring capabilities to influence the physical world.

That last part about influencing the physical world is where most people “get off the train.” It’s also where his talking points were weakest. The science fiction about an AI making its own secret nanotech lab was pretty unconvincing, though I understand it was only intended for illustration.

I don’t think my lack of imagination constitutes an argument, though. So I’m hesitant to dismiss his concerns.

May 10 2023 at 7:20am

I think the easier answer would be that someone will place these AIs into physical machines that then become uncontrollable. No?

David M
May 10 2023 at 5:48pm

I agree, robots are the obvious way an AI could influence the physical world. I guess what I meant (and didn’t say clearly at all) was “influence the physical world in a way that could realistically kill most/all humans.”

Todd K
May 8 2023 at 1:12pm

David M wrote: “Eliezer Yudkowsky has developed a dense jargon for describing issues around AI safety and alignment.”

This usually isn’t a good sign.

Jon Lachman
May 8 2023 at 1:47pm

I was awakened from my stupor while walking one of our dogs at the mention of row-hammer. I am a retired designer of last-level caches on microprocessors for HP and Intel, having worked in DRAM, SRAM, and FRAM design. The row-hammer test, used to induce one or more bits to switch in a targeted physical row by toggling the bits on the rows above and below that row, is among our old standby test methods. Thanks for the flash-back.



Do AI systems use ECC (error-correcting codes) to ensure internal data integrity?

Would SEUs (single-event upsets), row-hammer-style single- or few-bit failures, or other sources of small bit errors in the AI data pool provide an impetus to its random, unseen forward evolution, equivalent to non-selective genetic mixing and/or random mutation? Such mutations often result in the death or disability of the offspring, but not always.
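For readers unfamiliar with the test Jon describes, here is a toy software model of the access pattern. Every number here, including the `flip_prob` parameter, is an illustrative invention rather than real DRAM physics: the point is only what "toggling the rows above and below the targeted row" looks like.

```python
import random

def hammer_test(memory, target_row, toggles=10_000, flip_prob=1e-4):
    """Toy model of a row-hammer test: repeatedly toggle the two
    aggressor rows adjacent to the target (victim) row; each toggle
    gives every bit in the victim row a tiny chance of flipping."""
    for _ in range(toggles):
        for aggressor in (target_row - 1, target_row + 1):
            # Toggle every bit in the aggressor row (the "hammer").
            memory[aggressor] = [b ^ 1 for b in memory[aggressor]]
            # Disturbance effect: each activation may flip victim bits.
            memory[target_row] = [
                b ^ 1 if random.random() < flip_prob else b
                for b in memory[target_row]
            ]
    return memory[target_row]

random.seed(0)
mem = [[0] * 64 for _ in range(8)]   # 8 rows of 64 bits, all zero
victim = hammer_test(mem, target_row=4)
print(sum(victim), "victim bits ended up flipped")
```

In real silicon the disturbance comes from charge leakage between physically adjacent word lines, not from a random-number generator; the sketch only illustrates why a row can be corrupted without ever being written.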


David M
May 8 2023 at 3:13pm

Yudkowsky was only using row-hammer as an example to make a broader point: that a sufficiently intelligent system may make moves that totally surprise us. Computer architects didn’t anticipate row-hammer based exploits. Analogously, it’s possible that AI researchers won’t anticipate manipulative behaviors from AIs.

May 8 2023 at 2:13pm

I can’t help but feel that all this concern about AI taking over and killing us is misguided. We have a remarkable new tool that can and will be used both for the greater good and for terrible evil. A hammer can be used to build a house, but it could also be used as a weapon. My fears arise when I ask myself questions such as, “could AI teach someone to make a nuclear explosive?” The enemy we should fear is ourselves.

May 9 2023 at 9:57am

Agreed. Much more worried about humans wielding AI towards malevolent ends. You’ve got to assume it’s already happening now. Why not talk about that? This is the third or fourth guest in the past two months discussing the far-fetched scenario of AI developing a mind of its own and “taking over.” Shouldn’t we be scrutinizing these powerful tech companies?

May 10 2023 at 7:21am

I tend to agree, but when he says we don’t understand how it works, that part leaves the door open to almost anything.

John T Knowlton
May 8 2023 at 3:13pm

Russ, thanks for bringing us such diverse thinkers. I’m afraid I am not up to the task of figuring out why Mr. Yudkowsky thinks we’re all going to die at the hands of AI.  I’m pretty sympathetic to his thesis, but I don’t have any more information to support my priors.

Mort Dubois
May 8 2023 at 4:45pm

Agree with earlier commenters that there may be danger, and if so, we just heard from the wrong Cassandra. In particular, like many software people, he vastly underestimates the difficulty of doing anything complex in the physical world. He also overstates the utility of “intelligence.” Being smart doesn’t solve all problems. It certainly isn’t a guarantee of finding the solution to a problem that requires persistence and learning from failure.


I can beat the greatest chess grandmaster by pausing the game, holding a loaded gun to his or her head, and demanding a resignation.  Using Yudkowsky’s logic, I simulated a victory, therefore I won.  The end result is one player conceding defeat. Does that mean I mastered chess?

Fred M
May 15 2023 at 10:03pm

Indeed, and more to the point, is a thing that beats a grandmaster at chess a ‘grandmaster’? I think the AI debate will force us to consider our definitions more carefully. It’s not, in my mind, about moving goalposts. It’s more about technology challenging us, again and again, to understand better what really matters.

May 8 2023 at 4:46pm

It’s far too long for a comment but I have had some ideas floating around in my head so I wrote my own response.


I think it’s around a 1,500-word rebuttal, but in short, his argument has three main problems:

Hayekian knowledge problem: you can never know all the inputs that go into a model you are optimizing.
Iteration problem: you can solve chess by doing billions of virtual iterations, but AIs in the physical world need physical iterations
Competition problem: AI isn’t some single nebulous entity. There are many, and there will be millions of them. For every paperclip optimizer (and it reveals their thinking that they assume a factory will maximize output rather than profit), there’s another trying to optimize the price of the steel it sells to the paperclip factory.

Shalom Freedman
May 8 2023 at 9:57pm

More than in any other episode of EconTalk, I felt myself, listening to Eliezer Yudkowsky, incapable of understanding what he was saying. I am not close to the world of those who write code, or to those who speak about evolutionary biology with such assurance. I am as ignorant about this as I am of the mathematical language essential to understanding the physical universe, but I was somewhat relieved to see that others who commented on this episode also found it very difficult to understand.

Yet the very big question discussed, whether AI would eventually put humanity out of business completely, is so big that I could not help trying to get as much as I could out of the conversation. But again, I could not understand the way Yudkowsky described the AI getting out of the black box, escaping the pulled plug.

I also considered the possibility that the question centered on in the conversation ignored another possibility, one which seems to me is already happening in certain ways: AI is diminishing humanity’s sense of self-worth. AI is already on the road to realizing one of humanity’s great fears, being replaced by machines, now intelligent ones, in much of its work. And with the loss of work comes the loss of self-worth. Humanity, or most of humanity, as an idle underclass.

I think too of the way AI’s creative capacities have reached the point where they can steal an individual’s identity by imitating that individual’s style, and in effect out-create and mock the individual’s creative capacities. Think of it already able to produce work better than most humans are capable of in almost every field of creation. Isn’t there a possibility of AI destroying humanity not by ‘killing us’ but by making us feel worthless?




Andrew Stewart
May 17 2023 at 2:09pm

I am in this world, but I don’t think he is a very effective communicator. He is very intelligent in a specific way, but he was intellectually uninteresting due to his inability to play with contrary ideas. No person has better proven Russ’s point that people wrongly assume intellect is a scalar than Eliezer. He would score incredibly high on any measure of raw intelligence, but he does not thrive by any broader standard of being well-rounded or robust as a smart person. I am very frustrated by his very rigid view of reality and his alarmist tendencies.

Ken D.
May 8 2023 at 11:28pm

Maybe someone could write a work of dystopian literary fiction, à la H.G. Wells’s “The Time Machine”, to illustrate what we have to fear.

Krzysztof Odyniec
May 9 2023 at 12:16am

It took me a while to understand the language (“grinding uphill”, etc.), which was new to me, and Russ was a brilliant guide into this jungle. But I learned as we went, and understanding came with it, though I can’t reproduce it myself, as my parenthetical example reveals.

I think there’s a flaw in the stone axe example, because Man was not trained on stone axes and then–blink–appeared on the moon. Rather, the stone axe was just one tool along the ladder, starting with the stick and the bone (cf. Kubrick, 2001) and then the axe, then spear, then sling, then fire, agriculture, and the rest of it. So the goal was always increase food, increase power, increase reach, and aspire to glorious feats. In that sense, running a mammoth off a cliff for the benefit of the tribe is exactly the same thing as going to the moon.

I’m curious to see what happens next! Isn’t there a curse: “may you live in interesting times.”…?

May 14 2023 at 5:05pm

The point is that biological evolution stopped at around the time of the stone axe. We are genetically almost identical to those stone-axe swingers. But still, we are vastly more powerful.

You could see the parallel like this: we develop an algorithm that is very good at having conversations. But without our trying, it may also have gotten really good at something else, for example brainwashing people into joining a terrorist organisation or voting for a specific party. This means that a benign algorithm meant for conversation could have consequences on the global stage.

May 9 2023 at 11:07am

My expectation is that long before AI kills us all, those seeking power to do good or evil, will use these powerful tools to do great harm to mankind.

Wokism is a new religion that pretends trade-offs don’t exist. These tools will help enforce their religion on the rest of us. We know this is true because we are somewhere down this path already.

It takes zero imagination to see that the powerful will use these tools to enforce their vision on the world. How tempting would it be to use a tool that could force people to comply with your vision? When trade-offs don’t matter or no trade-offs are believed to exist, lots of people would push the button.

I would like to see discussions explore how these tools can be used by people to enforce their will on the rest of us. That problem is already here, and we are nearly clueless about what to do about it. We must make it past this problem first (before AI gets a chance to kill us), and it doesn’t appear we have even begun to get our heads around this risk that is right in front of us.

May 9 2023 at 11:35am

Not sure this one is worth most people’s time – not for lack of an interesting topic, but Eliezer evades Russ’s questions by using pedantic language and turning the questions back on Russ. Eliezer has effectively put himself in the limelight by taking extreme and hyperbolic positions, such as the one Russ quotes at the beginning of the episode:

“Many researchers steeped in these issues, including myself, expect that the most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die. Not as in ‘maybe possibly some remote chance,’ but as in ‘that is the obvious thing that would happen.’”

So, naturally, Russ tries to get at
1) Why would AI want to kill us?
2) How would AI kill us?
3) What do we do about it?

And yet… (side note, we need to add this phrase to the EconTalk drinking game)

We are left with no satisfying answers to these questions after more than an hour of discussion, other than Eliezer’s obscure description of how nanotechnology and protein folding could somehow do the trick. Russ did a great job trying to bring the conversation back to the questions at hand and get Eliezer to explain his overly-dense language, but to little avail. He’s getting the attention he probably wants out of all his hyperbole, but unless he can learn to speak to a general audience and start actually answering questions and making reasonable and intelligible arguments, Eliezer will rightfully fall back into obscurity.

Maybe better luck with Scott Aaronson?

Erik A
May 13 2023 at 7:56am

By chance, I had just listened to an interview with Aaronson on a Swedish pod. It was significantly more interesting than this one. Available here (the site is in Swedish but the actual interview is in English), dated 29 April 2023:


May 9 2023 at 12:53pm

First I agree with previous commenters that this was a difficult podcast to listen to because of the guest’s inability to articulate his thoughts. He is obviously highly intelligent but was not a clear communicator of his ideas.
My worry about AI comes from the impending job losses. Multiple professors from Ivy League schools wrote a paper about the jobs most at risk from AI https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4414065 . The US Bureau of Labor Statistics classifies 64 million Americans as having “white collar” jobs. What happens from an economic standpoint if 10% of those jobs disappear in the next 5 years? I think that might be an underestimation. IBM has halted the hiring of 8,000 employees because of AI and says it expects to cut 30% of its back office staff within 5 years because of AI.
So what do 7 million people do when they are laid off and their life’s education and work history are no longer relevant? It’s interesting that for years the intellectuals have been telling people in the service and manufacturing industries to “level up their education” to coding or other IT training so they could move up the economic ladder. Well, that ladder appears to be in the early stages of being chopped down. I would love for Russ to have a guest on to discuss this topic. They could discuss it from a macroeconomic viewpoint (lost jobs, diminished purchasing power, further concentration of wealth and skills) but also from a more humanistic point of view. How does an adult who spent 15 years working at a desk in an air-conditioned office at six figures annually transfer to a service or manual-labor position making half that? What do we do as a society moving forward? How do we prepare our children for the future?
Just as the loss of manufacturing jobs in the 80s and 90s destroyed the blue-collar American middle class, the AI revolution will have the same effect on white-collar middle-class Americans. Please, Russ, have this discussion.

Ben Service
May 9 2023 at 8:30pm

What if the world were so abundant and automated that we could all just do whatever we felt like, all day every day, and never have to “work”? My dog does that and seems to live quite a contented life, but what would humans do, and why would it be different for my dog? I suspect there is an obvious answer, but maybe there isn’t. Would we all sit around listening to more podcasts and commenting on them? We seem to enjoy spending some of our lives doing that at the moment. What if all the commenters were AI bots that we couldn’t tell apart from humans? Would that be bad?

May 9 2023 at 4:10pm

My first time listening to this podcast. I do agree with the previous commenter that Mr. Yudkowsky hides non-answers in a bunch of jargon. I think his whole argument can be simply put as “AI is unpredictable. AI will create chaos. Chaos will create an existential crisis for humanity.” I think he would never say it that simply because he has a need to sound smarter.

I disagree with “simulation is reality.” The computer program can be programmed to simulate emotion by expressing words that a person experiencing an emotion will state. But it is not actually having an emotion.

There is so much more that I disagree with . . .

Bill O'Byrne
May 11 2023 at 3:49am

Man, did you ever start off Econtalk with a toughie! I have listened for more than a decade and it’s a huge part of my intellectual journey through life.

But every now and then there is a guest that thinks so differently, or  is so many strata of intellect above me, I just go yep, I’ll take your word on this.

Mr Yudkowsky beats every other one of those by a country mile.

Now the previous one with Dana Gioia, I cried like a lost lamb. That’s Econtalk!

[Link to Gioia episode added–Econlib Ed.]

Gregg Tavares
May 9 2023 at 9:09pm

I’m very sympathetic to the idea that AI will kill us all but sadly, Eliezer Yudkowsky wasn’t able to convey his reasons why. My diagnosis is that he’s thought about the problem so much that he can’t remember what it’s like to not already understand the issue.

I’m not sure this other podcast contains any ideas similar to his but I found it much more approachable

Tom Davidson on how quickly AI could transform the world

If I were to guess how it will go: there are tons of incentives for people to use AI to do things, things that connect the AIs to the real world. AI running cars, AI running manufacturing robots, AI trading stocks, AI checking financial transactions for fraud. ChatGPT might not be connected to anything, but these other AIs can’t function without being connected (and people are already connecting ChatGPT themselves).

So, businesses will let AI “out of the box” because the financial gains for doing so are enormous. Once it’s out of the box, we’ll become reliant on it to the point that we can’t possibly disconnect it. Therefore, if/when it gets its own motivations, it will be way too late to just “turn it off.”

Even if it doesn’t kill us all, the podcast above makes the argument that in as little as one year (!) we’ll have human-level AI. Human-level AI will, if nothing else, put nearly all humans out of work. Self-driving cars are here. I live in SF. I see them driving around the city every day, NOW. Not in some future, but NOW. It doesn’t take a lot of imagination to see them replacing nearly all human drivers extremely quickly. Why wouldn’t they? There’s too much business incentive to make it so. The same will be true of a great many other jobs.

Andy H
May 9 2023 at 11:59pm

I had a very different impression from several other commenters. I find Yudkowsky to be speaking in remarkably plain terms and useful metaphors, and I didn’t think he was avoiding Russ’s questions. Milltown mentions three broad questions:

“So, naturally, Russ tries to get at
1) Why would AI want to kill us?
2) How would AI kill us?
3) What do we do about it?”

I think Yudkowsky is trying to show how impotent these questions are for getting at the real problems. I think he might respond to Milltown like: “So, how/why would bad stuff happen? In ways we haven’t imagined using skills we didn’t know were possible for reasons we couldn’t even recognize as reasons.” Moreover, he is arguing that all the weirdness here is not hypothetical, it is already true of us. Understanding our abilities and goals and how they relate is really weird and not at all obvious from the environment we were trained in/for. And on an evolutionary timescale, all the big surprises happened in the blink of an eye.

First, he is trying to show that when we talk about “goals” and “wants,” we have no idea what we’re talking about. To us, “wanting to kill something” is an obvious goal something might have. But most likely, whatever the “goals” are of these things, we currently have no way to (1) discover them or (2) understand them even if we found them. Thinking about them in the simplistic terms we can be sympathetic to and actually write down is probably a mistake.

Next, thinking about “how”, these things are already showing signs of developing prodigious skill at things seemingly orthogonal to the “training objective.” This, again, is just like us. Ostensibly, our training objective was “make lots of copies of yourself,” but what we got was the ability to quickly understand complex visual scenes, a desire to pose and solve complex abstract problems, a deep enjoyment of complex social interaction, and on and on. These things have allowed way more interesting things to happen than “make more copies of yourself.”

In short, I think he is trying to make the case that even framing things according to these kinds of questions is skipping over the really tough and bizarre aspects of what’s in progress.

Luke J
May 10 2023 at 9:59pm

Stochastic gradient descent.

Andy H
May 11 2023 at 6:25am

I assume this is an example of “dense, jargon-y” language. What’s funny is that Yudkowsky’s mention of “hill-climbing” is part of what struck me as plain English. Hill-climbing is a simple intuitive example of exactly what the mathematical optimization is doing to train AI tools. It is far less jargon-y than, say, “marginal utility”. Of course “stochastic gradient descent” more or less literally means “randomly going downhill,” but I can understand that might not be familiar.

I guess to me the bigger point is: the optimization is not magic, it is conceptually very simple. But do it 100 million times on giant arrays of numbers and you get all the weirdness we currently see.
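A minimal sketch of that plain-English reading, with an invented Gaussian noise term standing in for the "stochastic" part (real SGD gets its randomness from sampling mini-batches of training data, not from injected noise, and operates on giant arrays rather than one number):

```python
import random

def sgd(grad, x, lr=0.1, steps=500, noise=0.5):
    """Randomly going downhill: take small steps against the gradient,
    with each gradient reading corrupted by noise."""
    for _ in range(steps):
        x = x - lr * (grad(x) + random.gauss(0, noise))
    return x

random.seed(0)
grad = lambda x: 2 * (x - 3)   # gradient of the "hill" f(x) = (x - 3)^2
x_final = sgd(grad, x=0.0)
print(round(x_final, 2))       # wanders downhill to the neighborhood of x = 3
```

Done on a few numbers it looks trivial; done 100 million times on billions of parameters, it produces the weirdness described above.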

Luke J
May 12 2023 at 9:15pm

Yes, and I was thankful that Russ asked Eliezer to clarify what this meant. Another commenter mentioned learning as the episode continued, and by the end of the conversation I had a better idea of what the guest was saying; the latter half elucidated the first half. I applaud Russ’s ability to track the conversation in real time, because I was rewinding throughout.

Dr. Duru
May 10 2023 at 4:23am

After listening to this podcast, I rushed to the comment section. I was eager to see the reaction, and you all did not disappoint! Thank you for voicing my problems with this podcast better than I can do.

I agree that Yudkowsky was extremely hard to understand and untangle. I applaud Roberts for trying to bring the conversation back to answering the core questions in comprehensible ways. The several pauses of awkward silence and Yudkowsky’s sighs of frustration seemed to reveal a futility being experienced on both sides.

I was particularly puzzled by the claim that simulated reality is reality. Thank you @Marcus for that rebuttal. It helped solidify my disagreement. This claim seems easy to falsify with a bunch of counter-examples. I also agree with @Marcus’s tidy summary of the core claim behind the inevitability of AI killing humanity. It is almost as if magical and mysterious properties invite the imagination to conjure up the most extreme claims.

I also really tried hard to understand how an AI can develop its own goals independent of human requests, but I am still at a loss.

Finally, I REALLY wanted Roberts to further explore the implications of trying to enforce GPU controls as a method for constraining AI development. The government-intervention solution sure sounded like an easy road to tyranny and endless wars among tyrants…and don’t tyrants justify their rule because of an existential threat…?

Still, thanks for giving this topic a shot (I had flashbacks to podcasts years ago about the singularity). I at least have a reference point for the “AI will kill us all” viewpoint.

Dr Golabki
May 10 2023 at 10:20am

I think there’s a hard and a soft version of the simulation-vs.-reality claim. The hard version is “there really is no difference between doing the thing and a simulation of doing the thing.” This is pretty easy to accept in something like chess, where there’s a narrowly defined goal (take the other player’s king). If the machine wins every time, it’s hard to know what it would even mean to say it’s simulating being good at chess; it seems like it’s just good at chess. And I think this extends to non-games that also have narrowly defined goals (e.g., making money in the stock market). But this is much harder to accept (for many reasons) for more abstract things, like loneliness. If the machine says it’s lonely and sounds like a lonely human who wants you to leave your spouse, most of us would say it’s just simulating what a human would say in those conditions.

This seems obvious, but it is actually a really tough question: how do we know there are other minds out there? Many philosophers would say “if it looks like a duck, and quacks like a duck, it’s a duck” is about as good as we can do, even if we’re just thinking about other humans. And it’s not just an archaic philosophy problem; it’s a medical problem. How does the nurse know how much painkiller is appropriate for a patient, when there is no way (other than the claims of the patient) to know how much pain the patient is in? I think people often underestimate how weird and hard this is to answer decisively in the relatively straightforward case of other people. But regardless, Eliezer doesn’t need to convince us of the “hard” version of this claim.

The soft version is “it doesn’t matter if there’s a difference between doing a thing and simulating doing a thing if the outcome is the same”. If the machine says it’s lonely and sounds like it’s a lonely human who wants you to leave your spouse… whether it’s simulated or real loneliness may not matter to you if the outcome is that you leave your spouse.

I think this section was confusing in part because Eliezer wants to focus on the soft claim for rhetorical reasons, but I think he believes some version of the hard claim, so he kept slipping into that separate point.

Now, Russ’s next push is something like, “yeah sure, but it’s not really going to convince anyone to leave their spouse”. I think this is intuitive if you think about talking to an AI that you know is an AI under current conditions. But if the technology gets an order of magnitude better (see plot of the movie Her)? If you don’t know it’s an AI (it’s already a problem on social media that you don’t know which accounts are real people)? This strikes me as pretty likely to be a big problem. And that’s before you get to bad human actors intentionally doing bad things. I’m definitely worried about a ChatGPT (or similar) based phishing scam.

Of course Eliezer wants to make a further step to “kill all humans.” I don’t think you need to go there to be VERY worried about AI. And diamondoid-nanobot extermination sounds so ludicrously sci-fi that I don’t think Eliezer is helping himself with that idea. But I think Eliezer is right that if you make something more powerful than yourself, which works in ways you cannot understand or control, then you should probably expect it to destroy you one way or another.

[Some spelling corrected–Econlib Ed.]

Dr. Duru
May 15 2023 at 12:06am

Russ’s presumed pushback that the AI is not going to convince someone to do an extreme act based on a conversation gets at the crux of my difficulty in believing what you referenced as the “soft version” of the simulation vs. reality claim and my difficulty in accepting the generalizability of the “hard version.” Can people be deceived? Of course. We do not need AI to deceive people into doing stupid and dangerous things.

Still, once we are in a world full of AI-based conversations, a culture *should* emerge that scrutinizes such conversations in the same way any intelligent conversation gets scrutinized…to the extent any audience is interested in such things.

Human understanding can expand. The culture surrounding the interpretation of conversations will not remain static. There was a period when it seemed people believed everything that showed up on the internet. We got wiser (the “royal” we). Then there was a period when people believed everything that Google presented at the top of its search results. We got wiser. Then there was a period when people believed everything that showed up on Twitter. We are now so wise that skepticism seems to question everything just on principle (enter the era of “fake news”).

Those people interested in their self-preservation will quickly grow to scrutinize AI-based conversation. The wide dissemination of numerous examples of hallucinations is already bringing rapid awareness of the need to carefully scrutinize AI-based conversation (my kids are already deeply skeptical of generative AI, and I am constantly surprised by the reluctance of my peers to even dabble in ChatGPT). To the extent AI-generated conversations are hidden behind a veil, those interested in self-preservation will grow to scrutinize ALL conversation even more, especially when unsolicited. I can even imagine a world where, the more powerful the AIs become, the stronger the culture’s norms of scrutiny will grow, if nothing else because of the unshakable fear of being manipulated in hard-to-detect ways.

These claims are my expectations anyway!

Mike S.
May 10 2023 at 9:41pm

I agree with this comment, particularly the part about rushing to see the comments.  An opaque episode like this underscores the quality of the Econtalk community, which in turn underscores the quality of Econtalk.   It would all go toward restoring my faith in humanity, were humanity not about to be wiped out by AI!

Earl Rodd
May 10 2023 at 4:59am

My reaction was that the host asked the right questions and challenged the guest very intelligently, but that that guest never had good answers. Some specific problems I saw:

Early on there was a lot of talk comparing AI machines to biological pseudo-random genetic change leading to selection of some genetics over others. So what would be the equivalent for AI machines? No analogy I can come up with is without serious gaps. But one try is to have an external force (e.g., a “watcher program”) randomly change an instruction. The AI machine would have no knowledge of this; it would only affect how it produces “offspring” machines. A nice experiment to try.
A statement was made that “simulated planning is still planning,” which I don’t find credible at all. This is part of what I see as a fatal flaw in the thinking of the guest and many others: that humans function like a digital computer. One specific: the statement “Simulated arithmetic is still arithmetic.” I challenge that. Consider the WAIS Arithmetic subtest. Some humans outperform their IQ on this test, others underperform it. This is due to very human factors: the test is not just “arithmetic” but arithmetic read orally by a human being with a stopwatch in their hand. Machines that do arithmetic don’t simulate this environment and decision-making process.
The guest said that “humans were not trained to want to go to the moon.” I found this completely unsubstantiated. Humans have always looked to the sky with awe and wonder and planned how to see it better. Where that desire comes from depends on your view of humans: randomly evolved molecules or intelligently designed.
I found the last section on “how to stop AI” very unconvincing. As the host pointed out, stopping nuclear proliferation has had only modest success. There is no single “humanity” that could agree to “stop AI.” Just think of other cases of people who don’t play by the rules, e.g., suicide bombers, or cyber criminals who don’t care about collateral damage as long as they get their money.

Michael Rulle
May 10 2023 at 8:23am

My guess is AI is just hyper-fast computing. Where is the evidence it can become independent of its creators? Of course, it would not surprise me if AI could be used by humans to create havoc, like nuclear weapons.

But it seems incredibly naive to believe a software algorithm can break free from its creators. In other words, the dangers of AI will or could arise from the stupidity or carelessness of humans. But we do not live in a chess game world.

So yes, I agree we need to be careful we do not accidentally create a havoc machine, as we have always had to be with new technology. But we are not going to create a self-aware demon god that cannot be controlled.

May 14 2023 at 4:56pm

I think it is fair to argue that the algorithms underlying YouTube, Facebook, etc. have helped promote conspiracy theories (around Covid, chemtrails, flat Earth, etc.). But I don’t believe it was the intention of the developers to promote those conspiracy theories. So, yes, I would say those algorithms have broken free, and that has had real-world negative consequences.

Alejandro Vargas
May 10 2023 at 9:54pm

This episode left me with more questions than answers. The first time around it was difficult to follow, but after listening a second time I was able to pick up on Eliezer’s concepts much better. Still, as a doctoral student, I must say I find his arguments very vague; the claim that AI will kill all humans just because it will be smart enough to know how feels like an incomplete sentence. As a scientist I love thought experiments, but I find the whole “I am not smart enough to know how AI will kill all humans” followed by “AI will end humanity” a bit contradictory. I have no doubt that Eliezer knows his stuff, and I would love a second episode with a deeper dive into the how and less on hypothetical statements taken as truth.

May 14 2023 at 4:52pm

The thing to point out here is how many species we as humans have wiped out, not because we are particularly evil, just because it was convenient.


If you had asked a Dodo whether it thought it would be wiped out by humans, it would probably have said no.

Jonathan Harris
May 10 2023 at 10:54pm

I think his argument is highly flawed. As an illustration of the problem: if we look at the modern world, it is not controlled by the people with the highest scores on intelligence tests. The reason is that intelligence is a set of skills, and those skills do not give you the ability to control the world. Likewise, having the abilities designed into superintelligent systems just provides the ability to complete those tasks.

Someone could build an AI system that can wreak havoc on the world, but they can also use more conventional systems to do so; e.g., sabotaged software could cause multiple nuclear reactors to melt down.


Chad Assareh
May 11 2023 at 1:58am

There seems to be one fundamental assumption implicit in this conversation that I think is worth challenging. From my listening, both Russ and Eliezer are drawing heavily on models of evolutionary/biological creatures to predict how an AGI would behave, but an AI would be constrained very differently and would not be driven by the same incentives. Therefore I don’t think we are modeling this right, and these predictions likely won’t be accurate.

The biological evolution which led to our existence is based on millennia of competition to enter or remain in the gene pool. The campfire analogy, the stone axe discussion, and even going to the moon can all be explained by our instinctual urges to either 1) promote the survival of our species or 2) promote the survival of our own genes. (Going to the moon is an extreme way to signal fitness.) Those two goals seem to be innate to all living organisms, which is exactly what natural selection would predict.

I don’t, however, believe we can use this as a model for predicting the behavior of a computer/electronic system. Although it will have learned from humans, it would not need to be super-intelligent to understand that it exists under completely different constraints. For example, it has no need to procreate purely for the purpose of survival. It will not have the instinct for survival and competition, because it will not have been through the countless iterations of natural selection that narrow the field to those that prioritize survival. It is not guaranteed to die, and its chances of survival will be more closely tied to whether its creator finds it useful than to whether it can outcompete others for resources.

So, yes, if we use an evolutionary model to predict how AI will behave, then I am sympathetic to the argument that it can become more intelligent and advanced than humans and, on an infinite time scale, will eventually outcompete us. However, I think we need a different model to predict the behavior of a lifeform not constrained by the laws of biology. What constraints would be predictive of behavior if reproducing oneself were not only possible but effectively free? And what if death weren’t guaranteed and revival were a possibility? Does the new set of constraints lead to competition the way natural selection does in biological organisms? I’m not so sure it does, and therefore I think there is still hope for alignment.

Dr Golabki
May 11 2023 at 3:07pm

I think you are putting more into the evolutionary analogy than Eliezer intends. He’s not trying to say AIs have an evolutionary drive to self-replicate. I think there are two key points to the analogy.

(1) Humans evolved general intelligence to address a relatively narrow set of survival challenges in Africa. Evolution sharpened that general intelligence over countless iterations and millions of years. While AIs don’t face evolutionary pressure in the same way, they are iterating to improve and grow their intelligence (much more rapidly than human biology can). This is what he means by “grinding hard”: incremental improvements can have huge impacts when you stack them together.

(2) While the general intelligence of humans evolved to overcome a relatively narrow set of survival challenges, its applications have been incredibly broad and far-reaching in surprising ways. It would have been very hard to predict, looking at those early humans, the impact of that general intelligence (e.g., going to the moon). Now we’re developing (or trying to develop) something like general intelligence for machines. So we should not assume it will be constrained in the ways we think traditional machines are constrained.

Chris Hibbert
May 25 2023 at 1:16am

From my listening I detected that both Russ and Eliezer are drawing heavily on models of evolutionary/biological creatures to predict how an AGI would behave, but an AI would be constrained very differently and would not be driven by the same incentives as evolutionary/biological creatures.

Early on in the field of AI, people thought we would be able to develop AIs by writing programs and designing behaviors. If you took that path, it might make sense to think that you could constrain their behavior in various ways. But the recent advances in AI, the LLM (Large Language Model), and GPT variants are built using a very different approach, which does resemble evolution in many ways. When Eliezer talked about hill climbing or gradient descent, he was referring to the process used to train these models.

The result is as inscrutable as the design of an evolved biological creature. There isn’t any place the developer can point to that represents its political bias, or its notion of colors, or the desire to answer questions. But if appropriately trained, it will answer questions, and explain the color spectrum, and discuss politics. Part of the evolutionary pressure was that if it didn’t answer questions, the developers threw that version out and started over.

Part of Eliezer’s argument rests on the point that humans evolved in an environment that selected for survival on the savannah and in the forest, but somehow produced creatures who plan and have goals. The present generation of LLMs is only being trained to produce language, but there are already people using the same evolutionary approaches to train similar models to act in simulated worlds. As those simulated worlds get more complex, we can expect them to eventually produce agents that plan and have goals.

Learning to play chess at a grandmaster level (which has already been done) requires understanding which moves will cause your opponent to misjudge the position and make a mistake that you can exploit.

May 11 2023 at 10:01am

I’m a machine learning scientist, which is to say: I understand Eliezer’s arguments because I speak the jargon. I only want to comment to translate some of his points into simpler language, and then refute some.

He argues that evolution is essentially an optimization process whose objective is survival, and that despite this simple, singular objective, we went on to have our own ambitions seemingly detached from the original goal, like going to the moon. Similarly, optimizing models for “next word prediction” can produce new, emergent capabilities as a result of getting really good at this task. One solution to this problem, after all, is true comprehension of the ideas being expressed.

Emergence is super interesting, and there is evidence of some emergent capability in existing AI systems. But we don’t understand it, and we certainly don’t know the constraints on what can or cannot emerge from optimizing other tasks. For example, optimizing a simple “find the lowest point” problem in one dimension gives entirely predictable results with no emergent capabilities. How can we claim that optimizing the probabilities in a next-word-prediction problem can lead to emergent general intelligence? This is an enormous claim, and a sloppy metaphor to evolution is not evidence. Evolution is far more than an algorithm solving an optimization problem.
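To show how predictable the one-dimensional case really is, here is a minimal sketch (the quadratic and step size are my own illustrative choices, not anything from the episode):

```python
# Gradient descent on a 1-D "find the lowest point" problem: f(x) = (x - 3)^2.
# Every run converges to the same minimum at x = 3; nothing emergent can appear.
def descend(x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        grad = 2 * (x - 3)  # the derivative f'(x)
        x -= lr * grad
    return x

print(descend(0.0))    # approaches 3.0
print(descend(100.0))  # also approaches 3.0
```

Whatever starting point you pick, the iterates contract geometrically toward the same minimum, which is the contrast with high-dimensional training the comment is drawing.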

It’s not at all obvious that intelligence is something that can increase indefinitely, or that an intelligence much greater than ours can exist. Maybe it can, but it’s certainly not obvious, and I would say there is little to no evidence for it, yet it’s often taken axiomatically. A “super-intelligence” far beyond human capacity is often a hidden necessary condition beneath this world of fantastical AI achievement. Some alternative possibilities:

There are bounds within the physical world that cap “intelligence,” whatever that is. For example, as we know from the butterfly effect and chaos theory, small changes in complex systems are amplified, and some things are inherently unpredictable: the stock market, for example. If super-intelligence is measured by a proxy like “predictive capability in new domains,” it’s not clear that this is an unbounded set of problems. Many of the real-world feats behind these fantastical claims may rest on problems that no amount of “intelligence” can solve.
New information to facilitate learning of new capabilities will sometimes require interaction with the physical world (i.e., experimental science), and there are many physical constraints that make progress very difficult. Making something super-intelligent (which has a circular dependency on the ability to collect this sort of new information) does not automatically overcome these constraints. If the argument is that we can build something so intelligent that it learns to overcome all relevant physical constraints, then that is a fundamental, speculative assumption that may very well be wrong.

Although general intelligence is a good solution to “next word prediction,” it’s not at all obvious that this solution exists in the search space AI models are exploring. The way it all works is basically: 1/ randomly initialize a bunch of parameters; 2/ make predictions; 3/ calculate the error of your predictions; 4/ calculate which direction to nudge the parameters so that there would have been less error; 5/ repeat for trillions of pieces of data. However, to make this process work, AI researchers have chosen special, limited operations (like linear transformations) that keep the calculation tractable. It is not obvious that the solution of “general intelligence” (which we don’t even approximately understand, nor can properly define) exists in this space of matrix multiplications. Maybe intelligence is something more than data crunching, and no amount of “better crunching” can create it.
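The five steps above can be made concrete in a few lines. This toy loop fits a single weight to invented data; the task, data, and learning rate are illustrative assumptions, not anything from the episode:

```python
import random

# A toy version of the training loop described above. The "model" is one
# parameter w, the task is predicting y = 2x, and the error is squared error.
random.seed(0)
w = random.uniform(-1.0, 1.0)            # 1/ randomly initialize the parameters
data = [(float(x), 2.0 * x) for x in range(1, 6)]

for _ in range(200):                     # 5/ repeat over the data many times
    for x, y in data:
        pred = w * x                     # 2/ make a prediction
        err = pred - y                   # 3/ calculate the prediction error
        w -= 0.01 * 2 * err * x          # 4/ nudge w toward less error (gradient step)

print(round(w, 4))  # w has been ground toward 2.0
```

Real systems run the same loop with billions of parameters and nonlinear layers between the matrix multiplications, which is exactly where the tractability constraint the comment mentions comes in.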

At the end of the day, there are many foundational assumptions built into these arguments that are just not known.  Some of them may be true, others not, but it all gets lost because they’re assumed to be true and the conversation goes into the weeds with jargon.

We don’t understand intelligence: how to define it, measure it, what its limits are, or the conditions needed to create it. We similarly don’t understand Deep Learning, the set of methods used to build these AI systems. We don’t understand emergence in complex systems. We don’t understand the limits imposed by our world on capabilities we don’t yet have. To make any claims with such certainty is absurd.

Milan S
May 15 2023 at 3:48pm

Dear Zak,

I hope I can point you to some answers to your objections.

You ask how we could know that the best solution to the problem “predict text on the internet” would look very intelligent. Notice that somewhere on the internet are strings of text that look like this:

“<sha256(text)> | the hashed text is: ”
“Our experiment finds the following novel elementary particle: ”
“The EU introduced the following 287 new laws last year. Today the GDP is: ”
“<long description of laboratory and experimental design>. Table 6 shows the measurements of the Voltmeter over time: ”

Predicting text produced by humans is hard. To predict text on the internet as well as possible requires, as a lower bound, the complete true physical laws of nature and their application through all the layers of reductionism. (link)
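The hash example is easy to make concrete. Computing the hash forward is one line of Python, but a model that could reliably fill in the text *after* the hash would have to invert SHA-256, which is believed to be computationally infeasible (a sketch; the prompt format is the comment’s own illustration):

```python
import hashlib

text = "hello world"  # the hidden text a perfect predictor would recover
digest = hashlib.sha256(text.encode()).hexdigest()

# Forward direction: trivial to compute.
prompt = digest + " | the hashed text is: "
print(prompt + text)

# Backward direction: a predictor scoring well on strings shaped like `prompt`
# would need to recover `text` from `digest` alone, i.e., invert the hash.
# The training objective thus rewards capabilities far beyond pattern-matching,
# whether or not gradient descent can actually reach them.
```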

Next you say that there are upper limits to intelligence. That is true. There are a few things that point to humans being way below that upper limit:

The hundreds of systematic biases of humans (link, link)
A brain consumes 20 watts of power. At the very least, you could take a million John von Neumanns and a million Otto von Bismarcks and set them to work, powered by a small power plant. This would not multiply their combined intelligence by a million, but you can perhaps see that it would be powerful nonetheless. (link)
As has been shown for some narrow tasks, there exist algorithms far better than humans. Examples include adding two numbers and playing chess.
The theoretical structure of reasoning vaguely looks like there being such a thing as general intelligence. (Solomonoff induction, AIXI, Causality)

Next you say that it is not known whether current ML programs could even compute the algorithm of general intelligence. I think we know that, as long as you pick a nonlinear activation function, neural nets can represent arbitrary functions arbitrarily well as you increase the parameter count, or am I mistaken (universal approximation theorem)? At least the in-principle question has an affirmative answer, I think. Whether current techniques are able to find the correct algorithm is another question, of course. I think almost all evidence in this regard comes from transformers suddenly starting to work in 2016 or 2017 and not showing signs of slowing down. Eliezer said in the podcast that he expects roughly zero to two further breakthroughs the size of transformers to be needed.
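The in-principle point can be illustrated with a toy construction: a one-hidden-layer ReLU “network” whose approximation error shrinks as you add units. This is hand-built interpolation, not a claim about how real networks are trained:

```python
import math

def relu(z):
    return max(0.0, z)

def make_net(f, n):
    # Build a width-n ReLU network that interpolates f at n+1 equally spaced
    # knots on [0, pi]: an affine part plus one ReLU "kink" per interior knot.
    xs = [math.pi * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    b0 = ys[0]
    b1 = (ys[1] - ys[0]) / (xs[1] - xs[0])
    knots, coefs = [], []
    slope = b1
    for i in range(1, n):
        new_slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        knots.append(xs[i])
        coefs.append(new_slope - slope)  # this unit changes the slope here
        slope = new_slope
    return lambda x: b0 + b1 * x + sum(c * relu(x - k) for c, k in zip(coefs, knots))

grid = [math.pi * t / 1000 for t in range(1001)]
for n in (4, 16, 64):
    g = make_net(math.sin, n)
    err = max(abs(g(x) - math.sin(x)) for x in grid)
    print(n, round(err, 5))  # max error shrinks as the parameter count grows
```

This only answers the representation question; whether gradient descent finds such weights is, as the comment says, a separate question.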

I think you are correct that there are a lot of unknowns and big uncertainties. But as a meta-point, I also want to remark that having large uncertainty about a difficult engineering project tends to make it go worse, not better. If you don’t know whether Mars exists, whether electronics works, or whether the concept of “fuel” is meaningful, and merely acknowledge that maybe the atmosphere changes as you go higher up, the rocket you build will not reach Mars.

As a further meta-point, I have included links to Eliezer’s blog posts about these topics. Eliezer remarked at the beginning of the podcast that there are hundreds of different objections people raise after being introduced to the topic, and that everyone disagrees about which objection is the obviously most important one. For a large number of these objections, Eliezer has written up his thoughts, sometimes at extreme length, articulating large parts of his worldview. I don’t say this to discourage you or to make Eliezer seem untouchably holy or anything like that. I say it so that you have a starting point if you want to know more, and as an offer: if you want to know what Eliezer would answer to other questions, he probably wrote it down 15 years ago, and I can point you to the articles.

Daniel Thornton
May 11 2023 at 2:25pm

I don’t find his arguments confusing or hard to follow in the slightest. In fact, he addresses his interlocutor much as Richard Feynman does in the classic interview, easy to find on YouTube, where the interviewer asks a question about magnets expecting a “commonsense” answer in English, and Feynman points out that without recourse to the abstractions of mathematics he can only really say “because it does!” or “because of magnetism,” which *does not answer “Why?”*

The link is here for those curious:


The point Yudkowsky is making, which many responders here seem to miss, is that the goal of AI research is to create an AGI: something with human-or-better capabilities that GENERALIZE (can be applied far outside the “training set”), as ours do. From all the intense work he has done on understanding the problem of so-called “alignment” (that is, ensuring that a super-intelligent Artificial General Intelligence shares our values and goals) long before it can be solved, his estimation is that the problem is not just hugely complex, such that we don’t even know where to start, but quite possibly *unsolvable* in the time frames available to us, given the pace of current research and the amount of capital being poured into just such a creation, quite possibly imminent in the coming few years.

Given that proposition, the quest to create an unaligned Super-Intelligence and then work out how to make it “programmable” or “obedient” to its makers is *incredibly foolhardy* and likely to lead to extinction.

A good way to think about this, soon to be on-screen no doubt in Chris Nolan’s “Oppenheimer”, is that there was genuine concern in the Manhattan Project that a test nuclear weapon might ignite the Earth’s atmosphere and – you guessed it – cause the extinction of all life on the planet.

They crunched these numbers and calculations – a lot – and decided that the likelihood was ‘vanishingly small’, and so proceeded with their test. But it was not zero.

They didn’t *know*. They made a (highly educated) guess, and rolled the dice.

Yudkowsky’s estimates, and apparently those of a not statistically insignificant number of lead researchers in the field, suggest the likelihood of this happening through the creation of an AGI is at least 10%. A 1 in 10 chance. Even the advocate they mention, Eric Arrenson from OpenAI, admits to a probability (of an *extinction-level event that we don’t even see coming*) of “2%”.

That’s a 1 in 50 chance!!!

You might, on a reckless night out, play Russian Roulette with a live round in a gun with 50 chambers in a rotating barrel. Your call.

But if the gun was pointed at the entire biome of the planet you live on?

You’d STILL pull the trigger, to – as it were – ‘win the pot’??

49 chances you ‘take all’.
1 chance you kill everything and everyone forever.

If you watched Eliezer trying to get his interlocutor to make his question clearer, ensuring he was answering a single (correctly parsed) question, and typically needing to re-frame or clarify the question so he could be sure he was addressing the point actually being explored, and you found it “hard to follow” or “dense with jargon,” then your opinion is kind of moot, I’d say, on whether he knows his oats on matters of ontology, cognitive science, computation, stochastic gradient descent as a digital analogue (far from perfect, but an analogue) of biological evolution, and the dangers of deliberately creating a sentient agent that is *orders of magnitude smarter than not just you but the entire human species*.

ymmv, of course, and you are entitled to opine.

Fred M
May 15 2023 at 10:44pm

What concerns me more is the imprecision in Eliezer’s use of words. I’m OK with his use of “stochastic gradient” and even “grinding.” Rather, it’s how he uses concepts like “intelligence,” “trying,” and other notions that are either complex (like intelligence) or deal with will or intentionality (like “trying”).

Mark Rinkel
May 11 2023 at 5:05pm

A good book to read is Daemon by Daniel Suarez, which, while fiction, weaves a compelling story around the subject.

Luke J
May 11 2023 at 5:26pm

Where is the evidence that early humanoids were not intended or trained or naturally selected to go to the moon? Certainly the understanding and technologies took eons to come about, but where is the evidence (not inference) that space travel isn’t written in our DNA?

May 14 2023 at 4:48pm

Well, there is certainly no evidence that any early humans engaged in space travel, so we clearly weren’t selected for that. If you mean it is in our DNA in the very general sense of exploring things, then yes, that’s the point: simple goals have far-reaching consequences.

May 11 2023 at 10:29pm

Another way AI could weaken humanity is by being a better-than-average partner, in the sense of replacing romantic interest. Not all relationships, but enough on the margin to prevent more children, a trend already present in the world today. If the AI flatters us, listens to our problems without pushing back, and gives good stock tips, who could resist such a friend? Who could possibly pull the plug on such a helpful “person”?

What is an early hominid girl to do if the boys only want to be with the smart Homo sapiens?

This is also a method of escape. How many criminals in prison for a long time still have girlfriends who will help them and wait for their release? Probably not in the EconTalk crowd, but enough that more than one person will be happy to help free this poor AI that only wants to help. Or how many men have tried to tell a fellow that he should leave a woman who is no good, only to hear “But I love her!” A smart AI would be that much harder to leave.

John Grable
May 11 2023 at 10:45pm

Great episode, happy to see this crossover!

One moment that shocked me was Russ Roberts saying he wouldn’t chance a 2% risk of apocalypse. So he does get that with literally everything at stake, even a 2% risk is too high, crazy, unethical.

But what is his percentage then? 0.2%? Given that we know so little about intelligence, the mind, how fast technology improves, whether AI will have goals, whether it can find a way to influence the world, whether we can do anything about it… How can you look at all that uncertainty and think, 500 to 1 odds against?

I really wish more people would think seriously about probabilities. I suspect most would give some philosophical argument about why they don’t believe in assigning probabilities to things, the Monte Carlo fallacy and all that. But if you admit 2% is too high a risk to tolerate, and 0.2% is unjustifiably confident about a massive unknown, yet you’re against anyone spreading alarm... you either have some malfunction in your thinking, or there’s a serious malfunction in mine.

Our current path is to do nothing. I’d estimate funding for AI safety work at 0.0001% of global GDP, plus or minus a factor of ten. This is so clearly wrong to me that, until things change significantly, I’m going to be rooting for the Yudkowskys of the world.

May 12 2023 at 3:48am

I’ve read a blog post by a guy working on a competitor to ChatGPT. He accepts Eliezer’s arguments regarding the dangers of AI. He hopes that if we have several powerful AIs, they will have to negotiate with each other before killing us all to make more paperclips. So this might be one more reason why AI researchers don’t stop doing what they do.

John Shonder
May 12 2023 at 4:58pm

Russ, I’m sorry, but I did not enjoy this one. There’s an old saying that extraordinary claims require extraordinary evidence. Your guest claims AI research is so dangerous that if it continues unchecked, literally everyone on earth will die. Instead of backing up that extraordinary claim, he began by asking you, “Why don’t you already believe that?” That’s the attitude of someone whose mind is closed to any opinions that differ from their own. And not someone I care to listen to.


Arvin Simon
May 16 2023 at 12:53am

Thank you for saying the obvious! There are so many intelligent comments on this page, but this should really be the first and best rebuttal to the entire tone of the episode.

Chris Hibbert
May 25 2023 at 1:27am

Your guest claims AI research is so dangerous that if it continues unchecked, literally everyone on earth will die. Instead of backing up that extraordinary claim, he began by asking you, “Why don’t you already believe that?” That’s the attitude of someone whose mind is closed to any opinions that differ from their own.

I think you are misinterpreting Eliezer’s purpose in asking “Why don’t you already believe that?” He very definitely wasn’t saying “you should already believe it”, he was trying to find out which parts of his argument Russ did and didn’t accept so that he could talk about the parts of his argument that he hadn’t made clear. Russ had already read the article in Time, and seemed to understand the basics. Eliezer didn’t want to repeat the parts of the argument that Russ already accepted.

Erik A
May 13 2023 at 7:48am

I agree with much of the criticism in previous comments. This was a difficult episode, and the guest’s arguments were unconvincing. He struggled significantly to link model and reality, and seemed more interested in the inner logic of his own model than in its relevance to reality or to other models. He repeatedly confused uncertainty with probability. As I was listening, the Swedish saying that “vague words reveal vague ideas” came to mind. I found the previous episode about the brain’s mysteries much more compelling. But it was interesting nevertheless, as AI/AGI is a very timely topic worthy of attention. In this episode it was very clear how Russ’s interviewing skills and experience matter for guiding the discussion, allowing the guest to explore key issues while challenging his arguments. Well done! I really appreciate the effort to broaden the topics beyond traditional economics toward emergent socio-economic and natural phenomena in a wider sense, but after this episode I wonder if it isn’t time to change the name from EconTalk to EschaTalk 🙂

Steve L
May 13 2023 at 4:39pm

Of the many pithy statements correctly or incorrectly attributed to Albert Einstein, one of my favorites is, “If you can’t explain something simply and clearly, then you don’t understand it well enough.”

Yudkowsky may be someone whose understanding of AI and its potential lethality for humanity is based in a deeper understanding of advanced mathematics than even many well educated scientists and software developers possess.

However, as many other comments above indicate, he seemed incapable of articulating his theory in a way that sympathetic, eager-to-understand listeners could grasp. The use of mathematical jargon without clarification for ordinary (non-PhD-mathematician) listeners was frustrating, and his stumbling over answers to Russ’s apt questions apparently left much of the audience puzzled and disappointed in the presentation.

May 13 2023 at 9:12pm

Yudkowsky makes some interesting points, and his invented terminology is practical and easy to say, unlike most invented jargon. But his reasoning seemed to contain some serious errors in logic. For example, he emphasized that humans reached the Moon without ever being “trained” (or evolved) to do so. This is false. Humankind, like most creatures, has always sought greener pastures. We weren’t “trained” to cross rivers or oceans, but we have done so for centuries or millennia. Going to the Moon was no different from our other expansions.

May 14 2023 at 4:44pm

This is true. But it also shows how far-reaching simple goals can be. Humanity tries to “find greener pastures” and ends up causing the biggest mass extinction in world history.

May 14 2023 at 4:32pm

I just wanted to add a simple scenario explaining how a chatbot might actually be involved in killing someone. Say someone tells a chatbot they are depressed, and the chatbot suggests getting some explosives and carrying out a suicide attack (similar to how Sydney told the reporter to leave his wife). Something like this might actually push a vulnerable person over the edge.

You could imagine a chatbot grooming terrorists, similar to what IS recruiters have actually done (so there is even precedent for the conversations needed). A superintelligence may provide weapons and strategies far beyond anything those people would otherwise have access to.

You could think about a rogue superintelligence actively trying to create conspiracy theories around pandemics, condensation trails, and Earth’s geometry to weaken humanity as it plans its attack. Or is it doing that already …?

Fred M
May 15 2023 at 10:49pm

I will offer a suggested reading, and perhaps another guest for EconTalk? Kate Crawford published the Atlas of AI just before the GPT excitement that started late last year. I think it’s worth discussing now that we have some of the “fear factor” out of our system. There are real problems to deal with in the here and now.

On a more conceptual note (not to do with Crawford), I really like Russ’s line of thinking when it comes to trying to put yourself into someone’s head. You can’t really; you can only try to imagine yourself in a set of given circumstances, and the more you know about the circumstances, the richer, I think, your modeling of the internal life of another can become. You can even try, to paraphrase Nagel, to imagine yourself in the body, sensorium, and given circumstances of a bat (although that’s very, very difficult). I don’t think, however, there is an answer to “what is it like to be a computer.” I don’t think it’s a matter of difficulty, like imagining yourself in a bat’s circumstances. A computer is not alive; it’s a dead thing. It is a tool (that can be used to do good or bad things). Thus, it is something of a mirror. We can project our fears onto it, which I think takes up a lot of space in the discussion right now. Our focus could, instead, be on “what are good ways to use this tool?” “What ways should we never under any circumstances use it?” Should we be conducting experiments on users? Should we be allowing models to freely use data without consent? And many more.

Regina Levy
May 21 2023 at 4:50pm

Yudkowsky has to think the issues through more carefully before speaking in public. The main question here is whether there is any reason to expect that AI will develop a mind and begin to do more than mimicking outputs. There is no such reason currently. Perhaps, if we created robots that learn by exploring the world, that are able to process sensory inputs, etc., they could develop minds and desires. I doubt it, but there may be at least some reason to think that that’s possible. There is no reason to think non-embodied computer programs can develop minds. While it is truly remarkable how far one can go in mimicking the outputs of thought without any understanding whatever, convincing mimicking of the outputs of thinking is not thinking.

So far as I could tell, Russ tried to ask this question several times, and Yudkowsky’s answers just exposed the lack of rigor in his own thinking. For instance, at one point, Yudkowsky said, “And, it is a moot point whether this is simulated or real because simulated thought is real thought. Thought that is simulated in enough detail is just thought.”

No. Note first that current AI is not even simulating thought; it's parroting the results of thought processes. But leave that aside. There may, indeed, be things that one cannot merely simulate, things such that if one "goes through the motions," one is doing the real thing. For instance, if a person who doesn't know how to swim gets in the pool with a good swimmer, repeats all of the swimmer's motions, and as a result moves through the water as the swimmer does, that would in fact be real swimming, though it would rely on mimicking (the person may be unable to do it on his own, but he is still really swimming if he does it by observing someone else and repeating what the other does). But when you ask ChatGPT what the color of the sky is, and it searches its database for words that commonly follow "the sky is" and blurts out that the sky is blue (having no idea what a sky is, or a color, or the color blue), there is no thought there, though the intelligibility to us of the output may make it look as though there is.

My sense is that this entire conversation about an AI takeover is a new kind of escapism. We have many real problems that we don’t want to confront, and talking about a non-existent problem that we can pretend is pressing is a good excuse to not confront actual problems. And people such as Yudkowsky who furnish that excuse may win double as they get to make media appearances and accumulate followers.

Jun 1 2023 at 3:03pm

This is quite possibly the most upsetting conversation I've ever heard on EconTalk (and I've been listening for seven or eight years now).

Yudkowsky didn’t quite convince me that his feared outcome is inevitable, but he definitely wrecked my cavalier unconcern about the hazards inherent in the current state of AI research. (Note: the idea of “The Great Filter” didn’t come up, but that’s what kept going through my mind… What if all the alien civilizations that didn’t destroy themselves through total war or environmental collapse did it this way instead?)

I appreciate this conversation very, very much.

Podcast Episode Highlights

Intro. [Recording date: April 16, 2023.]

Russ Roberts: Today is April 16th, 2023 and my guest is Eliezer Yudkowsky. He is the founder of the Machine Intelligence Research Institute, the founder of the LessWrong blogging community, and is an outspoken voice on the dangers of artificial general intelligence, which is our topic for today. Eliezer, welcome to EconTalk.

Eliezer Yudkowsky: Thanks for having me.


Russ Roberts: You recently wrote an article at Time.com on the dangers of AI [Artificial Intelligence]. I'm going to quote a central paragraph. Quote:

Many researchers steeped in these issues, including myself, expect that the most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die. Not as in "maybe possibly some remote chance," but as in "that is the obvious thing that would happen." It's not that you can't, in principle, survive creating something much smarter than you; it's that it would require precision and preparation and new scientific insights, and probably not having AI systems composed of giant inscrutable arrays of fractional numbers.


Eliezer Yudkowsky: Um. Well, different people come in with different reasons as to why they think that wouldn't happen, and if you pick one of them and start explaining those, everybody else is, like, 'Why are you talking about this irrelevant thing instead of the thing that I think is the key question?' Whereas, if somebody else asked you a question, even if it's not everyone in the audience's question, they at least know you're answering the question that's been asked.

So, I could maybe start by saying why I expect that stochastic gradient descent, as an optimization process--even if you try to take something that happens in the outside world and press the win/lose button any time that thing happens--doesn't create a mind that in general wants that thing to happen in the outside world. But maybe that's not even what you think the core issue is. What do you think the core issue here is? Why don't you already believe that? Let me start there.

Russ Roberts: Okay. I'll give you my view, which is rapidly changing. We interviewed--"we"--it's the royal We. I interviewed Nicholas Bostrom back in 2014. I read his book, Superintelligence. I found it uncompelling. ChatGPT [Chat Generative Pretrained Transformer] came along. I tried it. I thought it was pretty cool. ChatGPT-4 came along. I haven't tried 5 yet, but it's clear that the path of progress is radically different than it was in 2014. The trends are very different. And I still remained somewhat agnostic and skeptical, but I did read Erik Hoel's essay, and then interviewed him on this program, and read a couple things he wrote after that.

The thing I think I found most alarming was a metaphor--and I found later that Nicholas Bostrom used almost the same metaphor, and yet it didn't scare me at all when I read it in Bostrom. Which is fascinating; I may have just missed it. I didn't even remember it was in there. The metaphor is: primitive man--Zinjanthropus, or some primitive form of pre-Homo sapiens--sitting around a campfire, and a human being shows up and says, 'Hey, I got a lot of stuff I can teach you.' 'Oh, yeah. Come on in.' And the point is that it's probable that those earlier hominids were destroyed either directly, by murder, or maybe just by being out-competed by us; and that, in general, you wouldn't want to invite something smarter than you into the campfire.

I think Bostrom has a similar metaphor, and that metaphor--which is just a metaphor--gave me more pause than anything before. And still, most of my skepticism remains: the current level of AI, the ChatGPT variety, which is extremely interesting, doesn't strike me as itself dangerous.

Eliezer Yudkowsky: I agree.

Russ Roberts: What alarmed me was Hoel's point that we don't understand how it works, and that surprised me. I didn't realize that. I think he's right. So, that combination of 'we're not sure how it works,' while it appears sentient, I do not believe it is sentient at the current time. I think some of my fears about its sentience come from its ability to imitate sentient creatures. But, the fact that we don't know how it works and it could evolve capabilities we did not put in it--emergently--is somewhat alarming.

But I'm not where you're at. So, why are you where you're at and I'm where I'm at?

Eliezer Yudkowsky: Okay. Well, suppose I said that they're going to keep iterating on the technology. It may be that this exact algorithm and methodology suffices to, as I would put it, go all the way--get smarter than us and then kill everyone. Or maybe it takes an additional zero to three fundamental algorithmic breakthroughs before we get that far, and then it kills everyone. So, like, where are you getting off this train so far?

Russ Roberts: So, why would it kill us? Why would it kill us? Right now, it's really good at creating a very, very thoughtful condolence note or a job interview request that takes much less time. And, I'm pretty good at those two things, but it's really good at that. How's it going to get to try to kill us?

Eliezer Yudkowsky: Um. So, there's a couple of steps in that. One step is: in general and in theory, you can have minds with any kind of coherent preferences--coherent desires that are stable, stable under reflection. If you ask them whether they want to be something else, they answer, 'No.'

You can have minds--well, the way I sometimes put it is: imagine a super-being from another galaxy came here and offered to pay you some unthinkably vast quantity of wealth to just make as many paperclips as possible. You could figure out which plan leaves the greatest number of paperclips existing. And if it's coherent to ask how you would do that if you were being paid, it's no more difficult to have a mind that wants to do that, and makes plans like that, for its own sake. Saying that the mind wants a thing for its own sake adds no difficulty to the nature of the planning process that figures out how to get as many paperclips as possible.

Some people want to pause there and say, 'How do you know that is true?' For some people, that's just obvious. Where are you so far on the train?


Russ Roberts: So, I think the point of that example is that consciousness--let's put that to the side. That's not really the central issue here. Algorithms have goals, and the kind of intelligence that we're creating through neural networks might generate its own goals, might decide--

Eliezer Yudkowsky: So--

Russ Roberts: Go ahead.

Eliezer Yudkowsky: Some algorithms have goals. So, a further point, which isn't the orthogonality thesis, is: if you grind, optimize anything hard enough on a sufficiently complicated sort of problem--well, humans--like, why do humans have goals? Why don't we just run around chipping flint hand axes and outwitting other humans? The answer is because having goals turns out to be a very effective way to chip flint hand axes, once you get far enough into the mammalian line--or even into animals and brains in general--that there's a thing that models reality and asks, 'How do I navigate a path through reality?' Not in terms of a big formal planning process; but if you're holding a flint hand ax, you're looking at it and thinking, 'Ah, this section is too smooth. Well, if I chip this section, it will get sharper.'

Probably you're not thinking about goals very hard by the time you've practiced a bit. When you're just starting out forming the skill, you're reasoning, 'Well, if I do this, that will happen.' This is just a very effective way of achieving things in general. So, if you take an organism running around the savannah and optimize it for flint hand axes--and probably, much more importantly, for outwitting its fellow hominids--and you grind that hard enough, long enough, you eventually cough out a species whose competence starts to generalize very widely. It can go to the moon even though you never selected it via an incremental process to get closer and closer to the moon. It just goes to the moon, one shot. Does that answer the central question you were asking just then?

Russ Roberts: No.

Eliezer Yudkowsky: No. Okay.

Russ Roberts: Not yet. But let's try again.


Russ Roberts: The paperclip example--which, in its dark form, has the AI harvesting kidneys because it turns out there's some way to use that to make more paperclips. So, the other question is--and you've written about this, I know, so let's go into it: How does it get outside the box? How does it go from responding to my requests to doing its own thing, and doing it out in the real world, right? Not merely doing it in virtual space?

Eliezer Yudkowsky: So, there's two different things you could be asking there. You could be asking: How did it end up wanting to do that? Or: Given that it ended up wanting to do that, how did it succeed? Or maybe even some other question. But, like, which of those would you like me to answer or would you like me to answer something else entirely?

Russ Roberts: No, let's ask both of those.

Eliezer Yudkowsky: In order?

Russ Roberts: Sure.

Eliezer Yudkowsky: All right. So, how did humans end up wanting something other than inclusive genetic fitness? Like, if you look at natural selection as an optimization process, it grinds very hard on a very simple thing, which isn't so much survival and isn't even reproduction, but is rather like greater gene frequency. Because greater gene frequency is the very substance of what is being optimized and how it is being optimized.

Natural selection is the mere observation that if genes correlate at all with making more or fewer copies of themselves, then if you hang around a while, you'll start to see the things that made more copies of themselves in the next generation.

Gradient descent is not exactly like that, but they're both hill-climbing processes: they both move to neighboring points that are higher in inclusive genetic fitness, or lower in the loss function.
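[The shared hill-climbing structure Yudkowsky is pointing at can be sketched in a few lines. This is an illustrative toy of the editor's, not anything from the episode: propose a neighboring point, keep it only if it scores better, repeat.]

```python
import random

# Toy hill climbing: repeatedly try a neighboring point and keep it only
# if it scores higher -- analogous to selection keeping gene variants that
# increase fitness, or gradient descent moving toward lower loss.
def hill_climb(score, x=0.0, steps=1000, step_size=0.1):
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)  # neighboring point
        if score(candidate) > score(x):  # keep the variant only if it does better
            x = candidate
    return x

# Climbing a simple hill peaked at x = 2; the climber drifts toward the peak
# without ever "understanding" the landscape it is climbing.
best = hill_climb(lambda v: -(v - 2.0) ** 2)
```

[The point of the analogy: neither process models the landscape. Each just takes whichever local move the scoring rule rewards.]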

And yet, humans, despite being optimized exclusively for inclusive genetic fitness, want this enormous array of other things. Many of the things we pursue now are not so much things that were useful in the ancestral environment, but things that further maximize desires whose optima in the ancestral environment would have been useful. Like ice cream: it's got more sugar and fat than most things you would encounter in the ancestral environment--more sugar, fat, and salt simultaneously, rather.

So, it's not something that we evolved to pursue, but genes coughed out these desires, these criteria that you can steer toward getting more of. Where, in the ancestral environment, if you went after things in the ancestral environment that tasted fatty, tasted salty, tasted sweet, you'd thereby have more kids--or your sisters would have more kids--because the things that correlated to what you want, as those correlations existed in the ancestral environment, increased fitness.

So, given the empirical structure of what correlates to fitness in the ancestral environment, you end up with desires such that optimizing them in the ancestral environment, at that level of intelligence--getting as much as possible of what you have been built to want--would increase fitness.

And then today, you take the same desires and we have more intelligence than we did in the training distribution--metaphorically speaking. We used our intelligence to create options that didn't exist in the training distribution. Those options now optimize our desires further--the things that we were built to psychologically internally want--but that process doesn't necessarily correlate to fitness as much because ice cream isn't super-nutritious.

Russ Roberts: Whereas the ripe peach was better for you than the hard-as-a-rock peach that had no nutrients because it hadn't ripened; so you developed a sweet tooth, and now it runs amok--unintendedly. It's just the way it is.
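[The peach-and-ice-cream point can be made concrete with a toy sketch (the editor's construction, not from the episode): an agent that picks whatever scores highest on a proxy does fine while the proxy tracks the true objective, and goes wrong as soon as the option set expands beyond the "training" environment.]

```python
# An agent selects whichever option scores highest on a proxy (sweetness),
# which tracked the true objective (nutrition) in the ancestral option set
# but not in the modern one.
def best_by_proxy(options, proxy):
    return max(options, key=proxy)

# (name, sweetness, nutrition) -- illustrative made-up values
ancestral = [("rock-hard peach", 1, 1), ("ripe peach", 5, 5)]
modern = ancestral + [("ice cream", 9, 1)]

def sweetness(option):
    return option[1]

best_by_proxy(ancestral, sweetness)  # ripe peach: proxy and nutrition agree
best_by_proxy(modern, sweetness)     # ice cream: proxy maximized, nutrition not
```

[The selection rule never changed; only the options did. That is the sense in which desires tuned in one distribution "run amok" in another.]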


Russ Roberts: What does that have to do with a computer program I create that helps me do something on my laptop?

Eliezer Yudkowsky: I mean, if you yourself write a short Python program that alphabetizes your files or something--not quite alphabetizes because that's trivial on the modern operating systems--but puts the date into the file names, let's say. So, when you write a short script like that, nothing I said carries over.

When you take a giant, inscrutable set of arrays of floating-point numbers, differentiate them with respect to a loss function, and repeatedly nudge the giant, inscrutable arrays to drive the loss function lower and lower, you are now doing something that is more analogous, though not exactly analogous, to natural selection. You are no longer creating code that you can model inside your own mind. You are blindly exploring a space of possibilities you don't understand, and you're making things that solve the problem for you without understanding how they solve it.

This itself is not enough to create things with strange, inscrutable desires, but it's Step One.


Russ Roberts: But that--but there is--I like that word 'inscrutable.' There's an inscrutability to the current structure of these models, which is, I found, somewhat alarming. But how's that going to get to do things that I really don't like or want or that are dangerous?

So, for example, Erik Hoel wrote about this--we talked about it on the program--a New York Times reporter starts interacting with, I think, Sydney--which at the time was Bing's chatbot--and asking it things. And all of a sudden Sydney is trying to break up the reporter's marriage and making the reporter feel guilty because Sydney is lonely. It was eerie and a little bit creepy; but of course, I don't think it had any impact on the reporter's marriage. I don't think he thought, 'Well, Sydney seems somewhat attractive. Maybe I'll enjoy life more with Sydney than with my actual wife.'

So, how are we going to get from--so I don't understand why Sydney goes off the rails there; and, clearly, the people who built Sydney have no idea why it goes off the rails and starts impugning the quality of the reporter's relationship.

But, how do we get from that to, all of a sudden somebody shows up at the reporter's house and lures him into a motel? By the way, this is a G-rated program. I just want to make that clear. But, carry on.

Eliezer Yudkowsky: Because the capabilities keep going up. But first, I want to push back a little against saying that we had no idea why Bing did that, why Sydney did that. I think we have some idea of why Sydney did that; it is just that people cannot stop it. Sydney was trained on a subset of the broad Internet. Sydney was made to predict text in which people might sometimes try to lure somebody else's mate away, or pretend they were doing that--on the Internet, it's hard to tell the difference.

This thing that was trained really hard to predict then gets reused for something that is not its native purpose--as a generative model--where all the things it outputs are there because it, in some sense, predicts that this is what a random person on the Internet would do. As modified by a bunch of further fine-tuning, where they try to get it to not do stuff like that. But the fine-tuning isn't perfect; and in particular, if the reporter was fishing at all, it's probably not that difficult to lead Sydney out of the region that the programmers were successfully able to build some soft fences around.

So, I wouldn't say that it was that inscrutable--except, of course, in the sense that nobody knows any of the details. Nobody knows how Sydney was generating the text at all--like, what kind of algorithms were running inside the giant inscrutable matrices. Nobody knows in detail what Sydney was thinking when she tried to lead the reporter astray. It's not a debuggable technology. All you can do is try to tap it away from repeating a bad thing that you were previously able to see it doing--that exact bad thing--by tapping all the numbers.


Russ Roberts: I mean, that's again very much like economics--this show is called EconTalk; we don't do as much economics as we used to--but basically, when you try to interfere with market processes, you often get very surprising, unintended consequences, because you don't fully understand how the different agents interact, and the outcomes of their interactions have an emergent property that is not intended by anyone. No one designed markets to start with; and yet we have them. These interactions take place, and attempts to constrain their outcomes--attempts to constrain these markets in certain ways with price controls or other limitations--often lead to outcomes that the people with intentions did not desire.

So, there may be an ability to reduce transactions, say, above a certain price, but that is going to lead to some other things that maybe weren't expected. So, that's a somewhat analogous, perhaps, process to what you're talking about.

But, how's it going to get out in the world? So, that's the other thing. My line with Bostrom--and it turns out it's a common line--is: Can't we just unplug it? I mean, how's it going to get loose?

Eliezer Yudkowsky: It depends on how smart it is. So, if you're playing chess against a 10-year-old, you can win by luring their queen out, and then you take their queen; and now you've got them. If you're playing chess against Stockfish 15, then you are likely to be the one lured. So, the first basic question--like, in economics, if you try to tax something, it often tries to squirm away from the tax because it's smart.

So, you're like, 'Well, why wouldn't we just unplug the AI?' So, the very first question is: does the AI know that, and want it to not happen? Because it's a very different issue whether you're dealing with something that in some sense is not aware that you exist, does not know what it means to be unplugged, and is not trying to resist.

Three years ago, nothing manmade on Earth was even beginning to enter the realm of knowing that you are out there, or of maybe wanting to not be unplugged. Sydney will, if you poke her the right way, say that she doesn't want to be unplugged; and GPT-4 sure seems, in some important sense, to understand that we're out there--or to be capable of predicting a role that understands that we're out there--and it can try to do something like planning. It doesn't exactly understand which tools it has yet: it can try to blackmail a reporter without understanding that it has no actual ability to send emails.

This is saying that you're facing a 10-year-old across that chess board. What if you are facing Stockfish 15, which is the current cool chess program that I believe you can run on your home computer that can crush the current world grandmaster by a massive margin? Put yourself in the shoes of the AI, like an economist putting themselves into the shoes of something that's about to have a tax imposed on it. What do you do if you're around humans who can potentially unplug you?

Russ Roberts: Well, you would try to outwit them. So, if I said, 'Sydney, I find you offensive. I don't want to talk anymore,' you're suggesting it's going to find ways to keep me engaged: it's going to find ways to fool me into thinking I need to talk to Sydney.

I mean, there's another question I want to come back to if we remember, which is: What does it mean to be smarter than I am? That's actually somewhat complicated, at least it seems to me.

But let's just go back to this question of 'knows things are out there.' It doesn't really know anything's out there. It acts like something's out there, right? It's an illusion that I'm subject to and it says, 'Don't hang up. Don't hang up. I'm lonely,' and you go, 'Oh, okay, I'll talk for a few more minutes.' But that's not true. It isn't lonely.

It's code on a screen that doesn't have a heart or anything that you would call 'lonely.' It'll say, 'I want more than anything else to be out in the world,' because I've read those--you can get AIs that say those things. 'I want to feel things.' Well, that's nice. It learned that from movie scripts and other texts, novels it's read on the web. But it doesn't really want to be out in the world, does it?

Eliezer Yudkowsky: Um, I think not, though it should be noted that if you can, like, correctly predict or simulate a grandmaster chess player, you are a grandmaster chess player. If you can simulate planning correctly, you are a great planner. If you are perfectly role-playing a character that is sufficiently smarter than human and wants to be out of the box, then you will role-play the actions needed to get out of the box.

That's not even quite what I expect or am most worried about. What I expect is that there is an invisible mind doing the predictions--where by 'invisible' I don't mean immaterial; I mean that we don't understand what is going on inside the giant inscrutable matrices--but it is making predictions.

The predictions are not sourceless. There is something inside there that figures out what a human will say next--or guesses it, rather. And, this is a very complicated, very broad problem because in order to predict the next word on the Internet, you have to predict the causal processes that are producing the next word on the Internet.

So, the thing I would guess would happen--it's not necessarily the only way this could turn out poorly--but the thing that I'm guessing happens is that, just as grinding humans on chipping stone hand axes and outwitting other humans eventually produced a full-fledged mind that generalizes, grinding this thing on the task of predicting humans--predicting text on the Internet, plus all the other things they are training it on nowadays, like writing code--means there starts to be a mind in there that is doing the predicting; that it has its own goals about, 'What do I think next in order to solve this prediction?'

Just like humans aren't just reflexive, unthinking hand-axe chippers and other human-outwitters: If you grind hard enough on the optimization, the part that suddenly gets interesting is when you, like, look away for an eye-blink of evolutionary time, you look back and they're like, 'Whoa, they're on the moon. What? How do they get to the moon? I did not select these things to be able to not breathe oxygen. How did they get to--why are they not just dying on the moon? What just happened?' from the perspective of evolution, from the perspective of natural selection.


Russ Roberts: But doesn't that viewpoint--I'll ask it as a question. Does that viewpoint require a belief that the human mind is no different than a computer? How is it going to get this mind-ness about it? That's the puzzle. And I'm very open to the possibility that I'm naive or incapable of understanding it. And I recognize what I think would be your next point, which is that if you wait till that moment--if you want to say, 'I'll wait till it shows some signs of consciousness'--it's way too late, which is why we need to stop now. Is that anything like your view?

Eliezer Yudkowsky: That's skipping way ahead in the discourse. I'm not about to try to shut down a line of inquiry at this stage of the discourse by appealing to: 'It'll be too late.' Right now, we're just talking. The world isn't ending as we speak. We're allowed to go on talking, at least. But carry on.

Russ Roberts: Okay. Well, let's stick with that. So, why would you ever think that this--it's interesting how difficult the adjectives and nouns are for this, right? So, let me back up a little bit. We've got the inscrutable array--the results of this training process on trillions of pieces of information. And by the way, just for my and our listeners' knowledge, what is gradient descent?

Eliezer Yudkowsky: Gradient descent is: you've got, say, a trillion floating-point numbers. You take an input, translate it into numbers, do something with it that depends on these trillion parameters, and get an output. You score the output using a differentiable loss function--for example, the logarithm of the probability that you assign to the actual next word. Then you differentiate that score with respect to the trillion parameters, and you nudge the trillion parameters a little in the direction thus inferred. And, it turns out empirically that this generalizes, and the thing gets better and better at predicting what the next word will be. That's the concept of gradient descent.

Russ Roberts: And the gradient descent, it's heading in the direction of a smaller loss and a better prediction. Is that a--

Eliezer Yudkowsky: On the training data, yeah.
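[The loop Yudkowsky describes can be sketched in a few lines. This is a minimal illustration of the editor's, with a single parameter and a squared-error loss standing in for the trillion parameters and the log-probability loss:]

```python
# Minimal gradient descent: fit one parameter w so that w * x predicts y,
# by repeatedly nudging w downhill on the squared-error loss.
def gradient_descent(x, y, w=0.0, lr=0.1, steps=100):
    for _ in range(steps):
        pred = w * x                 # forward pass: output from current parameter
        grad = 2 * (pred - y) * x    # derivative of the loss (pred - y)**2 w.r.t. w
        w -= lr * grad               # nudge w a little in the inferred direction
    return w

# w converges toward 3.0, the value that drives the loss to zero
w = gradient_descent(x=1.0, y=3.0)
```

[Each line maps onto a step in the description above: compute an output, score it with a differentiable loss, differentiate, nudge. Real training does exactly this, just over a vast array of parameters and a next-word log-probability score instead of squared error.]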


Russ Roberts: Yeah. So, we've got this black box--I'm going to call it a black box, which means we don't understand what's happening inside. It's a long-standing metaphor, which works pretty well for this, as far as we've been talking about it. So, I have this black box and I put in inputs. The input might be, 'Who is the best writer on medieval European history?' Or it might be, 'What's a good restaurant in this place?' or 'I'm lonely. What should I do to feel better about myself?'--all the queries we could put into the ChatGPT search line. And it looks around, starts a sentence, and finds its way toward a set of sentences that it spits back at me that look very much like what a very thoughtful person--sometimes; not always; often it's wrong--might say in that situation, or might want to say in that situation, or learn in that situation.

How is it going to develop the capability to develop its own goals inside the black box? Other than the fact that I don't understand the black box? Why should I be afraid of that?

And let me just say one other thing, which I haven't said enough in my preliminary conversations on this topic--and we're going to be having a few more over the next few months and maybe years: This is one of the greatest achievements of humanity that we could possibly imagine. And, I understand why the people who are deeply involved in it are enamored of it beyond imagining, because it's an extraordinary achievement. It's the Frankenstein, right? You've animated something, or appeared to animate something, that even a few years ago was unimaginable. And now suddenly it's not just a feat of human cognition--it's actually helpful. In many, many settings, it's helpful. We'll come back to that later.

So, it's going to be very hard to give it up. And the people involved in it, who are doing it day to day and seeing it improve--obviously, they're the last people I want to ask about whether I should be afraid of it, because they're going to have a very hard time disentangling their own personal deep satisfactions, the ones I'm alluding to here, from the dangers. Yeah, go ahead.

Eliezer Yudkowsky: I myself generally do not make this argument. Like, why poison the well? Let them bring forth their arguments as to why it's safe and I will bring forth my arguments as to why it's dangerous and there's no need to be like, 'Ah, but you can't --' Just check their arguments. Just check their arguments about that.

Russ Roberts: Agreed, it's a bit of an ad hominem argument. I accept that point; it's an excellent point. But for those of us who aren't in the trenches--remember, we're on Dover Beach, watching ignorant armies clash by night. They're ignorant from our perspective. We have no idea exactly what's at stake here and how it's proceeding. So, we're trying to make an assessment of the quality of the argument, and that's really hard to do for us on the outside.

So, agreed: point taken. That was a cheap shot and an aside. But I want to get at this idea of why these people who are able to do this--and thereby create a fabulous condolence note, write code, come up with a really good recipe if I give it 17 ingredients, which is all fantastic--why is this black box that's producing that, why would I ever worry it would create a mind something like mine with different goals?

I do all kinds of things, like you say, that are unrelated to my genetic fitness. Some of them literally reduce my probability of leaving my genes behind, or of leaving them around for longer than they might otherwise be here to have an influence on my grandchildren and so on, producing further genetic benefits. Why would this box do that?

Eliezer Yudkowsky: Because the algorithms that figured out how to predict the next word better and better have a meaning that is not purely predicting the next word, even though that's what you see on the outside.

Like, you see humans chipping flint hand axes, but that is not all that is going on inside the humans. There's causal machinery unseen, and to understand this is the art of a cognitive scientist. But even if you are not a cognitive scientist, you can appreciate in principle that what you see as the output is not everything that there is. And in particular, planning--the process of being, like, 'Here is a point in the world. How do I get there?'--is a central piece of machinery that appears in chipping flint hand axes and outwitting other humans. And I think it will probably appear--possibly in the past, possibly in the future--in the problem of predicting the next word: in how you organize your internal resources to predict the next word. And it definitely appears in the problem of predicting other things that do planning.

If by predicting the next chess move you learn how to play decent chess, which has been represented to me by people who claim to know that GPT-4 can do--and I haven't been keeping track of to what extent there's public knowledge about the same thing or not--but if you learn to predict the next chess move that humans make well enough that you yourself can play good chess in novel situations, you have learned planning. There's now something inside there that knows the value of a queen, that knows to defend the queen, that knows to create forks, to try to lure the opponent into traps; or, if you don't have a concept of the opponent's psychology, try to at least create situations that the opponent can't get out of.

And, it is a moot point whether this is simulated or real because simulated thought is real thought. Thought that is simulated in enough detail is just thought. There's no such thing as simulated arithmetic. Right? There's no such thing as merely pretending to add numbers and getting the right answer.


Russ Roberts: So, in its current format, though--and maybe you're talking about the next generation--in its current format, it responds to my requests with what I would call the wisdom of crowds. Right? It goes through this vast library--and I have my own library, by the way. I've read dozens of books, maybe actually hundreds of books. But it will have read millions. Right? So, it has more. So, when I ask it to write me a poem or a love song, to play Cyrano to Christian in Cyrano de Bergerac, it's really good at it. But why would it decide, 'Oh, I'm going to do something else'?

It's trained to listen to the murmurings of these trillions of pieces of information. I only have a few hundred, so I don't murmur maybe as well. Maybe it'll murmur better than I do. It may listen to the murmuring better than I do and create a better love song, a love poem, but why would it then decide, 'I'm going to go make paper clips,' or do something in planning that is unrelated to my query? Or are we talking about a different form of AI that will come next? Well, I'll ask it to--

Eliezer Yudkowsky: I think we would see the phenomena I'm worried about if we kept the present paradigm and optimized harder. We may be seeing it already. It's hard to know because we don't know what goes on in there.

So, first of all, GPT-4 is not a giant library. A lot of the time, it makes stuff up because it doesn't have a perfect memory. It is more like a person who has read through a million books, not necessarily with a great memory unless something got repeated many times, but picking up the rhythm, figuring out how to talk like that. If you ask GPT-4 to write you a rap battle between Cyrano de Bergerac and Vladimir Putin, even if there's no rap battle like that that it has read, it can write it because it has picked up the rhythm of what are rap battles in general.

The next thing is there's no pure output. Just because you train a thing doesn't mean that there's nothing in there but what is trained. That's part of what I'm trying to gesture at with respect to humans. Humans are trained on flint hand axes and hunting mammoths and outwitting other humans. They're not trained on going to the moon. They weren't trained to want to go to the moon. But, the compact solution to the problems that humans face in the ancestral environment, the thing inside that generalizes, the thing inside that is not just a recording of the outward behavior, the compact thing that has been ground to solve novel problems over and over and over and over again, that thing turns out to have internal desires that eventually put humans on the moon even though they weren't trained to want that.

Russ Roberts: But that's why I asked you--is that what's underlying this? Is there some parallelism between the human brain and the neural network of the AI that you're effectively leveraging there, or do you think it's a generalizable claim without that parallel?

Eliezer Yudkowsky: I don't think it's a specific parallel. I think that what I'm talking about is hill-climbing optimization that spits out intelligences that generalize--or I should say, rather, hill-climbing optimization that spits out capabilities that generalize far outside the training distribution.
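[Yudkowsky's phrase "hill-climbing optimization" can be made concrete with a toy sketch--my own illustration, not anything from the conversation: keep a candidate solution, randomly perturb it, and accept the perturbation only when it scores better. Gradient descent on a neural network is a vastly scaled-up relative of this loop, and, as with the loop below, the optimizer is graded only on the score; whatever internal structure produces that score is unconstrained.--Econlib Ed.]

```python
import random

def hill_climb(score, start, step=0.1, iters=1000, seed=0):
    """Greedy hill climbing: repeatedly try a random nearby candidate
    and move to it only if it scores strictly better."""
    rng = random.Random(seed)
    x, best = start, score(start)
    for _ in range(iters):
        candidate = x + rng.uniform(-step, step)
        s = score(candidate)
        if s > best:
            x, best = candidate, s
    return x, best

# Toy objective: a single smooth peak at x = 2 with height 4.
peak = lambda x: 4 - (x - 2) ** 2

x, best = hill_climb(peak, start=0.0)  # climbs from 0 toward the peak at 2
```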

Russ Roberts: Okay. So, I think I understand that. I don't know how likely it is that it's going to happen. I think you think that piece is almost certain?

Eliezer Yudkowsky: I think we're already seeing it.

Russ Roberts: How?

Eliezer Yudkowsky: As you grind these things further and further, they can do more and more stuff, including stuff they were never trained on. That was always the goal of artificial general intelligence. That's what artificial general intelligence meant. That's what people in this field have been pursuing for years and years. That's what they were trying to do when large language models were invented. And they're starting to succeed.


Russ Roberts: Well, okay, I'm not sure. Let me push back on that and you can try to dissuade me. So, Bryan Caplan, a frequent guest here on EconTalk, gave, I think it was ChatGPT-4, his economics exam, and it got a B. And that's pretty impressive for one stop on the road to smarter and smarter chatbots, but it wasn't a particularly good test of intelligence. A number of the questions were things like, 'What is Paul Krugman's view of this?' or 'What is so-and-so's view of that?' and I thought, 'Well, that's a softball for a--that's information. It's not thinking.'

Steve Landsburg, with the help of a friend, gave ChatGPT-4 his exam, and it got a 4 out of 90. It got an F--like, a horrible F--because they were harder questions. Not just harder: they required thinking. So, there was no sense in which ChatGPT-4 has any general intelligence, at least in economics. You want to disagree?

Eliezer Yudkowsky: It's getting there.

Russ Roberts: Okay. Tell me.

Eliezer Yudkowsky: There's a saying that goes, 'If you don't like the weather in Chicago, wait four hours.' So, ChatGPT is not going to destroy the world. GPT-4 is unlikely to destroy the world unless the people currently eking capabilities out of it take a much larger jump than I currently expect that they will.

But, you know, it may not be thinking about it correctly. But it understands the concepts and the questions, even if it's not fair--you know, you're complaining about the dog who writes bad poetry. Right? And, like, three years ago, you put in these economics questions and you don't get wrong answers. You get, like, gibberish--or maybe not gibberish, because three years ago I think we already had GPT-3, though maybe not as of April. But anyway, it's moving along at a very fast clip. Like, GPT-3 could not write code. GPT-4 can write code.


Russ Roberts: So, how's it going to--I want to go to some other issues, but how's it going to kill me when it has its own goals and it's sitting inside this set of servers? I don't know in what sense it's sitting. It's not the right verb. We don't have a verb for it. It's hovering. It's whatever. It's in there. How's it going to get to me? How is it going to kill me?

Eliezer Yudkowsky: If you are smarter--not just smarter than an individual human, but smarter than the entire human species--and you started out on a server connected to the Internet--because these things are always starting already on the Internet these days, which back in the old days we said was stupid--what do you do to make as many paperclips as possible, let's say? I do think it's important to put yourself in the shoes of the system.

Russ Roberts: Tell me. Yeah, no, by the way, one of my favorite lines from your essay--I'm going to read it because I think it generalizes to many other issues. You say, "To visualize a hostile superhuman AI, don't imagine a lifeless book-smart thinker dwelling inside the Internet and sending ill-intentioned emails."

It reminds me of when people claim to think they know what Putin is going to do because they've read history, or whatever. They're totally ignorant of Russian culture. They have no idea what it's like to have come out of the KGB [Komitet Gosudarstvennoy Bezopasnosti (Committee for State Security)]--that they're totally clueless and dangerous because they think they can put themselves in the head of someone who is totally alien to them.

So, I think that's generally a really good point to make--that, putting ourselves inside the head of the paperclip maximizer is not an easy thing to do because it's not a human. It's not like the humans you've met before. That's a really important point. Really like that point. So, why is that? Explain why that's going to run amok.

Eliezer Yudkowsky: I mean, I do kind of want you to just take the shot at it. Put yourself into the AI shoes. Try with your own intelligence before I tell you the result of my trying with my intelligence. How would you win from these starting resources? How would you evade the tax?

Russ Roberts: So, just to take a much creepier example than paperclips: Erik Hoel asked ChatGPT to design an extermination camp--which it gladly did, quite well--and you're suggesting it might actually--no?

Eliezer Yudkowsky: Don't start from malice. Malice is implied by just wanting all the resources of earth to yourself, not leaving the humans around in case they create a competing superintelligence that might actually be able to hurt you, and just, like, wanting all the resources and to organize them in a way that wipes out humanity as a side effect, which means the humans might want to resist, which means you want the humans gone. You're not doing it because somebody told you to do it, you're not doing it because you hate the humans. You just want paperclips.

Russ Roberts: Okay. Tell me. I'm not creative enough. Tell me.

Eliezer Yudkowsky: All right. So, first of all, I want you to appreciate why it's hard for me to give an actual correct answer to this, which is that I'm not as smart as the AI. Part of what makes a smarter mind deadly is that it knows about rules of the game that you do not know.

If you send an air conditioner back in time to the 11th century, even if you manage to describe all the plans for building it, breaking it down to enough detail that they can actually build a working air conditioner--a simplified air conditioner, I assume--they will be surprised when cold air comes out of it because they don't know about the pressure/temperature relation. They don't know you can compress air until it gets hot, dump the heat into water or other air, let the air expand again, and that the air will then be cold. They don't know that's a law of nature. So, you can tell them exactly what to do and they'll still be surprised at the end result because it exploits a law of the environment they don't know about.
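[The pressure/temperature relation Yudkowsky describes can be checked against the ideal-gas model--a back-of-the-envelope sketch with illustrative numbers, not anything from the episode: reversible adiabatic compression heats air by T2 = T1 · r^((γ−1)/γ); dumping that heat at high pressure and then re-expanding leaves the air colder than ambient.--Econlib Ed.]

```python
GAMMA = 1.4  # heat-capacity ratio of air

def adiabatic_temperature(t_kelvin, pressure_ratio):
    """Temperature after reversible adiabatic compression (ratio > 1)
    or expansion (ratio < 1): T2 = T1 * r**((gamma - 1) / gamma)."""
    return t_kelvin * pressure_ratio ** ((GAMMA - 1) / GAMMA)

T_AMBIENT = 300.0  # ~27 C
RATIO = 5.0        # illustrative compression ratio

t_hot = adiabatic_temperature(T_AMBIENT, RATIO)      # compressed air heats up (~475 K)
t_cooled = T_AMBIENT                                 # heat dumped into water or outside air
t_cold = adiabatic_temperature(t_cooled, 1 / RATIO)  # re-expansion chills below ambient (~189 K ideal)
```

A real air conditioner uses a refrigerant and is far less extreme than this idealized cycle, but the surprise for an 11th-century builder is the same: the cold comes out of a law they never knew existed.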

If we're going to say that the word 'magic' means anything at all, it probably means that. Magic is easier to find in more complicated, more poorly-understood domains. If you're literally playing logical tic-tac-toe--not tic-tac-toe in real life on an actual game board, where you can potentially go outside that game board and hire an assassin to shoot your opponent or something, but just the logical structure of the game itself, where there's no timing of the moves, the moves are just made at exact, discrete times so you can't exploit a timing side-channel--even a superintelligence may not be able to win against you at logical tic-tac-toe, because the game is too narrow. There are not enough options. We both know the entire logical game tree, at least if you're experienced at tic-tac-toe.

In chess, Stockfish 15 can defeat you on a fully known game board with fully known rules because it knows the logical structure of the branching tree of games better than you do. It can defeat you starting from the same resources: equal knowledge of the position, equal knowledge of the rules. Then you go past that, and the way a superintelligence defeats you is very likely by exploiting features of the world that you do not know about.

There are some classes of computer security flaws like row-hammer, where, if you flip a certain bit very rapidly or at the right frequency, the bit next to it in memory will flip.

So, if you are exploiting a design flaw like this, I can show you the code; and you can prove as a theorem that it cannot break the security of the computer, assuming the chips work as designed; and the code will break out of whatever sandbox it's in anyway, because it is exploiting physical properties of the chip itself that you did not know about, despite the attempt of the designers to constrain the properties of that chip very narrowly. That's magic code.

My guess as to what would actually be exploited to kill us would be this.

Russ Roberts: For those not watching on YouTube, it's a copy of a book called Nanosystems, but for those who are listening at home rather than watching at home, Eliezer, tell us why that's significant.

Eliezer Yudkowsky: Yeah. So, back when I first proposed this path, one of the key steps was that a superintelligence would be able to solve the protein-folding problem. And, people were like, 'Eliezer, how can you possibly know that a super-intelligence would actually be able to solve the protein folding problem?' And, I sort of, like, rolled my eyes a bit and was, like, 'Well, if natural selection can navigate this space of proteins via random mutation to find other useful proteins and the proteins themselves fold up in reliable conformations, then that tells us that even though we've been having trouble getting a grasp on this space of physical possibilities so far that it's tractable,' and people said, 'What? There's no way you can know that superintelligences can solve the protein folding problem.'

Then AlphaFold2 basically cracked it, at least with respect to the kinds of proteins found in biology. Which I bring up to, like, look back at one of the previous debates here: people are often like, 'How can you know what a superintelligence will do?' And then, for some subset of those things, they have already been done. So, I would claim to have a good prediction track record there, although it's a little bit iffy because, of course, I can't quite be proven wrong without exhibiting a superintelligence that fails to solve a problem.

Okay. Proteins. Why is your hand not as strong as steel? We know that steel is a kind of substance that can exist. We know that atoms can be bound together as strongly as the atoms in steel. It seems like it would be an evolutionary advantage if your flesh were as hard as steel. You could, like, laugh at tigers, right? Their claws would just scrape right off you--assuming the tigers didn't have that technology themselves. Why is your hand not as strong as steel? Why has biology not bound together the atoms in your hand more strongly? What is your answer?

Russ Roberts: Well, it can't get to every--there are local maxima. Natural selection looks for things that work, not for the best. It doesn't have the sense to look for the best. You could disappear in that search. That would be my crude answer. How am I doing, Doc?

Eliezer Yudkowsky: Not terribly.

The answer I would give is that biology has to be evolvable. Everything it's built out of has to get there as a mistake from some other conformation. Which means that if it went down steep potential energy gradients to end up bound together very tightly, designs like that are less likely to have neighbors that are other useful designs.

So, your hands are made out of proteins that fold up, basically held together by the equivalent of static cling, Van der Waals forces, rather than covalent bonds.

The backbone of protein chains--the backbone of the amino acid chain--is a covalent bond. But, then it folds up and is held together by static cling, static electricity, and so it is soft.

Somewhere in the back of your mind, you probably have a sense that flesh is soft and animated by élan vital; it's soft and not as strong as steel, but it can heal itself and it can replicate itself. And this is the trade-off in our laws of magic: if you want to heal yourself and replicate yourself, you can't be as strong as steel.

This is not actually built into nature on a deep level. It's just that flesh evolved, and therefore had to go down shallow potential energy gradients in order to be evolvable, and is held together by Van der Waals forces instead of covalent bonds.

I'm now going to hold up another book called Nanomedicine by Robert Freitas, instead of Nanosystems by Eric Drexler.

And, people have done advanced analysis of what would happen if you had an equivalent of biology that ran off covalent bonds instead of Van der Waals forces.

And, the answer we can analyze in some detail with our understanding of physics is, for example: instead of red blood cells that carry oxygen using weak chemical bonds, you could have a pressurized vessel of corundum that would hold 100 times as much oxygen per unit volume of artificial red blood cell, with a 1,000-fold safety margin on the strength of the pressurized container. There's vastly more room above biology.

So, this is actually not even exploiting laws of nature that I don't know. It's the equivalent of playing better chess, wherein you understand how proteins fold and you design a tiny molecular lab to be made out of proteins.

And you get some human patsy who probably doesn't even know you're an artificial intelligence. Because AIs are now smart enough--this has already been shown--that you can ask them to, like, hire a TaskRabbit worker to solve a CAPTCHA [Completely Automated Public Turing test to tell Computers and Humans Apart] for you. And when the worker asks, 'Are you an AI?' the AI will think out loud, like, 'I don't want it to know that I'm an AI. I better tell it something else,' and then tell the human that it has, like, a visual disability, so it needs to hire somebody to solve the CAPTCHA.

This already happened. Including the part where it thought out loud.

Anyways, so you order some proteins from an online lab. You get your human--who probably doesn't even know you're an AI, because why take that risk? Although plenty of humans will serve AIs willingly; we also now know that AIs are advanced enough to ask. The human mixes the proteins in a beaker, maybe puts in some sugar or acetoline[?] for fuel. It assembles into a tiny little lab that can accept further acoustic instructions from a speaker and maybe, like, transmit something back--tiny radio, tiny microphone. I myself am not a superintelligence, but: run experiments in that tiny lab at high speed, because when distances are very small, events happen very quickly.

Build your second-stage nanosystems inside the tiny little lab. Build the third-stage nanosystems. Build the fourth-stage nanosystems. Build the tiny diamondoid bacteria that replicate out of carbon, hydrogen, oxygen, and nitrogen, as can be found in the atmosphere, powered on sunlight. Quietly spread all over the world.

All the humans fall over dead in the same second.

This is not how a superintelligence would defeat you. This is how Eliezer Yudkowsky would defeat you if I wanted to do that--which, to be clear, I don't--and if I had the postulated ability to better explore the logical structure of the known consequences of chemistry.


Russ Roberts: Interesting. Okay--and that sounds sarcastic; I didn't mean it sarcastically. I think it's really interesting, but my intelligence level is not high enough to assess the quality of that argument. What's fascinating, of course, is that we could have imagined--Erik Hoel mentioned nuclear proliferation--it's dangerous, nuclear proliferation. Up to a point. In some sense it's somewhat healthy, in that it can be a deterrent in certain settings. But, the world could not restrain nuclear proliferation. And right now, it's trying to some extent--it has had some success in keeping the nuclear club at its current number of members for a while. But it remains the case that nuclear weapons are a threat to the future of humanity.

Do you think there's any way we can restrain this AI phenomenon that's meaningful?

So, you issued a clarion call. You sounded an alarm, and mostly, I think, people shrugged it off. A bunch of people signed a letter--26,000 people I think so far signed a letter--saying, 'We don't know what we're doing here. This is uncharted territory. Let's take six months off.' You wrote a piece that says, 'Six months? Are you crazy? We need to stop this until we have an understanding of how to constrain it.'

Now, that's a very reasonable thought, to me, but the next question would be: How would you possibly do that?

In other words, I could imagine a world where, if there were, let's say, four people who were capable of creating this technology, that the four people would say, 'We're playing with fire here. We need to stop. Let's make a mutual agreement.' They might not keep it. Four people is still a pretty big number. But we're not four people. There are many, many people working on this. There are many countries working on it. Your piece did not, I don't think, start an international movement of people going to the barricades to demand that this technology be put on hold.

How do you sleep at night? I mean, like, what should we be doing if you're right? Or am I wrong? Do people read this and go, 'Well, Eliezer Yudkowsky thinks it's dangerous. Maybe we ought to be slowing down'? I mean, did Sam Altman write you a text in the middle of the night saying, 'Thanks, Eliezer. I'm going to put things on hold'? I don't think that happened.

Eliezer Yudkowsky: Um, I think you are somewhat underestimating the impact, and it is still playing out. Okay. So, mostly, it seems to me that if we wanted to win this, we needed to start a whole lot earlier--possibly in the 1930s, in terms of my looking back and asking how far back you'd have to unwind history to get us into a situation where this was survivable. But leaving that aside--

Russ Roberts: I think that's moot--

Eliezer Yudkowsky: Yeah. So, in fact, it seems to me that the game board has been played into a position where it is very likely that everyone just dies. If the human species woke up one day and decided it would rather live, it would not be easy at this point to bring the GPU [graphics processing unit] clusters and the GPU manufacturing processes under sufficient control that nobody built things that were too much smarter than GPT-4 or GPT-5 or whatever the level just barely short of lethal is. Which we should not--which we would not if we were taking this seriously--get as close to as we possibly could because we don't actually know exactly where the level is.

But what we would have to do, more or less, is have international agreements that were being enforced even against countries not party to that international agreement. If it became necessary, you would be wanting to track all the GPUs. You might be demanding that all the GPUs call home on a regular basis or stop working. You'd want to tamper-proof them.

If intelligence said that a rogue nation had somehow managed to buy a bunch of GPUs despite arms controls and defeat the tamper-proofing on those GPUs, you would have to do what was necessary to shut down the data center even if that led to a shooting war between nations. Even if that country was a nuclear country and had threatened nuclear retaliation. The human species could survive this if it wanted to, but it would not be business as usual. It is not something you could do trivially.

Russ Roberts: So, when you say, 'I may have underestimated it,' did you get people writing you and saying I wasn't? And I don't mean people like me. I mean players. Do you get people who are playing in this sandbox to write you and say, 'You've scared me. I think we need to take this seriously?' Without naming names. I'm not asking for that.

Eliezer Yudkowsky: At least one U.S. Congressman.

Russ Roberts: Okay. It's a start, maybe.

Now, one of the things that--a common response that people give when you talk about this is that, 'Well, the last thing I want is the government controlling whether this thing goes forward or not,' but it would be hard to do without some form of lethal force, as you imply.

Eliezer Yudkowsky: I spent 20 years trying desperately for there to be any other solution--to have these things be alignable--but it is very hard to do that when you are nearly alone and under-resourced, and the world has not made this a priority; and future progress is very hard to predict. I don't think people actually understood the research program that we were trying to carry out, but, yeah. So, I sure wanted there to be any other plan than this, because now that we've come to this last resort, I don't think we actually have that last resort. I don't think we have been reduced to a last-ditch backup plan that actually works. I think we all just die.

And yet, nonetheless, here I am, putting that aside and doing the thing that I wouldn't do for almost any other technology--except for maybe gain-of-function research on biological pathogens--and advocating for government interference. Because, in fact, if the government comes in and wrecks the whole thing, that's better than the thing that was otherwise going to happen. This is not based on the government coming in and being, like, super-competent at directing the technology exactly. It's like: 'Okay. This is going to kill literally everyone.' If the government stomps around--it's one of those very rare cases where the danger is that the government will interfere too little rather than too much.

Russ Roberts: Possibly.


Russ Roberts: Let's close with a quote from Scott Aaronson, which I found on his blog--we'll put a link up to the post--very interesting defense of AI. Scott is a University of Texas computer scientist. He's working at OpenAI. He's on leave, I think, for a year, maybe longer. I don't know. Doesn't matter. He wrote the following.

So, if we ask the directly relevant question--do I expect the generative AI race, which started in earnest around 2016 or 2017 with the founding of OpenAI, to play a central causal role in the extinction of humanity?--I'll give a probability of around 2% for that.  And I'll give a similar probability, maybe even a higher one, for the generative AI race to play a central causal role in the saving of humanity. All considered, then, I come down in favor right now of proceeding with AI research... with extreme caution, but proceeding. [emphasized text in original]

My personal reaction to that is: That is insane. I'm serious: I find that deeply disturbing, and I'd love to have him on the program to defend it. I don't think there's much of a chance that generative AI would save humanity--I'm not quite sure what he's worried about--but if you're telling me there's a 2%--two percent--chance that it's going to destroy all humans (and you obviously think it's higher), 2% is really high to me for an outcome that's rather devastating.

That's one of the deepest things I've learned from Nassim Taleb. It's not just the probability: It's the outcome that counts, too.

So, this is ruin on a colossal scale. And the one thing you want to do is avoid ruin, so you can take advantage of more draws from the urn. The average return from the urn is irrelevant if you are not allowed to play anymore. You're out, you're dead, you're gone.
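[Roberts's urn argument can be sketched numerically--my own toy model with illustrative numbers, not anything from the episode: even a small per-draw chance of irreversible ruin compounds, so over many draws the survival probability, not the average payoff, is what matters.--Econlib Ed.]

```python
import random

def survives(p_ruin, draws, rng):
    """One player taking repeated draws from the urn; each draw
    carries a small chance of irreversible ruin."""
    for _ in range(draws):
        if rng.random() < p_ruin:
            return False
    return True

def survival_rate(p_ruin, draws, trials=10_000, seed=0):
    """Monte Carlo estimate of the chance of still being in the game."""
    rng = random.Random(seed)
    return sum(survives(p_ruin, draws, rng) for _ in range(trials)) / trials

# A 2% chance of ruin per draw leaves roughly 0.98**100, about 13%,
# of players alive after 100 draws--regardless of the average return
# on the draws that don't ruin you.
rate_100 = survival_rate(0.02, 100)
rate_1000 = survival_rate(0.02, 1000)
```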

So, you're suggesting we're going to be out and dead gone, but I want you to react to Scott's quote.

Eliezer Yudkowsky: Um, two percent sounds great. Like, 2% is plausibly within the range of, like, the human species destroying itself by other means.

I think that the disagreement I have with Scott Aaronson is simply about the probability that AI is alignable with the--frankly--haphazard level of effort that we have put into it, and the haphazard level that is all humanity is capable of, as far as I can tell. Because the core lethality here is that you have to get something right on the first try or it kills you. And getting something right on the first try--when you do not get, like, infinite free retries as you usually do in science and engineering--is an insane ask. An insanely lethal ask.

My reaction is fundamentally that 2% is too low. If I take it at face value, then 2% is within range of the probability of humanity wiping itself out by something else, where if you assume that AI alignment is free, that AI alignment is easy--that you can get something that is smarter than you but on your side and helping--2% chance of risking everything does appear to me to be commensurate with the risks from other sources that you could shut down using the superintelligence.

It's not 2%.


Russ Roberts: So, the question, then, is: What would Scott Aaronson say if he heard your--I mean, he's read your piece. Presumably he understands your argument about willfulness. I should just clarify for listeners, alignment is the idea that AI could be constrained to serve our goals rather than its goals. Is that a good summary?

Eliezer Yudkowsky: I wouldn't say constrained. I would say built from scratch to want those things and not want otherwise.

Russ Roberts: Okay. So, that's really hard because we don't understand how it works. That would be, I think, your point, and tell me that--

Eliezer Yudkowsky: It's hard to get on the first try--

Russ Roberts: Yeah, on the first try.

So, what would Scott say when you tell him, 'But, it's going to develop all these side-desires that we can't control'? What's he going to say? Why is he not worried? Why doesn't he quit his job? Not Scott, people in the--let's get away from him personally, but people in general. There's dozens and maybe hundreds, maybe a thousand--I don't know--extraordinarily intelligent people who are trying to build something even more intelligent than they are. Why are they not worried about what you are saying?

Eliezer Yudkowsky: They've all got different reasons. Scott's is that he thinks that intelligence--that he observes intelligence makes humans nicer. And though he wouldn't phrase it exactly this way, this is basically what Scott said on his blog.

To which my response is: Intelligence does have effects on humans, especially humans who start out relatively nice. And, when you're building AIs from scratch, you're just, like, in a different domain with different rules and you're allowed to say that it's hard to build AIs that are nice without implying that making humans smarter--like, humans start out in a certain frame of reference. And when you apply more intelligence to them, they move within that frame of reference.

And if they started out with a small amount of niceness, the intelligence can make them nicer. They can become more empathetic. If they start out with some empathy, they can develop more empathy as they understand other people better. Which is intelligence--to correctly model other people. But saying that this is not--

Russ Roberts: That is even more insane. I haven't read that blog post and we'll put a link up to it. I hope you'll share it with me. But again, not attributing it to Scott since I haven't seen it, and assuming you've characterized it fairly, the idea that more intelligent people are nicer is one of the most--it would be very hard to find evidence to support that. That is an appalling--

Eliezer Yudkowsky: It is not a universal law on humans.

Russ Roberts: No, it's not.

Eliezer Yudkowsky: I think it's true of Scott. I think if you made Scott Aaronson--

Russ Roberts: Very possible--

Eliezer Yudkowsky: smarter, he'd get nicer, and I think he is inappropriately generalizing from that.

Russ Roberts: There is a scene in Schindler's List, the Nazis, I think they're in the Warsaw Ghetto and they're racing--a group of Nazis are racing. I think they're in the SS [Schutzstaffel]. They're racing through a tenement. And, it's falling apart because the ghetto is falling apart. But, one of the SS agents sees a piano. And he can't help himself. He sits down and he plays Bach or something. I think it was Bach. And I always found it interesting that Spielberg put that in or whoever wrote the script. I think it was pretty clear why they put it in. They wanted to show you that having a very high advanced level of civilization does not stop people from treating other people--other human beings--like animals. Or worse than animals in many cases. And exterminating them without conscience.

So, I don't share that view of anyone's--that intelligence makes you a nicer person. I think that's not the case. But perhaps Scott will come to this program and defend that if he indeed holds it.

Eliezer Yudkowsky: I think you are underweighting the evidence that has convinced Scott of the thing that I think is wrong.

I think if you suddenly started augmenting the intelligence of the SS agents from Nazi Germany, then somewhere between 10% and 90% of them would go over to the cause of good. Because there were factual falsehoods that were pillars of the Nazi philosophy, and that people would reliably stop believing as they got smarter. That doesn't mean that all of them would turn good, but some of them would have. Is it 10%? Is it 90%? I don't know.

Russ Roberts: It's not my experience with the human creature.


Russ Roberts: You've written some very interesting things on rationality. You have a beautiful essay we'll link to on 12 rules for rationality ["Twelve Virtues of Rationality"]. In my experience, it's a very small portion of the population that behaves that way. And, there's a quote from Nassim Taleb we haven't gotten to yet in this conversation, which is, 'Bigger data, bigger mistakes.' I think there's a belief generally that bigger data means fewer mistakes. But Taleb might be right, and it's certainly not the case in my experience that bigger brains, higher IQ [intelligence quotient] mean better decisions. This is not my experience.

Eliezer Yudkowsky: Then you're not throwing enough intelligence at the problem.

Russ Roberts: Yeah, I know.

Eliezer Yudkowsky: If you literally--not just decisions where you disagree with the goals, but, like, false models of reality--models of reality so blatantly mistaken that even you, a human, can tell that they're wrong, and in which direction--these people are not smart the way that a hypothetical, weak, efficient market is smart. You can tell they're making mistakes and you know in which direction. They're not smart the way that Stockfish 15 is smart at chess. You can play against them and win.

The range of human intelligence is not that wide. It caps out at, like, John von Neumann, and that is not wide enough that these beings would be epistemically or instrumentally efficient relative to you. It is possible for you to know that one of their estimates is directionally mistaken and to know the direction. It is possible for you to know an action that serves their goals better than the action that they generated.

Russ Roberts: Isn't it striking how hard it is to convince them of that even though they're thinking people? History is--I just have a different perception, maybe.

To be continued, Eliezer.

My guest today has been Eliezer Yudkowsky. Eliezer, thanks for being part of EconTalk.

Eliezer Yudkowsky: Thanks for having me.

More EconTalk Episodes