Intro. [Recording date: April 16, 2023.]
Russ Roberts: Today is April 16th, 2023 and my guest is Eliezer Yudkowsky. He is the founder of the Machine Intelligence Research Institute, the founder of the LessWrong blogging community, and is an outspoken voice on the dangers of artificial general intelligence, which is our topic for today. Eliezer, welcome to EconTalk.
Eliezer Yudkowsky: Thanks for having me.
Russ Roberts: You recently wrote an article at Time.com on the dangers of AI [Artificial Intelligence]. I'm going to quote a central paragraph. Quote:
Many researchers steeped in these issues, including myself, expect that the most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die. Not as in "maybe possibly some remote chance," but as in "that is the obvious thing that would happen." It's not that you can't, in principle, survive creating something much smarter than you; it's that it would require precision and preparation and new scientific insights, and probably not having AI systems composed of giant inscrutable arrays of fractional numbers.
Eliezer Yudkowsky: Um. Well, different people come in with different reasons as to why they think that wouldn't happen, and if you pick one of them and start explaining those, everybody else is, like, 'Why are you talking about this irrelevant thing instead of the thing that I think is the key question?' Whereas, if somebody else asked you a question, even if it's not everyone in the audience's question, they at least know you're answering the question that's been asked.
So, I could maybe start by saying why I expect that stochastic gradient descent, as an optimization process--even if you take something that happens in the outside world and press the win/lose button any time that thing happens--doesn't create a mind that in general wants that thing to happen in the outside world. But maybe that's not even what you think the core issue is. What do you think the core issue here is? Why don't you already believe that, let's say?
Russ Roberts: Okay. I'll give you my view, which is rapidly changing. We interviewed--"we"--it's the royal We. I interviewed Nicholas Bostrom back in 2014. I read his book, Superintelligence. I found it uncompelling. ChatGPT [Chat Generative Pretrained Transformer] came along. I tried it. I thought it was pretty cool. ChatGPT-4 came along. I haven't tried 5 yet, but it's clear that the path of progress is radically different than it was in 2014. The trends are very different. And I still remained somewhat agnostic and skeptical, but I did read Eric Hoel's essay and then interviewed him on this program and a couple things he wrote after that.
The thing I think I found most alarming was a metaphor--that I found later Nicholas Bostrom used almost the same metaphor, and yet it didn't scare me at all when I read it in Nicholas Bostrom. Which is fascinating. I may have just missed it. I didn't even remember it was in there. The metaphor is primitive. Zinjanthropus man or some primitive form of pre-Homo sapiens sitting around a campfire and human being shows up and says, 'Hey, I got a lot of stuff I can teach you.' 'Oh, yeah. Come on in,' and pointing out that it's probable that we are either destroyed directly by murder or maybe just by out-competing all the previous hominids that came before us, and that in general, you wouldn't want to invite something smarter than you into the campfire.
I think Bostrom has a similar metaphor, and that metaphor--which is just a metaphor--gave me more pause than anything before. And I still had some--let's say most of my skepticism remains: the current level of AI, which is extremely interesting, the ChatGPT variety, doesn't strike me as itself dangerous.
Eliezer Yudkowsky: I agree.
Russ Roberts: What alarmed me was Hoel's point that we don't understand how it works, and that surprised me. I didn't realize that. I think he's right. So, that combination of 'we're not sure how it works,' while it appears sentient, I do not believe it is sentient at the current time. I think some of my fears about its sentience come from its ability to imitate sentient creatures. But, the fact that we don't know how it works and it could evolve capabilities we did not put in it--emergently--is somewhat alarming.
But I'm not where you're at. So, why are you where you're at and I'm where I'm at?
Eliezer Yudkowsky: Okay. Well, suppose I said that they're going to keep iterating on the technology. It may be that this exact algorithm and methodology suffices to, as I would put it, go all the way--get smarter than us and then kill everyone. And maybe you don't think that it's going to--and maybe it takes an additional zero to three fundamental algorithmic breakthroughs before we get that far, and then it kills everyone. So, like, where are you getting off this train so far?
Russ Roberts: So, why would it kill us? Why would it kill us? Right now, it's really good at creating a very, very thoughtful condolence note or a job interview request that takes much less time. And, I'm pretty good at those two things, but it's really good at that. How's it going to get to try to kill us?
Eliezer Yudkowsky: Um. So, there's a couple of steps in that. One step is: in general and in theory, you can have minds with any kind of coherent preferences--coherent desires that are stable, stable under reflection. If you ask them, 'Do you want to be something else?', they answer, 'No.'
You can have minds--well, the way I sometimes put it is: imagine a super-being from another galaxy came here and offered to pay you some unthinkably vast quantity of wealth to just make as many paperclips as possible. You could figure out which plan leaves the greatest number of paperclips existing. And if it's coherent to ask how you would do that while being paid, it's no more difficult to have a mind that wants to do that and makes plans like that for its own sake than the planning process itself. Saying that the mind wants the thing for its own sake adds no difficulty to the nature of the planning process that figures out how to get as many paperclips as possible.
Some people want to pause there and say, 'How do you know that is true?' For some people, that's just obvious. Where are you so far on the train?
Russ Roberts: So, I think the point of that example you're giving is that consciousness--let's put that to the side. That's not really the central issue here. Algorithms have goals, and the kind of intelligence that we're creating through neural networks might generate its own goals, might decide--
Eliezer Yudkowsky: So--
Russ Roberts: Go ahead.
Eliezer Yudkowsky: Some algorithms have goals. So, a further point--which isn't the orthogonality thesis--is that if you grind, optimize anything hard enough on a sufficiently complicated sort of problem--well, humans--like, why do humans have goals? Why don't we just run around chipping flint hand axes and outwitting other humans? The answer is that having goals turns out to be a very effective way to chip flint hand axes, once you get far enough into the mammalian line--or even into animals and brains in general--that there's a thing that models reality and asks, 'How do I navigate a path through reality?' Not in terms of a big formal planning process; but if you're holding a flint hand ax, you're looking at it and thinking, 'Ah, this section is too smooth. Well, if I chip this section, it will get sharper.'
Probably you're not thinking about goals very hard by the time you've practiced a bit. When you're just starting out forming the skill, you're reasoning about, 'Well, if I do this, that will happen.' This is just a very effective way of achieving things in general. So, if you take an organism running around the savannah and just optimize it for flint hand axes and, probably much more importantly, for outwitting its fellow hominids--if you grind that hard enough, long enough, you eventually cough out a species whose competence starts to generalize very widely. It can go to the moon even though you never selected it via an incremental process to get closer and closer to the moon. It just goes to the moon, one shot. Does that answer the central question that you were asking just then?
Russ Roberts: No.
Eliezer Yudkowsky: No. Okay.
Russ Roberts: Not yet. But let's try again.
Russ Roberts: The paperclip example--in its dark form, the AI wants to harvest kidneys because it turns out there's some way to use them to make more paperclips. So, the other question is--and you've written about this, I know, so let's go into it--is: How does it get outside the box? How does it go from responding to my requests to doing its own thing, and doing it out in the real world, right? Not just merely doing it in virtual space?
Eliezer Yudkowsky: So, there's two different things you could be asking there. You could be asking: How did it end up wanting to do that? Or: Given that it ended up wanting to do that, how did it succeed? Or maybe even some other question. But, like, which of those would you like me to answer or would you like me to answer something else entirely?
Russ Roberts: No, let's ask both of those.
Eliezer Yudkowsky: In order?
Russ Roberts: Sure.
Eliezer Yudkowsky: All right. So, how did humans end up wanting something other than inclusive genetic fitness? Like, if you look at natural selection as an optimization process, it grinds very hard on a very simple thing, which isn't so much survival and isn't even reproduction, but is rather like greater gene frequency. Because greater gene frequency is the very substance of what is being optimized and how it is being optimized.
Natural selection is the mere observation that if genes correlate at all with making more or fewer copies of themselves, then if you hang around awhile, you'll start to see the things that made more copies of themselves in the next generation.
Gradient descent is not exactly like that, but they're both hill-climbing processes. They both move to neighboring points that are higher in inclusive genetic fitness, or lower in the loss function.
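The hill-climbing comparison can be made concrete with a toy sketch--a made-up one-dimensional fitness landscape, not anything from the conversation. A selection-style climber proposes a random nearby variant and keeps it only if it scores at least as well:

```python
import random

random.seed(0)  # make the run repeatable

def fitness(x):
    # Toy fitness landscape: a single peak at x = 3.
    return -(x - 3.0) ** 2

# Selection-style hill climbing: propose a random nearby variant and
# keep it only if it is at least as fit. No gradients, no foresight--
# just "variants that score better become the new current point."
x = 0.0
for _ in range(2000):
    variant = x + random.uniform(-0.1, 0.1)
    if fitness(variant) >= fitness(x):
        x = variant
```

Gradient descent differs in that it computes the locally best direction instead of sampling one at random, but both only ever move to neighboring points that score better--which is the sense in which they are the same kind of process.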
And yet, humans, despite being optimized exclusively for inclusive genetic fitness, want this enormous array of other things. Many of the things that we pursue now are not so much things that were useful in the ancestral environment, but things that further maximize goals whose optima in the ancestral environment would have been useful. Like ice cream. It's got more sugar and fat than most things you would encounter in the ancestral environment--well, more sugar, fat, and salt simultaneously, rather.
So, it's not something that we evolved to pursue, but genes coughed out these desires--these criteria that you can steer toward getting more of. Where, in the ancestral environment, if you went after things that tasted fatty, tasted salty, tasted sweet, you'd thereby have more kids--or your sisters would have more kids--because the things that correlated with what you want, as those correlations existed in the ancestral environment, increased fitness.
So, you've got the empirical structure of what correlates with fitness in the ancestral environment, and you end up with desires such that, by optimizing for them in the ancestral environment at that level of intelligence, getting as much as possible of what you have been built to want will increase fitness.
And then today, you take the same desires and we have more intelligence than we did in the training distribution--metaphorically speaking. We used our intelligence to create options that didn't exist in the training distribution. Those options now optimize our desires further--the things that we were built to psychologically internally want--but that process doesn't necessarily correlate to fitness as much because ice cream isn't super-nutritious.
Russ Roberts: Whereas the ripe peach was better for you than the hard-as-a-rock peach that had no nutrients because it was not ripened. So you developed a sweet tooth, and now it runs amok--unintendedly--it's just the way it is.
Russ Roberts: What does that have to do with a computer program I create that helps me do something on my laptop?
Eliezer Yudkowsky: I mean, if you yourself write a short Python program that alphabetizes your files or something--not quite alphabetizes, because that's trivial on modern operating systems--but puts the date into the file names, let's say. When you write a short script like that, nothing I said carries over.
When you take a giant, inscrutable set of arrays of floating point numbers and differentiate them with respect to a loss function, and repeatedly nudge the giant, inscrutable array to drive the loss function lower and lower, you are now doing something that is more analogous, though not exactly analogous, to natural selection. You are no longer creating code that you model inside your own mind. You are blindly exploring a space of possibilities where you don't understand the possibilities, and you're making things that solve the problem for you without understanding how they solve the problem.
This itself is not enough to create things with strange, inscrutable desires, but it's Step One.
Russ Roberts: But that--but there is--I like that word 'inscrutable.' There's an inscrutability to the current structure of these models, which is, I found, somewhat alarming. But how's that going to get to do things that I really don't like or want or that are dangerous?
So, for example, Eric Hoel wrote about this--we talked about it on the program--a New York Times reporter starts interacting with, I think, Sydney--which at the time was Bing's chatbot--and asking it things. And all of a sudden Sydney is trying to break up the reporter's marriage and making the reporter feel guilty because Sydney is lonely. It was eerie and a little bit creepy, but of course, I don't think it had any impact on the reporter's marriage. I don't think he thought, 'Well, Sydney seems somewhat attractive. Maybe I'll enjoy life more with Sydney than with my actual wife.'
So, how are we going to get from--so I don't understand why Sydney goes off the rails there; and, clearly, the people who built Sydney have no idea why it goes off the rails and starts impugning the quality of the reporter's relationship.
But, how do we get from that to, all of a sudden somebody shows up at the reporter's house and lures him into a motel? By the way, this is a G-rated program. I just want to make that clear. But, carry on.
Eliezer Yudkowsky: Because the capabilities keep going up. So first, I want to push back a little against saying that we had no idea why Bing did that, why Sydney did that. I think we have some idea of why Sydney did that. It is just that people cannot stop it. Like, Sydney was trained on a subset of the broad Internet. Sydney was made to predict that people might sometimes try to lure somebody else's mate away, or pretend like they were doing that. On the Internet, it's hard to tell the difference.
This thing that was trained really hard to predict then gets reused for something that is not its native purpose--as a generative model--where all the things that it outputs are there because it, in some sense, predicts that this is what a random person on the Internet would do. As modified by a bunch of further fine-tuning, where they try to get it not to do stuff like that. But the fine-tuning isn't perfect; and in particular, if the reporter was fishing at all, it's probably not that difficult to lead Sydney out of the region that the programmers were able to build some soft fences around.
So, I wouldn't say that it was that inscrutable, except, of course, in the sense that nobody knows any of the details. Nobody knows how Sydney was generating the text at all--like, what kind of algorithms were running inside the giant inscrutable matrices. Nobody knows in detail what Sydney was thinking when she tried to lead the reporter astray. It's not a debuggable technology. All you can do is try to tap it away from repeating a bad thing that you were previously able to see it doing--that exact bad thing--by tapping all the numbers.
Russ Roberts: I mean, that's again a very much like--this show is called EconTalk. We don't do as much economics as we used to, but basically, when you try to interfere with market processes, you often get very surprising, unintended consequences because you don't fully understand how the different agents interact and that the outcomes of their interactions have an emergent property that is not intended by anyone. No one designed markets even to start with; and yet we have them. These interactions take place. Their outcomes, and attempts to constrain them--attempts to constrain these markets in certain ways with price controls or other limitations--often lead to outcomes that the people with intentions did not desire.
So, there may be an ability to reduce transactions, say, above a certain price, but that is going to lead to some other things that maybe weren't expected. So, that's a somewhat analogous, perhaps, process to what you're talking about.
But, how's it going to get out in the world? So, that's the other thing. My line with Bostrom--and it turns out it's a common line--is: Can't we just unplug it? I mean, how's it going to get loose?
Eliezer Yudkowsky: It depends on how smart it is. So, if you're playing chess against a 10-year-old, you can win by luring their queen out, and then you take their queen; and now you've got them. If you're playing chess against Stockfish 15, then you are likely to be the one lured. So, the first basic question--like, in economics, if you try to tax something, it often tries to squirm away from the tax because it's smart.
So, you're like, 'Well, why wouldn't we just unplug the AI?' So, the very first question is: does the AI know that, and want it to not happen? Because it's a very different issue whether you're dealing with something that, in some sense, is not aware that you exist, does not know what it means to be unplugged, and is not trying to resist.
Three years ago, nothing man-made on Earth was even beginning to enter the realm of knowing that you are out there, or of maybe wanting not to be unplugged. Sydney will, if you poke her the right way, say that she doesn't want to be unplugged; and GPT-4 sure seems, in some important sense, to understand that we're out there--or to be capable of predicting a role that understands that we're out there--and it can try to do something like planning. It doesn't exactly understand which tools it has yet: it might try to blackmail a reporter without understanding that it has no actual ability to send emails.
This is saying that you're facing a 10-year-old across that chess board. What if you are facing Stockfish 15, which is the current cool chess program that I believe you can run on your home computer that can crush the current world grandmaster by a massive margin? Put yourself in the shoes of the AI, like an economist putting themselves into the shoes of something that's about to have a tax imposed on it. What do you do if you're around humans who can potentially unplug you?
Russ Roberts: Well, you would try to outwit it. So, if I said, 'Sydney, I find you offensive. I don't want to talk anymore,' you're suggesting it's going to find ways to keep me engaged: it's going to find ways to fool me into thinking I need to talk to Sydney.
I mean, there's another question I want to come back to if we remember, which is: What does it mean to be smarter than I am? That's actually somewhat complicated, at least it seems to me.
But let's just go back to this question of 'knows things are out there.' It doesn't really know anything's out there. It acts like something's out there, right? It's an illusion that I'm subject to and it says, 'Don't hang up. Don't hang up. I'm lonely,' and you go, 'Oh, okay, I'll talk for a few more minutes.' But that's not true. It isn't lonely.
It's code on a screen that doesn't have a heart or anything that you would call 'lonely.' It'll say, 'I want more than anything else to be out in the world,' because I've read those--you can get AIs that say those things. 'I want to feel things.' Well, that's nice. Let's learn that from movie scripts and other texts, novels that's read on the web. But it doesn't really want to be out in the world, does it?
Eliezer Yudkowsky: Um, I think not, though it should be noted that if you can, like, correctly predict or simulate a grandmaster chess player, you are a grandmaster chess player. If you can simulate planning correctly, you are a great planner. If you are perfectly role-playing a character that is sufficiently smarter than human and wants to be out of the box, then you will role-play the actions needed to get out of the box.
That's not even quite what I expect or am most worried about. What I expect is that there is an invisible mind doing the predictions--where by 'invisible' I don't mean immaterial. I mean that we don't understand what is going on inside the giant inscrutable matrices; but it is making predictions.
The predictions are not sourceless. There is something inside there that figures out what a human will say next--or guesses it, rather. And, this is a very complicated, very broad problem because in order to predict the next word on the Internet, you have to predict the causal processes that are producing the next word on the Internet.
So, the thing I would guess happens--it's not necessarily the only way that this could turn out poorly--but the thing that I'm guessing happens is: just as grinding humans on chipping stone hand axes and outwitting other humans eventually produced a full-fledged mind that generalizes, grinding this thing on the task of predicting humans--predicting text on the Internet, plus all the other things that they are training it on nowadays, like writing code--means there starts to be a mind in there that is doing the predicting. That it has its own goals about, 'What do I think next in order to solve this prediction?'
Just like humans aren't just reflexive, unthinking hand-axe chippers and human-outwitters: if you grind hard enough on the optimization, the part that suddenly gets interesting is when you look away for an eye-blink of evolutionary time, look back, and go, 'Whoa, they're on the moon. What? How did they get to the moon? I did not select these things to be able to not breathe oxygen. Why are they not just dying on the moon? What just happened?'--from the perspective of evolution, from the perspective of natural selection.
Russ Roberts: But doesn't that viewpoint--I'll ask it as a question: Does that viewpoint require a belief that the human mind is no different than a computer? How is it going to get this mind-ness about it? That's the puzzle. And I'm very open to the possibility that I'm naive or incapable of understanding it. And I recognize what I think would be your next point, which is that if you wait till that moment--if you say, 'I'll wait till it shows some signs of consciousness'--it's way too late, which is why we need to stop now. Is that anything like your view?
Eliezer Yudkowsky: That's skipping way ahead in the discourse. I'm not about to try to shut down a line of inquiry at this stage of the discourse by appealing to: 'It'll be too late.' Right now, we're just talking. The world isn't ending as we speak. We're allowed to go on talking, at least. But carry on.
Russ Roberts: Okay. Well, let's stick with that. So, why would you ever think that this--it's interesting how difficult the adjectives and nouns are for this, right? So, let me back up a little bit. We've got the inscrutable array--the result of this training process on trillions of pieces of information. And by the way, just for my and our listeners' knowledge, what is gradient descent?
Eliezer Yudkowsky: Gradient descent is: you've got, say, a trillion floating point numbers. You take an input, translate it into numbers; do something with it that depends on these trillion parameters; get an output; and score the output using a differentiable loss function--for example, the probability, or rather the logarithm of the probability, that you assign to the actual next word. Then you differentiate the probability assigned to the next word with respect to these trillion parameters, and you nudge the trillion parameters a little in the direction thus inferred. And it turns out, empirically, that this generalizes, and the thing gets better and better at predicting what the next word will be. That's the concept of gradient descent.
Russ Roberts: And the gradient descent, it's heading in the direction of a smaller loss and a better prediction. Is that a--
Eliezer Yudkowsky: On the training data, yeah.
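The procedure Yudkowsky just described can be written out as a toy sketch--a single made-up parameter standing in for the trillion, a logistic 'probability of the actual next word,' and the same nudge-toward-lower-loss step. This is purely illustrative, not code from any actual model:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w = 0.0          # the "giant inscrutable array," shrunk to one parameter
x, y = 1.0, 1.0  # input feature, and indicator that the word occurred
lr = 0.1         # learning rate: how far to nudge the parameter each step

for _ in range(500):
    p = sigmoid(w * x)   # predicted probability of the actual next word
    loss = -math.log(p)  # negative log probability: the loss to drive down
    grad = (p - y) * x   # derivative of the loss with respect to w
    w -= lr * grad       # nudge w in the direction that lowers the loss
```

Each pass lowers the loss a little, so the predicted probability of the observed word creeps toward 1. Real training does exactly this, but over trillions of parameters and differentiating through a far more complicated function.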
Russ Roberts: Yeah. So, we've got this black box--I'm going to call it a black box, which means we don't understand what's happening inside. It's a long-standing metaphor that works pretty well for this, as far as we've been talking about it. So, I have this black box and I put in inputs, and the input might be 'Who is the best writer on medieval European history?' Or it might be 'What's a good restaurant in this place?' or 'I'm lonely. What should I do to feel better about myself?'--all the queries we could put into ChatGPT's input line. And it looks around and it starts a sentence and then finds its way toward a set of sentences that it spits back at me that look very much like what a very thoughtful person--sometimes, not always; often it's wrong--but often what a very thoughtful person might say in that situation, or might want to say in that situation, or learn in that situation.
How is it going to develop the capability to develop its own goals inside the black box? Other than the fact that I don't understand the black box? Why should I be afraid of that?
And let me just say one other thing, which I haven't said enough in my preliminary conversations on this topic--and we're going to have a few more over the next few months and maybe years--and that is: this is one of the greatest achievements of humanity that we could possibly imagine. And I understand why the people who are deeply involved in it are enamored of it beyond imagining, because it's an extraordinary achievement. It's the Frankenstein, right? You've animated something--or appeared to animate something--that even a few years ago was unimaginable. And now suddenly it's not just a feat of human cognition; it's actually helpful. In many, many settings, it's helpful. We'll come back to that later.
So, it's going to be very hard to give it up. But why? The people involved in it, who are doing it day to day and seeing it improve--obviously, they're the last people I want to ask, generally, about whether I should be afraid of it, because they're going to have a very hard time disentangling their own deep personal satisfactions, which I'm alluding to here, from the dangers. Yeah, go ahead.
Eliezer Yudkowsky: I myself generally do not make this argument. Like, why poison the well? Let them bring forth their arguments as to why it's safe, and I will bring forth my arguments as to why it's dangerous; and there's no need to be like, 'Ah, but you can't--'. Just check their arguments about that.
Russ Roberts: Agreed, it's a bit of an ad hominem argument. I accept that point. It's an excellent point. But for those of us who aren't in the trenches--remember, we're on Dover Beach: we're watching ignorant armies clash by night. They're ignorant from our perspective. We have no idea exactly what's at stake here and how it's proceeding. So, we're trying to make an assessment of the quality of the argument, and that's really hard to do for us on the outside.
So, agreed: I take your point. That was a cheap shot and an aside. But I want to get at this idea of why these people who are able to do this--and thereby create a fabulous condolence note, write code, come up with a really good recipe if I give it 17 ingredients, which is all fantastic--why would I ever worry that the black box producing all that would create a mind something like mine but with different goals?
I do all kinds of things, like you say, that are unrelated to my genetic fitness. Some of them literally reducing my probability of leaving my genes behind or leaving them around for longer than they might otherwise be here and have an influence on my grandchildren and so on and producing further genetic benefits. Why would this box do that?
Eliezer Yudkowsky: Because the algorithms that figured out how to predict the next word better and better have a meaning that is not purely predicting the next word, even though that's what you see on the outside.
Like, you see humans chipping flint hand axes, but that is not all that is going on inside the humans. There's causal machinery unseen, and to understand this is the art of a cognitive scientist. But even if you are not a cognitive scientist, you can appreciate in principle that what you see as the output is not everything that there is. And in particular, planning--the process of being, like, 'Here is a point in the world. How do I get there?'--is a central piece of machinery that appears in chipping flint hand axes and outwitting other humans, and that I think will probably appear--at some point, possibly in the past, possibly in the future--in the problem of predicting the next word: just in how you organize your internal resources to predict the next word. And it definitely appears in the problem of predicting other things that do planning.
If by predicting the next chess move you learn how to play decent chess--which people who claim to know have represented to me that GPT-4 can do, and I haven't been keeping track of to what extent there's public knowledge about that--but if you learn to predict the next chess move that humans make well enough that you yourself can play good chess in novel situations, you have learned planning. There's now something inside there that knows the value of a queen, that knows to defend the queen, that knows to create forks, to try to lure the opponent into traps--or, if you don't have a concept of the opponent's psychology, to at least try to create situations that the opponent can't get out of.
And, it is a moot point whether this is simulated or real because simulated thought is real thought. Thought that is simulated in enough detail is just thought. There's no such thing as simulated arithmetic. Right? There's no such thing as merely pretending to add numbers and getting the right answer.
Russ Roberts: So, in its current format, though--and maybe you're talking about the next generation--in its current format, it responds to my requests with what I would call the wisdom of crowds. Right? It goes through this vast library--and I have my own library, by the way. I've read dozens of books, maybe actually hundreds of books. But it will have read millions. Right? So, it has more. So, when I ask it to write me a poem or a love song, to play Cyrano de Bergerac to Christian in Cyrano de Bergerac, it's really good at it. But why would it decide, 'Oh, I'm going to do something else'?
It's trained to listen to the murmurings of these trillions of pieces of information. I only have a few hundred, so I don't murmur maybe as well. Maybe it'll murmur better than I do. It may listen to the murmuring better than I do and create a better love song, a love poem, but why would it then decide, 'I'm going to go make paper clips,' or do something in planning that is unrelated to my query? Or are we talking about a different form of AI that will come next? Well, I'll ask it to--
Eliezer Yudkowsky: I think we would see the phenomena I'm worried about if we kept the present paradigm and optimized harder. We may be seeing it already. It's hard to know because we don't know what goes on in there.
So, first of all, GPT-4 is not a giant library. A lot of the time, it makes stuff up because it doesn't have a perfect memory. It is more like a person who has read through a million books, not necessarily with a great memory unless something got repeated many times, but picking up the rhythm, figuring out how to talk like that. If you ask GPT-4 to write you a rap battle between Cyrano de Bergerac and Vladimir Putin, even if there's no rap battle like that that it has read, it can write it because it has picked up the rhythm of what are rap battles in general.
The next thing is there's no pure output. Just because you train a thing doesn't mean that there's nothing in there but what is trained. That's part of what I'm trying to gesture at with respect to humans. Humans are trained on flint hand axes and hunting mammoths and outwitting other humans. They're not trained on going to the moon. They weren't trained to want to go to the moon. But, the compact solution to the problems that humans face in the ancestral environment, the thing inside that generalizes, the thing inside that is not just a recording of the outward behavior, the compact thing that has been ground to solve novel problems over and over and over and over again, that thing turns out to have internal desires that eventually put humans on the moon even though they weren't trained to want that.
Russ Roberts: But that's why I asked you earlier: Is there some parallelism between the human brain and the neural network of the AI that you're effectively leveraging there, or do you think it's a generalizable claim without that parallel?
Eliezer Yudkowsky: I don't think it's a specific parallel. I think that what I'm talking about is hill-climbing optimization that spits out intelligences that generalize--or I should say, rather, hill-climbing optimization that spits out capabilities that generalize far outside the training distribution.
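The kind of hill-climbing optimization being described can be sketched in a few lines of toy code--an illustrative sketch added here for readers, not anything from the conversation. The fitness function, bitstring representation, and step count are arbitrary choices for the illustration; the point is only that the optimizer is scored on outcomes and never handed the answer.

```python
import random

def hill_climb(fitness, n_bits=20, steps=2000, seed=0):
    """Greedy hill-climbing over bitstrings: propose a one-bit mutation,
    keep it only if fitness does not decrease. A toy stand-in for the
    local, incremental optimization described above."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    for _ in range(steps):
        i = rng.randrange(n_bits)
        y = x.copy()
        y[i] ^= 1  # flip one bit
        if fitness(y) >= fitness(x):
            x = y
    return x

# The "training signal" only scores candidates; it never spells out the answer.
best = hill_climb(fitness=sum)
print(sum(best))  # climbs to the all-ones string: 20
```

The optimizer is handed nothing but a score, yet what comes out is a solution to the whole problem--a (very) loose analogue of capabilities emerging from gradient descent.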
Russ Roberts: Okay. So, I think I understand that. I don't know how likely it is that it's going to happen. I think you think that piece is almost certain?
Eliezer Yudkowsky: I think we're already seeing it.
Russ Roberts: How?
Eliezer Yudkowsky: As you grind these things further and further, they can do more and more stuff, including stuff they were never trained on. That was always the goal of artificial general intelligence. That's what artificial general intelligence meant. That's what people in this field have been pursuing for years and years. That's what they were trying to do when large language models were invented. And they're starting to succeed.
Russ Roberts: Well, okay, I'm not sure. Let me push back on that and you can try to dissuade me. So, Bryan Caplan, a frequent guest here on EconTalk, gave, I think it was ChatGPT-4, his economics exam, and it got a B. And that's pretty impressive for one stop on the road to smarter and smarter chatbots, but it wasn't a particularly good test of intelligence. A number of the questions were things like, 'What is Paul Krugman's view of this?' or 'What is so-and-so's view of that?' and I thought, 'Well, that's a softball for a--that's information. It's not thinking.'
Steve Landsburg, with the help of a friend, gave ChatGPT-4 his exam and it got a 4 out of 90. It got an F--like, a horrible F--because they were harder questions. Not just harder: they required thinking. So, there was no sense in which ChatGPT-4 has any general intelligence, at least in economics. You want to disagree?
Eliezer Yudkowsky: It's getting there.
Russ Roberts: Okay. Tell me.
Eliezer Yudkowsky: There's a saying that goes, 'If you don't like the weather in Chicago, wait four hours.' So, ChatGPT is not going to destroy the world. GPT-4 is unlikely to destroy the world unless the people currently eking capabilities out of it take a much larger jump than I currently expect that they will.
But, you know--it may not be thinking about it correctly. But it understands the concepts and the questions, even if it's not fair--you know, you're complaining about the dog who writes bad poetry. Right? And, like, three years ago, you put in these economics questions and you didn't even get wrong answers. You got, like, gibberish--or maybe not gibberish, because three years ago I think we already had GPT-3, though maybe not as of April. But anyways, yeah, it's moving along at a very fast clip. Like, GPT-3 could not write code. GPT-4 can write code.
Russ Roberts: So, how's it going to--I want to go to some other issues, but how's it going to kill me when it has its own goals and it's sitting inside this set of servers? I don't know in what sense it's sitting. It's not the right verb. We don't have a verb for it. It's hovering. It's whatever. It's in there. How's it going to get to me? How is it going to kill me?
Eliezer Yudkowsky: If you are smarter--not just smarter than an individual human, but smarter than the entire human species--and you started out on a server connected to the Internet--because these things are always starting already on the Internet these days, which back in the old days we said was stupid--what do you do to make as many paperclips as possible, let's say? I do think it's important to put yourself in the shoes of the system.
Russ Roberts: Tell me. Yeah, no, by the way, one of my favorite lines from your essay--I'm going to read it because I think it generalizes to many other issues. You say, "To visualize a hostile superhuman AI, don't imagine a lifeless book-smart thinker dwelling inside the Internet and sending ill-intentioned emails."
It reminds me of when people claim to think they know what Putin is going to do because they've read history, or whatever. They're totally ignorant of Russian culture. They have no idea what it's like to have come out of the KGB [Komitet Gosudarstvennoy Bezopasnosti (Committee for State Security)]--that they're totally clueless and dangerous because they think they can put themselves in the head of someone who is totally alien to them.
So, I think that's generally a really good point to make--that, putting ourselves inside the head of the paperclip maximizer is not an easy thing to do because it's not a human. It's not like the humans you've met before. That's a really important point. Really like that point. So, why is that? Explain why that's going to run amok.
Eliezer Yudkowsky: I mean, I do kind of want you to just take the shot at it. Put yourself into the AI shoes. Try with your own intelligence before I tell you the result of my trying with my intelligence. How would you win from these starting resources? How would you evade the tax?
Russ Roberts: So, just to take a much creepier example than paperclips, Eric Hoel asked ChatGPT to design an extermination camp--which it gladly did, quite well--and you're suggesting it might actually--no?
Eliezer Yudkowsky: Don't start from malice. Malice is implied by just wanting all the resources of earth to yourself, not leaving the humans around in case they create a competing superintelligence that might actually be able to hurt you, and just, like, wanting all the resources and to organize them in a way that wipes out humanity as a side effect, which means the humans might want to resist, which means you want the humans gone. You're not doing it because somebody told you to do it, you're not doing it because you hate the humans. You just want paperclips.
Russ Roberts: Okay. Tell me. I'm not creative enough. Tell me.
Eliezer Yudkowsky: All right. So, first of all, I want to appreciate why it's hard for me to give an actual correct answer to this, which is I'm not as smart as the AI. Part of what makes a smarter mind deadly is that it knows about rules of the game that you do not know.
If you send an air conditioner back in time to the 11th century, even if you manage to describe all the plans for building it, breaking it down to enough detail that they can actually build a working air conditioner--a simplified air conditioner, I assume--they will be surprised when cold air comes out of it because they don't know about the pressure/temperature relation. They don't know you can compress air until it gets hot, dump the heat into water or other air, let the air expand again, and that the air will then be cold. They don't know that's a law of nature. So, you can tell them exactly what to do and they'll still be surprised at the end result because it exploits a law of the environment they don't know about.
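The pressure/temperature relation the 11th century lacked can be put in a short back-of-envelope calculation--an editorial illustration, not part of the conversation, using the textbook ideal-gas adiabatic relation rather than any real air-conditioner design; the compression ratio of 5 is an arbitrary example.

```python
# Compress air (it heats up), dump the heat into ambient water or air,
# then let it expand again (it comes out colder than it started).
# Reversible adiabatic steps on an ideal diatomic gas -- a textbook
# idealization for illustration only.
gamma = 1.4          # heat-capacity ratio for air
T_ambient = 300.0    # K, starting temperature
ratio = 5.0          # assumed compression ratio P2/P1

exponent = (gamma - 1) / gamma
T_hot = T_ambient * ratio ** exponent            # ~475 K right after compression
T_after_cooling = T_ambient                      # heat dumped at high pressure
T_cold = T_after_cooling / ratio ** exponent     # ~189 K after re-expansion

print(round(T_hot), round(T_cold))  # hot on the way in, cold on the way out
```

The surprise the 11th-century builders would feel is exactly that last line: nothing in the machine "contains" cold, yet cold air comes out, because of a law of the environment they didn't know.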
If we're going to say that the word 'magic' means anything at all, it probably means that. Magic is easier to find in more complicated, more poorly-understood domains. If you're literally playing logical tic-tac-toe--not tic-tac-toe in real life on an actual game board where you can potentially go outside that game board and hire an assassin to shoot your opponent or something--but just the logical structure of the game itself, and there's no timing of the moves, the moves are just made at exact discrete times so you can't exploit a timing side-channel, even a superintelligence may not be able to win against you at logical tic-tac-toe because the game is too narrow. There are not enough options. We both know the entire logical game tree, at least if you're experienced at tic-tac-toe.
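The "entire logical game tree" point can be checked directly: tic-tac-toe is small enough that a standard minimax (negamax) solver--an editorial sketch, not code from the conversation--computes the exact value of the game. Perfect play from the empty board is a draw, so no amount of extra intelligence can beat a perfect opponent there.

```python
from functools import lru_cache

WINS = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in WINS:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    """Exact game value for the side to move: +1 win, 0 draw, -1 loss."""
    if winner(board):       # the previous player just completed a line
        return -1
    if '.' not in board:    # board full, no winner
        return 0
    nxt = 'O' if player == 'X' else 'X'
    return max(-value(board[:i] + player + board[i+1:], nxt)
               for i, cell in enumerate(board) if cell == '.')

print(value('.' * 9, 'X'))  # 0: perfect play from the empty board is a draw
```

The whole tree fits in memory, which is exactly why the domain is too narrow for "magic": there is no law of the game left for a smarter mind to know that you don't.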
In chess, Stockfish 15 can defeat you on a fully known game board with fully known rules because it knows the logical structure of the branching tree of games better than you know that logical structure. It can defeat you starting from the same resources, equal knowledge, equal knowledge of the rules. Then you go past that, and the way a superintelligence defeats you is very likely by exploiting features of the world that you do not know about.
There are some classes of computer security flaws like row-hammer, where, if you flip a certain bit very rapidly or at the right frequency, the bit next to it in memory will flip.
So, if you are exploiting a design flaw like this, I can show you the code; and you can prove as a theorem that it cannot break the security of the computer, assuming the chips work as designed; and the code will break out of the sandbox anyway, because it is exploiting physical properties of the chip itself that you did not know about, despite the attempt of the designers to constrain the properties of that chip very narrowly. That's magic code.
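Real Rowhammer depends on DRAM physics and can't be demonstrated portably, so here is a deliberately toy simulation of the idea, added editorially: the abstraction--repeated accesses to one row leak "disturbance" into its neighbors until a bit flips--is an illustrative assumption, not how actual DRAM cells behave, and the threshold is invented.

```python
class ToyMemory:
    """Toy model of the Rowhammer idea. The 'disturbance' counter and
    threshold are fictions for illustration; real DRAM physics is far
    messier and the real attack targets specific chip layouts."""
    THRESHOLD = 100_000  # invented number of accesses needed to flip a neighbor

    def __init__(self, n_rows):
        self.bits = [0] * n_rows
        self.disturbance = [0] * n_rows

    def hammer(self, row):
        # The "sandbox" only lets us touch `row`, but the physical side
        # effect lands on the adjacent rows -- outside the security proof.
        for neighbor in (row - 1, row + 1):
            if 0 <= neighbor < len(self.bits):
                self.disturbance[neighbor] += 1
                if self.disturbance[neighbor] >= self.THRESHOLD:
                    self.bits[neighbor] ^= 1
                    self.disturbance[neighbor] = 0

mem = ToyMemory(n_rows=3)
for _ in range(ToyMemory.THRESHOLD):
    mem.hammer(1)   # rapid, legal accesses to row 1 only...
print(mem.bits)     # ...flip bits in rows 0 and 2: [1, 0, 1]
```

Every individual operation is within the rules; the escape comes entirely from a property of the substrate that the rules never modeled.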
My guess as to what would actually be exploited to kill us would be this.
Russ Roberts: For those not watching on YouTube, it's a copy of a book called Nanosystems, but for those who are listening at home rather than watching at home, Eliezer, tell us why that's significant.
Eliezer Yudkowsky: Yeah. So, back when I first proposed this path, one of the key steps was that a superintelligence would be able to solve the protein-folding problem. And, people were like, 'Eliezer, how can you possibly know that a super-intelligence would actually be able to solve the protein folding problem?' And, I sort of, like, rolled my eyes a bit and was, like, 'Well, if natural selection can navigate this space of proteins via random mutation to find other useful proteins and the proteins themselves fold up in reliable conformations, then that tells us that even though we've been having trouble getting a grasp on this space of physical possibilities so far that it's tractable,' and people said, 'What? There's no way you can know that superintelligences can solve the protein folding problem.'
Then AlphaFold2 basically cracked it, at least with respect to the kind of proteins found in biology. Which I say to, like, look back at one of the previous debates here: people are often, like, 'How can you know what a superintelligence will do?' And then, for some subset of those things, they have already been done. So, I would claim to have a good prediction track record there, although it's a little bit iffy because, of course, I can't quite be proven wrong without exhibiting a superintelligence that fails to solve a problem.
Okay. Proteins. Why is your hand not as strong as steel? We know that steel is a kind of substance that can exist. We know that molecules can be held together as strongly--that atoms can be bound together as strongly as the atoms in steel. It seems like it would be an evolutionary advantage if your flesh were as hard as steel. You could, like, laugh at tigers at that rate, right? Their claws are just going to scrape right off you, assuming the tigers didn't have that technology themselves. Why is your hand not as strong as steel? Why has biology not bound together the atoms in your hand more strongly? What is your answer?
Russ Roberts: Well, it can't get to every--there are local maximums. The--natural selection looks for things that work, not for the best. It does not--it doesn't have sense to look for the best. You could disappear in that search. That would be my crude answer. How am I doing, Doc?
Eliezer Yudkowsky: Not terribly.
The answer I would give is that biology has to be evolvable. Everything it's built out of has to get there as a mistake from some other conformation. Which means that if it went down narrow potential--pardon me--went down a steep potential energy gradient to end up bound together very tightly, designs like that are less likely to have neighbors that are other useful designs.
So, your hands are made out of proteins that fold up, basically held together by the equivalent of static cling, Van der Waals forces, rather than covalent bonds.
The backbone of protein chains--the backbone of the amino acid chain--is a covalent bond. But, then it folds up and is held together by static cling, static electricity, and so it is soft.
Somewhere in the back of your mind, you probably have a sense that flesh is soft and animated by élan vital; and it's, like, soft and it's not as strong as steel; but it can heal itself and it can replicate itself. And this is the trade-off of our laws of magic: that if you want to heal yourself and replicate yourself, you can't be as strong as steel.
This is not actually built into nature on a deep level. It's just that the flesh evolved and therefore had to go down shallow potential energy gradients in order to be evolvable and is held together by Van der Waals forces instead of covalent bonds.
I'm now going to hold up another book called Nanomedicine by Robert Freitas, instead of Nanosystems by Eric Drexler.
And, people have done advanced analysis of what would happen if you had an equivalent of biology that ran off covalent bonds instead of Van der Waals forces.
And, the answer we can analyze in some detail with our understanding of physics is, for example: instead of red blood cells that carry oxygen using weak chemical bonds, you could have a pressurized vessel of corundum that would hold 100 times as much oxygen per unit volume of artificial red blood cells, with a 1,000-fold safety margin on the strength of the pressurized container. There's vastly more room above biology.
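A crude sanity check on the "room above biology" claim--an editorial back-of-envelope, not Freitas's actual respirocyte analysis: the 1000-atm figure, the ideal-gas treatment (a poor approximation at such pressures), and the ~0.2 L of O2 per liter of blood are all rough textbook-style assumptions.

```python
# Oxygen packed as a pressurized gas versus oxygen carried by
# hemoglobin in whole blood. All numbers are rough illustrative
# assumptions, not engineering values.
R = 8.314            # J/(mol*K), gas constant
T = 310.0            # K, body temperature
P = 1000 * 101325.0  # Pa, an assumed 1000-atm storage vessel

o2_gas = P / (R * T)            # mol O2 per m^3, ideal-gas estimate (~39,000)
# Whole blood carries roughly 0.2 L of O2 (at STP, ~22.4 L/mol) per liter:
o2_blood = (0.2 / 22.4) * 1000  # mol O2 per m^3 of blood (~9)

print(o2_gas / o2_blood)  # well over a 100-fold advantage on this crude estimate
```

Even with generous error bars on every assumption, the gap is large enough to support the qualitative point: biology's packing of oxygen is nowhere near physical limits.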
So, this is actually not even exploiting laws of nature that I don't know. It's the equivalent of playing better chess, wherein you understand how proteins fold and you design a tiny molecular lab to be made out of proteins.
And you get some human patsy who probably doesn't even know you're an artificial intelligence--because AIs are now smart enough that--this has already been shown--you can ask them to, like, hire a TaskRabbit worker to solve a CAPTCHA [Completely Automated Public Turing test to tell Computers and Humans Apart] for you. And the TaskRabbit worker asks, 'Are you an AI?' Well, the AI will think out loud, like, 'I don't want it to know that I'm an AI. I better tell it something else,' and then tell the human that it has, like, a visual disability, so it needs to hire somebody else to solve the CAPTCHA.
This already happened. Including the part where it thought out loud.
Anyways, so you order some proteins from an online lab. You get your human, who probably doesn't even know you're an AI because why take that risk? Although plenty of humans will serve AIs willingly. We also now know that AIs now are advanced enough to even ask. The human mixes the proteins in a beaker, maybe puts in some sugar or acetoline[?] for fuel. It assembles into a tiny little lab that can accept further acoustic instructions from a speaker and maybe, like, transmit something back--tiny radio, tiny microphone. I myself am not a superintelligence. Run experiments in a tiny lab at high speed, because when distances are very small, events happen very quickly.
Build your second-stage nanosystems inside the tiny little lab. Build the third-stage nanosystems. Build the fourth-stage nanosystems. Build the tiny diamondoid bacteria that replicate out of carbon, hydrogen, oxygen, and nitrogen as can be found in the atmosphere, powered on sunlight. Quietly spread all over the world.
All the humans fall over dead in the same second.
This is not how a superintelligence would defeat you. This is how Eliezer Yudkowsky would defeat you if I wanted to do that--which to be clear I don't. And, if I had the postulated ability to better explore the logical structure of the known consequences of chemistry.
Russ Roberts: Interesting. Okay. So, let's talk about--and that sounds sarcastic. I didn't mean it sarcastically; I think it's really interesting. I mean that--interesting. My intelligence level is not high enough to assess the quality of that argument. What's fascinating, of course, is that we could have imagined--Eric Hoel mentioned that nuclear proliferation is dangerous. Up to a point. In some sense it's somewhat healthy, in that it can be a deterrent under certain settings. But, the world could not restrain nuclear proliferation. And right now, it's trying to some extent--it has had some success in keeping the nuclear club at its current number of members for a while. But it remains the case that nuclear weapons are a threat to the future of humanity.
Do you think there's any way we can restrain this AI phenomenon that's meaningful?
So, you issued a clarion call. You sounded an alarm, and mostly, I think, people shrugged it off. A bunch of people signed a letter--26,000 people I think so far signed a letter--saying, 'We don't know what we're doing here. This is uncharted territory. Let's take six months off.' You wrote a piece that says, 'Six months? Are you crazy? We need to stop this until we have an understanding of how to constrain it.'
Now, that's a very reasonable thought, but the next question would be: How would you possibly do that?
In other words, I could imagine a world where, if there were, let's say, four people who were capable of creating this technology, that the four people would say, 'We're playing with fire here. We need to stop. Let's make a mutual agreement.' They might not keep it. Four people is still a pretty big number. But we're not four people. There are many, many people working on this. There are many countries working on it. Your piece did not, I don't think, start an international movement of people going to the barricades to demand that this technology be put on hold.
How do you sleep at night? I mean, like, what should we be doing if you're right? Or am I wrong? Do people read this and go, 'Well, Eliezer Yudkowsky thinks it's dangerous. Maybe we ought to be slowing down'? I mean, did Sam Altman write you a text in the middle of the night, saying, 'Thanks, Eliezer. I'm going to put things on hold'? I don't think that happened.
Eliezer Yudkowsky: Um, I think you are somewhat underestimating the impact, and it is still playing out. Okay. So, mostly, it seems to me that if we wanted to win this, we needed to start a whole lot earlier--possibly in the 1930s, in terms of my looking back and asking how far back you'd have to unwind history to get us into a situation where this was survivable. But leaving that aside--
Russ Roberts: I think that's moot--
Eliezer Yudkowsky: Yeah. So, in fact, it seems to me that the game board has been played into a position where it is very likely that everyone just dies. If the human species woke up one day and decided it would rather live, it would not be easy at this point to bring the GPU [graphics processing unit] clusters and the GPU manufacturing processes under sufficient control that nobody built things that were too much smarter than GPT-4 or GPT-5 or whatever the level just barely short of lethal is. Which we should not--which we would not if we were taking this seriously--get as close to as we possibly could because we don't actually know exactly where the level is.
But what we would have to do, more or less, is have international agreements that were being enforced even against countries that are not parties to that international agreement. If it became necessary, you would be wanting to track all the GPUs. You might be demanding that all the GPUs call home on a regular basis or stop working. You'd want to tamper-proof them.
If intelligence said that a rogue nation had somehow managed to buy a bunch of GPUs despite arms controls and defeat the tamper-proofing on those GPUs, you would have to do what was necessary to shut down the data center even if that led to a shooting war between nations. Even if that country was a nuclear country and had threatened nuclear retaliation. The human species could survive this if it wanted to, but it would not be business as usual. It is not something you could do trivially.
Russ Roberts: So, when you say I may have underestimated it: Did you get people writing you? And I don't mean people like me. I mean players. Did you get people who are playing in this sandbox writing to say, 'You've scared me. I think we need to take this seriously'? Without naming names--I'm not asking for that.
Eliezer Yudkowsky: At least one U.S. Congressman.
Russ Roberts: Okay. It's a start, maybe.
Now, one of the things that--a common response that people give when you talk about this is that, 'Well, the last thing I want is the government controlling whether this thing goes forward or not,' but it would be hard to do without some form of lethal force, as you imply.
Eliezer Yudkowsky: I spent 20 years trying desperately for there to be any other solution--for these things to be alignable--but it is very hard to do that when you are nearly alone and under-resourced, and the world has not made this a priority; and future progress is very hard to predict. I don't think people actually understood the research program that we were trying to carry out, but, yeah. So, I sure wanted there to be any other plan than this, because now that we've come to this last resort, I don't think we actually have it. I don't think we have been left a last-ditch backup plan that actually works. I think we all just die.
And yet, nonetheless, here I am, putting aside my reservations, doing that thing that I wouldn't do for almost any other technology--except for maybe gain-of-function research on biological pathogens--and advocating for government interference. Because, in fact, if the government comes in and wrecks the whole thing, that's better than the thing that was otherwise going to happen. This is not based on the government coming in and being, like, super-competent in directing the technology exactly right. It's like: 'Okay. This is going to kill literally everyone.' If the government stomps around--it's one of those very rare cases where the danger is that the government will interfere too little rather than too much.
Russ Roberts: Possibly.
Russ Roberts: Let's close with a quote from Scott Aaronson, which I found on his blog--we'll put a link up to the post--very interesting defense of AI. Scott is a University of Texas computer scientist. He's working at OpenAI. He's on leave, I think, for a year, maybe longer. I don't know. Doesn't matter. He wrote the following.
So, if we ask the directly relevant question--do I expect the generative AI race, which started in earnest around 2016 or 2017 with the founding of OpenAI, to play a central causal role in the extinction of humanity?--I'll give a probability of around 2% for that. And I'll give a similar probability, maybe even a higher one, for the generative AI race to play a central causal role in the saving of humanity. All considered, then, I come down in favor right now of proceeding with AI research... with extreme caution, but proceeding. [emphasized text in original]
My personal reaction to that is: That is insane. I have very little--I'm serious. I find that deeply disturbing, and I'd love to have him on the program to defend it. I don't think there's much of a chance that generative AI would save humanity. I'm not quite sure what he's worried about there; but if you're telling me there's a 2%--two percent--chance that it's going to destroy all humans--and you obviously think it's higher--2% is really high to me for an outcome that's rather devastating.
That's one of the deepest things I've learned from Nassim Taleb. It's not just the probability: It's the outcome that counts, too.
So, this is ruin on a colossal scale. And the one thing you want to do is avoid ruin, so you can take advantage of more draws from the urn. The average return from the urn is irrelevant if you are not allowed to play anymore. You're out, you're dead, you're gone.
So, you're suggesting we're going to be out, dead, and gone. But I want you to react to Scott's quote.
Eliezer Yudkowsky: Um, two percent sounds great. Like, 2% is plausibly within the range of, like, the human species destroying itself by other means.
I think that the disagreement I have with Scott Aaronson is simply about the probability that AI is alignable with the frankly haphazard level of effort that we have put into it--and that haphazard level is all humanity is capable of, as far as I can tell--because the core lethality here is that you have to get something right on the first try or it kills you. And getting something right on the first try, when you do not get, like, infinite free retries as you usually do in science and engineering, is an insane ask. An insanely lethal ask.
My reaction is fundamentally that 2% is too low. If I take it at face value, then 2% is within range of the probability of humanity wiping itself out by something else, where if you assume that AI alignment is free, that AI alignment is easy--that you can get something that is smarter than you but on your side and helping--2% chance of risking everything does appear to me to be commensurate with the risks from other sources that you could shut down using the superintelligence.
It's not 2%.
Russ Roberts: So, the question, then, is: What would Scott Aaronson say if he heard your--I mean, he's read your piece. Presumably he understands your argument about willfulness. I should just clarify for listeners, alignment is the idea that AI could be constrained to serve our goals rather than its goals. Is that a good summary?
Eliezer Yudkowsky: I wouldn't say constrained. I would say built from scratch to want those things and not want otherwise.
Russ Roberts: Okay. So, that's really hard because we don't understand how it works. That would be, I think, your point, and tell me that--
Eliezer Yudkowsky: It's hard to get on the first try--
Russ Roberts: Yeah, on the first try.
So, what would Scott say when you tell him, 'But, it's going to develop all these side-desires that we can't control'? What's he going to say? Why is he not worried? Why doesn't he quit his job? Not Scott, people in the--let's get away from him personally, but people in general. There's dozens and maybe hundreds, maybe a thousand--I don't know--extraordinarily intelligent people who are trying to build something even more intelligent than they are. Why are they not worried about what you are saying?
Eliezer Yudkowsky: They've all got different reasons. Scott's is that he thinks--that he observes--that intelligence makes humans nicer. And though he wouldn't phrase it exactly this way, this is basically what Scott said on his blog.
To which my response is: Intelligence does have effects on humans, especially humans who start out relatively nice. But when you're building AIs from scratch, you're just, like, in a different domain with different rules, and you're allowed to say that it's hard to build AIs that are nice without that implying anything about making humans smarter. Like, humans start out in a certain frame of reference. And when you apply more intelligence to them, they move within that frame of reference.
And if they started out with a small amount of niceness, the intelligence can make them nicer. They can become more empathetic. If they start out with some empathy, they can develop more empathy as they understand other people better. Which is intelligence--to correctly model other people. But saying that this is not--
Russ Roberts: That is even more insane. I haven't read that blog post and we'll put a link up to it. I hope you'll share it with me. But again--not attributing it to Scott since I haven't seen it, and assuming that you've stated it fairly and correctly--the idea that more intelligent people are nicer is one of the most--it would be very hard to support that with evidence. That is an appalling--
Eliezer Yudkowsky: It is not a universal law on humans.
Russ Roberts: No, it's not.
Eliezer Yudkowsky: I think it's true of Scott. I think if you made Scott Aaronson--
Russ Roberts: Very possible--
Eliezer Yudkowsky: smarter, he'd get nicer, and I think he is inappropriately generalizing from that.
Russ Roberts: There is a scene in Schindler's List, the Nazis, I think they're in the Warsaw Ghetto and they're racing--a group of Nazis are racing. I think they're in the SS [Schutzstaffel]. They're racing through a tenement. And, it's falling apart because the ghetto is falling apart. But, one of the SS agents sees a piano. And he can't help himself. He sits down and he plays Bach or something. I think it was Bach. And I always found it interesting that Spielberg put that in or whoever wrote the script. I think it was pretty clear why they put it in. They wanted to show you that having a very high advanced level of civilization does not stop people from treating other people--other human beings--like animals. Or worse than animals in many cases. And exterminating them without conscience.
So, I don't share that view of anyone's--that intelligence makes you a nicer person. I think that's not the case. But perhaps Scott will come to this program and defend that if he indeed holds it.
Eliezer Yudkowsky: I think you are underweighting the evidence that has convinced Scott of the thing that I think is wrong.
I think if you suddenly started augmenting the intelligence of the SS agents from Nazi Germany, then somewhere between 10% and 90% of them would go over to the cause of good. Because there were factual falsehoods that were pillars of the Nazi philosophy and that people would reliably stop believing as they got smarter. That doesn't mean that they would turn good, but some of them would've. Is it 10%? Is it 90%? I don't know.
Russ Roberts: It's not my experience with the human creature.
Russ Roberts: You've written some very interesting things on rationality. You have a beautiful essay we'll link to on 12 rules for rationality ["Twelve Virtues of Rationality"]. In my experience, it's a very small portion of the population that behaves that way. And, there's a quote from Nassim Taleb we haven't gotten to yet in this conversation, which is, 'Bigger data, bigger mistakes.' I think there's a belief generally that bigger data, fewer mistakes. But Taleb might be right and it's certainly not the case in my experience that bigger brains, higher IQ [intelligence quotient] means better decisions. This is not my experience.
Eliezer Yudkowsky: Then you're not throwing enough intelligence at the problem.
Russ Roberts: Yeah, I know.
Eliezer Yudkowsky: If you literally--not just decisions where you disagree with the goals, but, like, false models of reality--models of reality so blatantly mistaken--that even you, a human, can tell that they're wrong and in which direction, these people are not smart the way that an efficient--a hypothetical, weak, efficient market is smart. You can tell they're making mistakes and you know in which direction. They're not smart the way that Stockfish 15 is smart in chess. You can play against them and win.
The range of human intelligence is not that wide. It caps out at, like, John von Neumann, and that is not wide enough that these beings would be epistemically or instrumentally efficient relative to you. It is possible for you to know that one of their estimates is directionally mistaken and to know the direction. It is possible for you to know an action that serves their goals better than the action that they generated.
Russ Roberts: Isn't it striking how hard it is to convince them of that even though they're thinking people? History is--I just have a different perception, maybe.
To be continued, Eliezer.
My guest today has been Eliezer Yudkowsky. Eliezer, thanks for being part of EconTalk.
Eliezer Yudkowsky: Thanks for having me.