0:37 | Intro. [Recording date: March 3, 2025.] Russ Roberts: Today is March 3rd, 2025, and before introducing today's guest, I want to share the results of our annual poll of your favorite episodes of last year, 2024. I want to thank everyone who voted. Here are the Top Ten: - And tied for 1st place, two episodes:
I want to, again, thank everyone for voting. We'll have links to all of those episodes that made the Top 10, and I would remind listeners that we have a category called Favorites where you can listen to listeners' favorite episodes from past years. And, now for today's guest, author and educational consultant, Daisy Christodoulou. She was last here in February of 2025, talking about Coase, the rules of the game, and the costs of perfection. Her substack is entitled: No More Marking. And, for listeners not from the United Kingdom, 'marking' is what we call 'grading' in the United States--and I don't know what they call it anywhere else. But, our topic for today is feedback in education and the potential of AI [artificial intelligence] to provide feedback. Daisy, welcome back to EconTalk. Daisy Christodoulou: Great to be here again, Russ. |
2:49 | Russ Roberts: We're going to base this conversation on some recent essays of yours at your substack, No More Marking, and we'll link to those for readers to check out. Am I correct--that is what people in Britain call grading, right? Daisy Christodoulou: Yes, yes, you're right. So, marking in a technical sense in the United Kingdom means the application of a number--or a number or grade--to a piece of work. So, yes; although I will say that a number of people in the United Kingdom don't think that's what it means, either. Maybe in that sense we're misnamed, but it is a great name. Russ Roberts: Let's start by talking about feedback. You write that it can be a thermometer or a thermostat. Explain that phrase. Daisy Christodoulou: Absolutely. So, when it comes to education, I think anyone working in education will talk about the importance of feedback. It's a very popular topic, I think, with teachers and with students--the idea that you need to get some idea of how you're doing to improve. But feedback is obviously a very, very important concept beyond education; and its origins are in cybernetics--not a term that's used a lot now--in these sorts of control systems, in information technology: a lot of the origins of those fields of study are in the concept of feedback, of how you change a system based on inputs. And so, a really nice way of thinking about feedback in that more general sense is to think about a thermometer and a thermostat. So, a thermometer is a measurement tool: it will measure the temperature. A thermostat will change the temperature based on the thermometer's reading. And, I apply that analogy to educational feedback by saying that a teacher can give feedback where they just give a measurement--read a measurement of the student's work. And that would be maybe to just give the grade--to say, 'This is the standard it is at.' But, when we talk about feedback, what we are hoping is that you will be able to give the student some kind of information that will move them closer to the goal state. So, the aim of feedback in that technical sense, in any sector, is to do something that will move you closer to your goal state. So, take the thermometer and the thermostat. If the reading of the thermometer is too low--if the thermometer comes back at 18 degrees Celsius and you have a goal state of 20--the thermostat will kick on the heating, and it will stop when we get to 20. And likewise, if your thermometer comes back at 22 and your goal state is 20, the thermostat will switch on the air conditioning and bring that temperature down. The aim with educational feedback is: We have a goal state. I was an English teacher--what would our goal state be? Our goal state is we want our students to read fluently, infer insightfully, write coherently. How do we move them from where they are at the moment to that goal state? We could also think about the goal state in terms of grades. If you want to say, 'Well, they are a grade C and we want to move them to a grade A,' then what we have to do--in some way, shape or form--is provide feedback that will close the gap between the actual state, where the student is at the moment, and the goal state. And, as I say, this is something that is crucial in so many parts of information technology and so many parts of the world, but it's of big interest to teachers, as well. |
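To make the thermometer/thermostat distinction concrete in code, here is a minimal sketch of the control loop Daisy describes: the thermometer only reports; the thermostat acts to close the gap between actual state and goal state. (The deadband parameter is an illustrative addition; real thermostats use one to avoid rapid cycling.)

```python
def thermometer(sensor_reading_c):
    """Thermometer: pure measurement. Reports the actual state, nothing more."""
    return sensor_reading_c

def thermostat(reading_c, goal_c=20.0, deadband_c=0.5):
    """Thermostat: feedback. Compares the actual state to the goal state
    and returns an action that closes the gap."""
    if reading_c < goal_c - deadband_c:
        return "heating on"   # e.g., reading 18 C, goal 20 C
    if reading_c > goal_c + deadband_c:
        return "cooling on"   # e.g., reading 22 C, goal 20 C
    return "idle"             # close enough to the goal: no action needed

print(thermostat(thermometer(18)))  # -> heating on
print(thermostat(thermometer(22)))  # -> cooling on
```

In Daisy's terms, a bare grade is the `thermometer` return value; feedback worth the name has to behave like `thermostat` and output an action.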
6:22 | Russ Roberts: Of course, in education, if I say to you, 'You've earned a C,' some people say, 'Great. Oh, that's fantastic. I didn't think I was going to get such a good grade,' and they're done. But, others will say, 'Oh, I need to improve.' And, as you point out, the letter grade or the number grade--a 73--doesn't tell you how to improve. It just says, 'There's room for improvement.' And, in addition, it doesn't really, by itself, tell you where the holes are. There could be things that you're doing at a good level--at a quality level--and there could be things that you're doing that are inadequate: but overall it's a C. In which case it's not very helpful to tell you where to start, other than to try harder. And, trying harder, by the way, is not usually a very helpful bit of advice. As we go on, I want to make a distinction between two kinds of feedback that we might consider. One is writing. And, a lot of what you wrote about in your essays is about giving feedback on written essays. And, the second I would say is what we would call content knowledge, which could include something simple like, 'Do you get the facts right?' But, it also could include, 'Are you able to apply the knowledge you've gained to other things?'--which would be the highest level. My favorite bad piece of feedback that I received as a teacher is when a student gave me a 1 on a scale of 1 to 5 and said, 'This course is unfair. Professor Roberts expects us to apply the material to things we've never seen before.' And, I used to read that, of course, at the beginning of every class--because that was the goal of the course. But these two types of feedback--suggestions, we might call them, on an essay, or evaluation of an essay; and then, content and application--I think, are different. Do you agree with me? Daisy Christodoulou: Yes, I can see what you're saying. Do you mean--what you're talking about--both types of feedback could be in the form of something that the teacher has written at the bottom of a student's response? Yeah. Yeah. So, there are of course all kinds of different categories that you could say those comments would be in. So, I'm concerned a lot now in my day job with the technical accuracy of students' writing. So, I tend to think a lot about things to do with sentence structure and apostrophes and tense. So, that's one category. Another category is just getting the facts right in terms of what they're writing about. So, an example I always give: there was a play that I used to teach quite a lot, called "An Inspector Calls," when I was an English teacher; and a really common basic factual error would be that students would mix up two characters. There was a character called Gerald and a character called Eric, and they'd mix them up. And it wasn't the worst mistake in the world, and it didn't mean they couldn't be making some other really excellent points. But it was really confusing. And it would confuse them. And it was not great. So, that's a really good example. And then--I think the other bit you're talking about is your ability to perhaps do that higher-order analysis and application to other ideas--perhaps those things which really elevate a piece of writing to the top levels. Are those the kind of things that you're thinking about? 
Russ Roberts: Well, I'm thinking about the history of the comments I've seen typically on my kids' papers and on my own papers, and--sometimes when I was giving feedback--the feedback I would give to a certain kind of answer. And the example I would give would be: I would write in the margin--or I would see in the margin--'Awkward,' or, 'Confusing.' You circle a paragraph and say, 'Confusing.' And, the student--it took me a long time before I realized, and you know this quite well, you write about it--the student reads that and goes, 'Okay, now what?' And, that's different than, 'Gerald and Eric are two different characters,' and you go, 'Oh yeah, I got them confused. Okay, I get it now. I might have to reread the play and learn.' Daisy Christodoulou: Yes. Although--yeah. Go on, carry on. Russ Roberts: No, go ahead. Daisy Christodoulou: So, what I would say is--actually, I get the distinction you're making of those different categories, but I would say that in all of those categories--and I suppose this is my slightly heretical thought for somebody like you who has spent a lot of time giving and receiving written feedback--written feedback is not optimal for any of these categories, and written feedback is not a particularly effective way of doing feedback. |
11:01 | Russ Roberts: Okay. So, let's take essays--both grading and education. You're involved in a number of projects which I find utterly fascinating--we'll get into them in a little more detail; we touched on them in your previous episode. But you're involved in some projects thinking about: How do we scale grading and feedback? But, what I'm having in mind here is that a student writes an essay and it's mediocre. There are many, many, many ways it can be mediocre: It could be written in the wrong tone, it could have actual errors, it could lack sparkle--it could be, we would call it, flat. And as a result, it gets a mediocre grade, a C+, a B-, whatever it is, depending on the system. And, the student gets that grade and thinks, 'How do I do this better?' So, the next level up is to say, 'Well, I've told you where it's better. When I wrote Awkward, that meant you should try to rewrite that.' And so, I'm curious--I'm going to disagree with you in a minute about written feedback; I think there is some written feedback that could help. But, if I don't give that student written feedback, what do I do for them? Daisy Christodoulou: All right, great question. So, let's just think about one quick thing before we get into the meat of this--the big question I do want to get into--which is age groups. So, I taught students aged 11 to 18. That's where a lot of my expertise is. I now do a lot of work with 11 to 18, but also younger, so 5 to 11. But, I've also been a peer-reviewed author on an academic paper where I got written feedback from the peer reviewers. And, what I'm saying, I think, holds true across the age ranges. So, there's some of what I'm saying that holds true from age five up to however old you are when you're writing your peer-reviewed papers. Some of what I'm saying is truer for certain age groups than others. But, there is a central point here that I think is true across all of the age groups. So, however you are involved with giving and receiving written feedback, I think there are points I make here which hold true across those ranges. And the central point I will make, which we touched on last time I spoke to you, is that prose is not optimized for action. Prose is not optimized to move you on to that next level. And, the central philosophical, theoretical piece behind this is, again, someone I talked about in the last episode I did with you and who you have talked about since and talk about a lot, which is Michael Polanyi. So, Michael Polanyi's entire theory of tacit knowledge is that there are things that we can do but we cannot tell. There is a whole bunch of stuff out there that we have this intuitive, real understanding and sense of, but it's very hard to put into words. And I can give you some examples. Polanyi's classic example is riding a bike. You can also think about learning to float if you're swimming. If you read an entire book on how to ride a bike and then you put the book down and you go and try to ride the bike, is the time spent reading the book better or worse than some time spent actually on the bike with your mum or dad, trying to get your balance? Obviously the latter is more important. And so, the central point I'm making that I think holds true across the range is that prose is not optimized for improvement. And let's carry on. I can keep going. I'll keep going. I've got-- Russ Roberts: Go on a little bit more-- Daisy Christodoulou: Let me give you an example. 
And so, as I say--I would say Polanyi is the kind of guiding philosophical point behind this. But, let me give you some really practical examples. So, there is a modern education writer called Dylan Wiliam, a big name in the field of education and assessment and feedback. And, he has given the most brilliant example of how this plays out in a middle school classroom. He says he was visiting a middle school classroom and the students had all been given quite detailed written feedback on a science investigation that they'd just completed. And one of the students had been given the feedback, 'You need to make your scientific inquiries more systematic.' And, Dylan Wiliam says to the student, 'What do you understand by that? What are you going to do differently next time?' And, the student says, 'I don't know. If I'd known how to be more systematic, I would have been so the first time.' Russ Roberts: Yeah. Daisy Christodoulou: Now, it's obviously a flippant response, and you can imagine the kind of student who gives that kind of response. I've taught students like that. But, when I read that-- Russ Roberts: It's true-- Daisy Christodoulou: paragraph, when I read that paragraph in Dylan Wiliam, I cringed. Because I had spent large chunks of my life as a teacher writing feedback like that--large chunks, particularly, of my Sunday evenings. And not only had I done that: I had done it because it was recommended as best practice. So, the best practice when I was training as a teacher--and I've since realized that this is often seen as best practice in many other schools and countries that I now work in, including the United States--is seen to be to take the language of the mark scheme and give that language back to the student. That, supposedly, is the most transparent way of getting them to understand how to do better. And the one I was probably guilty of, because I would copy it from the mark scheme, was saying to students--well, writing down--'You need to infer more insightfully.' Now, this is what Dylan Wiliam would call True But Useless. It is TBU. It's like telling an unsuccessful comedian to tell funnier jokes. Russ Roberts: Yeah. Daisy Christodoulou: Yes, it's true. But what do you do? What is the action that you take as a result of that? So, if we go back to the thermometer/thermostat, this is not closing the gap. So: This leads on to my next point, which is about the thing you need to do to close the gap and the action you need to take--I've just explained via Polanyi why prose is not optimized to do that. So, a lot of people will say to me, 'Oh, well, you know, that teacher's feedback, it wasn't precise or specific enough. They needed to really give the specific action step.' And, my point is: prose is not good for this. You need something else. So, what is that something else? Dylan Wiliam talks about it as being a recipe for action. And, he also, I think quite rightly, talks about sport: that actually, often in sport, we have a much better idea of the action steps you need to take to get from your current state to your goal state. And the crucial insight here, which I think is not fully appreciated in education, is that the actions you need to take to get from your actual state to your goal state can often look very different to either the actual state or the goal state. Let me give a concrete example. You've written your piece of writing: you've written your essay. And let's say that you need to infer more insightfully--that is true. 
This student needs to infer more insightfully. What is inference? What do you need to do to be better at inferring? Well, actually, that's a huge thing. And what we know is that to be better at inferring, you need to have a wider vocabulary and more background knowledge. That's actually what allows you to make inferences. So, what I would say, in the case of that student who needs to infer more insightfully, is: you could do a sequence of lessons with them teaching them new vocabulary, and perhaps some new prefixes and suffixes which will help them expand their vocabulary even further. That would be an activity which will help them move from the actual state to the goal state. But it doesn't look like writing an essay. It doesn't look like the piece of work that you submitted or the piece of work that you're going to be doing. In fact, in those lessons where you're learning about new words, you may never hold a pen or write. But my argument is that that will help you get to your goal state. And if we think about this--Dylan Wiliam talked about this in the context of sports, and I will, too. A big metaphor that I used in my second book, Making Good Progress--one which I think people really understand; they get this more than they do when I talk about it in the context of academics--is the metaphor of running a marathon. So let's imagine you want to learn to run a marathon and you've never run a marathon before; and you have three to six months to train. Do you go out and run a marathon in every single training session? No, you do not. Do you think that the only way you can measure your improvement from your actual state to your goal state is to run a marathon and then run another one and then see if you've got any faster? No, you do not. What you do is you set up a training plan. You will increase your mileage gradually, and you will build into that training plan activities that do not involve running--that do not look like running. So, one of the reasons people often struggle with marathon running is they need to build up their muscles to allow them to get to those high mileages. So, you often have to do--and I speak from experience: I have run one marathon--strength work in the gym and things like yoga, and have massages, and do some strength and conditioning. You have to do all these things which don't look like running to make yourself a better runner. And the same is true in academics: there are a bunch of things you can do that will make you a better writer that do not look like writing. And, I would argue that is true not just for total novices, but also for students who are at a more advanced level. And, the marathon analogy holds true there as well: spending time in the gym is important for novice runners, and if you look at elite marathon runners, they will do that, too. So, I would say that one of the big issues we have in education with written feedback--not the only issue, but one of the big issues--is that written feedback kind of encourages you to just focus on redoing the work in some way, when what you actually need to do is to step back and think--the term I use in my book is 'model of progression.' You need to step back and think, 'There is a model of progression here. There are some steps we need to take. What are the steps we need to take? Which steps are missing?' And, some of these steps will not look like doing another piece of writing. 
They would involve something very different. |
20:43 | Russ Roberts: Well, I think that's extremely interesting and I agree with a lot of it, but not all of it. So, let me give you where I disagree a little bit. One of the reasons sports analogies are so attractive is that we know a lot about sports and we think about them and we can often measure improvement--which is much harder in the kind of academic environment we're thinking about. So, an analogy that I would agree with you a hundred percent on--I've used it before; I love it because it was so insightful for me--is in baseball: If you're having trouble hitting the ball well, a coach will often say, 'Keep your eye on the ball.' It's really good advice. It's also really hard to do. And, if you don't explain why it's hard to do--and I didn't realize this until I coached baseball--the reason it's hard to do is because when you're hitting a baseball, your body is turning counterclockwise, but as the ball comes in, your head has to turn clockwise because it's seeing the ball coming from the pitcher's hand. And, this could be cricket for those of you listening at home outside the United States or places where baseball is played. But, when the ball is coming towards you, as it gets closer and closer, if you're going to watch it close to where it's going to hit the bat, you have to turn your head in the opposite direction you're turning your body. This is not natural. And, basically you have to train to do that. You have to practice that motion in some dimension--not literally, probably, but being aware of that is very useful. So, that's good feedback. It's not enough, but it's good feedback compared to 'Keep your eye on the ball,' which is useless feedback. So, that's true. Daisy Christodoulou: That's a fantastic example. That's a fantastic example. Russ Roberts: Thank you. I've got one. Daisy Christodoulou: Yeah. There are similar things in football--in soccer. Russ Roberts: Sure. Daisy Christodoulou: So, I am terrible at understanding baseball; it constantly pops up in a lot of academic literature and I just have people explain it to me. I've seen a couple of games. I've watched too much cricket; I'll never get baseball. I didn't play cricket--I watched a lot of cricket--but I did play football, soccer. And, one of the bits of feedback you get there is people say, 'Look, you've got to keep your head up. Keep your head up and look around the field.' And that is good advice--but what happens with young players when they get the ball is their first touch is poor, so whenever they get the ball, they're looking down at their feet. Okay? They look down at their feet because they don't have a great first touch and they're still learning the mechanics of touching the ball. So, actually, if you say to someone learning to play football [soccer]--if you say to them too early--'Keep your head up,' well, they get the ball and they trip over the ball, because they haven't got their first touch sorted. Russ Roberts: Or it's very far away and it doesn't matter that they've got their head up because their first touch was so bad. The analogy-- Daisy Christodoulou: Right. So, what you-- Russ Roberts: No, go ahead. Daisy Christodoulou: What you need to do in that situation is not say, 'Keep your head up.' You need to say, like, 'We're going to do some passing drills.' Russ Roberts: Yeah, exactly. Daisy Christodoulou: And, again, to go back to my example about this being not just true for novices: What do all the elite clubs do? 
And all the--the really good ones who are pioneering the tiki-taka football and the new tactics? They do lots of passing drills--lots of small-sided passing drills, where you're just trying to get that incredibly beautiful, slick first touch that you don't even have to think about. And, if you've got that and you can depend on that, sure, then you can get your head up and you can start looking around and scanning the pitch. |
23:59 | Russ Roberts: My other example, which I've joked about in print, is the advice, 'Don't bunch up,' to young players. And, young players all surge around the ball, and the parents are screaming on the sidelines, 'Don't bunch up.' And, every person there is saying, 'Okay? Let's stop for a minute. You run over--.' It's useless. It makes the parents feel better. But, here's where I don't think it carries over into academic life. So, let me give you that. When I think about a badly written essay where inference is inadequate, where background knowledge is insufficient, where vocabulary is limited, often I think sufficient advice in many situations is: 'Read more and write more.' Practice. Practice doing this funny thing called thinking and this funny thing called writing. And, I think--I know I became a much better writer when I wrote more regularly. I don't think I ever, ever wrote thoughtfully in the sense that I stepped back and said, 'Now what can I do to make my writing better?' I just wrote more. And I read more. Which also helps: they're obviously related. It's just not obvious to me that--maybe it varies by age. But, it's really hard--what you're really getting at when you say, 'Let's do something more than tell people: Infer more thoughtfully,' is, you got to teach them how to think. And, teaching people how to think, we're not that good at. And, the process by which we do it here at Shalem College, which I adore and love and have become a total drinker of the Kool-Aid, is: Read difficult texts in the presence of thoughtful people and discuss deep questions. And, in that process, your brain will change in ways we don't really--I call it a black box. I don't really--we don't understand that process very well, but I have no doubt that if it's taught well, meaning not explained, but then exploration takes place, that your brain will get better. Daisy Christodoulou: So, again, some of that I'd agree with, some I disagree with. I think some of it probably does--some of the differences I'm going to make here probably do apply to age. So, maybe what you're saying is true for the age group and the cohort you're teaching, whereas I'm thinking perhaps about younger and less able students. Look, I love reading and writing and I would agree with a lot of what you said there in my own personal life in terms of you read more and you write more and you improve. And, I would also say in some ways that chimes a bit with the point I'm making, in that you can read things that don't immediately seem directly linked to the thing it is that you are interested in, but the more you read and the more you know, the more you will be able to make links. One of the things I'm very keen on and which I've written about in my first book is the value of knowledge and the value of committing knowledge to memory. Because, people often talk about creativity as being about making connections in unusual ways, but to make those connections, you have to know some stuff to begin with to connect between them. So, my argument is that a large part of education really does involve remembering things. And, people don't often like the word 'memorization,' but I think it is a crucial part of education. And I don't think--I really dislike it when people talk about skills like creativity and communication in the abstract that actually they are tied to specific bodies of domain knowledge. 
And, I do think that the more knowledge you have and the more insights you have into different fields, the more connections--and unusual connections--you can make across those fields. So, to that extent, I'd agree with you about reading more and writing more being important. What I really don't like, perhaps at primary and early secondary and even upper secondary, is when the solution to everything is read more or write more. Russ Roberts: Fair enough. Daisy Christodoulou: And, one of the things that teachers will say a lot is, 'A lot of students, they just need to build their writing stamina.' There is a problem, akin to trying to move on to running the marathon too soon, if you try to get students to write too much too soon, when they haven't built the building blocks. Because they will build bad habits, and they will embed those bad habits through repetition and through practice. This is a real, real issue: if you have a student who has poured their heart and soul into a piece of writing, and they really want you to read it and give feedback and engage with it, and they have so many technical errors that it is hard for you to read and engage with it, that is a real problem. And so, I do think there is an issue--a big issue--with running before you can walk, and that sometimes there is a pressure on teachers--and I've seen this in the United States and the United Kingdom--to think that the way you show advancement is to get students to write more, to get students to do more. The way this often transmits itself is 'learning by doing.' And I want to nuance something I said before, because I talked about Polanyi and I said, 'Look, you can't just read a book about riding a bike: you've got to ride the bike.' People often call that learning by doing. I want to really heavily nuance learning by doing. Okay? And this goes back to my marathon point--that's the best analogy to have here. What I've said with a marathon is you do not want to go out and run a marathon in your first training session. You need to do some other activities which are going to help you build towards that. But, obviously, if one of those activities is just reading a book about the marathon, that might be some help, but it's not enough. So, I do agree you have to learn by doing; what I don't agree is that the doing has to look like the end goal. Sometimes the doing has to look different. Russ Roberts: Agreed. Daisy Christodoulou: And for me, the goal of the teacher and the curriculum designer--and the assessor who is building the assessments into this to give us the feedback--is to design the right model of progression. Now, there can be wrong models of progression, because obviously what I'm saying here is you need to do stuff that doesn't look like the end goal. But there are a lot of things that don't look like the end goal, and some will be rubbish and won't help you to the end goal. So, let's take the marathon example. If all you do before your marathon is read books about the marathon, that's not the right model of progression. That's not going to get you to run the 26 miles. And, I agree, that can be an issue, because obviously sometimes going out for a run--it's cold and raining: is there something I can do that doesn't look like going out for a run? I'll stay in and I'll read the latest guide to the gear or whatever. So, that is an issue. 
What I'm saying is you've got to construct a model of progression; and we have to test and get data and feedback on whether that model of progression is the optimal way. And, this is where sports are so interesting, because so many people are doing this for sports; it is an incredibly intensive area, and people will get very irate about whether Yasso 800s or the Lydiard Method is the best way of getting you to your end goal of running a marathon in whatever time you want. There are big arguments about the best types of progression model. But my point is that, for all of us, we have to be building that progression model; and there will be some things in that progression model that do not look like that end goal. And, if the only advice and feedback you are giving is just to read more and write more, I just don't think that's enough. Russ Roberts: Oh, fair enough. Fair enough. Daisy Christodoulou: And, perhaps when you get closer--look, you are teaching at university; I'm guessing you've probably got a pretty selective cohort. You've got students who can do the mechanics of reading and writing and have large vocabularies and already have some idea of different topics. And so, yes, for them, perhaps, that's the right thing. Although, I do talk to university professors, and a lot of them say to me that even their undergrad and postgrad students might benefit from some kind of a refresher or some kind of understanding of sentence structure. And, I think that sentence structure is something that perhaps isn't taught enough; people think you can just pick it up from the environment. And another point I would make here is the challenge of just picking things up. The 'just pick it up' fallacy, I call it, is that if you're just relying on absorbing things from the environment, that is often not enough. So, go back to my Polanyi bike-riding example. Yes, I think you need to learn by doing, but I do think when you look at the way students do learn to ride a bike, it often isn't just get on and have a go. There will be a parent helping them along for a bit. So, what I'm saying is you learn by doing, but you learn by a kind of heavily structured--to begin with--a very heavily structured method that is avoiding you going off into rabbit holes and dead ends, that is keeping you on that straight and narrow. And then, as you are getting better and better and better, obviously some of those guardrails--or stabilizers, literally--can be taken away, and you can have a little bit more freedom, and the feedback that you get can perhaps be a little bit more on the order of the read-more/write-more. But, in those early stages of learning anything, you need structure, you need guidance. You do need to be doing, but you need to be doing within the guardrails. So, that would be my point. And, to go back to all of this: prose is not optimized for any of this. Russ Roberts: Well, I agree with all that, and I especially take the point that for more advanced students, older students, read-more/write-more is actually--I would say it is a life strategy as much as anything else, and a lifelong learning strategy. It's not necessarily that helpful for, quote, "mastering" a particular discipline or subject at all. It's a much more general set of skills you're acquiring. |
33:34 | Russ Roberts: I'm going to take us in a different direction. I want to pick a different example of pedagogy and see what you think of it. I learned this from Orson Scott Card, the science fiction writer, who told me that when he taught creative writing, he would ask people to write things and he would grade them on it, but he would also grade them--and maybe overwhelmingly base the grade--on the grades and feedback they gave their classmates. So, you would be assigned a classmate and you would have to read their essay and give them feedback; and you would write your own essay and take feedback about it from a classmate. And, the reason I love that insight is that you could argue--not everybody would agree, but I would--that becoming a great writer requires being a great editor. Most of us don't write a great first draft. In fact, many people would say the goal of the first draft is just to get it down on paper. And, the way you become a good writer--meaning the way you produce a great essay or a great book or even a great blog post or even a great tweet--is to improve the first draft. And so, what this insight of Orson Scott Card's says is that that's the bicycle you need to learn how to ride: how to give yourself feedback, how to evaluate and judge your own writing. And, you do that by practicing on the writing of others. And, eventually, you can find mistakes and ways to improve your own essays. And, sometimes it's easier to see that in other people's work. I think that's a profound insight--at least about writing. I'm not going to say it's about anything else, but about this strange craft of becoming a better writer. Because--and I'll add one more thing--you can write down an algorithm for how to ride a bicycle, and I agree that reading the algorithm is not that helpful: it's through the experience of riding the bicycle that you acquire the skills of the algorithm. There's no algorithm for how to write a great essay. And so, you're stuck. Okay? You can disagree with me, but I would say you're stuck. There are certain rules of thumb: vary the length of sentences, don't use clichés. There are certain rules of thumb, but it's hard for me to imagine an algorithm. Maybe you'll prove me wrong. Go ahead. Daisy Christodoulou: So, I think there are more algorithms than you think. Russ Roberts: Okay. Daisy Christodoulou: And, again, we have to think about what age range and what level of expertise we are dealing with. Now, what you've explained there is something that, for a very expert small group of learners with a very expert teacher, I can see working well as a learning activity. Even then, I would say what that activity depends upon--and again, this is my marathon point, and again this is like the iceberg point--is that the small one-eighth of the thing you see above sea level is dependent on so much below sea level that is unseen. And, whilst I can accept that that activity would work in that limited context, I suppose why I get slightly antsy about it is that what I see in my day job and in my life is people saying, 'We are now going to take that and we are going to roll this out across the board, across ages 5 to 18, any kind of school, anywhere, and we're going to make it a model for how we give high-stakes grades, too.' And, that happens. Russ Roberts: Terrible idea. Terrible idea. I agree. Daisy Christodoulou: Yeah. Right. 
And, that is really, really problematic, because what a lot of those students actually need is what I said before. So, one of the biggest things which just gets taken for granted is vocabulary size. Vocabulary size correlates with, and is probably causative for, so many other things. And there are so many students for whom one of the things that is holding them back as writers is simply vocabulary. And, actually, I would say you could argue that's pretty true of even some older students and postgraduates. And so, maybe some of the students in this case, too. So, you are saying to me: There's no algorithm for becoming a good writer. What I'm saying is: Actually, I think probably 90, 95% of what it takes to be a good writer--there are some algorithms that are teachable; and then there's probably some magic, inspirational, last-mile 5% which separates the solid and the good from the absolutely superb. But, what I'm saying to you is: Yes, okay, the person you mentioned may be operating up there at that top 5%, or even, like, 1%--the magic, the inspiration. What I'm talking about is: I think there's an algorithm that gets you 95, 99% of the way. And actually, for the types of writing most people need to be doing for most of their life, I think it is possible to teach quite a lot of that. And, all the stuff you were saying about varying sentence length--while sort of eye-rolling a bit--I would say those things are quite important, and they do reward practice. And, I personally say, as someone who writes a lot and has written a few books, that I still pay an enormous amount of attention to that. Even as someone who is a published author, I go back and I will look at every single sentence wherever I can and try to identify the subject, identify the verb. Steven Pinker writes very well about a lot of this: if you go back over every sentence and identify the subject and identify the verb, that makes it really clear what's going on in that sentence. And, that is a really useful discipline, I would argue, for seven-year-olds and for professional writers. Russ Roberts: Well said. Daisy Christodoulou: And, I'd be interested to see if you agree with that. Russ Roberts: Oh, I agree with that a hundred percent, in fact. Daisy Christodoulou: Right. So, there are some algorithms. And I would argue that a lot of published writers--we're in a world where anyone can open up a blog or a substack or a Twitter account--a lot of published writers would benefit from that. So, yes: there's an algorithm that would help a lot of people. Russ Roberts: Well, actually--yeah. There are some tricks. That would be one of them. You could also--I have a trick as a writer where, when I'm not happy with a passage, I take each paragraph and I lay the sentences out, a single sentence at a time. And then I realize, 'Oh, this sentence doesn't belong here, it belongs over here,' or, 'I've said the same sentence two different times, two different ways.' It's basically post-writing outlining. I don't like the outline. I'm not sure it's a good algorithm--it's not a good algorithm for me; it could be for other people. And, I do want to--kidding aside, and not mentioning my own marathon in 1977, which we're getting close to the 50th anniversary of--I'm very excited about that--and, like you, I've only run one, at the blistering-- Daisy Christodoulou: Yeah, one and done. That's me. One and done-- Russ Roberts: blistering pace of 4 hours and 20 minutes. Daisy Christodoulou: Aha. A 3:55--I've got you there, Russ. 
Russ Roberts: Well done. Bravo! So, putting that to the side, I think there are tricks, and it's certainly the case that you can reverse-outline like I did. You can outline beforehand--it's a good--can be useful for some people. You can look at how many times you use the word 'it,' which can often be a form of sloppiness and leads to confusion. I would say there are rules of thumb and tricks that help. But what I think is harder is to teach people an algorithm for attacking an essay from scratch. That's what I meant to say. I think there are many algorithms for improving an essay and certainly for becoming a good writer. Ernest Hemingway, when asked by a young aspiring writer, 'How do you become a better writer?' said, 'Read other writers so you know what the competition is that you have to beat.' I don't think that's particularly good advice, at least for that reason. |
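A reader could try Russ's reverse-outlining trick mechanically. Here is a minimal sketch (the naive regex sentence-splitter is an assumption; a real tool would use an NLP library): it lays each sentence of a paragraph out on its own line, and also counts uses of 'it,' per the tip above.

```python
import re
from collections import Counter

def reverse_outline(paragraph):
    """Post-writing outlining: print each sentence on its own line so that
    misplaced or duplicated sentences are easier to spot."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
    for i, sentence in enumerate(sentences, start=1):
        print(f"{i:>2}. {sentence}")
    # Flag potential sloppiness: how often does the vague pronoun 'it' appear?
    words = re.findall(r"[a-z']+", paragraph.lower())
    print("uses of 'it':", Counter(words)["it"])

reverse_outline("It was late. The essay needed work. It lacked a thesis.")
```

With each sentence on its own line, the 'this belongs over there' and 'I've said this twice' problems Russ describes become visible at a glance.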
41:15 | Russ Roberts: Let's shift gears--and again, conceding that much of what you said is true, I want to talk about the use of large language models [LLMs], AI [artificial intelligence], which you write about, because I think it'll highlight some of the challenges that we're talking about and where technology can help things and not. In an essay--it was January, 2024--you start off by saying that by January, 2025, we'll be able to do this, that, and the other with AI. And you realized when January, 2025 came along that that wasn't true. Talk about what was disappointing. Where has AI struggled? And, in particular, I'm interested in the role of hallucinations and even of order--which you'll explain; it's kind of shocking. Daisy Christodoulou: Yeah, absolutely. So, we have been working quite intensively to try to get large language models to improve feedback and improve grading. So, all the things that I've been talking about before about written feedback: there are now a lot of organizations in this space trying to use large language models to provide students with better written feedback. We are starting from the position, obviously, that written feedback is not necessarily the optimal way of giving feedback. And so, we are not just looking at ways LLMs can give the traditional written feedback, but ways they can actually help move students on. So, we are trying to do something even more than just provide the nice paragraph at the end of the piece of writing. So, that's the world we're in, and we've been working at it very intensively. And, you refer to a blog post we wrote just a couple of weeks ago where we talked about the journey we've had over the past few years and the ups and downs. And, one of the major, major downs has been the propensity of large language models to hallucinate. And, this is a huge problem if you want to automate something. So, by hallucination, I just mean the types of errors--the types of mistakes--the ways that large language models are capable of just making stuff up. And, actually, I listened to an episode you did a couple of weeks ago--was it with Reid Hoffman?--and you were talking about how much you love using some of these large language models. But at one point you asked it something--you typed in a recent medical test you'd had and the range of some numbers and you said, 'Is this range okay?' And, it came back and said, 'Yeah, brilliant. It's great. You're in the range.' And, actually, you went and looked it up: you were not in the range. Russ Roberts: No. Daisy Christodoulou: Now, that is the kind of thing I'm talking about. And honestly, we've been tearing our hair out over this for two years. All of us at the organization--the company I work for, No More Marking--this is what we dream and have nightmares about. Because the problem with these models--the thing that's so seductive and kind of reels you in before they let you down--is that when they are good at something, they do it really well, and it's brilliant. And you think, 'Oh my goodness, this is amazing. It's going to change the world.' But then, for it to really change the world, you need to be able to crank the handle and have it do that consistently. And, what they do instead is they spit out these random, quite dangerous errors that they're incredibly confident about--of which your example is a very good one. And you are left there going, 'Well, why did it do that?' And the answer is: Nobody knows. 
Because these models really are a black box. So, the problem is that a lot of software systems depend on automation. And a lot of software depends on being deterministic: you press button X, and Y happens. We are now in a world where you press button X and maybe 95% of the time something on the order of Y happens, and 5% of the time something like Zed [Z] is happening. That is not what you wanted; and you don't know how to fix it. It's not a matter of just going in and debugging it. So, this is a really big deal. And, it's not a big deal--I get why people, just on their own terms, want to go in and use LLMs. I get that. I do that, too. But, it is a big deal if you want to start plugging stuff in and letting it work on its own, which is what people are used to being able to do if you're a software company--which, effectively, we are. So, the big question then is: Can you fix hallucinations? And, obviously people will argue and argue and argue about it, and this turns into a huge philosophical argument about: how do we know if technology is going to improve? Like: Yes, on the one hand, Moore's Law has been holding true for a very long time. On the other hand, people have been predicting energy that's too cheap to meter since the 1950s, and that hasn't happened. So, which predictions come true? Which don't? And you can argue that out. A little thought experiment I like to play with people is: Would you recommend that a 17-year-old today should learn to drive? Russ Roberts: That's a really good question. Daisy Christodoulou: Yes or no? Russ Roberts: Good question. Daisy Christodoulou: Yeah. And--it is. What's really interesting is a lot-- Russ Roberts: And, the answer is-- Daisy Christodoulou: [inaudible 00:46:01] previous-- Russ Roberts: The answer is Yes. Daisy Christodoulou: Well, I live in London, so the answer in London is possibly not--and that's not connected to self-driving cars; it's because you have good public transport here, whatever. But, I think in most parts of the developed world, the answer is yes. And, what's really interesting is when I ask people this--even people who have been really gung-ho about LLMs and say, 'Oh, the hallucinations, they'll be fixed before long. Don't worry your little brain about those.' When I say to them, 'Would you ask a 17-year-old to learn to drive?' they'll say, 'Yeah, yeah, for sure.' Okay? So, that's just a way of putting a bit of a number on it, of how certain [inaudible 00:46:35]-- Russ Roberts: Daisy, you might not teach them to drive a stick shift, which used to be-- Daisy Christodoulou: Hey, oh my God, everyone in the United Kingdom drives a stick shift. Russ Roberts: Well, in America, you wouldn't. In America, you wouldn't. In America, you wouldn't. Daisy Christodoulou: Russ, I live in London and don't own a car, and I drive a stick shift. There's no such thing as an automatic in the United Kingdom. Russ Roberts: Depends on the place-- Daisy Christodoulou: No, you're right. And, obviously, all the electric cars now don't have gears. We tend to talk about the gears and the automatic; and actually, my colleague bought a new electric car the other day and it didn't have gears. I was like, 'Oh my God.' |
47:06 | Russ Roberts: But, to give people a better idea of this hallucination problem: I think what most people think hallucinations are about is making up authors and getting facts wrong. Talk about--let's do a little review here, because I love this insight of yours about comparative judgment. Talk about why comparative judgment is so powerful with humans and how it has failed with LLMs, at least so far. Because it's shocking. Daisy Christodoulou: Right. So, I think a lot of the people who are trying to get LLMs to support with marking and assessment, what they will do is they will send the essays off to the LLM and ask it to give them a mark out of whatever, or give them a grade. And, we've done a lot of that. And, the thing is, humans are not very good at that. And, actually, we've found the LLMs are not particularly brilliant at it either. But, obviously, our big insight at our organization is comparative judgment--humans are much better at comparative judgment than at absolute judgment. So, comparative judgment is: instead of looking at one essay or one piece of student writing and trying to place it on a mark scheme, you look at two and you say which is better; you collect hundreds of thousands of judgments like that from lots of different teachers; and you can use an algorithm to combine them all and create the measurement scale. So, that's how comparative judgment works. And, what is really interesting: human comparative judgment is the gold standard of human judgment. And, most of these large language models are trained on human comparative judgment. So, when you read any of the LLM research papers about how they create them, they will talk about using a corpus of human comparative judgments as the training data for the LLM. And, you yourself--if you use LLMs a lot, I'm sure you've come across this--do you ever see--sometimes you ask it a question, it'll give you two responses and it will say, 'Which do you prefer?' Russ Roberts: No, I haven't had that. Daisy Christodoulou: You've come across that? Russ Roberts: No. Daisy Christodoulou: You haven't had that? Russ Roberts: No. Daisy Christodoulou: Okay? Maybe it's just me. This happens to me all the time. Maybe I'm just more aware of it. What's happening there is the LLMs are gathering comparative judgment data from the users in real time. So, they will have two slightly different models; each model will spit out an answer. The LLM will ask you, 'Which do you prefer?' You will pick the one you prefer, and they will use that data, and all the other data they're getting, to decide which model is better. Okay? So, human comparative judgment is being used all the time in the creation and refinement of large language models. So, what we thought is, 'Well, why don't we get the LLMs to do some comparative judgment?' Humans are better at comparative judgment than absolute judgment; maybe LLMs will be better at comparative judgment than absolute judgment. And so, yes, they were better, but they also displayed some very interesting biases and errors which are not present in humans. And, one of the most interesting was position bias. So, position bias: if you think about comparative judgment, you're being presented with two responses. You've got one on the left and one on the right. And, as a human, you have to look at both of them and say which you think is better. Now: we, I think, have processed about 40 million human judgments using our software, and we have found that humans do not display significant position bias. 
I can't remember the exact percentage, but I think something like 50.5% of human judgments are for the piece on the left and 49.5% are for the piece on the right. Because the sample is such a big number, that difference is statistically significant; but it is tiny. It's very small. So, essentially, our humans are not displaying any real position bias. They're not really influenced by whether something is shown on the left or shown on the right. That is not true of large language models. So, large language models--in the research we've done and the research lots of other people have done--do show position bias, by which I mean they will change their mind depending on whether a piece is presented on the left or the right. So, you have Piece A and Piece B. If the LLM sees them in the order of A on the left, B on the right, it might pick A. And, if those are reversed, it might pick B. And, it does tend to be--I think what we've seen is that the position bias is to the left: toward the piece shown on the left. It's not just us who found that; others have found it. The extent to which it is a problem varies, and there are some models where it's not as bad as others. In the ones we've been looking at, it has varied from 10 to 25%. That is quite a big deal. So, in terms of getting artificial intelligence--large language models--to support with human judgments, that's an issue. But, what I'm going to say is there are ways we can mitigate these issues. And, what I want to say about where we are at the minute with hallucinations and with these kinds of errors is that there are a lot of people who are just like, 'Look, hallucinations will be fixed, they'll be solved, don't worry about them, just crack on as if they didn't exist.' We actually don't think that; I think they're a bit more of a feature than a bug. But, I think there are other ways that LLMs can and are improving, particularly to do with cost, that allow you to come up with some workarounds for some of these problems. And, if you can--the big buzz phrase at the minute is 'the human in the loop'--if you can use a human in the right way in your process to correct and address some of these errors, LLMs do have a role to play. So, the place we are at the minute is: we're not ignoring hallucinations. We don't think they're going to be solved anytime soon. We're trying to find ways that we can plug them into a system and mitigate these problems. |
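Neither the episode nor the blog post specifies No More Marking's exact model, but comparative-judgment systems are typically built on a Bradley-Terry (or Rasch-style) pairwise model. And the significance point is simple arithmetic: with roughly 40 million judgments, the standard error of the left/right proportion is about 0.008 percentage points, so a 50.5%/49.5% split sits dozens of standard errors from 50/50 while remaining practically tiny. A hypothetical sketch of the pipeline: simulate noisy pairwise judgments with an optional left-position bias, fit quality scores by gradient ascent on the Bradley-Terry log-likelihood, and measure the left-win rate.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_bradley_terry(judgments, n_items, epochs=200, lr=0.05):
    """Fit one quality score per item from (winner, loser) pairs by
    gradient ascent on the Bradley-Terry log-likelihood."""
    scores = [0.0] * n_items
    for _ in range(epochs):
        for winner, loser in judgments:
            p = sigmoid(scores[winner] - scores[loser])  # P(winner beats loser)
            step = lr * (1.0 - p)
            scores[winner] += step
            scores[loser] -= step
    return scores

random.seed(0)
n_items = 20
true_quality = [random.gauss(0.0, 1.0) for _ in range(n_items)]

def judge(left, right, left_bias=0.0):
    """One noisy comparative judgment. left_bias > 0 models a judge
    (an LLM, per Daisy's finding) that favours the piece shown on the left."""
    p_left = sigmoid(true_quality[left] - true_quality[right] + left_bias)
    return random.random() < p_left  # True -> the left piece wins

judgments, left_wins, n_judgments = [], 0, 5000
for _ in range(n_judgments):
    a, b = random.sample(range(n_items), 2)
    if judge(a, b):                 # a was shown on the left
        judgments.append((a, b))
        left_wins += 1
    else:
        judgments.append((b, a))

print(f"left-win rate: {left_wins / n_judgments:.3f}")  # ~0.50 for unbiased judges
scores = fit_bradley_terry(judgments, n_items)
best = max(range(n_items), key=lambda i: scores[i])
print("top item by fitted score matches truth:",
      best == max(range(n_items), key=lambda i: true_quality[i]))
```

Setting `left_bias` to a positive value pushes the left-win rate above 0.5, which is the kind of position bias Daisy reports in LLM judges but not in the human data.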
52:26 | Russ Roberts: So, let's talk about the human in the loop. And, in particular, I want to bring it back to our earlier conversation about written feedback versus actionable follow-up steps. So, talk about where you are: how you are using humans with AI to improve and scale feedback, as part of the biggest problem. Because grading and feedback are both incredibly time-intensive. And, the reason that AI is so deeply appealing--and the reason people want to believe that the hallucinations will get fixed--is that it takes a tedious and frustrating and incredibly time-intensive process called grading, or marking, and then feedback, and it solves it really cheaply. But if it does it poorly, that's not a solution. So, talk about how you're integrating humans and AI. Daisy Christodoulou: Absolutely. So, in terms of that specific grading issue, what we can do--and this is where the recent cost reductions matter. As I say, I'm not so optimistic that these models are going to completely eliminate hallucinations anytime soon. I am more optimistic that the costs will keep coming down. And, obviously, DeepSeek--this sort of new, cheaper Chinese model--has already sparked a bit of a price war. And, that makes a difference, because it means you can start to do some interesting things. So, one thing that was probably a little bit prohibitive before these cost reductions--what we can do now is we can send every piece off to be comparatively judged twice, each way around. And then we can identify the ones where it's disagreeing with itself. And then, once you've identified those--what we've also done is some analysis, and this was really important: What kinds of pieces is it disagreeing on? Is it disagreeing on pieces where you just look at them and go, 'Well, it's obvious, one is so much better than the other'? Or is it pieces that are actually quite close in standard? And, currently, what we seem to be finding is that it is pieces that are a little bit closer. So, that's a bit more reassuring. And then, what we can do is we can say, 'Well, if it's getting the 75-to-90% right, that's okay'--we can then get the humans to mop up the remaining ones. So, we can direct the human attention, which doesn't have that position bias, to those pieces which the LLM is finding trickier. There are other potential solutions, again, some of which would be more facilitated by having cheaper LLM requests. So, there are all of a sudden things you can start to do. What we can also start to do--we've built a very powerful statistical model, as I say, based on assessing over 2 million pieces of writing over the last seven or eight years. And, you can start to--it's not about training a model, necessarily, but actually starting to use that statistical model in ways to identify the flaws and the failings of the LLM, correct them, and then use the human judgment very strategically, in a targeted fashion, to address those weaknesses. So that's where we are with grading. And, I would say as well, this is all a work in progress. As we talk, we've got a big trial project running at the minute that will report at the end of this month, and we will know a lot more then about what is and is not possible. But, that's where we are at the moment: we can use the human judgment in this--as I say--more strategic, more targeted way to address some of the weaknesses of the LLM. And, that's specifically on the grading issue. |
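The 'judged twice, each way around' check can be sketched directly. This is a hypothetical interface--`llm_prefers_first` stands in for whatever call No More Marking actually makes--but it shows the mitigation: a position-biased judge tends to contradict itself when the pair is flipped, and those self-disagreements are exactly the pieces routed to scarce human attention.

```python
def llm_prefers_first(piece_a, piece_b):
    """Hypothetical LLM call: return True if the model judges the first
    (left-hand) piece to be the better of the two. Stubbed out here."""
    raise NotImplementedError("stand-in for a real LLM judging request")

def judge_both_ways(piece_a, piece_b):
    """Judge the pair twice, once in each presentation order, so that
    position bias either cancels out or reveals itself as self-disagreement."""
    a_wins_on_left = llm_prefers_first(piece_a, piece_b)       # A shown on the left
    a_wins_on_right = not llm_prefers_first(piece_b, piece_a)  # A shown on the right
    if a_wins_on_left == a_wins_on_right:
        # Consistent either way around: accept the automated verdict.
        return {"winner": piece_a if a_wins_on_left else piece_b,
                "needs_human": False}
    # The model disagreed with itself: route this pair to a human judge.
    return {"winner": None, "needs_human": True}
```

The design choice mirrors what Daisy describes: cheap LLM calls do the bulk of the comparisons, and humans, who show no meaningful position bias, only see the flagged, genuinely close pairs.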
56:02 | Russ Roberts: But, you also have something creative on the feedback side, which was making an audio recording. Talk about that. Daisy Christodoulou: Absolutely. So, on the feedback side of things--yeah, I've talked about the grading and how we're using the human in the loop there. On the feedback issue, again, we've been struggling with this for a couple of years. So, one model we started out with is: Let's just do direct AI feedback. And, a lot of people are looking at this, too. So, you give the essay to the LLM and you say, 'Give me some improvements for the student: give me some feedback, give me that typical written paragraph.' And, right from day one, a lot of these LLMs would do this in a way that was superficially incredibly impressive. They would turn out a beautifully polished paragraph that would have taken a teacher hours to produce for everyone in their class. So, again, your initial thought is, 'Wow, this is amazing. If it can work, think of the amount of time this is going to save.' But then when you start to dig deeper, you hit problems. So, the first problem is: it is very good at producing something that is relatively superficial and generic. And, once you start to push it and ask it to be more specific and do the things I've talked about before--constructing specific action steps and a model of progression, a recipe for feedback, actual things you need the student to go away and do to get better--that's when it starts to break down and that's when the errors start to creep in. And, those errors are a big problem, because if you're just going to automate--just push it back to students--and those feedback items and actions have mistakes in them, that is incredibly, incredibly confusing for students. Okay? So, our initial thought was--and this was all about two years ago--'We'll get the LLM to provide the written feedback. Before it goes to the student, the human in the loop will be the teacher, who will read the student's essay, check the LLM feedback, and make any edits.' So, we built that editable interface. The problem with this is that all of the teachers, universally, came back to us and said, 'The time it takes for me to read the original essay, read what the AI has written, decide if it's good enough or not, and make the changes--I may as well have marked it all myself.' And, they are right about this. And, this is a massive problem in the literature of all--even pre-LLM--automation research. There is actually a whole theme in the literature called the Paradox of Automation. And, the Paradox of Automation--and you see this with self-driving cars; it's a huge issue for self-driving cars--is that when you have a system that is 90, 95% okay, but there's 5-to-10% of errors, asking humans to stay switched on to spot that 5-to-10% of errors is so hard that it would actually be easier, and you would get better outputs overall, if they just did the whole thing themselves. And, self-driving cars have this problem in spades: they can drive quite effectively--depending on the model, depending on where you are in the world--they can probably do 80, 90, maybe 95% of the driving themselves. But--and you see this with Teslas--they will then need the human to be able to react to something that they can't handle. 
And, what we know from real-life practice and research is that if the human is sitting there texting and not paying attention to the road and they suddenly get a warning about something the AI can't handle, the human does not have enough time to quickly switch on, take over the controls, attune themselves to everything that's going on, and start driving again. They just can't do it. It's safer for them to just be driving all the time, because they would be more switched on and attuned to their environment. Russ Roberts: Imperfectly, of course: driving, they will have accidents. But it's a different kind of accident. Daisy Christodoulou: Yeah. Yeah. Let me take that point, because that is a really crucial point, and it's one where we've all had a little bit of a sea change: when LLMs first came along--and indeed the same with self-driving cars--the thing you heard a lot of people say is, 'Yes, the self-driving cars have accidents. Yes, LLMs make mistakes, but humans do, too.' Obviously, humans have car accidents. Obviously, teacher-written feedback is not optimal; it's not perfect. Russ Roberts: Sometimes it's bland and generic. Daisy Christodoulou: Right. And, sometimes it's really delayed. So, I was chatting to a friend's child recently and I said, 'What do you think about feedback in your school?' And, she goes to a very expensive independent school, and she said, 'Well, I handed in a piece in November and it's now February and I still haven't got it back.' Okay? You don't get that with an LLM. Right? Russ Roberts: Yeah. Daisy Christodoulou: So--the big debate that I'm referring to here is expert human judgment versus statistical algorithms. And, there is a big literature on this. And if you had asked me three years ago, I and everyone at my company would have come down firmly on the statistical judgment side of things. And, that was almost our calling card. One of my colleagues, Dr. Chris Wheadon, had written a big paper specifically in the context of exam assessment, where he talked about all the issues with human judgment and all of the important and good things about using statistical judgment instead. And, the classic statement of this in the research is Paul Meehl, who wrote a really big thing about this--about how even a really simple, basic algorithm will often outperform a human expert, and that is because the algorithm will be consistent. It will give you the same result every time. That's a major, major advantage of an algorithm over a human. And, people at the time of Meehl's research hated it. And then there's another debate, a bit later, that's really good on this: Daniel Kahneman versus Gary Klein, on experts versus algorithms. They are on opposite sides of this divide, but they wrote a very good, productive paper teasing out where it is they agree and where it is they disagree, which is just fantastic. It's really, really good. And, the answer is: it's actually all to do with feedback. It's about: what are the kinds of expert fields where you get really good feedback? In those fields, the expert will be a true expert who will actually make really good decisions. But, in the kinds of fields where the expert doesn't get great feedback, they will often end up actually making bad decisions, and then you're better off relying on the algorithm. 
So, feedback is important everywhere. But, to go back to this point: If you'd asked me three years ago, I was on the statistical judgment side of things. I'd seen so many issues with human judgment and human error, and the algorithm will always have its consistency and its speed--and, by implication, it's cheap. Right? And these things mean it's always got that advantage. I have really changed and nuanced my position on this, for a number of reasons. One of them is: LLMs do not have the same consistency as the old-style algorithms. With LLMs, it isn't that you crank the handle and they spit out the same thing. In one sense, they are more like humans: they are inconsistent. But in another sense, they are not like humans, because the type of errors they make are not like human errors. So you end up in a weird situation where the overall consistency and accuracy of an LLM and a human, in certain cases, may be the same; but the type of errors LLMs make are different. And, again, I gave you that example of the position bias. That's an error the LLM is making that the human is not making. And, the same is true with self-driving cars. They make weird errors that humans don't make. And why this matters is you have a system--the road system, the exam and assessment system--that is built to handle human error. And maybe it doesn't always handle it very well; but it's a system that has safeguards built in to address the typical human error. So, an obvious example with roads is: if a human makes a really crazy error on the road, there's often some kind of sanction. They might be breathalyzed; they might get banned from driving. What do you do when a self-driving car just does something really weird and unexpected? You can't just ban the car. Do you ban the entire model? What do you do? And, the same is true with exams and assessment. There are loads of people in exams and assessment trying to wrap their heads around this at the moment. And, it's really difficult, because if you have a human marker in a high-stakes exam who makes a mistake, we have a system to handle that: we have a hierarchy, and you can send that script back for review by a senior marker. And, I'm not saying those systems are perfect--I know very well that they're not--but they are systems we have built to capture and manage the kind of errors we have. And, there's also a kind of social ethics to it: if a kid has been given a really rogue, wrong grade, the kid and the parent want to know that there's some kind of recourse--that you can't just shrug your shoulders and go, 'Well, yeah, the AI--I don't know, it's a black box.' There has to be something built in for that. So, these are all the things--I'm not saying I'm giving up on algorithms, but the whole debate is way more complicated and way muddier than it was three years ago. |
1:04:29 | Russ Roberts: But, I want you to talk about the way you did figure out a way to use AI with--go ahead. Daisy Christodoulou: Exactly. So, my point was that the very simple, straightforward way of saying, 'Well, we'll just get the teacher--the human--to look over what the AI said': No. That is a bad idea. That is the equivalent of the Tesla ride[?]: just wake up, just come off your phone, when there's a big truck bearing down on you. Not going to work. What we did instead: we flipped things around. And we said, 'Okay. Whilst the teachers are judging'--so, whilst they're making their decisions--'they will leave an audio comment on the piece of writing.' And, because of the way we set things up, that means every piece will get several audio comments. And then we use the AI in two ways. The first is to transcribe the audio comments; and the second is to combine all of those audio comments into some written feedback for the students. So, how is that different from what we've done before? Well, the feedback is coming from the human. It is human teacher feedback. It's not direct AI feedback. But, we are using the AI to do things that it is much better at, where it makes fewer mistakes and there are fewer issues around hallucinations: transcribing and combining. And, what is also really good about this is the time reductions, the cost savings for teachers. What I realized when I started using this is that so much of the time I would spend writing a comment on a student's work went on trying to nuance it and not be too mean or too blunt to the student. Whereas, what's great about this system is you can be quite blunt with your audio feedback, and we've set it up so the AI will soften it. So, I would probably not write, on an eight-year-old's piece of work, 'Terrible handwriting.' But, with this you can say, 'Terrible handwriting'--and four people could say, 'Terrible handwriting'--and the AI will nuance it really nicely and it will say, 'You've made a fantastic response to this question, but you do need to work on the legibility of your handwriting because it is making it hard for the teacher to read.' Or something like that. It will make it easier for the student to understand. So, that is a huge time saver. However, does it fall into the pitfalls I was talking about--haven't I just been saying all this written feedback's not worth the paper it's written on? I can address that in a moment, too. |
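A minimal sketch of that two-step pipeline follows. `transcribe` and `complete` are hypothetical stand-ins for a speech-to-text call and a text-generation call, not real library APIs, and are stubbed so the sketch runs:

```python
# Sketch of the two AI steps described above: transcribe every teacher's
# audio comment, then combine the blunt notes into one softened paragraph.

def transcribe(audio_clip) -> str:
    """Speech-to-text; stubbed here so the sketch runs."""
    return str(audio_clip)

def complete(prompt: str) -> str:
    """LLM text generation; stubbed here so the sketch runs."""
    return "(combined, softened feedback would appear here)"

def student_feedback(audio_comments) -> str:
    # Step 1: transcribe each teacher's audio comment on this piece.
    notes = [transcribe(clip) for clip in audio_comments]
    # Step 2: combine the notes into one softened paragraph of feedback.
    prompt = (
        "Combine these teachers' comments on a student's piece of writing "
        "into one short paragraph addressed to the student. Keep every "
        "substantive point, soften blunt phrasing, and open with something "
        "the teachers praised.\n\n" + "\n".join(f"- {n}" for n in notes)
    )
    return complete(prompt)

print(student_feedback(["Terrible handwriting.", "Lovely opening image."]))
```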
1:06:45 | Russ Roberts: Well, why don't you address it now? Because I want to shift gears and close with something really different. Daisy Christodoulou: So then: Yes, with this written feedback that I'm talking about, you might be thinking, 'Well, I spent all this time saying prose is not optimized for improvement,' and here I am spending all this time essentially getting the students a nice paragraph of prose. Why are we bothering with this? Okay. So actually, what we can also do with all of these audio comments is put together a feedback report for the teacher. So, this will be a summary-- Russ Roberts: That's beautiful-- Daisy Christodoulou: of all of the audio comments, of all of the students in your class, and it will categorize them, and it will say: What are the themes that kept coming up again and again? Russ Roberts: Beautiful. Daisy Christodoulou: And we hope, in time, to add to it suggested activities and suggested action steps from a model of progression. Russ Roberts: That's really nice. Daisy Christodoulou: And our thinking is that that is far more useful--to go back to my original thermostat/thermometer point--far more useful in terms of being the thermostat. Because that is going to give the teacher the insights they need to re-plan and adjust their sequence of teaching, to address the issues they've seen and make sure they don't appear in the next piece of writing. So, actually, the way I'm seeing it is that that feedback report for the teacher is the best output of what we're doing. But if, as a by-product that takes no extra time, we can add the paragraph in for the student, why not? The other thing we have also realized--I've been very harsh on written feedback, I've had a go at it, but one of the things we've not considered about its value, and this is something that recurs again and again in the human-technology debate, is this: one of the big pushbacks I'll get from teachers who do broadly agree with me that written feedback is not that optimal is they will say, 'But the kids like it. It motivates the students. They want to feel seen.' And a lot of the teachers I talk to will also say that is why they're actually quite reluctant to move to direct AI feedback. Because, if you are in a situation where the student knows that their work is not being read by a teacher, where is the motivation for them to put their heart and soul into it? And what people have loved about the initial prototype for this version--and we have got some incredible feedback on it--is that you can actually say to the student, 'This feedback you're getting, this paragraph--it's from four or five different teachers in the school.' And what we've found as well is the best kind of audio feedback is when the teacher picks out something--a nice word or a phrase--in the student's writing and says, 'I love it when you say X.' And if a couple of teachers do that, that will feature in the feedback the students get. So, our initial findings are that the thing people like most about it--and the thing where I will say written feedback does offer something--is the feeling of the work being seen: the student feeling seen, and the student feeling motivated to want to do their best because a teacher whom they know and respect is reading it and paying attention to it. So, we think where we are now, we've alighted on something that is kind of working, but it is early days and we ourselves need to gather more feedback. 
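As a sketch of how such a class-level report might be assembled: the theme keywords below are invented for illustration, and a real system would presumably use an LLM to categorize comments rather than keyword matching, but the shape of the output is the same--recurring themes, most frequent first, to drive the teacher's re-planning:

```python
# Sketch of the class-level feedback report: tally recurring themes
# across all transcribed comments so the teacher can re-plan teaching.
# The theme keywords here are invented for illustration only.

from collections import Counter

THEMES = {
    "handwriting": ["handwriting", "legible", "legibility"],
    "paragraphing": ["paragraph"],
    "vocabulary": ["vocabulary", "word choice"],
    "sentence boundaries": ["full stop", "run-on", "comma splice"],
}

def theme_report(transcribed_comments):
    counts = Counter()
    for comment in transcribed_comments:
        text = comment.lower()
        for theme, keywords in THEMES.items():
            if any(k in text for k in keywords):
                counts[theme] += 1
    # Most frequent themes first: these drive the re-planning.
    return counts.most_common()

comments = [
    "Terrible handwriting.",
    "Needs to break this into paragraphs.",
    "Handwriting is hard to read; nice vocabulary though.",
]
for theme, n in theme_report(comments):
    print(f"{theme}: flagged in {n} comments")
```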
Russ Roberts: And, of course, one would hope that a student-teacher interaction is more than a paragraph at the bottom of the page. That,-- Daisy Christodoulou: Of course. Of course-- Russ Roberts: after they've received the piece of paper with the reaction, there could be a future conversation or three where it would be awkward if the teacher said, 'Well yeah, I didn't actually read it. The machine read it, but it makes some good points.' So, it keeps the human connection part that's important. Daisy Christodoulou: Yes. Yes. |
1:10:04 | Russ Roberts: I want to close with a radical shift here, and I think about this a lot--you know, I'm president of a college, and we're grappling, of course, with the fact that, as I mentioned earlier, the essence of our academic pedagogical approach is the reading and writing around not just Great Books but difficult texts and challenging things. And, we want our students to struggle and to work and to improve. And, suddenly they have access to a tool--and you really can't figure out whether they've used it or not--which allows them to take some extraordinary shortcuts. And, I just want to mention: We teach in Hebrew, and I'm sure you could get a nice published paper out of redoing the A/B comparison in Israel, where we read right to left, because I have a feeling that maybe some of that ordering disagreement is because you read it first. I don't know. And, people-- Daisy Christodoulou: Yeah. I did actually, you know--I [inaudible 01:11:15] think about that. But it's interesting: it's not present with the humans; it's present with the AI. Russ Roberts: A little bit--no, you said it's a little bit present. Daisy Christodoulou: It's tiny. Tiny. Yeah. Russ Roberts: Because it's sitting right there next to it. It's not like you read it and then put it down; you read another one, you put it down. But, anyway, so we're grappling with this. And, our students are grappling with it. A non-trivial proportion of students--we hope almost all of them--don't want to cheat. And, I don't mean cheat in the grade sense. I mean, they don't want to cut corners. They want to do the work, because they, like we, believe that the work is the black box that creates much of the power of the educational transformation that we think happens here. So, that's the frightening part of AI for education: that it'll be very difficult to assess people's actual abilities. Because they won't do the reading: they'll ask the AI to summarize it; they won't do the writing: they'll have the AI write it. And, people have suggested, 'Well, that's why there's going to be a lot more in-person grading, in-person essay writing, so you can't use the AI,' and so on. I don't know how that's going to play out. I think it's going to get really complicated, because I think it's going to bring some incredible changes to the educational system. And I want to focus on one of those, which is AI as tutor. And, one of the things I've been impressed with--there's a handful; it's not a long list, but it's long enough to change the world--is using it if you want to understand something or if you want to drill. Like you, I believe that facts are really important. When you say students need better vocabulary, I would suggest--this is just my own personal bias; I have no idea if it's true--that the way you get a better vocabulary is to read more and to see words in context. Some people don't grow by drilling, memorizing words, and repeating them back; I'm thinking maybe the human brain is better at just seeing them in context. So, let's think about how we might use AI in a perfect world. And, I'll say what I mean by that. In a perfect world, you're not being motivated by grades, marks: you're just trying to educate. So, you could think about this as a homeschooling world. And, I'm thinking about the following. I have to write an essay. I'm not doing it for a grade, so I don't care that the first draft's not very good. 
So, you're my human tutor, and you give me an assignment, and I write an essay. And then we sit down and look at it and we realize it's not very good. And, we could talk about--forget the paragraph at the bottom--we could have a conversation about what's imperfect about it. And, if you were a great writer--and you're a very good writer, maybe a great one--and I was a bad writer, I'd love to sit at your feet and be tutored by you and walk through, in an ongoing dialogue, how I can do that better. It would be constrained by the fact that that's incredibly time-intensive. You don't have time for every one of these curious students who want to improve. But the AI never gets tired. It never gets bored, at least so far. And so, I'm thinking, wouldn't I be able to use that in an extraordinary way? Not in the way you need in your company to assess, say, school performance or curriculum performance; let's just say the only goal was to help me, the student, be a better writer. Forget about the schools. And so, what I'm going to do is write my essay. And, I know it's not very good. I know AI could have done it better. Claude could improve it. So, instead of saying, 'Claude, make this better so I can hand it in and pretend it's mine,' I say to Claude, 'How would you have written this? Teach me what I did wrong.' In other words: Don't give me the paragraph at the end, because that's awful. Maybe we'd go paragraph by paragraph. We might also step back and look at the whole essay. I could say, 'Could you give me an outline that would have done a better job than mine?' There are so many ways to use AI now as a tutor to help me learn how to ride the bicycle. Right? If the goal was to give you a video of me riding the bicycle, then instead of bothering to learn, I could just have AI make a video and send it in as my assignment; and you'd go, 'Oh, I guess he did learn.' Instead, I'm going to say, 'AI, help me do this.' Now, AI won't do much good at riding the bicycle for me. But, I think it could be very good at helping me become a better writer--both giving me assignments and challenging me on what I think needs to be better about a paragraph, maybe. Having said that, I'm not so impressed with its ability to write coherent essays. But if I'm a bad essay writer, I think there's a lot of room for improvement that AI could tutor me on. What do you think of that? Daisy Christodoulou: Where do I start? This is all really interesting. I could do another hour on this. Can I just--I want to say one thing about where you opened that question, which is a little bit tangential, but I really want to address it because it's very important to me. Russ Roberts: Please. Daisy Christodoulou: You said, 'In an ideal world, we wouldn't have assessment. Maybe we'd be homeschooling, and we'd just want to be learning for the sake of learning, without a grade.' This is something I hear again and again. And I want to push back. One of the things I love about your podcast in general is how you really get the price function. You get that the price function in economics is not this nasty, oppressive, evil thing that is just used to impose scarcity on people. You get that the price function is a marvel. It is the thing that allows us to measure our wants and direct energy and attention and innovation to where they need to go. And, my point is: Assessment is the price function of education. 
You can't just turn around and go, 'In an ideal world, my ideal education has no assessment.' That's like saying, 'My ideal market-based system has no price.' Russ, you Soviet. Russ Roberts: Oh, bruh [bro?]. Oh, I'm so proud of you. It's a great, great critique. You win. I love it. Okay? Yeah, of course I meant there'd be--I didn't-- Daisy Christodoulou: I know-- Russ Roberts: But, that's a great critique-- Daisy Christodoulou: I like to call myself a pro-exam romantic. Because I know how exams go wrong. I'm not naive. And, I know how extrinsic motivation goes wrong if that's all you have and you don't have the love of learning. And of course, my dream as a teacher, with every student, is to get your students up to that point where they love the learning. Just as you want to be in a job where you love the job and you're not just doing it for the money. Of course, I get that. But, again--that's why I say pro-exam romantic, okay? I think all that can happen. And, I think all that is not inimical to exams or prices or salaries. It actually is part of it. And again, I talked about Dylan Wiliam earlier; he has a little phrase where he says, 'Assessment operationalizes the curriculum.' You have all these highfalutin' dreams about what you want to do, but it's in the assessment where the rubber hits the road and you really say, 'This is what we want.' So, I had to get that out of the way before addressing-- Russ Roberts: And before you do--before you get to the addressing--I agree with you a hundred percent. I didn't mean to imply that in a perfect world we'd all learn for the sake of learning. Because learning is hard work; and mastery is hard work. Daisy Christodoulou: Exactly-- Russ Roberts: And, the assessment-- Daisy Christodoulou: Exactly-- Russ Roberts: should tell you where you need to work harder and where you can rest on your laurels a little bit. So, a hundred percent. And, on the analogy, that's an A+, and I'm giving you 100. In fact, I'm going to 103, because the scale doesn't even allow me enough room to fully praise that answer. So, carry on. Daisy Christodoulou: Brilliant, brilliant, brilliant. All right, so then, addressing your point about the AI tutor. Now look, we haven't delved enormously in this episode into cognitive science and the way we learn and the bottlenecks on our cognition. I alluded to them a little when I was talking about my marathon analogy: about breaking things down, about how you can't just focus on the end goal if you want to get better at the end goal. Russ Roberts: True. Daisy Christodoulou: I haven't addressed all of that. You, in a previous episode, have had Daniel Willingham on. Daniel Willingham is a huge influence on my work. My first book, Seven Myths About Education, is very, very Willingham-influenced. And, essentially, when we are thinking about how we learn, we have to take into account a lot of what we know about the science of learning. A lot of the science of learning provides a rationale for a lot of what I've said about why you have to break things down. And, the reason you have to break things down is that we have a very limited working memory, which limits the amount of information we can process from the environment. And, we have this vast long-term memory, and we can use all the information we have stored in long-term memory to address the limitations--the bottleneck--of working memory. 
And so, this is one of the reasons why I think education should be focused on memorization and remembering things--and this is a solid scientific rationale; it's not just prejudice and traditionalism. The concern I've always had is with letting kids loose, either on an AI, an LLM, or indeed--20 years ago, what was all the rage? They don't need to learn things anymore. You can just Google it-- Russ Roberts: [Inaudible 01:20:43]. Daisy Christodoulou: Yeah. Right? And there's a solid scientific rationale for why you can't do that. You can't outsource memory. You have to have the facts in long-term memory. My concern with LLMs at the moment is that they are the new 'you can just look it up.' It's the new: you don't have to learn to write, the LLM will do it for you; you don't have to learn to do this, the LLM will do it for you. And also, that you can just type in a question and get an answer from the LLM. Now, even if we overcome hallucinations--and, as I say, I'm not sanguine that's going to happen in the short term--even then, my worry would be that it's almost a form of discovery learning; and novices in particular, are they going to be asking the right questions? Do they have enough background knowledge to interrogate the answer that they're getting? All of the 'you can just Google it' problems we had 20 years ago, we're just going to have again with LLMs; except we've got this added, worse problem, which is: they're not even that reliable. Again, I think we have a slight expert/novice issue. I am just geared, because of my job, towards thinking about younger, less-expert students. You are geared a bit more towards thinking about older and more-expert students. I would agree: LLMs have more applications for the older, more-expert student, fewer for the younger. I do have concerns about how they're used. I do agree with you--what's the thing? They never get tired. They can always provide another example. For a more-expert, advanced student, with the background knowledge to interrogate them and to ask the right question, they can add value. I also worry--and I worry about this with everything LLMs do--that they are so focused on language, and they're so good at language. And, look, my day job is writing assessments--language assessments--and my degree is in literature; but what you realize is: Language is not always optimized for truth and precision. And, I can't remember if I said this in our previous episode, but if you look at what people think is the evolutionary origin of language, it is to kind of lie and tell nice stories about ourselves. And, the reason I think we as a species invented numbers and mathematics is to get 'round some of the limitations of language. So, again, it comes back to that Polanyi point about prose: I worry that you can read a really nice example from an LLM--and maybe it is accurate--and you can think you've understood it; but what you really need is more practice. And--another insight from cognitive science--you need to be testing yourself. You need to be self-quizzing. We know at the moment that when you ask students to revise, their Number One go-to strategy is the least effective strategy: they will reread their notes and highlight them. And, I've seen this in action. I've seen students who reread their notes; they'll highlight them. 
I remember one student: they'd highlighted the whole page apart from one word, 'the.' Well, what good is that? What you need to be doing, all the research says, is testing yourself. Quizzing yourself. So, put the book away, put the notes down. How much can you remember? Flashcards. Fill in the blank. Can you write down this equation? Can you remember exactly what Paul Samuelson talks about when he says, 'stated and revealed preferences'? Can you draw me a graph? Doing all these things, that's what really helps you learn. Now of course, I can see ways in which an LLM--you could train one to understand all that and to prompt you for that. But, I am worried that just letting students loose on them, there's going to be all those issues. So, I'm really wary about younger kids. I'm super-wary, and I hear things that I just feel, 'Oh gosh, this is awful.' I'm maybe less wary about post-graduates. Very good undergraduates. But I still have my concerns. |
1:24:24 | Russ Roberts: Well, I think we'll probably need a human in the loop, the younger you are, for sure. I don't think we'll ever solve that problem perfectly with an LLM--where you lock a child away for eight hours a day and, at the end of the year, they're going to be-- Daisy Christodoulou: Well, the other point I'll make: in my third book, Teachers vs Tech, I write quite a bit about the extent to which education technology can support learning. And, the whole thing about technological substitution is that there are a lot of places where we get it working really well, and it's brilliant, and it means you get more stuff cheaper, and it's fantastic. There are a lot of areas where the human is the thing you want. Bringing in something that is fungible and changeable doesn't work there--with young children, if you have a different teacher every day, that doesn't work. There has to be that stability and that human connection. In Teachers vs Tech, I look at a lot of the ways technology can help with education. But I think there's a huge, huge issue here. I wrote the book in 2019, and it was published the week before all the COVID lockdowns started. And, I would say it's pretty much been vindicated by COVID, in that if we really were willing to have kids sit at a screen all day and learn--if that was an optimal way of learning and they could learn--COVID would have made it happen. In reality, we had the almost perfect chance for technology to prove itself: if it was possible for a seven-year-old to learn what they needed, not at school, but by sitting in front of a screen, I think someone would have cracked it during COVID. And, the reality is that in most countries, people really wanted the schools to open. And people really thought, 'Hang on a minute: they're not learning what they need to on a screen.' So, I think there's some stuff you can deliver down a wire and on a screen, and people are fine with it and you don't lose very much. I think there is other stuff where you do lose something. And, I think the education of young students is something where we just had the perfect global experiment: if the answer were there, we would have found it then. So, that makes it harder to scale things up. And, that doesn't necessarily make me happy, because I'm someone who thinks that to increase quality, and to get consistent quality, you do need technology and you do need to find a way of doing things at scale. So, I'm not saying that's a great thing; I'm saying this is not an easy problem to crack. And, a lot of the tools and techniques we've used in other sectors, where it is easier to deliver stuff down a wire, don't really read across in the same way. Russ Roberts: My guest today has been Daisy Christodoulou. Daisy, thanks for being part of EconTalk. Daisy Christodoulou: Sure. Cheers, Russ. Thank you. |
READER COMMENTS
Jonathan Andrews
Mar 17 2025 at 2:23pm
This was fascinating and inspiring. I’m a Mathematics and Economics teacher of over thirty years’ experience who is soon to retire.
The trouble is that nobody knows how to teach; it is an extremely complex process and, while I suspect I don’t make as many errors as years ago, I still don’t know what I’m doing.
However, the one thing that I believe does matter and does make a difference is my interest in my subjects; I keep wanting to understand these things. I doubt I could express why I believe this any better than saying I’d like my students to get as much interest in these subjects as I have.
Listening to this conversation, I found it full of curiosity, added to greatly by your laughter at the flaws and errors of your own work. Perhaps this humility and enthusiasm is all we can offer.
Greg Eubank
Mar 17 2025 at 5:39pm
Fascinating discussion. It made me think about something that I sort of find miraculous in how it happens…and that is, how we all learn to speak our native language. Having raised two children, and having now 5 grandchildren, and obviously having learned to speak myself, it seems effortless to me….at 1 year just sounds, at 2+, communicating well, speaking in sentences, whole paragraphs, using contractions, the correct tenses, etc. Feedback is required from mom and dad and others to learn to speak properly, e.g., not “I do it” but “I will do it,” or even “I’ll do it.” Learning to speak basic language seems to happen through listening to mom, dad, siblings, and others just speak….no study required.
Catherine Wright
Mar 18 2025 at 4:06pm
Thank you for such a thoughtful discussion. I clicked on this episode planning to disagree with everything in it. (I prefer AI to be kept as far away from my children’s humane education as possible.) My husband is a high school English teacher. As I wash the dishes and listen to the podcast he is literally writing detailed prose comments on rhetorical analysis essays that his students have just submitted.
I am now 7 years into homeschooling our three sons. Listening to several podcasts about AI in the spring of 2023 spurred me to learn about and follow the teaching methods of Charlotte Mason, an education reformer in Britain who died in 1923.
Charlotte Mason said that students should read daily from several well-written books that are written in an engaging narrative style and then the student should “narrate” or retell the reading without any interruption or questioning by the teacher. Students in 1st-12th grade are required to orally narrate every reading, and beginning in 4th grade, they should begin to write their narrations starting with 1 written narration per week and gradually increasing to 1 page-long written narration per day in high school.
Miss Mason did not want the teacher to give ANY feedback to the child except “Thank you for your narration.” or maybe, “oh yes, that part was interesting to me too.” Because, as Russ pointed out, the only “action plan” that a child needs to become a better writer is to read and narrate more…every day…and to read excellent books with challenging vocabulary and syntax that contain engaging ideas about history, geography, nature, world cultures, the arts, human nature, government, economics, natural science, etc. And then to have the opportunity for their mind (a spiritual organ, according to Miss Mason, a “black box” in this episode) to work on the ideas in the reading and to assimilate them into their own person by putting the ideas into their own words orally or on paper.
I understand what Ms. Christodoulou is saying about how the action plan should be different in the way that training for a marathon is different from running a marathon. But are you really going to come up with an action plan that is SO much more beneficial than just read more and narrate/paraphrase more that it is worth bothering with ANY teacher feedback??
I’ve also read about and heard a lot of the arguments about how students don’t just read and then learn to write by osmosis, that they need structured writing instruction. But these theorists and practitioners seem to always be operating with an underlying assumption that children cannot be expected to read the amount of challenging text that would be required in order to assimilate the vocabulary and syntax via reading and narrating. I think this is why they think that students should be taught how to write a sentence and a paragraph. But then you still end up with extremely bad writing. The third graders can write a 5-sentence paragraph, but at what cost? We spend time teaching them how to write a 5-sentence paragraph with time that they could have spent soaking up poetry.
I went to check on my husband, who says he doesn’t even make any comments on language use/syntax because there is nothing the students can do about awkward wording. He gives specific comments so that they can revise and raise their grade on this paper according to the rubric, but it doesn’t translate generally to their writing on the next paper or in the next class. Only greater competence in the English language will help with that. Read more, write more. Student writing will improve, and teachers will not have to lose sleep over marking/grading.
Alan Clift
Mar 24 2025 at 9:12am
Is learning without assessment like being rich? To purchase without a concern for price.
Brian S.
Mar 24 2025 at 12:32pm
I haven’t listened to this episode yet, and look forward to it, as I have read Daisy Christodoulou’s articles on education. For example, “Myth Four – You can always just look it up,” which summarizes a chapter of her book Seven Myths about Education.
Giving credit to EconTalk, I first learned about Christodoulou’s work from Ian Leslie’s terrific book Curious, which I listened to after hearing Leslie discuss the book on EconTalk in 2022.
Thank you, Russ!
Krishnan Chittur
Mar 24 2025 at 4:06pm
“Assessment is the price function of education”
Well put.
How else will the student get a signal of what his or her learning is “worth” or how “valuable” it is – and use that signal to do something about it – perhaps change to subject areas of study where they may become more “valuable” or where their “skills” are in higher demand and so command a higher price!
Eugenia Papaioannou
Mar 25 2025 at 6:28am
Good morning, Mr. Roberts,
I have just listened to your “No More Marking” episode with Ms. Daisy Christodoulou and found it exceptionally insightful. The discussion on assessment and feedback, particularly regarding the impact of human versus AI tutors on student progress, was truly eye-opening.
As co-founder of EDWIBO, an academic organisation that has recently launched the AI Assessor (ai.edwibo.eu), I was particularly intrigued by the potential for AI to effectively train the mind. I’m keen to explore Ms. Christodoulou’s, or your, perspective on the crucial elements required for AI to facilitate meaningful cognitive development in students.
Thank you for such a thought-provoking conversation.
Sincerely,
Eugenia Papaioannou
EFL Teacher, Teacher Trainer, Author
linda
Mar 28 2025 at 5:37pm
This was a fantastic episode. I listened to it on the day it came out, while walking to the campus where I have a part-time gig as a writing instructor for graduate students studying law and urban planning. I always learn something by listening to Russ and his guests, but rarely are these lessons quite this timely and germane. Thank you!
Bob
Mar 30 2025 at 6:02pm
On the problem of partial automation, the successful efforts focus on checking whether any important errors are detectable automatically. This is the same whether you are using an LLM or an old school mechanical model.
If you have a certainty score, and you can trust the certainty score, then the automation can win, because you just cut a big percentage of the problem space: Some things really don’t need to be checked. I’ve had success with second passes like that. For instance, LLMs are not always great at transcribing audio, but they are very good at figuring out where errors occurred, and can even be prompted to fix the errors! Just don’t ask them to do it all right in the first pass.
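A minimal sketch of that certainty-score cut, assuming a trustworthy per-item score (the threshold and data below are invented): everything above the threshold is accepted automatically, and only the uncertain remainder gets the more expensive second pass or human check.

```python
# Sketch: cut the problem space with a certainty score, as described
# above. `items` pairs each result with an assumed-trustworthy score;
# the 0.95 threshold and example data are invented for illustration.

def split_by_certainty(items, threshold=0.95):
    """items: list of (result, score) pairs; returns (accepted, to_review)."""
    accepted = [r for r, s in items if s >= threshold]
    to_review = [r for r, s in items if s < threshold]
    return accepted, to_review

items = [("page 1 transcript", 0.99), ("page 2 transcript", 0.62)]
done, review = split_by_certainty(items)
print(f"auto-accepted: {len(done)}, flagged for second pass: {len(review)}")
```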
As for LLM and writing, I suspect the biggest value comes with interactivity. If the LLM is happy to grade you as often as you want, and offer actionable feedback and clarifications when asked, you are way ahead of a teacher that is, say, leading a class of 15 through 15 different history papers on different topics. There’s no way the teacher can read your sources, check for factual mistakes and do a good writing check in a timely fashion. But the LLM is getting closer every time at doing all of that well enough that it’s helpful, even if it’s not perfect.
Eric Lin
Apr 7 2025 at 9:14pm
Christodoulou suggested that grades function as the price mechanism of education. The idea is compelling at first glance. Like prices in markets, grades – supposedly – help allocate attention, shape incentives, and signal value. But the more I thought about it, the more the analogy began to unravel.
When we talk about prices, we’re referring to a very specific kind of signal. A price emerges from voluntary exchanges in a market and reflects, however imperfectly, something about collective valuation. It communicates scarcity, prioritization, and tradeoffs. Prices work because people can interpret them in context. If apples are more expensive than bananas, I can make a decision about what to buy based on what I need, what I like, and what I can afford.
Grades, though, don’t really operate that way. They’re not set through decentralized negotiation. They don’t emerge from collective judgment. And they don’t carry much embedded information about how much effort something is worth, or whether what’s being asked of the student is truly valuable in a broader sense. A grade is, at most, a signal about how well a student conformed to a particular expectation. But whether that expectation is aligned with long-term understanding, relevance, or personal growth is often an open question.
This is especially true when students are unclear on what’s being asked, or when the effort required to earn a grade is uncertain. The relationship between effort and outcome is often opaque. A student may work hard and receive a low grade, or may exert minimal effort and do quite well. In this context, grades do little to help students decide where to allocate their energy or attention – unless their sole goal is to chase the grade itself.
Of course, we need structure in education. We need feedback. And grades do serve a role. But we need to be clearer about what that role is – and more importantly, what it isn’t.
To that end, I find it more helpful to think of grades – and assessment more broadly – through a different analogy: fitness training.
People who are working to improve their fitness rely on a wide range of metrics: the number of miles they can run, the pace they can hold, the weight they can lift, the number of reps, or how many rounds they can complete in a circuit. These are all measures that people track over time to mark progress. But crucially, they are not necessarily meaningful goals in themselves – they are only meaningful in a certain context, and taken all together. A person doesn’t want to run a six-minute mile or deadlift 225 pounds for the sake of those numbers. They want to become faster, stronger, or healthier. The numbers are useful only insofar as they help someone understand their trajectory and their goals.
And people who train seriously understand this. They don’t rely on a single metric. They take multiple forms of feedback and interpret them together. They adjust their routines. They reflect on their performance. They don’t confuse the metric with the outcome they care about.
Here’s another interesting observation – there isn’t really a big market for fake weights. People who are training are not really interested in kidding themselves into thinking they can lift more than they can. They are not really interested in buying stopwatches that run slower, so they can say they run faster. This is not only because they are intrinsically motivated. It is because fooling yourself doesn’t help you reach your goals. It makes it harder, because you’re not really sure how well you are doing, and where you can improve.
One of these days, lying to yourself will catch up to you. So why bother doing it?