Coase, the Rules of the Game, and the Costs of Perfection (with Daisy Christodoulou)
Feb 3 2025

Surely perfection is better than imperfection. But applying technology to improve decision-making can backfire. Listen as ed-tech innovator Daisy Christodoulou and EconTalk's Russ Roberts talk about the costs of seeking perfection when technology is used to improve refereeing in sports. They also talk about ways to embrace imperfection and how the economist Ronald Coase can help us understand the power of the rules of the game, both in sports and in life.


READER COMMENTS

Shalom Freedman
Feb 3 2025 at 8:28am

The Randolph Nesse quotation made me think about the many medications I take. There is not one that I can think of which does not have its side-effects, its downside. Trade-offs are the rule. Yet I do not understand why this situation is absolutely necessary and why there cannot be in the future greater and greater approximations to no side-effects, or some dramatic discovery that puts an end to the trade-off. Perfection then might be thought of as extremely rare but not impossible.

John Notgrass
Feb 7 2025 at 3:55pm

There will still always be tradeoffs, or perhaps we would call them transaction costs. It may be the cost of manufacturing or delivering the amazing new drug. It may lead to competition or even conflict over access. It may make people feel so good that they start taking too much of it, which then causes problems for those people or disrupts the supply for others.

Mark
Feb 3 2025 at 11:14am

Surprisingly excellent conversation! I’m not sure what the #1 episode for 2024 is, but so far this is at the top of my 2025 list.

I think what football fans are going to discover is that VAR will become a part of the game to the point that future generations are going to think of a return to pre-VAR football as an equally unthinkable change. Indeed, this seems to be the nature of rules changes in sports. They’re hotly contested immediately pre- and post-implementation, but once they’re routine they become part of the game. Similar to how, in American football, players try to draw the other team offsides or into a false start. The rule becomes an element of the game used by players and coaches, to the point where elimination of the rule would change the game. Same with the bonus foul shots in basketball that change the last two minutes of closely contested games.

Meanwhile, I’m intrigued by this measurement system Daisy outlined. I work in clinical research, where we frequently have to give a single up or down grade to continuous phenomena. “Is there cancer on this CT scan?” “How much did this medicine help your back pain?” “How bad is the patient’s rash in these photos?”

We run into the same problems, like where we’re not just concerned with inter-rater variability, but also intra-rater variability. I’ve worked with multiple panels of physicians, where we’re trying to figure out how to make the wording of some grading system exactly right. A frequent frustration in these conversations is that someone will bring up that some language is too precise (or too imprecise) because it will exclude some case or another that, “Everyone would recognize is clearly severe disease.” A relative grading system would solve a lot of these problems.

I’m going to be thinking about ways we might work some kind of relative grading system into our blinded clinical trials. Perhaps it’s not ready for use as a primary endpoint just yet, but it would be an interesting exploratory endpoint.

Mark Sundstrom
Feb 3 2025 at 12:13pm

Like another commenter, I have this episode at the top of my 2025 list.

When I’m asked to rank the EconTalk episodes of the past year, it’s a struggle in part because those early in the year can be hard to remember. Since it’s still near the beginning of the year, I’m going to start now and use relative grading to keep track of my top 5-7 episodes, which should make my choices much easier a year from now.

Steve Bacharach
Feb 3 2025 at 9:01pm

I love your podcast, I love the English Premier League (Go Arsenal!), and I’ve always hated VAR for all the reasons Daisy discussed.

This episode was perfect for me! Thanks!

JT
Feb 3 2025 at 11:28pm

A novel and instructive episode. Thank you! Ordering a copy.

I also love football but have been put off by VAR’s failing to solve many of the in-game competitive issues. A breakdown of these challenges with insights into broader themes made my week.

I wondered during the subjectivity vs. objectivity and comparison vs. parity discussion if purely judged competitions (e.g. figure skating and gymnastics) would surface. After watching a season of the Bad Sport series a couple years ago which examined fixing allegations in figure skating at the 2002 Olympics, I was convinced that the matter couldn’t be settled because no one could be certain what the competitors’ scores “objectively” should have been.

Avram Levitter
Feb 4 2025 at 2:24am

The section on comparative judgement reminds me of a recent video done by the fantastic mathematician Matt Parker on numerical pain scales: each of several participants experienced the sting of four different insects, and was asked after each sting to say how many times worse it was than the previous one. He found that while everyone agreed that the stings were increasing in pain, there was very little agreement on the factor the pain increased by.

At the end of the video, he talked about how, while his methodology was flawed, a much better methodology, using pairwise comparisons of “which of these two insects has the more painful sting” and a much larger group of participants, would produce a surprisingly stable scale.
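
One common way to turn "which of these two is worse?" answers into a single scale is the Bradley-Terry model. The sketch below is only an illustration of that general idea: it is not claimed to be the method used in the video or in any comparative judgement product, the function name `bradley_terry` is hypothetical, and the votes are invented for the example.

```python
from collections import defaultdict


def bradley_terry(comparisons, iterations=200):
    """Estimate a strength score per item from (winner, loser) pairs.

    Uses the standard iterative (minorisation-maximisation) update for the
    Bradley-Terry model. Assumes every item wins at least once and loses at
    least once; otherwise its estimate drifts to zero or infinity.
    """
    wins = defaultdict(int)         # total "more painful" votes per item
    pair_counts = defaultdict(int)  # number of comparisons per unordered pair
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    strength = {item: 1.0 for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            denominator = 0.0
            for j in items:
                if i == j:
                    continue
                n = pair_counts.get(frozenset((i, j)), 0)
                if n:
                    denominator += n / (strength[i] + strength[j])
            updated[i] = wins[i] / denominator
        # Rescale so the scores sum to 1; only the ratios are meaningful.
        total = sum(updated.values())
        strength = {i: s / total for i, s in updated.items()}
    return strength


# Invented votes: each tuple is (more painful sting, less painful sting).
votes = (
    [("bee", "ant")] * 4 + [("ant", "bee")]
    + [("wasp", "bee")] * 4 + [("bee", "wasp")]
    + [("hornet", "wasp")] * 4 + [("wasp", "hornet")]
)

scale = bradley_terry(votes)
ranking = sorted(scale, key=scale.get, reverse=True)
print(ranking)
```

No participant is ever asked for a number here; the scale comes entirely from which item was chosen in each pair, which is why disagreement about "how many times worse" does not enter into it.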

Aidan Twomey
Feb 4 2025 at 3:41am

Absolutely fantastic episode, and I don’t even follow football anymore; I gave up during lockdowns precisely because the love of football had become idolatrous.

Most fundamentally, the problem with VAR is that fans don’t want the correct decision, they want a decision they agree with. And if the decision is backed by technology, they get even more annoyed if they don’t agree with it; it begins to feel impersonal, faceless and extremely unfair.

KM
Feb 4 2025 at 8:50am

VAR is indeed less useful for fouls and handballs, but when determining whether the ball crossed the line or a player was offside, it is both accurate and sometimes indispensable. It is a relief that, thanks to VAR, grave mistakes like the one in the 1966 FIFA World Cup final between West Germany and England will not happen again.

Trent
Feb 4 2025 at 3:42pm

Great episode! I’ve opposed referee instant replay from the outset because the rules are not meant to be enforced at the millimeter/millisecond level, nor can you write rules that can be enforced at such a microscopic level. Further, the adage that they’re only going to review objective rather than subjective calls is folly because we’ve found out that all calls are really subjective. I believe both these points complement the guest’s comments on using VAR with the offsides rule.

With respect to the VAR discussion, I don’t think you mentioned that the referee on the pitch is buzzed by the VAR referee whenever the latter thinks the former may have missed something. So there is already a predisposition towards changing the call whenever the referee trots over to the sideline to view the monitor (e.g. “He wouldn’t ask me to look at the replay if I hadn’t missed something”).

A few more reasons I oppose referee instant replay:

* It has introduced a new error component into sports: overturning correct calls. We’ve seen that happen in US football myriad times.

* The guest seemed to imply that it hasn’t slowed down US sports much because our sports already have regular stoppages. But it’s increased the frequency and length of these stoppages, particularly at the end of basketball (especially college) games. I can’t begin to count how many college basketball games I’ve seen in recent years where it takes 15 actual minutes to play the last 2 minutes of game time.

* It has sucked the joy out of parts of the game. The guest spoke about having to wait to celebrate a goal, but it’s worse than that in US football. It’s every touchdown. You can’t have the huge roars/spontaneous celebrations until instant replay has spoken, and by then, it’s never so memorable. And forget about charging the field/court at a college football/basketball game when the clock hits zero….how many times in recent years have officials had to clear out the fans because they claim there’s 0:01 or even 0.7 seconds remaining?

* It opens up the leagues further towards conspiracies that they have their proverbial fingers on the scale. The NFL has a 3-person committee in New York reviewing all plays – Walt Anderson (ex-NFL referee), an NFL bureaucrat, and an ex-PR staffer for the New York Giants. Those are the people who made the controversial 4th down call in the Chiefs-Bills AFC Championship Game, not the officials on that field. For those who believe the NFL favors the Chiefs and wants them in the Super Bowl….well….there’s the way they claim the NFL does it.

* It has changed the sports in unintended ways that fans never asked for. In MLB, fans used to enjoy seeing the likes of Earl Weaver and Billy Martin go head-to-head arguing with umpires. Sometimes the managers did it to fire up their teams, sometimes because they felt the call was indeed wrong, sometimes to try to influence the next call. Now all fans get is a manager waddling meekly out of the dugout to ask for replay.

And so on. For fans, what’s got to be most frustrating is these are changes they never asked for. They’ve never been surveyed…never been focus-grouped…just forced to sit through years and years of tweaking a process/system that will never be perfect. And you also have entities like F1 that make arbitrary “technology assisted” decisions with different drivers in the same race (What necessitates a 10-second time penalty vs. a drive-thru penalty vs. a timed stop-and-go penalty? Whatever the particular race stewards decide – typically about 10 laps after an incident happens and they’ve viewed 20 different angles of instant replay).

So why not just let the referees on the field/pitch call the game? Traditionally if an official blew a call, he’d know it from the replays they’d show in the stadium (or maybe another official would tell him), and he’d make it up later on. Announcers even said “well, that looked like a make-up call to me.” Nobody threw a fit (well, except for the likes of Weaver and Martin)…we didn’t waste 10 minutes with referees looking at a peep-show monitor on the side of the field…the game kept moving, and we played on.

Matthew Kelly
Feb 6 2025 at 4:01am

I have a tangential interest in comparative judgement, I’m a season ticket holder at an EPL club, and I’m a long-time listener to EconTalk. This was clearly going to be an interesting episode, and it exceeded my expectations.

And then I look at Daisy’s book and find that its foreword is by someone who regularly appears on one of my other favourite podcasts, and one of the quotes on the back is by a different person who used to co-host one of my other favourite (sadly no longer made) podcasts.

It’s the Venn diagram with the deepest overlap I’ve ever encountered!

Luke J
Feb 7 2025 at 7:55pm

I like the video replay because it enhances the game: we get to marvel at the players’ skill and power as they make (or almost make) one amazing play after another.

I suspect video replay is leading to rulebook expansion. The NFL rulebook on Completing a Catch (Rule 8 Section I Articles 3-4), reads simply enough but then you remember that kids don’t need several clauses to know a catch vs non-catch vs catch and fumble. But slowing down the motion in video replay and being able to compare with other plays naturally leads to the need to define the catch criteria for some kind of consistency.

The discussion on idolatry is really interesting too. Why do we care so much about getting the calls right? These are just games, right? Right?

Bob
Feb 9 2025 at 7:03pm

Yuck.

First off, if we compare how many serious controversies we had prior to VAR vs. after, the difference is huge, just on the areas where major, undisputed injustices happened all the time. Go watch the South Korea World Cup. The kick in the chest in South Africa, with the bottom of the shoe? We can always complain about anything, but the podcast itself falls straight into the issue of focusing on the continuous when there really is the discrete.

The talk about objective criteria vs comparative criteria was also aggravating. No, it’s not that one thing is easier than the other but equivalent: You are measuring completely different things, ontologically speaking, and it’s not that an aggregate of subjective views ever gets you an objective anything, especially when grading papers: If you read the paper evaluating this, all you see there is statistical smoke and mirrors, which will agree with some people’s biases, so they’ll swallow it whole.

It’s the typical Russ problem, which he seems unwilling to look at: When agreeing with the guest, the insight disappears, along with any attempt at rigor. And as we all know, he agrees with a very high percentage of guests.

neil21
Feb 10 2025 at 5:12pm

Loved it, shared the episode with my wife, who loved it too. No higher praise.



AUDIO TRANSCRIPT
Time | Podcast Episode Highlights
0:37

Intro. [Recording date: January 6, 2025.]

Russ Roberts: Today is January 6th, 2025, and I want to remind listeners that voting for your favorite episodes of 2024 closes on February 9th, so please go to econtalk.org to vote.

My guest today is Daisy Christodoulou. She is the director of education at No More Marking. She has a Substack with that same name, No More Marking, and she has written three books on education. Her latest book, which is our topic for today, is I Can't Stop Thinking about VAR. VAR, in this case, is the Video Assistant Referee, a technology used in football--what Americans call soccer--but we'll call football throughout this conversation. And, VAR was added to the game to improve decisions made by referees about fouls, offsides, other matters.

And this book that you've written, Daisy, is on the surface about football, but it's really about much more than that. It's about something much deeper. It's about our endless human desire for perfection, the challenge of designing policy and rules that actually achieve what we want. I love this book. It actually captures what I think of as the essence of the so-called Coase Theorem, which we may get to later. Daisy, welcome to EconTalk.

Daisy Christodoulou: Fantastic. Thanks for having me, Russ.

1:54

Russ Roberts: When was VAR [Video Assistant Referee] introduced, and why has it been controversial? Isn't it a wonderful thing to have more accurate decisions? Isn't that more fair, more just, and leads to more accurate outcomes?

Daisy Christodoulou: Well, definitely, no one would argue with any of those things. We all want more accurate decisions in sport and in life. And, in fact, before technology was introduced to football's decision-making process, the thing you heard people say all the time, is that we just want more right decisions. We just want more right decisions.

And, they would say the livelihoods of players and managers depend on more right decisions. And, if you get a really bad decision at a crucial moment--football is a billion-dollar industry, there's so much money riding on these decisions.

So, VAR, which was the system to get more right decisions, had a bit of a staged introduction. It was used in the World Cup in 2018, and then it was introduced in the English Premier League, which is the league I watch. I follow a team; we're currently in the Premier League, which is the world's most-watched league. It was introduced there in the 2019-2020 season, the second half of which obviously coincided with COVID, with all the stadiums emptying, everyone watching on TV at home.

And so, in a way, it was sort of appropriate that it was introduced in that season, because it felt like we were all watching everything through a screen. There were no crowds in the ground anymore.

And so, it's been around for a while now--been around for quite a few seasons. And it's been enormously controversial, so much more so than I think, certainly, anybody anticipated, and much more so than the similar systems that you have in other sports.

So, American football was one of the first to introduce a system like this. I don't know lots about American football--you'll know a lot more than me--but my understanding is it's been fairly well accepted as part of the game.

And, you've got equivalent systems in cricket and rugby, which work very well. Tennis, you have the Hawk-Eye System, which has worked so well they're actually getting rid of line judges in a lot of tournaments.

But, football just seems to be this outlier. It doesn't seem to be working. And they keep making tweaks to it and it still doesn't work. The tweaks kind of make it worse.

So, it's been this really interesting case study to explore a lot more issues around technology, progress, authority, transparency. It's thrown up all of these issues.

4:14

Russ Roberts: We had a recent episode with Emily Oster on healthcare. We talked about the importance of nuance. And, nuance is about giving people information and recognizing that some things are complicated. And, what I thought after reading your book, is VAR overemphasizes nuance. Using it is to say that 'Well, close enough isn't enough. We've got to do better than that, so let's get it right.' And, why is that so problematic, the desire to get something right?

And, I would just say, by the way, in American football, the introduction of electronic replay--the big issue there is whether something is actually a catch, which is very similar to the issues that come up in European and world football, because we all know what a catch is, and we all know what a goal is. We all know what a handball is. When we see it, we know it. And, yet, once we get down to these details of making sure, absolutely sure, somehow it gets harder, not easier.

Daisy Christodoulou: Absolutely. So, I listened to that episode with Emily Oster. I'm a big Emily Oster fan. And, I think actually, some of the things you said about the challenges of being nuanced in discussing complex topics like public health, I talk about them in a chapter called "Transparency"--maybe we could come back to that.

But, just on that issue, the question you've asked there, about why is it so problematic to want more right decisions? I think first of all, another thing you talk about a lot in this podcast--you know, I'm a fan--is trade-offs. And trade-offs are inherent to thinking as an economist. And, I think for all of that talk about more right decisions--'We want more right decisions. We want more right decisions'--what we realized when VAR came in is actually that is not the sole thing we want to optimize for. That is actually not the only thing we care about. We care about so many other things, some of which we hadn't realized we care about.

So, one of the issues is: some of the checks, some of the time it takes to make these checks, it's a really long time. So, you get these decisions where they're scrutinizing replays for five or six minutes. And, that's a big difference with American football--and cricket--because those are sports that have more natural breaks in play. Football [soccer] is not like that. Football is very fluid, and I don't think any of us pre-VAR appreciated how much the fluidity and the spontaneity of the game really mattered. And, that is actually potentially something we should be thinking about, and we are trading off against that.

And the biggest issue people have is--football is a very passionate game. It has its origins as a working-class sport in England. You get these big, big crowds. And, one of the joys of watching football--and I have a season ticket; I know what this is like--is people leap up in joy when a goal is scored. And, you now have the situation where they do that, and suddenly they go, 'Oh, no, no, no, no. We've got to check it.' And, you have this three-, four-, five-minute pause sometimes, and the players are standing around getting cold.

So, that's one thing. There's a trade-off in there.

And then, the other thing is: It's not just the trade-off, but even when you've done these five-, six-minute checks, people are not convinced that the result is the right one. And that's something we could explore a bit more as well.

7:28

Russ Roberts: It's kind of shocking. As I said, you'd think--well, you look at it, and of course, with modern technology, there are all these different angles, often, to evaluate and assess. And, sometimes, when you look from all those different angles, and you take five or six minutes, you're still not sure.

In American football, what happens is that while that five or six minutes is going on--they're talking to an expert--in New York usually--who is a former referee. And he explains what the right decision is. And, a lot of times, it's not what they decide. And it's infuriating to the team, obviously. They've got their good fortune, or good play reversed.

But, as you point out--and I think this is a great insight--there aren't that many goals scored in football [soccer]. So, when one of those is reversed, the dopamine surge that you enjoyed is now shot.

Daisy Christodoulou: Absolutely. So, that is one of the things about football, that it's very hard to score a goal. So, compared to other sports, the goal really matters. And, you get one reversed, it's a really, really big deal. And, yes, this point about scrutiny, and this point about when you apply more scrutiny, you don't always get more clarity: you actually potentially get more confusion.

And, I think you kind of touched on this a little bit in that Emily Oster episode, which I thought was a fascinating discussion, and I actually wrote down one of the things that I think you or Emily said. She said that public health officials, they can erode trust by being unwilling to accept uncertainty. And, I think that's true. But I also think the reverse is true: that you can erode trust by engaging too much with uncertainty.

And, as you say, if you are watching a match, and you see an expert on the TV pronouncing various, 'This is absolutely the right answer, and this is what this decision should be,' and then the expert on the pitch disagrees with them: That kind of uncertainty is really corrosive. That you've got two people in positions of authority who should be agreeing, disagreeing with each other. And, we've seen exactly the same thing with football.

So, one of the TV channels for the tournament in the summer, employed--interestingly it was an American referee--they employed her to discuss the big decisions in the breaks. And, there was one very high profile decision where she completely disagreed with what the on-field referee and the VAR ended up doing. And, this was a very high-profile match that everybody was watching. And it was just incredibly corrosive for the authority of the officials.

So, if you have these situations where--and I had a long chapter where I discussed transparency--if you have situations where the transparency is exposing some genuine flaws in how the system works, this is quite problematic.

And of course, what you hope is that the transparency is shining a light that will lead to improvement. That's the justification for transparency: that we can see the problems and we can solve them.

But, one of the things I discuss in that chapter--and I am, on balance, in favor of transparency, but I think we have to be honest that unless you can use the transparency to get improvement, potentially your first step is that you've made things worse.

10:39

Russ Roberts: Yeah, you're right. I don't want to miss this quote--it's one of the great EconTalk quotes of all time that isn't about EconTalk. You quote Randolph Nesse on the topic of tradeoffs. You say--this is Nesse:

The body is a bundle of tradeoffs. Everything could be better but only at a cost. Your immune system could react more strongly, but at the cost of increased tissue damage. The bones in your wrist could be thick enough that you could safely skateboard without wrist guards, but then your wrist would not rotate, and you could throw a rock only half as far. You could have an eagle's ability to spot a mouse from a mile away, but only at the cost of eliminating color vision and peripheral vision. Your brain could have been bigger, but at the risk of death during birth. Your blood pressure could be lower at the cost of weaker, slower movement. You could be less sensitive to pain at the cost of being injured more often. Your stress system could be less responsive at the cost of coping less well with danger.

That's the end of the quote of Randolph Nesse.

You go on to write:

There are a lot of tradeoffs in other walks of life. Indeed, you can make the argument that all solutions to any problem, technological or otherwise, are not really solutions, but tradeoffs.

And, of course, that's one of our mottoes of this show, 'No solutions, only trade-offs.' Along with 'It's complicated.' And they both apply incredibly well to these issues. You want to add anything about trade-offs?

Daisy Christodoulou: Yeah, absolutely. So, trade-offs are absolutely central to this story. And, as I say, I think what we didn't realize is that there were all these things that we did like about football, and we didn't realize that introducing a technological review system would be in tension with these other things, like simplicity, spontaneity, the flow of the game. I think that's really important.

The other thing I would say about trade-offs--and I go on to say this in the chapter that you quoted from--is that what you want to do, if you're mathematically modeling this, is obviously put your parameters into a spreadsheet, draw the curve, and optimize the space under the curve.

Obviously, there are some parameters we don't really have metrics for, like simplicity.

So, what do you do when you don't have a number for it?

And then, the other thing is, is that you can't keep trading off limitlessly. And often, what you're trying to do is get the best of both worlds. So, I talk about essentially, the conflict, the trade-off at the heart of--a lot of decision-making is consistency versus common sense.

Russ Roberts: Talk about that.

Daisy Christodoulou: That's a fundamental trade-off.

Russ Roberts: Yeah.

Daisy Christodoulou: Yeah. So, we want a kind of common-sensical approach to rules in all walks of life. We don't want absurd outcomes. But, if you do that, you then have to really allow individual decision-makers quite a bit of discretion. And, when you allow discretion, what you get is a lack of consistency. So, because you have individuals operating in different ways, they will be inconsistent. And you have that in football.

And, before VAR, the thing everyone would do, is you could line up lots of videos of handballs, and you could say, 'Well, why was that given as a handball and that wasn't? That was inconsistent.'

And then, once you get inconsistency, you get accusations of bias. And, the same happens in the justice system.

And, the reason why most countries have introduced some kind of sentencing guidelines for judges, is, if you allow judges to completely have discretion over the sentences that they give someone, then often they will respond to the particularities of the case, but you'll often get very, very inconsistent decision-making, and you will see one person getting 10 years in prison for something that someone else gets a slap on the wrist for. People will say, 'How can this be possible?'

So, you have this tension between consistency and common sense. And, as I say, I think you're trading off.

And, because they're not things you can really put a number on, and you can't plug them into a spreadsheet, as I say, and actually literally optimize for the space under the curve--what you are doing instead, is you're kind of trying to do your best.

And, the problem you can have is that you are aiming for the best of both worlds, but the worst outcome is when you get the worst of both worlds. And, I talk about that in the evolutionary biology analogy as well.

I talk about optimizing racehorses for speed. So, this is something where racehorses have been bred and bred and bred over time to get faster and faster and faster. And, the trade-off--as Randolph Nesse is talking about--the trade-off is their bones get lighter. And then you get to the point [?], as I say, where their bones break and they have zero speed. So, you've ended up with nothing.

And, the thing I say is: 'You've given up everything for nothing.' And, I think that's what a lot of people feel about VAR now. They feel like, 'Well, we were promised better consistency and we haven't ended up with better consistency. We've still got the issue where people can collect videos of different handballs, and say, "These are still--some of them being given as handballs, some aren't." And, we've lost the common sense and we've lost the speed and the flow of the game.' So, it's like the racehorse with broken bones: we've tried really hard to get the best of both of them. We've ended up with nothing. And, that's the situation I think we're in with VAR.

15:50

Russ Roberts: So I think--for me, this was the most profound part of the book. I like lots of different parts of it, and it resonated with me in different ways. But, there was something in here that I'd never thought of, and it comes to this point of certain things you can't put a number on.

There are also certain things you can't observe, and one of those is human intention. So, you obviously cannot intentionally hit a ball with your hand into the goal, or advance the ball well before the goal is scored, which is a whole other can of worms that VAR has opened.

But, you're allowed to have, quote, "incidental"--I don't know what the wording is; it actually doesn't even matter. But, if in passing, in unintentional contact, a kick is made that brushes against your arm with zero intent, you don't want to stop play for that. But, of course, we can't observe intent. Not only can we not measure it, we could say it's a zero-one in this case--most of the times in life it's something actually not zero-one; we think of it that way. But, I think the most extraordinary insight that you have, is that in many of these situations we're trying to impose a categorical system--on/off, onside/offside, goal/not goal, touched by your hand/not touched by your hand--using something that is in fact continuous. So, this comes from an essay by Richard Dawkins called "The Tyranny of the Discontinuous Mind." So, give us Dawkins' argument and how you've applied it to football and this issue.

Daisy Christodoulou: Absolutely. So, I think Dawkins makes a really good point, that a lot of the things we think of as being categorical are not categorical: they're continuous. So, he gives a lot of examples, again from evolutionary biology. He talks about quite a weird one, where he talks about if we were to go back to your 200,000,000th ancestor, great-great-grandfather, whatever, it would be a fish. But, there's a smooth unbroken continuity between that fish and you. And that's quite difficult to get your head around. He says there's no sort of discrete break, really, where you can say the fish turns to a human.

The other example he gives, which is a bit more everyday, has practical applications for everyday life, and is very relevant to the discussion you were having with Emily Oster last week, is safety and risk. So, people, they'll ask scientists, 'Is this safe, yes or no?' And, safety and risk are not discrete categories, they're continuous. And, you saw that so much with, obviously, the COVID vaccines and with COVID in general.

So, what you're doing with a lot of these really difficult situations, is you've got a continuum, and you, in many cases in life, have to draw a line on the continuum, and you have to say, 'Well, we've got to draw the line. The stuff on this side is something, and the stuff on this side is not that something.' And, that line is very often quite arbitrary.

And, the word 'arbitrary'--so, the reason I'm really interested in this side, I work in assessment; and these things are hugely important in assessment, and people spill lots of ink about assessment, and they spill lots of ink about the meaning of the word 'arbitrary.' What does it mean for something to be arbitrary, where you draw that line? And, because in an assessment you have a similar thing, that student attainment is on the continuum, and we often have to draw lines. We draw lines for there to be grades, and we also draw lines to get into certain programs or not.

So, you will say, 'We will draw a line, and if you are this side of the line, you can be eligible for this very elite university or college, or what have you. And, if you are this side of the line, you can't.' And, that line is often quite arbitrary. And, often, the measurement error of the underlying distribution will mean that two things either side of that line, we cannot say with any certainty that they really are different. As far as we know, they are actually probably very similar.

Russ Roberts: Or reversed--

Daisy Christodoulou: So, this is a huge issue.

Russ Roberts: Or reversed.

Daisy Christodoulou: Or reversed. Reversed. Yeah, yeah.

Russ Roberts: When you rank a little bit higher above the line, and this went a little below, it's actually--there's error, and so--
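The point about two results on either side of a line possibly being "reversed" can be made concrete with a little arithmetic. The sketch below is illustrative only. It assumes each observed score is a true score plus independent, normally distributed error with the same standard deviation for everyone. The cut line of 60, the error of 5 points, and the function names are all hypothetical and are not taken from the conversation or the book.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_above_line(true_score: float, cut: float, error_sd: float) -> float:
    """Chance that a student with this true score is observed at or above the cut."""
    if error_sd <= 0:
        raise ValueError("error_sd must be positive")
    return 1.0 - normal_cdf((cut - true_score) / error_sd)


def prob_reversed(true_a: float, true_b: float, error_sd: float) -> float:
    """Chance that the observed order of two students is the opposite of their true order.

    Assumption: both observed scores carry independent normal error with the
    same standard deviation, so their difference has sd = error_sd * sqrt(2).
    """
    if error_sd <= 0:
        raise ValueError("error_sd must be positive")
    gap = abs(true_a - true_b)
    return normal_cdf(-gap / (error_sd * math.sqrt(2.0)))


if __name__ == "__main__":
    cut, error_sd = 60.0, 5.0          # hypothetical values
    just_above, just_below = 61.0, 59.0

    print(prob_above_line(just_above, cut, error_sd))  # about 0.58
    print(prob_above_line(just_below, cut, error_sd))  # about 0.42
    print(prob_reversed(just_above, just_below, error_sd))  # about 0.39
```

Under these assumed numbers, two students whose true scores sit one point either side of the line would be observed in the wrong order a bit more than a third of the time, which is the sense in which the line carries real uncertainty.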

Daisy Christodoulou: Yeah. So, Dawkins is completely right to talk about the tyranny of the discontinuous mind: that we do view a lot of things that are continuous as categorical.

But, I also say that whilst Dawkins is completely right about that, the opposite--a mirror image--cognitive distortion, is the anarchy of the continuous mind. Which is where you think, 'Well, because everything is continuous, and because everything is on a continuous scale, well, then there is no difference between one end and the other end.' And, it's like, 'Well, no: there is.' There is still a difference between a fish and a human. There are some things which really are quite safe, and some things which are not.

So, the real burden of authority is that sometimes you have to draw a line, and you have to hold the line. And, that is hard, because there will be moments where you have to hold that line and you will probably know deep down, if you're being honest with yourself, that that line has an awful lot of uncertainty. And, that's true in so many walks of life.

21:13

Russ Roberts: Yeah. And, economists are very sensitive to this, in a way that I think non-economists aren't. The safe thing: 'What do you mean there's continuous? It's either safe or it's not.' And, that's a desire. We wish that was true, but it isn't true. Everything that is, quote, "safe" has some downsides, and everything that is--it's a longing, not a fact.

But, I think the other deep point here--I think there's a couple of things I wanted to add. Your point is that the line is arbitrary. Coase's insight, and I'm not going to go into why this comes from Coase [Ron Coase]. People--we'll link to some other episodes we've done maybe that talk about this. But, Coase's insight is once you draw that arbitrary line, you're going to change incentives. So, if you draw it way over here or way over here, you're going to get different behaviors by the different participants.

And, that will have implications for the quality of the football that you watch. It may be more just, but it means that the game might be less entertaining.

And so, that's the fundamental tradeoff I've already talked about. But the other insight--and this, I thought, was also incredibly deep--that you bring, is that: where you draw the line fundamentally has to be a question of language, not measurement. And so, what you're talking about--say, a handball--it might involve intention, or the phrase 'Clear and obvious,' or something about 'Made a difference in the play.' And, once you do that, you're un-moored. You don't want to be; in fact, that's why you put the language in there, is to exactly get the right thing. But, you're fundamentally taking a continuous variable and using a non-measurable metric to decide where the line is drawn. And, by definition, it's going to lead to discontent.

Daisy Christodoulou: Absolutely. So many really important things there. I think the point from Coase, about where you draw the line will start to then affect the underlying reality--in extremely competitive environments, completely. So, where people are targeting the line, absolutely, it will. And obviously, that happens in football, because that's literally two teams competing.

But again, where I saw that happening in my world was in assessment, because it's very important for students to get certain grades, and it's very important for schools who are being judged for accountability that their students get certain grades.

So, the reason I got interested in this line on a distribution point, is because there's a number of metrics in the English system, accountability metrics, where schools are judged by the number of students who get a certain grade. And so, the way this transmitted itself, is you would just have lots of schools running really intensive revision and preparation sessions for students just the other side of that line.

Russ Roberts: Yeah.

Daisy Christodoulou: And forgetting about everybody else. Right? And, if you did that effectively, you could really boost your position in the league tables.

So absolutely: where you draw those lines, it may be arbitrary, but it'll have really, really big real world consequences. Yeah. So, you see that as well; so I think, yes, I would agree with that.

I think the point you're saying about heights, that's something measurable. I think the thing I say in the book is: lots of things are measurable. And, one of the things about sort of measurement theory, and I think Lord Kelvin says this in some way, that: even if you can just improve your measurement a little bit, that's still an improvement. That's still giving you a better grasp of reality. So, trying to put a number on something can be quite valuable.

But, with the things like height and weight, mass, with these measurements of some of the physical world--temperature is a really good one--we have centuries of really incredible science behind them that have led us to--there is still measurement error, and there is still uncertainty of these phenomena, but we have really reduced that measurement uncertainty down to something that in everyday life really doesn't matter. And there are all these other metrics we have--new metrics--where I think they're better than nothing, but they have nothing like the precision of the physical phenomena that we've become accustomed to.

And, a lot of these are the inventions of economists. And this is why sometimes people love and hate economics.

So, I give a couple of examples in the book. There's one that's used to often decide whether new drugs are worthwhile--so, it's quality-adjusted life years.

There's another one--we talked about risks and safety--the micro-mort, which is a one-in-a-million chance of death. So, you can measure riding on a motorbike versus riding in a car versus riding in an airplane: what's the micro-mort of each? And, I think these are really useful, and I quite like them, but they're not as precise as a lot of other measurements of the physical world that we're more used to.
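As a unit, the micro-mort is just a rescaling of probability: one micro-mort is a one-in-a-million chance of death. A minimal sketch of the conversion follows. The activity labels and risk figures are placeholders invented for illustration, not real estimates for motorbikes, cars, or airplanes.

```python
def to_micromorts(probability_of_death: float) -> float:
    """Convert a probability of death into micro-morts (1 micro-mort = 1e-6)."""
    if not 0.0 <= probability_of_death <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability_of_death * 1_000_000


if __name__ == "__main__":
    # Hypothetical per-trip risks, made up purely to show the arithmetic.
    hypothetical_risks = {
        "activity A": 4e-6,
        "activity B": 2.5e-7,
    }
    for name, risk in hypothetical_risks.items():
        print(f"{name}: {to_micromorts(risk):.2f} micro-morts")
```

Putting different activities on one scale like this is what makes them comparable, though the precision of the result is only as good as the underlying risk estimate.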

Russ Roberts: I'm sure I've quoted this on the program before, and I think it's in my book, Wild Problems, but it's carved into stone, or it was when I was there at the University of Chicago, this quote from Lord Kelvin. It's not quite the way he actually said it, but this is the way it gets repeated: 'If you cannot measure it, your knowledge is of a meager and unsatisfactory kind.'

According--I forget where this story comes from--but supposedly, George Stigler was giving a tour of campus to Vernon Smith. Both men would later win Nobel Prizes in economics. And, Stigler took Smith to this quote, and he read it, and he said--he read the quote--which is again, 'When you cannot measure, your knowledge is of a meager and unsatisfactory kind.' And, Stigler allegedly said, 'And, when you can measure, it's of meager and unsatisfactory kind.'

Daisy Christodoulou: Brilliant.

Russ Roberts: That's very Stiglerian, always a very funny man. But, it's of course, very deep as well.

27:00

Russ Roberts: So, this whole question of precision--this human desire for precision--is very normal, but I think a lot of times, it gets applied to areas where it does not fit.

Daisy Christodoulou: Yeah. And, I think--that's a great quote. That was the quote, that was the Kelvin quotation I was thinking of, but I hadn't heard that follow-up, which is good as well.

Something which isn't in the book, but which I've been sort of speculating on writing about since then, is: Yes, this desire for precision and this desire for accuracy--I think there's a couple of things going on. So, I think partly it's the incredible success of precision measurement in the physical world.

Russ Roberts: Yeah.

Daisy Christodoulou: And, I think we should just dwell on that for a second, because it is crazy, the levels of precision. And, it's got to the point where we take it for granted. The technology that we use in everyday life is dependent on a level of precision that is insane. And, the methods that are used in the really high-tech fabrication plants that they have for microchips--the level of precision they have to achieve and the extent to which they have to go to achieve that, are, in some ways, just beyond belief. And we take that for granted.

And, I think there is an element here, of we take it for granted and we kind of assume then that you can get that in every walk of life, or that it should be simple to get that. And, I think what we forget is firstly, how unbelievably brilliant that is. And, secondly, just all of the work that went into that.

And so, I quote a little bit in the book some things about the development of temperature--early thermometers and the enormous hard work that went into it. And, a lot of the issues that we see--you were talking about with language--with VAR, that one of the first Frenchmen who was investigating temperature, investigating the boiling point of water, he had all these different words used to define the different stages of water boiling. And, I think one of the things we see in the development of measurement, as I say, in the physical world, is the replacement of words with numbers.

Russ Roberts: Yeah.

Daisy Christodoulou: And, when you talk about words--and I am an English literature graduate--and my day job is: the tool we use, comparative judgment, we use mostly to assess writing. And, the terrible thing is, I've become increasingly dubious about language and the ability of language to give us the kind of precision we crave, for all of the reasons that you've been talking about.

And, I sometimes think--language is not designed to give us this precision. And, in a sense, the history of the invention of number, and the way that number is used, is almost a human invention to give us something that gives us the precision that language does not give us.

And, the person who writes best about the problems of language and the inability of it to give us what we really crave, is Michael Polanyi. And, I quote him in the book; and I know you've talked about him on the podcast quite a bit. And he has this concept of tacit knowledge, which is: we know more than we can tell. There are things that we can do and we have as a skill, but we cannot really explain them in words.

And, he gives examples like learning to ride a bike. You could read a book about riding a bike, you could hear someone explain it, you could explain it to someone else; and that doesn't mean you can ride the bike.

And, that is a very good description of the issues with handball: that it is something that people who watch a lot of football, they all know what a handball is when they see it. My contention is there would actually be a lot of agreement from fans about what a handball is. But attempting to define it in words is incredibly difficult. And so much more difficult than anybody thought.

And, you've got to the situation now, where--before VAR, when the rules just existed, something that a referee would just use to interpret with a bit of common sense--the handball rule was 11 words long. Since VAR, and since we've applied all the scrutiny to it, the handball law is now 11 times as long. Has that 11-times increase led to any more clarity about what a handball is? No, it has not. There are, if anything, just more arguments about it.

And, this is probably a good example as well of another thing you talk about a lot: Hayek's "Use of Knowledge in Society"--the difference between legislation and law.

Russ Roberts: Yeah.

Daisy Christodoulou: And, what has happened in football, is you're now having this very top-down imposition of very wordy rules onto something that I think before, was more of an emergent bottom-up process.

Russ Roberts: Yeah. You talk about the opportunity of players to challenge. One way to solve this problem is to not look at every decision, but to give players, or managers, a chance to challenge a call a limited number of times. That would reduce the number of interruptions.

But, you also see this in playground basketball. Playground basketball doesn't have a referee. But everybody who plays--and it's different where you play--but on a particular court where people play frequently, there's an understanding of what's a foul and what isn't a foul. Even though, of course, you can't write it down. And, it might be much more violent in certain games, in certain courts, than in other locations and other places, but the players enforce it themselves. And, a player who repeatedly invokes a foul for his own advantage is shunned. And, that decision of what is a foul, emerges from the bottom-up of all these countless interactions and the play that they have together.

I want to just say one thing, I don't--

Daisy Christodoulou: Yeah.

Russ Roberts: No, go ahead.

Daisy Christodoulou: You did an episode about 10 years ago, with Michael Munger, on just this, on different sports and how they police themselves, and often, how a lot of that policing is done from the bottom-up. And, I think not just how they self-police, but I even think a lot of the way that referees apply laws, and officials apply laws, there is an element of a bottom-up tacit knowledge about how that was applied--at least that was the case in the pre-VAR era.

And, one of the things I say, is: what you've got with all of the technology in sport, is you have rules that were drafted in--in a lot of the English sports--the 19th century. These rules were drafted in the 19th century, in a completely pre-technological era, really. And, what we are now attempting to do is to graft kind-of 21st century technology onto this system of laws that was not designed for that level of scrutiny. And that's often where you're getting the tension and getting the clash.

Russ Roberts: I'll just say one more thing about tacit knowledge, then I want to segue to something else.

Poetry is an attempt to explain things that we can't say in normal words--in prose. And, in general, you'd think that would be inferior: an obscure set of words that has multiple interpretations, that some people can't even access--that that could more capture these ineffable human experiences like love, regret, sadness, poignance, bittersweet. These are all things that prose struggles with. And, we have really good languages, but they're not enough. And, poetry often gets closer. And sometimes music gets closer without any words, which is maybe ironic, maybe not.

Daisy Christodoulou: No, I think that's very true. And, just in the way, as I said, sort of non-linguistic ways of communicating meanings, music, mathematics. We're a wordy culture; and we are, in lots of ways, a legal culture, and the law is based on words. And, as I say, I'm a literature graduate, and my day job is assessing writing, so I don't want to be too down on this. But, I guess because I run up against it every day, I see the limitations of it as well. I see the strengths; I see the limitations.

I mean, if you want to take, again, an evolutionary psychology perspective, what is the point of language? Is language a truth-seeking missile? Is that its aim? And, there's a lot of people in evolutionary biology who say, 'No, that's not its aim. Its aim is essentially to help you to tell nice stories about yourself,' or 'Its aim is to help you to lie.'

So, language--the vagueness and the imprecision of language--is not a bug, it's a feature. That would be what a lot of people would say. And, maybe I don't want to go that far, but we can't just communicate with a language. It does have its strengths, but it can't do everything. And, I think we're trying to press it into service in places where it's just not equipped to do the job.

35:03

Russ Roberts: I want to try for you to give the flavor--you have a lot of different potential improvements for this current world we're in with VAR and the Premier League, and serious football fans can dig into those and make their own assessment.

But, there's a piece of your suggestion--I would say a piece of your set of suggestions--that's pretty impractical but extremely interesting. And so, I want to digress on it, and I want to apply it alongside your day job. Why don't you talk about comparative judgment and how you use it in assessing writing? And then I want to talk about how one might--even though I think it's a bit far-fetched, but extremely interesting--how you might apply it to, say, enforcing football regulations.

Daisy Christodoulou: Absolutely. So, yeah, comparative judgment is what I do as my day job. So, the organization I work for, No More Marking, we use it to assess writing. So, I'll just say a bit about what it is. So, comparative judgment, it rests on the psychological principle that as human beings, we are not very good at making absolute judgments. We are much better at comparative judgments.

And, I'll give you a simple example. If someone walks into the room you're in at the moment, and I say, 'How tall is that person?' That's an absolute judgment. If two people walk into the room you're in, and I say, 'Who is taller, the person on the left or the person on the right?' that's a comparative judgment. And, I hope you can tell from that very simple example, that the comparative judgment is just much easier. You are always going to get that right. The absolute judgment is much harder.

If you ask 100 people that, they will come up with different numbers. Whereas you ask them, 'Left or right, who is taller?' they'll get it right. They'll agree.

And, what you can do, is you can have lots of people make lots of comparative judgments, and then you can use an algorithm to combine all of those judgments to create a measurement scale. And, the person who first developed this algorithm--the law of comparative judgment and the theory behind it--was an American, Louis Thurstone. And, he developed this back in the 1920s. So this is not a new idea. What we have done--and other people have done, what you can do now--is you can plug that algorithm into a piece of software, and you can instantly crunch all of the decisions. And you can start to do very interesting things.
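The step of combining many paired judgments into a single scale can be sketched in a few lines. The example below is not Thurstone's original formulation and not No More Marking's software. It substitutes the closely related Bradley-Terry model, fitted with a standard iterative update, because it is compact enough to show in full. The function names, the `prior` smoothing parameter, and the sample judgments are assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_bradley_terry(judgments, n_iter=500, prior=0.5):
    """Turn (winner, loser) pairs into a score per item on a log-strength scale.

    Assumption: each item also gets `prior` fictitious wins and `prior`
    fictitious losses against a phantom item of strength 1. This keeps the
    scores finite for items that never lose (or never win) and anchors the
    scale near zero.
    """
    items = sorted({item for pair in judgments for item in pair})
    wins = {item: float(prior) for item in items}
    meetings = {item: defaultdict(int) for item in items}
    for winner, loser in judgments:
        if winner == loser:
            raise ValueError("an item cannot be compared with itself")
        wins[winner] += 1.0
        meetings[winner][loser] += 1
        meetings[loser][winner] += 1

    strength = {item: 1.0 for item in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            # Contribution of the phantom opponent (2 * prior comparisons).
            denom = 2.0 * prior / (strength[i] + 1.0)
            for j, n in meetings[i].items():
                denom += n / (strength[i] + strength[j])
            updated[i] = wins[i] / denom
        strength = updated
    return {item: math.log(s) for item, s in strength.items()}


def prob_judged_better(score_a, score_b):
    """Model's probability that the item scored score_a is picked over score_b."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


if __name__ == "__main__":
    # Hypothetical judgments: each tuple is (piece judged better, other piece).
    judgments = [
        ("A", "B"), ("A", "B"), ("A", "B"),
        ("B", "C"), ("B", "C"), ("B", "C"),
        ("A", "C"), ("A", "C"), ("C", "A"),
    ]
    scores = fit_bradley_terry(judgments)
    for piece in sorted(scores, key=scores.get, reverse=True):
        print(piece, round(scores[piece], 2))
```

The output is a position for each piece on a continuous scale rather than only a rank order, so the gaps between pieces carry information too.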

So, what we do with students' writing is we will put all of the students' writing into our system. So, let's say we will run assessments with maybe 100,000 pieces of student writing. And then what we'll do, is we'll get the teachers--all the teachers [?actually?]--and we'll get them to make lots of decisions. So, not just one or two, but lots and lots of decisions: lots of these paired decisions, these comparative judgments, where they'll be looking at two pieces of student writing, and they will say, 'Which is better, the piece on the left or the piece on the right?'

And, this is where we go back to the issue about language and about what is quality, is that: before comparative judgment came along, what people would be doing, is they would be looking at those essays one by one, using absolute judgment, and they would be using a rubric, a mark scheme. They would be using a prose description of the quality that that piece of writing should embody.

And the problem with that, is all the problems we've talked about, is that the prose rubric--the mark scheme--does not do a good job of capturing what a good piece of writing is. Now, when you do comparative judgment, the teachers can look at the two pieces. There is no rubric. We have one criterion, which is: the better piece of writing. And the teachers can make a professional judgment about what they think the better piece of writing is.

And then--this is the crazy thing, which I never get tired of, even though I've been doing this for nearly 10 years now--is that when you get people to make these judgments, in what seems this incredibly subjective way, with no tick list, with no mark scheme, with no rubric, with just one very open-ended criterion, they have very, very high levels of agreement. And, when you get them to mark those essays the traditional way, with the mark scheme and the rubric, they have much lower levels of agreement.

So, you have this weird paradox, in that what feels like an incredibly subjective method of assessment, the data shows it is actually really quite objective. And, the flip side is true: That when you have this very objective measure--seemingly very objective measure of assessment--which has all these tick lists, and you can say, 'Does it feature this? Does this piece of writing feature that, does it feature this?' But, when you crunch the numbers, people do not agree at all. The agreement is at a very low level, so it's actually very subjective.

So, we use this at scale, in a number of countries. We use it in a lot of U.S. schools: we work with a number there as well. And it works really well; and teachers like it. And, the way I like to talk about it is it's almost like a machine or a method for capturing tacit knowledge.

Because people always say--they get worried before they've done it--and they go, 'What are people judging on?' You know, 'What are they making their decisions on?'

And, they're making their decisions using tacit knowledge. Just that tacit knowledge of 'I see that piece of a student's writing, and there's just something that's maybe ineffably good about it. And, there's this piece here, which just isn't as good.'

And, they make those decisions and they agree.

And, the other thing it does--and going back to the point about Coase, about incentives, and where you draw the line causing incentives--is the problem, again, with words, is that when you try and define things in words, you will often get these distortions. And, we have a little collection of all the distortions that the traditional rubric, or mark scheme, causes.

And a classic example--let me give you a classic example. In England, we have a mark scheme, which, it's a part of the curriculum, which talks about fronted adverbials as being a kind of marker of sophisticated writing. And, a fronted adverbial is really just--it's when you say 'Suddenly, I woke up.' 'Suddenly' is the fronted adverbial. It's an adverbial at the front of the sentence.

And, this has become something that people have--it's rewarded in the system. And so, you'd think, 'Well, isn't that a good thing?' because it does make their sentences more original, and it does make them sound nicer.

And no, because what happens is people teach to that. And then you have children using fronted adverbials that just don't make sense.

So, my favorite example is: 'Forgettably, he crept through the darkness.' And so, that's the point where you have this incentive. And that student will then get a better mark than a student who has not used a fronted adverbial but has written something really very good, because they've got the tick on the list.

And, comparative judgment eliminates that. And, I've given you that example, but I could multiply examples. There's examples about shifts in informality of register, which lead to students shoehorning in slang wherever they can. There's all these kinds of things.

And, the same thing happened in football--and this is why I got interested in it--because in football, what you see with the handball rule, is you've got to a situation where the ball brushes the hand and it can be given as a penalty. So, you now see defenders, when they are defending, particularly in the penalty area, they will tuck their hands behind their back, which is completely unnatural. If you were running to try and defend the ball, you would have your hands by your side, but they tuck their hands behind their back. And, even sometimes, you suspect that there are players, attackers, who are trying to hit the ball against the defender's hand to win the penalty.

So, this is all very economic, see, because it's all about incentives and how incentives shape behavior. But, what I'm saying about comparative judgment, to go back to that point, is that, as I say, it's a method of capturing the tacit knowledge of what people think is a good piece of writing, and potentially what is or is not a handball.

42:23

Russ Roberts: Before we get to the application of how we might improve football decision-making, I want to talk a little bit more about the no more judgment--I mean, No More Marking and the comparative judgment. You said, as if this was a crazy idea, that some attackers might try to kick the ball into the hand of a defender to draw a penalty. But, of course, attackers fall down all the time in the penalty area, in hopes of having a different kind of violation, a foul, an illegal tackle; and this of course, changed the game. As soon as we said, 'Well, we need to protect attackers, and we need to make sure that tackles are not illegal,' of course that changed the behavior of attackers radically, because the reward is enormous.

So, if you can fool a referee--and this happens in almost every sport--especially in basketball, it's very well-known; it's called diving in football, and I can't even think about basketball anymore.

But, I want to go back to the grading, the assessment of writing, because it's such an interesting thing.

I taught economics for 30 years in the classroom. I graded all my own papers. Occasionally, I would use a TA [teaching assistant], but when I had a midterm or final which were a huge portion of the grade, I felt it was my obligation, even in a class of hundreds of people, to do my own marking. And I of course, understood marking very well, until at the end of a long night, I'd realized I'd read 12 essays--and I'd read, by the way, I'd grade one question at a time, so that I could be consistent. I wouldn't want to just grade a whole set of questions by one student: I'd grade Question 3. So, I'd read 100 examples of Question 3; and by about the 20th, I started thinking, 'Wait a minute, did I give that a four out of five? What did I give that one 10 minutes ago?' And, I'd go back and look at it, and think, 'Wait a minute, that wasn't a four'; and etc., etc., etc.

And eventually, often, I would do exactly what you're talking about. I would line them up by quality. But, when you have 100 or more--even when you have 50--it's extremely difficult to do it. And, it started to really bother me, because students cared a lot, and they of course, felt it was unfair. I'll have a lot more to say about that another time.

But what I want to say is this, just to let you clarify something I think listeners might be unclear about: when you hear 'comparative judgment,' you think, 'Oh, it's all relative'--meaning it's your standing compared to your peers. That's not really the point, because you don't have to give 20% Fs, Ds, Cs, Bs, and As. Everybody could get a B or higher, for example, if they were all really good. But, where you drew the line between an A and a B, you'd have to decide that when you looked at the actual essays. You might choose a rubric of some kind. But this idea of using massive numbers of paired comparisons is a really genius idea, and it's fascinating to me.

Daisy Christodoulou: Thank you. Yeah, we think that, too. So, you're absolutely right, you've hit on the absolutely crucial bit at the end there. And, this is the number one misconception we have about comparative judgment, that people say it's all just rank ordering.

It is so much more sophisticated than that. And, within the literature on assessment, rank ordering is actually a completely different technique. So, you read papers in the literature on assessment, and comparative judgment is one technique, and rank ordering is another. So, comparative judgment will give you a rank order, but it gives you so much more than that. And, when you use it at scale, it gives you a tremendous amount more than that, and it will essentially start to give you, if you use big enough numbers, effectively, kind of an absolute standard.

So, let me give you an example of what I mean. We started doing what we're doing in the United Kingdom, in 2017, running these very big--basically crowd-sourced--comparative judgment sessions. And, we would get teachers from schools to upload the writing of all their students, and those same teachers would judge them. And, we would set it up such that all of the pieces of writing from the different schools, they're all linked. So, a typical assessment window for us will have maybe 50,000 pieces of writing in it, and we'll have maybe 5,000, 10,000 teachers judging, right?

And, this is in the chapter where I talk about the wisdom of the crowds. This is very wisdom-of-the-crowds as well, because you're aggregating judgments. And, it's also--you talk, as I say, about Hayek's "Use of Knowledge" quite a bit--this is decentralized. Every teacher, it doesn't have to be--the traditional way of assessing writing, you have this very hierarchical system. You have an expert examiner, and then lead moderators. And, our system is very decentralized and quite democratic, like every teacher inputs into it. We do have metrics where we can see that they're taking it seriously, and we can measure their internal reliability. But, we're essentially trying to get lots of different opinions from lots of different people.

And, then, to go back to your point about not being rank ordering, what we do--as I say, we started in 2017--we link all of our assessments over time, okay?

In our first assessment, we were coming up with a distribution of the scores for 50,000 students, maybe. And then, when we ran our second one, we linked it to the first one, so they're on the same scale. And then, we keep linking. So, we've now got this giant scale, that's effectively representing the scores of all the students who have taken part over these last few years; and then you can decide where you want to draw the line, as you said.
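The conversation does not say how the sessions are linked, so the following is only a minimal sketch of one common approach, assuming a handful of "anchor" scripts are judged in both sessions. The new session's scores are shifted by the average gap on those anchors so that both sessions sit on one scale. The function name `link_to_base` and all of the scores are hypothetical.

```python
from statistics import mean


def link_to_base(base_scores, new_scores):
    """Shift a new session's scores onto the base session's scale.

    Assumption: scripts that appear in both dicts are shared anchors.
    """
    anchors = sorted(set(base_scores) & set(new_scores))
    if not anchors:
        raise ValueError("need at least one shared anchor script")
    # Average gap between the two scales, measured on the anchors only.
    shift = mean(base_scores[a] - new_scores[a] for a in anchors)
    return {script: score + shift for script, score in new_scores.items()}


# Made-up scores: the second session happens to sit 10 points lower.
base = {"anchor1": 50.0, "anchor2": 60.0}
new = {"anchor1": 40.0, "anchor2": 50.0, "new_script": 45.0}
linked = link_to_base(base, new)
print(linked["new_script"])
```

Once every session is expressed on the base scale, a proficiency line drawn on that scale means the same thing in each later session.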

So, the first step, is you use comparative judgment to create the distribution. And, once you start doing that, over time, you're getting a really robust distribution. Then what you do, is you decide where you're going to draw that line.

So, if you are someone who has incredibly high standards, and you think, 'Well, I think all these students are not writing as good as an 11-year-old should be writing,' you can set your line to be really high. You can set it at kind of the 95th, 99%, or you can set it at only 1%. You can say, 'I don't think any of them are achieving at the level I would deem proficient.'

And, you can set a bunch of different lines. And, I know in America, a lot of people use beginning, developing, proficient. You can set those lines. Go for it. Or if you're feeling pretty generous, you can turn the slider all the way down.

But then, what you can do, is once you've set that line, comparative judgment is very good at holding that line over time. So, you can then hold that line consistently into the next session, and the session after, and the session after. And, in fact, in England, a lot of the big exam boards here--so the big assessment organizations--they use comparative judgment behind the scenes to guarantee consistency over time, because consistency over time is actually one of the hardest things to achieve in assessment, right?

Russ Roberts: Oh, impossible. Yeah.

Daisy Christodoulou: And, I'll just say as well, all your anecdotes about marking, these are so true. And, again, you get into the literature on this, and go back to that point about 'Does the admission of uncertainty increase or erode authority?' There are a lot of issues with traditional assessment and traditional marking, which I think a lot of the public are not aware of.

And, that example you gave--one of my favorite studies is quite a small-scale study, is giving people 100 essays to mark one week, and then the next week, give them another 100 to mark. And, what they don't know is six of the essays are the same both times. And, you are laughing, and you are laughing because you know that those six essays did not get the same mark each time.

And so, the point I'm always really keen to point out here: whenever I talk about this, I get people who I can see what they're thinking. They're thinking, 'Look, all the other markers don't know what the standard is, and they might disagree, but I know I am right.' And, what I say is: If the issue is just disagreement between human beings, we could potentially resolve that by having an expert. But, the issue is so much worse than that. The issue is not disagreement between human beings, the issue is disagreement within the same human being, and that is not going to be resolved by hierarchy because that expert will disagree with themselves.
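As a rough illustration of the repeated-essay check described here, the sketch below compares the marks one marker gave to the same essays in two different weeks. The function name `self_consistency` and the marks are invented for illustration; they are not figures from the study.

```python
def self_consistency(first_marks, second_marks):
    """Compare one marker's marks for essays that appear in both weeks."""
    repeated = sorted(set(first_marks) & set(second_marks))
    if not repeated:
        raise ValueError("no essay was marked in both weeks")
    diffs = [abs(first_marks[e] - second_marks[e]) for e in repeated]
    return {
        "repeated": len(repeated),
        "changed": sum(1 for d in diffs if d > 0),
        "mean_abs_diff": sum(diffs) / len(diffs),
    }


# Made-up marks from a single marker; e1 to e3 were silently repeated.
week_one = {"e1": 12, "e2": 15, "e3": 9}
week_two = {"e1": 14, "e2": 15, "e3": 6, "e9": 10}
report = self_consistency(week_one, week_two)
print(report)
```

Because both sets of marks come from the same person, any gap in the report is disagreement within one marker rather than between markers.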

Russ Roberts: Yeah, I mean that's the whole idea that you think to yourself, 'Oh, he or she is a really tough grader. And, that's not fair in the other section: they get--' But it's you, yourself, are a tough grader at the end of the night, or the beginning of the night, depends on what you had for dinner. Oy. It's a really interesting thing.

50:24

Russ Roberts: But, I want you to talk about how you might apply this to football, and the idea that you would show an enormous diverse audience of players, fans, and referees, a set of videos, and let them rank--use that comparative judgment--and create a--it's not a rule book, it's going to be a--how would you describe it? A landscape, something of a continuum of what is a handball?

Daisy Christodoulou: Absolutely. So, yes; and I'm prepared to accept, as you said earlier, this is not necessarily something you're going to do overnight. It is a little bit out there. I get that. But, I think it's worthwhile thinking about potential ways you could use this. Because the other analogy I use is one of the issues with the current iteration of VAR, is that people say, 'Oh, it's using technology in sport. We've got to use technology. We've got to stay up to date.' And, my argument is: it's actually a relatively backward use of technology, because what it involves is essentially using technology to get another human second opinion.

So, my analogy is it's like going to maybe the bank for a loan in the 1980s, and you talk to the bank manager, and then you fill in an application, and the bank manager faxes that application to another branch, and gets the manager at that branch to look at it. That is not the most high tech use of technology. And that is what VAR is.

Obviously, what happens now is you fill it in online and an algorithm makes the decision. And so, what I'm saying is what we need is not a human second review: What we need is some kind of algorithmic decision support.

And so, what I think about is how would you do that for these subjective categories like handball and fouls, where we've realized that they are probably on a distribution and that there is that element of subjectivity in interpreting them, and that attempts to define the category in words has just caused distortions, and has not provided the clarity, and has given us the worst of both worlds.

So, my argument is, can we apply comparative judgment to this?

So, as I say, the first step would be you want to crowdsource it, because the other problem with VAR and with handball, is that it's felt very top-down. It's felt like you've had these rule changes and tweaks that are being imposed by a very small set of officials, on everybody else. And, what you want to do is flip that round and go bottom-up, and that's what comparative judgment is very good at.

So, my idea is you get lots and lots and lots of video clips. Let's just stick to handball for now, right? You get lots and lots and lots of video clips of incidents; and some will be handball and some won't. And, what you get, is you then get people to--when I say people, fans, players, officials, managers--as I say, crowdsource it. Get as many people as possible involved in judging.

So, you see your two clips, and you don't say 'That is a handball, that isn't,' you just say, 'Which is more of a handball? Of these two clips, which is more of a handball?' Okay?

And, everyone is a judge. You get to, hopefully, a reliable situation, a relatively reliable situation, and you get your distribution of your video clips, or incidents. And, at the bottom, this is zero, and at the top, this is 100. So, that's the first step, you've got your distribution.

Your second step is: where do you draw the line? So, again, you could do that in a crowdsourced way: you could get people to vote. So, we do this: in some of our things, we get people to vote. You could do it statistically. You could say: Well, we'll put it up at some percent. We could decide in advance. You could just get the officials to vote on that. But, the point is you then choose your line and you draw your line.

So, Step One was, get the distribution. Step Two, draw the line. If it's this side of the line, it's a handball. If it's this side of the line, it is not a handball.
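Steps One and Two can be sketched in code. The conversation does not name a statistical model, so this sketch assumes a Bradley-Terry model, a standard choice for pairwise comparisons, fitted with a simple iterative update. The names `fit_scale` and `call_handball`, the clip labels, the judgments, and the cutoff of 50 are all hypothetical.

```python
import math
from collections import defaultdict


def fit_scale(judgments, n_iter=500, pseudo=0.5):
    """Place clips on a 0-100 scale from (more_handball, less_handball) pairs.

    Assumes a Bradley-Terry model and that every clip is connected to the
    others through some chain of comparisons.
    """
    wins = defaultdict(float)
    games = defaultdict(float)  # comparisons per unordered pair of clips
    for more, less in judgments:
        wins[more] += 1.0
        games[frozenset((more, less))] += 1.0
    # A small virtual win in each direction keeps every estimate finite,
    # even for a clip that was never (or always) picked.
    for pair in list(games):
        for clip in pair:
            wins[clip] += pseudo
        games[pair] += 2 * pseudo
    clips = sorted(wins)
    strength = {c: 1.0 for c in clips}
    for _ in range(n_iter):
        updated = {}
        for c in clips:
            denom = 0.0
            for pair, n in games.items():
                if c in pair:
                    (other,) = pair - {c}
                    denom += n / (strength[c] + strength[other])
            updated[c] = wins[c] / denom
        total = sum(updated.values())
        strength = {c: v * len(clips) / total for c, v in updated.items()}
    logs = {c: math.log(v) for c, v in strength.items()}
    lo, hi = min(logs.values()), max(logs.values())
    if hi == lo:
        return {c: 50.0 for c in clips}
    return {c: 100.0 * (logs[c] - lo) / (hi - lo) for c in clips}


def call_handball(scale, line):
    """Step Two: everything at or above the chosen line counts as handball."""
    return {clip: score >= line for clip, score in scale.items()}


# Made-up judgments: each tuple reads "first clip is more of a handball".
judgments = (
    [("A", "B")] * 3 + [("B", "C")] * 3 + [("A", "C")] * 2 + [("B", "A")]
)
scale = fit_scale(judgments)
calls = call_handball(scale, line=50.0)
print(scale, calls)
```

Judges only ever answer "which is more of a handball?"; the scale and the line are derived afterwards, which is why the line can be moved without collecting new judgments.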

And then, Step Three is the hardest step: How do you apply that to a live match?

And that's where you would need some form of artificial intelligence [AI]. Artificial intelligence is very good at pattern recognition. So, the theory would be you would get a clip of the live incident in question. You would send that clip to the AI engine, which has all of this distribution within it, and you would say, 'Where does this clip sit? Where does it best fit on this distribution? Is it above or is it below?' And, obviously, to begin with, you could have it that the referee has to sign it off, to check it--someone has to check it.

But, I think that, for me--are we going to be using VAR in its current format in 20 years time? I don't think so. I think we're going to be using something that is much more, as I say, a decision support, as opposed to decision review. And, that is the best way forward I can think of, of doing that.

Russ Roberts: So, I love that. I can't put into words how much I like that, even though I'm a bit of a skeptic about AI. Because, what it's saying is--

Daisy Christodoulou: I am a bit of a skeptic, too.

Russ Roberts: Okay.

Daisy Christodoulou: So, this is why I'm suggesting that. But, I am a bit of a skeptic.

Russ Roberts: Heard.

Daisy Christodoulou: I think this is something it would be good at, yeah.

Russ Roberts: Heard. I agree. And, I think what's beautiful about it, is it's saying: 'I can't put into words what a handball is, but I know it when I see it. And, when I don't see it, I know that that's not a handball.' And, you and I might disagree about where that line is, but we know what's more of a handball and less of one. And, that's the genius insight here, of taking advantage of comparative judgment.

And then you have an encyclopedia of handballs and not-handballs; or percentage of a handball, 63% of a handball. But, since that doesn't work in real life, you can't say, 'Well, we'll give you 63% of a penalty kick because it took'--or if you score, 'We'll give you a penalty kick, but you only get 0.63 of a goal if you score against the keeper.' Instead, you're saying: 'Let the AI figure out, because it doesn't need words, and it will look at other things, and it will find those patterns.'

Really, it's a genius idea. I don't think it's going to be used, but I love it, and may it prosper.

Daisy Christodoulou: Thank you. Thank you.

56:00

Russ Roberts: I want to talk about liberals, conservatives, and postmodernists, because I think--I see distinctions between people who hold various policy positions--we'll call them liberals, conservatives, and postmodernists for now--in how they parent. So, I think people who have certain political views parent in a certain way, because--well, that's a big topic for another time. But, you apply those political ideological differences to this issue of, I would call it consistency versus common sense, or where do we decide to make the trade-off in calling a foul or not a foul. So, talk about that idea.

Daisy Christodoulou: So, yeah, I have a section on this, the sort of liberal, conservative, and postmodern kind of responses to this. And, I think the issue when VAR first came in, is that it didn't neatly map onto political positions; but essentially, people who were probably liberals, and perhaps a bit more comfortable with progress, were saying, 'Look, this has to happen. This has to happen. It's very backwards to not be using technology in sport.' And, you had conservatives--

Russ Roberts: And, it's correcting an injustice.

Daisy Christodoulou: Right, yeah.

Russ Roberts: You know: we can't let an injustice stand, so anything to--

Daisy Christodoulou: Absolutely, yeah.

So, it was about, I think, liberals being in favor of progress and being in favor of correcting injustices, and maybe being forward-thinking, comfortable with the modern world, that kind of thing. And then, your conservative is saying all the classic things conservatives might often say, which is: Unintended consequences; you don't know how this is going to play out. Change doesn't always make things better; change can make things worse. You need to be aware of that. And, yeah, why does it need to change? If it ain't broke, don't fix it--this kind of thing. And actually, football was relatively late to adopt technology. So, the conservatives did hold out for maybe a bit longer than in some other sports.

And then, I talk about postmodernists. And the postmodern reaction is almost to be skeptical that there is such a thing as justice, and there is such a thing as any of these meta-narratives, and maybe it's just all about the entertainment, and just who cares about that.

So, I talk about these three different approaches, and particularly with the conservative and the liberal, what I say is there are truths. There are absolute truths on both sides.

And, I think the central truth of the liberal in this particular case, which is really hard to avoid, is that change is always inevitably happening. And, I think you talk about this again, in one of your books about Hayek. He's talking about how you can't just--because the whole reason you have to accept that prices change is that things change in the certain supply and demand of goods, and that happens, and you have to have a framework that allows for the price to rise or fall. And actually, that kind of thing, conservatives often don't like prices rising or falling or changing. But Hayek would say you have to live with that.

And, in the sense of football, the ultimate thing, which I think did in the end--a lot of conservatives said, 'Look, we've got to go for this'--is that you could keep the technology out of the game officially, for sure; but everybody in the country is watching on TV, and watching slow motion replays, and maybe there's fans on the ground who have a phone and can see the instant replay. And then, you just have the risk that you are bringing the game--you're making the laws a mockery because everyone can see that was a terrible decision.

And, there was a classic example--I don't want to go too much into example--but there was a very, very famous example, a couple of examples in the 2010 World Cup, and they happened to be on the same day, where there were two goals scored, where immediately, seconds later, everybody watching at home knew the wrong decision had been given.

And, it was at that point where everyone said, 'This is just not feasible. You've got children growing up in the world who have never known a world without a million instant replays.' And, they are looking at this game and going, 'Why couldn't you just intervene and turn that decision over?' Right?

And that is true, that's all true, right? The liberals are right on that. And that has applications to a lot of other things.

And then, the central kind of conservative truth is probably the unintended consequences: you could always make things worse. And, maybe as well as that, all change is a form of loss. Even when the change makes things better, there might be--most people may be better, but there might be some people who are losers.

And, the unintended consequences in this case have been enormous. And, the unintended consequences have been all these things about the flow of the game, the spontaneity, the joy of the celebration, the fact that you have actually ended up not getting necessarily any of these right decisions, and you have got fans chanting in the grounds, 'It's not football anymore.'

So, it's--for me, why I got so interested in it, is a classic example, as I say, of the truths of both the kind of progressive, liberal attitude and the sort of conservative wariness of change. So, those are the kind of conservative and liberal positions I sort of stake out on this.

Russ Roberts: Yeah. I think that's fantastic. And, of course, we haven't talked about offsides, because it's complicated for non-football fans. But, offsides is extremely important, because it changes whether the game is more offense-oriented or defense-oriented. And the tenor of the game changes.

And, I think the conservative mindset is about preserving some of the flavor of the game. And, the reason this is going to change--this current world--is that liberals don't like it because it's arbitrary, and conservatives don't like it because it's changed the game in ways they think have made it worse. And so, I can't see it. I don't think the status quo is going to last. I want to close--

Daisy Christodoulou: No, I don't. I don't, yeah.

Russ Roberts: Go ahead.

Daisy Christodoulou: No, I don't think that the status quo is tenable.

And, I think just on that point of offside--again, offsides is the most technical rule in football and the most complicated rule. So, we won't get into the weeds of it. But all I will say is in terms of it proving a conservative case, the unintended consequences of tinkering with the offside rule are enormous. So, if you want a case study of unintended consequences, offside is a great example, because minor tweaks to it have these really big knock-on effects. And, the other unintended consequence of it is everybody thought--and I was one of them--everybody thought that offside was perfect for technological review--

Russ Roberts: Of course--

Daisy Christodoulou: Because it is a binary decision about whether somebody is in front or behind a certain player. So, everybody thought it would work. A lot of people did have wariness about handball and fouls. They did say, 'I'm not sure about this.' Everybody thought offside, it would work brilliantly for.

And, the biggest shock of VAR in the first few months of it working, was it was almost the most controversial decisions were the offside ones. And nobody could get their head around it. Like, 'How can this decision, which is tailor-made for technology?' Because the fundamental thing about offside is it's basically impossible for a human to actually make the judgment accurately, because you're asking them, literally, to look at three places at once.

So, the thing that nobody could believe--and it's your classic kind of conservative unintended consequence--is: How has this turned into the biggest controversy and the biggest mess of all?

And, that, I really struggled with. I had to really stop and think about it, because it was crazy.

1:03:36

Russ Roberts: So, I want to close with your last chapter, which is called "Idolatry," a surprising name for a chapter title in a book about football. But, it's a masterpiece. It's very short, this chapter. I want to start, if you could tell the story--which, this is magnificent--a bit of economics of the student who wrote an essay in your class about it's unfair or unjust that athletes make more than nurses and firefighters.

Daisy Christodoulou: Oh, yeah, this is one of my favorite stories from teaching. So, I had a student--who was great; he was a great kid. And he was a big Charlton Athletic fan. They're not a very big club in the UK; but they're a really nice family club. And, he was a big Charlton Athletic fan, and as he [inaudible 01:04:22] get older, and he--the kind of thing he always used to talk about, and he gave a big speech in class for an assessment about how unfair it was that footballers get paid so much and nurses and firefighters, and those were his two examples, don't get paid anywhere near as much. So, even though he was a big football fan, he was looking at it, and saying, 'This is crazy. You've got nurses and firefighters who are saving people's lives, they don't get paid very much. And, these footballers on insane sums of money, and they're just kicking a ball around the park.'

And, he gave the talk, a really good talk. And, all the kids in the room are nodding along, and a lot of them have moms and dads who are nurses and firefighters, they're like: 'Yeah, this is totally right, this is true.' And, it was really good.

And, the time came for questions and answers, and I said, I'm feeling a bit--maybe a bit grumpy. And I said, 'Look,' I said to him, 'Who is'--I think it was something like this--'Who is the second best goalkeeper in League One?' And, he had an answer. He's a big football fan[?]. 'Who is the third best under-21 in the Premier League?' He thinks about it, he's got an answer. And, I said, 'Who is the best up-and-coming young firefighter in the London South East division? Who is the best nurse in the whole of London?'

Russ Roberts: Or the third best?

Daisy Christodoulou: Yeah, right. And, I said to him: 'Oh dear.' I was just trying to get across--I hope I wasn't that sharp. We are the problem. We are the problem. Because we are all obsessed with football, and we should be more interested in firefighting and nursing. And we're not. And, the reason footballers get so much is we're all obsessed with it. And, the extent to which, particularly the English Premier League, people are obsessed with it, is crazy.

And, in my lifetime, it's only got crazier. You have my favorite thing, you have the random president--you know, Hollywood actors--Thai energy drinks, all obsessed with English football. And, none of it makes any sense. But it's the truth. It's the reality.

And I am as guilty as anyone. I can't stand outside this. So, I was the last person really, to be having a go at this kid in this class. And, this is the issue: I've written a whole book about it. I mean, that's crazy. We shouldn't be doing this. We should be putting our time and our effort and our thoughts into other things. But we're not.

We're not. And, that is why the title of the chapter is "Idolatry," because idolatry is when you take things that are maybe okay and good things, but you make them too important. You have the wrong priorities. And, yeah, arguably, the book is an example of that.

1:07:10

Russ Roberts: So, explain what disordered love is.

Daisy Christodoulou: So, this is a concept from St. Augustine, and it's the idea that disordered loves--that's the term a lot of the translations use. And, the idea is basically--today, I think we would say you've got the wrong priorities. You've got the wrong priorities. And, there are things which are good things, but they're not ultimate things--that's the other sort of translation.

So, St. Augustine would say--perhaps you can give an example--the love of food is a good thing. The love of wine is a good thing. If it becomes the ultimate thing, if it becomes the only thing, that's obviously very bad. You have things in the wrong order. And, the love of sport, there are lots of good things about the love of sport; but if it becomes the only thing, the ultimate thing, you have a problem.

And, he calls that, those disordered loves, that's idolatry. When you are making things idols, when you are looking up to things that should not be idols.

Obviously, what St. Augustine thinks is that God--that Jesus Christ--that should be, that is the ultimate thing, and that is the thing you should put at the heart, and everything else should be ordered around that.

And, there's a lot of people who--obviously, we are in a pretty atheist, secular age--who wouldn't agree with that. But, even if you don't agree that Jesus Christ, or God, should be at the center, I think the idea that there is an ordering and a priority we should give to things, is one that we're probably all relatively sympathetic to.

And, I think, obviously--the point the student was making--was should footballers be in a hierarchy above nurses and firefighters? Have we got our priorities right? And, I think it is important to think about that.

And, in that chapter, I go on to talk about 19th century amateurs. So, I don't actually know if you have the same kind of cultural influence in the United States. But, in the United Kingdom, all of our big team sports, the rules were laid down really by a group of quite wealthy, quite privileged amateurs, who didn't believe that you should get paid for playing sport. And, I know this might be a hard one to--again, people outside the U.K. context--to maybe get their heads around. And, I say in the chapter, 'I found it hard to get my head around this as a child, that you could play sport and not get paid to play it.' And, the whole thesis of the 19th century amateurs is that you played sport--and amateur, the root of the word is love--that you played sport for the love of it. And you played it because it built your character, and that it should not take precedence or priority in life. It was something you did to build you up for the more important challenges in life.

And, arguably, that's something we forgot.

Russ Roberts: My guest today has been Daisy Christodoulou. Her book is I Can't Stop Thinking About VAR. Daisy, thanks for being part of EconTalk.

Daisy Christodoulou: Thanks very much, Russ.