Jerry Muller on the Tyranny of Metrics

Apr 16 2018

Historian and author Jerry Muller of Catholic University talks about his latest book, The Tyranny of Metrics, with EconTalk host Russ Roberts. Muller argues that public policy and management rely too heavily on measurable outcomes as the gauge of success, which leads organizations and agencies to focus on metrics at the expense of their broader mission. The conversation includes applications to education, crime, and health care.

RELATED EPISODE

Nassim Nicholas Taleb on Black Swans

Nassim Taleb talks about the challenges of coping with uncertainty, predicting events, and understanding history. This wide-ranging conversation looks at investment, health, history and other areas where data play a key role. Taleb, the author of Fooled By Randomness and...

RELATED EPISODE

John Ioannidis on Statistical Significance, Economics, and Replication

John Ioannidis of Stanford University talks with EconTalk host Russ Roberts about his research on the reliability of published research findings. They discuss Ioannidis's recent study on bias in economics research, meta-analysis, the challenge of small sample analysis, and the...

READER COMMENTS

Allen Hutson

Apr 16 2018 at 11:01am

RR brings up the use of VaR (value at risk) in financial institutions at around minute 47 as an example of the tyranny of metrics.

I do not think that there is a better example of the conflation between the tyranny of metrics and the appropriate use of metrics in organizations.

RR’s example – I think – dramatically overestimates the flaws in VaR. Or, potentially, the “flaws” are really just the very worst examples of people intentionally ignoring the definition of a metric and redefining it in their own interest.

I believe that there are completely appropriate applications for metrics such as VaR, and similarly, there are wildly inappropriate applications of the same metric.

In my experience, probabilistic measures like VaR are extremely useful for illustrating to managers how much more variability there is in the world than they realize. Typically, people underestimate the potential losses on a transaction at the 97th percentile. Then there is a separate discussion about extraordinary circumstances that move beyond 2 sigmas.
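To make the 97th-percentile idea concrete, here is a minimal sketch of a one-day VaR computed by historical simulation. The method is an assumption (the comment does not say how VaR is computed), and the function name `historical_var` and the simulated P&L series are hypothetical, not real trading data.

```python
import random


def historical_var(pnl, confidence=0.97):
    """Historical-simulation VaR: the loss exceeded on roughly (1 - confidence) of past days.

    pnl: daily profit-and-loss figures (positive = gain).
    Returns the VaR as a positive loss amount.
    """
    if not pnl:
        raise ValueError("need at least one observation")
    losses = sorted(-x for x in pnl)  # convert P&L to losses, smallest to largest
    # Empirical quantile: the observation sitting at the `confidence` position.
    idx = min(int(confidence * len(losses)), len(losses) - 1)
    return losses[idx]


if __name__ == "__main__":
    # Hypothetical desk: 1,000 days of normally distributed P&L with sd = $1m.
    rng = random.Random(0)
    pnl = [rng.gauss(0, 1_000_000) for _ in range(1000)]
    var97 = historical_var(pnl, 0.97)
    breaches = sum(1 for x in pnl if -x > var97)
    print(f"97% one-day VaR: ${var97:,.0f}; days with a larger loss: {breaches}")
```

Note that the number says nothing about how large the losses on the breach days are; that is exactly the "beyond 2 sigmas" conversation the comment describes as a separate discussion.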

I have enjoyed this discussion a lot, but I think piling on to VaR because it is “imperfect” is precisely the kind of thing that ought to be avoided. In my view, the managers who misinterpret it, and the analysts who fail to correct that misinterpretation, are the ones who support this tyranny of metrics, not the metrics themselves, which can be very useful.

Texas Red

Apr 16 2018 at 11:07am

Excellent podcast, Russ! I want to push back a bit. Let’s say that I’m a physician, and I’m being measured on one metric. By golly, I can game that metric! No problem at all. But…if I’m being measured on 10, 20, or 100 different outcomes, the situation changes entirely. Most likely, I’ll just throw up my hands, give up on trying to game the metrics, and just try to be the best doctor that I can be by my own lights.

So, paradoxically, by increasing the number of metrics, I have increased the diagnostic accuracy of the metrics relative to a smaller number of metrics. As the number of metrics runs to infinity, the behavioral distortion approaches zero.

Obviously, the tradeoff is the cost associated with collecting the data, and that will vary from case to case.

Krishnan Chittur

Apr 16 2018 at 11:27am

There is a lot to agree with here about how metrics are used and misused. The problem gets worse after K-12 (IMO), when institutions are competing for revenues and “game the metrics”.

The “metrics” about “retention” and “graduation” rates, for example, are often used as a hammer by administrators who are totally disconnected from “education” (learning, not the earning of a credential). The effects of lowering standards and sending the wrong message to students will not be obvious for several years, and when someone down the road examines how supposedly “educated” graduates do in the real world, they may be shocked at how little these graduates actually know. Some evidence of the declining quality of the STEM degree is already visible: students now feel the need to get an advanced degree to be able to compete. The administrators responsible for this degradation will be long gone by the time the damage they have inflicted on the education/teaching process becomes obvious.

My slight disagreement is with the use of the word “business” – I have often argued that I wish universities and colleges worked like “businesses”, i.e. in a world where there is real competition AND where a “business” that offers/sells a poor product goes OUT OF business. The problem, of course, is “rent capture”: public (and even private) colleges/universities are far too powerful, in that their failures (yes, even in STEM) will not be evident/obvious until the harm inflicted is serious. I hope we can turn this around, for all of our sakes.

Nonlin_org

Apr 16 2018 at 11:50am

Interesting topic. However, the failures discussed are not due to the “tyranny” of metrics, but to other factors – the Amazon truck will never run around like the laughable socialist UK ambulance.

Education is just as measurable as healthcare and any other business – they’re all complex “arts”, but don’t expect the govt. bureaucracy to do a good job measuring and improving these fields. And yes, you can easily quantify judgement, experience and everything else desired.

Doctors declining tough cases is a market signal – an unsatisfied segment that only top doctors should address – nothing wrong with that. Same with flight duration padding – someone will be more aggressive with their scheduling if it pays. These are all very important market signals – it’s a mistake to advocate silencing them.

Russ, how do you know you’re doing a good job if not measuring? Maybe you should add number/quality of comments to number of downloads, emails, episode searches on the net, pos/neg comments, etc. etc. Sky’s the limit.

Fixing the Education problem in the US is easy:
1. Put children first, not the teacher union. The goal should be knowledge for the children, not welfare for teachers and not employment for the support personnel.
2. Open the system by letting former professionals teach – teaching need not be a career, but a skill that any decent professional should already have.
3. Import good, inexpensive teachers from abroad if needed. They would be delighted to earn US teacher salaries.
4. Measure everything and run the system on the results. Everything gets measured in real life. Every job candidate is tested before getting the job.
5. Allow and encourage competition. Children should know where they stand on a statistical chart. Schools should compete to attract children just as universities compete with each other.
6. Make education the community’s responsibility, not Washington’s or the state capital’s – local decisions mean better accountability.
7. Copy best practices from abroad. Not everything translates well, but certain patterns, such as the number of hours spent in school, are easy to recognize.

Earl Rodd

Apr 16 2018 at 2:05pm

Regarding measurement in education: education is like some other areas in which the true goals cannot really be measured in the short term. Educators correctly say that education includes many intangibles like critical thinking, the ability to adapt to new technologies, etc. Even if we could agree on what to measure and how, we would have to wait a decade or more after a change in education methods to make the measurement. Even measuring something as simple as the effectiveness of a method of teaching reading really requires waiting a decade to see its actual effects on the ability to read, read critically, comprehend, and read effectively in other subjects. Such long-term measurements don’t fit the quick measure-and-reward cycle criticized in the podcast.

Isaac Moses

Apr 16 2018 at 2:13pm

I’m not sure about the airline flight example you discussed briefly:

Russ Roberts: Yeah. But your point about ambulances reminds me of something I’ve never seen but I suspect is true, which is that, I’ve noticed that airlines will tell you the scheduled arrival of a flight, and it seems way out of line with how long the flight’s going to take. I assume that’s because they keep track of how many times a flight it–a late flight is 15 minutes or more past expected arrival, and so they just build a cushion in, to reduce their bad performance measurement.

Jerry Muller: That’s exactly the causal chain.

If Prof. Muller has industry knowledge to back his assertion up, fine. Otherwise, I’d suggest that providing an ETA with a buffer built in has value beyond gaming performance numbers. Travelers making their plans, and even internal engines for putting together connecting flights, can be better served by an ETA that will be met (at or before) with a certain level of reliability than by one that expresses the minimum flight time but is frequently exceeded.

Of course, the internal software could potentially use some expression of the flight time distribution that’s more sophisticated than a single ETA, but there’s only so complicated airlines can get when communicating to travelers; it’s not like they’re going to express scheduled arrival times as PDFs! Supposing that they can go so far as to publish two indicators – ETA and on-time percentage – which would be more useful to you as a traveler: “3:05, 70% likely to be on-time” or “3:20, 95% likely”?
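A rough sketch of how a schedule could be padded to hit a chosen on-time percentage. It assumes a hypothetical history of 20 observed flight durations and a 12:55 departure; none of these numbers come from any airline, and `scheduled_minutes` is an illustrative helper, not airline software. The scheduled duration is simply the shortest one that at least the chosen share of past flights met.

```python
from datetime import datetime, timedelta


def scheduled_minutes(observed_minutes, on_time_pct):
    """Shortest scheduled duration met by at least on_time_pct% of past flights.

    Takes an integer percentage so the cutoff avoids floating-point rounding.
    """
    times = sorted(observed_minutes)
    k = (on_time_pct * len(times) + 99) // 100  # flights that must arrive on time (ceiling)
    return times[k - 1]


def on_time_share(observed_minutes, scheduled):
    """Fraction of past flights that would have arrived at or before the schedule."""
    return sum(t <= scheduled for t in observed_minutes) / len(observed_minutes)


# Hypothetical block times (minutes) for 20 past operations of one route.
history = [115, 118, 120, 120, 121, 122, 123, 124, 125, 125,
           126, 127, 128, 130, 132, 135, 138, 140, 145, 160]
departure = datetime(2018, 4, 16, 12, 55)

for pct in (70, 95):
    minutes = scheduled_minutes(history, pct)
    arrival = departure + timedelta(minutes=minutes)
    print(f"{arrival:%H:%M}  ({on_time_share(history, minutes):.0%} of past flights on time)")
```

With this made-up history, the 70% target yields a 15:05 arrival and the 95% target yields 15:20, mirroring the two options in the comment: the extra 15 minutes is the price of the higher reliability.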

DWAnderson

Apr 16 2018 at 2:56pm

Russ’s point that the reason for use of metrics in allocating rewards is that we don’t trust managers’ judgment seems right and an important insight. It can be a crude attempt to give people skin in the game (to use Taleb’s phrase) when they don’t otherwise have it, but it carries with it significant inefficiencies of its own.

Ideally you would create that SITG in another way, e.g. an ethos of professionalism, or a bond with those in the foxhole with you in the case of the military. But that is hard and not easily replicable in all the circumstances we might like.

BTW, Isaac Moses is absolutely correct that the main reason for longer-than-expected scheduled flight times is the need to accommodate connections. (My wife works in management at a major US airline.)

Jack in the Box

Apr 16 2018 at 2:59pm

[Comment removed. Please consult our comment policies and check your email for explanation.–Econlib Ed.]

Andy McGill

Apr 16 2018 at 4:40pm

1. The same people who are going to game the system on metrics are going to game the system no matter how they are evaluated and promoted, so that is no knock on metrics.

2. The fatal flaw with metrics is that they reveal too much that is better left lied about. People can’t handle the truth. Half the school kids are below average. Half the schools are below average. All races would not be equal if everything at school was equal.

3. The only way to close the “gap” between black and white kids is to hold back white kids until they do as poorly as black kids. Nobody talks about the gap between white and Asian kids, do they? The problem is not the “gap”, the problem is poor results in black kids, but that can’t be said in today’s world.

Andrew Bellay

Apr 16 2018 at 7:52pm

I, like Russ, had a hard time believing this ambulance claim. But it appears to be true.

Unfortunately, the original report from the Commission for Health Improvement (CHI) seems to have been taken down once another bureaucracy swallowed the short-lived CHI.

Oddly enough, since the shutdown of the CHI in 2004, there haven’t been very many government generated reports that were critical of the National Health Service system in the UK — weird.

Here’s the citation for the original report in case someone else can find it:
Commission for Health Improvement. 2003. What CHI Has Found In: Ambulance Trusts. London: The Stationery Office (http://www.healthcarecommission.org.uk/NationalFindings/NationalThemedReports/Ambulance/fs/en).

Here’s the archive (HTML only).

But here’s another article that hits on the same themes.

Floozy

Apr 17 2018 at 12:14am

Very interesting listen. It reminded me of a job I had some years back. We got to annual review time and my manager came to me and said, “You are doing a great job but it is not reflected in the metrics. That makes it hard for me to justify a decent raise for you.” I told him that, since metrics are imperfect, it was his job as a manager to figure out how to do that if I really was doing well. I knew people at the company who were great at working the metrics to appear to be doing a lot of work while putting in minimal effort. It was pretty clear then, some 25 years ago, that metrics were information but not the whole story. But in a number of jobs since then I have seen managers who think that metrics are about all they need.

A VP that I worked with at one point used to talk about how metrics were the only way you could manage a large organization. I thought that if the VP had good managers under them, and could trust them to do their jobs, then that should work. Metrics might help validate that, but it would not do the entire job.

This trend appears in other places as well. Zero-tolerance policies and mandatory minimum sentences are similar: someone decides that decisions must be automated because judgment might go awry. The aim is to prevent someone from erring, but it creates errors as well, and often worse ones.

Peter Pitsch

Apr 17 2018 at 6:10am

Today’s discussion reminded me of the old joke about the nail factory in the Soviet Union. First, the commissar tells the factory manager that he will be judged on the number of nails the factory produces. Accordingly, the factory makes lots of small nails. Then the commissar, seeing his mistake, states that performance now will be judged on the weight of the nails produced. The factory produces only big nails. And so on.

The right answer is to produce that mix of nails that consumers value most given the opportunity cost of the inputs required to make them.

This is the consumer sovereignty metric. Given the prices of the inputs and outputs, what do consumers value most? How does an individual consumer weigh all of the relevant quality and price tradeoffs? The result is the emergent order that RR often speaks of.

As Coase recognized (“The Nature of the Firm”), the firm defines a boundary between the market and authoritarianism. The “firm” in this case includes the public K-12 schools and hospitals as well as the business entities discussed in today’s talk. Authoritarianism needs to rely on much more simplistic metrics that very imperfectly measure consumer sovereignty and often lead to the gaming and other problems discussed in today’s topic.

Andy McGill

Apr 17 2018 at 9:45am

If you measure graduation rates, you get great graduation rates, even a DC High School with a perfect graduation rate.

https://www.npr.org/sections/ed/2017/11/28/564054556/what-really-happened-at-the-school-where-every-senior-got-into-college

Half of the graduates missed more than three months of school last year, unexcused. One in five students was absent more than present — missing more than 90 days of school.

According to district policy, if a student misses a class 30 times, he should fail that course. Research shows that missing 10 percent of school, about two days per month, can negatively affect test scores, reduce academic growth and increase the chances a student will drop out.

John Pinkerton

Apr 17 2018 at 7:14pm

“[T]he most dramatic example is: Teaching to the test. . . . Including simply having the students practice taking a test more and more. What could be more pedagogically deadening than that?”

Perhaps not. A study published in Science and reported in the NY Times concluded that taking a test reinforced long-term learning better than studying or mapping out the material.

https://www.nytimes.com/2011/01/21/science/21memory.html

Test taking is hard. It truly engages the brain. Students can’t fool themselves. Compare that with, say, highlighting, which can be done in a brainless trance.

Scott Todd

Apr 18 2018 at 12:21pm

Voting is a metric.

“Democracy is the worst form of government, except for all those other forms that have been tried from time to time.” (Winston Churchill House of Commons Nov. 11, 1947)

The fact that metrics have pernicious effects, which they do, does not make them bad. Nor does it obviously make them worse than the tyranny of experts, which has many horrors of its own.

If this was brought up later in the interview, I apologize. I was not able to make it through this episode, which is far and away the exception rather than the rule for EconTalk.

Jakob Engblom

Apr 18 2018 at 1:54pm

I very much agree with the overall sense of this talk. Pure metrics without sense will invite people to game the system and will drive bad behaviors, as well as making the good employees demoralized.

I was reminded of a book I read in my undergrad days and which made me eternally skeptical of measured incentives: “Öststatsekonomi”, by Stefan Hedlund. In the book, he basically describes the long series of failures in using metrics to replace the market that happened in the Soviet Union and its satellite states during the communist era. People did what you measured, regardless of sense or actual usefulness. I have seen the lessons of that book apply over and over again in corporate life, as employees optimize pay at the cost of corporate well-being as a result of poorly constructed incentive systems.

Dilbert also has some good commentary on this:
http://dilbert.com/strip/1995-11-13

A.G.McDowell

Apr 18 2018 at 2:03pm

1. To continue the discussion about enlarging the number of metrics used, I would suggest that the person measuring choose a metric at random at the last minute and/or make random inspections. This makes it more practical to have a large pool of potential metrics. This should be familiar to everyone from examinations where students are not given advance notice of the questions. The fact that journalists are able to grab the front page by essentially acting as “secret shopper” inspectors suggests that this would be an improvement over the current situation: if it were done, most newsworthy situations would be detected and fixed by the official inspector before a journalist noticed them.

2. To measure an education system by its ability to reduce gaps might be to use a dysfunctional metric. An advance in education which increased efficiency uniformly, allowing all students to progress at a 10% higher rate, might have the effect of increasing gaps, thus being rejected by such a metric.
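The arithmetic behind point 2 can be sketched with two hypothetical students whose yearly gains (12 and 8 points, invented for illustration) are both raised by the same 10%. Everyone improves and the ratio between them is unchanged, yet the absolute gap widens, so a gap-based metric would score the improvement as a failure. `scores_after` is an illustrative helper; exact fractions are used only to keep the arithmetic clean.

```python
from fractions import Fraction


def scores_after(gains, years, speedup=Fraction(1)):
    """Achievement after `years` for students with the given yearly gains, all scaled by `speedup`."""
    return [g * speedup * years for g in gains]


gains = [12, 8]  # hypothetical yearly gains: a faster and a slower learner
before = scores_after(gains, 5)
after = scores_after(gains, 5, speedup=Fraction(11, 10))  # uniform 10% faster progress

gap_before = before[0] - before[1]
gap_after = after[0] - after[1]
print(f"slower student {before[1]} -> {after[1]}; gap {gap_before} -> {gap_after}")
```

Here the slower student goes from 40 to 44 points, while the gap grows from 20 to 22.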

Andy McGill

Apr 18 2018 at 8:15pm

Have you ever been on an airline and they hurry to get everyone seated, then pull back the extended doorway, then just sit without moving for a while?

They “departed” on time.

Doug Iliff, MD, FAAFP

Apr 18 2018 at 9:20pm

The medical profession is being inexorably driven to “pay for performance” by the regulators who run Medicare– which, with our aging population and $100,000 per year cancer drugs, pretty much dictates policy for the rest of us.

RR’s comments about the importance of judgment combined with metrics are right on target. Let me give a couple of examples.

Our local Blue Cross organization, by far the dominant insurer in Kansas, has limped its way into P4P over the past few years. I get small bonuses for hitting a target for mammograms; fair enough, because I have a lot of influence over patient behavior. Even better, I got bonuses for meeting the standards of a national organization for Heart/Stroke and diabetes management. Those are really important. But they needed more metrics, and that’s when we drifted off into Never-Never Land.

Take, for instance, the worthy goal of reducing antibiotic usage for predominantly viral illness. So they instituted a metric mandating that no more than 20% of claims with a diagnosis of “bronchitis” be treated with an antibiotic within three days.

Here’s the dilemma. In my office, a patient calling on the phone talks to a registered nurse right off the bat, not a phone tree attached to a voice mail, or a medical assistant incapable of a clinical judgment. That way, all the folks calling with viral symptomatology are screened away from an office visit. This leaves only patients who are potentially suffering from pneumonia or a more serious bronchitis.

Therefore, because of my efficient office, I will see only the sickest patients, and my antibiotic prescribing rate will inevitably exceed the 20% threshold.
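A back-of-the-envelope sketch of this selection effect, using invented numbers: suppose 100 patients present with "bronchitis", 15 of them with an infection that warrants an antibiotic, and assume the doctor prescribes only for those 15. Nurse triage that keeps 60 of the viral cases out of the office raises the measured rate with no change in prescribing judgment. The helper `antibiotic_rate` and all counts are hypothetical.

```python
def antibiotic_rate(bacterial, viral, viral_screened_out=0):
    """Share of 'bronchitis' visits ending in an antibiotic, assuming only bacterial cases get one."""
    seen = bacterial + (viral - viral_screened_out)  # patients who actually reach the office
    return bacterial / seen


THRESHOLD = 0.20  # the insurer's cap described in the comment

without_triage = antibiotic_rate(bacterial=15, viral=85)
with_triage = antibiotic_rate(bacterial=15, viral=85, viral_screened_out=60)

for label, rate in [("no triage", without_triage), ("nurse triage", with_triage)]:
    status = "within" if rate <= THRESHOLD else "exceeds"
    print(f"{label}: {rate:.1%} ({status} the 20% cap)")
```

The same 15 prescriptions produce 15% of visits without triage and 37.5% with it: the metric tracks the case mix, not the quality of the prescribing decisions.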

But wait. There is a solution. Everyone who gets an antibiotic is coded as an “upper respiratory infection,” or “pneumonia,” or “sinusitis.” In this area of the body, the diseases overlap, and the diagnosis is fungible.

Does that mean I’m gaming the metric? You bet it does. Does that mean I’m dishonest? Only if you believe that the letter of the law is more important than the spirit, or that metrics trump judgment and common sense.

Oh, and by the way– they dropped the metric for certification on Heart/Stroke and Diabetes. Not enough participation. Go figure.

SaveyourSelf

Apr 18 2018 at 9:46pm

This episode might be better titled ‘Bias, Hobbling the Scientific Method, and the Market’.

The take home I got from the discussion with Jerry Muller is that when people interfere with other people’s freedom—be it through punishment or reward—the results are alarmingly unscientific.

I’ve been developing a new model to explain what was discussed in this podcast with Jerry Muller. I’ll share it here. I’m sorry that it’s long.

There are two parts to this important story. First is the ‘coordination problem’ in experiments using the scientific method of trial and error for problem solving. Second is the problem of ‘bias’ distorting the outcomes produced by the scientific method, messing with our understanding of the world, and leading to broken models.

The scientific method, as I have come to understand it, is a practice of running multiple prospective independent experiments and comparing their results. There are more rigorous definitions for the scientific method which are highly useful in very simple systems (like a vacuum) but the definition I have given is useful in complex systems, which gives it an advantage over the more rigorous forms of the method, at least in Economics. In any case, the scientific method as I’ve just defined it is a formalized method of trial and error. Understanding trial and error is critical to understanding this model and markets.

Science is hard. Running multiple, separate, independent experiments is difficult for individual humans because a single person’s mind has a really hard time thinking detached from itself. This is why, according to this model, freedom is so important to the function of markets: in ideal markets, individuals acting without coordination will approach identical problems in different ways. That means ideal markets are, literally, machines of trial and error in compliance with the scientific method. They run multiple independent prospective experiments and compare the results of those experiments naturally and as a matter of course.

As it happens, human brains are incapable of considering experiments without bias. So we use statistics [called metrics in this podcast] to try to make a discipline of eliminating bias when thinking about, running, and comparing experiments. But statistics, even when done well, is a simplification of a complex set of results. It is inherently reductionist. Statistics ignores and destroys information in the effort to make it more ‘manageable’ and ‘understandable’. What gets destroyed and what stays is a judgment call. Judgment calls are bias. In addition, and unfortunately, according to Daniel Kahneman in Thinking, Fast and Slow, statistical information literally goes in one ear and out the other of smart college students in studies. Our brains can’t remember statistical information, apparently. So what we do instead when contemplating statistical results is convert them into something we can understand: even simpler causal stories, also called models. Models are also inherently reductionist, so more information is lost. What information is left out and what remains in the model is a judgment call. Thus models are inherently biased, and models built from statistics contain bias to the second power.

The light at the end of the tunnel is that statistical studies, their proper design, and proper interpretation are not required for the scientific method to work well in markets. All that is required for markets to work as ideal havens of scientific experimentation is freedom. Well, that’s not entirely true. Markets function ideally when Free [with high amounts of Freedom excluding Justice], Competitive [large numbers of independent participants], Informed [easy access to information], Reliable [enforceable private contracts], and Responsible [Original Property rights belong to Creators]. The requirement of freedom, though, is the key to this argument about the scientific method. People must remain free to act independently of one another for the scientific method of trial and error to function. We need the scientific method to work because the universe is chaotic, the future uncertain. Since the future rarely can be predicted, reason and deduction are usually unreliable tools. That leaves only trial and error for determining what is best most of the time. And, when it comes to trial and error, the greater the number of independent trials, the higher the probability that a solution will be discovered. Freedom makes us smarter!
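The claim that more independent trials raise the chance of finding a solution matches a standard probability result: if each of n independent trials succeeds with probability p, the chance that at least one succeeds is 1 - (1 - p)^n. The sketch below assumes full independence and an equal, made-up p of 2% per trial; real trials in markets are neither fully independent nor identical, so treat it as an illustration of the direction of the effect only.

```python
def chance_of_success(p, n):
    """P(at least one success) in n independent trials, each succeeding with probability p."""
    return 1 - (1 - p) ** n


for n in (1, 10, 100):
    print(f"{n:>3} trials: {chance_of_success(0.02, n):.1%}")
```

Under these assumptions the chance rises from about 2% with one trial to roughly 18% with ten and 87% with a hundred.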

This model in action – Corn Subsidies.

Farmers, fearing market fluctuations they cannot understand, lobby the government to give them money for what they are already doing: growing crops. The types of crops subsidized have an enormous influence on what crops are grown and later sold. That’s the coordination problem I mentioned earlier. Paying farmers to grow corn results, not surprisingly, in more farmers choosing to grow corn instead of other products, which necessarily means fewer farmers trying to grow other types of food. Equally unsurprisingly, people don’t really want that much corn. So creative problem solvers have to find ways to deal with the corn surplus, hence the enormous variety of our food products that have corn as an ingredient. Here are just a few: corn on the cob (with or without leaves), cream of corn, corn flakes, canned corn, corn syrup, corn flour, corn meal, corn bread, high fructose corn syrup, corn starch, corn chips, popped corn, and the nearly endless variety of items with corn or corn derivatives as ingredients, including ethanol, toothpaste, yogurt, salad dressing, bubble gum, makeup, milk, shampoo, diapers, cola, glue, perfume, and aspirin.

The other problem caused by the corn subsidies is bias. The subsidies tell farmers that corn is highly prized by market participants even though, in reality, those participants are drowning in corn. So farmers respond to the signals and provide lots of corn to the market. But they make too much corn, more than people are willing to buy! So the government has to purchase corn and give it away to other countries to keep the price of corn up for farmers. And even then, farmers find it affordable to feed subsidized corn to their livestock. In any case, the bias introduced by the subsidy is experienced on the other side of the market as less food variety and as products containing corn ‘appearing’ significantly less expensive than products without corn. So regardless of shoppers’ desires (whatever they would have been in the absence of the bias), something like every processed product in a supermarket contains something made from corn. That’s a lot of carbohydrates, which is unnatural, and is likely one of the main contributors to the modern diabetes epidemic.

That’s right. Diabetes is caused or at a minimum made worse by bias introduced by subsidies.

In summary, subsidies and legislation are simplistic answers to complex problems. When they are applied in complex systems, the function of the scientific method in markets comes to a screeching halt and bias is inserted in its place disguised in the few answers that the market does provide. Violating justice or instituting artificial reward systems inside complex human systems is counterproductive. Worse than that, it is cyclical—the biased information produced by the market which inaccurately describes the real world—is then used to create new causal models which are again used to make additional rules and rewards to solve problems which are then subjected to the original + additional biases, producing even more skew in our understanding of reality.

Freedom from injustice and benevolence—both… surprisingly—make us smarter!

Chase Steffensen

Apr 19 2018 at 1:14pm

So people talk about police juking their stats and that whole problem, and people talk about the great decline in violent crime since the 80s. Could the first explain the second? Or is there reason to believe juking stats can only explain short term declines, or that criminologists are aware of changing definitions and policies and they make attempts to adjust for them?

Floccina

Apr 19 2018 at 4:13pm

I made a bad error typing in my comment. Here is a fixed version:

I think as far as schooling that Muller’s ideas dovetail nicely with Bryan Caplan’s.

Muller’s data suggest that we should stop overusing metrics, just hire school principals, and let them hire teachers and run the schools, and Caplan’s work shows that if those principals make a few mistakes it probably would NOT matter. The cherry on top is that you would save money, and you could probably still afford to pay teachers more, given how much our schools spend on administration.

[I’ve removed the old version of the comment—Econlib Ed.]

Gary Goubeau

Apr 19 2018 at 5:36pm

I enjoyed this but was disappointed you guys never got to baseball and all of the recent metrification.

Maybe you can do a show focusing on this more important and less depressing topic. (Less depressing if you’re not a hitter.)

Tom G

Apr 19 2018 at 6:25pm

More metrics seems possible.

More AI using metrics, combined with manager/principal judgement, should lead to better outcomes.

That should come with more transparency about the judgement, so the decision maker knows others are looking at and seeing his decision.

Trent

Apr 20 2018 at 9:14pm

Note to Andy McGill:

[People can’t handle the truth. Half the school kids are below average. Half the schools are below average.]

Not necessarily.

Suppose there are 10 schools in total, and you’re measuring them by some metric of quality that grades on a 100-point scale. Suppose further that 9 schools each grade out at 90, and the 1 remaining school grades out at 80. The average is then (9 × 90 + 80) / 10 = 89, so 90% of the schools would be above average.

It’s along the same lines as Garrison Keillor’s Lake Wobegon, where ‘all the children are above average.’
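Trent's arithmetic can be checked in a few lines with his hypothetical scores. The single low school pulls the mean down to 89, so nine of the ten schools sit above it; it is the median, not the mean, that splits a group roughly in half.

```python
from statistics import mean, median

scores = [90] * 9 + [80]  # Trent's hypothetical: nine schools at 90, one at 80
avg = mean(scores)  # (9 * 90 + 80) / 10 = 89
above = sum(s > avg for s in scores)

print(f"mean {avg}, median {median(scores)}, {above} of {len(scores)} schools above the mean")
```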

Marilyne Tolle

Apr 24 2018 at 12:02pm

I’m surprised that neither Jerry nor Russ mentioned Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.”

(Charles Goodhart’s original formulation (1975) was “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”)

The Wells Fargo account-fraud scandal is a good illustration of this. When the number of customer accounts became a sales target, it stopped being a good measure of the bank’s retail success as it encouraged bank managers to create fake accounts to reach the quotas.

Nick Ronalds

Apr 25 2018 at 8:11am

I’m sorry to say this was one of the most underwhelming EconTalks I can remember. So: all metrics and no judgment turns out to be a flawed model for decisions? I’m not at all surprised, but a model based on personal judgment and no metrics would be flawed too. (I wish I felt more confidence in the judgment of the members and management of the NEA and the rest of the educational establishment.) This is EconTalk. Whatever happened to thinking on the margin? If metrics are overdone, a dollop of sound judgment would doubtless yield big improvements. But if decisions are based on personal judgment alone, adding some sensible metrics would surely be salutary. The guest does acknowledge at various points that metrics can be useful.

And all this insight is worth an entire book?

Second, the evils of “teaching to the test” are an ever recurring and senseless criticism. Anyone who’s studied to master a subject knows that one of the best methods to achieve mastery is to practice problems–a.k.a. tests. If a test tests for the wrong thing, fire whoever designed it and hire someone who can design a test that measures skills that are relevant to the subject.

Kendall

Apr 30 2018 at 10:32pm

[Second, the evils of “teaching to the test” are an ever recurring and senseless criticism. Anyone who’s studied to master a subject knows that one of the best methods to achieve mastery is to practice problems–a.k.a. tests. If a test tests for the wrong thing, fire whoever designed it and hire someone who can design a test that measures skills that are relevant to the subject.]

Excellent point!

Comments are closed.

DELVE DEEPER

EconTalk Extra, conversation starters for this podcast episode:

Measuring Ourselves to Death by Amy Willis. Apr 20 2018.

This week's guest:

This week's focus:

The Tyranny of Metrics, by Jerry Muller on Amazon.com.

Additional ideas and people mentioned in this podcast episode:

Measuring educational testing, teacher performance
- Bryan Caplan on the Case Against Education. EconTalk. February 2018.
- Robert Lowe, Matthew Arnold, differing views
  - Robert Lowe. Wikipedia.
  - Robert Lowe, Viscount Sherbrooke. Britannica.
  - Matthew Arnold. Wikipedia.
  - Matthew Arnold. Poetry Foundation.
- No Child Left Behind: An Overview, by Alyson Klein. Enacted 2002, replaced by the Every Student Succeeds Act, December 2015. Edweek.org.
- Diane Ravitch on Education. EconTalk. April 2010.
- Books by Daniel Koretz on Educational Testing, Amazon.com.
- Eric Hanushek on Teachers. EconTalk. August 2011.
William MacAskill on Effective Altruism and Doing Good Better. EconTalk. September 2015.
Unintended Consequences, by Rob Norton. Concise Encyclopedia of Economics.
CompStat and police department classification of crimes
- "When policing stats do more harm than good: Column," by Joseph L. Giacalone and Alex S. Vitale. USA Today, Feb. 9, 2017.
- "CompStat: Its origins, evolution, and future in law enforcement agencies," by Bureau of Justice Assistance, Police Executive Research Forum, 2013. PDF file.

A few more readings and background resources:

Measuring Ourselves to Death. Feature EconTalk (Extras) 2018. Complementary questions for further thought and discussion on this episode.
Henry Mintzberg.
"Emergency Response Times Longer than Reported". San Diego Union-Tribune. July 23, 2011. The 911 center started the clock late on ambulance calls, so reported response times understated actual ones.

A few more EconTalk podcast episodes:

Taleb on Black Swans, Fragility, and Mistakes. EconTalk. May 2010.
Cass Sunstein EconTalk Archive. EconTalk.
David Epstein on the Sports Gene. EconTalk. September 2013.
Leonard Wong on Honesty and Ethics in the Military. EconTalk. April 2015.

AUDIO TRANSCRIPT

Time	Podcast Episode Highlights
0:33	Intro. [Recording date: March 20, 2018.] Russ Roberts: My guest is historian and author Jerry Muller.... His latest book is The Tyranny of Metrics, and that book is the subject of today's episode.... What is the tyranny of metrics? Jerry Muller: The tyranny of metrics is a widespread pattern in contemporary organizational life that runs across everything from business through medicine through policing through higher education, K-12 [Kindergarten-12th Grade] education, and even philanthropy. It's a pattern that I define as follows. It's first, it's based on several beliefs which taken on their own sound plausible; but in combination turn out to be counterproductive. Are often counterproductive. So, the first is the emphasis on standardized measurement: the notion that our judgment is unreliable. Experience and talent don't really matter so much. What really matters is measuring performance. So, the first part of it is the metric part. That is the belief that it's possible and desirable to replace judgment with numerical indicators of comparative performance based on standardized data. And the second related notion is that the best way to motivate people within organizations is by attaching rewards and penalties to their measured performance. So, sometimes those, often those rewards are monetary rewards. And often enough they are reputational. And then the third notion is that, is connected with the idea of transparency and accountability. And that is, that the way to make professional organizations--professionals or government organizations--accountable to the public is by making the standardized measures of their performance public. And, as I say, each of these ideas sounds plausible. You measure, reward, and punish. You make public. But, they often end up having unintended negative consequences. And that's what the book is about. And that's what I mean by the tyranny of metrics. It's not about the evils of measurement. Measurement is often desirable. 
It's not about the evils of rewarding people through remuneration in other forms. But, it's about the way in which this use of, as I say, standardized metrics to replace judgment--that's really the key theme. That, you can use standardized metrics to replace judgment and experience and come up with workable organizations. Russ Roberts: Of course, the challenge is that judgment can be capricious, wrong. Unjustified. Involve nepotism, sexism, racism. Jerry Muller: Mmmhmmm. Russ Roberts: So, judgment has a bad name these days. And numbers have a gloss of scientific precision. And that seems like an improvement. Why isn't it? Jerry Muller: Well, because some of those things are both true but not true if you universalize them. That is to say, judgment is under attack, both for being linked to the bias or prejudice as you say. And then, of course, there's the whole field of behavioral psychology, that delights in demonstrating how our biases lead us to mis-estimate numerical values and probabilities. And so on. But, there are--and in that sense, measurement can be useful. Into [?] partially interact or inform judgment. The issue as I see it is that measurement can't replace judgment. That is to say: You need judgment to decide, first of all, what's worth measuring, because often the things that are most easily measured are not the most important ones for the organization. And secondly, you need judgment to decide how to evaluate the relative significance of what gets measured. And thirdly, you need judgment because there are lots of important things in organizations that simply can't be measured in any standardized way. A person with judgment can give them a numerical evaluation--can say that Employee X on Criterion Y rates a 5 out of 5 as opposed to a 3 out of 5. But, that rating needn't necessarily, for some qualities, have to do with qualities that cannot be measured. 
And that are often, at least as significant as the things that can be measured, in terms of the successful functioning of an organization.
6:08	Russ Roberts: So, we had a recent episode with Bryan Caplan on education. Bryan is very skeptical about whether education--excuse me, formal education, years of schooling, sitting in the classroom--leads to much. And he sees, for example, college education, as mainly a signal of persistence, diligence, and conformity, that students send out to the labor force. Of course, there's some truth to that. One of the most interesting pieces of that conversation with him was where I argued that much of what we learn in college, and in high school and below, is not measurable. But real. How to think and how to imagine and how to explore, and all kinds of intellectual curiosity. And there's some of that in your book. So, I'd like to start with that. Let's talk about education. Where objective measures of success--test scores and other measures--have been used more and more widely, at least for a while. Give us a little bit of that history and where you stand on this question of education and what can be measured. Jerry Muller: Right. So, the field of K-12 education is one where this metric fixation, this combination of standardized measurement, pay-for-performance, and publication of results in the name of accountability has been most intense. And it was actually partly by following those debates that I got more interested in the kinds of issues that eventually led me to write this book, The Tyranny of Metrics. When No Child Left Behind--there's a much longer backstory to the idea of measuring educational performance and rewarding schools and teachers accordingly. Back in the 1860s in Britain, a Liberal Parliamentarian by the name of Robert Lowe put together essentially a pay-for-performance plan for public schools, where they were going to--inspectors were going to go into the school each year and test the children on how well they did in Reading and Arithmetic. And, the schools were going to be penalized if the students didn't do well enough. 
And, one of my intellectual heroes, Matthew Arnold, the poet and cultural critic whose day job was as an inspector of schools, said, 'What you're going to do here is you are going to end up--first of all you are going to end up penalizing students and schools in areas where the students are poor and less well-off, because they won't show up for school and they won't be as successful on the tests. And also, you'll narrow the focus of schooling to the kinds of things that are tested by the inspector.' Well, all of this, then, recurred in the course of the late 20th century. And it reached its sort of paradigmatic embodiment in No Child Left Behind, which was passed in the early years of the George W. Bush Administration, with Democratic and Republican support. And at first--and that was based on the idea of testing students in schools. And publicizing the results. And rewarding and punishing schools based on the results, including the possibility of closing down schools. All of that-- Russ Roberts: Can I just--I want to interject one thing. This was a Federal mandate imposed on local school districts, in frustration--possibly: who knows what the real reasons were? But part of the justification was the frustration that these school districts didn't seem to do a very good job. And they weren't accountable. But, here, we would publish, we would measure and then publish and make accountable--all of which are really attractive goals on paper. Jerry Muller: Exactly. And that's the way it sounded to me, at first. And then I began to encounter young teachers who have told me this was having a demoralizing effect upon them, the fact that they had to narrow and tailor their teaching to the requirements of this test. 
And then I went to a panel, around 2003, 2004, at the American Enterprise Institute, where one of the panelists was Checker Finn, who was a big advocate of this sort of thing; and the other was Diane Ravitch, who had begun as an advocate of what was then called educational reform, but by then had become very skeptical about it. And as I followed the literature on that--and there's been a lot of literature on it and it's actually increased in recent years--it became clear that this whole educational, what was known as the educational reform movement--I mean it had several branches. Part of it had to do with charter schools and greater school choice and so on, which I think is actually a plausible idea. But a good deal of it was based on this metric fixation. And, the more and more evidence that came out, the more it showed that it actually didn't work. In a couple of senses. First of all, it had absolutely no effect on one of the major motivations behind No Child Left Behind. Which was to close the so-called achievement gap between whites, on the one hand, and black and Hispanic students, on the other. Asian students actually did better on the whole than whites; but then people didn't pay much attention to that. Um, so, it's been going on for, um, well, actually, since--this kind of thing has been going on since about 1992. And it got more intense with No Child Left Behind. It's had absolutely no effect on the achievement gap. What it has had--it's had a tremendous effect on K-12 education in the public school system, because more--especially for schools that deal with lower-performing students, but to some degree well beyond that--it's narrowed the range of subjects that are taught. And it's narrowed the way in which subjects like English are taught, so that they are taught to try to maximize student achievement on the tests, as opposed to being able to write a long-form essay or being able to read a novel or a play, and so on. 
So, it's--not only has it not had the intended effects of lower thing[?], of narrowing the achievement gap--but it's had the unintended effects in many ways of making education narrower and less functional. And actually AEI [American Enterprise Institute] had a conference about this a couple of months ago, and that was the upshot of a lot of the papers. As well as a recent book by this fellow, Dan Koretz, from the Harvard Graduate School of Education. So, but I found--but then I found that we can talk about this, in field after field. That, when this was tried, by and large, it hadn't worked. Now, the people who specialize in measuring these things are loath to say that in such stark terms. They say the results were difficult to measure, or there were some minor improvements. Or, there were improvements for one group and one grade, but it didn't last through the end of high school, and so on. So, what's so striking when you read through a lot of this literature on pay-for-performance and standardized measurement combined with pay-for-performance is: How often the scholarly literature shows, in a variety of fields, that it doesn't work. And yet, politicians, policy-makers, they don't seem to get the message. And now, this whole regime of metric fixation is being extended to the realm of higher education, as well. Russ Roberts: And everywhere else, as we'll talk about. Jerry Muller: And everywhere else. Right.
14:22	Russ Roberts: But, what's your thought on this? So that was a great summary. But, what's your thought on this issue of--and it's beyond the scope of the book, but I'm just curious because I noticed in a few places that you do talk about it in passing--that everything is, in principle, measurable. So, one response to that literature would be--it's not mine, but one response could be--'Well, we just didn't do it correctly. We just need to measure it better. We just need better tests. We need tests that are less rote, that are more expansitive[?]' Etc., etc. Respond to that argument, and the general idea that education should be, in theory and in principle, and in reality, testable. Jerry Muller: Right. So, there are--in some respects, testing is genuinely useful. And this is one of the points that I make. When standardized measures of performance are used by practitioners to diagnose what they are doing in their practice, then they can be genuinely useful. So, a teacher can have her students take some standardized test on arithmetic or math or English. And she can see how they are doing: to what degree are they catching on? It doesn't have to be a standardized test from outside, that's imposed from outside the classroom. Of course, it could be one that she creates. And that's a way of keeping track of how the students seem to be doing on that particular slice of the subject. So, in that sense, testing is fine. What, when testing becomes counterproductive and pernicious is when it's connected to reward and punishment. Reward and punishment of the teacher or reward and punishment of the school. That's when it becomes, that's when it becomes problematic. And then, of course, many of the most important things that go on in any institution, including certainly in schools and universities, can't be measured. The degree to which, in school, children are taught to behave, taught to cooperate, taught to be self-controlled. 
All of those things are difficult to measure; and yet they are by no means the least important thing that goes on in the school. The way in which intellectual curiosity in the variety of subjects is or is not cultivated. That's difficult to measure. And of course, that's one of the problems with the Bryan Caplan book--is that under what, first of all, he has an extremely materialistic and economistic conception of how to measure things. Namely, your salary. As if that's the only thing that counts in life. It is an important thing in life. But, it's not the only thing that counts in life. And then, of course, his belief that only those things that you can measure are real. So, there's all these intangible elements that go on in K-12 education. And go on in college education. You know--when I engage my students in the classes--I did this morning on the family in the market, and I aroused their interest, and they say something; and another student refutes it. Or, I ask something that calls it into question. And they have to think, dialogically and dialectically. All of that develops skills that really matter in life. And in the world. But, they'd be very difficult to measure through some standardized test.
18:36	Russ Roberts: So, I'm very sympathetic to your point. I fundamentally agree with it. But I'm going to challenge it in a different way. Partly the way I think Bryan might. But, I think also in a way that he might not but I will. Which is, that: you know, I agree with you that all those intangibles--those nonmeasurables, those immeasurables, those key parts of education, such as cultivating curiosity, encouraging dialogue, intellectual challenges, internal dialog, skepticism--these are what a great education should involve. Jerry Muller: Right. Russ Roberts: But, you and I also know that, whether you measure them or not, they are really hard to get people to work on. So, I think--you know, one argument would be, 'Well, it's true we can't measure them, but people know--it's not like, well that's what people are spending all their time on. At the expense of more measurable things.' They're not doing a good job, period. There are so many mediocre teachers in schools, K-12 and in college, who teach by rote, who do give, who teach to a test that's not a very good test. Their own, perhaps. That doesn't challenge the students, that doesn't change their focus and the way they perceive the world or their deep sense of knowledge and wisdom. And these imperfect tests at least mean that something is going to get done in the classroom. I think that would be the best defense that some folks could come up with. Are you sympathetic to that at all? Jerry Muller: Uh, a little bit. But, you know, there's a larger point here. And that is, the fact that some institution is not working as well as we would like it to, or producing the results that we think is desirable doesn't mean that instituting metric fixation--this combination of standardized measurement, reward, and punishment, and publicizing results--it doesn't mean that that's going to make the organization better. In other words, that's true of a lot of life. 
The fact that the situation is problematic doesn't mean that the solution that you have at hand is actually going to make the situation better. And, what I argue--what I'm inclined to argue--is that in many cases, it actually makes the situation worse. So, it's not that I deny the problem. It's that I'm skeptical of the efficacy of the proposed solution. Russ Roberts: Yeah; that's very well said.
21:18	Russ Roberts: This reminds me--and there's an example or two in your book of this, where people will say, 'Well, this situation--the outcomes are not very attractive.' So, what we need, are, say, incentives. We need the things that make market outcomes work really well. So, we'll just put those in. Jerry Muller: Mmm-hmm. Russ Roberts: And, in fact, in my conversation with Diane Ravitch--and we'll put a link up to that old episode, which was quite a while ago--that's exactly what it came down to. She was horrified at the solution of "running a school like a business." Jerry Muller: Right. Russ Roberts: I certainly agree with that. We shouldn't run a school like a business. It doesn't mean a business couldn't run a school well. In today's world there are a lot of charter schools run by non-profits, and maybe for-profits. I think we have the potential to do a good job. But, certainly, just because businesses work well using carrots and sticks doesn't mean schools could do that, too. Imposing that from the top down does not create the institutional infrastructure that a market creates en-route to the outcomes that we like about markets. And so, I think that's--I think that's an incredibly important point, that: 'Yeah, this isn't working well, so we'll just jam in these incentives.' And we know incentives work. They do. The problem, as you point out, many times, as you point out in the book is that they work too well. People respond to the incentives rather than the ultimate goals of the institution. Jerry Muller: Yeah. And we should come back to that. But let me say a little bit about this notion of running a school like a business. What actually happens is the schools--the schools and other non-profit-making institutions on whom metric fixation is imposed--is they don't actually run like a business. They are made to run like a simplified caricature of a business. And that is to say, in real business it's true that there is a bottom line. 
But, people within business organization have motivations over and above those of monetary reward. That's certainly an important one. But, there are other ones that are important for the functioning of the organization that have to do with intrinsic motivation. That is, to what degree people find the job interesting, or to what degree they find the job significant. And then, there are qualities that are unmeasurable. Like, mentoring. Or, cooperating with one's fellow employees. That are actually essential to a profit-making business, too. And, one of the problems in profit-making businesses where metric fixation has taken hold is that it also in this, in this simplified caricature of how people work, this notion that they have a kind of, have a [?] response to material incentives, it actually has a distorting effect in businesses, too. So, first of all, there are differences between businesses and schools. And, secondly, even in businesses, the use of this, kind of, simplified conception of human nature turns out to be counterproductive. And then, as people, like, as many people now, but for example, Henry Mintzberg, a really fascinating professor of management at McGill has pointed out, and James Q. Wilson did this in a different way. Ultimately businesses do have a kind of single bottom line. But institutions like schools or universities or government agencies, don't have one single purpose. They have a multiplicity of purposes. And, if you just try to focus them on one or two of those purposes, the other important parts of the organization--the other important goals of the organization are not going to be well served. Russ Roberts: And actually may be corrupted. I mean, I-- Jerry Muller: Yes. Russ Roberts: I'm--as many listeners know--this podcast is sponsored by Liberty Fund. Which is a foundation in Indianapolis, IN, which has fabulous educational things. They publish books. They run conferences. 
And they have the Library of Economics and Liberty, of which EconTalk is a part. Jerry Muller: Yeah. Russ Roberts: And I am paid a fixed amount to generate these episodes by Liberty Fund. And you could, they could instead say, 'We're not going to just pay you a fixed amount. We're going to pay you based on the number of downloads.' Jerry Muller: Mmmhmm. Russ Roberts: And, that is a reasonable thought. It's a reasonable measure of success. It's one I use personally, in looking and in evaluating my performance as the host. Jerry Muller: Mmmhmm. Russ Roberts: And--but it could be that's the way I'm compensated. I'm not. But it could be. It's not an unreasonable idea. And yet, there is a huge pressure, and you mention it in passing in one chapter--a huge pressure by Boards of Directors for philanthropic organizations for charities to measure stuff. Oriented toward results. Not just feeling good about what you are doing. And that's basically--the effect of the effective altruism movement, we've talked about here on the program with William MacAskill. It's not a bad idea. It's a good idea, in general, to care about what happens, not just whether you think you are doing a good job. So, that's okay. The challenge is: How do you measure it? As you point out. And so, if we'd measured this podcast by downloads only, one challenge would be I'd have an incentive to become corrupt. Literally corrupt. To downloads, to get people to download it who weren't going to listen to it. Obviously that could happen. But the bigger problem would be it would change the way I run the program--the kind of guests I have. And I have deliberately chosen a style for the program that I think is educational. And I hope it's a little bit entertaining as well. But, that style limits the number of listeners to some extent. And I think it's just a profoundly non-obvious thing, because--I say 'non-obvious' because so many people do it anyway. To choose some measure. Because that way, we'll have incentives. 
And it's just--it's dangerous, actually. Jerry Muller: Right. So, this is actually a fabulous example that you've hit upon. I mean, one of the reasons that I'm pleased to be on EconTalk is that I have listened to EconTalk a lot over the years, and learned a lot from it. And sometimes I've gone out and bought the books. Or, sometimes, I've simply assimilated ideas which have then found their way into my own work. Now, that's something that's--it's not difficult to measure. It's impossible to measure. But you know that the people who--and it's actually sort of known among a certain class of listeners to EconTalk--that EconTalk has a sort of high level of listeners. And in that sense it's very effective. But, as you say, we could be talking about something much more popular or sensationalistic or whatever. And you could have more listeners. And your metrics would be better. But, you wouldn't really be accomplishing your goal. And neither would your employer. Russ Roberts: And, some people might say, 'We just need better measures.' So, I was happy to see in a recent book that EconTalk gets acknowledged as being useful. And that's lovely. Jerry Muller: Uh huh. Russ Roberts: 'So, we could use that as the measure. It won't just be how many downloads. It will be: 'How many books mention EconTalk in passing?' And then that encourages me to work with prospective authors to suck up to them and hope that they'll mention EconTalk; and maybe ask them directly. And then they'll get turned back. It opens up a chain of consequences that are uncertain.
29:19	Russ Roberts: I want to turn to that now. Which is, as you've said already, and it's mentioned many times in the book: One of the results of these kinds of incentives and metric fixation is unintended consequences. And I want to ask two questions. One: Why are they inevitably--seemingly--why are they inevitably negative unintended consequences? And, secondly, are you sure they are unintended? Because-- Jerry Muller: Yeah. Yes. Russ Roberts: Because, you and I, if you--I mean, you've thought about it. You are in the top, I'd say, half of a percent, maybe even higher, of people who have thought about how complicated incentives are. I'm in the top something. I'm kind of an economist: that's kind of our job. So, when someone--if I'm on the Board of a charity and someone says, 'Well, let's incentivize the director; but we'll make their pay based on such-and-such'; and my first thought is, I'm going to start thinking right away: 'Gee, what's that going to lead to?' And you are, too. You are "only an historian," Jerry. But you clearly know a lot of economics. You are not certified as an economist, but you know a lot; you've read a ton. And I've written a book about, essentially unintended consequences of incentives--the wrong kind. So, you'd think of these things. Don't other people think of these things? And, secondly: Why are--maybe they are on purpose? Jerry Muller: Right. So, sometimes the--so, one of the effects of measuring performance, and then rewarding and punishing it, is that people in the organization will indeed focus on what gets measured and rewarded. And, sometimes that's in keeping with what the leadership or the CEO [Chief Executive Officer] or Management want. But often enough, they want that because they actually haven't thought through the consequences very well. Because, in many, in most organizations, they have multiple purposes. And in most jobs, there are multiple facets. 
I mean, if you are in a standardized job where you are, you know, flipping hamburgers or you are changing windshields or something like that, where there is not much room--where there is actually sort of really one function; it's not very intrinsically interesting; there is not much room for innovation; mentoring and cooperation doesn't matter that much--well, then measuring and rewarding may work. But in, as I say, in most jobs there are multiple facets to the job. And one of the unintended negative effects is that if you--is that people will actually focus on what gets measured and rewarded, at the expense of the other parts of the job, and the other purposes of the organization, that aren't being measured and rewarded in a way that can be ultimately dysfunctional. So, that's one of the first unintended consequences. A second, of course, is a whole--a second unintended consequence that people who try to implement these things from the top typically forget about is: How much time it takes to actually input the information, analyze it; and, that that is time taken away from doing the activity that is nominally being measured. So, that's another factor. And then, there's the whole realm of gaming the metrics. That is to say, of attaining the metrics in a way that is at odds with the actual goals and purposes of the organization. And, you know, part of my book is a catalog of that--of all the ways--of which perhaps the most dramatic example is: Teaching to the test. Actually orienting the education in the K-12 classroom towards the narrow range of skills that are required to take a test. Including simply having the students practice taking a test more and more. What could be more pedagogically deadening than that? But it could also improve the metrics. Or: Take another example. You know, some years ago the National Health Service in Britain had a lot of complaints about waiting times to be admitted to the hospitals were too long. 
So, they declared that hospitals would be penalized if the waiting time to get in to, to be admitted to the hospital was 4 hours or more. So, what did the hospital--so, some of the hospitals did the following: When they had patients coming in by ambulance and they knew that the wait was going to be more than 4 hours, they would have the ambulance circle around the hospital until they could admit the patients within 4 hours. Which sounds kind of amusing at first, until you think about the fact that there were then patients sitting at home, waiting to get picked up by those ambulances, who weren't picked up in a timely way. So, the hospital could meet its metrics in a way that was transparent, but with negative effects for the actual purposes of the institution. So, one of the things that metric fixation does, is it turns us all into gamers. Russ Roberts: And there's an example that I think--by the way, that example is so horrific that I'm a little bit skeptical about it. Because--I understand the incentive to do that. It's so ugly. But, of course, as you point out, and this one I'm sure is true: Doctors will refuse patients because they are afraid the surgery is too risky or the outcome won't go well, and their success rate that gets published will look bad and then they won't look like good doctors. Jerry Muller: Right. And that happens all the time, once these surgical report cards were instituted by which surgeons' rates of success and failure were publicized. So, many of them, at least some of them, reacted by what we call creaming or cherry-picking or selection bias--it goes under a number of names, but it basically means the same thing: you take cases where you are more likely to be successful, and you turn down the cases--for example, patients with co-morbidities and complex situations where the risks are greater, and so you are less likely to succeed. 
And, of course, the people who pay for that are the people who don't get operated on, which you don't see in the metrics. Russ Roberts: Yeah. But your point about ambulances reminds me of something I've never seen but I suspect is true, which is that, I've noticed that airlines will tell you the scheduled arrival of a flight, and it seems way out of line with how long the flight's going to take. I assume that's because they keep track of how many times a flight is--a late flight is 15 minutes or more past expected arrival, and so they just build a cushion in, to reduce their bad performance measurement. Jerry Muller: That's exactly the causal chain.
36:57	Russ Roberts: So, now I want to raise the question which you didn't answer, because it's an unpleasant thing to think about, or you just gave a--you got derailed, it's either one--about whether some of these are intended. I want you to talk about what happens in the world[?] police, or policing, because this is such a depressing and human response to measurement that has happened in police around the country. So, talk about how the tyranny of metrics works with police and the FBI [Federal Bureau of Investigation]. Jerry Muller: Sure. So, and this is an excellent example of two things that people confuse, and that is the use of metrics for diagnosis by practitioners versus the use of metrics for reward and punishment. So, one of the tools that's gotten a lot of attention in the last few decades is CompStat [Compare Statistics]--these computerized statistics that were first developed, by the police I believe, in New York, and since have been adopted in many other cities. And they have an informational diagnostic element--that is, using GIS [Geographic Information System] and so on, they map where crimes are occurring almost in real time. And, that can be very valuable in terms of deciding where you are going to deploy squad cars and things like that. But then there's another element that often goes with it, and that is there are these weekly sessions in which district commanders have to defend the rate of crime and so on in their district. And, in many places, their promotions are attached to the issue of whether crime goes down in their district. Now, of course, in good part--so part of making crime go down in their district is something that's amenable to improvement by where they deploy police and how they deploy police and so on. But, much of it has to do with who lives in the neighborhood and all sorts of other things. Russ Roberts: Things beyond the control of the police. Jerry Muller: Things that are beyond the control of the police. 
So, when they're told, for example by politicians--like, somebody who is running for mayor will typically be challenged by someone else who says, 'Oh, the crime rate is too high.' So, the mayor tells the police commissioner: 'We have to cut the crime rate by 5% by the end of the year.' And, he tells that to the commanders and he tells that to the cop in the car and the police on the street. Well, they actually can't cut down crime, the actual incidence of crime by 5%. But, what they can cut down is the official reporting of crime by 5%. And there's been many, many documented cases of this in various places in the United States, but also in Great Britain and elsewhere. And that is: Police who were told that their promotion or whatever depends on their cutting the rate of major crimes; and there's four major crimes that go into the FBI's index of major crime indicators. So, what they end up doing often is taking crimes that ought to be classified as felonies and classifying them as misdemeanors. So, what ought to be grand theft becomes a minor theft. What ought to be aggravated assault becomes something much more minor. And so on. So, this is gaming the metrics through reclassification. And in some cases, crimes are reported to the police and they simply don't record it at all, so that it doesn't make its way into the metrics. So, all of this then has a corrupting effect on the diagnostic value of the metrics that are being gathered. But, again, the politicians can brag about the fact that they've cut the crime rate by 5%, or they've increased test scores by x%, or what have you. Russ Roberts: Well, that's where I wanted to raise this pretty cynical and not-so-attractive thought, that maybe it's not unintended. Maybe it's intended. So, you use this metric. And so they get reclassified. And the police look good. The politician looks good. 
The people who live in the neighborhood know that it's maybe--maybe some of them realize that actually crime is going up, not down, or real crime hasn't changed at all. But, the system kind of incentivizes everybody to use this fake cheerleading metric that can be waved around and actually--here's the irony--it reduces accountability. That's the incredible part of it. Jerry Muller: Yes. I think that's quite right. One of my other criticisms of the use of metric fixation is that it tends to reduce initiative and entrepreneurialism within organizations. But, as someone pointed out to me when I gave a talk on this recently, there actually is a lot of initiative and entrepreneurialism, but it's in gaming the metrics as opposed to improving the result. And I think that's what you're talking about. Russ Roberts: Exactly.
42:46	Russ Roberts: So, let's talk a little bit about the alternatives and the reality that these techniques are extremely appealing to everybody--except you and me, and a few other people who are worried about these kind of effects. They do operate under the guise of scientific precision. They appear to be leveraging the incentive effects that often are attractive in certain organic systems like markets. And so, they are everywhere, as you point out. Many, many chapters--it's a short book, but there are many chapters, and the chapters are short--I think a nice design element. Jerry Muller: And if I can add something to that, Russ: they are also often connected to--I mean, I'm not the first to call it this--this managerialism, as an ideology. Which is different from management. Management is a craft, and a practice. Managerialism is this notion, much of which comes out of some business schools and is promoted by all sorts of business gurus--it's the notion that management is not a matter of experience. Russ Roberts: It's not an art or a craft. Jerry Muller: It's not an art or a craft; it's not based on judgment. It's a matter of technique. And you have--so, that creates an incentive to use this simplified conception of incentives in the first place; and then to have these standardized techniques for measuring, for surveilling, and for rewarding. And, it's part of that managerial ideology that a manager, a CEO [Chief Executive Officer], say, should be able to go from a company that makes one sort of product to a company that does something entirely different. Or, should be able to go from being a CEO of a Fortune 500 company to being the president of a university, or from being the CEO of a Fortune 500 company to being the head of the Department of State. 
The notion is: All organizations are the same, and we can have these standardized techniques, and because there are numbers attached, they are scientific; and because there are incentives they must work, and so on. That's part of the whole misapplied package, I think. Russ Roberts: So, I wanted to challenge you to think about--you've done a wonderful job pointing out what's wrong with these things. With this whole trend. And I think the challenge for those of us who are worried about it, whether it's in education, police, medicine, all kinds of--the running of charities and not-for-profits generally--the real challenge is: I think we need a better defense of judgment. And I'll give you my favorite example of this, and it brings in an insight I've learned from another EconTalk guest, Nassim Taleb. Somebody, a friend of mine who is interested in finance said, 'Value at risk is a flawed measure.' Because value at risk is used to measure the riskiness of a portfolio. And, in the financial crisis a lot of firms were over-confident about the riskiness of their portfolio. Jerry Muller: Mmmhmm. Russ Roberts: And Taleb and others have pointed out, 'Well, yeah, those measures were flawed.' Everybody knew those measures were flawed. Anybody in the business knew they were flawed. If you asked them about it, as my friend would say, he said, 'But they are the best we have.' Jerry Muller: Mmmhmm. Russ Roberts: And his argument was: 'Something is better than nothing.' And your point, which I think is 100% right, is that, 'Yeah, that's true, as long as you keep it as a tool, a diagnostic tool to help people with your judgment.' But, once it becomes something that becomes, I don't know--regimented. Used for pay and performance. Used for assessment. Used for promotion. It distorts behavior badly. But I'd say it does one other thing. This is the Taleb point. Which is: It lulls you into thinking you've got your hand and head wrapped around the challenges that you face. 
Jerry Muller: Right. Russ Roberts: And his example, which I love, is: You know, the people who are lost in Paris, they are supposed to meet somebody at the Arc de Triomphe--this is my version of it. But it's his example. He's supposed to meet somebody at the Arc de Triomphe; and they're lost; and finally somebody says, 'There's good news, I've found a map.' 'Oh, thank goodness.' It's a map of New York. But it's better than nothing. And the answer, of course, is it's not. It's actually much worse. Because it deludes you into thinking you are heading toward your goals. And, I think the challenge here--and my friend in finance who insists it's better than nothing--and I say it's not. He said, 'Well, what's the alternative?' And the alternative, of course, is judgment. It's craft, it's art, it's a recognition that you do not fully understand the situation. You do not fully understand how to get there from here--literally, in the case of an organization. Akin to that map story. And you've got to grope. You don't know the full ramifications of your portfolio at any one time. You might use metrics to give you a better measure. But you don't know. And I think--my argument is, is that's better than fooling yourself into thinking you are knowing. And a lot of people find that deeply dissatisfying. Hehehe. They say, 'Oh, no. We can do better than that.' React to that, those examples and claims. Jerry Muller: Yeah. So, it is this kind of scientism on the one hand, that thinks that everything can be measured in the way that physicists might measure inanimate objects. And that forgets that people are animate objects. Hehe. And they react back on what's being measured. I think that--I don't want to pose the situation as metrics or no metrics. The way I would pose it is, metrics with judgment, informed by judgment and being modified by judgment. 
And, indeed, even metrics together with pay-for-performance, can be functional, if what is being measured and rewarded accords with the actual professional goals of the people in the organization. So, if you are going to reward a hospital for increasing the level of safety--that is something--and you are even going to reward the physicians on the basis of those safety measures--that may actually boost the physicians' intrinsic motivation, because there is this link between what they themselves know that they would like to happen, and what is being measured and rewarded. So, in that sense, you know, even pay-for measured performance can work, depending on, as I say, what the goals are. And then there's the question of who has input into setting the measurements. And that's a matter of professional judgment, too. And then of judging, as I've said before: How relatively important those measurements are compared to others? And what might be extenuating circumstances? A lot of life in business and in organizations is a matter of extenuating circumstances. That is, things that are beyond the framework that you set out with in the first place. So, I think that, it's not a matter of metrics or judgment. It's the two of them functioning together. But, as you say, there are times--this metric fixation, this trinity of measuring, rewarding, and making public--it seems like a magic bullet. In so many situations. And there are lots of situations, in organizations, in the field of education and medicine and so on, where we really would like to have better outcomes. But, the magic bullet often doesn't work.
51:02	Russ Roberts: I want to say something a little bit radical and get your reaction. Your book encouraged me to think about--and this conversation encouraged me to think about the--what I would call, maybe meta-incentives. Or--I don't have a word for it. But, let me give you the layout, what I'm thinking. Jerry Muller: Okay. Russ Roberts: So, if you say to a, to someone, I think principals of a school--I'm talking about an elementary school, now, or a high school, K-12. Principals of a school should be allowed to give out bonuses based on performance of their teachers. Jerry Muller: Mmmhmmm. Russ Roberts: And, when I [?] suggested that, the typical answer I get back from people who are uneasy with that is the following: 'Yeah, but they're prejudiced. They don't really--they shouldn't be entitled to indulge their personal favorites,' say, or the people who are easy to get along with. They don't really--they are not going to do a good job. Jerry Muller: Mmmhmmm. Russ Roberts: So, we need some objective measure, whether it's test scores or others to measure performance. So, my reaction to that is: A good Principal knows exactly who the good teachers are in their school. And they know who the bad teachers are. And you don't need a test. In fact, the test misleads you--as you pointed out. Jerry Muller: Mmhmmm. Russ Roberts: And, fundamentally, what you are really saying is: in the public school system, for example; maybe even in some private schools, the Principals have too much leeway. There's no market check on them. True, they--if they do their--if they are well-trained and they are skilled and they have good judgment, they could run a school with lots of authority. And lots of power. But, they are too unaccountable. And, what these metrics allow us to do is to constrain their discretion. And that's fundamentally--to me--I just want to realize, that's just a different way of saying that there's no attractive way to incentivize the principal. Ehhuh. 
And so, therefore, we want to remove discretion. Go to these objective measures. And I'm going to make the radical argument--is, as we've expanded the reach of government and the size of government, inevitably we have to try to find ways to, um, restrain and incentivize politicians, bureaucrats, administrators, etc. And they are inevitably going to be flawed. Like we're talking about. Whereas, in an organization--I mean, the analogy would be, in a financial firm where people are, it's a partnership where people are investing their own money, they are not going to use these goofy measures to justify outside investors that they've done a good job. They are going to know inside whether they have done a good job or not. More or less. Yes, they have biases. Yes, they can self-deceive. But they are spending their own money. As we move to worlds and situations where more and more people are spending other people's money, we have to try to find ways to incentivize them correctly. And they are inevitably going to be terribly flawed. Jerry Muller: Mmmhmm. Yes. Uh--so, first of all--in terms of what you said about how this works in Finance. Um, I think that's an important point. You know. One might think, if one was naive, or if one had gone to business school, that, uh, the way in which people in Finance are to be incentivized is purely on the basis of how much profit they bring into the firm. And that is certainly one of the ways in which they are incentivized. But, as you say, people in the firm know what other roles these people have been playing--in mentoring others, in cooperating, in bringing in new ideas, in taking initiative, and so on. So, they take that--so, I'm saying in a profit-making business, if it's well run, they take such things, such intangible things, unmeasurable things, into account in rewarding. When it comes to--when it comes to schools and principals and so on, yes--I guess I would say, the truth is I would say there are dangers on each end. 
Russ Roberts: Agreed. Jerry Muller: And--on the one hand, of giving a Principal leeway, and that is to say, discretion. Which leaves room for judgment. Which creates the possibility of, on the one hand bias and prejudice; and from the point of view of his subordinates, of fawning behavior. Uh. I would say that we've become so focused on that set of dangers that the pendulum has gone way too far in the other direction to eliminate or minimize the role of judgment and discretion. And that's what my book is trying to help counteract. Russ Roberts: That's very well said. The only point I'd add, on the case of Finance, and I'm going to broaden it a little bit, is that: I know you care about whether you're cooperative, and so on. I also care about, say, the riskiness of the portfolio. I don't care just about the return. I'm in a bit of--I'm in a little minor Twitter war today about whether Ben Bernanke, Henry Paulson, and Tim Geithner did a good job in dealing with the financial crisis. And I suggested that they did not. And everyone comes back to me with: 'Yeah, but look how long the recovery's gone on.' As if that were the only metric. Obvious counterpoint to that is: 'It's been a pretty mediocre recovery.' But that's not my real point. My real point is that, by rewarding bad actors in that situation, they sowed the seeds for the next crisis. That next crisis will not be put at their door. It will be put at whoever is in charge at that point. Jerry Muller: Yeah. Russ Roberts: And I think the relentless praise for short-term results is incredibly dangerous. Jerry Muller: Uh-huh. Russ Roberts: I'm not saying I'm right. I'm not saying that--in fact, I don't look particularly right, right now, because we haven't had another crisis. I think things look pretty good. I'm open to the possibility that they did a good job. But, my real point isn't that they did a bad job because of the next crisis. 
It's all these other subtle things about faith in democracy, faith in capitalism, the rewarding of cronies that the financial crisis solutions dealt in. Those are the things that to me are intangible, and don't get laid at their door. So, I do think: Yes, you are right. You have to be careful. Judgment has got its own risk. But, the other direction is really, really dangerous, too. Jerry Muller: Agreed.
57:41	Russ Roberts: Let's close with talking about transparency, because you say something delightfully shocking toward the end of the book, which is: Transparency can be a bad thing. Everyone thinks that, particularly with government, because they are "working for us," supposedly, that more transparency is better. You take a different approach and suggest some costs to transparency. What are they? Jerry Muller: Yes. So, I try to point out that there are lots of organizational contexts and lots of relations in human life when transparency is counterproductive. Starting with a relationship with one's spouse. Do you really want--my spouse pointed this out to me--do you really want to know everything that your spouse has done? Or is thinking? No. In fact-- Russ Roberts: Ignorance is bliss. Jerry Muller: Ignorance is bliss. Russ Roberts: Sometimes, anyway. Jerry Muller: And as the philosopher, Moshe Halbertal, has pointed out, the whole possibility of having intimacy with people depends on our thoughts and ideas not being transparent to most others most of the time. So, we sort of selectively make them transparent to others. And that's true--starting with the most intimate relations, like marriage. But it's also very true in various parts of government. So, for example, to take something that bothers me a great deal, the cult of Snowden and the event with Snowden, and the cult of Wiki-leaksism, that's based on the notion that making everything the government does public and transparent must be a good thing. Because sunlight is the best disinfectant, and so on. You know all the cliches. Well, the truth is: Intelligence agencies, to begin with, can't function if what they are doing is known to our enemies and antagonists. Many elements of statecraft are based upon having information that is not public. 
Many elements of politics, especially of political negotiation, are based upon not having all the considerations in the negotiations made public, because there are various public interest groups. And if you are a politician and you are going to get something done, you almost by definition have to compromise. Which, from the point of view of particular public interest groups, is seen as essentially betrayal and treason. So, if negotiations take place--political negotiations--take place entirely in public, or in a transparent way--then they are just not going to get done. And, I think it's Cass Sunstein who has made the point that, if the deliberations within the government, among civil servants or policymakers, are open to being made public, that means that those people are simply not--and those people have to deal with issues that are tricky, from a public point of view. That is, are going to alienate one or another part of the public. They are simply not going to convey their views honestly and openly to other policy-makers, knowing that those could be made public. So, in all of those areas and more, there are limits to the virtues of transparency. Russ Roberts: And--well, again, as you pointed out earlier, there are tradeoffs here. I think there are some serious costs of [?], revelations. But it did, in a democracy let some people know, let the voters and citizens know about saying what was going on, the extent of it. Which I think was not fully imagined. So that, there are benefits from that. The question is: How much sunshine? Is it always 100%? And the answer is clearly, No. As you point out with discussion. Discussion has to allow, if it's going to be full of the give and take of ideas, inevitably, if it's free-flowing--and this is always my worry, in having a conversation with an EconTalk guest. You know, I'm going to say something that's inappropriate, that's wrong, literally. 
But of course, if that's my fear, I'm going to just script a bunch of questions in advance and just read 'em out and be safe. And that's a disaster--for knowledge and the production of wisdom and understanding and learning. And that's true in an organization, as well. So, I think this is a really interesting topic; it goes way beyond the scope of your book--that, the revelation of every jot and tittle of conversation that goes on is going--the way people respond to that metric, is: They stop talking. And that's really what, you know, to think about the surgery example: It seems reasonable that you should know how effective a hospital is, and when you go to a doctor, how many people have died under the knife of that doctor. But, if that's the way that people are going to judge doctors, then they are going to be encouraged not to take hard, difficult surgeries. And that's not a good thing. So, a lot of this, I think ultimately comes down to the fact that you've got to have some understanding and knowledge about the imperfection of measurement. And that's really hard for us--I think as a species. Jerry Muller: It's hard for us as a species. It's even harder under the influence of several larger cultural factors. The aura of modern science. The aura, nowadays of data and big data, where often institutions assume that someone will, someone has a way of analyzing a lot of data so they better come up with questions to ask that data. And, also, this sort of cult of managerialism that I mentioned. And this simplified conception of human motivation. So you put all those things together, and yes: There's this overconfidence in measurement and under-rating of the role of judgment in the experience.