Measuring Ourselves to Death

What is the appropriate relationship between judgments and measurement? Is it not the case that “if it matters, you can measure it?” In this episode, EconTalk host Russ Roberts welcomed historian Jerry Muller to talk about his book, The Tyranny of Metrics.

Is your work evaluated based on metrics? If so, do you find such evaluation reliable? Are you worried about the reliance on standardized tests at your kids’ schools? Is crime over- or under-reported in your area, and how would you know? These are just a few of the issues Roberts and Muller discuss.

1. What is the “tyranny of metrics,” according to Muller? Under what circumstances are metrics useful?

2. How have metrics affected the way both teachers and students are evaluated in K-12 schools? What are the intended and unintended consequences? To what extent can or ought judgment be used instead?

3. How has the use of metrics (especially in technologies like CompStat) influenced policing? Do you think the consequences of this particular reliance on metrics are intended or unintended? Why? What does it mean to “game metrics through reclassification?”

4. What are the costs of transparency in government? Are there some areas where transparency is not to be desired? (Note Muller’s concern with what he calls “the cult of Snowden.”)

Bonus: If incentives and metrics matter, how should Liberty Fund compensate Russ for EconTalk?

READER COMMENTS

Jim Thorson

Apr 20 2018 at 8:57pm

IN DEFENSE OF METRICS AND JUDGMENT

Certainly, as in physics, there is an observer effect whenever you put a measurement metric in place. You will change the system by attempting to measure it. It takes a good understanding of the system you are measuring and rigorous identification and control of side effects (unintended consequences) to avoid this. The blind use of metrics, without taking steps to minimize the observer effect, is the misuse of metrics.

The right metrics are not always the final goal you desire. In safety programs, measuring and rewarding a reduction in lost-time accidents (a term defined by OSHA) does not necessarily lead to real improvements in safety, and the system can be gamed. A better metric is to measure some action that we know leads to the desired result, for example “Close call safety incidents reported.” This system can of course also be gamed, but if it is gamed it will likely result in meeting the ultimate goal (but at a higher cost than is optimal).

I disagree strongly with the position that some things cannot be measured. If the buyer cannot measure the difference in quality of two competing items, those items are, from the buyer’s perspective, a commodity. They should buy from the low bidder. Russ stated that there are a lot of mediocre teachers (and I would add there are also lots of excellent teachers). How can a person make these statements without some kind of measuring stick?

It may be that that measuring stick integrates a number of factors, and that we cannot tease out what those factors are, and their relative weights. This is called judgement. Or, even if everyone could individually come up with their own list and weights (and I believe this is possible), there would be no consensus across a number of people on the list and weights, and thus no generally agreed measurement.

What I believe we do know is that while the judgments of an individual (yes, even a completely unbiased, well-informed individual) are not perfect, the judgments of a lot of people are often very good. Thus the efficient market hypothesis, and the relatively decent record of prediction markets.

What I don’t know is: Is one person’s judgment better than a poorly selected and vetted metric? How many people (market participants) does it take to make a reasonable judgment? And, how do we choose and incentivize the judges?

SaveyourSelf

Apr 21 2018 at 3:47pm

Is your work evaluated based on metrics?

Yes. Medicine.

If so, do you find such evaluation reliable?

Based on a single high-quality trial, we believe the number needed to treat with aspirin when ECG shows an ST segment elevation is 42. Thus, on average, we think we save one person from dying in the next month after having a heart attack for every 42 people with ST elevation MI treated with aspirin, with minimal downside. We now monitor whether each patient who has a STEMI gets aspirin or not. This, I believe, increases the number of people who get aspirin during a STEMI. It’s a good thing. The metric is encouraging an inexpensive practice known to produce good outcomes with little risk.

Based on no evidence, the US government performs surveys of Medicare patients and asks, among other things, whether they feel like their pain was addressed adequately during their hospital treatment. The Centers for Medicare and Medicaid Services (CMS) reserves 2% of what it has agreed to reimburse and diverts that money from facilities with poor survey performance to facilities with high survey performance. This encourages aggressive prescribing of pain medications, not necessarily wise use of medications. All medicines are poisons, so aggressive prescribing of any kind of medicine is not without negative consequences. This is a bad thing. The metric is encouraging unwise practices with definite harmful consequences.

Are you worried about the reliance on standardized tests at your kids’ schools?

I am worried about schools in general. My children go to public schools. We have a daily joke where they tell me some of the ridiculous things they are made to do and I gnash my teeth and wonder who on earth thought painting flowers or watching movies or coloring or singing is good preparation for a career. They can do, and do, these things on their own at home for free. Why do I pay for someone else to do with them what they will do for free? The tests are just more of the same. The tests get them into a good college. The good college gets their toe in a good career. I can see the logic of the tests. It’s a filter for businesses. There are only so many jobs in the world that can use the tools taught in the universal curriculum. The tests make it easier to find the students who captured that material the best and exclude the rest. It’s efficient… for businesses. But the fact that a huge percentage of the kids in school will never use what they are learning because, for whatever reason, they are not competitive, means there need to be alternatives. Rather than encourage alternatives, however, the failure of everyone to thrive at taking standardized tests redoubles the effort of the schools to become more uniform, more rigid, and more test-centric, not less. It’s madness, what we do to our children. Markets are the best tools we know for solving problems, capturing and communicating information, and allocating resources, and markets are the one thing we exclude by law from their education.

Is crime over- or under-reported in your area, and how would you know?

I’m not sure. I pass warnings around to my neighbors and receive information from them when we believe there is a thief or a vandal in our community. I live in a wealthy neighborhood with many older people. Thieves and vandals can’t afford to live in wealthy neighborhoods. So I have the luxury of not having to worry too much about crime. When crime does find its way into my neighborhood, we pay the bulk of the police budget, so the police take care of our problems quickly. The jail is overflowing though. That’s a problem. Drugs are bad in this area.

1. What is the “tyranny of metrics,” according to Muller? Under what circumstances are metrics useful?

Jerry Muller says the tyranny of metrics is a two-part pattern in which the results of standardized measurement are weighted more heavily than thinking in decision algorithms, and then rewards and punishments are attached to those measurements. He’s right, I think.

2. How have metrics affected the way both teachers and students are evaluated in K-12 schools? What are the intended and unintended consequences? To what extent can or ought judgment be used instead?

I once had a boss who told me ‘What gets measured, improves.’ I think he was right. But that knowledge implies great responsibility for those measuring. Because they need to ask themselves how they know if what they are measuring is the right thing to measure? Or the best thing? Or how they make certain the thing they are measuring and thus improving isn’t actually hurting them? Those are not easy questions to answer.

In my opinion, regularly weighting measurements more heavily than thinking in decision making is hubris. It misunderstands the mission by mistaking the indicator for the goal.

Gaming a metric through reclassification is simply changing the name to get it through a filter. It is the reason there are so many words in our language for sex and doing drugs. It’s really only a game played with three parties, like keep-away. It’s a game where two parties in a three-party trade intentionally confuse the third. It’s fraud. It’s ugly. But so are three-party trades.

If the goal of our government is to kill the enemy, keeping them ignorant and misinformed is a great ally. If the goal of our government is to dissuade the enemy, then transparency works well for that. If the goal of our government is to function as intelligently as possible, only transparency can accomplish that. Muller laments that the secret service cannot do its job if the government is transparent. I, for one, would not miss the CIA if it disappeared. Secrets in the government are like locks on your car. The person a lock is most likely to keep from entering your car is you. Locks don’t keep out the real bad guys. Another way of saying it is that government secrets are guaranteed to hurt the host country. Whether they can hurt the enemy remains to be seen.

SaveyourSelf

Apr 21 2018 at 5:21pm

Your comment is well thought out and written. You wrote some questions that I’d like to try and tackle for my own edification.

Is one person’s judgment better than a poorly selected and vetted metric?

You’d need an experiment to determine this answer. Take the judgment and the determination of the metric. Treat them each as a hypothesis. Run each as an independent prospective trial with as close to identical conditions as possible. Go with the winner.

How many people (market participants) does it take to make a reasonable judgment?

This number could range from one to infinity. The future is unknowable, so the answer to this question is unknowable—as a generalization—until a reasonable solution is accepted or we stop trying, whatever and whenever the person(s) you let define ‘reasonable’ decides.

And, how do we choose and incentivize the judges?

You don’t. Not if you are wanting to see the full benefits of the market. The market is a problem solving machine. Not only that, but the market is a problem choosing machine. Ideally, we’d like the market to solve problems that matter, not problems that are made up. If someone is given power over others to either punish or reward or both, then at least some of the problem the market then seeks to solve is how to please that person, rather than how to best interface with the actual environment. Interfering with a market through recompenses or threats introduces frivolous information in a market, causes misallocation of resources, reduces the problem solving ability of the market toward the problem being incentivized by reducing the independence of the separate trials, adds a third party to two party trades—making them less efficient, and almost certainly requires violating justice and reducing freedom to make the incentive system operate. In short, it makes the market’s outcomes reflect less the scientific method and more the opinions of the ‘judge’. Ideal markets have their own built in reward system—Profit and Loss—that give power and weight to the different market actors.

Mauricio Lema

Apr 22 2018 at 6:58am

In the war against the guerrillas in Colombia, a few years ago, some soldiers killed innocent men in order to boost their performance numbers. This is an extreme version of a perverse incentive for measuring performance….

SaveyourSelf

Apr 24 2018 at 9:37am

Bonus: If incentives and metrics matter, how should Liberty Fund compensate Russ for EconTalk?

This is a challenging question, I think, from the perspective of both Liberty Fund and Russ. The mission of the Liberty Fund is to provide a “contribution…to the preservation, restoration, and development of individual liberty through investigation, research, and educational activity.” They are a non-profit. It makes sense, I guess, that they would set up the remuneration for the host of EconTalk in a way that does not encourage profit seeking. Doing so would probably lose them their non-profit status. Russ seeking profits on the side would probably threaten their tax-exempt status too. So questions of compensation would have to be restricted to non-monetary means.

At some point in college a professor told me that people are motivated by power, prestige, and property. Power games–also called politics–are off limits as they would also threaten a not-for-profit’s tax-exempt status. Of those three, then, only prestige remains as a means of variable compensation. So I will focus my thoughts there.

My father–a career officer in the US Navy–once told me that money does not motivate people. But changes in money–bonuses, raises, and pay cuts–do motivate people remarkably well. The same reasoning applies, I think, to prestige. I wrote a book recently and published it. I thought the measure that would matter to me most would be the number of books sold or the total profit I made–assuming I ever made a profit. But what turned out to matter the most to me was the little star rating provided by people who had read the book. I think that’s the modern equivalent of measuring prestige. Russ already has everything he does open to comments. If he wanted to increase his prestige–in this case the number of comments he received each week–he could add a simple star system to each episode so that comments were not the only way that people could express their appreciation or disgust. Facebook uses a thumbs up, thumbs down system but that has proven too clumsy. I think the star system with an option to comment is the best means of monitoring prestige presently in use.

Even without the ‘power, prestige, property’ thing, we already know Russ is sensitive to feedback. He’s very open about that fact almost every week. Exposing him to even more feedback, therefore, is likely to make him even more responsive to his audience, at least to the extent possible within the restrictions of his own conscience, the mission of the program, and the requirements of the Liberty Fund.

Matt Throckmorton

Apr 30 2018 at 8:50pm

When will the audio of this be posted?

[This post is a discussion of the podcast episode Jerry Muller on the Tyranny of Metrics, where a link to the audio file can be found. —Econlib Ed.]

Comments are closed.

By Amy Willis