Cathy O'Neil on Weapons of Math Destruction
Oct 3 2016

Cathy O'Neil, data scientist and author of Weapons of Math Destruction, talks with EconTalk host Russ Roberts about the ideas in her book. O'Neil argues that the commercial application of big data often harms individuals in unknown ways, and that the poor are particularly vulnerable to exploitation. Examples discussed include prison sentencing, college rankings, evaluations of teachers, and targeted advertising. O'Neil argues for more transparency and ethical standards in the use of data.


READER COMMENTS

Eric
Oct 3 2016 at 10:44am

Here is an early classic book that raised similar concerns.

Computer Power and Human Reason:
From Judgment To Calculation

by Joseph Weizenbaum
(San Francisco: W. H. Freeman, 1976; ISBN 0-7167-0463-3)

Weizenbaum authored the program Eliza, which interacts with a user and gives the appearance of a Rogerian therapist. (It appears to understand, even though, being only a program, it doesn’t actually understand at all.)

Weizenbaum was deeply concerned about the general willingness of people to trust the calculations of computer algorithms and what can happen when this replaces human reason.

He would have seen clearly, for example, the danger involved in replacing a human judgment about the skill of a teacher with the calculation of a score based on questionable (or even unrevealed) software.

When a program like Eliza fails, the exposure of its lack of understanding can become obvious and even quite humorous. The really serious trouble comes when the trusted calculations of software can fail silently in ways that may never become apparent, even though we base important decisions on those results.

Another excellent episode.

Fred Giertz
Oct 3 2016 at 1:22pm

O’Neil’s critiques of decision algorithms are often convincing. However, she consistently falls into the fallacy noted by Buchanan in the comparison of the market vs. the government option for resource allocation. She finds an algorithm less than perfect and immediately rejects its use without comparing it to the available alternatives. Perfection is usually not the default.

Bob
Oct 3 2016 at 3:52pm

Some good points; I certainly agree people trust models far too much and rarely understand them.

Having said that, I think we’d help a lot more people — especially poor minorities — if we ended the war on drugs and reduced or eliminated all the things the government does that raise barriers to employment (wage price floors, obscene regulatory compliance, etc.). Just look at the unemployment rate for various demographics — clearly there is a huge need to make it easier for people to get jobs.

I also got the feeling that there might be a little too much focus on differences in crime demographics that may be false or imperfect, without acknowledging the fact that there really are differences in serious violent crime rates along various demographic lines like sex and race and age. e.g. it’s not just an oddity of criminal justice models that young black men make up a disturbingly large fraction of all murderers — wildly out of proportion to that population size. This sort of stuff is seen not just in arrests and convictions, it also matches quite closely to what the victims themselves report in surveys (where the vast majority of the victims are in the same demographic as the attacker). You can also point to significant disparities in rates of child abuse, single-parent households, and other factors that have demographic-independent correlation to negative outcomes like poverty and criminality.

Nonlin_org
Oct 3 2016 at 5:50pm

Unbelievable how many people disregard the market. In this case (as always?) the guest has a vested interest in pointing out flaws and then presenting her skills as absolutely needed to “solve” the problem.

Yes, algorithms are flawed, and yes people make mistakes, but overall, the market works way better than the “benevolent tyrant” that wants to fix everything.

People heavily discount information that is inaccurate, or algorithms that are new and unproven. Sure, we all look at the college rankings, but who ever blindly follows them?

Then there’s the separation between private and public: no one has any business dictating to the Googles of this world beyond deciding to not do business with them if their algorithms are too outrageous. As far as public, yes we have a right to know more because we don’t have a choice in doing business with the government.

The argument that ranking schools and teachers and inmates should be scrapped goes way too far. Minorities have always been discriminated against and then they had something to prove and they did and they rose from the bottom. Not getting a fair deal makes one stronger while “affirmative action” makes one weaker.

Michael Byrnes
Oct 3 2016 at 8:55pm

Nonlin_org wrote:

overall, the market works way better than the “benevolent tyrant” that wants to fix everything.

On this point, I would venture to guess that in many cases Cathy O’Neil would agree with you, except that she would see the hand of the “benevolent tyrant” in some of the algorithms she refers to as WMDs.

These algorithms don’t come about as emergent phenomena that result from an iterative market process – they are top-down creations, literally the opposite of the emergent phenomena that can develop as a result of markets or other selection processes.

Indeed, one of her criteria for a WMD model is the lack of feedback and tuning – both from the model to those evaluated by it and, maybe more importantly, from the model to itself as a means to fine-tune itself over time. As it happens, feedback is a major characteristic of markets and a big part of the reason why markets actually work as they do. Top-down models that do not assimilate feedback are the antithesis of markets.

One topic that unfortunately did not come up in this interview was O’Neil’s introductory discussion of baseball statistics – a major success of data and modeling. Over the past 15 years or so, the use of data by major league baseball teams to evaluate and optimize performance has revolutionized the game.

But O’Neil points out that there is a key way in which baseball models and analysis differ from many of the applications that she describes as WMDs. At the end of the day, teams use these models to help them win games. And baseball is very much a zero-sum game – everything reduces down to wins and losses. And that is feedback. A great-looking model that isn’t useful in the actual prediction of wins and losses (and in helping teams improve their ability to win) will not last very long in the world of baseball analytics. In baseball analytics, models can be improved iteratively based on their performance. Teams have their own proprietary models, of course, but the quality of the stuff that is in the public domain has improved dramatically over the past 20 years, because ultimately there is a ready source of feedback that allows for trying different ideas, keeping those that work, and abandoning ones that don’t. That is also part of how a market works – by providing a constant source of feedback to buyers and sellers alike.

Now, contrast that to the teacher value-added model that the DC school system was using to pay teacher bonuses and fire so-called unproductive teachers. There is no feedback, so any flaws in the model will simply have paradoxical and suboptimal effects. Effects that can have a drastic negative impact on some people.

Russ raised a very interesting point – that evaluation of teachers based on student test scores seems bizarre, given everything that is involved in being a teacher. I think this is the result of two biases we have about information and decision-making:

  • bias towards things we can measure easily over things that are difficult to measure
  • a belief that “the best information we have” must therefore be good information

I think these come into play with the WMDs as well.

John Sallay
Oct 3 2016 at 11:01pm

I really enjoyed this episode. I think this is one of my favorites in quite some time. I was confused by her thoughts on the student loan market.

If she were to provide loans to students at what she considers reasonable rates, then built into that rate would be the costs of doing business. If she sets out to make a 10% profit, while the other guys want a 100% profit, she just needs to add in the cost of advertising. She is essentially saying that markets don’t work in internet advertising: that offering a superior product at a vastly lower price can’t work, because the other guys can spend more on advertising. It may not be easy to oust an incumbent, but it happens all the time.

d clark
Oct 4 2016 at 12:12am

Climate change scientists, pro and con, use data, big data, and extensive modeling, sometimes with models that are linked to each other and built by parallel- or sub-modelers. I wonder how Cathy O’Neil would approach the rigor of such endeavors. How well does a conclusion we are offered today predict tomorrow’s conclusion?

Max Ghenis
Oct 4 2016 at 1:28am

Cathy discussed auctions for payday loan ads. Google banned such ads earlier this year: https://publicpolicy.googleblog.com/2016/05/an-update-to-our-adwords-policy-on.html?m=1

Bob
Oct 4 2016 at 1:31am

@d clark

Climate change […] I wonder how Cathy O’Neil would approach the rigor of such endeavors.

Excellent question. Are her criticisms motivated more by ideological leftism, or an ethically neutral, unbiased, rigorous analysis — the ideal of philosophy and science?

Economists know that fancy models are nonsense in their field. And yet it’s not hard to find economists who don’t give much consideration to the possibility that modeling done as it relates to climate may also be suspect, or worse. We all know the model output is whatever you program it to be — the model output is just your facts and assumptions expressed in numbers. And yet to many it seems climate modeling in particular is a kind of dark art done by magicians that lets them reliably peer into the distant future of the most complex chaotic system that’s easily accessible, one we just barely know anything about (all the myriad cyclical phenomena, positive and negative feedbacks from weather and plants and sea animals and gas cycles, gains in reflectivity and heat exchange rate based on other unmodeled trace gases, and clouds and precipitation predictions and guesses about decay lifecycles on sparse empirical evidence … it’s preposterous that it can make reliable predictions for a year or a decade or a century. In fact it hasn’t — look at the predictions vs. the least biased data set: the satellites that have very good resolution and coverage). Look at the kind of assumptions they’re making, and the hard-coded values they’ve massaged into their algorithms to vaguely curve-fit some subset of climate variables we have empirical data to fact-check … though only using the same data the modelers already had and were curve-fitting (i.e. history, not successful future predictions). And yet somebody will reply “97%” and miss the point that they’re not being skeptical enough of what they’re being told, and they haven’t fact-checked their assumptions.

The catastrophic claims are ridiculous. Adaptation to a little warming isn’t a big deal and has a lot of upside. Even practically speaking, a cold economic calculation tells us we’ll be richer in the future if we allow markets to operate, and we’ll be better off paying for adaptation as needed (more in the future, more gradually, via private, local measures that are mostly easy and already well understood). The catastrophic predictions are ridiculous, with terrible empirical evidence to give them credibility — many are so easily disproved by just comparing the predictions with the outcomes we’ve observed. The models just keep getting more complicated and failing to make specific, reliable predictions about the future. They can curve-fit some blips in a toy simulation of a climate that in no way represents reality, sure, but who cares about that? If you don’t fact-check these absurd claims when you hear them you’re doing yourself a disservice. Did you know we just heard that apparently more methane is released in the arctic during the winter…not the summer as the models assumed based on our wild speculation about what gases will become more abundant in the summer vs. winter — literally no evidence. This kind of wild speculation about the composition of snow and ice and water and their relationship to temperature is just one trivial example of probably millions of cumulative unchecked assumptions baked into these models, and they haven’t demonstrated their ability to reliably make predictions. In fact they’ve clearly failed if you compare what the models predicted vs. what has been observed by looking at the best data objectively.

Vermonster
Oct 4 2016 at 7:31am

I was disappointed that measuring quality in healthcare never came up in your conversation. Paying docs & hospitals for quality sounds so easy. The devil is in the details of measurement! Stand by for more Green Mountain folly. http://vtdigger.org/2016/09/15/shumlin-optimistic-federal-deal-health-payment-overhaul/

[Full url substituted for shortened url. Please use full urls on EconTalk–Econlib Ed.]

Jon
Oct 4 2016 at 9:11am

I found this conversation often disappointing on account of a frequent lack of real engagement between the interviewee and Russ. (I.e., it felt like two people hitting tennis balls against backboards next to each other rather than an interactive activity.)
O’Neil also made some pretty extreme/flawed statements (e.g., “zip codes with high recidivism correlate with race, which shows police are profiling”) which could have served as the basis of useful discussion if her assumptions were brought out and examined. Part of the problem seems to be that some guests really just rely on vehement assertion and come on to deliver their talking points.
As constructive criticism, I’d suggest EconTalk check out Glenn Loury’s interviews on the same topic and note how an exchange of (often difficult) ideas occurs.

Kevin
Oct 4 2016 at 12:23pm

I enjoyed this episode and the insights and cautions from the guest. I was a little disappointed that she often applied her insights and modeling to what seem to me to be cliches.

For example, the discussion about crime and high crime zip codes sounded very different from my experience living and later working in high crime areas. I recently lived in the highest crime zip code of a major city for many years.

During those 6 years I had almost no interaction with the police (there was the time a cop waved me back inside from my porch because they were chasing someone on my neighbor’s porch, and the time the police investigated when some local boys painted profanity and threats on my wife’s car after she asked them not to run through her garden). During the time I was there the majority of my African American neighbors never had any interaction with the police. If someone had surveyed my street and asked whether we wanted more police or less, we would have all asked for more. The police are in these zip codes not because they think they are fun and entertaining places to be, or, sillier still, because of racism, but because, as the proverbial bank robber said, that is where the money is. The violent crime is in those zip codes. There were 8 murders one summer within a mile of my house. We wanted increased police presence. I have no clue whether or not the algorithm for recidivism is “racist”, but I do know that it is easy to avoid the police even in a high crime neighborhood, and that the vast majority of people in those neighborhoods, of all races, would welcome more policing. The guest conflates poverty with crime, but this is a great insult to poor people: the vast majority of poor people commit no crimes, even though many criminals are poor. Incidentally, I also grew up very poor and never had a joint in my pocket, urinated on the street, or had any of my family interact with the police.

I found the discussion of teachers very good and the best example examined. It is hard to judge teaching, but I think you could make better algorithms by surveying students and parents. At my children’s schools there are teachers every parent wants to avoid – maybe they are clued into something.

Isn’t it fairly paternalistic to think that if I get an ad for an Apple Watch I might buy, it’s good, but if a poor person gets an offer for a loan, it’s predatory? If you don’t trust poor people to handle their own decisions, it hardly matters if they are getting targeted ads; they should probably have a fiduciary managing their affairs. I don’t understand why someone needs a payday loan (growing up poor made me a savings fanatic), but if they do and can explain why to you, is it still predatory? Do they just not see the big picture, or do we just not see the small picture?

I am grateful for the guest’s overarching theme, which I found very insightful, but I would have liked to hear more examples like teaching and fewer examples laden with cliches about race and the poor.

David McGrogan
Oct 4 2016 at 5:46pm

This is exactly the kind of thing I come to Econtalk for. I found it immensely thought-provoking and important – thanks.

The point about race and sentencing is surely that the more an area is policed, the more, by definition, the people living in that area (poor and/or black) will have police interactions. They will then have higher recidivism risk scores simply by default. That is unfair and unjust. It’s not that the police are racially profiling. It’s that the way the algorithm is used has unforeseen pernicious effects which are equivalent to profiling but not conscious.

Madeleine
Oct 4 2016 at 7:03pm

I’m so happy Cathy brought up that just because something works doesn’t mean it’s right.

Racially discriminating might lift the bottom line, but that doesn’t make it OK. We’re not North Korea: we don’t punish for multiple generations.

If your algorithm is implemented with guilt by association, there’s a problem.

Julien Couvreur
Oct 4 2016 at 8:18pm

Regarding “predatory” ads on Google: in trying to maximize Google’s revenue, the auction must multiply the bid by the likelihood of a click.
Assuming the “good” lender offers an attractive rate, that likelihood will lean strongly in his favor.
The “predatory” lender doesn’t just win the ad space because he bids higher. He may have to bid a lot more…
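
A rough sketch of that ranking, with invented advertisers, bids, and click probabilities. It illustrates only the principle of expected-revenue ranking, not Google’s actual auction, which involves additional factors such as quality scores.

# Illustration only: rank two hypothetical ads by expected revenue per
# impression = bid * predicted click-through rate (all numbers are made up).
ads = [
    {"advertiser": "good_lender", "bid": 2.00, "predicted_ctr": 0.08},
    {"advertiser": "predatory_lender", "bid": 5.00, "predicted_ctr": 0.02},
]
for ad in ads:
    ad["expected_revenue"] = ad["bid"] * ad["predicted_ctr"]
winner = max(ads, key=lambda ad: ad["expected_revenue"])
print(winner["advertiser"])  # good_lender: 0.16 vs. 0.10 expected revenue

On these made-up numbers the lower bidder wins, which is the point: a lender whose ad is rarely clicked has to outbid by enough to overcome its lower click-through rate.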

Bob
Oct 4 2016 at 9:58pm

@Madeleine

If your algorithm is implemented with guilt by association, there’s a problem.

Stated that way I suspect you won’t find many who would disagree. 🙂 Punishing the children for the crimes of the father is very Old Testament, very tribal, very collectivist.

Though I wonder how many people would agree on what counts as guilt-by-association vs. relevant causal factors. For example, there is peer-reviewed scientific literature suggesting a causal link between child abuse and negative outcomes for that child as they grow up, including mental health problems, drug abuse, criminality, violence, etc. If someone is abused as a child that might well be predictive of their likelihood to engage in more crime. Let’s stipulate that for the sake of argument. Should a history of being abused as a child be a factor that potentially lengthens their sentence? What if they were “just” spanked, which appears to have the same directional relationship to more serious abuse, though with a lower magnitude? Should it still be a factor, though perhaps with a lower weight?

What if we take seriously the peer-reviewed literature on the inverse link between IQ and impulse control, and the inverse link between IQ and violence? Should we give prisoners an IQ test before we decide their sentence?

What about the peer-reviewed literature linking children raised without a father to negative outcomes — should an offenders’ lack of a father be a factor in the length of their sentence?

Would anyone really be shocked if there was a causal link of nontrivial significance between things like childhood experiences (e.g. abused, fatherless), mental ability (e.g. IQ), and criminal outcomes? Ignoring the science for a moment, it seems intuitively obvious — common sense. There are always exceptions, but we all know that parents who abuse their children are very likely to have been abused themselves. Should we ignore such information about causal links because it’s “guilt by association”? Or do we accept such links as valid and having some at least limited weight for estimating the probability of recidivism? What counts as guilt by association depends on what you accept as a fact.

As far as I can tell the whole notion of increasing sentence duration based on speculation about recidivism seems to miss the point. Shouldn’t we look more seriously at why our existing methods of criminal justice fail to reform offenders? Maybe we’re approaching the problem all wrong by focusing on sentence duration vs. recidivism and we should instead think about counseling and/or education and/or work programs that might provide restitution to the victim (or whatever works). Or maybe the opposite is true for some people and we should lock them up and throw away the key instead of debating about marginally longer sentences.

I personally know someone (~40 years old) who is a dangerous criminal with a very long rap sheet (currently in prison) that I don’t believe is capable of reform, in part because of brain damage resulting from long-term abuse of hard drugs. They’ve been given endless chances, tried to be helped by many people, in jail many times, in prison several times, and no matter what they end up breaking the law again after at most a few months. I fear for their safety and the safety of the public whenever they’re not in jail. They’re not very smart, have a short fuse, little impulse control, and a history of recklessness, violence, theft, drug abuse, etc. This is a hardened criminal — a heavily tattooed neo-Nazi with serious mental health problems. I don’t think they should ever be let out of prison, for both their own safety and the safety of everyone else. And yet they keep getting out. I’d say this person is a perfect example of the criminal justice system being broken. It’s not always the case that the system puts good people in jail for too long. Sometimes it does the opposite.

Luke J
Oct 5 2016 at 12:45am

Good stuff.

Nick Nahat
Oct 5 2016 at 1:27pm

Daniel Kahneman recently wrote an article in the HBR titled Noise: How to Overcome the High, Hidden Cost of Inconsistent Decision Making.

He contends that the cost of ‘noise’ in making inconsistent value judgments by professionals is extremely high–an issue addressed in this podcast briefly.

It would be interesting to be able to put an actual cost in some manner on the ‘savings’ from eliminating inconsistency with an algorithmic decision, or value created or destroyed, as the case may be.

In any case, Kahneman’s take is well worth reading after listening to this podcast for further thoughts on the subject!

Madeleine
Oct 5 2016 at 3:39pm

@Bob

You are talking about a totally tangential issue. The problem with the “guilt by association” point I was trying to make is exactly the “guilt” part. Black people, by virtue of being black, have not done anything wrong. Yet the racist algorithm would deem them to be ipso facto “guilty” (of credit default, of a car accident, of whatever) just because they are black.

In your example, the prisoners have already committed crimes. There is no assumption of guilt: if they are sentenced, they have been judged guilty of those crimes. It is not the same situation at all, but a wholly separate issue of the fairness of sentencing. They are being judged (fairly or unfairly) for the crimes they have committed in your example. In the racist algorithm, they are being judged for “crimes” they have NOT committed.

Bob
Oct 5 2016 at 5:37pm

@Madeleine

In your example, the prisoners have already committed crimes. There is no assumption of guilt: if they are sentenced, they have been judged guilty of those crimes. […] In the racist algorithm, they are being judged for “crimes” they have NOT committed.

Thanks for your reply! As far as I understand it, the algorithms are being used to help inform sentence duration (and things like parole eligibility, bail, etc.). So the people impacted by the algorithms have already been found guilty. It’s just a question of how long we lock them up for. Dr. O’Neil expressed concern that people are getting longer sentences essentially because of the zip code they live in (and similar), which I thought you were saying is guilt by association. And I tend to agree. Am I mistaken about these things?

Here’s what Dr. O’Neil said:

But I guess one of the [problematic algorithms] I worry about the most, […] is a family of models, actually, called ‘recidivism risk scores,’ […] if you are a higher risk of recidivism, then the judge tends to sentence you for longer.

I was trying to point out that what counts as guilt-by-association vs. relevant causal factor is not necessarily clear — it depends on what you accept as a fact. If you accept as a fact that being abused as a child makes you more likely to commit crimes, then should that be a factor in your sentence duration if you were abused as a child? What if you look at the wildly different rates of crime by race, should that be a factor in sentence duration? How about age? Poverty? What background information is guilt-by-association vs. relevant? I think that’s a tough question to answer unless we eliminate the use of algorithms entirely.

What do you think — should algorithms that help inform sentence duration be eliminated entirely? If not, what information is fair to use? Sex, age, race, income, abuse as a child, IQ, raised by a single parent? Maybe just their past criminal history and nothing else? The algorithms probably won’t have much success at predicting recidivism if they’re so constrained. Where do you draw the line?

Jim Ellison
Oct 5 2016 at 9:37pm

Statistics work well for populations but poorly for individuals.

Greg A
Oct 6 2016 at 1:11am

Thank you to Kevin for saying almost everything I was thinking.

Trent
Oct 6 2016 at 11:07am

An interesting discussion as always, though I was cringing at the end when Ms. O’Neil seemed to advocate regulation/regulators as the solution to her problem.

I enjoyed the depth of your discussion on colleges, and in particular the US News & World Report rankings. It’s ironic that a defunct magazine continues to carry so much weight in this area – they seem to be known these days only for their rankings, in fact.

This was not the proper guest to discuss this question, but I wonder why no other college ranking scheme has emerged as the leader. Was it because US News was first, and that first player is hard to knock off the mountaintop? Do colleges actually support the US News rankings (despite complaining about them in public) vs. another system that might actually have cost as a factor? Or are there other factors in play? Methinks this would be an interesting discussion topic with Mike Munger, should you ever go back to the timed, multiple-issue bell format with him.

don l rudolph
Oct 6 2016 at 3:17pm

The segment on teacher evaluation reminded me of a time I was evaluated as a drafter at an engineering firm. They were trying to evaluate drafters by how much time it took to draw up a sheet of details. I think they perceived me to be a mediocre performer and wanted to prove it with math. The fact is, the best drafter could have a bad score if he was working with a bad engineer, and the worst drafter could have a good score if he was working with a good engineer. Luck was on my side: I was working with an engineer who gave me hand drawings when they were totally thought out, so I could draw each one once with no changes. The engineer/managers were left scratching their heads, not being able to prove what I assume they were trying to prove. I think their exercise would have been a better measure of an engineer’s performance, or even of their own abilities as managers. It is a good example of how blind math worshipers can be to all the factors going into what they are trying to measure.

Chris
Oct 7 2016 at 1:48am

great episode!

What I found interesting was the hidden bias of the humans. When Cathy and Russ discuss the criminal system, they never once mention gender, although the system is heavily stacked against males. When they discuss getting jobs in the tech industry, they immediately take up the underrepresentation of women.

It could be that host and guest conform to the human algorithm of overemphasizing discrimination against females (which no doubt exists) and underemphasizing discrimination against males (which no doubt exists as well).

Wish we had a learning machine to sort this bias out 😉

Michael Byrnes
Oct 7 2016 at 7:00am

Trent asked:

This was not the proper guest to discuss this question, but I wonder why no other college ranking scheme has emerged as the leader. Was it because US News was first, and that first player is hard to knock off the mountaintop? Do colleges actually support the US News rankings (despite complaining about them in public) vs another system that might actually have cost as a factor?

I think “first mover” is a huge advantage for US News. As to whether colleges “support” it, I think it is clear that they at least tacitly do – a lot of their development activities over the past few decades do tie in with the criteria in the rankings, and that can’t be an accident.

It is a great example of one of the problems with these kind of models. Whether the premises of any particular model are correct or incorrect, they become a goal to shoot for.

SaveyourSelf
Oct 7 2016 at 9:03am

It’s taken me some time to sort through my thoughts on this conversation. There was much about it to like. To begin with, Cathy O’Neil is really nice. I find myself rooting for her even when I disagree with her. I also like that she offers a very different perspective than Russ Roberts. Her background is mathematics, not economics, and she believes regulation and central planning are good solutions to problems–free 4-year state schools, for example. Despite her comfort with socialist solutions, she tries to avoid being pulled out of her area of strength, mathematics. Russ goes fishing with her several times throughout this interview, inviting her to venture into economic topics, but, as in her last interview, she is mostly successful at sticking to her strengths.

Cathy’s main concern in this interview was judges using recidivism risk scores when sentencing convicts. If what she said is true, that’s alarming, but I may not understand her fully. I think she argued that there are at least two potential problems with the machine learning algorithms. One problem is that the courts are using results from algorithms calculated using information that is illegal for courts to collect and use. The second problem she called a negative feedback loop, which I think means the formula for determining the risk score uses information that is the same as–or a close substitute for–the formula’s own output, meaning that the formula’s output in the past serves as an input to the very same formula in the future, necessarily influencing its own future outcomes and creating a loop.
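
A toy simulation of that loop, with invented numbers, makes the mechanism concrete: a higher score means a longer sentence, a longer sentence raises the true chance of re-arrest, and the re-arrest pushes the next score higher, so the score partly confirms itself. This is only a sketch, not any real instrument.

import random

random.seed(0)

# Invented parameters throughout -- purely illustrative.
score = 0.5                                            # hypothetical offender's initial risk score
for cycle in range(5):
    sentence_years = 1 + 4 * score                     # higher score -> longer sentence
    p_rearrest = min(0.9, 0.2 + 0.1 * sentence_years)  # longer time inside -> worse prospects
    rearrested = random.random() < p_rearrest
    if rearrested:
        score = min(1.0, score + 0.15)                 # the new arrest feeds the next score
    print(cycle, round(sentence_years, 1), round(p_rearrest, 2), rearrested, round(score, 2))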

Nick Nahat, in the comments section, added nicely to the conversation when he mentioned an article by Daniel Kahneman which also mentioned a usage of predictive algorithms in the justice system, except Kahneman mentioned they were used in decisions about bail, not sentencing. I can see the rationale for using predictive tools with bail, but I can’t imagine any benefit from using recidivism risk when sentencing someone following conviction. I wonder what the judges hope to gain. I suspect it is an attempt to reduce future recidivism by increasing the length of jail sentences. But that is the equivalent of arbitrary decisions by judges, which means they are not operating by the RULE OF LAW.

Katrin
Oct 7 2016 at 5:55pm

Regarding the ballooning of college administrations (48:14), I’d like to recommend Ginsberg’s “Fall of the Faculty”, see https://www.goodreads.com/book/show/10764149-the-fall-of-the-faculty

Robert Swan
Oct 7 2016 at 9:17pm

Ah, maths abuse — one of my favourite topics. And there were repeated hits on my hobbyhorse of choosing which axis you’re going to distill some complex multi-dimensional thing down to. Comments have been good too.

Perhaps the most important observation in the talk was when Dr. O’Neil said:

“When do we see these magic bullet algorithms be[ing] used? … The more complicated and societal, and, you know, taboo, a topic is, the more likely you are to come up with, to see something emerge along these lines.”

Pretty well sums up my view and is the reason why I hold all these “sophisticated” models in very low regard. Just because you have a difficult-to-explain model doesn’t mean that it successfully models the difficult-to-explain reality (climate models being today’s best example).

For no obvious reason, when the recidivism models were being discussed I was reminded of Terry Gilliam’s “Miracle of Flight”. At one point a fellow pulls on a pair of claws, feathers and a beak and jumps off a cliff. Mathematical techniques might well bear out the wisdom of his approach, but he does not so much fly as plummet.

When all’s said and done, though, even more pernicious than these complicated mathematical models is the magic money model. I don’t know about the U.S., but in Australia you often hear politicians “proving” that they have delivered an improvement in service (health, say, or education) because they’ve spent so many more millions of dollars on it. Now who’d have thought that granite and marble foyers and towering parking complexes wouldn’t have improved patient outcomes?

Oh dear. A glimpse of my cynicism seems to have slipped through.

On the comedy relief side, I chuckled at Russ being too politically correct and describing himself as the “straight person”. I think that belongs in a different context. It is getting a bit daft when a man feels he can’t refer to himself as a man.

Lastly, I agree with other commenters on Dr. O’Neil’s view of payday lenders. It’s not like there is no competition in that market. I also suspect they wouldn’t mind having the profitability of some of the not-just-payday lenders.

David Zetland
Oct 9 2016 at 5:32am

I’m surprised to see so many comments proclaiming that markets will solve the issues Cathy raises. In the podcast, I thought that Russ was not looking too deeply into what she was saying about asymmetric incentives being magnified by datamining.

Her example of payday loans or ripoff universities was on the money. Russ’s reply that “better lenders could set up a charity to solicit funds to compete for ad space” was about the longest struggle against reality that I’ve heard in a while. OBVIOUSLY, as I learned from Russ in Econ 1 in 1989, demand [for ads] depends on the profits to be made. Her objection to such a tilted playing field with respect to “vulnerable” consumers is about as controversial to me as advertising to children. Anyone who’s been paying attention to the results in behavioral economics knows that advertising and deception work, and work better (for the advertiser) when the target is vulnerable. I’ll stop.

Anyway, this was a brilliant episode, as this problem is indeed way bigger than people imagine. I work with data and saw the dangers of the subprime crisis. The dangers here are indeed of the same magnitude — and far more devastating to our lives and welfare.

The Original CC
Oct 9 2016 at 8:25am

Great guest, great interview, and good discussion on this web page. Sounds like a good book, Cathy.

Libertarian Heretic
Oct 15 2016 at 10:07pm

Fun and interesting discussion as always Russ. But one criticism is in order.
I used to feel market fundamentalism was a needless and pejorative term. Not anymore. And I think you have the same intuition, particularly when you speak about David Autor’s work. It’s much better to say that in strictly utilitarian terms a freer market is always better: the greatest good for the greatest number. But there are people who are harmed and absolutely worse off.
You had a market fundamentalist moment when talking about for-profit schools and targeted ads. It seemed like you were really resistant to admitting that there may be a situation where many people are made worse off. Personally I think sunshine is the best disinfectant, and making poor people aware of the possible suboptimal outcomes is 90% of the battle. No major legislative action required. At most, maybe outcome disclosure requirements.
But you can’t fix a problem if you ignore it. Resisting the suggestion based on the intuition that the goods wouldn’t be purchased absent real utility is slightly disturbing to me. Wouldn’t it have been better to give voice to skepticism with something like ‘Are the data clear about the disproportionately negative outcomes for poor people? Are they, relatively speaking, worse off than the middle class guy with college-age kids and a gambling problem who gets Vegas ads and deals?’ Especially when you got to the point of suggesting charities could buy the ads instead, it really sounded like grasping at straws. If I were still a lefty I would have left smug and content in my worldview after hearing that. Absent some miraculous shift in the attitudes of nearly all Americans, that wouldn’t happen. That would only happen in a world where the two major US parties were the Minarchists and the Anarcho-Capitalists. We may as well be speaking about parallel universes at that point. I know many people would bristle with explosive anger at the suggestion of donating money so that a corporation would take it in exchange for not exposing poor people to predatory ads.
Your guest didn’t seem to be advocating any new agencies or laws, so I don’t know why it seemed necessary to stop her cold in her tracks at the first step. Couldn’t the tragic consequences this brings to poor people be given a little more attention? Being free market, we will always be depicted as cold-blooded no matter how far we venture in an inquiry. But if we don’t take concerns seriously we will just look out of touch, and deluded to boot.
Don’t take it as some kind of chastisement. I would much rather have you as king for a day than her.

Libertarian Heretic
Oct 15 2016 at 10:16pm

On a lighter note.
When I studied criminology, one claim that really stuck out was that whites and blacks commit petty crime and drug offenses at identical rates, but that blacks were much more likely to face criminal sanction.
Maybe the problem with stop and frisk is the data set. Imagine if instead of using the existing data we randomly stopped and frisked people based on the average demography of America for a year and then used that data set. We could overcome huge problems of inherent unfairness. And after all it would probably be no less an infringement of 4th amendment rights for people unfairly frisked than the existing practice used on innocent people.

Nicolas
Oct 17 2016 at 9:38am

Great podcast. It makes me think about this fascinating paper explaining why judges are NOT Bayesian: because their role is not to find guilty people, but to prevent crime.

Demougin, Dominique, and Claude Fluet. “Rules of Proof, Courts, and Incentives.” The RAND Journal of Economics 39.1 (2008): 20-40. Web.

https://www.jstor.org/stable/25046362?seq=1#page_scan_tab_contents

Andy McGill
Oct 23 2016 at 12:34am

Sure, formulas for sorting people should be fair game for examination, especially if judges use them to set sentences for those found guilty. I doubt that judges use them as she described, but I am not going to do all the work to prove that doesn’t happen anywhere.

But race is explicitly used in college admissions, and was found constitutional ONLY because it is asserted to improve the education of white students, an assertion that has zero science behind it.

I would suggest that credit scoring is a very raw type of data algorithm but is amazingly fair because even relatively poor people can have great credit scores based solely on the idea that they do what they promised to do. That alone is a huge step for human society.

If we just stop assuming poor people and minorities are stupid and not able to comprehend modern society, then suddenly a huge number of ways to improve their lives would be visible.

Comments are closed.




AUDIO TRANSCRIPT

 

0:33

Intro. [Recording date: August 26, 2016.]

Russ Roberts: One of the great titles of all time, Weapons of Math Destruction. What are they?

Cathy O'Neil: They are algorithms that I think are problematic. And I can define them for you. They have three properties. The first is that they are widespread--which is to say they are being deployed on many, many people to make very important decisions about those people's lives. So it could be how long they go to jail, whether they get a job or not, whether they get a loan. Things that matter to people. That's the first characteristic. The second is that they are secret in some sense: either there's a secret formula, that the people who get scored by these algorithms--usually a scoring system--it's either a secret formula that they don't really understand, or sometimes even a secret algorithm that they don't even know that they're being scored by. And then finally, they are destructive in some way: they have a destructive effect on the people who get badly scored or they sometimes even create feedback loops--pernicious feedback loops--that are overall destructive to society as a whole.

Russ Roberts: Let's talk about those feedback loops, because you give some examples in the book of where I would call it a misunderstanding of a false correlation--or not a false correlation: a correlation that's not causative--is misinterpreted and it feeds back on itself. So, can you give us an example of that?

Cathy O'Neil: Sure. Pretty much every chapter in my book has an example of one of these problematic algorithms. But I guess one of the ones I worry about the most, if we want to jump in, is a family of models, actually, called 'recidivism risk scores,' that judges all across the country--

Russ Roberts: That's 'recidivism,' right?

Cathy O'Neil: Recidivism risk, yeah.

Russ Roberts: The risk of getting back on the bad side of the law and ending up in jail, for example.

Cathy O'Neil: Right. So they are basically--they are scored for people who are entering jail or prison. And 97% of people eventually leave. So the question is: How likely is this person to return? And so these algorithms measure the likelihood for a given criminal defendant to return. And they are given, like, basically--there are categories: either it's low risk, medium risk, or high risk. And that score is given to the judge in sentencing. Or, sometimes in paroling, or even in setting bail. But I'll focus on the sentencing. So, it might not be obvious, and it's actually not obvious. We can talk about it. But if you are a higher risk of recidivism, then the judge tends to sentence you for longer. And so we can get into what I think is problematic about the scoring systems themselves. But let me just discuss the feedback loop. The feedback loop here, which I consider extremely pernicious, is that when you are put in jail for longer, then by the time you get out of jail, you typically have fewer resources and fewer job prospects, and you are more of an outsider--more isolated from your community, you have fewer community ties. And you end up back in jail. So it's a kind of--it creates its own reality. By being labeled high risk, you become high risk. If that makes sense.

Russ Roberts: Yeah. So, that's a theory--right?--the idea that prison is not much of a rehabilitation experience and that in fact it could be opposite. Right? It could be an opportunity if you spend more time with people who, instead of making you a more productive person in legal ways make you a more productive person in illegal ways when you do get out. Do we know anything about whether that's true? It's a hard question to answer.

Cathy O'Neil: There certainly have been studies to this effect. And, by the way, I'm not claiming that this is inherently true. I mean, it's theoretically possible for prisons to be wonderful places where people have resources and they learn--you know, they go to college and they end up, because they spent a full 4 years there instead of 3, they end up with a college degree. And it actually improves their life after prison. But the studies that we know about don't point to that.

5:32

Russ Roberts: Okay. So carry on. But that's a fact of--that's an issue of how, whether prison sentences should be structured the way they are and whether prisons should be, what the experience should be like of being in prison. Some would argue it could be a deterrent effect; maybe it's not in practice. But how does the data part of this interact--the riskiness and the length of the sentence, to have a feedback loop that's pernicious?

Cathy O'Neil: Right. So, the scores themselves are calculated in problematic ways. So the first thing to understand about these scoring systems is that they basically--there's two types of data that go into the recidivism risk scores. The first is interactions with the police. And the second is kind of questionnaires that most of these scoring systems have. And then they use all of this information--the kind of police record with the answers to the questions--and they have a logistic model that they train to figure out the risk of coming back to jail.

Russ Roberts: A logistic model is just a technical style of--an attempt to isolate the impact of the individual variables in this kind of 1-0 setting: Come back or not come back.

Cathy O'Neil: Right. Well, it's actually a probability, but you have a threshold. If it's above, like, 65% or something, you'll say it's likely to come back. I don't know the exact thresholds they set. Nor do I actually have a problem with using a logistic regression. I don't even have a problem with calculating this probability. What I have a problem with is sort of interpreting the score itself. So, to be clear, if we have to take a step back and understand how data and the justice system works, and what kind of data we are talking about here. And so, you know, everybody who has been alive for the last few years, has seen, has looked around and seen all these, you know, black lives matter movement issues. A lot of--the Ferguson Report, the recent Baltimore Report, the recent Chicago Police Department Commission Report--all point to police practices which, at the very least we can all agree upon are uneven. So there's much more scrutiny of poor and minority neighborhoods. There's just many, many more police interactions in those communities. Um, which leads to an actually biased data set coming out of that practice. So, I already have a problem with that kind of data, going into these recidivism risk scores. If--and I just want to be forward, I want to object. I want to make the point that if we were only taking into consideration violent crimes, I would have less of a problem. But we're not. We're taking into consideration a lot of things that we consider broken-windows-policing type interactions with the police.

Russ Roberts: Explain what that is.

Cathy O'Neil: That's the stuff like nuisance crimes. Like, having a joint in your pocket. Peeing on the sidewalk. Things that are associated with poverty, more or less. And things for which poor people are much more likely to get in trouble with the police than richer people or whiter people. So, that's one of the problems: it is that the data coming in from the police interactions is biased. The other thing is that often the questions that are asked in the corresponding questionnaire are actually proxies for race and class as well. So, there's a very widespread version of this recidivism risk score called the LSI-R (Level of Service Inventory-Revised). One of the questions on the LSI-R is, you know, 'Did you come from a high-crime neighborhood?' So, it's a very direct proxy. The answer to that question is a very direct proxy for class. There is another question which is, 'Did your, do family members, in your family, have they historically had interactions with the police?' This is obviously again--it goes back to if you are a poor, black person, then the chance of your saying yes to that are much higher. I would also point out that that's a question that would be considered probably unconstitutional if we were asked in an open court--if a lawyer said, 'Oh, this person's father was in jail, Judge, so please sentence this person for longer.' That would not fly. But because it's embedded in this scoring system, it somehow gets through. And the reason it gets through is because it's mathematical. People think that because it's algorithmic and because it's mathematical--

Russ Roberts: It's science--

Cathy O'Neil: It's scientific, yes. That they think it's objective and fair by construction. And so, the biggest point of my book is to push back against that idea.
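
To make the mechanics concrete, here is a minimal sketch of the kind of scoring pipeline described above: a logistic regression over questionnaire and police-record features, thresholded into risk bands. The feature names, weights, and cutoffs are invented for illustration; real instruments such as the LSI-R are proprietary and far more elaborate.

import math

# Hypothetical weights -- not any real instrument's formula.
coefficients = {
    "prior_arrests": 0.35,
    "high_crime_neighborhood": 0.60,   # the proxy question O'Neil objects to
    "family_police_contact": 0.45,     # another proxy for race and class
    "age_under_25": 0.30,
}
intercept = -2.0

def recidivism_probability(answers):
    """Logistic regression: probability of re-arrest from questionnaire answers."""
    z = intercept + sum(w * answers.get(name, 0) for name, w in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))

def risk_band(p):
    # Cutoffs like the "above 65%" figure mentioned earlier are a policy choice.
    return "high" if p >= 0.65 else "medium" if p >= 0.35 else "low"

p = recidivism_probability({"prior_arrests": 2, "high_crime_neighborhood": 1,
                            "family_police_contact": 1})
print(round(p, 2), risk_band(p))   # ~0.44, "medium" on these invented numbers

Note that nothing in the arithmetic distinguishes a causal factor from a proxy: the model can be "accurate" so long as its inputs correlate with re-arrest, which is exactly the worry raised here.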

10:20

Russ Roberts: And that's where you and I have tremendous common ground. Right? So, in many ways--we'll turn to some other examples in a minute--but in many ways a lot of the examples that you give, are just, to me, really bad social science run amok. Which becomes more possible when there's more data. Which is what the world we're increasingly living in--

Cathy O'Neil: Yeah. I would make sure that--right up front that I'm not against using data.

Russ Roberts: I know.

Cathy O'Neil: But I'm not--

Russ Roberts: That's good to say. I know you're not, but it's good to say.

Cathy O'Neil: I'm a data scientist. And I promote good uses of data. What I'm seeing more and more, and the reason I wrote the book, is very unthoughtful uses of data being used in very high impact situations. Unfairly. And so we might agree completely. I don't know if we have disagreement, Russ. But I'm sure you'll find it if we do.

Russ Roberts: We'll dig up some. But it's an interesting example. You are a data scientist. I'm an economist. And of course we're in favor of using data and evidence and facts, but using them well. And using them wisely. It's an interesting challenge, how to react to that: if it becomes increasingly difficult to do that. So, to come to a narrative that you write about as well in the book, which is financial issues: I have friends who argue, 'Well, of course we have to use technical, mathematical measures of risk, because that's the best we can do.' And that's certainly true: That's the best that we can do in most cases. Sometimes. But what if, by putting the risk into this mathematical formulation, you become insensitive to it? You start to think you have it under control? That, psychologically, even though you know it's a flawed measure, and you know when you could list all the assumptions that went into it that you know were not accurate about, say, the distribution of the error function or the likelihood of a black swan--even though you are totally aware of that day after day, of looking at the data and your model and saying, 'Everything's fine today,' you get lulled into a false sense of security. In which case maybe this is a weapon of math destruction. And it's very difficult for technically trained, rational, left-brained people to say, 'Yeah, I shouldn't overuse that because I'm prone to use it badly.'

Cathy O'Neil: Yeah. You bring up a really important point. I don't have a simple answer to it. But the truth is, it's really difficult even for trained professionals to understand uncertainty on a daily basis. With a lot of these things, the uncertainty is extreme. It's not the same thing as, say, the Value at Risk measure, which can be deceiving, even for people who kind of understand its failings. If that's an example you had in mind.

Russ Roberts: That is what I had in mind.

Cathy O'Neil: I mean, let's just go there. Value at Risk--I was a researcher at RiskMetrics, which kind of developed and marketed and sold Value at Risk. It was clearly flawed. Of course, it was easy for me to say--I actually got there in 2009. But I feel like, if somebody had been in charge of being worried about Value at Risk being misinterpreted, they wouldn't have had to go too far to find the way people were--and I'll use shorthand here--the way people were stuffing risk into the tail in order to game the 95-VaR risk measure. And I don't want to get too wonky here. But the point being that we had a sort of industry standard of worrying about 95 VaR. Sometimes 99. What that meant was that we never looked further afield than that kind of risk.

Russ Roberts: Right. That's a perfect example. I assume by 95 or 99 you mean 1 in 20 or 1 in 100 chance.

Cathy O'Neil: One in 20. Exactly. The worst return in 20 days.
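
As a rough illustration of the blind spot being described, here is a tiny sketch of a historical VaR estimate over a 20-day window, using made-up daily returns. Conventions vary, and real VaR systems are far more involved; this only shows the shape of the calculation.

# Made-up daily returns for the last 20 trading days.
returns = [0.010, -0.002, 0.004, -0.015, 0.007, -0.001, 0.012, -0.030,
           0.003, -0.006, 0.009, -0.004, 0.002, -0.011, 0.005, -0.003,
           0.008, -0.022, 0.006, -0.009]

# With a 20-day window, a 95% one-day VaR is roughly "the worst return in 20 days":
# a single cutoff that says nothing about how bad losses beyond it can be.
var_95 = -min(returns)
print(f"95% one-day VaR estimate: {var_95:.1%}")   # 3.0% on these invented numbers

The danger discussed here is that once everyone watches only that cutoff, risk can quietly be pushed into the tail the measure never looks at.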

Russ Roberts: So, when you have a 99 and that's your standard and it never gets close to it, after a while you start to think everything's great. And of course that's not true. Let's go back to the prison example. You are a consulting firm--I assume; this is a privately designed, for money, for profit measure that some Department of Justice grant has funded or is paying for. And who wants to say that, 'Oh [?] I'm not sure we should really use this because it's got all these proxies that might not be accurate for what we're trying to measure. So, I would just use it as a crude rule of thumb. But I wouldn't rely on it.' But that's not really a very good career move. It's not a very good move for a person at the Department of Justice, let alone the consulting firm. So isn't that part of the problem here, is the temptation to soft-pedal the problems in these kind of models when you are being paid, on either end, as the buyer or seller?

Cathy O'Neil: I mean, great point. I would even emphasize that in the case of the justice system, what we're dealing with currently is a very, very problematic situation, where judges are probably less reliable than these terrible models. So, in other words, I wouldn't say, 'Hey, let's go to the old days,' when we just relied on judges who were often more racist than the models I'm worried about. What I am worried about--and yes, so that's one thing. The next thing is, 'Yes, I built a model but it's not very good.' Right? No one wants to say that.

Russ Roberts: 'But it's still a bargain. You got a good deal, trust me. It's great for what it is.'

Cathy O'Neil: That actually is the context for the--they could probably honestly say, 'I built a model and it's better than what you have.' Right? Yeah. And there's another thing going on, by the way. I interviewed somebody, like, you know, on background, who is a person who models, who builds recidivism risk models. And I asked him what the rules were around his models. And in particular I said, 'Well, would you use race directly as an attribute in this logistic regression?'

Russ Roberts: Let me guess.

Cathy O'Neil: And he said, 'Oh, no, no, I would never use race--'

Russ Roberts: Of course not--

Cathy O'Neil: And he said, 'Oh, no, you're right, but it's so much more accurate when you do that.' It is more accurate. But what does that mean? When you think about it, what that means is, well, police really do profile people. So, yes, it is really more accurate. In other words, this doesn't--we want mathematical algorithms and scoring systems to simplify our lives. And some of them do. Like, I'll tell you one of my favorite scoring systems. If you've visited New York City, it's the restaurant grades. You know, there's a big sign, a big piece of paper in every restaurant window saying, you know, what their score was the last time the Sanitation Department came and checked out their kitchen. And you know not to go to a restaurant that doesn't have an A grade. Right? Why does that work so well? Because it simplifies a relatively thorny and opaque question, which is: Is this a hygienic restaurant? And we don't know if it's a perfect system. But it does really have this magic bullet feel to it, which is: That's all I need to know. Thank you.

Russ Roberts: Well, we know it's not a perfect system because on the night you ate there maybe the people didn't wash their hands that day; and it was three weeks after the inspector and everybody's falling back into [?] behavior--

Cathy O'Neil: Of course, of course. Absolutely.

Russ Roberts: You raise an important issue throughout your book, which is: These kinds of simple indices, like, what's the probability of recidivism--which is a big, complicated thing, obviously, that's very person-dependent but we're going to simplify it as a function of 8 variables. Or the same thing is true of the grade from the Department of Health. The problem with a lot of these is of course that they can be gamed by the people to achieve a high score that doesn't represent high quality.

Cathy O'Neil: So, it can be. And actually there was an interesting blog post about the prevalence of restaurant scores--so they started out as numbers, I guess, and then they turned into grades--that are just above the cutoff. So, there is clearly something slightly unstatistical about that. But at the same time, you know--and we also don't really know what we need in a clean restaurant. But it is, crudely put, a good way for us as consumers.

Russ Roberts: There's some information there. That's what I would say.

Cathy O'Neil: There's some information there. The problem with recidivism scores is what we've done is we've basically given the power to a class of scientists, data scientists, who focus on accuracy only. And when, again, when I talked to the person I interviewed, I said, 'You know, is accuracy--is the only thing we care about accuracy?' I would care more about causality, right? And you mentioned the word 'causal.' Like, the question should not be, 'Is this person poor? Are they a poor minority?' The question should be 'Is this person going to commit another crime that we can prevent?' And, like--in other words, they can't do anything about having grown up in a poor neighborhood. For that fact to be used against them doesn't seem right.
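
A minimal sketch of the zip-code-as-proxy point above, with made-up data and an off-the-shelf logistic regression (a hypothetical illustration, not the recidivism model the interviewee describes):

```python
# Hypothetical illustration: a "race-blind" model trained only on zip code can
# still reproduce racial disparities when zip code is highly correlated with race.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

race = rng.integers(0, 2, n)                                # unobserved by the model
zip_code = np.where(rng.random(n) < 0.9, race, 1 - race)    # segregated city: zip tracks race

# Training label reflects heavier policing of one group, not underlying behavior.
rearrested = (rng.random(n) < np.where(race == 1, 0.4, 0.2)).astype(int)

model = LogisticRegression().fit(zip_code.reshape(-1, 1), rearrested)
scores = model.predict_proba(zip_code.reshape(-1, 1))[:, 1]

# The average risk score still differs sharply by race, even though race was excluded.
print(scores[race == 1].mean(), scores[race == 0].mean())
```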

20:28

Russ Roberts: I want to dig into this a little deeper, because if things go as planned this episode will air shortly after a conversation with Susan Athey, who is a machine learning econometrician, who makes a distinction in our interview between prediction and causation. And that's what you're talking about, I think--we should clarify this and go a little deeper. When you say 'accurate,' it very well may be the case that people from this particular zip code or people with these characteristics have a higher chance of committing a crime when they come back out of jail. And therefore ending up back in jail. And that would be the "prediction" part: it fits the data well. These characteristics "predict"--they may not predict for this person very well but they do predict with these classes of people--these groups--according to the variables that you've actually measured. And that is not necessarily what we care about in a justice system; because, I think your argument--correct me if I'm wrong--your argument is that if we observe in these neighborhoods a lot more police presence we may actually see more types of police interaction and even arrests and sometimes crimes of smaller versus larger amounts that will confirm the model in the sense that it's "predictive," but it's not really describing the fact that these people are more likely to necessarily be bad people, but they are just more likely to get swept up in a police problem. Is that kind of what you're getting at?

Cathy O'Neil: Yeah. That's a really good description. Let me just reframe that a little bit, which is: I would look at the system as a whole. And it's not just police. It's also the way our jobs work for poor people, or don't work. The way our economy offers opportunities to [?] or doesn't. But I guess the simplest way to put it is that when you give someone a score this way and then you hold them accountable in a certain sense--by which I mean judges actually sentence people to longer if they have higher scores--in a very direct sense you are punishing them for that score. And so you are laying the blame on them. You are pointing a finger at them; you are saying, 'You have a bad score; I'm holding you responsible for that.' And the question is, of course, 'Why do you have a bad score?' Is that because of what you've done?

Russ Roberts: And who you are.

Cathy O'Neil: Or is it because of the police system you live in? Is it because of the economic opportunities you are given or not given because of who you are, how you were born, how you were raised? And the point is that that's a very hard question which I'm not equipped to answer by myself. But I am equipped to say that as a data scientist it should not be my job to decide this.

Russ Roberts: Yeah. I just want to clarify what I said before, because I think it might be somewhat confusing. If I fit the data on what's the probability of somebody coming back into prison, I may have variables in there that correlate with that probability, but they are not causal. It just happens to be the case that for people from these neighborhoods--because of a police presence at a certain time, or different allocations of resources, or whatever it is--school quality--it may turn out to be true. It doesn't imply that this person in particular, when they go back into that neighborhood, will have that experience. Because there could be a correlation that's not causal. And I think that's the distinction that machine learning is unable to make--even though "it fit the data really well," it's really good for predicting what happened in the past, it may not be good for predicting what happens in the future because those correlations may not be sustained.

Cathy O'Neil: And we hope they aren't, in that situation. Let me give you another example; and you said it very well. It's a thought experiment that your listeners might enjoy. I'm imagining that there's a tech company and they want to hire engineers. That happens a lot, actually. And they decide to--they are having trouble finding good engineers, so they want to use a machine learning algorithm to help them sort through resumes. And of course they have their own history of hiring people, and those people either succeeded or they didn't succeed in their company. But they have to define success for this model to sort through the historical data and look for people who look like they have succeeded. That's basically what--when you want to build a model you have to define your data set; you have to say what success looks like; and [?] to feed the algorithm--you should choose an algorithm--but once you've chosen the algorithm you have to tell it, 'Look for this; look for patterns of people that look like this success story.' Now imagine that they define success as someone who has been there for 3 years and has been promoted at least twice. Now imagine that they run this machine learning algorithm; it gets trained on their historical hiring practices; and they set it on the new data set, which is new applications for engineering jobs. And they find that, like, no women get through the filter: that the algorithm literally rejects all the women applicants. What would that mean?

Russ Roberts: It obviously means women aren't good at being engineers.

Cathy O'Neil: I've set it up, an extreme case; probably not happening.

Russ Roberts: Playing straight-person to your--

Cathy O'Neil: Right, right. Thank you: Straight man. I set it up to be extreme, but the point being like the algorithm would not say, 'Hey, you guys should check to make sure your culture is welcoming to women.' Right? It would instead just say, like, 'Women do not succeed at this company; throw them out.'

Russ Roberts: Or it could be that the applicants--there aren't very many women in the data set because you have a poor history in the past and there's a lot of noise in the data, so women are just not matched to those characteristics that you found. But certainly the culture example would be more dramatic, right? If you have a sexist culture, women are going to look like they can't get those promotions, and as a result you are going to be encouraged not to hire them in the future by the machine learning. And then you'll see how smart you were--you'll think you're really smart.

Cathy O'Neil: If you don't like that example--

Russ Roberts: I like that example.

Cathy O'Neil: Well, I'm just going to say, think about Fox News and women anchors. It's not that they don't have any women. It's that the women that they have are pushed out. Right?

Russ Roberts: I don't know if that's true.

Cathy O'Neil: I'm not saying that this is actually happening in a given engineering firm. I'm just making the point that machine learning algorithms are dumb. They don't understand the 'why.' They only understand the 'what happened.'

Russ Roberts: I think that's important to emphasize. There are patterns; sometimes patterns are very dramatic. But that doesn't mean they'll be sustained in the future or that they should be sustained. Right?

Cathy O'Neil: Exactly.
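
A toy sketch of the hiring thought experiment above, with hypothetical data and column names. The classifier only learns 'what happened' in the firm's history; if women rarely met the success label in that history, it scores new women applicants lower without ever asking why:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy hiring history. "succeeded" is the firm's label: stayed 3+ years, promoted twice.
past_hires = pd.DataFrame({
    "years_experience": [2, 5, 3, 7, 4, 6, 1, 8],
    "is_woman":         [0, 0, 0, 0, 1, 1, 1, 1],
    "succeeded":        [1, 1, 0, 1, 0, 0, 0, 1],   # women rarely "succeed" in this history
})

clf = LogisticRegression().fit(
    past_hires[["years_experience", "is_woman"]], past_hires["succeeded"]
)

# Two new applicants with identical experience, differing only in gender.
applicants = pd.DataFrame({"years_experience": [5, 5], "is_woman": [0, 1]})
print(clf.predict_proba(applicants)[:, 1])   # the woman gets the lower score in this toy history
```

The fit cannot tell whether the historical pattern reflects ability or an unwelcoming culture; it simply propagates whatever the history contains.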

27:20

Russ Roberts: A friend of mine worked at a company and said he noticed that everyone there--he was an intern--he said he noticed that everyone there who had a permanent job had only gone to 3 different universities. I don't think that was a coincidence; it started with their resumes. And it's not a bad place to start. Obviously there are good universities; I'm not going to name them; I don't remember them, actually. But they were good universities; but that's not necessarily--that's one way to reduce the cost of sifting through a lot of resumes. It's a very crude and perhaps not a terrible way to save time and cost. But as you get to these more sophisticated methods, as you point out, you get this opportunity to draw false conclusions. Right? It's pretty straightforward.

Cathy O'Neil: I mean, it's interesting. Because, you know, it's kind of obvious once you say it. But these algorithms, you know, as sophisticated as they are--and they sometimes are: they're deep learning, they're neural network algorithms--I wouldn't call it 'sophisticated' but they are certainly unintelligible.

Russ Roberts: They are fancy.

Cathy O'Neil: They don't make moral decisions. They literally only pick up patterns that already exist. So, it would be great--and sort of the Big Data promise is that you throw data against a wall and truth falls out. The Big Data promise is that somehow the truth is embedded in historical practices. But that's only true if historical practices are perfect. So, as soon as we have a firm that has--an engineering firm that has like really mastered what it means to find good engineers--as soon as we have that then we should make a machine learning algorithm to mimic that. But I don't think we have that yet.

29:20

Russ Roberts: And I think the other point you make which I think is important--I'm not sure I agree with it in all the cases you give: there's not always a mechanism for making the model better. So, in the case of the engineers, you'd consistently hire men. You slowly would weed out the women in that case, or you wouldn't hire them to start with. And you'd have a model that you'd be foolishly thinking had worked pretty well, but in fact you've made a mistake. Now, I would argue that firms that do that have an incentive to at least think about whether they are making a mistake: whether their big data models are serving them well. And I think we are in early days. So, one argument would be, against your pessimism about these models, would be, 'Well, we're just starting. Sure, they make some mistakes now but we're going to get better.' In fact, the evangelists would say, 'It's just going to get better and better. Of course they're imperfect.' What are your thoughts on that optimism and pessimism?

Cathy O'Neil: I'm actually one of those people. I know we're going to get better. What I'm trying to point out is that we can't assume we're already good. What I'm objecting to are high-stakes decisions being made when there's no actual check or monitor on the fairness or the actual meaningfulness of the scores themselves. And I say, 'meaningfulness,' because I'm thinking about the teacher-value-added model--

Russ Roberts: I was just going to ask you about that.

Cathy O'Neil: Yeah. I don't think the problem there is discrimination, per se. Like, actually a lot of the teachers are women. It's a very diverse field. There might be some discrimination issues around it. But the biggest problem is that it's not very meaningful. We have these scores that are typically between 0 and 100. And some work has been done to see just how consistent the scores are. And it's abysmal.

Russ Roberts: Let's back up. Put the uses of the value-added model in context, because listeners won't know what it is. This is an attempt to evaluate teacher quality and use that evaluation to either--typically to fire the worst teachers under various mandates. Right?

Cathy O'Neil: Yeah. It goes back a couple of decades and a few Presidencies. The idea is: Fix education by getting rid of the bad teachers. And we have this myth of these terrible teachers that are ruining education. And I'm not saying there aren't--

Russ Roberts: Yeah; I wouldn't call that a total myth. I think there are some lousy teachers.

Cathy O'Neil: There absolutely are bad teachers; and there are bad schools. But, I'm just claiming--and I'll repeat myself--that, you know, there might be a problem but if you have a solution that doesn't actually solve the problem then you are getting nowhere. And I think the value-added model for teachers is an example of that. So, what they've done, the first generation of teacher assessment tools, was pretty crude and obviously flawed. And that was to sort of just count the number of students in a given teacher's class who, like, were proficient in their subject by the end of the year. And the reason that was super-crude was that essentially performance on standardized tests is highly correlated to poverty. Across the nation. And across the world, in fact. And when you just counted the number of students in a given class that attained proficiency and punished the teachers who had very few of those students, then you were basically punishing teachers of poor students. And it was pretty clear that that wasn't good enough. Like, that wasn't--it wasn't discerning enough as a way of finding bad teachers. Or another way of thinking about it was, 'These kids weren't proficient in Third Grade. Why would they suddenly be proficient in Fourth Grade?'

Russ Roberts: Yeah. You are not controlling for the initial quality of the students that the teachers had to deal with. So that's clearly wrong.

Cathy O'Neil: Exactly. Right. So that's clearly wrong. So, they wanted to do exactly what you just said: they wanted to control for the students, themselves. So, what they've developed is this, what I call a 'derivative model.' So, it depends on another model, which is in the background, which estimates what a student, a given student, should get at the end of their fourth grade year. Let's say. And is based on what they got at the end of third grade--reasonably enough--as well as a few other attributes like what school district they are in, like whether they qualify for school lunches--which is a proxy for poverty. Various things. So, now, just imagine: Everybody in your class--you are a teacher, a fourth grade teacher--everybody in your class has an expected score at the end of the year. What is your score ending up? What's your Value Added score? It's going to be essentially the difference between--the collection of differences because you have a bunch of students--the differences between what your students actually get versus what they were expected to get. So, if you are a student--

Russ Roberts: Which is a good idea, on paper. Right?

Cathy O'Neil: It is. It's absolutely a good idea.

Russ Roberts: That's exactly what you want to try to measure.

Cathy O'Neil: Right. So, if Tommy was expected to get an 80 but Tommy got an 88, then that's +8 points. That's good for you. If Sarah got a 60 when she was supposed to get a 65, that's not good for you. So, you kind of--again, the idea is--and this is kind of reminiscent of what we were talking about with the recidivism risk scores--you are held accountable for all these differences between what your students were supposed to get versus what they actually got. And I'm simplifying it because there's all sorts of complicated, sophisticated mathematics going on as well. But let's put that aside. This is more or less the idea. The problem, statistically speaking, with this, is that the original model is just not very accurate.
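
A rough sketch of the value-added calculation just described, with made-up numbers: each student gets an expected end-of-year score from the background model, and the teacher's score is essentially the average gap between actual and expected.

```python
# Made-up numbers, following the Tommy/Sarah example above.
expected = {"Tommy": 80, "Sarah": 65, "Ava": 72}   # background model's predictions
actual   = {"Tommy": 88, "Sarah": 60, "Ava": 75}   # what the students actually scored

residuals = [actual[s] - expected[s] for s in expected]   # +8, -5, +3
value_added = sum(residuals) / len(residuals)
print(value_added)   # 2.0 here; if the background model is noisy, this is mostly its error term
```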

Russ Roberts: Yeah. A lot of noise.

Cathy O'Neil: And when you are dealing with the differences between actual and expected, that's called something: It's called the error term, in a bad model. So, as a teacher you are being held accountable essentially for the average error term of a bad model. Which, by the way, is also called 'noise.' For a reason. And it's just simply a bad scoring system. It's not consistent enough. I interviewed someone named Tim Clifford, who is a middle school English teacher in the New York City public schools. He's been teaching for 26 years. He has a bunch of awards, etc. He got a 6 out of 100--

Russ Roberts: That's a low score--

Cathy O'Neil: the first time he got a value added [?] model. Terrible score. He got a 96 the next year.

Russ Roberts: He must have gotten smarter in the meantime. He took some classes on how to teach well.

Cathy O'Neil: So, one of the things--I characterize 'weapons of math destruction' by saying they are widespread. So, this is all over the country. Most states now use some kind of version of this--and that they're secret. So, this is what really gets to me about this. There's actually been quite a bit of uproar around these teacher assessment scores. And the New York Post actually filed a Freedom of Information Act (FOIA) request, and got the names and the scores of all the teachers in New York City--first year, I believe it was the first year it came out. And they published them. It was kind of like a public shaming of the teachers. I tried FOIA--I tried to get the--I filed a Freedom of Information Act request to get the source code for that same scoring system, under the assumption that if you can get the scores, public access, probably I can get the system, the scoring system itself. I was denied the actual code. And moreover I found that under the licensing agreement that this company, this big data company, had written with the City, New York City, nobody in the Department of Education could see the source code, either. So literally nobody actually understood how these scores were being built. So, a final word is that I kind of gave up and I didn't know what to do after that. But this really smart guy, who is actually a high school teacher at Stuyvesant High School, a math teacher, what he did was he took the stuff that the New York Post had published, he took that same data, and he found some teachers that were actually listed twice. Quite a few, actually. Hundreds of teachers were listed twice. They had maybe taught 7th grade math and 8th grade math, so they'd gotten scores for both classes. And he just graphed them. He just looked at how consistently these teachers were scored.

Russ Roberts: That's pretty good.

Cathy O'Neil: And he found very wide discrepancies. If you plot it on a scatter plot, it looks almost like a uniform distribution. Whereas what you'd expect is a line, y = x, right down the middle. It's nothing like that.

Russ Roberts: Although it is possible, of course, that a teacher has a particularly annoying class or a particularly challenging class--some classes will get more time and effort and energy from a teacher. They don't spread their time equally. And they probably don't do the same job in each class. But you'd expect some correlation. So the fact that it's virtually zero would be disturbing.

Cathy O'Neil: It's not 0; to be clear, it's actually 24%. But that's, like, a teacher's correlation with themselves.

Russ Roberts: It's not so good.

Cathy O'Neil: I'm not saying there's no information in that at all. What I'm saying is: It's not very good information. It's really not. And at the same time, it's being used for high-stakes questions. So, for tenure decisions. I interviewed a woman named Sarah Wysocki who was a Washington, D.C. area teacher. She got fired because she had a bad growth score, a value-added model score. She actually got fired over this. She had plenty of reason to believe that her score was actually caused by a previous teacher cheating on their students' tests because there was a bonus involved. It's complicated. But the point being that these scores are simply not accurate enough to fire people, to have large decisions based on them.
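
A sketch of the consistency check described a few turns above, with simulated numbers rather than the published New York City data: pair up the two scores each double-listed teacher received and compute their correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_teachers = 500

# Simulate scores with only a weak shared "true quality" component, so the two
# scores a teacher gets in the same year are mostly noise.
true_quality = rng.uniform(0, 100, n_teachers)
score_class_a = 0.3 * true_quality + 0.7 * rng.uniform(0, 100, n_teachers)
score_class_b = 0.3 * true_quality + 0.7 * rng.uniform(0, 100, n_teachers)

r = np.corrcoef(score_class_a, score_class_b)[0, 1]
print(round(r, 2))   # a low correlation: the same teacher gets very different scores
```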

39:32

Russ Roberts: Yeah, so--the Wysocki example is tremendous because it's just a phenomenal example of how, if the incoming class's scores are artificially inflated the year before by cheating, or by some teacher who is really good at teaching to the test, and you are not as good at teaching to the test but you are a great teacher--you can get a lower score and seem worthy of being fired. I think it's important to add that: It's a horrific system. It's a horrific system, the public school system. And, you know, we could take turns--and I found myself taking turns as I read your book--feeling bad for the teachers or the students. So, it's true: That's very unfair to a teacher; and I think that's a crude and a very lousy way to evaluate teachers. And I'd also add that it's masquerading as objective when it's not. But the other truth is that these students get awful teachers who can't be fired. And so you have to have--you don't have to--but the current system because it's so entrenched: there's no way to get rid of bad teachers. And I think that's the tragedy, to me, and I come from a different ideological place than you do, but I think--you know, I don't know anything about this Value Added Model--it sounds awful to me for lots of reasons. I think it's incredibly difficult to predict expected scores. But the idea that somehow there's a good alternative--there isn't a good alternative in the current system, it seems to me.

Cathy O'Neil: Yeah. I mean, listen: I'm glad I've convinced you of my main point, which was that this is not a solution. And we could talk about political solutions to bad teachers, which I agree are a problem. And if you wanted to know my personal opinion, like, 'Let's pay them much better, and remove tenure and get rid of bad ones in a thoughtful way.' I also think that data has a place in education. But I think that education, the way data and algorithms and models should work, has to be intrinsically a feedback loop between the teachers and the test scores. Right? And, you know, the teachers have to not just get a score, but like feedback about what they should do better. What--you know, 'Hey, we did this interesting test. The test actually measured the students' understanding of these various dimensions; and we see that your students were lacking in this dimension; and this is how you teach that.' In other words, feedback that the teachers can--that good teachers--can actually reliably use to improve their teaching. Which is not what we have here.

Russ Roberts: I agree with that. The problem is that we are stuck because of the nature of the public school system. I think. We are stuck with objective, un-messaroundable things like test scores. Test scores are a terrible way to measure teacher quality. On so many dimensions. My wife's the head of a math department in a high school, and if I told her, 'Okay, what I want you to do is evaluate your teachers based on how their kids do on test scores,' she would be so offended. She spends hours in the classrooms of her teachers. She wants people to be in her classroom when she teaches. And what makes a good teacher is a subtle thing--and there are a lot of dimensions to it. And certainly not only how somebody does on a test score--even if it's a huge improvement. Which is a good thing. I'm not denying--I don't think test scores are irrelevant. But I think it's bizarro that we assess teacher quality based on a score. And the reason we do that is it can be defended. It's sort of--to me, it's sort of a meta-version of what's wrong with the more complicated systems that you are talking about. It's not the way anybody would do it if they had to design it from scratch.

Cathy O'Neil: I could not agree more. And I think that the philosophical question that is raised by your venting just now, which I completely agree with, is: When do we see these magic bullet algorithms being used? When do people say, 'I'm going to solve this very thorny, complex, interrelated, complicated, society-wide problem' with this stupid algorithmic scoring system? Which doesn't answer the original question and leads to all these unintended consequences. And I think the answer is: The more complicated and societal, and, you know, taboo, a topic is, the more likely you are to come up with, to see something emerge along these lines.

44:18

Russ Roberts: Yeah. But--but--you've given a lot of examples from the private sector that are not as societal, that are different. And I want to turn to a couple of those because they are very interesting to me. And I want to defend--my only criticism, serious criticism of your book is that you don't spend much time talking about any of the benefits. So, it's--you emphasize the costs. Which perhaps is the right way to start, at least to get people's attention. But one example you use that came to mind was you talked about U.S. News and World Report and their attempt to measure university quality. Which is absurd. Obviously it can't be done. And they end up doing it. You know, just mindless--

Cathy O'Neil: [?] could do it.

Russ Roberts: And they rank universities; they rank MBA (Master of Business Administration) programs; they rank graduate schools. And it's--we all understand that it's to sell magazines. It is to start arguments. And it is very effective at both of those things. But it also changes lives in all kinds of unexpected and not-so-attractive ways. And it creates what you call an arms race among universities trying to pad their scores, because they know it goes into the index. Having said that, it also forced a lot of schools that had great reputations to actually serve their students better than they had before. In my opinion. So, do you want to expand on the bad part? And do you want to accept my good part? Or do you want to disagree with that?

Cathy O'Neil: Maybe I--if you have good evidence that the U.S. News and World[?] Report arms race among college administrators has actually had positive effects on student learning, I haven't seen that.

Russ Roberts: Well, I wouldn't suggest it has done much for student learning. I think what it did in places like MBA programs, which is where I saw it way too up close and personal as a former faculty member in a business school that was really desperate to get in the Top 20 and stay there: there were some pernicious things that you talk about--that people did things to make the scores look better when in fact they weren't any better. But there was an enormous revolution among business school programs to make their degrees, I think, more useful to students. And I think that was a good thing. The rest of it, at the college level, you might be right; or I'm sure there's a lot of truth to what you say: which is, a lot of what it did was change the way people gamed the ranking system in lots of silly ways, and it's not what people should be spending their time thinking about--they should be thinking about how to make the university better. Rather than trying to--

Cathy O'Neil: Yeah, I mean--okay. I'd be happy to look at what you are saying. And, you know, I'm not claiming that I've spent that much time on the MBA level of this stuff. I think my biggest criticism is that, if you are going to make a score of quality for colleges, especially if it's going to be aimed at parents of high school kids--and I'm one of them: my son is entering junior year of high school, which is like, critical moment, start worrying about college. Right? It is abominable to me, and I'm sure to you, that you do not--that you actually create a model that is blind to cost. As if we're a bunch of Rockefellers who can send our kids to whatever school is the best--you know, ignoring cost. Of course cost is a major factor. And the consequence of their ignoring cost, back in 1983--which they had many, many years to resolve, which they have not--the consequence of that is that tuitions have risen in direct relationship to how much these colleges are fighting each other to outrank one another on this one list[?]--

48:14

Russ Roberts: Explain that. Because that was really interesting. I'm not sure I agree with it, but it's really interesting. So, talk about that connection.

Cathy O'Neil: I'm not the only person making this case. But everybody knows that the number of administrators at these colleges has ballooned. And partly that's due to all sorts of things that they now have to--regulations that they now have to make sure that they are following. But a lot of that is directly due to the fact that many people's job at universities is to keep an eye on their ranking, and to make sure that, you know, they're competitive for incoming freshmen. Which means that they sort of like, the colleges at a given tier are all fighting for the best students that they can hope to get for that tier. And what that often means is they want to get these student athletes. So they have to build these new stadiums. They want to get really nice dorms. They have dorms that have, like, water parks embedded--

Russ Roberts: Yeah. It's unbelievable.

Cathy O'Neil: in the dorms. It's like--forget about--I mean, I'm sure you have your story, too. I went to U.C. Berkeley in 1990. We had to find our own housing. It was very bare bones. We got a great education. It was very, very affordable. Especially for in state. We didn't get coddled: we were grown-ups. And I just feel like--it of course is part of a larger societal issue of like when do kids actually get to be called a grown-up in this day and age.

Russ Roberts: For sure.

Cathy O'Neil: But it is completely outrageous, and way too expensive. It's something that I as a parent would never agree to. But it's being--this money is being spent. And then charged to me, because of the fight for the U.S. News and World Report ranking.

Russ Roberts: Yeah. That's an interesting question. That's a great example. I don't know if it's true. I think some of it's true. Because the idea here is: you want the highest SAT (Scholastic Aptitude Test) students; you want to be selective, so you want lots of applicants and you want to reject a bunch of them. Because that makes you look like a better school because you are more selective. None of which--you know, it doesn't make you a better school, obviously. It just makes you look like you are a better school.

Cathy O'Neil: And moreover I would argue we are just all fighting--and when I say 'we' I mean colleges--are just all fighting for the same group of kids. It's not like the kids change. You know, it's just the same group of kids. We're just sorting them slightly differently because of all this ranking situation. And we're putting them in very fancy dorms.

Russ Roberts: Yeah, with really nice food and athletic facilities to play in. And maybe not always a water park, but lots of--it's a resort-like experience. Now, the question is: Is that because of these rankings? Or is it because we are a really rich country and rich people send their kids to these schools and they want their kids to have a pleasant experience? They don't want them to have your Berkeley experience or the experiences I had that are much more bare bones. Because you look at the high schools that these kids come from. They also look kind of unusually fancy.

Cathy O'Neil: Well, listen. I mean it depends on who you ask. Obviously. I think that the kids that go to fancy high schools enjoy their fancy colleges. I think if you talk to a bunch of Millennials right now about their student debt, and ask them, 'Would you trade in your student debt for fewer perks in your college dorm?' they would trade it in a second. I also don't think this all is completely deliberate. It is not somebody's plan. I don't think there was anybody who--like, I don't even think the U.S. News and World Report people were like, 'Oh, we're going to screw the lower classes, and the middle class; and they're going to have huge amounts of college debt in the next 20 years.' That wasn't--it wasn't like that. I wanted to give an example of what feedback loops can really do. And it's a natural. It happens, it arises naturally, because of the trust that we put into these rankings. We have actually endowed--as parents, we have endowed these rankings with power way beyond what they deserved.

Russ Roberts: Yeah; I don't [?] that, but I know a lot of my friends do. Because I think, having taught at 5 universities, having been in the kitchen, I'm much less concerned about the grade that the Department of Health gives. And a little bit more maybe about what's actually going on, and therefore, you know, it sounds like I've got to get my kid in this kind of school, I'm thinking it's not really worth it. But it is--it's an interesting question. I think it's a question of magnitude of these effects. I don't know how much of it is driven by the U.S. News coming into existence. And I say that because one of the things I do know the data on, if you look at the amount of government subsidies to education over the last 10 years, 15 years, it's rather extraordinary. And the number of students going to college has increased. I think--it's a shocking number. I think the number who graduated--it's either those who go or who graduated--was up, I want to say, 50% between 2000 and 2010. It's a huge increase over a very short period of time. Could argue maybe that's a response by the political process to the increased demand. I don't know. But there's a lot going on there. That's all I'd say.

Cathy O'Neil: Yeah. There is a lot going on. I'm not saying this is the only factor. I also--I think the Federal Aid system is a factor--like, it's made things, it's made it easier for people to borrow money to go to school--

Russ Roberts: Which pushes up the demand--

Cathy O'Neil: which obviously is a very good incentive for schools to raise their tuition.

Russ Roberts: Correct. It pushes the demand up.

Cathy O'Neil: So, I absolutely don't claim this to be the only factor, but I do think that it is an important one. And I get that from my research, from listening to what administrators say when they install fancy stadiums.

Russ Roberts: By the way, a separate issue--it's not obvious. I know administrators like to say that fancy stadiums and good sports teams encourage applications and improve rankings. There's a debate on that. And that may be an example where correlation isn't causation, either. You use the example of the Flutie effect, where Doug Flutie threw a miracle pass at the end of a game against the University of Miami; it put Boston College on the map; and their admissions went up 30% over 2 years. But there are other things going on. It's not obvious that it was just due to that. But I think administrators like to invoke that as an excuse for fancy sports teams.

Cathy O'Neil: Yeah. Again, it's not the only thing going on. I do think that alumni giving is one of the factors that the U.S. News and World Report counts as a sign of quality. And I think that people who used to be on the football team are more likely to give money. But, again, I don't want to quibble. It's an example of a very, very influential algorithm. And it's an old example. So I just wanted to say: The algorithms have power. And we have a bunch of new algorithms that we are just blindly trusting and empowering, and we have to be careful.

55:06

Russ Roberts: You talk a bit about an issue that's come up recently on the program, which is A/B testing at tech firms like Google or Facebook or Quora--I interviewed Adam D'Angelo recently about that--and all the experiments that they are running daily. And Google is famous for that. This is a really cool thing; it's a really cool thing for data scientists. They have this incredible laboratory where they can change the color and change the font. And those are kind of harmless. Some of them are not so harmless, though, you suggest. So, talk about what worries you about what goes on inside these tech firms, with proprietary experimentation.

Cathy O'Neil: Are you talking about the predatory advertising?

Russ Roberts: Anything you want. The bright side is, 'Oh, it's great. Everybody gets what they want. They make it work to customize for you.' And it sounds good. I think a lot of it is good. They show me books that I want to see. They show me things I want to buy, rather than things I don't want to buy. And on average that's good. But it's more than that.

Cathy O'Neil: Yeah. I actually worked at an ad tech firm after leaving finance. And there's a story I tell in the book about a venture capitalist who was considering investing in our Series B funding round. And he talked to the whole company, which was, I think, at the time 50 people or so. And he talked about this sort of glorious future which he was imagining, where he would only see offers for vacations to Aruba and jet skis, and he would never again have to endure a University of Phoenix ad. Because those are for people like him. And when he said that, people laughed. And I was like, 'Wait a second. What?' We hear the ad-tech guys always talking about the opportunities and how tailored ads are a feature, almost like people should be grateful for them: 'Oh, thank you; I was thinking of buying that lamp. I'm so glad you showed it to me.'

Russ Roberts: Right. 'You knew. You knew.'

Cathy O'Neil: And in some sense they are right. There are often opportunities. Sometimes there are coupons. There may be nuisances or distractions when we are trying to get some work done. But in the worst case scenario, they are actually predatory. The worst case scenario, going back to the Federal aid program, is for-profit colleges, which specifically target people who are vulnerable to this kind of really hard-core recruiting, and are eligible for the financial aid that goes straight from the government to the for-profit college. So, you know, and that's one example. There's another example with payday lenders. And the reason I think it's so important to understand that the worst case scenario is that it's quite predatory is that--I've been in Finance; I've been in Data Science. In Finance, when we had a weapon of math destruction, which was the Triple-A ratings on mortgage-backed securities, when that model failed, it failed spectacularly; and everyone in the world noticed, because in the financial crisis, the financial system was at risk. But what I fear about data science algorithms that fail, or that create pernicious feedback loops like the one I just described with for-profit colleges with debt and cycles of poverty, is that they are absolutely failing the people they are targeting, but we will not see it. It's exactly what the venture capitalist visiting my company said: He doesn't want to see it. He wants to be siloed and segregated and put into a position where he's like treated like the first class citizen that he is. And he wants other people, who are being preyed upon, to be separated and away from view. And that's the thing that bothered me the most. Actually, that was the moment when I decided to write the book.

Russ Roberts: Yeah. I don't have any opinion on for-profit colleges. It is--I think it's a--I don't know how predatory they actually are. They don't come across very well in your book. Which is maybe justified. The question is: What is to be done about that? Should we warn people that they are bad places? Let's start with the assumption, again I'm agnostic on it, I don't know anything about it--that they don't serve their clientele well and that there's a scam element, that things are being foisted on them that are not productive. Suppose that's true. Does that mean we should warn people about it? We should stop letting people borrow money for those uses? Or does it mean--which is what you focus on--that we should be wary of algorithms inside, say, Google or Facebook or elsewhere, that push certain types of people toward certain types of approaches that "aren't good for them"? Which is really what you are saying, I think.

Cathy O'Neil: Yeah; I mean, at the very least I want people to stop promoting tailored advertisement as a purely benign, if not a positive force. It really is a segregating force. And for those of us who have money in our pockets and are well-educated it serves as an opportunity. And for other people it doesn't. And as far as the for-profit colleges go, I don't want to only single out for-profit colleges, because, the truth be told, like some of them are probably fine; some students probably have good [?] experiences. And then some other colleges are probably not fine. I think the answer to that--and if I were in charge of the world, which I'm not--would be: Yes, to cut off Federal aid. Because they are essentially leeches on the Federal aid system of loans. Which I think, between you and me, should be completely changed; and we should just have free--like, very rare 4-year[?] State Schools and maybe forget about Federal aid. But that's just my opinion.

Russ Roberts: I agree with half of that.

Cathy O'Neil: Yeah. No. We don't have to agree on everything. And I'm not trying to say, 'Hey, yeah, everyone: Agree with me.' What I'm trying to say is like, 'This is happening.' Like, advertising online, these guys know a lot about you; they can target you; they know if you are poor; they know if you are a single mom; they know if you are desperate. They find you. And they say, 'I'm going to solve all your problems. Just sign here and you're going to get online education,' and, you know, at the end of 4 years you'll be saddled with a lot of debt and you'll have a diploma that is often not worth more than a high school diploma.

1:01:58

Russ Roberts: So, there are going to be examples like that. But I want to disagree, at least with your response, with this idea that it's okay for me and you to get tailored ads but not poor people. So, I was thinking of buying a watch, and I did some Google searches on watches. And all of a sudden, watches started showing up in my searches as the ads. And I bought a watch. And they keep showing up. They are going to keep showing up. And that--to use this example for it: It's not that smart. It takes a while. Or maybe they are hoping I'll buy another one. Which I'm not. But, so I'm glad I saw them. Some of them. And I rejected some. I clicked on some, maybe. Don't remember. But I like in general that the ads are tailored for me because otherwise it's just clutter. But why is it that a poor person--aren't there things that a poor person would like to know about to buy, that are good for them and that they would profit from having? And shouldn't they be free? And wouldn't it be better for them to get those products that they are desperately eager to get a good deal on rather than the jet skis? And isn't it okay for them to get tailored ads, too?

Cathy O'Neil: If they are looking to buy something and they don't have a lot of money, I of course want them to find a good deal. The problem is that they are worth, in this situation, they are worth way more to a potential payday lender or a potential college, [?] of a college, that can get just tons of money through the Federal Aid system than they are worth to a purveyor of cheap whatever--you know, the actual products that these people can afford. I don't know if you know the way that Google auctions work, but essentially, a given advertising space goes to the highest bidder. So, you know, the different companies that are vying for space in front of that person, they each value that person in different ways. And right now, for poor people, it's not so surprising to hear: the predatory industries value them the most, because they can make the most profit off of those people.

Russ Roberts: But they don't have to be predatory.

Cathy O'Neil: They don't have to be predatory. No. I'm not saying they are.

Russ Roberts: Even payday lenders may not be predatory. Because these people that we are talking about maybe don't have a bank. Maybe they don't have access to capital. They can't--it's good for them. They want those things. Should we not let them have them? Should they be banned?

Cathy O'Neil: I mean, I think they should. Depending--look, if it's a true payday lender. I mean, we don't, we're going to go into the weeds here. Let me just make the one point, which is if you had two lenders that are vying for the space on a webpage that a poor person is looking at, and one of them is predatory and charges enormous fees and makes enormous profits, and the other one is much more reasonable as a lender, the person that makes more money is going to be able to offer more money in the auction. And they are going to win. Do you see what I mean?

Russ Roberts: I do. But it doesn't have to be the most predatory person. It could be, if there's competition between them, that there are costs--

Cathy O'Neil: There is competition between them. There is competition. But if I opened a bank today and I promised myself, I'd made it my mission to make really reasonable loans and I would make much less profit off those loans, I wouldn't have that much money to pay for tailored advertising. So I wouldn't win those Google auctions. Or those, whatever-the-advertising auctions.
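
A toy sketch of the ad-auction point being made here (simplified; real ad auctions add second-price rules, quality scores, and more): whoever expects the most profit per impression can bid the most, so a high-margin lender tends to win the slot in front of a low-income user.

```python
# Hypothetical bidders; bids here are simply each bidder's expected profit per impression.
def winning_bidder(bidders):
    # The ad slot goes to the highest bid.
    return max(bidders, key=lambda b: b["bid"])

bidders = [
    {"name": "reasonable_lender", "bid": 0.40},   # thin margins, so a small bid
    {"name": "predatory_lender",  "bid": 2.50},   # large fees, so it can pay far more per impression
]

print(winning_bidder(bidders)["name"])   # predatory_lender wins the ad slot
```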

Russ Roberts: But that's because the audience that you are trying to attract is evidently more expensive. So, you could choose, as a matter of charity--you could raise money to create an NGO (Non-Governmental Organization) or a nonprofit that would outbid them and offer lower rates of interest. But evidently that's not the market rate. But you are right: we are way off in the weeds here.

1:05:52

Russ Roberts: I would enjoy continuing. But we're almost out of time. If it's okay, I'd like to shift gears for our final questions. Is that okay?

Cathy O'Neil: Sure. Absolutely.

Russ Roberts: So, you raise a lot of really interesting points about--and we've talked about most of them, and you have lots of examples, for people who want to read more on the causation issue, the lack, sometimes, of feedback loops to improve the model, etc. Where does that leave us? We're in a world right now where there's a tremendous amount of romance about these models and about the data that we have access to. There's a lot of excitement, a lot of really smart people--people like you, but without your wariness--who are just going gung-ho, full speed ahead. What might slow them down? What do you recommend we think about--besides reading your book, which is, that's a good recommendation--to make people a little more careful about what they are doing, and humble? What also might we do besides being aware?

Cathy O'Neil: Well, I have proved[?] three planks to this. The first one is the easiest one, which is to get more ethics and more thoughtfulness in machine learning and data science education. There was actually a little brouhaha at Columbia about a computer science class, a machine learning class, being assigned to be the robo-cop. I don't know if you heard about this.

Russ Roberts: No--

Cathy O'Neil: But they were assigned--it was meant to be satire, but it wasn't clear enough. So the idea was, you know, take the Stop-and-Frisk data--which is public data--and figure out, like, who to go after. It was done really poorly. But I thought it was actually the right thing to do. I think what we need to do is to actually study real data sets and realize that, you know, we can infer a lot about police practices from looking at the raw data and it's not always a pretty thing. It's not always something we are particularly proud of. So, I would actually, if I were teaching a machine learning class for future data scientists, I would ask them to think about this. Very, very explicitly. That's the one thing. The next thing is we have to build tools to understand how these actual algorithms work. And right now we just don't have that. It's a large, gaping hole. It's kind of, we have a decision-making process without an auditing process for that decision-making process. And we need to build tools to audit algorithms. And the field, it's very brand new--I'm hoping to be part of it--I just started a company to audit algorithms, to be a consultant for that. And I don't know if I'll ever have a client, because a lot of these people don't want to think too hard about what's going on inside the black box. But I think we need to do that; I think that companies should do that to their own algorithms, internally, because they should be worried that they might get sued if they are doing stuff that is illegal or unfair. And also, I think regulators should keep in mind that these algorithms are essentially running wild and need to be understood. And finally I want the public to have kind of a bill of rights around scoring systems--very similar to what we already have with credit scores. You know, with credit scoring we have the right to see our credit report. We have the right to contest data in our credit report that's inaccurate. And there are certain rules about how credit scores can be built or not built. And I think we need that more generally. But right now the, you know, the laws are not written for the age of big data. So we need to sort of update them. And we need the public to push back. Take the teacher example: Teachers who are being assessed by a meaningless, almost random-number-generator of a scoring system should demand to understand what their score means and how to get better.

Russ Roberts: It's early days, but have you had any reaction from fellow data scientists about these arguments? Your book is just coming out now, but are you going--do you think it's going to resonate with folks, or are they just going to want to push it aside--'let me do my job'?

Cathy O'Neil: I mean, so far, I've gotten a lot of positive feedback. Not everyone loves it. A lot of people like to live in a world where what they do is magical, and inherently correct. So, yeah, I'm expecting people to be uncomfortable with it as well.