EconTalk
Jim Manzi on the Oregon Medicaid Study, Experimental Evidence, and Causality
May 27 2013

Jim Manzi, founder and chair of Applied Predictive Technologies, senior fellow at the Manhattan Institute, and author of Uncontrolled, talks with EconTalk host Russ Roberts about the Oregon Medicaid study and the challenges of interpreting experimental results. Manzi notes a number of interesting aspects of the study results that have generally gone unnoticed: the relatively high proportion of people in the Oregon study who turned down the chance to receive Medicaid benefits, and the increase (though statistically insignificant) in smoking by those who received Medicaid benefits under the experiment. Along the way, Manzi discusses general issues of statistical significance and how we might learn more about the effects of Medicaid in the future.



Russ Roberts
May 27 2013 at 8:25am

As I mentioned in the podcast, I invited Austin Frakt to respond. His response is here.

May 27 2013 at 10:33am

Your more elegant restatement of policy significance vs statistical significance in outcomes could boil down to a public choice evaluation of constituencies.
Who has to pay, and how much, for the results (regardless of significance). Who cares about real outcomes when it gets votes, right? Many programs have worsened problems and yet are still considered the gold standard response for politicians.

The trade-off is only a problem for the decision-makers when the backlash from the tax costs votes (or donations) that aren’t offset by the vote gains (or donations) from the recipients.

On smoking, subsidizing a behavior ensures more of it. Hope the bathroom smoke smell has cleared.

Thomas A. Coss, RN
May 27 2013 at 1:40pm

Russ, Thank you and your guests for such careful treatment of this study.

In my over 10 years of Emergency Room and ICU work, it became clear that not everyone shares the same regard for their personal wellness; yes they want to be free of pain, but often that’s where it ends. We don’t even consider the possibility that there are rational actors who frankly care less about living long or not, they just want to have fun. Why, then, should we assume that everyone shares the same preference intensity for good health and thereby provide it?

Here are two things upon which we can all agree with a high degree of certainty (p=1): Only you can do your own healing, and in the end, only you can do your own dying. The individual is the residual claimant, not the population or the government. It seems awkward to me to only pursue policies aimed at reducing hypertension by providing medication and not a reduction in potato chip consumption.

May 27 2013 at 1:43pm

Great podcast Russ. It was a good follow up to the Frakt interview, which was also very good. I found the discussion of increased smoking very provocative. Although not statistically significant, it seems like a fertile area for investigation; i.e., is there any evidence that provision of health insurance (whether through the public or one’s employer) increases moral hazard in the form of unhealthy behavior?

May 27 2013 at 11:54pm

I found the last two weeks of interviews very interesting and the results rather surprising for me and have read the actual paper on NEJM.

The way I understood it is that there was no statistical result that found that having medical insurance really helped in the overall well being of a person. Except for depression.

Assuming that the above generalization is correct, I’d like to make the argument that you are not going to find a difference because of the social factors in the society that the studies are published in.

Everything that we measure success by or celebrate success by is relatively unhealthy (yes, I’m making a huge generalization). But think about it. If you are a poor family, you are not going to be able to afford to go out and eat, go to a bar and drink alcohol, or buy a couple of cigars. I always remember that when I was a child, one way my parents would save money was by not eating out for extended periods of time and cooking absolutely all our own meals, which in turn is healthier.

There have been some studies published that argue this point in more detail:

Another point: we just took a household and threw medical insurance at them. Yes, this was a great deal. But did we educate them about what a ‘healthy lifestyle’ is? Did we do any preventive medicine? That would have had a bigger effect on the results, one that would be unmeasurable. I’ve always been taught that preventive medicine is the best medicine out there.

I live in Canada and we have ‘free’ healthcare. It would be interesting to see if there were any studies or measurements that would show whether the population got healthier or unhealthier after public health care was introduced.

I also had a technical question about statistics: How long should you track a variable before you can state that you have found all the possible results of that variable? They only did this for two years. Why not track it longer, and then you might find a change?

Thanks for the podcast they’re always very interesting. And really cool to see what I am studying in university be applied in real life examples.

May 28 2013 at 7:47am

Re the insurance as moral hazard issue:

I don’t have any insight into the smoking stats, but I know for a fact that at least some people are less willing to participate in dangerous sports if they’re uninsured. I’ve seen it repeatedly.

Alan Clift
May 28 2013 at 12:33pm

Of those 60% who submitted the application, about half were eligible; and about half were not found to be eligible

Maybe the 40% that won and didn’t register relates to the above statement.

I imagine registering for the lottery was low cost, so eligibility isn’t a huge concern, and personal conditions may change. So people register just in case. But having won, and seeing that their current conditions make them ineligible, they don’t register.

Seems more interesting that 50% registered anyway despite being ineligible. Might interpret that as placing a high value on winning.

Alan Clift
May 28 2013 at 1:22pm

I used Register too often in my prior post.

There are those that entered the lottery. Once having won, they review the eligibility in the application and don’t follow up. 40% doesn’t seem an unreasonable number of those.

Of the 60% that do apply after winning the lottery, about half are ineligible. Some combination of factors influenced those to apply despite being ineligible. Effort to apply, value of the prize, and understanding the eligibility requirements.

May 29 2013 at 9:02am

I am not shocked that the 40% of “winners” did not pursue the “low priced” health insurance. I thought that would be higher.

It is very difficult to apply for these programs and the participants have to disclose a lot of data before they get accepted. All of this takes time and effort. Furthermore there is time and effort and more paperwork required to participate.

Then once in, that does not mean that everything is peachy and free. You have copay charges and, worse, you have to find providers for your specific treatments, and these poor folks find very few providers taking new patients (especially Medicaid patients). And your choice of providers may not be local to where you live.

Also added on to the paper work burden is the fact that many people have partners or other family members who have insurance.

Lastly, an increasing number of poor people are working in the Underground Economy where they do not want the IRS or other government agency looking into their employment. These folks will get their care by showing up at hospitals and avoid income and FICA taxes on their earnings.

Trent Whitney
May 29 2013 at 11:00am


Enjoyed both podcasts on the Oregon study, and especially enjoyed your discussion in this podcast on statistical significance.

I’ve run into the same issue in the business world – people are wowed by a variable being statistically significant, while they overlook that the magnitude of said variable is minuscule. Conversely, a variable that has significant magnitude but that is borderline statistically significant gets ignored.

If you ever get one of your dream podcast interviews – Sabermetrician Bill James – I’d like to hear him opine on how often he’s run into this effect in baseball.

Mort Dubois
May 29 2013 at 12:53pm

I second Bogart’s comment above regarding the difficulty of completing the application. This was the very first thing I thought of as I listened to Russ and guest gassing on about the moral weakness of the shiftless poor. Did they look at the application? What did it consist of?

I have an autistic son who qualifies for Medicaid, and I have to fill out a new application for him every year. I am also a graduate of an Ivy League engineering school, and have successfully run my own business for 26 years. Of all the paperwork I have encountered, the Medicaid application ranks in the top 10 for difficulty and time spent to complete – it’s about on par with filling out your own 1040 and Schedule C, including depreciation schedules. It wouldn’t surprise me at all if many of the 40% simply abandoned the effort. If the Oregon app was like the one I fill out, it’s more impressive that 60% submitted it.

Did either Russ or guest take a look at the application before jumping to conclusions? If so, can we see a link? If not, why not?

May 29 2013 at 1:21pm

I think you would find the same lack of effectiveness in any corporate private healthcare system as well. We don’t have a “health care” system; we have a disease management system. Modern medicine manages symptoms. Insurance pays for the management and mitigation of those symptoms. The people I know who are healthy aren’t that way based on any sort of “health care” insurance plan they have. Rather, they adopt lifestyle changes, eat well, exercise, become educated about their health, take supplements (which are not covered), etc.

I would like to see a study that shows how private insurance meets the same goals set before medicare. My sense is it would fare very much the same.

May 29 2013 at 5:53pm

Not sure why this wasn’t mentioned — maybe it was covered in the actual study, which I haven’t read — but is it possible that the people who took the time and effort to register for Medicaid were sicker than the people who won the lottery but didn’t register? That might account somewhat for why there wasn’t much movement in the main health markers — if you have cancer, for example, your doctors are probably not going to worry very much about addressing your slightly elevated blood pressure. Same if you have a chronic condition like arthritis — that’s the thing you will be primarily seeking healthcare for, even if maybe you have high cholesterol.

And in general I was confused by the health markers they selected to measure. Blood pressure, cholesterol and blood sugar are important health measures, but they are a very tiny piece of the puzzle. It seems like there would be other important factors one could look at, such as whether the number of emergency room visits was reduced or whether someone with chronic pain was able to return to work because they received treatment. You wouldn’t know this by measuring their blood pressure.

Plus, two years doesn’t seem like a very long time, especially when you consider that not everyone will immediately seek treatment the instant they get Medicaid. It can take time for treatments or lifestyle changes to take effect. I’d be curious to see how the study participants are faring 10 years down the line. The effects would probably be much more pronounced.

Ak Mike
May 29 2013 at 7:23pm

Mort and Bogart – what states do you live in? Out of curiosity after reading your comments, I checked out the Oregon Medicaid application, which is available online. It was about seven pages, fairly easy questions about income, property owned, and insurance coverage. There was an extra page if you are self employed, another extra page if there is more than one absent parent, and one or two more. It looked to me like about an hour to fill it out.

There are apparently about 58 million Americans using Medicaid services this year. I just don’t think that filling out the application form is a significant deterrent.

Mort Dubois
May 29 2013 at 9:18pm

@Ak Mike: I just took a look at the application, here’s the link:

For those who don’t want to wade through it, it is similar to the application I fill out in Pennsylvania. Mike, I’m amazed if you completed it in an hour – that implies that you have a very simple financial life and all information constantly at hand. I applaud your competence. For the rest of us, the application starts with this warning:

Here are some things you should do or know before you get started:

1) Use the correct browser
This online application is built and tested for use with Internet Explorer. Using other browsers may cause the form to not work properly.

Important: The “submit” button does not work when used with the Macintosh Safari or Google Chrome browsers. Please do not use Macintosh Safari or Google Chrome browsers with this application. We are also not able to support the use of this form on Ipads or mobile devices at this time. If you would rather have a paper application mailed to you, please call 1-800-359-9517 (TTY 711).

2) Gather what you’ll need
At this time, it isn’t possible to save this form and come back to it later. Once you begin the online application, you will have 3 hours to complete it. After 3 hours, your information will be lost and you will need to start over.

Not the most user-friendly document I have ever seen. Pennsylvania’s, fortunately, does allow one to save and return to work on the application later, which is useful when one is confronted with questions like this:

Tell us about money (you receive), including:

• rent paid to you
• loans repaid to you
• TANF (Temporary Assistance for Needy Families)
• retirement pension
• veterans benefits
• worker’s compensation
• disability benefits
• child or spousal support
• guardian or foster care payments
• Social Security benefits
• Supplemental Security Income (SSI)

They ask for verification of how much each month, and who pays it. Again, if you have a very simple financial life, one paycheck for one person, it wouldn’t be too difficult. It usually takes me 6 to 8 hours, two lengthy phone calls, and 2 mailings to submit my application each year. This is for a kid who automatically qualifies, no matter what our financial situation, because of the severity of his disease.

I invite the skeptical, just for fun, to fill out the Oregon Medicaid App themselves. It’s possible I’m the dumbest Medicaid applicant out there, 40% of applicants abandoned the effort through sheer laziness, and that my skepticism is unfounded. I suspect not.

Ak Mike
May 30 2013 at 1:22am

Mort – I’ve no doubt that you are a heck of an engineer, but a form filler-outer you are not. I took up your challenge and got through the form in 25 minutes, much less than my estimate. My finances are a bit more complex, I suspect, than the average poor person applying for Medicaid – yours are no doubt more complex yet. I suspect your trouble comes from the lack of fit between a form designed for poor people with few resources, and your situation, which is probably upper middle class with brokerage accounts, retirement accounts, real estate, etc.

By the way, the Oregon form explicitly contemplates that someone may be helping the applicant fill out the form – I bet there are non-profits that provide such assistance.

As I noted before, there are nearly 60 million Americans (and most likely not the most highly-educated quintile) who have surmounted the awesome challenges of filling out the form and got themselves qualified for Medicaid benefits. The application is simply not an obstacle, when thousands of dollars of benefits are at stake.

My own belief is that most of the 40% who did not complete the application were not prevented by laziness or incompetence, but rather either were not qualified, or did not perceive much benefit.

Jim Manzi
May 30 2013 at 2:54am

Russ asked me to take a look at the comments on why so many people did not end up applying, and weigh in. I’m happy to do so, and thanks for all of the excellent comments.

Consider three groups of people: (1) lottery winners who submitted the application (“compliers”); (2) lottery winners who did not submit the application (“non-compliers”); and (3) lottery losers (“control”). In somewhat simplified terms, we could make one of two comparisons: (A) the change in health outcomes for the two-year period after the lottery for all lottery winners (compliers + non-compliers) versus control, or (B) the change in health outcomes for the two-year period after the lottery for only the compliers versus control. A is termed an intent-to-treat (ITT) analysis. B is termed a Local Average Treatment Effect (LATE) analysis.

The advantage of the first is that we know we have randomized people into the test and control groups, so there should be no confounding, and therefore we can have the highest confidence that the observed differences are caused by winning versus losing the lottery. The disadvantage is that we are answering the question “What is the effect of winning the Medicaid lottery?” rather than the presumptively more useful question “What is the effect of going on Medicaid?” The advantage of the second is that it provides an answer to the question of the effect of going on Medicaid; but the disadvantage is that this answer is less reliable, because there are potentially unobserved differences between the compliers and non-compliers that could have an effect on change in measured health outcomes over the measurement period of the study.

One of these potential differences is that if sicker people were more likely to comply, then you might very well see greater deterioration in health outcomes among the compliers than non-compliers, independent of any treatment effects, and by extension, also see greater deterioration among the compliers than among the controls, independent of any treatment effects. Systematic change in the test group versus the control group is otherwise known as “bias.” The authors, being diligent analysts, obviously evaluated this. From the Supplementary Appendix, page 11 “Pre-lottery diagnoses”:

[Start quote] In some of our analyses (see penultimate row of Table 2, and also Tables S14b and S14c), we limit the sample to individuals who report having pre-randomization diagnoses of specific health conditions. Table S8 examines the balance of treatment and control respondents on reports of pre-randomization diagnoses for ten conditions. Participants are considered to have a prerandomization diagnosis if they reported in their interview having a specific diagnosis first made before March 2008. The multivariate F-statistic for differences in all these conditions pooled has a p-value of 0.30; the standardized treatment effect for change in diagnosis of all these conditions is -0.0026 standard deviations. This suggests that there is no differential reporting of prerandomization diagnoses by treatment or controls. In addition to observing balance on the individual conditions, we also see no evidence of imbalance on a composite measure for having a pre-randomization diagnosis of diabetes, hypertension, high cholesterol, heart attack or congestive heart failure (estimated average difference is -0.26 (standard error =0.9; p value is 0.77). We use this composite measure to identify a subset of our population that is at increased risk of adverse cardiovascular outcomes. This subset does not include those also at increased risk who have not been diagnosed pre-randomization because we have no way to identify them. [End quote]

The last sentence is important. Remember that “the test group was sicker than the control group” is one possible source of bias, but that there are others. One I discussed is the idea of unobserved differences in prudence. There are literally an infinite number of possible sources of unobserved bias. The key issue isn’t baseline health bias per se, but the unobserved part. This is why randomization is so important: subject to sampling error, it will hold all possible sources of bias constant – even ones we haven’t thought of, or might not be able to get data for.

Therefore, the most reliable way to interpret this experiment as something approximating a randomized controlled trial is to do the analysis on an ITT basis, and accept that we are answering a more limited question. In Uncontrolled, I went through an extended argument for why I think this is the best way to use evidence from the almost exactly analogous case of school choice lotteries. Note that I argued this for a policy that I generally support, even though such an approach makes it harder to argue for the policy.

If we do take an ITT approach to the Oregon experiment, the authors of the study report causal effects of winning the lottery that are one-fourth as large as those in the headline results table that have been the numbers used in almost all commentary about the study, but with similar relative effects for estimated impact in blood pressure, smoking, Framingham Risk Score, etc. I think this makes the arguments I made in the post obviously stronger, not weaker. Alternatively, if we take the LATE estimates from the paper as reliable estimates of the effect of going on Medicaid, then we have the main argument of the piece, which is that (roughly speaking) if we are to consider positive but not statistically significant apparent effects on physical health outcomes as containing useful information, then we ought to consider the negative apparent impacts as well.
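
The selection concern described above can be illustrated with a small simulation. This is a minimal sketch in Python under invented assumptions, not the study's method or data: the `frailty` trait, the 80%/40% application rates (chosen so that roughly 60% of winners apply), and the default true effect of zero are all hypothetical. Positive values of `change` stand for deterioration in health.

```python
import random
from statistics import mean


def simulate_lottery(n=100_000, true_effect=0.0, seed=1):
    """Compare all winners vs. control (ITT) with compliers vs. control.

    All parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    winners, compliers, controls = [], [], []
    for _ in range(n):
        # Hypothetical unobserved trait that drives health deterioration.
        frailty = rng.gauss(0.0, 1.0)
        won = rng.random() < 0.5  # the lottery itself is purely random
        # Assumption: frailer winners are more likely to submit the application.
        complies = won and rng.random() < (0.8 if frailty > 0 else 0.4)
        change = frailty + rng.gauss(0.0, 1.0) + (true_effect if complies else 0.0)
        if won:
            winners.append(change)
            if complies:
                compliers.append(change)
        else:
            controls.append(change)
    itt = mean(winners) - mean(controls)
    compliers_vs_control = mean(compliers) - mean(controls)
    return itt, compliers_vs_control


itt, compliers_vs_control = simulate_lottery()
print(f"All winners vs control (ITT): {itt:+.3f}")
print(f"Compliers vs control:         {compliers_vs_control:+.3f}")
```

With no true effect built in, the first figure should sit near zero, while the second is pushed away from zero by selection alone. That is the sense in which the randomized comparison holds unobserved differences constant and the compliers-only comparison does not.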

May 30 2013 at 1:13pm

[Comment removed for irrelevance.–Econlib Ed.]

Jim Jennings
May 30 2013 at 7:50pm

About the not quite statistically significant smoking effect:

Is it possible that people with free health care are more likely to smoke because they have more disposable income with which to buy cigarettes?

May 31 2013 at 11:58am

I greatly enjoyed both podcasts but I was a little disturbed by how differently the guests were treated on the show. Austin Frakt was asked to respond to more criticisms of his argument than Jim Manzi. Jim Manzi was, in contrast, showered with more praise about his nuanced and “beautiful” reasoning. Russ, I think your podcast suffers a bit when you don’t challenge the guests you agree with during their interviews.

But don’t get me wrong–both shows were highly informative and a joy to listen to. I think now I have a better idea of the difficulty in basing policy decisions on a handful of studies, even very good ones.

Russ Roberts
May 31 2013 at 1:30pm


I think Jim Manzi is more cautious than I am in interpreting data and studies and I try to be very cautious. So I was lauding him for that. It certainly wasn’t meant to be a comparison with Austin Frakt who did a fine job and who responded so graciously to Manzi’s observations.

Don Rudolph
Jun 1 2013 at 5:27pm

I imagine some will be inclined to use this study to suggest money spent on medicaid is wasted. If the program filtered out people that tended to be non compliant to medical care, this points to the conclusion that health care is not valuable for anyone not just those on medicaid. It would be illogical for someone to argue that money spent on medicaid is wasted while at the same time spending thousands a year for their own health insurance.

Jun 1 2013 at 5:28pm


Once again, great podcast! During your discussion with Jim, I had several flashbacks to my days as a graduate student in experimental physics, banging my head against my desk during many late nights acquiring data, waiting, waiting, waiting to see statistically significant effects, and then constantly second-guessing the systematic effects when we did eventually “see” something. Of course in physics, perhaps the hardest of the hard sciences, we can always redesign our experiments to ferret out these pernicious effects, and if we are ever “under-powered” we can just sit down and take more data. After listening to your show for several years now, I am more cognizant of exactly how lucky I was in this respect and that often in the social sciences, one is stuck with the data “ya got” and often does not have any “knobs” to tweak to eliminate systematic effects. It is actually pretty amazing to me that we are able to know anything at all in many of these situations…

Anyway, thanks again, and I thought I would post this link to an animated GIF I recently came across while surfing around for information about the recent discovery of the Higgs boson at CERN. I think it very nicely provides a visual representation of “power” and “statistical significance.”

The short way to understand the experiment and the plot is to know that the experimenters were looking for a tiny “blip” above the background signal. (The actual measured signal has to do with detecting one particular type of particle from a shower of subatomic particles that is created when two incoming high energy particles are slammed together at nearly “ludicrous speed”.)

Explaining the plot itself is a little technical, but very loosely speaking (and I admit that I am outside of my precise area of expertise here) the plot is a mass/energy spectrum. Think of the y-axis as individual events, perhaps clicks on a detector that is calibrated to measure particles of a certain energy. Think of the x-axis as the measured energy (actually, in this case it is the measured energy scaled using a physics model that takes the energy as input and outputs the mass of the Higgs; let’s call the model “quantum field theory”).

At any rate, the thing to watch is how the overall signal increases with time as data is collected and how the error bars on each point also get smaller as data is collected. You begin to see several blips form early in the data (which blip is the right blip?). As more data is collected, the “wrong” blips disappear into the background and the “right” blip appears to rise above the noise level, and voilà! We have evidence of a statistically significant event right around 127 on the x-axis: the Higgs boson!! Note also that, despite being under-powered (to see the small blip) early on, one can still reasonably rule out the presence of any very large blips in the data.

Anyway, I thought of the above plot when you were talking about the Oregon study being under or not-under powered and also when you discussed the term “statistical significance.” In this case, they were looking for a very small effect, but found this very small effect as a small “statistically significant” blip above the background signal. In this case, a tiny blip (small effect) of huge “actual significance” (to physics nerds) because it provided some strong evidence of detection of a long sought after fundamental particle--the Higgs.
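
The way a small blip emerges from the background as data accumulates can be put into rough numbers. This is a minimal sketch in Python using the common counting approximation (excess events divided by the square root of the expected background count); the rates and exposures are made up for illustration and have nothing to do with the actual CERN analysis.

```python
import math


def approx_significance(signal_rate, background_rate, exposure):
    """Rough significance, in standard deviations, of an excess over background.

    Uses the simple approximation excess / sqrt(background), which treats the
    background count as Poisson-distributed. All rates are hypothetical.
    """
    expected_excess = signal_rate * exposure
    expected_background = background_rate * exposure
    return expected_excess / math.sqrt(expected_background)


# Hypothetical: 1 signal event per 100 background events in the same bin.
for exposure in (400, 1600, 6400):
    z = approx_significance(signal_rate=1.0, background_rate=100.0, exposure=exposure)
    print(f"exposure={exposure:5d}  approx. significance={z:.1f} sigma")
```

Under this approximation the significance grows with the square root of the exposure, so quadrupling the data doubles it. That is why an "under-powered" early data set cannot confirm a small blip but can still rule out a very large one.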

You might consider having an experimental particle physicist on your show sometime to discuss the subtleties of experimental measurement in the physical sciences–especially high energy particle physics. These guys really make a living finding needles in the haystack–but way harder. More like finding a particular H2O molecule in an ocean–or perhaps even harder than that. The statistical analysis that goes into this stuff is really heroic and it would be interesting to contrast this against what is done (or not done) in the social sciences.

Jun 2 2013 at 8:46am

Kudos to both of you for an excellent discussion of an interesting and important topic.

Next, let’s take the Medicaid away from the lucky winners and see what happens to them.

Jun 3 2013 at 4:01pm

Most of the potential reasons that were discussed for why a person would win the lottery and not apply for medicaid seemed to boil down to “life’s hard when you’re poor”. I agree those were interesting and plausible, but I think there is another category (that I suspect is more prevalent) that was not discussed.

As a prudent person NEAR the poverty line I should sign up for the lottery even if I am currently above the cutoff. By the time the lottery is done I might be eligible, e.g. due to loss of job. Then once the lottery happens I would ignore my ‘winnings’ if I knew I wasn’t eligible.

Also, I suspect that those near the poverty line are much more likely to have ‘under the table’ earnings. In order to prove eligibility for medicaid you would need to detail your financial situation to ‘the government’. A prospect that would seem quite risky if any significant portion of your income was in a legal grey area.

Both of these explanations are based on the concept that applying for the lottery is MUCH less expensive (in terms of time/effort) than applying for medicaid once you win.

That all being said, the only way to know for sure is to interview a significant portion of those who won and didn’t apply.

Kevin Dick
Jun 3 2013 at 5:17pm

I have a hypothesis that explains both the 40% of people who didn’t apply after winning the lottery and why the only significant effects were on mental health.

Assume that the condition we’re treating for is actually high _worry_ about health care, rather than poor health per se:

(1) Almost everyone has a little worry about health care, so cheaply entering the lottery makes sense for most of the target population.

(2) The winners then face a tradeoff of level of worry vs level of effort to apply/enroll. Applying only makes sense for high worry individuals.

(3) Getting the coverage reduces the worry of high worry individuals but doesn’t have much other effect.

So my guess is that even if you went back and gave the coverage to everyone who lost in the lottery, either the statistical significance of the mental health benefits would disappear or the effect size would shrink drastically.

Jun 4 2013 at 9:54pm

This (and the Frakt) podcast were both very good discussions on this study.

One thing that came up (but was not emphasized) in either podcast was the fact that this study was essentially just taking a snapshot of the two populations (winners and losers) 2 years after the lottery. If this study had included two assessments (one pre-Medicaid and one post-Medicaid) the study would have had a lot more power to detect changes.

Jim Manzi’s smoking hypothesis (patients who got on Medicaid were more likely to start smoking) was, more or less, based on the observation that at the follow up visit, the number of patients reporting that they were current smokers was about 1-2% higher in the lottery-winner population than in the lottery loser population. (The 43% and 48% numbers Manzi cited in the podcast were from the model that assumed any difference between the groups was due to Medicaid enrollment, which basically multiplied the size of the lottery winner vs loser effect by 4x to account for the 25% Medicaid enrollment.)

That’s pretty thin gruel… but if there had been a baseline assessment, then we could compare the number of patients in each group who started smoking during the study, and the number of patients who quit. That would have been able to account for any possible sampling error between groups, and the type of statistical tests used would be more likely to show an effect.
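
How much a baseline assessment helps depends on how strongly each person's baseline predicts their own follow-up. Here is a minimal sketch in Python, assuming a continuous outcome with the same standard deviation at both waves (smoking status is binary, so this is only an approximation) and a hypothetical within-person correlation `rho`; none of these numbers come from the study.

```python
import math


def se_followup_only(sd, n_per_group):
    """Standard error of the between-group difference in follow-up levels."""
    return sd * math.sqrt(2.0 / n_per_group)


def se_change_score(sd, n_per_group, rho):
    """Standard error of the between-group difference in (follow-up - baseline).

    Var(follow-up - baseline) = 2 * sd**2 * (1 - rho) when both waves share
    the same sd and have within-person correlation rho (hypothetical here).
    """
    sd_change = math.sqrt(2.0 * sd ** 2 * (1.0 - rho))
    return sd_change * math.sqrt(2.0 / n_per_group)


for rho in (0.3, 0.5, 0.9):
    a = se_followup_only(sd=1.0, n_per_group=5000)
    b = se_change_score(sd=1.0, n_per_group=5000, rho=rho)
    print(f"rho={rho:.1f}  follow-up only SE={a:.4f}  change-score SE={b:.4f}")
```

In this simplified setup, comparing changes only beats comparing follow-up snapshots when the correlation is above 0.5. For a habit as persistent as smoking that seems plausible, but it is an assumption rather than something the thread establishes.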

Finally, on p values… I don’t think too much can be read into comparisons of non-significant p values. As Manzi noted in the podcast, a p value is the probability of seeing results at least as extreme as those observed if the null hypothesis were true. In this case, if winning/losing the lottery had no effect on smoking behavior we would expect to see results like the observed ones 18% of the time. That does suggest the possibility of a real effect… except that the paper had a lot of comparisons. If the lottery had no effect whatsoever on any outcome, we would expect one in every 5-6 comparisons to have a p value of 0.18 or lower. (This is why data mining can lead to erroneous conclusions – if, say, 100 comparisons are done, we would expect five of them to appear significant at the p<0.05 level even if the null hypotheses were all true).
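The multiple-comparisons arithmetic in that paragraph can be reproduced directly. The sketch below assumes every null hypothesis is true, so each p value is uniformly distributed, and (for the second function only) that the comparisons are independent, which the outcomes in a real study would not strictly be. The function names are illustrative.

```python
def expected_below(threshold, n_comparisons):
    """Expected number of p values at or below `threshold` when every null is true."""
    return threshold * n_comparisons


def prob_at_least_one(threshold, n_comparisons):
    """Chance that at least one of n independent null comparisons
    produces a p value at or below `threshold` (independence is an assumption)."""
    return 1 - (1 - threshold) ** n_comparisons


print(f"expected 'significant' results in 100 null tests: {expected_below(0.05, 100):.1f}")
print(f"comparisons needed, on average, per p <= 0.18: {1 / 0.18:.2f}")
print(f"chance of at least one p <= 0.18 in 6 tests: {prob_at_least_one(0.18, 6):.3f}")
```

With six independent null comparisons, the chance that at least one shows a p value of 0.18 or lower is roughly 70%, which is why a single p value of 0.18 in a paper with many outcomes carries little weight.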

Becky Yamarik
Jun 4 2013 at 11:52pm

again, very thought provoking podcast as well as interesting comments. I loved what Johnk said about not having a health care system, but a symptom control system.
Just before reading that comment, I was on the phone with a hospice patient’s daughter, telling her that we would be discharging her 91 y/o father from hospice b/c during the last 3 mths he was on with us, his overall health improved. This was after he came onto hospice and stopped about 15 medications that he was taking! Obviously this is not normal, but kind of a funny story.

One thing that hasn’t come up is the reality that no one on either side of the political spectrum really wants to admit, and that is that Medicaid insurance isn’t very good insurance compared to private insurance. Most doctors who have a choice and work private practice don’t take Medicaid b/c it pays so little. The work horses of Medicaid are often low paid, overworked foreign medical grads who are required to work in underserved areas b/c of visa restrictions. While many are very good doctors, they work in difficult conditions and some do have language challenges. The clinics they work in are always super busy, and many don’t have interpreter services, so it’s not like the insurance that most people think of. Not that that excuses the results of the study. As someone who’s always believed that universal health care is a moral right, I am troubled by the study results. . . I wish they were different. I wonder what should be done. . .

It does remind me of one more story that I’m loath to tell b/c it so fits the narrative of the libertarian philosophy, but I should tell it anyway

(to understand the medical significance of the story you just need to know that poorly treated diabetes over many years leads to kidney failure and the need for dialysis) . . . 10 years ago my husband and I hiked into the Grand Canyon thru the Havasupai Indian reservation. There is free healthcare provided by the govt, as in all reservations. Most people were very obese and I was talking to one of the residents about health care, b/c there’s no road in or out of the canyon, you have to hike out on an 8-mile trail or take a helicopter. I asked why I didn’t see many old people in the village. She said the rate of diabetes was very high. . . then she said “yes, it’s very high here and we Indians have a special kind of diabetes. We have the kind of diabetes where you end up getting kidney failure and needing dialysis. So when people here get old, they have to leave the village and live on the rim of the canyon so they can get their dialysis.”

So it’s like Johnk said, free healthcare doesn’t make you healthy. . . a healthy lifestyle makes you healthy. . .

Jun 5 2013 at 1:15pm

I’m a complete novice layman that happens to love listening to economic (and statistical!) podcasts, and I love EconTalk. With that said, why would you assume having insurance improves health, at least in the short term? Is car insurance supposed to make drivers speed less or wear seat belts? Is home insurance supposed to reduce storm damage or break-ins? Isn’t insurance to protect finances?

I would assume there was no improvement in health (in just two years) and an increase in the utilization of medical care. I also assume any positive effects would be in the long term. Questions I want answered are:

  • is the quality of life better as patients on Medicaid get older?
  • do they live longer?
  • do they need fewer hospitalizations?
  • what is the total cost to the country for patients on Medicaid versus not on Medicaid, over the lifetime?

The fact that the biggest improvement to these patients was protecting them from financial catastrophe hints at the value of Medicaid to the country. Are there longitudinal studies that prove/disprove this?

Nicholas Poggioli
Jun 5 2013 at 4:13pm

How accepted is the seatbelt effect in the economics literature? It reminded me of the “protection effect”–reminding people of precautions they have taken leads them to see related risks as less likely–in judgment and decision making that has failed at least one replication test.

A protection effect replication attempt on the original study failed (study here)

Jun 6 2013 at 4:11pm

Forty years of ophthalmology practice should lead to profound insights, but there is very little of the big picture in private practice, just the next clinical problem. Three remarks:

1. The single most common problem we see is correction of optical problems: glasses, contacts, etc. The Greek-style regulation in this area creates unnecessary expense, a Luxottica monopoly, and turf wars between opticians, optometrists and ophthalmologists. The regulation is unnecessary and costly. A bright high school kid could handle most glass prescriptions not done by the consumer herself. No insurance should be involved.
2. We usually preferred pro bono care to Medicaid, but current regulations make free care difficult.
3. Routine visits have a low practical value, possible psych/social value, and probably should not be insured.
Gut feeling favors catastrophic insurance only.

Comments are closed.



Podcast Episode Highlights
0:33 Intro. [Recording date: May 21, 2013.] Russ: Jim has thought a lot about what we can learn from experimental data.... He's the author of Uncontrolled, which Jim and I discussed in a podcast in June of 2012, last summer. That's a book about the challenges of teasing out causation in a complex world. And today we're going to go a little deeper than we have in the past, or at least continue to go deeply, into the Oregon Medicaid Study, a subject of a recent podcast with Austin Frakt. You might wonder: Why are we doing a second podcast, a second episode, on this one study? Two reasons. First, I think that over the next year and a half you are going to be hearing a lot about the Oregon Study. I think it's going to play an important role in the continuing conversation about implementing Obamacare [Affordable Care Act, ACA]. But the real reason I wanted to talk to Jim was because of a fairly lengthy article he wrote about the study, that I found to be utterly fascinating in the aftermath of my discussion with Austin Frakt. They both looked at the same study; they both looked at the same results. But they drew very, very different conclusions. More than that, Jim noticed some results in the study that nobody else seems to have noticed, and I thought it would be fun for those of you in the listening audience to listen to his observations. And along the way we'll get into some other issues related to data and statistics and what kind of conclusions we can draw. We won't just be talking about the study. But it's a wonderful vehicle for examining some of these questions. So Jim, welcome back to EconTalk.... First, a quick review of the experiment. Oregon decided--and you can correct me if I'm getting any of these facts wrong, Jim--to expand their Medicaid program, which is a health insurance program for poor people. But they didn't have room for everyone who was conceivably eligible for this expansion. So they had a lottery.
And that created two groups--people who won the lottery and ended up on Medicaid--they were the experimental group--and people who were somewhat similar but who just happened to lose the lottery. So, it was a chance to do a controlled experiment, which you don't get to do very often in economics. And so Oregon used a bunch of first-class health economists to follow up and study what happened to these folks, both before and after the enrollment into Medicaid for the experimental group, and for the control group to just examine what had happened to them in the meanwhile. And this recent study that came out in the New England Journal of Medicine was a two-year--I think it was a little after 2 years, so 25 months--so two years after the Oregon Study. It started--they wanted to see what the effect on the experimental group was. And I'm going to read a very short summary. This is from the conclusion to the New England Journal of Medicine paper:
This randomized controlled study showed that Medicaid coverage generated no significant improvements in measured physical outcomes in the first two years, but it did increase use of health care services, raise rates of diabetes detection and management, lower rates of depression, and reduce financial strain.
So, there were no significant improvements--and we'll be talking about what that means, 'significant'--no significant improvements in measured physical health outcomes--that would include cholesterol, blood pressure, and I think blood sugar levels. And most of the debate over the findings has been on the physical health measures, that there was no significant improvement. There were some improvements. They just weren't statistically significant. We'll come back to that. But as when we had the conversation with Austin Frakt, we looked at that range of findings that was in that summary. But what fascinated me about your analysis was: That was not what you started off with. You started off by saying, noticing something that no one else noticed: That not everyone who won the lottery chose to apply for the opportunity to get what was essentially very, very inexpensive, and sometimes free, health care. Explain what happened there, and what you learned from it. Guest: Well, starting with what happened, your summary, to my knowledge, is accurate. And there were just under 90,000--89,000 or so--people who signed up to participate in the lottery. Of those 89,000 or roughly 90,000 people who signed up for the lottery, 35,000, roughly, were selected. That is, they won the lottery. Selection actually happened at the household level, so it was really about 29,000 or 30,000 households were selected. But since the measurements for physical [?] were held at level of people [?], in general, we talk about people in this study even though technically selection happened by household. So, about 90,000 people applied; 35,000 won the lottery. And what winning the lottery meant was that if you submitted an application within 45 days and were found to be eligible, you were granted access to this Medicaid program. So, of the 35,000 people who won the lottery, about 60% filled in and submitted the application within the deadline. Therefore, obviously 40% did not.
Of those 60% who submitted the application, about half were eligible; and about half were not found to be eligible, either because their income was above the poverty level or some other reason; but that was the primary reason according to the authors of the study. So, one simple observation that I had was--so, it was surprising to me. Not that you didn't have 100% of the people who won the lottery fill out the application and submit it, but that almost half of the people who won it, didn't. I think if your mental image of the uninsured, and I put this in the piece, is a family huddled outside a hospital with a sick child who just cannot get the money to pay for the doctor to give them antibiotics, that result doesn't make a lot of sense. And I also don't want to trivialize the extreme difficulty that [?] of poverty can create in accomplishing seemingly straightforward tasks. I understand that there are huge challenges in accomplishing something like that when you are moving a lot, you are often not opening all your mail because it's mostly bills, you have a lot of problems being able to get things done. But I do think that, by [?] of logic, there are only one or two possibilities for why someone, a given winner, did not submit that. Either, a rational analysis indicated that the expected gain from coverage didn't justify the time and effort of filling out the form and submitting it, or the winners did not act rationally about the long-term benefits versus immediate inconvenience. And to me, I think neither is a strong argument for the value of this program, because in the first case it just means insurance isn't really worth a lot to those people, at least, the 40% who did not submit it. 
In the second case I think it indicates that the same things that make it hard to submit the application to get the insurance are probably indicators that conformance with the kind of therapeutic regimens that are necessary to affect the key physical health indicators which are measured in the study, which are basically blood pressure, blood sugar, and cholesterol, is very likely to slip and not be met. And so, that to me was a very telling fact that I hadn't seen anyone mention. Subsequently I realized that Roy also pointed out the same thing. Russ: Who pointed it out? Guest: Avik Roy, a blogger and health care analyst.
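The enrollment funnel described in this segment can be tallied as a rough sketch. The inputs are the round numbers given in the conversation (35,000 winners, about 60% submitting, about half of those eligible), not exact counts from the paper, and the function name is hypothetical.

```python
def enrollment_funnel(winners, share_submitted, share_eligible):
    """Walk lottery winners through application and eligibility screening."""
    submitted = winners * share_submitted
    enrolled = submitted * share_eligible
    return {
        "winners": winners,
        "submitted": submitted,
        "did_not_submit": winners - submitted,
        "enrolled": enrolled,
        "enrolled_share_of_winners": enrolled / winners,
    }


# Round figures quoted in the conversation; treat them as approximations.
funnel = enrollment_funnel(winners=35_000, share_submitted=0.60, share_eligible=0.50)
for step, value in funnel.items():
    print(f"{step}: {value:,.2f}")
```

With these inputs roughly 10,500 people, or about 30% of lottery winners, end up covered, which matches the "10,000 or so" and "something like 30%" figures mentioned later in the conversation.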
8:23 Russ: Just to untangle that closing point, which I think may have been lost in some health care jargon: What you are saying is that if you are not very excited at being part of Medicaid, maybe you won't be so good at complying with whatever health rules or drugs that might purport to help you. You might not be a regular taker of your meds. That's what you are worried about. Guest: Yeah, I think that's right. Either you are not excited about it because it really isn't worth that much. I think the much more likely case, intuitively the more likely case, is in fact you do at a rational level at least see getting health insurance as being quite valuable but the fact that you didn't submit the form within 45 days is a marker for your likely behavior over the 2-year coverage period that's being studied here; that as you say, indicates for various reasons--that I can understand, given the kind of life situation you might be in--you don't take your meds every week. Or you have a dietary and exercise routine that you are not following. Or you are not doing any of the many things that are necessary to manage a chronic condition to improve the specific health indicators that are looked at here. Russ: And just to clarify this: If you were selected in the lottery and you were eligible--again, correct me if I'm wrong, but I think it entitled you to all forms of standard medical care, including pharmaceuticals other than, I think Austin Frakt said dental and I think eyeglasses. But all basic medical care services would be available to you and you'd only have to pay a monthly fee as a participant in the program of between $0 and $20, where the amount you paid depended on how close you were to the poverty line. So I assume if you were at the poverty line you'd pay $20 a month; if you were at something below the poverty line you might be paying something as low as $5 or $10 a month, or maybe $0 if you were very poor. And so this was a deal, on the surface.
This was an incredible bargain that, as you say, 40% of the folks decided it wasn't worth it, or just missed the chance, as you point out--we don't know what the real reason was. And it would be a very interesting followup to talk to those 40%. Because it's not a small number. It's not like oh, there's this odd group that just for some reason passed up a chance for free health care insurance or cheap health care insurance. Nearly free health care. They just--a massive group, 40%--said, eh, I'm not going to do it. Guest: That's right. And I think that some of the possible reasons beyond literally just saying, I'm not going to bother, are: they are moving frequently. So, it's hard to contact them by mail. Another reason is they could have for one reason or another moved onto some eligibility for some other insurance. I think the authors think that second cause would be a relatively small number. But there are also, virtually certainly some of them who in fact the mail did arrive at the house where they lived and for some reason they didn't submit it. I really went to great pains to say I wasn't trying to make some kind of a judgment, a moral judgment. This isn't about a moral judgment about these people who didn't respond. It is purely about a marker for behavior which in my view is intuitively correlated with plausible failure to comply with chronic disease management regimen.
11:54 Russ: So, how does that affect--and this is where I think it gets very interesting as a general example of how to think about data and findings and studies, because this is "the gold standard." It's a randomized, controlled test. It's one group that just happened not to win; another group that did win, and we are going to compare them. But as you point out, the fact that 40% of the one piece of the population did not submit may affect the reliability of the results. Explain. Guest: Well, I think the reason that a randomized trial is the gold standard--and I go to great length, in Uncontrolled, and this will be old news to many of your listeners--is that we want to try and identify the causal effects of the treatment, in this case access to insurance, that is being offered, in isolation of all other factors. And of course, economists and analysts of various kinds attempt to hold all factors equal between two groups, one of which got a treatment and one did not, through various mathematical methods--regression, matching, and other things--which attempt to say, well, I can see what tends to create different outcomes in the outcome metric of interest and make sure that the people I'm looking at with the treatment are matched up against people who are alike in all material respects as a control group. And the problem of course is we can have unobserved variance. In other words, there may be differences between the people who are in the treatment group versus the control group that we just don't have data on or can't model. And therefore we cannot be as certain that we have equivalent groups in the test group and the control group other than the treatment of interest. If we randomly select individuals into the test group versus the control group and we'll get into this I'm sure in detail, subject to sampling error, I know that even if there are factors I haven't thought of, I should have them approximately equally distributed in the test group and the control group.
So, therefore, when you come to an experiment, there's a question, which is: Well, at the point of random assignment, I know that assumption is true subject to sampling error. That is, at the moment at which I ran the lottery, and I said, you win the lottery, you lose the lottery, I have two groups that should be alike in all material respects, subject to sampling error. As soon as I start to take subgroups of the group that won, I immediately find myself back in the same problem: How do I know the subgroup, which is the 60% who submitted the application, and of those, the ones who were eligible--how do I know which of the people who lost the lottery are most like them? And how do I know how similar they are? And running the analysis such that you simply compare the group that won the lottery and compare their outcomes to the group that lost the lottery is called the 'intention to treat' principle. And that is if you are being strict about only using a gold standard in a randomized trial--that's how you measure it. Now, what the authors are interested in is, what is the effect, not of winning the lottery, of being granted Medicaid? And I talk about this at length in Uncontrolled when I talk about lotteries for school vouchers. And what I say is: It's the bacillus of econometrics, is introduced, because you have to basically do some kind of modeling to say, I'm going to look at the characteristics of those individuals who are the subset of those who won the lottery, who actually got the treatment that I'm interested in, and compare it to some subset or some adjusted benchmark for those who lost. And it means that you lose some of the reliability that you would have if you measure on an intend-to-treat basis. Russ: Now, in this particular case, I'm going to use the word 'prudence,' which I like because it's an Adam Smith word. Adam Smith talks about the virtue of prudence, that you look out for your own well-being, that you are not reckless.
So, if I understood what you wrote correctly, in your essay, it's possible that the people who chose to apply were more prudent than the ones who didn't apply; and therefore when you look at the control group, which does not select out for prudence--it has a mix of prudent and imprudent people--you are attributing some of the, whatever the effects that there are of the Medicaid study, some of that may be due to prudence rather than just Medicaid. Is that an accurate way to say it? Guest: Yes. And I think that if you look at all the treatments that are reported in the headline results, they are the treatments that are estimated for the effect of being on Medicaid. If you were to instead say, I want to measure the treatments such that, I want to measure what's the causal effect of winning the lottery, versus losing the lottery, you can essentially take any of the estimated effects in the headline results in the paper and divide by 4. It's about 24, 25% of the effect, which you would expect, because only a small subset, something like 30% or so of those who won the lottery, got the treatment. And one of the things I argue in my book--and this is consistent with how a medical trial is classically run, this is consistent with the first known clinical drug trial, which is the Pertussis Vaccine Trials of Norfolk, Virginia in the 1930s, you measure the effect at intend-to-treat; you apply the principle of intend to treat, because you are so concerned about this problem of how do I know that there's some subset of the test group is really comparable to some subset of the control group. And really all the arguments that I made in that essay simply become a lot stronger if I just take all the effects and divide by 4. Which is what you would do on an intend-to-treat basis. Russ: Explain that. What do you mean, I divide by 4?
Guest: Well, one way to look at this is to say, I have 35,000 people who won the lottery, and I have, you know, 90,000 minus 35,000 people who lost the lottery. And what I'm going to do is to measure, ideally, I'm going to measure blood pressure for all the people who won the lottery and all the people who lost the lottery. And I'm going to look at the change in blood pressure for those who won the lottery versus those who lost the lottery, and I can ascribe the change in blood pressure by this example to the causal effect of winning the lottery. Our intuitive belief is, the mechanism by which winning the lottery is improving, or not, blood pressure is that a subset of those who won the lottery were given this Medicaid, access to Medicaid, which is "the treatment." Being able to say: Oh, but I know that there is, all of this effect is due to, for the subset of people who won, is due to being given the Medicaid treatment is a less reliable conclusion. And so the strictest way to look at this--and the authors are great. I mean in the supplementary appendix, unlike many of these papers, they went through the details of what the intend-to-treat effects were, if you do the analysis on an intend-to-treat basis. The strictest way to look at this is to simply say: The treatment I know I've randomized against is winning the lottery. And so the statement I can make with confidence is--scientific gold standard, it doesn't mean philosophical certainty--with high confidence is, the causal effect of winning the lottery is x. And it is intuitively going to be much lower per person, given that intuitively we believe that at least a large portion of the reason why some subset of the winners or some subset of the losers will have improvement is because only some subset got access to Medicaid.
19:42 Russ: So, another way to say it is, there's this group of people who won the lottery, didn't apply; and we're going to presume that they didn't have any effect, because they didn't get the treatment. Is that correct? Guest: That is the assumption; that is the essential, that is the basic assumption that is made when rather than doing an intend-to-treat analysis, you do an analysis which tries to evaluate the effect of winning the lottery and getting coverage. The problem with doing that is, what we don't know is, what we just talked about is, is there some unobserved bias among the winners of the lottery for those who actually achieved Medicaid coverage versus those who did not. In other words, to go back to what we were talking about a second ago from the essay, if I assume, for the hypothetical purposes of this discussion, that effectively out of the lottery winners those who went through the process of getting themselves registered are different, in terms of their prudence, in terms of their behavior over time, than those who won the lottery but failed to do that-- Russ: Self-discipline, all kinds of unobservable measures-- Guest: Exactly. Conscientiousness, plus, you know, that kind of rigor. If that's true and I'm comparing now a subset of people who won the lottery who are prudent, and looking at their change in health outcomes, to a mix of other people who are a mix of prudent and imprudent, I am going to mis-ascribe relative improvements in the prudent subset of the lottery winners to the effect of Medicaid coverage, when in fact it's the effect of this prudence mixed with the effects of Medicaid coverage. Russ: So, when you say, divide by 4, you are saying: This is sort of a lower bound, is we're going to assume that the people who didn't win the lottery would have been very imprudent and wouldn't have taken any of the medication and therefore wouldn't get any effect equivalent to the control group.
Guest: That's conceptually correct but it is also the case, but the number I gave came from, in the supplementary appendix to the paper, the authors report the results of doing exactly the analysis I describe, which is: I take the whole group of lottery winners and compare them to the whole group of lottery losers. They actually report all the statistics on it. Russ: Because they have the data on the lottery winners? Even though they didn't get the treatment? Guest: As far as I know. Russ: Okay, okay. That's totally different. I misunderstood that. That's very cool. Let me go back the other way. Let me argue that the experiment understates the effect of Medicaid. Because not everybody has high blood pressure. So, why would I expect--since high blood pressure is only a particular portion of the population, I wouldn't expect most of the people to benefit from that. So as I add people, if I increase the size of the study, a lot of these people aren't going to have high blood pressure. So, should I include those? Are they included? Guest: Well, you can think of it as: There's just a treatment. The treatment is anything. It's any change in the environment created for the test group or the treatment group versus the control group. And the treatment in this case is, I have a group of 10,000 or so people, and they now have access to Medicaid, and I compare them to a "like" group who do not have access to Medicaid. And the causal Pachinko machine of how access to Medicaid is going to play through to my blood pressure being different, my blood sugar being different, my cholesterol being different, is extremely complicated. For some set of people, they are going to have Medicaid coverage and they are just not going to go to the doctor. Other people will go to the doctor and they'll be diagnosed with some disease state that results in some clinical treatment, like you are now going to get this drug. And some of those people take the drug and some won't. 
Some will be misdiagnosed. Some will have no clinical state relevant to blood pressure and be given medicine for something else, which will have-- Russ: Side effect-- Guest: You know, it's going to raise my blood pressure and make it worse. What we're doing is not trying to peel apart all those sub-effects here and simply asking the most basic, what's called an A-B split: You get the treatment; you don't get the treatment. Just measure the outcome difference between those who got the treatment versus those who did not, where the treatment here is not a clinical treatment. The treatment here is Medicaid coverage. Russ: Medicaid coverage.
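The "prudence" concern raised in this segment can be illustrated with a small simulation. Everything in it is a hypothetical assumption made for illustration: the share of prudent people, the size of the prudence and Medicaid effects, and the rule that only prudent winners enroll. None of it is taken from the Oregon data.

```python
import random


def simulate(n_per_arm=100_000, prudent_share=0.30, true_effect=0.5,
             prudence_bonus=1.0, seed=0):
    """Compare three estimates of a coverage effect when only 'prudent'
    lottery winners enroll. All parameter values are hypothetical."""
    rng = random.Random(seed)

    def draw_person(won_lottery):
        prudent = rng.random() < prudent_share
        enrolled = won_lottery and prudent  # assumption: only prudent winners enroll
        # Health improvement: prudence helps on its own, coverage adds true_effect.
        outcome = prudence_bonus * prudent + true_effect * enrolled + rng.gauss(0, 1)
        return enrolled, outcome

    winners = [draw_person(True) for _ in range(n_per_arm)]
    losers = [draw_person(False) for _ in range(n_per_arm)]

    def mean(values):
        return sum(values) / len(values)

    loser_mean = mean([y for _, y in losers])
    winner_mean = mean([y for _, y in winners])
    enrolled_mean = mean([y for e, y in winners if e])
    enroll_rate = mean([e for e, _ in winners])

    return {
        "naive": enrolled_mean - loser_mean,    # enrolled winners vs. all losers
        "itt": winner_mean - loser_mean,        # all winners vs. all losers
        "scaled_itt": (winner_mean - loser_mean) / enroll_rate,
    }


for name, estimate in simulate().items():
    print(f"{name}: {estimate:.3f}")
```

In this toy setup the naive comparison of enrolled winners against all losers lands far above the assumed true effect of 0.5, because it mixes in the prudence bonus. The winner-versus-loser comparison stays clean but is diluted by the enrollment rate, and dividing it by that rate recovers the assumed effect only because the simulation gives non-enrolling winners no effect at all.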
23:55 Russ: So, let's go over the basics now. Were you surprised that there was no improvement? There were a lot of people who expected this to have wonderful effects, maybe not on mortality, which is hard to measure in two years, but certainly on things that are thought to be related to mortality. We mentioned in recent podcasts with Eric Topol and Austin Frakt that having a lower cholesterol level might not have a very big impact on lowering your mortality. But putting that to the side, are you surprised that having access to Medicaid did not significantly improve the physical health characteristics of the people who were enrolled in the program? Guest: It didn't surprise me. Two huge caveats are, 1. I'm so not expert in the subject that my surprise or lack of surprise does not have a lot of information content; and 2. the only reason it didn't surprise me is, out of interest in the methodology, I'd looked at the only prior randomized study, which looked at the effect of varying levels of health insurance coverage, which was the RAND study of 30 years ago, which showed no effect. It really surprised me when I looked at the RAND study that it had no effect. So, prior to having seen the one prior randomized experiment, it was shocking to me, actually. But because that essentially established my prior, no, it didn't. Russ: And the RAND study, it showed no effect on health care outcomes. It showed, like this study, an increase in health care usage. But tragically, not an increase in health care outcomes. Guest: That's correct. Russ: Now, Austin Frakt, when he was on the program a couple of weeks ago, said it's not surprising the results are not significant because the experiment was 'underpowered.' Now, underpowered is a technical term in statistics. Explain what he meant by that and whether you agree with him.
Guest: Well, the power in a statistical experiment, and I often use this analogy, is sort of like the magnification power on the microscope you probably used in high school biology. It has on the side, 4x, 8x, 16x, which is how many times it can increase the apparent size of a physical object. And the metaphor I'd use is, if I try and use a child's microscope to carefully observe a section of a leaf looking for an insect that's a little smaller than an ant, and I don't observe the insect, I can reliably say: I don't see the insect, and therefore there is no bug there. If I use that exact same microscope to try and find on that exact same piece of leaf, not a bug but a tiny microbe that's, you know, smaller than a speck of dust, I'll look at it and I'll say: it's all kind of fuzzy, I see a lot of squiggly things; I think that little squiggle might be something or it might not. I don't see the microbe, but I can't reliably say that therefore there is no microbe there, because trying to zoom in closer and closer to look for something that small, all I see is a bunch of fuzz. So my failure to see the microbe is a statement about the precision of my instrument, not about whether there's really a microbe on the leaf. And I think the argument that Austin Frakt has made around this is: This experiment, because basically the sample size, the number of people in it with different disease states, is an instrument which is sized so that it cannot, does not have sufficient precision to find the size of the clinical effect a rational and informed observer would expect there to be on these health care outcomes. Russ: So, to summarize--I know you are going to challenge this maybe in a minute--but to summarize what the fans of the study and fans of Medicaid and fans of health insurance have argued, is: Well, there is a lot of positive effect on cholesterol and on blood sugar and on blood pressure, but they weren't large enough to be distinguished from random effects.
But that's just because the sample wasn't big enough. The alternative view is that, well, the effects weren't very large. If the effects had been large enough, then even a small sample would have identified them. But Austin Frakt has argued that, well, no, even if it was a big sample, it would have required huge effects. I don't know if you looked at his calculations or not. Guest: I did. So, I think we have to be careful about what we mean by 'large effects.' So, there is, what some people have called 'clinically large,' or 'clinically significant.' Like, I care about this outcome, it's that big a deal. Or alternatively, and I think this is, the second of these, is the sense in which Austin Frakt has made the argument, is: If I think about what a rational, informed observer--that is, someone who knows about what we would expect given the package of treatments that would be applied against this population and the starting disease state of the population and how we know medicine works--would the expected effect we think we would normally see by applying this treatment called 'Medicaid coverage' be large enough to be detectable by this experiment? And so, I think a simple way, and a straightforward way--I don't want to restate his argument for him, and Kevin Drum has made the same argument, but I think I'm being fair about this--is, look--and I think this point of view is internally logically consistent and an intelligent point of view--which is, basically, look: your experiment is sized, it's designed--sample size means it can read an effect of size x--when you look at what you would expect the clinical effects of the benefits being accorded through Medicaid coverage to be for this population, it's like 1/20th of x, is the change in improvement in health you would expect to see. 
So the fact that when you use this child's microscope to look for a microbe, an experiment that can find the size x, and it didn't find any statistically significant effect when really we would expect it to be a 20th of x, well, you got no new information from this study. Sure, of course it showed you no significant effect, because you would have to get, you know, a gigantic multiple of what by biology and clinical experience should say we should get from applying Medicaid. I mean, I think that's the basic argument.
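The microscope analogy can be made concrete with a small calculation. What follows is a minimal sketch, not the study authors' method: it assumes a simple two-sided z-test on a normally distributed estimate, and the function names and the standard error of 1.0 are hypothetical placeholders chosen for illustration. It shows how an experiment sized to detect an effect of size x fares against a true effect of x/20.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def minimum_detectable_effect(std_error, alpha=0.05, power=0.80):
    """Smallest true effect detected with the requested power.

    Assumes a two-sided z-test; std_error is the standard error of the
    estimated treatment effect (a hypothetical input here).
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_power) * std_error


def power_for_effect(true_effect, std_error, alpha=0.05):
    """Probability of a statistically significant result for a given true effect."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = true_effect / std_error
    # Chance the estimate lands beyond either critical value.
    return _STD_NORMAL.cdf(shift - z_alpha) + _STD_NORMAL.cdf(-shift - z_alpha)


std_error = 1.0  # hypothetical units; only ratios matter in this sketch
x = minimum_detectable_effect(std_error)

print(f"power when the true effect is x:    {power_for_effect(x, std_error):.3f}")
print(f"power when the true effect is x/20: {power_for_effect(x / 20, std_error):.3f}")
```

Under these assumptions, the chance of a significant result when the true effect is x/20 is only slightly above the 5% false-positive rate, which is the sense in which a null result from such an experiment carries little new information.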
30:02Russ: That's their argument. Do you agree with it? Guest: I think that--I listened to your podcast because I was going to do this podcast, and he comes across to me there, as does Kevin Drum in his writings, as smart, informed, and a readable guy. Russ: And by the way, just to interrupt for a second, I'm going to invite Austin to write a response to this podcast, which I will post, I hope, as an early comment on the EconTalk page, as well as on EconTalk's Facebook page, which is now available for liking, so please check that out. So, carry on. Guest: So I think the whole issue then is calculating these numbers, going from this conceptual argument of, you can measure at size x and the effect should be, say, 1/20th of x, to actually calculating the two numbers. One is often described as 'the power of the experiment' is x, and the expected effect is this other number, which is smaller. So, you know, when it comes to calculating the power of an experiment, which is basically saying how small an effect can it measure reliably, well, you know, I know how to do a power calculation, but I don't know how to do a power calculation for somebody else's very complicated experiment without access to the underlying data and their analytic method. And second, when it comes to saying, well, when applying Medicaid to this population, what change would you expect in, for example, blood pressure, I don't begin to know how to do that. So, you know, I don't know how to calculate these two numbers. The only case in which, that I'm aware of, in which the authors of the study have made any statement about these two numbers is the case of diastolic blood pressure. So--and this is actually on Austin's blog. So, the lead author of the study, in response essentially to this argument, in correspondence said: Well, take diastolic blood pressure as an example. And so, diastolic blood pressure is the second or lower of the two numbers in your blood pressure reading. 
So, if you are 110 over 70, it's 70. And what she said, and this first number you can take right out of the study, which again, if you accept the analytical premises which you talked about before in the study, the author of the study says, well, we can read an effect of a reduction in diastolic blood pressure of around 2.65 points; and for some technical reasons if you look at a subgroup, that might be relevant; it's a little more than 3 points. So, essentially, in rough terms, the author of the study asserts the power of this experiment for reductions in diastolic blood pressure is about 3 points. In other words, the average person I think had about a 76 blood pressure reading, it was in the study; so, they could read if the average went from 76 to 73; in test versus control, they would be able to say that's a real effect we can measure and call that statistically significant. So then the question is, well, I don't know, would you expect this kind of coverage as an informed observer, would it be like a twentieth of that, which is a small fraction of one point, or would it be you know, ten times or around that? And her assertion is: Look, the benchmarks for what a reasonable expectation of the effect of this coverage are, are two, one of which is the RAND study, which measured an effect of a 3 point reduction in diastolic blood pressure. And the second is another study, which measured a reduction of 6-9 points. So even if I take 6-9 as, what, 7.5, the average of 3 and 7.5 is 5.25, right? In other words, in this case, for diastolic blood pressure, according to the author of the study, it's not x and 20x, it's, you know, x and more-than-x. You can measure down to 3--the precision of that microscope can go down to a measurement of a size of 3; the effect they would expect is 5. Russ: So, I got lost there. So, from past studies, you might have expected--the opportunity to be on Medicaid for two years might lower your diastolic blood pressure by 5 points. Not percent. 
Five points. Guest: Right. Russ: 76 to 71. And they found a decrease to 73. Guest: No, what they--if they said what the precision of their instrument is a 3 point reduction would be measured as a statistically significant reduction in blood pressure. So, in other words-- Russ: So there was sufficient power to measure what they could have--they could have found 3. Excuse me. The study was sufficiently powered, meaning the sample was large enough that if it fell by 3 points, it would have been statistically significant. It could have been. And certainly 5, which is, 5.25 or so which is the average of two past studies. Although I presume one of them wasn't statistically significant either, which was the RAND one. You said it was 5 points, but if I remember correctly it didn't find-- Guest: Three points. It was 3 points; the other study was 6-9 points. Russ: Oh, sorry; RAND was 3. So that sounds like the author was saying that their study had sufficient power to measure what could have been expected reasonably from the effect of the treatment. Is that correct? Guest: That's correct. Russ: What did they actually find? Guest: They actually measured a little less than a 1 point reduction, so a -0.81 point reduction. Russ: Okay. So that brings me to my next point, which I think is a very important general point that is lost in I think 99% of these kind of discussions. Guest: Let me just make a quick comment about the argument I just made, which is: I don't have any idea, other than the author of the study presumably has studied this in detail. The things I know about in the study are extremely well done. I'm assuming, without knowing, that that is the valid way to create the clinical benchmark, like is 5 a reasonable number to expect. And I'm also assuming that, because she provided this as an example, it's not the case that if you looked at 15 metrics, she picked one metric where they had power and not the 14 where they didn't. So, I don't know about that. 
And I think the project often that's undertaken, which is figuring out what the power of this experiment is and trying to create these prior benchmarks, what would be reasonable expectation against these clinical effects, I can't possibly comment on. I just don't have the expertise to comment on it. All I'll say is, in the one case where the study author has commented on this, they are claiming it has power. Russ: So, Megan McArdle ran your essay at the Daily Beast, and I don't know if it's run anywhere else. I don't know if it's posted anywhere else. But Megan made the point that the authors of the study were probably aware of this. About this issue of power. Obviously when they designed the study they had some expectation of what sample size would be necessary to measure independent experimental effects of the experimental treatment. And they didn't go into this blind. And they didn't write a survey or a study that said: Of course, we're not going to be able to measure this with any precision. But we'll let Austin respond to that. Guest: Well, I think that a point there is, they didn't design the study. It was a lottery run by the state government. They didn't set the sample size. So, I could easily see--I'm not saying this is the case--but I could see an argument that says, look, we were dealt a hand of cards; we did the analysis on it; there was no reason to do a power calculation in advance because we didn't control how many people were on it; we were just trying to learn what we could out of this. So, I do not make the assumption that they did prior power calculations and knew this was correctly powered. Russ: Good point, fair enough. Guest: I don't know. Russ: Fair enough.
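The diastolic blood pressure comparison above is easy to lose track of when spoken aloud, so here is the arithmetic written out. This is only a sketch using the figures quoted in the conversation; the variable names are hypothetical, and taking the midpoint of the 6-9 range follows the guest's own shortcut rather than anything stated by the study authors.

```python
# Figures quoted in the conversation (diastolic blood pressure, in points).
detectable_reduction = 3.0            # roughly what the study could detect
rand_benchmark = 3.0                  # reduction measured in the RAND study
second_benchmark_range = (6.0, 9.0)   # reduction measured in the second study
observed_reduction = 0.81             # reduction actually estimated in Oregon

# Midpoint of the 6-9 range, then the average of the two benchmarks.
second_benchmark = sum(second_benchmark_range) / 2
expected_reduction = (rand_benchmark + second_benchmark) / 2

expected_to_detectable = expected_reduction / detectable_reduction
observed_to_detectable = observed_reduction / detectable_reduction

print(f"expected reduction from benchmarks: {expected_reduction} points")
print(f"expected / detectable: {expected_to_detectable:.2f}")
print(f"observed / detectable: {observed_to_detectable:.2f}")
```

On these numbers the benchmark expectation is larger than the detectable effect, not a twentieth of it, while the observed reduction is well below both.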
37:33Russ: Here's the punchline. And for those who are tired of hearing about Oregon and Medicaid, I think this is an incredibly important point. And we have one more coming. For those who are lost in the weeds, there is an incredible punchline coming in a minute from Jim's analysis of the survey, which blew me away. But I want to make this point first. In the social science literature--I'm sure in other literatures, but overwhelmingly in the social science literature, certainly in the epidemiological literature--everybody is obsessed with statistical significance. Which means: There was an effect that could be attributed to the treatment--or to the causal variable we are looking at--independent of chance. And the gold standard is 95%. That, if we find a result, if we find a difference between the control group and the experimental group that is statistically significant, what that means is, typically, that there is a 95% chance that this is not due to randomness, but it's in fact due to the treatment itself. Correct? Guest: Technically, the p value, or the 1 minus 95%, or .05, is the probability that an observation as extreme as that data would occur by chance, given that the true effect was 0. Russ: Given that the true effect was 0. Thanks for that. Correct. But that obsession misses the much more, probably much more, important point, which is the magnitude of the effect. The fact that it is statistically significant is often not nearly as important as whether the effect is large. So, just to take an obvious example, expanding Medicaid coverage, which is part of the Affordable Care Act (ACA), known as Obamacare--it's expensive. It costs a lot of money. So you care about not just whether it changes people's blood pressure and cholesterol level and mortality. You care about whether it changes it significantly--not in the statistical sense of 'significantly,' but in the everyday sense, meaning a lot. 
And unfortunately, the word, the phrase, 'statistical significance', is a misleading term. Not deliberately misleading, but for nontechnical people it makes you think it's significant meaning important. All it means is that it's not due to chance. If the effect is small, then the effect could be statistically significant but policy-wise insignificant. And so this last example is an unbelievable, is a perfect example. Let's pretend they had found that in fact the reduction in blood pressure was statistically significant. Wow! Medicaid coverage has an effect on blood pressure. But if it's less than a point in reducing your blood pressure, it's insignificant in the full, real sense of the word. Agree or disagree? Guest: Well, I--the general point, there are two different meanings to the word 'significant' and it is misleading when you use it in the technical sense in a lot of dialog where people assume the common sense meaning of the term, I definitely agree with. I wouldn't jump to the conclusion that a one-point reduction in average blood pressure is not clinically significant, just because it could be that what you have is 99% of people are at, you know, 75, and you have 1% of people that are at 140, and the way you got the 1 point reduction was take that 1% of people and bring them down substantially. Russ: And that's my earlier point. Right? Your point is that this was for the whole population, so you're mixing in people who you wouldn't expect to have an effect. Is that right? Guest: Yeah, that's right. So you can get to a one-point average by a small population with a significant move. And that could result in a material reduction in total mortality for the group, because one of the points I make in the essay is, if you just think about the numbers, right? 
If you choose--and I don't want to get to a debate about how many people are insured or uninsured in the United States, but if you use a conventional number like 50 million people are uninsured, and you say a reduction in a probability of 10-year mortality of 0.0001--that's 5000 people. So, until we start to trade off against costs and closing off other reform options, basically any change in mortality probability is going to be morally significant. You are going to have to then get to the issues that you point to, which is that: okay, but then I think there is no free lunch. Might not be enough to get that. Russ: So then, let me correct what I said. Let me say it a little more elegantly, then. When you look at policy significance, rather than statistical significance, if you want to see whether a result is important, you have to look at its magnitude. Your footnote to what I said is that sometimes what looks like a small number is actually larger than what you may think. I take that point; I totally agree.
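The mortality arithmetic above is a single multiplication, but it is worth seeing how the answer scales. This is a small sketch using the guest's illustrative figures; the function name is hypothetical, and the alternative risk reductions in the loop are made-up values included only to show sensitivity.

```python
def lives_affected(population, risk_reduction):
    """Expected number of people affected by a change in mortality probability."""
    return round(population * risk_reduction)


uninsured = 50_000_000  # the "conventional number" used in the conversation

# 0.0001 is the figure from the conversation; the others are illustrative only.
for risk_reduction in (0.00001, 0.0001, 0.001):
    print(f"{risk_reduction:.5f} -> {lives_affected(uninsured, risk_reduction):,} people")
```

A change in probability that looks negligible for one person becomes thousands of people when applied to tens of millions, which is the guest's point about small numbers being larger than they look.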
42:25Russ: Let's move on to what I think is probably the most interesting thing in your essay--all these things were interesting, but the most interesting was related to what is known as the 'Framingham risk score,' which is related to a longitudinal survey. Correct? It's called the Framingham study, that looked at a large group of people over time and tried to measure their chance of a heart attack. Is that correct? Guest: It's cardiovascular disease. So that's not just MI, it's a variety of indicators, negative outcomes, cardiovascular outcomes. Russ: And how was that risk score--what's it composed of? Guest: In this study, there are different versions of it, and I am not an expert on building up Framingham risk scores, but in this study, the variables that are used are age, cholesterol levels, blood pressure, blood sugar, an indicator of whether or not you are using medication for high blood pressure, and smoking. So, effectively there's a formula which combines those factors for any person. And from that computes a number, which is your Framingham risk score. Russ: And that score, for the experimental group, went up. Correct? Guest: For the test group as a whole versus the control group, it went down very slightly. For those in the test group who started sick, who started with an elevated risk because of diabetes or hypertension or high cholesterol, etc., for that group it went up. For the overall population of the test group versus the control group, it went down extremely slightly. Russ: For the group that went up, though--for either of those, it was not statistically significant. Again. Guest: Right. That's correct. 
Russ: However, you made the point, the observation, which was hidden away in Table 5--I paid for the study, by the way, so I won't be able to--I'm now talking about the summary, the results from the New England Journal of Medicine, so if you want to pay, I think it's $15, you can get it; we may be able to put up Table 5 or some piece of it, I'll find out. But in Table 5, you noticed something rather extraordinary about the test group, which was? Guest: Well, the estimated effect of coverage, which was statistically insignificant, was an increase in the incidence of smoking in the test group versus the control group. In the control group it was about 43% of people who smoked. And the causal effect as indicated by the study was a 5-percentage-point increase, to 48% smoking in the test group--which was not a statistically significant result. So you would, under conventional standards, which I support, you would not ascribe that as a causal effect. You could not reject the hypothesis of random chance. Although interestingly-- Russ: It is close. Guest: The p-value is closer on that than essentially everything else we have been talking about in this study. Russ: Yeah, if I remember, I think it was .18. Meaning there is still some chance, there's a good chance--well, 'good' is a bad word. There's an 80% chance that this was just due to random variation. Correct? Did I say that right? Guest: It's the other way around. So, there's an 18% probability that an observation that extreme could occur by random chance, if there were in fact no causal effect of Medicaid coverage on the probability of smoking. Russ: Say it again one more time. Go ahead, try again. Guest: I think the way to say it is there is an 18% chance that an observation that extreme or more extreme in terms of the difference between the test and the control group would occur by random chance, given that the null hypothesis is correct. That is, that the true absolute causal effect of Medicaid coverage on smoking is zero. 
Russ: So, if the p value had been .03--say it the other way around. So if it had been .03 we would have concluded that, it's very unlikely it was due to chance. Guest: That's correct. Russ: So, .18 says it was an 18% chance it was due to chance. And of course it's an arbitrary measure--.05 has become the cutoff for what is by chance or not. So in this case, like all the other results, we would conclude that, like the--we can't reject the hypothesis that this is just random. We can't attribute it to being on Medicaid. But you discuss the possibility that it might be attributable to Medicaid. And what was your argument? Guest: Without respect to, without pinning it on this finding, I think there is a well-known effect that is often called the seat-belt/speeding effect, which is-- Russ: Yeah. It's called the Peltzman effect, also. He was the economist who found the seatbelt effect. Go ahead. Guest: So, the basic idea of it is, if I reduce the cost of a risky behavior, I will, on net, increase the incidence of that risky behavior. Because the thing that I've changed is the cost of the bad outcome is not quite as bad as it used to be. And so I think the application here, is, if you reduce the cost in the broad sense, not just the monetary sense, to me of the negative outcomes of smoking, even though smoking on balance is still bad for me, it is less bad than it was the moment before I reduced the effects of having negative health outcomes. And there is literature that shows, in non-experimental settings, and I think I make the point in the piece, that I don't put a lot of stock in non-experimental findings about this, but there is a literature that says that in fact you do see an increase in risky behavior, if and when you grant additional medical coverage.
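The guest's definition of the p-value, the chance of seeing a difference at least this extreme if the true effect were zero, can be checked by simulation. This is a minimal sketch under a strong simplifying assumption: the estimate is summarized as a z-statistic that is standard normal under the null hypothesis. It does not reproduce the study's actual estimation method, and all function names are hypothetical.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def two_sided_p_value(z):
    """Probability of a z-statistic at least this extreme if the true effect is zero."""
    return 2 * (1 - _STD_NORMAL.cdf(abs(z)))


def z_for_p_value(p):
    """Size of the z-statistic that corresponds to a given two-sided p-value."""
    return _STD_NORMAL.inv_cdf(1 - p / 2)


def simulated_p_value(z_observed, draws=100_000, seed=0):
    """Share of simulated null-hypothesis experiments at least as extreme as observed."""
    rng = random.Random(seed)
    extreme = sum(
        1 for _ in range(draws) if abs(rng.gauss(0.0, 1.0)) >= abs(z_observed)
    )
    return extreme / draws


z_obs = z_for_p_value(0.18)  # the p-value recalled in the conversation
print(f"z-statistic implied by p = 0.18: {z_obs:.2f}")
print(f"analytic p-value:  {two_sided_p_value(z_obs):.3f}")
print(f"simulated p-value: {simulated_p_value(z_obs):.3f}")
```

Every simulated experiment here has a true effect of zero, so the p-value describes how often chance alone produces such a result. It is not the probability that the observed result was due to chance, which would require additional assumptions about how likely a real effect was beforehand.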
48:27Russ: So, I'm skeptical about that. And it's fun. I'm a fan of the Peltzman effect, which is again, for jargon fans, it's an example of moral hazard. You've made it cheaper to take risk, and so you expect people to take more risk. It's a little hard to believe that if people would start smoking, knowing they could get treatment for lung cancer in 30 years, that before this they said, eh, it's too risky; and now they are thinking--and subconsciously; it doesn't have to be a conscious, rational decision--but now they feel a little more secure, so they start smoking. I found that a little bit-- Guest: I think it's extraordinarily unlikely that that would happen. I think if I were to tell the story of how this might happen, it would be much more likely I would delay quitting. Russ: Yeah, that's a good point. Guest: People who are already smoking who decide--it's your point, consciously or not--delay the decision to quit. Or gave up. Many people who quit smoking have 6, 7, 10, 12, 15 attempts. Gave up on those attempts and got to the 8th attempt before they succeeded, as opposed to succeeding on the 6th attempt. Russ: Yeah, Mark Twain-- Guest: I do not know any of that is true. I'm simply saying that would be a more plausible reason, I would think. Russ: Mark Twain said: It's easy to quit smoking, I've done it a score of times. But the more interesting case--I don't know if you mentioned--to me the more likely case is financial. That because the experiment reduced financial stress on people--the experimental treatment reduced financial stress, because you had this inexpensive source of health care once you were covered, you had more money to spend on lots of things. One of which would be cigarettes. Guest: Yep. Could very well be. I did not say that only because I never thought of it. Again, I'm not asserting that to be--the way you described, to know that to be true; but it certainly sounds plausible if you saw an effect like this. 
Russ: The other thing I found striking--I think you did mention this--is the proportions themselves, the level. It's a very high rate of smoking. Guest: It is. Basically, that's right. The rate of smoking in this time period, a few years ago, of basically 19-54-year-olds in the United States was a little under 25% by the research that I did. And so you are really talking about a group of people who are smoking, roughly speaking, around twice the rate that you would expect for people that age in that time period. The obvious observation is, smoking is an incredible class marker in the United States. Russ: Yeah. Guest: I've been living in France for the last several years, and it's one of the things that's so striking about the incidence of smoking by someone like me, or I suspect you. If you went to Paris and walked around, there is a higher incidence of smoking. But it also seems dramatically higher relative to how it really is because it is less class-defined, granted, than in the United States. Russ: Right. I don't know if I know anyone who smokes. Except the person who worked on the drywall of my house in the bathroom renovation a few months ago.
51:32Russ: Do you want to say something about the depression findings? I did a little looking into that. Did you look into that at all? Guest: Not seriously. I only read other people who talked about it, and I think the two big issues--and one of the reasons I didn't write about it is that this thing is really long, but also I didn't really do the work to be confident about an understanding of it. But I believe that a key issue to remember when you see a significant reduction in depression is that apparently it all occurred within the first month of the study. You can see it immediately. So it makes it much less plausible that this is an effect of actual treatment and the application of drugs and various other kinds of treatments. It is more like what you would think of normally as a placebo effect. I'm not saying it's true. I'm just saying that--given that, I've read, when this happens that quickly, it certainly raises that question. The second issue, of course, as always, with most psychological conditions, is construct validity. So, the thing we measured by asking questions on this form changed; what does that mean about what people normally mean when they intuitively describe mental state and depression and so on? To me that's probably a very tricky question to answer. Russ: Yeah, I went and looked at the survey that they use to measure depression. Let's pull it up here. It's basically 8 questions. I just thought this was fascinating. It's eight questions. I'll read them very quickly. When you answer the questions on the form, you answer whether this occurred not at all, for several days, more than half the days, or nearly every day, over the last two weeks. And they are things like: "Little interest or pleasure in doing things. Feeling down, depressed or hopeless. Trouble falling or staying asleep, or sleeping too much. Feeling tired or having little energy. Poor appetite or overeating. Feeling bad about yourself. Trouble concentrating. 
Moving or speaking slowly that other people notice." And of course, many of those characteristics we all have, lots of times. And the question is: What does it mean to get 10 points on those questions? And again, as you point out, that's what you call 'construct validity.' It's an interesting issue. But the people who were in the treatment group did have a reduction. And that is, at least one--and it was statistically significant--so it's at least one thing positive we can say about the experiment.
54:04Russ: So, we're almost out of time. I want to thank you and I hope the listeners heard how careful you were in making claims. And I believe that is the right way to avoid confirmation bias, or at least maybe the right way to say it is you seem to be fairly good at avoiding confirmation bias. You were very careful, many times in this conversation, to hedge what you knew versus what you thought you knew versus what you thought might be true, etc. I want to salute that, because I think that's wonderful. Where do you think we are now? Did we learn anything from this that we didn't know? What are its implications, if anything, for public policy? And then, the other question I would ask, given your background, is, is there an experiment that you'd like to design, or that you could imagine, that would do better than this in helping us move forward in health care? Guest: Well, I guess to take these questions in order, on the first of them, I think that according to the estimates provided by the authors of the study, we did learn something about the failure to move diastolic blood pressure. And I think by implication, though I don't know that, other indicators as well. I think it confirms rather than contradicts the results of the one prior randomized experiment, which is that over something like this kind of measurement period it's very difficult to see changes, specifically changes in these physical health indicators. I don't think that means Medicaid's a bad idea or the ACA is a bad law, or anything like that; I don't begin to have the expertise to answer a question like that. I do think that if your support for that change is predicated on the idea that it's going to make lots of sick people better, physically, this ought to make you a bit more hesitant about that belief. 
On the second question I feel a lot more strongly about my answer, and I thought about this a lot in Uncontrolled: I think the idea that we're going to do some experiment and we are going to slam our fist down on the table and say, now we've settled the issue, now we know the answer, is extremely unrealistic. I don't think there is any such experiment which is going to be able to settle policy debates once and for all. I think that therefore there is not the better experiment argued for here. What I'd argue is we should be embedding the capabilities to do lots of fast, cheap experiments in our distribution, in our execution of these kinds of programs, including Medicaid and similar programs. In other words, if we didn't have one experiment where we are kind of trying to tilt our head and squint out of the corner of our eye to try to draw conclusions from this one experiment which is the one we've done since the one 30 years ago, and instead we are looking at 85 experiments that were run last year across 50 different states, I think we would be able to draw much more reliable, practical, engineering conclusions about: Gee, it looks like this version of this seems to work well because we see it replicated nine times, yeah, none of these experiments is perfect but, you know, it seems to happen over and over and here's a surprising thing we thought should work and we can't figure out how to make it work. That seems to me how you make progress. It's lots and lots of fast, cheap experiments. Not the one moon shot that's going to settle the debate. Russ: Yeah, I think we have a lot of romance about randomized control trials like this, because they remind us of science. It's like, we've got the petri dish over here; it has whatever; and another one over here. And that's science! So if we have a control group and an experimental group, we're going to find the truth. 
We're not going to be confused by--and one thing I think listeners can take from this conversation and related conversations is the elusiveness of truth. That it's much harder, as you point out in your book Uncontrolled, and I think you do it very beautifully, in a world of what you call 'causal density,' where there's lots of different things happening and changing at the same time, and there are a lot of unobservable differences between groups, you should lower your expectations. Guest: Yeah, that's right. And I think in a very specific way. I think you really can find truth in a scientific sense. You really can find the right answer, the true answer to these questions about causality. It's just that an experiment answers an extremely narrow question, always. It's not: Does Medicaid help or hurt? You know, what you answered is: When I randomized people to this lottery in Oregon on these dates, what was the causal effect of being randomized in or out of that lottery? And the reason you've heard me hedging all of the time is I've learned the hard way, as soon as you step an inch off that platform, not seemingly grandiose, but seemingly direct implications of that, you run into the danger of fooling yourself into thinking your knowledge extends more broadly than it does. That's when you need lots and lots of experiments. You build this picture of knowledge like a pointillist painting. By lots and lots of experiments that get the answer in very narrow circumstances. But you can add those up to really useful conclusions. I think.
